Scaling Study
Overview
Questions
- How can I decide the amount of resources I should request for my job?
- How do I know how my application behaves at different scales?
Objectives
After completing this episode, participants should be able to …
- Perform a simple scaling study for a given application.
- Identify good working points for the job configuration.
What do we look at?
- Amdahl’s vs. Gustafson’s law / strong and weak scaling
- Walltime, speedup, efficiency (see the definitions below)
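For reference, with $T(n)$ denoting the walltime measured on $n$ cores and $p$ the parallelizable fraction of the program, the standard definitions are:

$$S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}$$

$$S_{\text{Amdahl}}(n) = \frac{1}{(1-p) + p/n} \quad \text{(strong scaling: fixed problem size)}$$

$$S_{\text{Gustafson}}(n) = (1-p) + p\,n \quad \text{(weak scaling: problem size grows with } n\text{)}$$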
Discussion: What dimensions can we look at?
- CPUs
- Nodes
- Workload/problem size
- Define example payload
- Long enough to be significant
- Short enough to be feasible for a quick study
- Identify dimension for scaling study, e.g.
- number of processes (on a single node)
- number of processes (across nodes)
- number of nodes involved (network-communication boundary)
- size of workload
- Decide on the number of processes across a node, with a fixed workload size (a setup sketch follows this list)
- Choose limits (e.g. 1, 2, 4, … cores), within a reasonable size for the given cluster
- Going beyond one node? Or restrict the study to a single node first?
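As a concrete starting point, here is a minimal sketch of such a setup. The executable `./payload`, its workload argument, and the chosen core counts are assumptions; adapt them to your application and cluster.

```bash
#!/usr/bin/env bash
# Scan dimension: number of processes on a single node, workload fixed.
WORKLOAD=100000          # hypothetical problem size: long enough to be
                         # significant, short enough for a quick study
CORES=(1 2 4 8 16 32)    # limits chosen to stay within one node

# Dry run: list the configurations the scan will cover.
for n in "${CORES[@]}"; do
    echo "planned: ${n} processes, workload ${WORKLOAD}"
done
```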
Parameter Scan
- Take measurements
  - Use `time` and repeat each measurement (something like 3 or 10 runs)
- Vary the scaling parameter (see the measurement sketch below)
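A sketch of the measurement loop, reusing the hypothetical `./payload` from above. It assumes a Slurm cluster (run inside a job allocation) and GNU `/usr/bin/time`, whose `%e` format prints elapsed wallclock seconds; note the shell builtin `time` behaves differently.

```bash
#!/usr/bin/env bash
# Repeat each configuration a few times to see run-to-run variation.
REPEATS=3
WORKLOAD=100000
echo "cores,run,seconds" > timings.csv

for n in 1 2 4 8 16 32; do
    for run in $(seq "${REPEATS}"); do
        # GNU time writes "%e" (elapsed seconds) to stderr; capture it
        # while discarding the payload's own stdout.
        seconds=$( { /usr/bin/time -f "%e" \
            srun -n "${n}" ./payload "${WORKLOAD}" > /dev/null; } 2>&1 )
        echo "${n},${run},${seconds}" >> timings.csv
    done
done
```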
Exercise: Run the example with different -n
- 1, 2, 4, 8, 16, 32, … cores and the same workload
- Take `time` measurements (ideally multiple, and with `--exclusive`); a batch-script sketch follows
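One way to wrap a single measurement in a batch job, again assuming Slurm and the hypothetical `./payload`. The time limit and workload size are placeholders; submit once per core count, e.g. `sbatch --ntasks=8 scaling.sh`.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=scaling
#SBATCH --nodes=1          # keep the study on a single node for now
#SBATCH --exclusive        # avoid interference from other users' jobs
#SBATCH --time=00:10:00

# SLURM_NTASKS is set from --ntasks/-n given at submission time.
/usr/bin/time -f "%e" srun -n "${SLURM_NTASKS}" ./payload 100000
```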
Analyzing results
Exercise: Plot the scaling
- Plot the measured `time` against the scaling parameter (number of cores)
- Calculate the speedup with respect to the baseline with 1 core
- What’s a good working point? How do we find it? (an analysis sketch follows this list)
  - Overhead
  - Efficiency: don’t waste cores if adding them no longer helps much
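A sketch of the analysis on the `timings.csv` produced above: average the repeats per core count, then compute speedup and efficiency relative to the 1-core mean. The resulting table can then be plotted with any tool (gnuplot, a spreadsheet, …).

```bash
#!/usr/bin/env bash
# Average repeated runs, then print cores, mean time, speedup, efficiency.
awk -F, 'NR > 1 { sum[$1] += $3; cnt[$1]++ }
    END {
        base = sum[1] / cnt[1]        # 1-core mean as the baseline
        for (n in sum) {
            mean = sum[n] / cnt[n]
            printf "%s cores: %.2f s, speedup %.2f, efficiency %.2f\n",
                   n, mean, base / mean, base / (mean * n)
        }
    }' timings.csv | sort -n
```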
Summary
What’s a good working point for our example (at a given workload)?
Key Points
- Jobs behave differently with varying resources and workloads
- A scaling study is necessary to demonstrate how an application actually behaves at different scales
- Good working points lie in the region where additional cores still provide sufficient speedup and overhead costs do not yet dominate