Scaling Study

Last updated on 2025-09-24

Overview

Questions

  • How do I decide the amount of resources to request for a job?
  • How does my application behave at different scales?

Objectives

After completing this episode, participants should be able to …

  • Perform a simple scaling study for a given application.
  • Identify good working points for the job configuration.

What do we look at?


  • Amdahl’s vs. Gustafson’s law / strong and weak scaling
  • Walltime, speedup, efficiency
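For a fixed workload, speedup and efficiency are usually defined relative to a baseline walltime (assuming the 1-core run as the reference):

```latex
S(N) = \frac{T(1)}{T(N)}, \qquad E(N) = \frac{S(N)}{N}
```

where T(N) is the walltime measured on N cores; ideal linear scaling gives S(N) = N and E(N) = 1.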

Discussion: What dimensions can we look at?

  • CPUs
  • Nodes
  • Workload/problem size

Exercise: Factors affecting scaling

  • How does the serial portion of the code affect scaling? (A numerical example may help.)
  • Assume an infinite number of workers running highly parallel code that is 99% parallelized and 1% serial. The speedup will be limited to 100. What is the ideal limit to the speedup in general?
  • How does communication affect scaling?
  • Define example payload
    • Long enough to be significant
    • Short enough to be feasible for a quick study
  • Identify dimension for scaling study, e.g.
    • number of processes (on a single node)
    • number of processes (across nodes)
    • number of nodes involved (network-communication boundary)
    • size of workload
  • For this example: number of processes across nodes, with a fixed workload size
  • Choose limits (e.g. 1, 2, 4, … cores) within a reasonable size for the given cluster
  • Scaling beyond a single node? Or cap the study at one node?
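The speedup limit from the exercise above follows from Amdahl’s law: with serial fraction s, the speedup on N workers is 1/(s + (1 − s)/N), which approaches 1/s as N grows. A minimal sketch:

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Amdahl's law: speedup for a fixed problem size."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# 99% parallel code (serial fraction 0.01): the speedup
# approaches 1 / 0.01 = 100 as workers are added.
for n in (10, 100, 1000, 1_000_000):
    print(n, round(amdahl_speedup(0.01, n), 2))
```

With 1% serial work the speedup never exceeds 100, no matter how many workers are added.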

Parameter Scan


  • Take measurements
    • Use time and repeat each measurement (e.g. 3 or 10 runs)
    • Vary scaling parameter
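As a sketch of the measurement step, a small helper could repeat each run and take the median (the application name my_app and the use of srun below are placeholders; adapt to your cluster):

```python
import statistics
import subprocess
import time

def time_command(cmd, repeats=3):
    """Run cmd several times and return the median wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the run fails
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical scan over the scaling parameter (process count):
# for n in (1, 2, 4, 8):
#     print(n, time_command(["srun", "-n", str(n), "./my_app"]))
```

Repeating the measurement and taking the median reduces the influence of outliers, e.g. from a busy file system or network.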

Exercise: Run the example with different -n

  • 1, 2, 4, 8, 16, 32, … cores with the same workload
  • Take time measurements (ideally multiple, and with --exclusive)

Analyzing results



Exercise: Plot the scaling

  • Plot walltime against the number of cores
  • Calculate speedup with respect to the 1-core baseline
  • What’s a good working point? How do you identify it?
  • Overhead
  • Efficiency: avoid wasting cores if adding them doesn’t do much
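A minimal sketch of the analysis step, using hypothetical timings (replace the numbers with your own measurements from the parameter scan):

```python
# Hypothetical walltimes (seconds) measured at each core count.
cores =   [1,    2,    4,    8,    16]
seconds = [96.0, 49.0, 25.5, 14.0, 9.5]

baseline = seconds[0]  # the 1-core run is the reference
for n, t in zip(cores, seconds):
    speedup = baseline / t
    efficiency = speedup / n   # fraction of ideal linear scaling
    print(f"{n:>3} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

A good working point is typically where efficiency is still acceptable; once efficiency drops sharply, additional cores are mostly wasted.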

Summary


What’s a good working point for our example (at a given workload)?

Leading question: time and scheduler tools still don’t provide a complete picture; what other ways are there? -> Introduce third-party tools to get a good performance overview

Key Points
  • Jobs behave differently with varying resources and workloads
  • A scaling study is necessary to demonstrate a certain behavior of the application
  • Good working points lie where additional cores still provide sufficient speedup without incurring costs from overhead etc.