Scaling Study

Last updated on 2025-09-24

Estimated time: 10 minutes

Overview

Questions

  • How do I decide how many resources to request for a job?
  • How does my application behave at different scales?

Objectives

After completing this episode, participants should be able to …

  • Perform a simple scaling study for a given application.
  • Identify good working points for the job configuration.

Narrative:

  • We panic: maybe we need more resources to meet the deadline for our title picture!
  • Requesting resources on bigger systems requires a project proposal with an estimate of the resource demand
  • A colleague told us that this question can be answered with a scaling study
  • What is it? How could we do one?

What we’re doing here:

  • Vary number of cores
  • Which metrics are most useful?
  • Define speedup
  • Visualize results

What do we look at?


  • Amdahl’s vs. Gustafson’s law / strong and weak scaling
  • Walltime, speedup, efficiency
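
A compact reference for these quantities, with T(n) the measured walltime on n cores and s the serial fraction of the code:

```latex
% Speedup and parallel efficiency relative to the single-core run
S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}

% Amdahl's law (strong scaling, fixed workload); the limit for n -> infinity is 1/s
S_{\mathrm{Amdahl}}(n) = \frac{1}{s + \frac{1 - s}{n}}

% Gustafson's law (weak scaling, workload grows with n)
S_{\mathrm{Gustafson}}(n) = s + (1 - s)\,n
```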

Discussion: What dimensions can we look at?

  • CPUs
  • Nodes
  • Workload/problem size

Exercise: Factors affecting scaling

  • How does the serial portion of the code affect the scaling? (A small numerical example may help; see the worked example after this list.)
  • Suppose we have an infinite number of workers running a highly parallel code that is 99% parallelized and 1% serial: the speedup is then limited to 100. What, in general, is the ideal limit to the speedup?
  • How does communication affect the scaling?
  • Define example payload
    • Long enough to be significant
    • Short enough to be feasible for a quick study
  • Identify dimensions for the scaling study, e.g.
    • number of processes (on a single node)
    • number of processes (across nodes)
    • number of nodes involved (network-communication boundary)
    • size of workload
    • Decide on the number of processes across nodes, with a fixed workload size
  • Choose limits (e.g. 1, 2, 4, … cores) within a reasonable size for the given cluster
  • Go beyond a single node, or restrict the study to one node?
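
The worked example referenced above is a minimal sketch in Python (no cluster needed): it evaluates Amdahl's law for a code with a 1% serial fraction and shows that the speedup approaches 100 regardless of how many workers are added.

```python
# Amdahl's law: speedup of a code with serial fraction s on n workers
def amdahl_speedup(n, serial_fraction=0.01):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

for n in (1, 10, 100, 1_000, 10_000, 1_000_000):
    print(f"{n:>9} workers -> speedup {amdahl_speedup(n):6.1f}")

# Upper bound for infinitely many workers: 1 / serial_fraction = 100
print("upper bound:", 1.0 / 0.01)
```

In general the speedup is bounded by 1/s, where s is the serial fraction of the code.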

Parameter Scan


  • Take measurements
    • Use time and repeat each measurement (something like 3 or 10 runs)
    • Vary the scaling parameter (a driver-script sketch follows below)
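
The driver-script sketch mentioned above, in Python. This is only a sketch under assumptions: ./example_app and its arguments are placeholders for the payload defined earlier, and on a cluster the command would typically be wrapped in the scheduler's launcher (e.g. srun).

```python
import statistics
import subprocess
import time

REPEATS = 3  # repeat each measurement, e.g. 3 or 10 times

def measure(cmd):
    """Run cmd once and return the elapsed walltime in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Hypothetical payload -- replace with the example application used in this lesson
cmd = ["./example_app", "--size", "1000"]

times = [measure(cmd) for _ in range(REPEATS)]
print(f"min {min(times):.2f} s, median {statistics.median(times):.2f} s")
```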

The exercise below could also be a good place to introduce more detailed pinning options; note that the results of the challenge below also depend on the chosen pinning options.


Exercise: Run the Example with different -n

  • 1, 2, 4, 8, 16, 32, … cores and the same workload
  • Take time measurements (ideally several repetitions and with --exclusive); a possible driver sketch follows below
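
A possible way to automate this exercise, assuming a Slurm cluster where srun -n launches the example; the application name ./example_app and the list of core counts are placeholders to adapt.

```python
import subprocess
import time

CORE_COUNTS = [1, 2, 4, 8, 16, 32]  # adjust to what the cluster allows
REPEATS = 3                          # take several measurements per core count

results = {}
for n in CORE_COUNTS:
    times = []
    for _ in range(REPEATS):
        # --exclusive reduces noise from other jobs sharing the node(s)
        cmd = ["srun", "--exclusive", "-n", str(n), "./example_app"]
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - start)
    results[n] = min(times)
    print(f"{n:>3} cores: best of {REPEATS} runs = {results[n]:.2f} s")
```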

Analyzing results



Exercise: Plot the scaling

  • Plot the measured walltime against the number of cores (see the plotting sketch below)
  • Calculate the speedup with respect to the single-core baseline
  • What is a good working point? How do we identify it?
  • Overhead
  • Efficiency: avoid wasting cores when adding more of them barely helps
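
A minimal plotting sketch in Python/matplotlib. The walltimes here are synthetic stand-ins generated from Amdahl's law with a 5% serial fraction, only so the script runs end to end; replace them with the measurements from your own scan.

```python
import matplotlib.pyplot as plt

# Synthetic stand-in data (Amdahl's law, 5% serial fraction);
# replace with your own measured walltimes (cores -> seconds).
cores = [1, 2, 4, 8, 16, 32]
walltimes = [100.0 * (0.05 + 0.95 / n) for n in cores]

speedup = [walltimes[0] / t for t in walltimes]
efficiency = [s / n for s, n in zip(speedup, cores)]

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].plot(cores, walltimes, "o-")
axes[0].set(xlabel="cores", ylabel="walltime [s]")
axes[1].plot(cores, speedup, "o-", label="measured")
axes[1].plot(cores, cores, "--", label="ideal")
axes[1].set(xlabel="cores", ylabel="speedup")
axes[1].legend()
axes[2].plot(cores, efficiency, "o-")
axes[2].set(xlabel="cores", ylabel="efficiency")
fig.tight_layout()
plt.savefig("scaling.png")
```

The speedup panel compares against the ideal line S(n) = n; a good working point is typically where the measured curve starts to flatten, i.e. where the efficiency drops below a level you are still willing to accept.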

Summary


What’s a good working point for our example (at a given workload)?

Note on compute-time applications that need an estimate of the required compute resources; touch on scaling behavior here? Could be important for one type of learner if this is given in a context like HPC.NRW. Optional for many others, but maybe interesting.

Leading question: time and scheduler tools still don’t provide a complete picture, so what other ways are there? -> Introduce third-party tools to get a good performance overview.

Key Points
  • Jobs behave differently with varying resources and workloads
  • A scaling study is necessary to demonstrate how the application behaves at different scales
  • Good working points lie in regions where additional cores still provide sufficient speedup without incurring significant overhead costs