Introduction


  • Job performance affects you as a user
  • Different perspectives on efficiency
    • Definitions: wall-clock (human) time, compute time, time-to-solution, energy (costs / environment), money, opportunity cost (less research output)
  • Relationship between performance and computer hardware
  • Absolute vs. relative performance measurements
    • Timing runs to establish a baseline
    • Estimating energy consumption (see the sketch after this list)
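
A minimal sketch of such an energy estimate, assuming energy is roughly average power draw times wall time; the node power and electricity price below are made-up placeholders, and real values depend on the hardware and the HPC centre:

    # Rough energy estimate for a job: energy = average power draw x wall time.
    # The power and price figures are hypothetical placeholders.

    AVG_NODE_POWER_W = 350.0  # assumed average draw of one node, in watts
    PRICE_PER_KWH = 0.30      # assumed electricity price

    def job_energy_kwh(nodes: int, wall_time_h: float,
                       node_power_w: float = AVG_NODE_POWER_W) -> float:
        """Energy in kWh for `nodes` nodes running for `wall_time_h` hours."""
        return nodes * node_power_w * wall_time_h / 1000.0

    energy = job_energy_kwh(nodes=4, wall_time_h=12.0)
    print(f"~{energy:.1f} kWh, ~{energy * PRICE_PER_KWH:.2f} EUR")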

Resource Requirements


  • Estimate resource requirements and request them in terms the scheduler understands (see the sketch after this list)
  • Be aware of your job in relation to the whole system (available hardware, size)
  • Aim for a good match between requested and utilized resources
  • Achieve optimal time-to-solution by minimizing batch queue times and maximizing parallelism
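
As an illustration of "terms the scheduler understands", the following sketch turns a rough estimate (total memory, task count, runtime) into Slurm-style request lines; the option names are common sbatch flags, but the input numbers are invented and your centre's conventions may differ:

    # Translate a rough resource estimate into Slurm-style request lines.
    # Flag names follow common sbatch options (--ntasks, --mem-per-cpu,
    # --time); the input numbers are hypothetical.
    import math

    def slurm_request(total_mem_gb: float, tasks: int,
                      runtime_h: float) -> list[str]:
        mem_per_cpu_mb = math.ceil(total_mem_gb * 1024 / tasks)
        hours, minutes = int(runtime_h), int((runtime_h % 1) * 60)
        return [
            f"#SBATCH --ntasks={tasks}",
            f"#SBATCH --mem-per-cpu={mem_per_cpu_mb}M",
            f"#SBATCH --time={hours:02d}:{minutes:02d}:00",
        ]

    print("\n".join(slurm_request(total_mem_gb=48, tasks=16, runtime_h=2.5)))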

Scaling Study


  • Jobs behave differently with varying resources and workloads
  • A scaling study is necessary to demonstrate how the application actually behaves
  • Good working points lie where additional cores still provide sufficient speedup and overhead costs do not yet dominate (see the sketch after this list)
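
A minimal sketch of how such a working point can be read off from a scaling study; the timings are invented and serve only to illustrate the speedup and parallel-efficiency arithmetic:

    # Speedup S(n) = T(1) / T(n); parallel efficiency E(n) = S(n) / n.
    # A "good working point" is roughly the largest n whose efficiency
    # still exceeds a chosen threshold. Timings below are invented.

    timings = {1: 3600.0, 2: 1850.0, 4: 960.0, 8: 520.0, 16: 310.0, 32: 240.0}

    t1 = timings[1]
    for n, t in sorted(timings.items()):
        speedup = t1 / t
        efficiency = speedup / n
        print(f"{n:3d} cores: speedup {speedup:5.2f}, "
              f"efficiency {efficiency:5.1%}")

    threshold = 0.75  # e.g. accept up to 25% parallel overhead
    good = max(n for n, t in timings.items() if (t1 / t) / n >= threshold)
    print(f"Largest core count with efficiency >= {threshold:.0%}: {good}")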

Scheduler Tools


  • sacct and seff for first results (see the sketch after this list)
  • Small scaling study; a maximum of X% overhead is “still good” (larger resource request vs. speedup)
  • Getting a feel for the scale of the HPC system, e.g. “is 64 cores a lot?”, how large is my job in comparison?
  • CPU and memory utilization
  • Core-hours and their relationship to power efficiency
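
A sketch of this first-results step, assuming Slurm's sacct is available: it queries a finished job and derives core-hours from the reported values. JobID, Elapsed, AllocCPUS and MaxRSS are standard sacct format fields; the job ID is a placeholder:

    # Query sacct for a finished job and derive core-hours from the output.
    # The job ID below is a placeholder.
    import subprocess

    def elapsed_hours(elapsed: str) -> float:
        # sacct reports Elapsed as [D-]HH:MM:SS
        days = 0
        if "-" in elapsed:
            d, elapsed = elapsed.split("-")
            days = int(d)
        h, m, s = (int(x) for x in elapsed.split(":"))
        return 24 * days + h + m / 60 + s / 3600

    jobid = "123456"  # placeholder
    out = subprocess.run(
        ["sacct", "-j", jobid, "--noheader", "--parsable2",
         "--format=JobID,Elapsed,AllocCPUS,MaxRSS"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in out.splitlines():
        job, elapsed, cpus, maxrss = line.split("|")
        print(f"{job}: {elapsed_hours(elapsed) * int(cpus):.1f} core-hours, "
              f"MaxRSS={maxrss}")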

Workflow of Performance Measurements


  • First things first, second things second, …
  • Profiling vs. tracing (a minimal profiling example follows this list)
  • Sampling vs. summation
  • Different HPC centers may provide different approaches to this workflow
  • Performance reports offer more insight into the job and application behavior
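
As a minimal, self-contained example of a summation-style profile, the sketch below uses Python's built-in cProfile; the workload function is a stand-in for the real application:

    # A summation-style (deterministic) profile with Python's built-in
    # cProfile. `workload` is a stand-in for the real application code.
    import cProfile
    import pstats

    def workload() -> float:
        total = 0.0
        for i in range(1, 2_000_000):
            total += 1.0 / i
        return total

    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 functions by time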

How to Identify a Bottleneck?


  • General advice on the workflow
  • Performance reports may provide an automated summary with recommendations
  • Performance metrics can be categorized by the underlying hardware, e.g. CPU, memory, I/O, accelerators.
  • Bottlenecks can appear directly, as a metric saturating at a physical limit of the hardware, or indirectly, as other metrics staying far below those limits.
  • Interpreting bottlenecks is closely related to what the application is supposed to do.
  • Relative measurements (baseline vs. change; see the sketch after this list)
    • Quiescent system, fixed CPU frequency and affinity, warm-up runs, …
    • Reproducibility → link to git course?
  • Scanning results for smoking guns
  • Collect further best practices
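
A minimal sketch of such a relative measurement, with warm-up runs and repetitions as listed above; baseline and change are hypothetical stand-ins for two variants of the application:

    # Baseline-vs-change comparison with warm-up runs and repetitions.
    # `baseline` and `change` stand in for two variants of the real code.
    import statistics
    import time

    def baseline() -> float:
        return sum(1.0 / i for i in range(1, 500_000))

    def change() -> float:  # hypothetical modified version
        total, i = 0.0, 1
        while i < 500_000:
            total += 1.0 / i
            i += 1
        return total

    def measure(fn, warmups: int = 2, repeats: int = 5) -> list[float]:
        for _ in range(warmups):  # warm-up: caches, page faults, ...
            fn()
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        return times

    for name, fn in [("baseline", baseline), ("change", change)]:
        t = measure(fn)
        print(f"{name}: median {statistics.median(t):.4f}s "
              f"(min {min(t):.4f}s, stdev {statistics.stdev(t):.4f}s)")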

Special Aspects of Accelerators


  • Tools to measure the GPU/FPGA performance of a job (see the sketch after this list)
  • Common symptoms of GPU/FPGA problems
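
As one example of such a tool, the sketch below samples GPU utilization and memory use via nvidia-smi; the query fields are standard nvidia-smi options, while the sampling interval and duration are arbitrary choices. FPGA tooling is vendor-specific and not shown:

    # Sample GPU utilization and memory use via nvidia-smi.
    import subprocess
    import time

    for _ in range(5):  # take 5 samples, one per second
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=utilization.gpu,memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        for gpu_id, line in enumerate(out.splitlines()):
            util, mem_used, mem_total = (v.strip() for v in line.split(","))
            print(f"GPU {gpu_id}: {util}% busy, {mem_used}/{mem_total} MiB")
        time.sleep(1)

Persistently low utilization while the job makes progress is one common symptom of a host-side or data-transfer bottleneck.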

Next Steps


  • There are many profilers, some are language-specific, others are vendor-related, …
  • A simple profile on exclusively allocated resources
  • Repeated measurements for reliability (as sketched below)
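
A minimal sketch of such repeated measurements, timing a whole program several times and summarizing the spread; ./app is a placeholder for the real executable:

    # Repeat a whole-program measurement and summarize the spread.
    # `./app` is a placeholder for the real executable.
    import statistics
    import subprocess
    import time

    runtimes = []
    for _ in range(5):
        start = time.perf_counter()
        subprocess.run(["./app"], check=True)  # placeholder executable
        runtimes.append(time.perf_counter() - start)

    print(f"min {min(runtimes):.2f}s, "
          f"median {statistics.median(runtimes):.2f}s, "
          f"stdev {statistics.stdev(runtimes):.2f}s over {len(runtimes)} runs")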