How to identify a bottleneck?
Last updated on 2025-09-24
Overview
Questions
- How can I find the bottlenecks in a given job?
- What are common workflows to evaluate performance?
- What are some common types of bottlenecks?
Objectives
After completing this episode, participants should be able to …
- Choose between multiple workflows to evaluate job performance.
- Name typical performance issues.
- Determine if their job is affected by one of these issues.
How to identify a bottleneck?
Summary
Leading question: So far, we have looked at a standard configuration with CPU, memory, disks, and network. What about GPU applications, which are very common these days?
- General advice on the workflow
- Performance reports may provide an automated summary with recommendations
- Performance metrics can be categorized by the underlying hardware, e.g. CPU, memory, I/O, accelerators.
- A bottleneck can show up directly, as a metric saturated at the physical limit of the hardware, or indirectly, as other metrics staying far below their physical limits.
- Interpreting bottlenecks is closely related to what the application is supposed to do.
- Relative measurements (baseline vs. change)
- Measure on a quiescent system, with fixed CPU frequency and affinity, warmup runs, …
- Reproducibility -> link to git course?
- Scanning results for smoking guns
- Any best practices etc.
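The "relative measurements" point can be illustrated with a small timing harness. This is a minimal sketch, not a recommended tool: the helper names `measure` and `relative_change` and the two toy workloads are hypothetical, and the sketch only handles warmups and repeated runs. It does not fix CPU frequency or affinity, which would have to be set up outside the script.

```python
import statistics
import time


def measure(func, warmups=3, repeats=10):
    """Return the median wall-clock time of func() in seconds."""
    for _ in range(warmups):
        func()  # warmup runs are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_change(baseline, changed):
    """Fractional change versus the baseline; negative means faster."""
    return (changed - baseline) / baseline


def baseline_sum(n=200_000):
    # Hypothetical baseline workload: explicit loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def changed_sum(n=200_000):
    # Hypothetical changed workload: same result via a generator expression.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    t_base = measure(baseline_sum)
    t_new = measure(changed_sum)
    print(f"baseline: {t_base:.6f} s, changed: {t_new:.6f} s")
    print(f"relative change: {relative_change(t_base, t_new):+.1%}")
```

The median is used here because it is less sensitive to occasional outliers than the mean. The absolute numbers depend on the machine, so only the comparison between the two runs is meaningful.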
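The idea of comparing metrics against the physical limits of the hardware can be sketched as a simple utilization check. Everything in this example is an assumption made for illustration: the function name `classify`, the 90 % threshold, and all measured and peak values are made up and do not describe any real system.

```python
def classify(metrics, saturated_at=0.9):
    """Map each metric name to (utilization, is_saturated).

    metrics: dict of name -> (measured, peak), both in the same unit.
    saturated_at: hypothetical threshold above which a metric is
    treated as being at the hardware limit.
    """
    report = {}
    for name, (measured, peak) in metrics.items():
        utilization = measured / peak
        report[name] = (utilization, utilization >= saturated_at)
    return report


if __name__ == "__main__":
    # Made-up numbers: (measured, peak) per hardware category.
    metrics = {
        "memory_bandwidth": (190.0, 200.0),  # GB/s
        "cpu_flops": (0.3, 2.0),             # TFLOP/s
        "io_throughput": (0.2, 5.0),         # GB/s
    }
    for name, (util, saturated) in classify(metrics).items():
        status = "saturated" if saturated else "below limit"
        print(f"{name}: {util:.0%} of peak ({status})")
```

In this made-up case the memory bandwidth is close to its peak while the CPU is far below it, which would point to a memory-bound job. Whether that is actually a problem depends on what the application is supposed to do.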