Scheduler Tools
Last updated on 2025-07-02
Estimated time: 10 minutes
Overview
Questions
- What information can the scheduler provide about my job's performance?
- What’s the meaning of the collected metrics?
Objectives
After completing this episode, participants should be able to …
- Explain basic performance metrics.
- Use tools provided by the scheduler to collect basic performance metrics of their jobs.
Scheduler Tools
- sacct
  - MaxRSS, AveRSS
  - MaxPages, AvePages
  - AveCPU, AllocCPUS
  - Elapsed
  - MaxDiskRead, AveDiskRead
  - MaxDiskWrite, AveDiskWrite
  - Energy (e.g. ConsumedEnergy)
- seff
  - Utilization of time allocation
  - Utilization of allocated CPUs (100% utilization does not by itself mean the job is efficient, e.g. when calculations are redundant)
  - Utilization of allocated memory
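To make the metrics above more concrete, the following is a minimal sketch in Python, not a reimplementation of seff. It builds a sacct call, parses its pipe-separated output, and derives three utilization ratios similar in spirit to what seff reports. The helper names, the job ID and the SAMPLE text are made up for illustration, and the sketch assumes that ReqMem is the total memory requested for the job and that Timelimit is a finite duration.

```python
import subprocess

# Fields requested from sacct; all of these are standard sacct format fields.
FIELDS = ["JobID", "Elapsed", "Timelimit", "AllocCPUS", "TotalCPU", "ReqMem", "MaxRSS"]
UNIT = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def sacct_command(job_id):
    """Build a sacct call that prints one pipe-separated row per job step."""
    return ["sacct", "-j", str(job_id), "--parsable2", "--format=" + ",".join(FIELDS)]

def query(job_id):
    """Run sacct (requires a Slurm cluster) and return its raw text output."""
    return subprocess.run(sacct_command(job_id), capture_output=True,
                          text=True, check=True).stdout

def parse_sacct(text):
    """Turn --parsable2 output (header line plus rows) into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("|")
    return [dict(zip(header, ln.split("|"))) for ln in lines[1:]]

def to_seconds(stamp):
    """Convert durations like '1-02:03:04', '02:03:04' or '03:04.500' to seconds."""
    days, _, rest = stamp.rpartition("-")
    parts = [float(p) for p in rest.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes) with zeros
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds

def to_bytes(mem):
    """Convert memory strings like '2097152K' or '8G' to bytes."""
    if mem[-1] in UNIT:
        return float(mem[:-1]) * UNIT[mem[-1]]
    return float(mem)

def utilization(rows):
    """Compute time, CPU and memory utilization as fractions between 0 and 1."""
    job = rows[0]  # allocation row; the remaining rows are job steps
    elapsed = to_seconds(job["Elapsed"])
    # MaxRSS is reported per step, so take the largest non-empty value.
    peak = max(to_bytes(r["MaxRSS"]) for r in rows if r["MaxRSS"])
    return {
        "time": elapsed / to_seconds(job["Timelimit"]),
        "cpu": to_seconds(job["TotalCPU"]) / (elapsed * int(job["AllocCPUS"])),
        "memory": peak / to_bytes(job["ReqMem"]),
    }

# Hypothetical sacct output, invented for this example (not from a real job).
SAMPLE = """JobID|Elapsed|Timelimit|AllocCPUS|TotalCPU|ReqMem|MaxRSS
12345|00:30:00|01:00:00|4|01:30:00|8G|
12345.batch|00:30:00||4|01:30:00||2097152K
"""

if __name__ == "__main__":
    for name, value in utilization(parse_sacct(SAMPLE)).items():
        print(f"{name}: {value:.0%}")
```

On a real cluster, `parse_sacct(query(job_id))` would replace the sample text. The exact format of ReqMem and the availability of individual fields depend on the Slurm version and site configuration, so the parsing may need adjusting.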
Shortcomings
- Not enough information about e.g. I/O, no timeline of metrics during job execution, …
- I/O metrics may be available, but likely only for local disks
  - => no parallel file system
  - => no network
- Energy demand may be missing or wrong
  - Depends on available features
  - Does not estimate energy for network switches, cooling, etc.
- => Try other tools! (motivation for subsequent episodes)
Summary
Key Points
- sacct and seff for first results
- Small scaling study, maximum of X% overhead is “still good” (larger resource req. vs. speedup)
- Getting a feel for scale of the HPC system, e.g. “is 64 cores a lot?”, how large is my job in comparison?
- CPU and Memory Utilization
- Core-h and relationship to power efficiency
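The scaling-study and core-h points above can be illustrated with a short Python sketch. It assumes one possible definition of overhead, namely the relative increase in core-hours compared to the smallest run. The timings are invented for illustration, and which overhead counts as "still good" is left to the reader.

```python
def core_hours(cores, elapsed_s):
    """Resource cost of a run: allocated cores times wall-clock hours."""
    return cores * elapsed_s / 3600.0

def scaling_table(timings):
    """timings maps core count to elapsed seconds; the smallest core count is the baseline."""
    base_cores = min(timings)
    base_cost = core_hours(base_cores, timings[base_cores])
    table = []
    for cores in sorted(timings):
        cost = core_hours(cores, timings[cores])
        table.append({
            "cores": cores,
            "speedup": timings[base_cores] / timings[cores],
            "core_h": cost,
            # Assumed definition: extra core-hours relative to the baseline run.
            "overhead": cost / base_cost - 1.0,
        })
    return table

# Hypothetical measurements (cores -> elapsed seconds), not real benchmark data.
TIMINGS = {1: 800.0, 2: 420.0, 4: 230.0, 8: 140.0}

if __name__ == "__main__":
    for row in scaling_table(TIMINGS):
        print(f"{row['cores']:>3} cores  speedup {row['speedup']:.2f}  "
              f"{row['core_h']:.3f} core-h  overhead {row['overhead']:.0%}")
```

Reading speedup and overhead side by side shows the trade-off: more cores shorten the runtime, but each additional core typically buys less speedup and costs more core-hours in total.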