Scheduler Tools
Last updated on 2025-11-11
Estimated time: 10 minutes
Overview
Questions
- What can the scheduler tell us about job performance?
- What is the meaning of the collected metrics?
Objectives
After completing this episode, participants should be able to …
- Explain basic performance metrics.
- Use tools provided by the scheduler to collect basic performance metrics of their jobs.
Narrative:
- Okay, so the first couple of jobs ran, but were they “quick enough”?
- How many renders could I generate per minute/hour/day at the current utilization?
- Our cluster uses certain hardware; maybe we didn’t use it as much as we could have?
- But I couldn’t see all metrics (may be cluster dependent): energy, disk I/O, network I/O?
What we’re doing here:
- What seff and sacct have to offer
- Introduce a simple relation to the hardware: what do RSS, CPU, disk read/write and their utilization mean?
- Point out what’s missing from a complete picture
Note:
- seff is an optional SLURM tool. It does not come standard with every SLURM installation. Therefore, make sure beforehand that this tool is available for the students.
Scheduler Tools
A scheduler performs important tasks such as accepting and scheduling jobs, monitoring job status, starting user applications, and cleaning up jobs that have finished or exceeded their allocated time. The scheduler also keeps a history of the jobs that have been run and how they behaved. The information that is collected can be queried by the job owner to learn how the job utilized the resources it was given.
The seff tool
The seff command can be used to learn about how
efficiently your job has run. The seff command takes the
job identifier as an argument to select which job it displays
information about. That means we need to run a job first to get a job
identifier we can query SLURM about. Then we can ask about the
efficiency of the job.
seff may not be available
seff is an optional SLURM tool for more convenient access to sacct. It does not come standard with every SLURM installation. Your particular HPC system may or may not provide it. Check for its availability on your login nodes, or consult your cluster documentation or support staff.
Third-party alternatives, e.g. reportseff, can be installed with default user permissions.
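A quick way to check, for example, on a login node:
BASH
command -v seff || echo "seff is not available here"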
The sbatch command is used to submit a job. It takes a
job script as an argument. The job script contains the resource
requests, such as the amount of time needed for the calculation, the
number of nodes, the number of tasks per node, and so on. It also
contains the commands to execute the calculations.
Using your favorite editor, create the job script
render_snowman.sbatch with the contents below.
#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
# Possibly a "module load ..." command to load required libraries
# Depends on your particular HPC system
mpirun -np 4 raytracer -width=800 -height=800 -spp=128 -alloc_mode=3
Next submit the job with sbatch, and see what
seff says about the job with the following commands.
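For example (the job ID printed by sbatch will differ on your system; the one below matches the sample output):
BASH
sbatch render_snowman.sbatch
seff 309489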
OUTPUT
Job ID: 309489
Cluster: bigiron
User/Group: usr123/grp123
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:07:43
CPU Efficiency: 98.93% of 00:07:48 core-walltime
Job Wall-clock time: 00:01:57
Memory Utilized: 35.75 MB
Memory Efficiency: 0.20% of 17.58 GB (4.39 GB/core)
The job script we created asks for 4 CPUs for an hour. After
submitting the job script we need to wait until the job has finished as
seff can only report sensible statistics after the job is
completed. The report from seff shows basic statistics
about the job, such as
- The resources the job was given
  - the number of nodes
  - the number of cores per node
  - the amount of memory per core
- The amount of resources used
  - CPU Utilized: the aggregate CPU time the job actually used, summed over all allocated CPUs
  - CPU Efficiency: the CPU time used as a percentage of the available core-walltime (the job's wall-clock time times the number of allocated CPUs)
  - Job Wall-clock time: the time the job took from start to finish
  - Memory Utilized: the aggregate memory usage
  - Memory Efficiency: the actual memory usage as a percentage of the total available memory
Looking at the Job Wall-clock time, we see that the job took just under 2 minutes, a lot less time than
the one hour we asked for. This can be problematic as the scheduler
looks for time windows when it can fit a job in. Long running jobs
cannot be squeezed in as easily as short running jobs. As a result, jobs
that request a long time to complete typically have to wait longer
before they can be started. Therefore, asking for more than 10 times as much time as the job really needs simply means that you will have to wait longer for the job to start. On the other hand, you do not want to
ask for too little time. Few things are more annoying than waiting for a
long running calculation to finish, just to see the job being killed
right before the end because it would have needed a couple of minutes
more than you asked for. So the best approach is to ask for more time
than the job needs, but not to go overboard. As the job’s elapsed time depends on many machine conditions, including congestion in the data communication, disk access, operating system jitter, and so on, you might want to ask for a substantial buffer. Nevertheless, asking for more than twice as much time as the job is expected to need usually doesn’t make sense.
Another thing is that SLURM by default reserves a certain amount of
memory per core. In this case the actual memory usage is just a fraction
of that amount. We could reduce the memory allocation by explicitly
asking for less by modifying the render_snowman.sbatch job
script.
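If you are curious how much memory SLURM reserves per core or per node by default on your cluster, you may be able to read it from the scheduler configuration (the relevant parameters are typically DefMemPerCPU or DefMemPerNode, depending on how your site configured SLURM):
BASH
scontrol show config | grep -i DefMemPer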
Running this on our cluster, with a module load command added, resulted in about 600 MB of memory being reported. Our guess is that this is due to cgroups v2 counting page caches towards the job as well, so loading modules might also spike the reported resource requirements.
Maybe we should play it safe and use a larger value in the following exercise. But we also want to teach not to overdo it, so it would be good to find a useful but generic compromise here.
Challenge
Edit the batch file to reduce the amount of memory requested for the
job. Note that the amount of memory per node can be requested with the
--mem= argument. The amount of memory is specified by a
number followed by a unit. The units can represent kilobytes (KB), megabytes (MB), or gigabytes (GB). For the calculations we are doing here
100 megabytes per node is more than sufficient. Submit the job, and
inspect the efficiency with seff. What is the memory usage
efficiency you get?
The batch file after adding the memory request becomes:
#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --mem=100MB
# Possibly a "module load ..." command to load required libraries
# Depends on your particular HPC system
mpirun -np 4 raytracer -width=800 -height=800 -spp=128 -alloc_mode=3
Submit this job script, as before, and inspect it after completion with the following commands.
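Again, the job ID shown here matches the sample output below; yours will differ.
BASH
sbatch render_snowman.sbatch
seff 310002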
OUTPUT
Job ID: 310002
Cluster: bigiron
User/Group: usr123/grp123
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:07:43
CPU Efficiency: 98.09% of 00:07:52 core-walltime
Job Wall-clock time: 00:01:58
Memory Utilized: 50.35 MB
Memory Efficiency: 50.35% of 100.00 MB (100.00 MB/node)
The output of seff shows that about 50% of requested
memory was used.
Now we see that a much larger fraction of the allocated memory has been used. Normally you would not worry too much about the memory request. Lately, however, HPC clusters are used more for machine learning workloads, which tend to require a lot of memory. Their memory requirements per core might actually be so large that they cannot use all the cores in a node. So there may be spare cores available for jobs that need little memory. In such a scenario, tightening up the memory allocation could allow the scheduler to start your job earlier. How much mileage you get from this depends on the job mix at the HPC site where you run your calculations.
Note that the CPU utilization is reported as almost 100%, but this
just means that the CPU was busy with your job 100% of the time. It does
not mean that this time was well spent. For example, every parallel
program has some serial parts to the code. Typically those parts are
executed redundantly on all cores, which is wasteful but not reflected
in the CPU efficiency. Also, this number does not reflect how well the
capabilities of the CPU are used. If your CPU offers vector
instructions, for example, but your code does not use them then your
code will just run slow. The CPU efficiency will still show that the CPU
was busy 100% of the time even though the program is just running at a
fraction of the speed it could achieve if it fully exploited the
hardware capabilities. It is worth keeping these limitations of
seff in mind.
Good utilization does not imply efficiency
Measuring close to 100% CPU utilization does not say anything about how useful the calculations are. It merely states that the CPU was mostly busy with calculations, instead of waiting for data or sitting idle waiting for other conditions to occur.
High CPU utilization is only efficient if the CPU spends its time on “useful” calculations that contribute new results towards the intended goal.
The seff command cannot give you any information about
the I/O performance of your job. You have to use other approaches for
that, and sacct may be one of them.
The sacct tool
Note that the information sacct can provide depends on
the information that SLURM stores on a given machine. By default this
includes Billing, CPU, Energy, Memory, Node, FS/Disk, Pages and VMem.
Additional information is available only when SLURM is configured to
collect it. These additional trackable resources are listed in
AccountingStorageTRES. For I/O, fs/lustre is commonly useful, and for interconnect communication, ic/ofed is required. The AccountingStorageTRES setting is found in slurm.conf.
Unfortunately there doesn’t seem to be a way to get sacct
to print the optional trackable resources.
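If you can read the scheduler configuration on your system, you can check which trackable resources are configured to be collected, for example with:
BASH
scontrol show config | grep -i AccountingStorageTRES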
The sacct command shows data stored in the job
accounting database. You can query the data of any of your previously
run jobs. Just like with seff you will need to provide the
job ID to query the accounting database. Rather than keeping track of
all your jobs yourself you can ask sacct to provide you
with an overview of the jobs you have run.
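Running sacct without any arguments lists your jobs from today:
BASH
sacct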
OUTPUT
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
309902 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
309902.batch batch project_a 4 COMPLETED 0:0
309902.exte+ extern project_a 4 COMPLETED 0:0
309903 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
309903.batch batch project_a 4 COMPLETED 0:0
309903.exte+ extern project_a 4 COMPLETED 0:0
310002 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
310002.batch batch project_a 4 COMPLETED 0:0
310002.exte+ extern project_a 4 COMPLETED 0:0
In the output every job is shown three times. This is because sacct lists one line for the primary job entry, followed by
a line for every job step. A job step corresponds to an
mpirun or srun command. The
extern line corresponds to all work that is done outside of
SLURM’s control, for example an ssh command that runs
something somewhere else.
Note that by default sacct only lists the jobs that have
been run today. You can use the --starttime option to list
all jobs that have been run since the given start date. For example, try
running
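BASH
sacct --starttime=2025-09-25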
OUTPUT
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
308755 snowman.s+ STD-s-96h project_a 16 COMPLETED 0:0
308755.batch batch project_a 16 COMPLETED 0:0
308755.exte+ extern project_a 16 COMPLETED 0:0
308756 snowman.s+ STD-s-96h project_a 4 COMPLETED 0:0
308756.batch batch project_a 4 COMPLETED 0:0
308756.exte+ extern project_a 4 COMPLETED 0:0
309486 interacti+ STD-s-96h project_a 4 FAILED 1:0
309486.exte+ extern project_a 4 COMPLETED 0:0
309486.0 prted project_a 4 COMPLETED 0:0
309489 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
309489.batch batch project_a 4 COMPLETED 0:0
309489.exte+ extern project_a 4 COMPLETED 0:0
309902 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
309902.batch batch project_a 4 COMPLETED 0:0
309902.exte+ extern project_a 4 COMPLETED 0:0
309903 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
309903.batch batch project_a 4 COMPLETED 0:0
309903.exte+ extern project_a 4 COMPLETED 0:0
310002 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
310002.batch batch project_a 4 COMPLETED 0:0
310002.exte+ extern project_a 4 COMPLETED 0:0
You may want to change the date of 2025-09-25 to
something more sensible when you work through this tutorial. Note that
some HPC systems may limit the range of such a request to a maximum of,
for example, 30 days to avoid overloading the SLURM database with too large requests.
With the job ID you can ask sacct for information about
a specific job as in
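BASH
sacct --jobs=310002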
OUTPUT
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
310002 render_sn+ STD-s-96h project_a 4 COMPLETED 0:0
310002.batch batch project_a 4 COMPLETED 0:0
310002.exte+ extern project_a 4 COMPLETED 0:0
Using sacct with the --jobs flag is just
another way to select which jobs we want more information about. In
itself it does not provide any additional information. To get more
specific data we need to explicitly ask for the information we want. As SLURM collects a broad range of data about every job, it is worth evaluating what the most relevant items are.
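To see every field that sacct can report on your system, you can list the available format fields with:
BASH
sacct --helpformat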
As an aside, the CPU utilization reported by seff can be reconstructed from sacct fields: TotalCPU divided by CPUTime gives the percentage, and TotalCPU is itself the sum of UserCPU (time spent in the application's own code) and SystemCPU (time spent in the operating system on behalf of the application). The most relevant fields for our purposes are the following.
- MaxRSS, AveRSS: the Maximum or Average Resident Set Size (RSS). The RSS is the part of a program’s allocated memory that is actually resident in the main memory of the computer. If the computer gets low on memory, the virtual memory manager can extend the apparently available memory by moving some of the data from memory to disk. This is done entirely transparently to the application, but the data that has been moved to disk is no longer resident in main memory. As a result, accessing it will be slower because it needs to be retrieved from disk first. Therefore, if the RSS is small compared to the total amount of memory the program uses, this might affect the performance of the program.
- MaxPages, AvePages: the Maximum or Average number of page faults. These quantities are related to the resident set size. When the program tries to access data that is not resident in main memory, this triggers a page fault. The virtual memory manager responds to a page fault by retrieving the accessed data from disk (and potentially migrating other data to disk to make space). These operations are typically costly. Therefore, high numbers of page faults typically correspond to a significant reduction in the program’s performance. For example, the CPU utilization might drop from as high as 98% to as low as 2% due to page faults. For that reason some HPC machines are configured to kill your job if the application generates a high rate of page faults.
- AllocCPUS: the number of CPUs allocated for the job.
- Elapsed: the amount of wall-clock time it took to complete the job, i.e. the time that passed between the start and the finish of the job.
- MaxDiskRead: the maximum amount of data read from disk.
- MaxDiskWrite: the maximum amount of data written to disk.
- ConsumedEnergy: the amount of energy consumed by the job, if that information was collected. The data may not be collected on your particular HPC system, in which case it is reported as 0.
- AveCPUFreq: the average CPU frequency of all tasks in a job, given in kHz. In general, the higher the clock frequency of the processor, the faster the calculation runs. The exception is if the application is limited by memory bandwidth and the data cannot be moved to the processor fast enough to keep it busy. In that case modern hardware might throttle the frequency. This saves energy, as the power consumption scales linearly with the clock frequency, but doesn’t slow the calculation down as the processor was having to wait for data anyway.
We can explicitly select the data elements that we are interested in. To see how long the job took to complete run
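For example, selecting only the Elapsed field for our last job:
BASH
sacct --jobs=310002 --format=Elapsed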
OUTPUT
Elapsed
----------
00:01:58
00:01:58
00:01:58
Challenge
Request information regarding all of the above variables from
sacct, including JobID. Note that the
--format flag takes a comma separated list. Also note that
the result shows that more data is read than written, even though the program generates and writes an image, and reads no data at all. Why would that be?
To query all of the above variables, run
BASH
sacct --jobs=310002 --format=MaxRSS,AveRSS,MaxPages,AvePages,AllocCPUS,Elapsed,MaxDiskRead,MaxDiskWrite,ConsumedEnergy,AveCPUFreq
OUTPUT
MaxRSS AveRSS MaxPages AvePages AllocCPUS Elapsed MaxDiskRead MaxDiskWrite ConsumedEnergy AveCPUFreq
---------- ---------- -------- ---------- ---------- ---------- ------------ ------------ -------------- ----------
4 00:01:58 0
51556K 51556K 132 132 4 00:01:58 6.91M 0.72M 0 3M
0 0 0 0 4 00:01:58 0.01M 0.00M 0 3M
Although the program we have run generates an image and writes it to a file, there is also a non-zero amount of data read. The writing is associated with the image file the program produces. The reading is not associated with anything the program does itself, as it doesn’t read anything from disk. It is instead associated with the fact that the operating system has to read the program itself and its dependencies in order to execute it.
- AllocCPUS: the number of CPU cores we requested for the job
- MaxRSS = AveRSS: low fluctuation in memory; the data is held throughout the whole job
- MaxPages & AvePages: the number of pages loaded into memory
- MaxDiskRead: data read from disk by the application, but also to start the application
Shortcomings
While seff and sacct provide a lot of
information, it is still incomplete. For example, the information is
accumulated for the entire calculation. Variations in the metrics as a
function of time throughout the job are not available. Communication
between different MPI processes is not recorded. The collection of the
energy consumption depends on the hardware and system configuration at
the HPC center and might not be available. We are also often missing
reliable measurements for I/O via the interconnect between nodes and the
parallel file system.
So while we might be able to glean some indications for different types of performance problems, for a proper analysis more detailed information is needed.
Summary
This episode introduced the SLURM tools seff and
sacct to get a high level perspective on a job’s
performance. As these tools just use the statistics that SLURM collected
on a job as it ran, they can always be used without any special
preparation.
Challenge
So far we have considered our initial calculation using 4 cores. To
run the calculation faster we could consider using more cores. Run the
same calculation on 8, 16, and 32 cores as well. Collect and compare the
results from sacct and see how the job performance
changes.
The machine these calculations have been run on has 112 cores per node. So we can double the number of cores from 4 up to 64 and stay within one node. If we go to two nodes then some of the communication between tasks will have to go across the interconnect. At that point the performance characteristics might change in a discontinuous manner. Hence we try to avoid doing that.
Alternatively you might scale the calculation across multiple nodes, for example 2, 4, 8, 16 nodes. With 112 cores per node you would have to make sure that the calculation is large enough for such a large number of cores to make sense.
Create running_snowmen.sh with
#!/usr/bin/bash
# Submit the render job at increasing task counts and record the job IDs
for nn in 4 8 16 32; do
  id=$(sbatch --parsable --time=00:12:00 --nodes=1 --tasks-per-node=$nn --ntasks-per-core=1 render_snowman.sh)
  echo "ntasks $nn jobid $id"
done
Create render_snowman.sh with
#!/usr/bin/bash
# Possibly a "module load ..." command to load required libraries
# Depends on your particular HPC system
export START=`pwd`
# Create a sub-directory for this job if it doesn't exist already
mkdir -p $START/test.$SLURM_NTASKS
cd $START/test.$SLURM_NTASKS
# The -spp flag ensures we have enough samples per ray such that the job
# on 32 cores takes longer than 30s. Slurm by default is configured such
# that job data is collected every 30s. If the job finishes in less than
# that Slurm might fail to collect some of the data about the job.
mpirun -np $SLURM_NTASKS raytracer -width=800 -height=800 -spp 1024 -threads=1 -alloc_mode=3 -png=rendered_snowman.png
Next we submit this whole set of calculations by running the driver script.
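Assuming both scripts are in the current working directory, this can be done, for example, with
BASH
bash running_snowmen.sh
producing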
OUTPUT
ntasks 4 jobid 349291
ntasks 8 jobid 349292
ntasks 16 jobid 349293
ntasks 32 jobid 349294
After the jobs are completed we can run
BASH
sacct --jobs=349291,349292,349293,349294 \
--format=MaxRSS,AveRSS,MaxPages,AvePages,AllocCPUS,Elapsed,MaxDiskRead,MaxDiskWrite,ConsumedEnergy,AveCPUFreq
to produce
OUTPUT
MaxRSS AveRSS MaxPages AvePages AllocCPUS Elapsed MaxDiskRead MaxDiskWrite ConsumedEnergy AveCPUFreq
---------- ---------- -------- ---------- ---------- ---------- ------------ ------------ -------------- ----------
4 00:09:35 0
142676K 142676K 1 1 4 00:09:35 7.75M 0.72M 0 743K
0 0 0 0 4 00:09:35 0.01M 0.00M 0 2.61M
8 00:05:01 0
289024K 289024K 0 0 8 00:05:01 10.15M 1.45M 0 960K
0 0 0 0 8 00:05:02 0.01M 0.00M 0 2.42M
16 00:02:21 0
563972K 563972K 93 93 16 00:02:21 15.00M 2.94M 0 1.03M
0 0 0 0 16 00:02:21 0.01M 0.00M 0 2.99M
32 00:01:14 0
1082540K 1082540K 260 260 32 00:01:14 24.83M 6.07M 0 1.08M
0 0 0 0 32 00:01:14 0.01M 0.00M 0 3M
Note that the elapsed time goes down as the number of cores increases, which is reasonable as more cores can normally get the job done more quickly.
The amount of data read also increases as every MPI rank has to read the
executable and all associated shared libraries. The volume of data
written is harder to understand. Every run produces an image file
rendered_snowman.png that is about 100KB in size. This file
is written just by the root MPI rank. This cannot explain the increase
in data written with increasing numbers of cores. The increasing number
of page faults with increasing numbers of cores suggests that paging
memory to disk is responsible for the majority of data written.
Key Points
- Schedulers provide tools for a high-level view on our jobs, e.g. sacct and seff
- Important basic performance metrics we can gather this way are:
  - CPU utilization, often as the fraction of time the CPU was active over the elapsed time of the job
  - Memory utilization, often measured as the Resident Set Size (RSS) and the number of pages
- sacct can also provide metrics about disk I/O and energy consumption
- Metrics gathered through sacct are accumulated over the whole job runtime and may be too broad for more specific insight