Instructor Notes
Introduction
Intention: Step into the narrative
Set up narrative:
- Important upcoming conference presentation
- Time is ticking, the deadline is approaching way too fast
- The talk is almost done, but, critically, we’re missing a picture for the title slide
- It should contain three snowmen, and we’ve exhausted our credits for all generative AI models in previous chats with colleagues
- => Ray tracing a scene to the rescue!
- Issue: we need to try many different iterations of the scene to find exactly the right picture. How can we maximise the number of raytraced snowman images before our conference deadline?
- Ray tracing is expensive, but luckily we have access to an HPC system
What we’re doing here:
- Run workflow example for the first time
- Simple `time` measurement to get started (see the sketch at the end of this list)
- Introduce different perspectives on efficiency
- Core-hours (core-h) and their correlation to cost in energy and money
- Either set up the first Slurm job here or in the next episode
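A minimal sketch of the first timing step, assuming a hypothetical render command and a 4-core allocation (binary, scene file and numbers are placeholders, not part of the actual lesson material):

```bash
# Hypothetical render binary and scene file; the real names will differ
time ./render_snowmen scene_three_snowmen.ini

# Rough core-hour estimate: wall-clock hours x allocated cores,
# e.g. a 0.5 h render on 4 cores "costs" 0.5 * 4 = 2 core-h
```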
Resource Requirements
Objective: Vary one parameter and compare to baseline
Narrative:
- We didn’t use the HPC system in a while
- Not sure how many resources we need or what our HPC system even offers
- How could I look it up again? (see the lookup commands sketched after this list)
- What were the parameters of my scheduled jobs again?
- What’s a good first guess for the resources I need for an individual render?
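Possible lookup commands (partition and account names are site-specific; the job ID is a placeholder):

```bash
# What partitions, node counts and time limits does the cluster offer?
sinfo --summarize

# What is queued or running under my user right now?
squeue -u "$USER"

# Which parameters did a past job request, and what did it use?
scontrol show job <jobid>   # only while the job is still known to the controller
sacct -j <jobid> --format=JobID,Partition,AllocCPUS,Timelimit,Elapsed,State
```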
What we’re doing here:
- Learn about Slurm job parameters (a minimal batch script is sketched after this list)
- Develop intuition about job size with respect to the cluster
- First impression of what’s a “good” amount of resources to request for a job
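A minimal batch-script sketch for a first render job; partition name, resource numbers and the render command are placeholders to adapt to the local cluster:

```bash
#!/bin/bash
#SBATCH --job-name=snowman-render
#SBATCH --partition=short        # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # first guess, revisited in the scaling study
#SBATCH --time=00:30:00
#SBATCH --output=render_%j.log

srun ./render_snowmen scene_three_snowmen.ini   # hypothetical render command
```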
ToDo
This section is currently just an info dump; how do we make it useful and approachable? What’s a useful exercise? Maybe move the info from here into other sections?
Scheduler Tools
Intention: Introduce more basic performance metrics
Narrative:
- Okay, so the first couple of jobs ran, but were they “quick enough”?
- How many renders could I generate per minute/hour/day given the current utilization?
- Our cluster uses certain hardware, maybe we didn’t use it as much as we could have?
- But I couldn’t see all metrics (may be cluster dependent) (Energy, Disk I/O, Network I/O?)
What we’re doing here:
- What `seff` and `sacct` have to offer (see the sketch at the end of this list)
- Introduce a simple relation to hardware: what do RSS, CPU, disk read/write and their utilization mean?
- Point out what’s missing from a complete picture
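One possible way to show the two tools on a finished job (the field selection is just one sensible choice; the job ID is a placeholder):

```bash
# Compact efficiency summary: CPU efficiency, memory efficiency, wall time
seff <jobid>

# Selected accounting fields; MaxRSS is the job's peak resident set size
sacct -j <jobid> --format=JobID,JobName,Elapsed,TotalCPU,AllocCPUS,MaxRSS,State
```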
ToDo
Can / should we cover I/O and energy metrics at this point?
E.g. use something like `beegfs-ctl` to get a rough estimate of parallel FS performance. Use `pidstat` etc. to get numbers on node-local I/O (and much more).
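A possible node-local sketch for the `pidstat` idea above (from the sysstat package; the process name is a placeholder and a single render process is assumed):

```bash
# CPU (-u), memory (-r) and disk I/O (-d) of the render process, sampled every second
pidstat -u -r -d -p "$(pgrep -f render_snowmen)" 1
```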
Scaling Study
Intention: Introduce/Recollect concept of Speedup and do a simple scaling study
Narrative:
- We panic, maybe we need more resources to meet the deadline with our title picture!
- Requesting resources on bigger systems requires a project proposal with an estimate of the resource demand
- Colleague told us that this can be answered with a scaling study
- What is it? How could we do one?
What we’re doing here:
- Vary the number of cores (a submission-loop sketch follows after this list)
- Which metrics are most useful?
- Define speedup: S(p) = T(1) / T(p), i.e. the single-core runtime divided by the runtime on p cores
- Visualize results
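A sketch of the submission loop, assuming the batch script from earlier takes its core count from `--cpus-per-task` (script and job names are placeholders):

```bash
# Run the same render with 1, 2, 4, 8 and 16 cores; 1 core is the baseline T(1)
for cores in 1 2 4 8 16; do
    sbatch --cpus-per-task="${cores}" --job-name="scaling_${cores}" render_snowmen.sbatch
done

# Collect the elapsed times afterwards and compute S(p) = T(1) / T(p)
sacct --name=scaling_1,scaling_2,scaling_4,scaling_8,scaling_16 \
      --format=JobName,AllocCPUS,Elapsed
```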
ToDo: Advanced details in pinning
The exercise below could be the best place to also introduce more detailed pinning options. The results of the challenge below also depend on the pinning options.
ToDo
Add a note on compute-time applications that need an estimate of the required compute resources, and touch on scaling behavior here? Could be important for one type of learner, if this is given in a context like HPC.NRW. Optional for many others, but maybe interesting.
Performance Overview
Intention: Introduce third party tools for performance reports
Narrative:
- Scaling study, scheduler tools, project proposal is written and handed in
- Maybe I can squeeze more out of my current system by trying to understand better how it behaves
- Another colleague told us about performance measurement tools
- We are learning more about our application
- Aha, there IS room to optimize! Compile with vectorization
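If the raytracer is compiled from source, one possible way to illustrate this step with GCC (the flags are common choices, not a prescription; the source file name is a placeholder):

```bash
# Optimize for the host CPU and let GCC report which loops it vectorized
gcc -O3 -march=native -fopt-info-vec-optimized -o render_snowmen raytracer.c
```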
What we’re doing here:
- Get a complete picture
- Introduce missing metrics / definitions
- Relate to hardware on the same level of detail
ToDo: Connect Hardware to Performance Measurements
Introduce hardware on the same level of detail and with the same terms as the performance reports by ClusterCockpit, LinaroForge, etc., as soon as they appear. Only introduce what we need, to avoid info dump. But point to additional information that gives a complete overview -> hpc-wiki!
ToDo: Clarify relation to hardware in this course
Maybe we should either focus on components (CPUs, memory, disk, accelerators, network cards) or functional entities (compute, data hierarchy, bandwidth, latency, parallel timelines)
We shouldn’t go into too much detail here. Define broad categories where performance can be good or bad. (calculations, data transfers, application logic, research objective (is the calculation meaningful?))
Reuse categories in the same order and fashion throughout the course, i.e. point out in what area a discovered inefficiency occurs.
Introduce detail about hardware later where it is needed, e.g. NUMA for pinning and hints.
Pinning
Intention: Go deeper in performance and hardware relationship
Narrative:
- We get the feeling that hardware has a lot to offer, but the rabbit hole is deep!
- What are the “dimensions” in which we can optimize the throughput of snowman pictures per hour?
- Can we improve how the work maps to certain CPUs / Memory regions?
What we’re doing here:
- Introduce pinning and Slurm hint options (see the `srun` sketch after this list)
- Relate to hardware effects
- Use third party performance tools to observe effects!
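A sketch of simple binding options with `srun` (available `--hint` values and their effect depend on the Slurm version and site configuration; the render command is a placeholder):

```bash
# Pin each task to physical cores and avoid hyper-threads
srun --cpu-bind=cores --hint=nomultithread ./render_snowmen scene_three_snowmen.ini

# Ask Slurm to report the binding it actually applied
srun --cpu-bind=verbose,cores ./render_snowmen scene_three_snowmen.ini
```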
ToDo: Extract episode about pinning
Stick to simple options here. Put more complex options for pinning, hints, etc. into their own episode somewhere later in the course.
Pinning is an important part of job optimization, but requires some knowledge, e.g. about the hardware hierarchies in a cluster, NUMA, etc. So it should come after we’ve introduced the different performance reports and their perspective on hardware.
Maybe point to the JSC pinning simulator and provide similar diagrams as an independent “offline” version in this course.
How to identify a bottleneck?
Intention: Uncover one or two issues in the application
Narrative:
- Okay, what’s the slowest part of creating snowman pictures?
- Where does our system choke?
What we’re doing here:
- What’s a bottleneck?
- How can we identify a bottleneck?
- “Online” and “after the fact” workflows of performance measurements (trace, accumulated results, attached to the process (live), or after it ran); a minimal contrast is sketched below
- Point to additional resources of common performance/bottleneck issues, e.g. on hpc-wiki
Maybe something like this already occurred before in 4. Scaling Study, or 5. Performance Overview
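A minimal contrast of the two workflows, reusing tools from earlier episodes (process and job names are placeholders; available `sacct` fields vary between sites):

```bash
# "Online": attach to the running render and watch CPU, memory and disk I/O live
pidstat -u -r -d -p "$(pgrep -f render_snowmen)" 2

# "After the fact": accumulated counters once the job has finished
sacct -j <jobid> --format=Elapsed,TotalCPU,MaxRSS,MaxDiskRead,MaxDiskWrite
```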
Performance of Accelerators
Intention: Jump onto accelerator with the example application
Narrative:
- The deadline is creeping up, only a few ways to go!
- Hey, we have a GPU partition! Maybe this will help us speed up the process!
What we’re doing here:
- What changes?
- New metrics
- Transfer to/from accelerator
- Different options/requirements for the scheduler & performance measurement tools (a batch-script sketch follows this list)
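A batch-script sketch for the GPU partition (partition name, GRES syntax and the GPU-enabled render command are placeholders; `nvidia-smi` assumes NVIDIA hardware):

```bash
#!/bin/bash
#SBATCH --partition=gpu          # placeholder partition name
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

nvidia-smi                                         # which GPU did we get?
srun ./render_snowmen_gpu scene_three_snowmen.ini  # hypothetical GPU build of the renderer
```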
ToDo
Don’t mention FPGAs too much; maybe just a note on what accelerators could be besides GPUs. The goal is to keep it simple and accessible and to focus on what’s common in most HPC systems these days.
ToDo
Explain how to decide where to run something: CPU vs. small GPU vs. high-end GPUs. This touches on transfer overhead etc.
Next Steps
Intention: Provide a roadmap learners could follow
Narrative:
- Start with a picture of the beautiful title slide of the talk, featuring the snowman picture
- Next time we want to tackle the issue way in advance
- Approach our raytracing application more systematically, such that we can get the title slide done much quicker
- What could we do to dive deeper in optimizing the raytracer?
- Where can we go from here?
What we’re doing here:
- Learning important programming concepts (parallel programming on many levels)
- Deeper application profiling & tools to use