Instructor Notes


Introduction


Intention: Step into the narrative

Set up narrative:

  • Important upcoming conference presentation
  • Time is ticking, the deadline is approaching way too fast
  • The talk is almost done, but, critically, we’re missing a picture for the title slide
  • It should contain three snowmen, and we’ve exhausted our credits for all generative AI models in previous chats with colleagues
  • => Ray tracing a scene to the rescue!
  • Issue: we need to try many different iterations of the scene to find exactly the right picture. How can we maximise the number of raytraced snowman images before our conference deadline?
  • Ray tracing is expensive, but luckily we have access to an HPC system

What we’re doing here:

  • Run the workflow example for the first time
  • Simple time measurement to get started (a minimal sketch follows this list)
  • Introduce different perspectives on efficiency
  • Core-hours and their correlation to costs in energy and money
  • Either set up the first Slurm job here or in the next episode
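
A minimal sketch of both steps, assuming a generic render command (./render_snowmen is a placeholder for whatever executable the lesson uses):

    # Measure wall-clock time of a single render on the login node or in an interactive allocation
    time ./render_snowmen

    # Or as a first Slurm job: one task, one core, a conservative time limit
    # (timing the blocking srun call also includes the time spent waiting in the queue)
    time srun --ntasks=1 --cpus-per-task=1 --time=00:10:00 ./render_snowmen

Core-hours then follow directly from the measurement: allocated cores times elapsed wall-clock time, e.g. 4 cores busy for 30 minutes cost 2 core-hours.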


TODO: Possible to highlight individual benefits of efficient jobs more?

It may also be good to address the “why should I care” perspective: you get more out of your fair share, and shorter iteration times mean more and better insight …



Instructor Note

TODO: Can we use time and date to find the issue with the subshells?

It is better to teach a way to find the issue than to stare at the script and think about it.
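
A minimal sketch of such an approach (script and command names are hypothetical):

    # Bracket suspect sections of the job script with timestamps
    date +"%T  before the render loop"
    ./render_snowmen &        # a background/subshell launch that may never be waited for
    date +"%T  after the render loop"

    # Or trace the whole script line by line with timestamps
    PS4='+ $(date +%T) ' bash -x ./run_renders.sh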



TODO: Maybe move this discussion elsewhere?

The quality and necessity of calculations are important factors in efficiency; redundant calculations, for example, are inefficient. This section may still be too much of a detour from the introduction, at least in its current form. It may also be a chance to shorten the episode.



TODO: Add actions / live-coding to sections below?

Maybe too much info vs. too little activity, currently?



Resource Requirements


Instructor Note

This discussion depends heavily on the management philosophy of the cluster available to the learners. Some examples (a sketch for inspecting the partition layout follows the list):

  • A partition with a high number of cores and a large amount of memory per node is probably intended for SMP calculations.
  • A partition with a lot of nodes that each have only a (relatively) small number of cores and memory is probably intended for MPI calculations.
  • A partition with powerful GPUs but only a small number of CPU cores is likely intended for jobs where the majority of the work is offloaded to the GPUs.
  • A partition with less powerful GPUs but more CPU cores and memory is likely intended for hybrid workloads.
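
One way to ground this, assuming a standard Slurm setup, is to let learners inspect the partition layout themselves:

    # Partitions with node count, cores per node, memory per node, and generic resources (e.g. GPUs)
    sinfo -o "%P %D %c %m %G"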


Instructor Note

The learners should realize that the per-user average they calculate here is very synthetic:

  • Many users do not use their full share of resources, which leaves room for others to use more.
  • The average we calculate is only an average over long periods of time. Short term you can usually use much more.
  • Not all users are equal. For example, if some research groups have contributed to the funding of the cluster, they should also get more resources than those who did not.
  • The world is not perfectly fair. Especially on larger clusters, HPC resources have to be requested via project proposals. Those who write more / better proposals can use more resources.
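
A worked toy example (all numbers are purely hypothetical): a cluster with 20,000 cores and 500 active users gives an average of 40 cores per user, i.e. roughly 350,000 core-hours per user over a year of 8,760 hours. The points above explain why actual usage rarely looks like this.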


Instructor Note

For the next section, the exact memory requirements depend on the cluster configuration (e.g., the MPI backends used). You might have to adapt these numbers for your local cluster to see the out-of-memory behavior.



Instructor Note

At this point you might want to point out to your audience that for certain applications it can be disastrous for performance to set the memory constraint too tightly. The reason is that the memory limit enforced by Slurm affects not only the resident set size of all processes in the job allocation, but also the memory used for caching (e.g., file pages). If the allocation runs out of memory for the cache, it will have to evict memory pages to disk, which can cause I/O operations and new memory allocations to block for longer than usual. If the application makes heavy use of this cache (e.g., repeated read and/or write operations on the same file) and the memory pressure in the allocation is high, you can even run into a cache-thrashing situation, where the job spends the majority of its time swapping data in and out of system memory and thus slows down to a crawl.



Instructor Note

This error message was generated with OpenMPI. Other MPI implementations might produce different messages.



Instructor Note

At this point you can present some scheduling strategies specific to your cluster. For the sake of time, you have likely reserved some resources for the course participants such that their jobs start instantly. Now would be a good time to show them the harsh reality of HPC scheduling on a contested partition and demonstrate that a major part of using an HPC cluster is waiting for your jobs to start.



Scheduler Tools


Intention: Introduce more basic performance metrics

Narrative:

  • Okay, so the first couple of jobs ran, but were they “quick enough”?
  • How many renders could I generate per minute/hour/day at the current utilization?
  • Our cluster uses certain hardware; maybe we didn’t use it as much as we could have?
  • But I couldn’t see all metrics (may be cluster dependent): energy, disk I/O, network I/O?

What we’re doing here:

  • What seff and sacct have to offer (a minimal sketch follows this list)
  • Introduce the simple relation to hardware: what do RSS, CPU, disk read/write, and their utilization mean?
  • Point out what’s missing from a complete picture
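
A minimal sketch of both commands, assuming a completed job with ID <jobid>:

    # Post-mortem summary: CPU efficiency, memory efficiency, wall-clock time
    seff <jobid>

    # Raw accounting data; adapt the field list to what the discussion needs
    sacct -j <jobid> --format=JobID,JobName,Elapsed,AllocCPUS,State,MaxRSS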

Note:

  • seff is an optional Slurm tool; it is not included in every Slurm installation by default. Therefore, make sure beforehand that this tool is available for the students.


Todo: give a clear recommendation of what to aim for?

Maybe 80% of job time?



Todo: potential issue?

Running this on our cluster with an added module load command resulted in 600 MB of required memory. My guess is that this is due to cgroups_v2 counting page caches towards the job as well, so loading the modules might spike the resource requirements, too.

Maybe we should play it safe and use a larger value in the following exercise. But we also want to teach not to overdo it, so it would be good to find a useful but generic compromise here.



Instructor Note

Note that the information sacct can provide depends on the information that Slurm stores on a given machine. By default this includes Billing, CPU, Energy, Memory, Node, FS/Disk, Pages, and VMem. Additional information is available only when Slurm is configured to collect it. These additional trackable resources are listed in AccountingStorageTRES. For I/O, fs/lustre is commonly useful, and for interconnect communication, ic/ofed is required. The setting AccountingStorageTRES is found in slurm.conf. Unfortunately, there doesn’t seem to be a way to get sacct to print the optional trackable resources.



Todo: extend the following list and examples to include CPU

To reconstruct the CPU utilization reported by seff:

  • TotalCPU/CPUTime should give the percentage (a sketch follows below)
  • Could also mention UserCPU and SystemCPU and discuss the difference? Together they add up to TotalCPU.

Maybe remove AveCPUFreq instead, or do we try to teach something specific about it?

Don’t forget to change the example output of all sacct calls in the following examples/challenges!
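
A minimal sketch of reconstructing the CPU utilization from sacct fields (<jobid> is a placeholder):

    # TotalCPU = UserCPU + SystemCPU; CPUTime = Elapsed * AllocCPUS
    sacct -j <jobid> --format=JobID,Elapsed,AllocCPUS,CPUTime,TotalCPU,UserCPU,SystemCPU
    # CPU utilization is then TotalCPU / CPUTime, e.g. 01:30:00 / 02:00:00 = 75%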



Give more insight into the collected sacct metrics (a minimal sacct call covering these fields follows the list)

  • AllocCPUS: number of CPU cores we requested for the job
  • MaxRSS = AveRSS: low fluctuation in memory; data is held throughout the whole job
  • MaxPages & AvePages: maximum and average number of page faults (pages loaded into memory on demand)
  • MaxDiskRead: data read from disk by the application, including what was read to start the application
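
A minimal sacct call covering these fields, as a sketch (<jobid> is a placeholder):

    sacct -j <jobid> --format=JobID,AllocCPUS,MaxRSS,AveRSS,MaxPages,AvePages,MaxDiskRead,MaxDiskWrite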


Scaling Study


Intention: Introduce/Recollect concept of Speedup and do a simple scaling study

Narrative:

  • We panic: maybe we need more resources to meet the deadline for our title picture!
  • Requesting resources on bigger systems requires a project proposal with an estimate of the resource demand
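
A minimal sketch of such a scaling study, assuming a job script (render_job.sh is a placeholder) that runs the render with the allocated number of tasks:

    # Submit the same render with increasing core counts and compare the elapsed times afterwards
    for n in 1 2 4 8 16; do
        sbatch --ntasks=$n --job-name=scale_$n render_job.sh
    done
    # Speedup on n cores: S(n) = T(1) / T(n); parallel efficiency: E(n) = S(n) / n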


Slurm Reservation and specific Hardware?

You may need to reserve a set of resources for the course, such that enough resources for the following exercises are available. This is especially important for --exclusive access.

In that case, show how to use --reservation=reservationname to submit jobs.
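
For example, assuming a reservation was set up for the course (the name is an example):

    # Submit into the course reservation
    sbatch --reservation=<reservationname> render_job.sh

    # Inspect the existing reservations
    scontrol show reservation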

It may be a good idea to point out the particular hardware of your cluster / partition to emphasize how many cores are available on a single node and when the scaling study goes beyond a single node.



Todo: show, don’t tell

There is an info dump below in this section.

Maybe be more specific about which overheads and how we can see them?



Performance Overview


Intention: Introduce third party tools for performance reports

Narrative:

  • The scaling study is done, scheduler tools are in use, and the project proposal is written and handed in
  • Maybe I can squeeze more out of my current system by trying to better understand how it behaves
  • Another colleague told us about performance measurement tools
  • We are learning more about our application
  • Aha, there IS room to optimize! Compile with vectorization (a hedged sketch follows this list)
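
A hedged sketch of what “compile with vectorization” could look like, assuming a GCC toolchain and a hypothetical source file:

    # -O3 enables auto-vectorization, -march=native targets the local CPU's SIMD units,
    # -fopt-info-vec reports which loops were vectorized
    gcc -O3 -march=native -fopt-info-vec -o render_snowmen render_snowmen.c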

What we’re doing here:

  • Get a complete picture
  • Introduce missing metrics / definitions, and popular representations of data, e.g. Roofline
  • Relate to hardware on the same level of detail


Pick a main tool

We go with three alternatives here; pick one and stick to it throughout your course, but highlight that there are alternatives and that learners may not have access to certain tools on every cluster.



ToDo: Connect Hardware to Performance Measurements

Introduce hardware at the same level of detail and with the same terms as the performance reports by ClusterCockpit, LinaroForge, etc., as soon as they appear. Only introduce what we need, to avoid an info dump. But point to additional information that gives a complete overview -> hpc-wiki!



ToDo: Clarify relation to hardware in this course

Maybe we should either focus on components (CPUs, memory, disk, accelerators, network cards) or functional entities (compute, data hierarchy, bandwidth, latency, parallel timelines)

We shouldn’t go into too much detail here. Define broad categories where performance can be good or bad. (calculations, data transfers, application logic, research objective (is the calculation meaningful?))

Reuse categories in the same order and fashion throughout the course, i.e. point out in what area a discovered inefficiency occurs.

Introduce detail about hardware later where it is needed, e.g. NUMA for pinning and hints.



Pinning


Intention: Go deeper in performance and hardware relationship

Narrative:

  • We get the feeling that hardware has a lot to offer, but the rabbit hole is deep!
  • What are the “dimensions” in which we can optimize the throughput of snowman pictures per hour?
  • Can we improve how the work maps to certain CPUs / Memory regions?

What we’re doing here:

  • Introduce pinning and Slurm hint options (a minimal sketch follows this list)
  • Relate to hardware effects
  • Use third party performance tools to observe effects!
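
A minimal sketch of the simple options, assuming a hybrid MPI+OpenMP run (process and thread counts are examples):

    # Bind each MPI rank to its own set of physical cores and keep OpenMP threads close together
    export OMP_PLACES=cores
    export OMP_PROC_BIND=close
    srun --ntasks=4 --cpus-per-task=12 --cpu-bind=cores --hint=nomultithread ./render_snowmen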


ToDo: Extract episode about pinning

Stick to simple options here. Put more complex options for pinning, hints, etc. into their own episode somewhere later in the course.

Pinning is an important part of job optimization, but requires some knowledge, e.g. about the hardware hierarchies in a cluster, NUMA, etc. So it should be done after we’ve introduced different performance reports and their perspective on hardware

Maybe point to the JSC pinning simulator and include similar diagrams as an independent “offline” version in this course.



Note: Login to the compute job

This is cluster specific. It can possibly be done in two ways:

  1. srun --pty --overlap --jobid=<jobid> /bin/bash
  2. Check which node the job runs on and log in to that node via SSH (a short sketch follows).
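
A short sketch of the second variant (<jobid> is a placeholder; SSH access to compute nodes may be restricted on your cluster):

    # Find the node(s) the job is running on, then log in
    squeue -j <jobid> -o "%N"
    ssh <nodename>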



TODO: Show an animation

  • Current behavior with overlapping threads on the same core.
  • Expected behavior when threads are pinned to separate cores.


Note

  • This exercise assumes the following hardware setup:
    • Dual-socket system (2 sockets, 48 cores per socket, 8 NUMA regions, 96 cores total).
    • Each MPI process can use multiple threads (-threads) for parallel execution.
  • The idea is to demonstrate oversubscription by launching more MPI processes than there are sockets or NUMA regions, or by over-allocating threads per domain.
  • You are free to adjust -n and -threads based on your cluster (a hedged example follows this list).
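
A hedged example of such an oversubscribed run on the setup above (-threads is the example application’s own option; the numbers are illustrative):

    # 16 MPI ranks on 8 NUMA regions, each rank running 12 threads on 6 allocated cores:
    # 192 threads compete for 96 cores
    srun --ntasks=16 --cpus-per-task=6 ./render_snowmen -threads 12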


How to identify a bottleneck?


Intention: Uncover one or two issues in the application

Narrative:

  • Okay, what’s slowest with creating snowman pictures?
  • Where does our system choke?

What we’re doing here:

  • What’s a bottleneck?
  • How can we identify a bottleneck?
  • “Online” and “after the fact” workflows of performance measurements (trace, accumulated results, attached to the process (live), or after it ran)
  • Point to additional resources of common performance/bottleneck issues, e.g. on hpc-wiki

Maybe something like this already occurred before in 4. Scaling Study, or 5. Performance Overview



Performance of Accelerators


Intention: Jump onto accelerator with the example application

Narrative:

  • The deadline is creeping up, only a few ways to go!
  • Hey, we have a GPU partition! Maybe this will help us speed up the process!

What we’re doing here:

  • What changes?
  • New metrics
  • Transfer to/from accelerator
  • Different options/requirements for the scheduler & performance measurement tools (a minimal GPU job sketch follows this list)
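
A minimal sketch of what changes on the scheduler side, assuming a partition named gpu and a job script render_gpu_job.sh (both names are examples):

    # Request one GPU in addition to CPU cores
    sbatch --partition=gpu --gres=gpu:1 --ntasks=1 --cpus-per-task=8 render_gpu_job.sh

    # Watch GPU utilization and memory while the job runs (NVIDIA GPUs)
    nvidia-smi --loop=5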


ToDo

Don’t mention FPGAs too much; maybe just add a note on what accelerators could be besides GPUs. The goal is to keep it simple and accessible and to focus on what’s common in most HPC systems these days.



ToDo

Explain how to decide where to run something: CPU vs. small GPU vs. high-end GPU. This touches on transfer overhead, etc.



Next Steps


Intention: Provide a roadmap learners could follow

Most important: enable users to translate from the example workload to their own code! Provide guidance on how to translate the learning goals and key points to their situation. Additionally, provide some info on where and how to dig deeper, if there is interest (application profiling, etc.)

All ideas in this episode may need to be reworked, since they were made with the outlook in mind, not so much to help learners transfer insight

Narrative:

  • Start with picture of beautiful title slide of the talk with the snowman picture
  • Next time we want to tackle the issue way in advance
  • Approach our raytracing application more systematically, such that we can get the title slide done much quicker
  • What could we do to dive deeper in optimizing the raytracer?
  • Where can we go from here?

What we’re doing here:

  • Learning important programming concepts (parallel programming on many levels)
  • Deeper application profiling & tools to use