Introduction

Last updated on 2025-07-01

Estimated time: 10 minutes

Overview

Questions

Why should I care about my job's performance?
How is efficiency defined?
How do I start measuring?

Objectives

After completing this episode, participants should be able to …

Understand the benefits of efficient jobs.
Identify which areas of computer hardware may affect performance.
Use the `time` command for a first measurement.

Why Care About Performance?

Reasons from the perspective of learners (see profiles)

Faster output, shorter iteration-/turn-around-time
- More research per time
- Opportunity costs when “accepting” worse performance
- Trade-off between time spent on optimizing vs. doing actual research
Potentially less wasted energy
- Core-h / device-h correlate directly with energy consumption (power draw over time)
- Production of hardware and its operation costs energy (even when idle)
- => Buy as little hardware as possible and use it as much as you can, if you have meaningful computations
Applying for HPC resources in a larger center
- Need estimate for expected resources
- Jobs need to be sufficiently efficient
- Is provided hardware a good fit for the applied computational workload?

Exercise: Why care about performance?

Maybe true/false statements as a warm-up exercise? For example:

Better performance allows for more research
Application performance matters less on new computer hardware
Computations directly correlate to energy consumption
Good performance does not matter on my own hardware

Show me the solution

True, shorter turn-around times, more results per time, more Nobel Prizes per second!
False, new hardware might make performance issues less pressing, but it is still important (opportunity costs, wasted energy, shared resources)
True, the energy drawn per device-hour varies (depending on utilized features, amount of communication, etc.), but device-hours correlate directly with energy consumption
False, performance is especially important on shared systems, but energy and opportunity costs also affect researchers on their own hardware and exclusive allocations.

Discussion: How important is performance?

Did you change your opinion about the importance of good performance?
How much time do you want to/can you spend on assessing your job's performance?

What is Efficient?

Challenge: Many perspectives on Efficiency

Write down your current definition or understanding of efficiency with respect to HPC jobs. (Shared document?)

(Exercise as think, pair, share?)

Give me a hint

E.g. shortest time from submission to job completion.

Show me the solution

Many definitions of efficiency (see below)

Discussion: Which definition should we take?

Are these perspectives equally useful? Is one particularly suited to our discussion?

Many definitions of efficiency (to be ordered and discussed):

1. Minimal wall-/human-time of the job
2. Minimal compute-time
3. Minimal time-to-solution (like 1, but including queue wait times and potentially multiple jobs for combined results)
4. Minimal cost in terms of energy or someone's money
5. With regard to opportunity costs: amount of research per job (including waiting times, computation time, and slowdown through longer iteration cycles, i.e. turn-around times)

Assuming only “useful” computations, no redundancies.
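To see how these definitions can disagree, here is a minimal sketch that compares two hypothetical runs of the same workload. The `JobRun` class, its field names, and all numbers are made up for illustration, and compute-time is approximated as allocated cores times wall time.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """Hypothetical record of one job run (all values are illustrative)."""
    cores: int            # number of allocated cores
    wall_seconds: float   # elapsed time of the job itself
    queue_seconds: float  # time spent waiting in the queue

    @property
    def compute_core_hours(self) -> float:
        # Compute-time, approximated as allocated cores x wall time.
        return self.cores * self.wall_seconds / 3600

    @property
    def time_to_solution_seconds(self) -> float:
        # Wall time plus the time spent waiting for the job to start.
        return self.queue_seconds + self.wall_seconds


# Two made-up configurations of the same workload.
small = JobRun(cores=4, wall_seconds=3600, queue_seconds=600)
large = JobRun(cores=64, wall_seconds=450, queue_seconds=7200)

for name, run in [("small", small), ("large", large)]:
    print(f"{name}: wall={run.wall_seconds:.0f} s, "
          f"compute={run.compute_core_hours:.1f} core-h, "
          f"time-to-solution={run.time_to_solution_seconds:.0f} s")
```

With these invented numbers the large job has the shortest wall time, while the small job needs fewer core-h and, because of its shorter queue wait, also finishes first. Which run counts as "efficient" therefore depends on the definition chosen.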

Which definition do we refer to by default in the following episodes? (Do we need a default?)

How Does Performance Relate to Hardware?

(Following this structure throughout the course, trying to understand the performance in these terms)

Broad dimensions of performance:

CPU (Front- and Backend, FLOPS)
- Frontend: decoding instructions, branch prediction, pipeline
- Backend: getting data from memory, cache hierarchy & alignment
- Raw calculations
- Vectorization
- Out-of-order execution
Accelerators (e.g. GPUs)
- More calculations
- Offloading
- Memory & communication models
Memory (data hierarchy)
- Working memory, reading/writing data from/to disk
- Bandwidth of data
I/O (broader data hierarchy: disk, network)
- Stored data
- Local disk (caching)
- Parallel fs (cluster-wide)
- MPI communication
Parallel timeline (synchronization, etc.)
- Application logic

ToDo

Maybe we should either focus on components (CPUs, memory, disk, accelerators, network cards) or functional entities (compute, data hierarchy, bandwidth, latency, parallel timelines)

Exercise: Match application behavior to hardware

Which part of the computer hardware may become an issue for the following application patterns:

Calculating matrix multiplications
Reading data from processes on other computers
Calling many different functions from many equally likely if/else branches
Writing very large files (TB)
Comparing many different strings to check whether they match
Constructing a large simulation model
Reading thousands of small files for each iteration

Maybe not the best questions, also missing something for accelerators.

Show me the solution

CPU (FLOPS) and/or Parallel timeline
I/O (network)
CPU (Front-End)
I/O (disk)
(?) CPU-Backend, getting strings through the cache?
Memory (size)
I/O (disk)

Setting the Baseline

Absolute performance is hard to determine:

In comparison to current hardware (theoretical limits vs. real usage)
Still important if the application is a long way from the theoretical limits
Always limited by something

During optimization, performance is often expressed relative to a baseline measurement. Define “baseline”: the reference measurement taken before a change, against which the measurement after the change is compared.
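As a small illustration of such a before/after comparison, the sketch below computes two common relative measures. The function names and the timings are invented for this example and are not part of the course material.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Factor by which the new run is faster than the baseline (>1 means faster)."""
    return baseline_seconds / new_seconds


def relative_improvement(baseline_seconds: float, new_seconds: float) -> float:
    """Fraction of the baseline run time that was saved (0.25 means 25 % less time)."""
    return (baseline_seconds - new_seconds) / baseline_seconds


# Made-up timings: 120 s before the change, 90 s after it.
print(f"speedup: {speedup(120.0, 90.0):.2f}x")
print(f"time saved: {relative_improvement(120.0, 90.0):.0%}")
```

Both numbers describe the same change. Neither says how close the application is to the limits of the hardware.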

Exercise: Baseline Measurement with `time`

Simple measurement of the example application with `time`. Maybe also with `hyperfine`?

Observe system, user, and wall time.

Repeat the measurement roughly 3 to 10 times to reduce noise

Average time
Minimum (observed best case)

Maybe make a simple/obvious change and compare the result to the baseline. How much relative improvement?

Discuss the meaning of system, user, and wall time. Relate them to the efficiency definitions (minimal wall-time vs. minimal compute-time)
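The repeated measurement can also be scripted. The following is a minimal sketch, assuming a Unix-like system and Python 3.9 or newer. It collects the same three quantities that `time` reports: wall time (elapsed), user time (CPU time in user mode), and system time (CPU time in kernel mode). The `measure` helper and the placeholder workload are invented for illustration; replace the workload with the actual example application.

```python
import os
import statistics
import subprocess
import sys
import time


def measure(command: list[str], repetitions: int = 5) -> list[dict[str, float]]:
    """Run `command` several times; record wall, user and system time per run."""
    results = []
    for _ in range(repetitions):
        cpu_before = os.times()
        wall_before = time.perf_counter()
        subprocess.run(command, check=True)
        wall_after = time.perf_counter()
        cpu_after = os.times()
        results.append({
            "wall": wall_after - wall_before,
            # CPU time of finished child processes, split into user and kernel mode.
            "user": cpu_after.children_user - cpu_before.children_user,
            "system": cpu_after.children_system - cpu_before.children_system,
        })
    return results


# Placeholder workload (an assumption): a short Python loop.
workload = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
runs = measure(workload, repetitions=5)
wall_times = [run["wall"] for run in runs]
print(f"mean wall time: {statistics.mean(wall_times):.3f} s")
print(f"min wall time:  {min(wall_times):.3f} s")
print(f"mean user time: {statistics.mean(run['user'] for run in runs):.3f} s")
print(f"mean sys time:  {statistics.mean(run['system'] for run in runs):.3f} s")
```

The mean summarizes typical behavior, while the minimum is the observed best case with the least disturbance from other activity on the machine.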

Define core-h: one core-h is one core used for one hour. Device usage for X seconds correlates with the estimated energy consumption. Real power usage depends on:

Utilized features of the device (some more power-hungry than others)
Amount of data movement through memory, disk, network
Cooling (rule of thumb factor \(\times 2\))
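A rough worked example of this estimate is sketched below. It assumes that the TDP is a usable stand-in for the CPU's power draw under load and that a job's share of it scales linearly with the number of cores used; both are simplifications, and idle power is ignored. The function names and all numbers are invented for illustration.

```python
def core_hours(cores: int, wall_seconds: float) -> float:
    """Core-h: number of cores multiplied by the hours they were in use."""
    return cores * wall_seconds / 3600


def estimated_energy_wh(cores_used: int, cores_per_cpu: int, tdp_watts: float,
                        wall_seconds: float, cooling_factor: float = 2.0) -> float:
    """Rough energy estimate in watt-hours (assumption: linear share of the TDP)."""
    share_of_cpu = cores_used / cores_per_cpu
    hours = wall_seconds / 3600
    # The cooling factor is the rule-of-thumb multiplier mentioned in the text.
    return share_of_cpu * tdp_watts * hours * cooling_factor


# Made-up example: 8 of 64 cores on a CPU with 280 W TDP, running for 30 minutes.
print(f"{core_hours(8, 1800):.1f} core-h")
print(f"{estimated_energy_wh(8, 64, 280.0, 1800):.1f} Wh")
```

The result is an order-of-magnitude estimate only; measured power draw can differ considerably for the reasons listed above.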

Exercise: Core-h and Energy consumption

Figure out your current hardware (documentation, cpuinfo, web search, LLM)
Calculate core-h for above test (either including or excluding repetitions)
Estimate power usage with TDP

Summary

Exercise: Recollecting efficiency

Exercise to raise the question of whether the example workload is efficient or not. Do we know yet? No: so far we can only tell how long it takes, estimate how much time and resources it consumes, and check whether a change yields a relative improvement.

Key Points

Job performance affects you as a user
Different perspectives on efficiency
- Definitions: wall/human-time, compute-time, time-to-solution, energy (costs / environment), money, opportunity cost (less research output)
Relationship between performance and computer hardware
Absolute vs. relative performance measurements
- `time` to establish a baseline
- Estimating energy consumption
