Content from Introduction


Last updated on 2025-12-12

Estimated time: 10 minutes

Overview

Questions

  • What exactly is job efficiency in the computing world?
  • Why would I care about job efficiency and what are potential pitfalls?
  • How can I start measuring how my program performs?

Objectives

After completing this episode, participants should be able to …

  • Use the timing commands time and date.
  • Understand the benefits of efficient jobs in terms of runtime and numerical accuracy.
  • Have developed some awareness about the overall high energy consumption of HPC.

Set up narrative:

  • Important upcoming conference presentation
  • Time is ticking, the deadline is approaching way too fast
  • The talk is almost done, but, critically, we’re missing a picture for the title slide
  • It should contain three snowmen, and we’ve exhausted our credits for all generative AI models in previous chats with colleagues
  • => Ray tracing a scene to the rescue!
  • Issue: we need to try many different iterations of the scene to find the exact right picture. How can we maximise the number of raytraced snowman images before our conference deadline?
  • Ray tracing is expensive, but luckily we have access to an HPC system

What we’re doing here:

  • Run workflow example for the first time
  • Simple time measurement to get started
  • Introduce different perspectives on efficiency
  • Core-h and correlation to cost in energy/money
  • Either set up the first Slurm job here or in the next episode

Maybe good to also address the “why should I care” perspective: you get more out of your fair share, and shorter iteration times mean more/better insight …

Background


Job efficiency, as defined by the Oxford English Dictionary, is the ratio of the useful work performed by a machine […] to the total energy expended or heat taken in. In a high-performance-computing (HPC) context, the useful work is the entirety of all calculations to be performed by our (heat-generating) computers. Doing this efficiently thus translates to maximizing the calculations completed in some limited time span while minimizing the heat output. Put more bluntly, we want to avoid running big computers for nothing but hot air.

One may object that a single user’s job can hardly have an effect on an HPC system’s power usage since such systems are in power-on state 24/7 anyway. The same may be argued about air travel: the plane will take off anyway, whether I board it or not. However, we do have some leverage. In air travel, efficiency is determined by fuel consumption, and traveling lightly, i.e., avoiding excessive baggage, improves the airplane’s ratio \(\frac{useful\;work}{total\;energy\;expended}\). So let’s get back to the ground and look at some inefficiencies in computing jobs, keeping the air-travel analogy around for later.

time to sleep

Let’s look at the sleep command:

BASH

sleep 2

This command triggers a “computer nap”: it delays whatever comes next by the specified time, here 2 seconds. You can verify that nap time with a stopwatch, which is exactly what the time command provides:

BASH

time sleep 2

which will report something like

real	0m2.002s
user	0m0.001s
sys	0m0.000s

The time command shall be our first performance-measuring tool; it has become a bit of a hello-world equivalent in HPC contexts. It gives you a breakdown of how your program uses CPU (Central Processing Unit) time and wall-clock time. The standard output of time reports three fields, real, user and sys:

Field   Meaning
real    Wall-clock time, i.e., total runtime as seen on a stopwatch
user    Time spent in user mode: actual computations like math, loops, logic
sys     Time spent in the OS’s kernel mode (system calls): I/O, i.e., reading/writing files, handling memory, talking to other devices

The above sleep command abstains from any kind of math, I/O, or other work that would show up in user or sys time, hence these entries show (almost) zero.

The time command is both a keyword built directly into the Bash shell and an executable file, usually residing under /usr/bin/time. While very similar, they are not exactly the same. Shell keywords take precedence, so preceding a command with time invokes the shell keyword. Therefore, if you want to force the usage of /usr/bin/time, you would do

BASH

# Explicitly calling the `time` binary
$ /usr/bin/time sleep 2
0.00user 0.00system 0:02.00elapsed 0%CPU (0avgtext+0avgdata 2176maxresident)k
0inputs+0outputs (0major+90minor)pagefaults 0swaps

# Compare the output to the Bash built-in:
$ time sleep 2

real	0m2,003s
user	0m0,001s
sys	0m0,003s

# Yet another output of `time` in zsh, an alternative shell implementation to bash
$ time sleep 2
sleep 2  0,00s user 0,00s system 0% cpu 2,003 total

Notice the different output formatting. All tools provide similar insight, but the formatting and exact information may differ. So, if you see something that looks different from the Bash built-in output, this may be why!

Further note that documentation for shell keywords is available via help <KEYWORD>, for example help time, while most executables have manual pages, e.g., man time. Finally, you can prefix the command with a backslash to stop Bash from treating it as a keyword, so \time sleep 2 will fall back to /usr/bin/time.
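If you are curious which variants exist on your own system, the Bash built-in type can tell you (a quick check; the exact output depends on your installation):

BASH

# List everything named `time` that Bash knows about:
# typically a shell keyword plus the binary /usr/bin/time (if installed)
type -a time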

Time for a date

The date command, as its manpage (man date) says, prints or sets the system date and time. In fact, it gives us a very accurate stopwatch when used like this:

BASH

date +%s.%N

reports a point in time as a number of seconds elapsed since a fixed reference point.

Such a referenced time point is also known as Epoch time; according to the manpage of date, the (default) reference point is the beginning of the year 1970, given as “1970-01-01 00:00 UTC”.

While %s prints the number of seconds since that reference point, the additional specifier %N adds the fractional part down to nanoseconds. Give it a try and you will see a large number (of seconds) followed by 9 digits after the decimal point.

Challenge

An accurate stopwatch: date

You can use the construct date +%s.%N on the command line or in a Bash script to save start and end time points in variables:

BASH

start=$(date +%s.%N)
# ... run some command(s)
end=$(date +%s.%N)

This gives you a stopwatch: set a start time, run some command(s), and then store the end time in a second variable. The difference of the two times is the elapsed time. Give this a try with the sleep command in between.

Subtracting the two numbers can be done, among other ways, with the bc calculator tool:

BASH

echo "$end - $start" | bc -l

BASH

#!/usr/bin/env bash

start=$(date +%s.%N)
sleep 2
end=$(date +%s.%N)

echo "$end - $start" | bc -l

Part 1: Example for an inefficient job


After warming up with some timing methods, let’s analyze the efficiency of a small script that makes our computer sweat a bit more than the sleep command. Have a look at the following Bash shell 7-liner.

BASH

#!/bin/bash
sum=0
for i in $(seq 1 1000); do
  val=`echo "e(2 * l(${i}))" | bc -l`
  sum=$(echo "$sum + $val" | bc -l)
done
echo Sum=$sum

Copy-paste this into a file, say sum.bash, and make it executable via

BASH

chmod u+x sum.bash

The main part of this shell script is a for loop that calculates the sum of all squares \(i^2\) over 1000 iterations; note that seq 1 1000 creates the number sequence (\(i=1,2,3,...,1000\)). Inside the for loop the bc calculator tool is employed. The first statement inside the loop (val=...) evaluates the expression e(2 * l(${i})), which is bc-talk for \(i^2\) because of the relation \(i^x=e^{x\cdot \ln(i)}\), for example \(e^{2\cdot\ln(3)}=3^2\), where ln is the natural logarithm (called l in bc). The second statement inside the loop (sum=...) accumulates the values \(i^2\) into sum, so the output of the final echo line is the total, \(\sum_{i=1}^{1000}i^2\).

#TODO: Can we use time and date to find the issue with the subshells?

Better to teach a way to find the issue, than staring at the script and thinking about it

Challenge

Identify the inefficient pieces

In the above Bash script, the for loop invokes the bc calculator twice during every loop iteration. Compared to another method to be investigated below, this method is rather slow. Any idea why that is the case?

Each statement echo … | bc -l spawns a new bc process via a subshell.

The statement echo … | bc -l spawns a new bc process via a subshell. Here, each loop iteration invokes two of those. Each subshell is essentially a separate process and involves a certain startup cost, parsing overhead, and OS-internal inter-process communication. Such overhead accounts for most of the total runtime of sum.bash.

The overhead in this shell script is dominated by process creation and context switching, that is, by calling the bc tool so many times. Going back to our air-travel analogy, the summation of 1000 numbers is equivalent to having a total of 1000 passengers board a large plane. When total boarding time counts, an inefficient boarding procedure would involve every passenger loading two carry-on pieces. Many of you may have experienced how stuffing an excessive number of baggage pieces into the overhead compartments can slow things down in the plane’s aisles, similar to the overhead due to the 2000 (two for each loop iteration) bc sub-processes that hinder the data stream inside the CPU’s “aisles”.
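If you would like to convince yourself that the subshells, and not the arithmetic, dominate the runtime, here is a quick experiment (a sketch; absolute timings will vary with your machine): time a loop that spawns bc 1000 times to do essentially nothing, and compare it with a single bc process evaluating 1000 trivial expressions.

BASH

# 1000 bc processes doing trivial work: runtime is dominated by process startup
time for i in $(seq 1 1000); do echo "0" | bc -l > /dev/null; done

# One bc process evaluating 1000 trivial expressions: finishes almost instantly
time echo "for(i=1; i<=1000; i++) 0" | bc -l > /dev/null

If the first variant takes nearly as long as sum.bash while the second is orders of magnitude faster, the process-creation overhead is identified as the culprit.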

Challenge

Let’s pull out our stopwatches

Using either time or date, can you get a runtime measurement for sum.bash?

You can precede any command with time. If you want to use date, remember that now=$(date +%s.%N) lets you store the current time point and && lets you join commands together.

A straightforward way is

BASH

time ./sum.bash

Alternatively, date and && can be combined into a wrapper in order to time sum.bash externally,

BASH

start=$(date +%s.%N) && ./sum.bash && end=$(date +%s.%N) && echo "$end - $start" | bc -l

Another option is to place date inside the script sum.bash,

BASH

#!/bin/bash
start=$(date +%s.%N) # set start time
sum=0
for i in $(seq 1 1000); do
  val=`echo "e(2 * l(${i}))" | bc -l`
  sum=$(echo "$sum + $val" | bc -l)
done
end=$(date +%s.%N) # set end time
echo Sum=$sum runtime=`echo "$end - $start" | bc -l`

Speeding things up

A remedy to the inefficiencies we found inside the for loop of sum.bash is to avoid spawning the many sub-processes caused by repetitively calling bc. In other words, ideally, the many sub-processes conflate into one. In terms of the airplane analogy, we want people to store all their carry-on pieces in a big container whose subsequent loading onto the plane is a single process, as opposed to every passenger running their own sub-process. Collapsing things into one sub-process can be achieved by replacing the external loop with a bc-internal one:

BASH

echo "s=0; for(i=1; i<=1000; i++)s+=i^2; s" | bc -l

In this method, henceforth called the one-liner, the loop, arithmetic, and accumulation all happen inside a single bc process and are thus free of the subshell overhead. This example shall serve as a placeholder for a common scenario where potentially large efficiency gains can be achieved by replacing inefficient math implementations with numerically optimized software libraries.
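As an aside, this particular sum even has a closed-form solution, \(\sum_{i=1}^{n}i^2 = \frac{n(n+1)(2n+1)}{6}\), which lets us cross-check the result without any loop at all (just a sanity check, not part of the original workflow):

BASH

# Closed-form cross-check: 1000*1001*2001/6 = 333833500
echo "1000 * 1001 * 2001 / 6" | bc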

Challenge

Evaluate the runtime improvement

Compare the runtimes of the summation script sum.bash versus the one-liner.

The Bash keyword time is sufficient to see the runtime difference.

You can use time for both summation methods,

BASH

time ./sum.bash
time echo "s=0; for(i=1; i<=1000; i++)s+=i^2; s" | bc -l

While it depends a bit on the employed hardware, one will notice that the one-liner runs roughly 1000 times faster than sum.bash. Of course, one could live with this inefficiency when it is needed just once in a while and the script’s overall runtime amounts to just a few seconds. However, imagine some large-scale computing job that is supposed to finish within an hour on a supercomputer for which one has to pay a usage fee on a per-hour basis. If implemented poorly, even a seemingly small overhead, say a factor of 2 in runtime, would render this computing job expensive, both in terms of time and money.

The above runtime comparisons merely look at calculation speed, which depends on CPU processing speed. Such a task is thus called CPU-bound. On the other hand, the performance of a memory-bound process is limited by the speed of memory access. This happens when the CPU spends most of its time waiting for data to be fetched from memory (RAM), cache, or storage, causing its execution pipeline to stall. Optimization of memory-bound tasks addresses performance bottlenecks due to data transfer speeds rather than calculation speeds. Finally, when data transfer involves a high percentage of disk or network access, disk/networking speed becomes a limiting factor, rendering a process I/O-bound.
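You can get a first feeling for this distinction with time itself (a rough sketch; the exact numbers depend heavily on your hardware and file system): a CPU-bound command spends almost all of its real time in user mode, whereas an I/O-bound command shows real time well above user time.

BASH

# CPU-bound: computing pi to 5000 digits keeps the CPU busy (mostly user time)
time echo "scale=5000; 4*a(1)" | bc -l > /dev/null

# I/O-bound: writing 512 MB to disk and forcing a sync (real time exceeds user time)
time dd if=/dev/zero of=/tmp/io-test.bin bs=1M count=512 conv=fsync
rm /tmp/io-test.bin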

To be precise: Numerical efficiency

Quality and necessity of calculations are important factors in efficiency. Redundant calculations are inefficient, for example. This section may still be too much of a detour from the introduction, at least in its current form. It may also be a chance to shorten the episode.

Inefficient computing is not limited to being unnecessarily slow. It also includes the scenario where excessive accuracy leads to unnecessary runtime increases. Without going into details, let’s just keep in mind that in computing, accuracy depends on the precision of the numbers that are being processed by the CPU. Precision essentially governs how many digits after the decimal point are accounted for in mathematical operations. The higher the precision, the fewer calculations can be processed within a fixed time. On the other hand, within that same time, the CPU can crank through more low-precision numbers; however, insufficient precision can render lengthy calculations useless. The optimal degree of precision, in terms of computing efficiency, is application dependent.

Challenge

Compare numerical results

Our summation implementation via sum.bash exemplifies the case of an inaccurate calculation. When running the two summation methods in the previous challenge, have a look at the actual summation results. Which of the two end results do you think is more accurate and why? Is the erroneous result smaller or larger and why?

Think of another airplane example. Which scenario is more prone to things getting lost or forgotten? 1) Passengers bring and take their own baggage pieces to the cabin, or 2) Baggage pieces are stored and retrieved collectively.

The method sum.bash, using the external for loop, and the one-liner return the final sums, respectively,

103075329  # bc, external loop (sum.bash)
333833500  # bc, internal loop (one-liner)

where the first result may vary on your machine. The method sum.bash is affected by the setting of the bc-internal parameter scale, which defines how many digits after the decimal point some operations, here the exponential function e(…) and the logarithm l(…), work with. The default value of scale is 0, which basically truncates everything after the decimal point, so rounding errors accumulate at every loop iteration. Hence, the final sum drifts downward (by a lot) compared to the second (true) value. Of course, scale can be increased; the manpage of bc actually says that it is “an arbitrary precision calculator language”.
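You can experiment with the effect of scale on a single term (a small sketch; the exact digits depend on your bc version):

BASH

# With scale=0 the intermediate results are truncated to integers
# (on a typical bc, l(3) becomes 1 and e(2) becomes 7 instead of 9)
echo "scale=0;  e(2 * l(3))" | bc -l

# With scale=20 the result is close to, but still not exactly, 9
echo "scale=20; e(2 * l(3))" | bc -l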

Part 2: About HPC power consumption


The HP (high performance) in HPC refers to the fact that the employed computer hardware is able to do a lot of multitasking, also called parallel computing. Parallel programming essentially exploits the CPU’s multitasking ability. Therefore, a lot of HPC-efficiency aspects revolve around keeping everyone in a CPU’s multitasking team equally busy. We will look at some of those aspects during the course of later episodes.

Maybe too much info vs. too little activity, currently?

The more the merrier: CPU/GPU cores

Common parallel-computing jobs employ multiple cores of a CPU, or even multiple CPUs, simultaneously. A core is a processing unit within a CPU that independently executes instructions. These days (as of 2025), typical CPUs are quad-core (4 cores), octa-core (8 cores), and so on. High-end gaming CPUs often have 16+ cores, and HPC cluster nodes feature multiple CPUs, oftentimes with 64+ cores each; all these numbers keep going up.

Nowadays, almost all HPC centers are also equipped with GPU (Graphics Processing Unit) hardware. Such hardware is optimal for jobs where having many (simpler) cores is more important than having fewer, more powerful cores. The number of GPU cores varies greatly depending on the model, ranging from a few hundred in low-end GPU cards to over 16,000 in high-end ones.

Measuring parallel runtime: core hours

Owing to the inherent parallelism in the HPC world, resource usage is measured in a unit that accounts not only for the runtime but also for the number of requested cores. The unit core hour (core-h) represents the usage of one CPU core for one hour and scales with core count. For example, assume you have a monthly allocation of 500 core-h, with a fee incurred when exceeding that quota. With those 500 core-h, you could run a one-hour parallel job utilizing 500 CPU cores for free. Or, at the other extreme, if your program does not or cannot multitask, you could run a single-core job for 500 hours, provided you won’t have forgotten by the end what this job was about.

So far, the focus has been on core number and hours for HPC resource allocation. Keep in mind, however, that the HPC resource portfolio involves other hardware components as well:

  • Memory: Some jobs (whether parallel or not) request a large amount of memory (RAM). For example, some mathematical solution methods for large equation systems do not allow the compartmentalization of the total required memory across CPU cores, that is, many-core processes need to know each other’s memory chunks. HPC centers usually have large-memory nodes assigned for such applications.
  • Storage: Other applications process huge amounts of data, think of genomics or climate modelling, which can involve terabytes or even petabytes of data to be stored and analyzed.

A typical HPC computing job

Like in the automotive world, high performance means high power, which in turn involves a high energy demand. Let’s consider a typical parallel scientific-computing job to be run in some HPC center. Our example job shall be deemed too large for one CPU, so it employs multiple CPUs, which in turn are distributed across nodes. Node power usage is measured in W=Watt, which is the SI unit of power and corresponds to the rate of consumption of energy in an electric circuit. One compute node with a 64-core CPU can consume between 300 W in idle state, and 900 W (maximum load) for air-cooled systems, whereas this range is roughly 250-850 W for the slightly more efficient liquid-cooled systems. For comparison, an average coffee maker consumes between 800 W (drip coffee maker) and 1500 W (espresso machine). Our computing job shall then use these resources:

  • 12 nodes are crunching numbers in parallel
  • 64 cores/node (e.g., Intel® Xeon® 6774P, or AMD® EPYC® 9534)
  • 12 hours of full load (realistic for many scientific simulations)
  • Power per node: (idle vs. full load):
    • Idle: ~300 W
    • Full load: ~900 W
    • Extra power per node: 600 W
  • Total extra energy: 12 nodes × 600 W × 12 hours = 86,400 Wh = 86.4 kWh
Challenge

How many core hours does this job involve?

HPC centers have different job queues for different kinds of computing jobs. For example, a queue named big-jobs may be reserved for jobs exceeding a total of 1024 parallel processes = tasks. Another queue named big-mem may accommodate tasks with high memory demands by giving access to high-memory nodes (e.g., 512 GB, 1 TB, or more RAM per compute node).

Let’s assume, you have three job queues available, all with identical memory layout:

  • small-jobs: Total task count of up to 511.
  • medium-jobs: Total task count 512-1023.
  • big-jobs: Total task count of 1024 or more.

When submitting the above computing job, in which queue would it end up? And, if there were a charge of 1 Cent per core-h, what would the total cost be in € (1 € = 100 Cents)?

The total number of tasks results from the product cores-per-node \(\times\) nodes. Total core hours is the task count multiplied by the job’s requested time in hours.

The total number of tasks is cores-per-node \(\times\) nodes = \(64\times 12 = 768\), which would put the job into the medium-jobs queue. The HPC center would bill us for \(64\times 12\times 12 = 9216\) core hours, hence €92.16.
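Staying true to this episode’s favourite calculator, you can let bc do the arithmetic (just a convenience, nothing new):

BASH

# tasks = cores-per-node * nodes, core hours = tasks * hours, cost at 0.01 EUR per core-h
echo "64 * 12" | bc              # 768 tasks
echo "64 * 12 * 12" | bc         # 9216 core hours
echo "64 * 12 * 12 * 0.01" | bc  # 92.16 EUR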

What are Watt hours?

The unit Wh (Watt-hours) measures energy: 86,400 Wh is the energy that a machine drawing 86,400 W (or 86.4 kW, k = kilo) consumes in one hour. Back to coffee, brewing one cup needs 50-100 Wh, depending on preparation time and method. So, running your 12-node HPC job for 12 hours is equivalent to brewing between 864 and 1,728 cups of coffee. For those of us who don’t drink coffee: assuming 100% conversion efficiency from our compute job’s heat to mechanical energy, which is unrealistic, we could lift an average African elephant (~6 tons) about 5,285 meters straight up, not quite to the top but in sight of Mount Kilimanjaro’s (5,895 m) summit.
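For those who like to check the elephant arithmetic: \(86.4\,\text{kWh} = 86.4 \times 3.6\times 10^{6}\,\text{J} \approx 3.11\times 10^{8}\,\text{J}\), and lifting a mass \(m\) by a height \(h\) requires the energy \(E = m\,g\,h\). Solving for the height with \(m = 6000\,\text{kg}\) and \(g \approx 9.81\,\text{m/s}^2\) gives \(h = E/(m\,g) \approx 3.11\times 10^{8}/(6000 \times 9.81) \approx 5285\,\text{m}\).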

Note that the focus is on extra power, that is, beyond the CPU’s idle state. Attributing our job’s extra power only to CPU usage underestimates its footprint. In practice, the actual delta from idle to full load will vary based on the load posed on other hardware components. Therefore, it is interesting to shed some light onto those other hardware components that start gearing up after hitting that Enter key which submits the above kind of HPC job.

  • CPUs consume power through two main processes:

    1. Dynamic power consumption: It is caused by the constant switching of transistors and is influenced by the CPU’s clock frequency and voltage.
    2. Static power consumption: It is caused by small leakage currents even when the CPU is idle. This is a function of the total number of transistors.

    Both processes convert electrical energy into heat, which makes CPU cooling so important.

  • Memory (DRAM) consumes power primarily through its refresh cycles. These are required to counteract the charge leakage in the data-storing capacitors. Periodic refreshing is necessary to maintain data integrity, which is the main reason why DRAM draws power even when idle. Other power consumption factors include the static power drawn by the memory’s circuitry and the active power used during read/write operations.

  • Network interface cards (NICs) consume power by converting digital data into electrical signals for transmission and reception. Power draw increases with data throughput, physical-media complexity, like fiber optics, and also depends on the specific interconnect technology used.

  • Storage components: Hard drives (HDDs) require constant energy due to moving mechanical parts, like the disc-spinning motors. SSDs store data electronically via flash memory chips and are thus more power-efficient, especially when idle. However, when performing heavy read/write tasks, SSD power consumption can also be significant, though they complete these tasks faster than HDDs and return to their idle state sooner.

  • Cooling is one of the biggest contributors to total energy use in HPC:

    • Idle: Cooling uses ~10–20% of total system power.
    • Max load: Cooling can consume ~50–70% of total power (depends on liquid- or air-cooled systems).

    Cooling is essential because all electrical circuits generate heat during operation. Under heavy computational loads, insufficiently cooled CPUs and GPUs exceed their safe temperature limits.

These considerations hopefully highlight why there is benefit in identifying potential efficiency bottlenecks before submitting an energy-intense HPC job. If all passengers care about efficient job design, i.e., the total baggage load, more can simultaneously jump onto the HPC plane.

Key Points
  • Using a stopwatch like time gives you a first tool to log actual versus expected runtimes; it is also useful for carrying out runtime comparisons.
  • Which hardware piece (CPU, memory/RAM, disk, network, etc.) poses a limiting factor, depends on the nature of a particular application.
  • Large-scale computing is power hungry, so we want to use the energy wisely. As shown in the next episodes, you have more control over job efficiency, and thus the overall energy footprint, than you might expect.
  • Computing job efficiency goes beyond individual gain in runtime as shared resources are used more effectively, that is, the ratio \(\frac{useful\;work}{total\;energy\;expended}\sim\frac{number\;of\;users}{total\;energy\;expended}\) improves.

So what’s next?


The following episodes will put a number of these introductory thoughts into concrete action by looking at efficiency aspects around a compute-intense graphical program. While it is not quite an action-packed video game, it does contain an essential ingredient of one: the technique of ray tracing.

Ray tracing is a technique that simulates how light travels in a 3D scene to create realistic images. It simulates the behaviour of light in terms of optical effects like reflection, refraction, shadows, absorption, etc. The underlying calculations involve real-world physics, which makes them computationally expensive - a perfect HPC case.

Here is a basic run script:

BASH

#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4

# Load the same modules ("module load ...") here that you used when building the raytracer program

time mpirun -np 4 raytracer -width=800 -height=800 -spp=128

Check the time output at the end of the job’s output file (named something like slurm-<NUMBER>.out). You will notice that the user time is larger than the real time by a certain factor.

Discussion

Why is the user time larger than the real time, and what does it mean?

Any guess which number in the mpirun line corresponds roughly to that factor?

Content from Resource Requirements


Last updated on 2025-12-15

Estimated time: 10 minutes

Overview

Questions

  • How many resources should I request initially?
  • What scheduler options exist to request resources?
  • How do I know if they are used well?
  • How large is my HPC cluster?

Objectives

After completing this episode, participants should be able to …

  • Identify the size of their jobs in relation to the HPC system.
  • Request a good amount of resources from the scheduler.
  • Change the parameters to see how the execution time changes.

When you run a program on your local workstation or laptop, you typically don’t plan out the usage of computing resources like memory or core-hours. Your applications simply take as much as they need, and if your computer runs out of resources, you can simply add a few more. However, unless you are very rich, you probably don’t have a dedicated HPC cluster just to yourself and instead have to share one with your colleagues. In such a scenario, greedily consuming as many resources as possible is very impolite, so we need to restrain ourselves and carefully allocate just as many resources as needed. These resource constraints are then enforced by the cluster’s scheduling system so that you cannot accidentally use more resources than you think.

Getting a feel for the size of your cluster


To start with your resource planning, it is always a good idea to first get a feeling for the size of the cluster available to you. For example, if your cluster has tens of thousands of CPU cores and you use only 10 of them, you are far away from what would be considered excessive usage of resources. However, if your calculation utilizes GPUs and your cluster has only a handful of them, you should really make sure to use only the minimum amount necessary to get your work done.

Let’s start by getting an overview of the partitions of your cluster:

BASH

sinfo -O PartitionName,Nodes,CPUs,Memory,Gres,Time

Here is a (simplified) example output for the command above:

PARTITION           NODES               CPUS                MEMORY              GRES                TIMELIMIT
normal              223                 36                  95000+              (null)              1-00:00:00
long                90                  36                  192000              (null)              7-00:00:00
express             6                   36                  95000+              (null)              2:00:00
zen4                46                  192                 763758+             (null)              2-00:00:00
gpuexpress          1                   32                  240000              gpu:rtx2080:7       12:00:00
gpu4090             8                   32                  360448              gpu:rtx4090:6       7-00:00:00
gpuh200             4                   128                 1547843             gpu:h200:8          7-00:00:00

In the output, we see the name of each partition, the number of nodes in this partition, the number of CPU cores per node, the amount of memory per node in megabytes (as reported by Slurm), the number of generic resources (typically GPUs) per node and finally the maximum amount of time any job is allowed to take.
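If you want the full detail for one of these partitions, scontrol can print everything Slurm knows about it (shown here for the partition called normal from the example output; substitute one of the names your own sinfo reports):

BASH

scontrol show partition normal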

Discussion

Compare the resources available in the different partitions of your local cluster. Can you draw conclusions on what the purpose of each partition is based on the resources it contains?

For our example output above we can make some educated guesses on what the partitions are supposed to be used for:

  • The normal partition has a (relatively) small amount of memory and limits jobs to at most one day, but has by far the most nodes. This partition is probably designed for small- to medium-sized jobs. Since there are no GRES in this partition, only CPU computations can be performed here. Also, as the number of cores per node is (relatively) small, this partition only allows multithreading up to 36 threads and requires MPI for a higher degree of parallelism.
  • The long partition has double the memory compared to the normal partition, but less than half the number of nodes. It also allows for much longer running jobs. This partition is likely intended for jobs that are too big for the normal partition.
  • express is a very small partition with a similar configuration to normal, but a very short time limit of only 2 hours. The purpose of this partition is likely testing and very short running jobs like software compilation.
  • Unlike the former partitions, zen4 has a lot more cores and memory per node. The intent of this partition is probably to run jobs using large-scale multithreading. The name of the partition implies a certain CPU generation (AMD Zen 4), which appears to be newer than the CPU model used in the normal, long and express partitions (typically core counts increase in newer CPU generations).
  • gpuexpress is the first partition that features GPU resources. However, with only a single node and a maximum job duration of 12 hours, this partition seems to be intended again for testing purposes rather than large-scale computations. This also matches the relatively old GPU model.
  • In contrast, gpu4090 has more nodes and a much longer walltime of seven days and is thus suitable for actual HPC workloads. Given the low number of CPU cores, this partition is intended for GPU workloads only. More details can be gleaned from the GPU model used in this partition (RTX 4090). This GPU type is typically used for workloads relying on single-precision floating-point calculations.
  • Finally, the gpuh200 partition combines a large number of very powerful H200 GPUs with a high core count and a very large amount of memory. This partition seems to be intended for the heaviest workloads that can make use of both CPU and GPU resources. The drawback is the low number of nodes in this partition.

This discussion highly depends on the management philosophy of the cluster available to the learners. Some examples:

  • A partition with a high number of cores and large amounts of memory per node is probably intended for SMP calculations.
  • A partition with a lot of nodes that each have only a (relatively) small number of cores and memory is probably intended for MPI calculations.
  • A partition with powerful GPUs, but only a small amount of CPU cores is likely intended for jobs where the majority of the work is offloaded to the GPUs.
  • A partition with less powerful GPUs but more CPU cores and memory is likely intended for hybrid workloads.

To get a point of reference, you can also compare the total number of cores in the entire cluster to the number of CPU cores on the login node or on your local machine.

BASH

lscpu | grep "CPU(s):"
# If lscpu is not available on your machine, you can also use this command
cat /proc/cpuinfo | grep "core id" | wc -l

BASH

$ lscpu | grep "CPU(s):"
CPU(s):                               192
NUMA node0 CPU(s):                    0-191

BASH

$ cat /proc/cpuinfo | grep "core id" | wc -l
192

As you can see, your cluster likely has multiple orders of magnitude more cores in total than the login node or your local machine. To see the amount of memory on the machine you are logged into you can use

BASH

cat /proc/meminfo | grep "MemTotal"

BASH

$ cat /proc/meminfo | grep "MemTotal"
MemTotal:       395695636 kB

Again, the total memory of the cluster is going to be much, much larger than the memory of any individual machine.

All of these cores and all of that memory are shared between you and all the other users of your cluster. To get a feeling for the amount of resources per user, let’s try to get an estimate for how many users there are by counting the number of home directories.

BASH

find /home -maxdepth 1 -mindepth 1 -type d | wc -l
Caution

On some clusters, home directories are not placed directly in /home, but are split up into subdirectories first (e.g., by first letter of the username like /home/s/someuser). In this case, you have to use -maxdepth 2 -mindepth 2 to count the contents of these subdirectories. If your cluster does not use /home for the users’ home directories, you might have to use a different path (check dirname "$HOME" for a clue). Also, this command only gives an upper limit to the number of real cluster users as there might be home directories for service users as well.

By dividing the total number of cores / the total memory by the number of users, you get an estimate of how many resources each user would have available in a perfectly fair world.
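As a back-of-the-envelope sketch, you could plug the numbers you gathered above into bc (the values below are made up; replace them with what sinfo and the home-directory count reported on your cluster):

BASH

total_cores=14000   # hypothetical: sum over all partitions from sinfo
num_users=350       # hypothetical: number of home directories found above
echo "$total_cores / $num_users" | bc -l   # average cores per user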

Discussion

Does this mean you can never use more than this amount of resources?

The learners should realize that the per-user average they calculate here is very synthetic:

  • Many users do not use their full share of resources, which leaves room for others to use more.
  • The average we calculate is only an average over long periods of time. Short term you can usually use much more.
  • Not all users are equal. For example, if some research groups have contributed to the funding of the cluster, they should also get more resources than those who did not.
  • The world is not perfectly fair. Especially on larger clusters, HPC resources have to be requested via project proposals. Those who write more / better proposals can use more resources.

Now that you have an idea of how big your cluster is, you can start to make informed decisions on how many resources are reasonable to ask for.

Discussion

Challenge

sinfo can show a lot more information on the nodes and partitions of your cluster. Check out the documentation and experiment with additional output options. Try to find a single command that shows, for each partition, the number of allocated and idle nodes and CPU cores.

BASH

$ sinfo -O Partition,CPUsState,NodeAIOT
PARTITION           CPUS(A/I/O/T)       NODES(A/I/O/T)
normal*             6336/720/972/8028   196/0/27/223
long                2205/351/684/3240   71/0/19/90
express             44/172/0/216        3/3/0/6
zen4                7532/1108/192/8832  44/1/1/46
gpuexpress          0/32/0/32           0/1/0/1
gpu4090             177/35/44/256       7/0/1/8
gpuh200             90/166/256/512      2/0/2/4

Sizing your jobs


The resources required by your jobs primarily depend on the application you want to run and are thus very specific to your particular HPC use case. While it is tempting to just wildly overestimate the resource requirements of your application to make sure it cannot possibly run out, this is not a good strategy. Not only would you have to face the wrath of your cluster administrators (and the other users!) for being inefficient, but you would also be punished by the scheduler itself: In most cluster configurations, your scheduling priority decreases faster if you request more resources and larger jobs often need to wait longer until a suitable slot becomes free. Thus, if you want to get your calculations done faster, you should request just enough resources for your application to work.

Finding this amount of resources is often a matter of trial and error as many applications do not have precisely predictable resource requirements. Let’s try this for our snowman renderer. Put the following in a file named snowman.job:

BASH

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=<put your partition here>
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:01:00
#SBATCH --output=snowman-stdout.log
#SBATCH --job-name=snowman

# Always a good idea to purge modules first to start with a clean module environment
module purge
# <put the module load commands for your cluster here>

# Start the raytracer
mpirun -n 4 ./SnowmanRaytracer/build/raytracer -width=1024 -height=1024 -spp=256 -threads=1 -alloc_mode=3 -png=snowman.png

The #SBATCH directives assign our job the following resources (line-by-line):

  • 1 node…
  • … from the partition <put your partition here>
  • 4 MPI tasks…
  • … each of which uses one CPU core (so 4 cores in total)
  • 1 GB of memory
  • A timelimit of 1 minute

The last two #SBATCH directives specify that we want the output of our job to be captured in the file snowman-stdout.log and that the job should appear under the name snowman.

Callout

The --mem directive is somewhat unfortunately named as it does not define the total amount of memory of your job, but the total amount of memory per node. Here, this distinction does not matter as we only use one node, but you should keep in mind that changing the number of nodes often implies that you need to adapt the --mem value as well. Alternatively, you can also use the --mem-per-cpu directive such that the memory allocation automatically scales with the number of cores. However, even in this case you need to verify that your memory consumption actually scales linearly with the number of cores for your application!
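For illustration, the per-core variant of our job’s memory request could look like this (a sketch; with the 4 cores requested above this amounts to the same 1 GB in total):

BASH

#SBATCH --mem-per-cpu=250M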

To test if our estimate works, you have to submit the job to the scheduler:

BASH

sbatch snowman.job

This command will also print the ID of the job, so we can observe what is happening with it. Wait a bit and have a look at how your job is doing:

BASH

sacct -X -j <jobid of your job>

After a while, you will see that the status of your job is given as TIMEOUT.

Callout

You might wonder what the -X flag does in the sacct call above. This option instructs Slurm to not output information on the “job steps” associated with your job. Since we don’t care about these right now, we set this flag to make the output more concise.
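If you prefer a more focused view, you can also pick the sacct columns yourself via --format (one possible selection; all field names used here are standard sacct fields):

BASH

sacct -X -j <jobid of your job> --format=JobID,JobName,Partition,State,Elapsed,Timelimit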

Check the file snowman-stdout.log as well. Near the bottom you will see a line like this:

slurmstepd: error: *** JOB 1234567 ON somenode CANCELLED AT 2025-04-01T13:37:00 DUE TO TIME LIMIT ***

Evidently our job was aborted because it did not finish within the time limit of one minute that we set above. Let’s try giving our job a time limit of 10 minutes instead.

BASH

#SBATCH --time=00:10:00

This time the job should succeed and show a status of “COMPLETED” in sacct. We can check the resources actually needed by our job with the help of seff:

BASH

seff <jobid of your job>

The output of seff contains many useful bits of information for sizing our job. In particular, let’s look at these lines:

[...]
CPU Utilized: 00:21:34
CPU Efficiency: 98.93% of 00:21:48 core-walltime
Job Wall-clock time: 00:05:27
Memory Utilized: 367.28 MB
Memory Efficiency: 35.87% of 1.00 GB
Callout

The exact numbers here depend a lot on the hardware and software of your local cluster.

The Job Wall-clock time is the time our job took. As we can see, our job takes much longer than one minute to complete which is why our first attempt with a time limit of one minute has failed.

The CPU Utilized line shows us how much CPU time our job has used. This is calculated by determining the busy time for each core and then summing these times for all cores. In an ideal world, the CPU cores should be busy for the entire time of our job, so the CPU time should be equal to the time the job took times the number of CPU cores. The ratio between the real CPU time and the ideal CPU time is shown in the CPU Efficiency line.
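For the numbers above this works out as follows: the ideal CPU time is 4 × 00:05:27 = 00:21:48 of core-walltime, and the measured 00:21:34 divided by 00:21:48 is roughly 98.9 %, matching the CPU Efficiency line.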

Finally, the Memory Utilized line shows the peak memory consumption that your job had at any point during its runtime, while Memory Efficiency is the ratio between this peak value and the requested amount of memory for the allocation. As we will see later, this value has to be taken with a grain of salt.

Starting from the set of parameters that successfully run our program, we can now try to reduce the amount of requested resources. As is good scientific practice, we should only vary one parameter at a time and observe the result. Let’s start by reducing the time limit. There is often a bit of jitter in the time needed to run a job since not all nodes are perfectly identical, so you should add a safety margin of 10 to 20 percent (completely arbitrary choice of numbers here; does everyone agree on the order of magnitude?). According to the time reported by seff, seven minutes should therefore be a good time limit. If your cluster is faster, you might reduce this even further.

BASH

#SBATCH --time=00:07:00

As you can see, your job will still complete successfully.

For the next section, the exact memory requirements depend on the cluster configuration (e.g., the MPI backends used). You might have to adapt these numbers for your local cluster to see the out-of-memory behavior.

Next, we can optimize our memory allocation. According to SLURM, we used 367.28 MB of memory in our last run, so let’s set the memory limit to 500 MB.

BASH

#SBATCH --mem=500M

After submitting the job with the lowered memory allocation everything seems fine for a while. But then, right at the end of the computation, our job will crash. Checking the job status with sacct will reveal that the job status is OUT_OF_MEMORY meaning that our job exceeded its memory limit and was terminated by the scheduler.

This behavior seems contradictory at first: SLURM reported previously that our job only used around 367 MB of memory at most, which is well below the 500 MB limit we set. The explanation for this discrepancy lies in the fact that SLURM measures the peak memory consumption of jobs by polling, i.e., by periodically sampling how much memory the job currently uses. Unfortunately, if the program has spikes in memory consumption that are small enough to fit between two samples, SLURM will miss them and report an incorrect peak memory value. Spikes in memory usage are quite common, for example if your application uses short-lived subprocesses. Most annoyingly, many programs allocate a large chunk of memory right at the end of the computation to write out the results. In the case of the snowman raytracer, we encode the raw pixel data into a PNG at the end, which means we temporarily keep both the raw image and the PNG data in memory.

Caution

SLURM determines memory consumption by polling, i.e., periodically checking on the memory consumption of your job. If your job has a memory allocation profile with short spikes in memory usage, the value reported by seff can be incorrect. In particular, if the job gets cancelled due to memory exhaustion, you should not rely on the value reported by seff as it is likely significantly too low.

So how big is the peak memory consumption of our process really? Luckily, the Linux kernel keeps track of this for us, if SLURM is configured to use the so-called “cgroups v2” mechanism to enforce resource limits (which many HPC systems are). Let’s use this system to find out how much memory the raytracer actually needs. First, we set the memory limit back to 1 GB, i.e., to a configuration that is known to work.

BASH

#SBATCH --mem=1G

Next, add these lines at the end of your job script:

BASH

echo -n "Total amount of memory used (in bytes): "
cat /sys/fs/cgroup/$(cat /proc/self/cgroup | awk -F ':' '{print $3}')/memory.peak
Callout

Let’s break down what these lines do:

  • The first line prints out a nice label for our peak memory output. We use -n to omit the usual newline that echo adds at the end of its output.
  • The second line outputs the contents of a file (cat). The path of this file starts with /sys/fs/cgroup, which is a location where the Linux kernel exports all the cgroups v2 information as files.
  • For the next part of the path we need the so-called “cgroup path” of our job. To find out this path, we can use the /proc/self/cgroup file, which contains this path as the third entry of a colon-separated list. Therefore, we read the contents of this file (cat) and extract the third entry of the colon separated list (awk -F ':' '{print $3}'). Since we do this in $(...), Bash will place the output of these commands (i.e., the cgroup path) at this point.
  • The final part of the path is the information we actually want from the cgroup. In our case, we are interested in memory.peak, which contains the peak memory consumption of the cgroup.
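If you are unsure whether your cluster actually uses cgroups v2, a quick heuristic (not an official Slurm query) is to look at the file-system type mounted at /sys/fs/cgroup; on cgroups v2 systems it reports cgroup2fs.

BASH

stat -fc %T /sys/fs/cgroup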

When you submit your job and look at the output once it finishes, you will find a line like this:

[...]
Total amount of memory used (in bytes): 579346432
[...]

So even though SLURM reported our job to only use 367.28 MB of memory, we actually used nearly 600 MB! With this measurement we can make an informed decision on how to set the memory limit for our job:

BASH

#SBATCH --mem=700M

Run your job again with this limit to verify that it completes successfully.

At this point you might want to point out to your audience that for certain applications it can be disastrous for performance to set the memory constraint too tightly. The reason is that the memory limit enforced by Slurm does not only affect the resident set size of all the processes in the job allocation, but also the memory used for caching (e.g., file pages). If the allocation runs out of memory for the cache, it will have to evict memory pages to disk, which can cause I/O operations and new memory allocations to block for longer than usual. If the application makes heavy use of this cache (e.g., repeated read and/or write operations on the same file) and the memory pressure in the allocation is high, you can even run into a cache thrashing situation, where the job spends the majority of its time swapping data in and out of system memory and thus slows down to a crawl.

So far we have tuned the time and memory limits of our job. Now let us have a look at the CPU core limit. This limit works slightly differently than the ones we looked at so far in the sense that your job is not getting terminated if you try to use more cores than you have allocated. Instead, the scheduler exploits the fact that multitasking operating systems can switch out the process a given CPU core is working on. If you have more active processes in your job than you have CPU cores (i.e., CPU oversubscription), the operating system will simply switch processes in and out while trying to ensure that each process gets an equal amount of CPU time. This happens very fast, so you can’t see the switching directly, but tools like htop will show your processes running at less than 100% CPU utilization. Below you can see a situation of four processes running on three CPU cores, which results in each process running only 75% of the time.

Caution

CPU oversubscription can even be harmful to performance as all the switching between processes by the operating system can cost a non-trivial amount of CPU time itself.

Let’s try reducing the number of cores we allocate by reducing the number of MPI tasks we request in our job script:

BASH

#SBATCH --ntasks=2

Now we have a mismatch between the number of tasks we request and the number of tasks we use in mpirun. However, MPI catches our folly and prevents us from accidentally oversubscribing our CPU cores. In the output file you see the full explanation:

There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:

  ./SnowmanRaytracer/build/raytracer

Either request fewer procs for your application, or make more slots
available for use.

A "slot" is the PRRTE term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which PRRTE processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
number of available slots when deciding the number of processes to
launch.

This error message was generated with OpenMPI. Other MPI implementations might produce different messages.
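If you really only wanted two tasks, the count passed to mpirun has to shrink along with the allocation (or you can omit the count entirely and let MPI pick up the slot count from Slurm; whether that automatic detection works depends on how MPI was built on your cluster):

BASH

mpirun -n 2 ./SnowmanRaytracer/build/raytracer -width=1024 -height=1024 -spp=256 -threads=1 -alloc_mode=3 -png=snowman.png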

If we actually want to see oversubscription in action, we need to switch from MPI to multithreading. First, let us try without oversubscribing the CPU cores:

BASH

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# [...]

./SnowmanRaytracer/build/raytracer -width=1024 -height=1024 -spp=256 -threads=4 -alloc_mode=3 -png=snowman.png

This works, and if we look at the output of seff again, we get a baseline for our multithreaded job:

[...]
CPU Utilized: 00:21:32
CPU Efficiency: 99.08% of 00:21:44 core-walltime
Job Wall-clock time: 00:05:26
Memory Utilized: 90.85 MB
Memory Efficiency: 12.11% of 750.00 MB
Discussion

Challenge

Compare our measurements for 4 threads here to the measurements we made for doing the computation with 4 MPI tasks earlier. What metrics are similar and which ones are different? Do you have an explanation for this?

We can see that the CPU utilization time and the walltime are virtually identical to the MPI version of our job, while the memory utilization is much lower. The exact reasons for this will be discussed in the following episodes, but here is the gist of it:

  • Our job is strongly compute-bound, i.e., the time our job takes is mostly determined by how fast the CPU can do its calculations. This is why it does not matter much for CPU utilization whether we use MPI or threads as long as both can keep the same number of CPU cores busy.
  • MPI typically incurs an overhead in CPU usage and memory due to the need to communicate between the tasks (in comparison, threads can just share a block of memory without communication). In our raytracer, this overhead for CPU usage is negligible (hence the same CPU utilization time metrics), but there is a significant memory overhead.

Now let’s see what happens when we oversubscribe our CPU by doubling the number of threads without increasing the number of allocated cores in our job script:

BASH

./SnowmanRaytracer/build/raytracer -width=1024 -height=1024 -spp=256 -threads=8 -alloc_mode=3 -png=snowman.png
Discussion

Challenge

If your cluster allows direct access to the compute nodes, try logging into the node your job is running on and watch the CPU utilization live using

BASH

htop -u <your username>

(Note: Sometimes htop hides threads to make the process list easier to read. This option can be changed by pressing F2, using the arrow keys to navigate to “Hide userland process threads”, toggling with the return key and then applying the change with F10.)

Compare the CPU utilization of the raytracer threads with different total numbers of threads.

In the top right of htop you can also see a metric called load average. Simplified, this is the number of processes / threads that are currently either running or could run if a CPU core was free. Compare the amount of load you generate with your job depending on the number of threads.

You can see that the CPU utilization of each raytracer thread goes down as the number of threads increases. This means, each process is only active for a fraction of the total compute time as the operating system switches between threads.

For the load metric, you can see that the load increases linearly with the number of threads, regardless of whether they are actually running or waiting for a CPU core. Load is a fairly common metric monitored by cluster administrators, so if you cause excessive load by CPU oversubscription you will probably hear from your local admin.

Despite using twice the amount of threads, we barely see any difference in the output of seff:

CPU Utilized: 00:21:29
CPU Efficiency: 98.85% of 00:21:44 core-walltime
Job Wall-clock time: 00:05:26
Memory Utilized: 93.32 MB
Memory Efficiency: 12.44% of 750.00 MB

This shows that despite having more threads, the CPU cores are not performing more work. Instead, the operating system periodically rotates the threads running on each allocated core, making sure every thread gets a time slice to make progress.

Let’s see what happens when we increase the thread count to extreme levels:

BASH

./SnowmanRaytracer/build/raytracer -width=1024 -height=1024 -spp=256 -threads=1024 -alloc_mode=3 -png=snowman.png

With this setting, seff yields

CPU Utilized: 00:26:45
CPU Efficiency: 99.07% of 00:27:00 core-walltime
Job Wall-clock time: 00:06:45
Memory Utilized: 113.29 MB
Memory Efficiency: 15.11% of 750.00 MB

As we can see, our job is actually getting slowed down by all the switching between threads. This means that, for our raytracer application, CPU oversubscription is either pointless or actively harmful to performance.

Discussion

If CPU oversubscription is so bad, then why do most operating systems default to this behavior?

In this case we have a CPU-bound application, i.e., the work done by the CPU is the limiting factor, and thus dividing this work into smaller chunks does not help with performance. However, there are also applications bound by other resources. For these applications it makes sense to assign the CPU core elsewhere while the process is waiting, e.g., on a storage medium. Also, in most systems it is desirable to have more programs running than your computer has CPU cores since often only a few of them are active at the same time.

Multi-node jobs


So far, we have only used a single node for our job. The big advantage of MPI as a parallelism scheme is the fact that not all MPI tasks need to run on the same node. Let’s try this with our Snowman raytracer example:

BASH

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=<put your partition here>
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=700M
#SBATCH --time=00:07:00
#SBATCH --output=snowman-stdout.log
#SBATCH --job-name=snowman

# Always a good idea to purge modules first to start with a clean module environment
module purge
# <put the module load commands for your cluster here>

mpirun -- ./SnowmanRaytracer/build/raytracer -width=1024 -height=1024 -spp=256 -threads=1 -alloc_mode=3 -png=snowman.png

echo -n "Total amount of memory used (in bytes): "
cat /sys/fs/cgroup$(cat /proc/self/cgroup | awk -F ':' '{print $3}')/memory.peak

The important change here compared to the MPI jobs before is the --nodes=2 directive, which instructs Slurm to distribute the 4 tasks across exactly two nodes.

Callout

You can also leave the decision of how many nodes to use up to Slurm by specifying a minimum and a maximum number of nodes, e.g.,

--nodes=1-3

would mean that Slurm can assign your job either one, two or three nodes.

Let’s look at the seff report of our job once again:

[...]
Nodes: 2
Cores per node: 2
CPU Utilized: 00:21:32
CPU Efficiency: 98.78% of 00:21:48 core-walltime
Job Wall-clock time: 00:05:27
Memory Utilized: 280.80 MB
Memory Efficiency: 20.06% of 1.37 GB

We can see that Slurm did indeed split up the job such that each of the two nodes is running two tasks. We can also see that the walltime and CPU time of our job are basically the same as before. Considering that communication between nodes is usually much slower than communication within a node, this result is surprising at first. However, we can find an explanation in the way our raytracer works. Most of the compute time is spent on tracing light rays through the scene for each pixel. Since these light rays are independent of one another, there is no need to communicate between the MPI tasks. Only at the very end, when the final image is assembled from the samples calculated by each task, does some MPI communication happen. The overall communication overhead is therefore vanishingly small.

Callout

How well your program scales as you increase the number of nodes depends strongly on the amount of communication in your program.

We can also look at the memory consumption:

[...]
Total amount of memory used (in bytes): 464834560
[...]

As we can see, there was indeed less memory consumed on the node running our submit script compared to before (470 MB vs 580 MB). However, our method of measuring peak memory consumption does not tell us anything about the second node, so we have to use slightly more sophisticated tooling to find out how much memory we actually use.

The course material contains a directory mpi-cgroups-memory-report with a tool that can help us out here, but we need to compile it first:

BASH

cd mpi-cgroups-memory-report
make mpi-mem-report.so
cd ..

Make sure you have a working MPI C compiler (check with which mpicc). It is provided by the same modules that you need to run the example raytracer application.

The memory reporting tool works by hooking itself into the MPI_Finalize function that needs to be called at the very end of every MPI program. Then, it does basically the same thing as we did in the script before and checks the memory.peak value from cgroups v2. To apply the hook to a program, you need to add the path to the mpi-mem-report.so file we just created to the environment variable LD_PRELOAD:

BASH

LD_PRELOAD=$(pwd)/mpi-cgroups-memory-report/mpi-mem-report.so mpirun -- ./SnowmanRaytracer/build/raytracer -width=1024 -height=1024 -spp=256 -threads=1 -alloc_mode=3 -png=snowman.png

After submitting this job and waiting for it to complete, we can check the output log:

[...]
[MPI Memory Reporting Hook]: Node r05n10 has used 464564224 bytes of memory (peak value)
[MPI Memory Reporting Hook]: Node r07n04 has used 151105536 bytes of memory (peak value)
[...]

The memory consumption of the first node matches our previous result, but we can now also see the memory consumption of the second node. Compared to the first node, the second node uses much less memory; however, in total both nodes use slightly more memory than running all four tasks on a single node (610 MB vs 580 MB). This memory imbalance between the nodes is an interesting observation that we should keep in mind when it comes to estimating how much memory we need per node.

Tips for job submission


To end this lesson, we discuss some tips for choosing resource allocations such that your jobs get scheduled more quickly.

  • Many clusters have activated the so-called backfill scheduler option in Slurm. This mechanism tries to squeeze low priority jobs in the gaps between jobs of higher priority (as long as the larger jobs are not delayed by this). In this case, smaller jobs are generally advantageous as they can “skip ahead” in the queue and start early.
  • Using sinfo -t idle you can specifically search for partitions that have idle nodes. Consider using these partitions for your job if possible as an idle node will typically start your job immediately.
  • Different partitions might have different billing weights, i.e., they might use different factors to determine the “cost” of your calculation, which is subtracted from your compute budget or fairshare score. You can check these weights using scontrol show partition <partitionname> | grep TRESBillingWeights. The idea behind different billing weights is to even out the cost of the different resources (i.e., how many hours of memory use correspond to one hour of CPU use) and to ensure that using more expensive hardware carries an appropriate cost for the users.
  • Typically, it takes longer for a large slot to free up than it takes for several small slots to open. Splitting your job across multiple nodes might not be the most computationally efficient way to run it due to the possible communication overhead, but it can be more efficient in terms of scheduling.
  • Slurm produces an estimate of when your job will start, which you can check with scontrol show job 35749406 | grep StartTime. A short sketch combining several of these checks follows below.
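
A minimal sketch combining these checks (the partition name is a placeholder; squeue --start lists the scheduler's current estimates for your pending jobs):

BASH

# Partitions with currently idle nodes
sinfo -t idle

# Billing weights of a candidate partition
scontrol show partition <partitionname> | grep TRESBillingWeights

# Estimated start times of your own pending jobs
squeue --me --start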

At this point you can present some scheduling strategies specific to your cluster. For the sake of time, you have likely reserved some resources for the course participants such that their jobs start instantly. Now would be a good time to show them the harsh reality of HPC scheduling on a contested partition and demonstrate that a major part of using an HPC cluster is waiting for your jobs to start.

I’m not sure if this is the right section to discuss this…

Key Points
  • Your cluster might seem to have an enormous amount of computing resources, but these resources are a shared good. You should only use as much as you need.
  • Resource requests are a promise to the scheduler to not use more than a specific amount of resources. If you break your promise to the scheduler and try to use more resources, terrible things will happen.
    • Overstepping memory or time allocations will result in your job being terminated.
    • Oversubscribing CPU cores will at best do nothing and at worst diminish performance.
  • Finding the minimal resource requirements takes a bit of trial and error. Slurm collects a lot of useful metrics to aid you in this.

Content from Scheduler Tools


Last updated on 2025-11-11 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • What can the scheduler tell about job performance?
  • What’s the meaning of collected metrics?

Objectives

After completing this episode, participants should be able to …

  • Explain basic performance metrics.
  • Use tools provided by the scheduler to collect basic performance metrics of their jobs.

Narrative:

  • Okay, so first couple of jobs ran, but were they “quick enough”?
  • How many renders could I generate per minute/hour/day according to the current utilization
  • Our cluster uses certain hardware, maybe we didn’t use it as much as we could have?
  • But I couldn’t see all metrics (may be cluster dependent) (Energy, Disk I/O, Network I/O?)

What we’re doing here:

  • What seff and sacct have to offer
  • Introduce simple relation to hardware, what does RSS, CPU, Disk read/write and their utilization mean?
  • Point out what’s missing from a complete picture

Note:

  • seff is an optional SLURM tool. It does not come standard with every SLURM installation. Therefore, make sure beforehand that this tool is available for the students.

Scheduler Tools


A scheduler performs important tasks such as accepting and scheduling jobs, monitoring job status, starting user applications, and cleaning up jobs that have finished or exceeded their allocated time. The scheduler also keeps a history of jobs that have been run and how they behaved. The information that is collected can be queried by the job owner to learn how the job utilized the resources it was given.

The seff tool

The seff command can be used to learn about how efficiently your job has run. The seff command takes the job identifier as an argument to select which job it displays information about. That means we need to run a job first to get a job identifier we can query SLURM about. Then we can ask about the efficiency of the job.

Callout

seff may not be available

seff is an optional SLURM tool for more convenient access to sacct. It does not come standard with every SLURM installation. Your particular HPC system may or may not provide it. Check for its availability on your login nodes, or consult your cluster documentation or support staff.

Other third-party alternatives, e.g. reportseff, can be installed with regular user permissions.

The sbatch command is used to submit a job. It takes a job script as an argument. The job script contains the resource requests, such as the amount of time needed for the calculation, the number of nodes, the number of tasks per node, and so on. It also contains the commands to execute the calculations.

Using your favorite editor, create the job script render_snowman.sbatch with the contents below.

#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4

# Possibly a "module load ..." command to load required libraries
# Depends on your particular HPC system

mpirun -np 4 raytracer -width=800 -height=800 -spp=128 -alloc_mode=3

Next submit the job with sbatch, and see what seff says about the job with the following commands.

BASH

jobid=$(sbatch --parsable render_snowman.sbatch)
seff $jobid

OUTPUT

Job ID: 309489
Cluster: bigiron
User/Group: usr123/grp123
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:07:43
CPU Efficiency: 98.93% of 00:07:48 core-walltime
Job Wall-clock time: 00:01:57
Memory Utilized: 35.75 MB
Memory Efficiency: 0.20% of 17.58 GB (4.39 GB/core)

The job script we created asks for 4 CPUs for an hour. After submitting the job script we need to wait until the job has finished as seff can only report sensible statistics after the job is completed. The report from seff shows basic statistics about the job, such as

  • The resources the job was given
    • the number of nodes
    • the number of cores per node
    • the amount of memory per core
  • The amount of resources used
    • CPU Utilized the aggregate CPU time the job actually used, summed over all allocated CPUs
    • CPU Efficiency the used CPU time as a percentage of the core-walltime, i.e. the job's wall-clock time multiplied by the number of allocated CPUs
    • Job Wall-clock time the time the job took from start to finish
    • Memory Utilized the aggregate memory usage
    • Memory Efficiency the actual memory usage as a percentage of the total available memory

Maybe 80% of job time?

Looking at the Job Wall-clock time, it shows that the job took just under 2 minutes, i.e. a lot less time than the one hour we asked for. This can be problematic, as the scheduler looks for time windows into which it can fit a job. Long-running jobs cannot be squeezed in as easily as short-running jobs. As a result, jobs that request a long time to complete typically have to wait longer before they can be started. Asking for more than 10 times as much time as the job really needs therefore simply means that you will have to wait longer for the job to start. On the other hand, you do not want to ask for too little time. Few things are more annoying than waiting for a long-running calculation to finish, just to see the job being killed right before the end because it would have needed a couple of minutes more than you asked for. So the best approach is to ask for more time than the job needs, but not to go overboard. As the job's elapsed time depends on many machine conditions, including congestion in the data communication, disk access, operating system jitter, and so on, you might want to ask for a substantial buffer. Nevertheless, asking for more than twice as much time as the job is expected to need usually doesn't make sense.
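
To calibrate future time requests, one simple check is to compare the requested time limit with the actual elapsed time of finished jobs (a sketch; the job ID is the example from above and Timelimit is a standard sacct field):

BASH

sacct --jobs=309489 --format=JobID,Elapsed,Timelimit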

Another thing is that SLURM by default reserves a certain amount of memory per core. In this case the actual memory usage is just a fraction of that amount. We could reduce the memory allocation by explicitly asking for less by modifying the render_snowman.sbatch job script.

Running this on our cluster and adding a module load command resulted in 600MB of memory required. My guess is that this is due to cgroups v2 counting page caches towards the job as well, so loading the modules might spike the resource requirements too?

Maybe we should play it safe and use a larger value in the following exercise. But we also want to teach not overdoing it, so it’d be good if we can find a useful but generic compromise here

Challenge

Challenge

Edit the batch file to reduce the amount of memory requested for the job. Note that the amount of memory per node can be requested with the --mem= argument. The amount of memory is specified by a number followed by a unit. The unit can be kilobytes (KB), megabytes (MB), or gigabytes (GB). For the calculations we are doing here, 100 megabytes per node is more than sufficient. Submit the job, and inspect the efficiency with seff. What memory usage efficiency do you get?

The batch file after adding the memory request becomes.

#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=4
#SBATCH --mem=100MB

# Possibly a "module load ..." command to load required libraries
# Depends on your particular HPC system

mpirun -np 4 raytracer -width=800 -height=800 -spp=128 -alloc_mode=3

Submit this jobscript, as before, with the following command.

BASH

jobid=$(sbatch --parsable render_snowman.sbatch)
seff $jobid

OUTPUT

Job ID: 310002
Cluster: bigiron
User/Group: usr123/grp123
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:07:43
CPU Efficiency: 98.09% of 00:07:52 core-walltime
Job Wall-clock time: 00:01:58
Memory Utilized: 50.35 MB
Memory Efficiency: 50.35% of 100.00 MB (100.00 MB/node)

The output of seff shows that about 50% of requested memory was used.

Now we see that a much larger fraction of the allocated memory has been used. Normally you would not worry too much about the memory request. Lately, HPC clusters are used more and more for machine learning workloads, which tend to require a lot of memory. Their memory requirements per core might actually be so large that they cannot use all the cores in a node. So there may be spare cores available for jobs that need little memory. In such a scenario, tightening up the memory allocation could allow the scheduler to start your job earlier. How much mileage you get from this depends on the job mix at the HPC site where you run your calculations.

Note that the CPU utilization is reported as almost 100%, but this just means that the CPU was busy with your job 100% of the time. It does not mean that this time was well spent. For example, every parallel program has some serial parts to the code. Typically those parts are executed redundantly on all cores, which is wasteful but not reflected in the CPU efficiency. Also, this number does not reflect how well the capabilities of the CPU are used. If your CPU offers vector instructions, for example, but your code does not use them, then your code will simply run slowly. The CPU efficiency will still show that the CPU was busy 100% of the time, even though the program is running at a fraction of the speed it could achieve if it fully exploited the hardware capabilities. It is worth keeping these limitations of seff in mind.

Callout

Good utilization does not imply efficiency

Measuring close to 100% CPU utilization does not say anything about how useful the calculations are. It merely states that the CPU was mostly busy with calculations instead of waiting for data, running idle, or waiting for other conditions to occur.

Good CPU utilization is only efficient if the CPU runs only "useful" calculations that contribute new results towards the intended goal.

The seff command cannot give you any information about the I/O performance of your job. You have to use other approaches for that, and sacct may be one of them.

The sacct tool

Note that the information sacct can provide depends on the information that SLURM stores on a given machine. By default this includes Billing, CPU, Energy, Memory, Node, FS/Disk, Pages and VMem. Additional information is available only when SLURM is configured to collect it. These additional trackable resources are listed in AccountingStorageTRES. For I/O fs/lustre is commonly useful, and for the interconnect communication ic/ofed is required. The setting AccountingStorageTRES is found in slurm.conf. Unfortunately there doesn’t seem to be a way to get sacct to print the optional trackable resources.
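
If you are unsure what your site collects, you can inspect the running configuration (a sketch; scontrol show config prints the values from slurm.conf and requires no admin rights):

BASH

scontrol show config | grep AccountingStorageTRES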

The sacct command shows data stored in the job accounting database. You can query the data of any of your previously run jobs. Just like with seff you will need to provide the job ID to query the accounting database. Rather than keeping track of all your jobs yourself you can ask sacct to provide you with an overview of the jobs you have run.

BASH

sacct

OUTPUT

JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
309902       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
309902.batch      batch             project_a          4  COMPLETED      0:0
309902.exte+     extern             project_a          4  COMPLETED      0:0
309903       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
309903.batch      batch             project_a          4  COMPLETED      0:0
309903.exte+     extern             project_a          4  COMPLETED      0:0
310002       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
310002.batch      batch             project_a          4  COMPLETED      0:0
310002.exte+     extern             project_a          4  COMPLETED      0:0

In the output above, every job is shown three times. This is because sacct lists one line for the primary job entry, followed by a line for every job step. A job step corresponds to an mpirun or srun command. The extern line corresponds to all work that is done outside of SLURM's control, for example an ssh command that runs something on one of the job's nodes.

Note that by default sacct only lists the jobs that have been run today. You can use the --starttime option to list all jobs that have been run since the given start date. For example, try running

BASH

sacct --starttime=2025-09-25

OUTPUT

JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
308755       snowman.s+  STD-s-96h  project_a         16  COMPLETED      0:0
308755.batch      batch             project_a         16  COMPLETED      0:0
308755.exte+     extern             project_a         16  COMPLETED      0:0
308756       snowman.s+  STD-s-96h  project_a          4  COMPLETED      0:0
308756.batch      batch             project_a          4  COMPLETED      0:0
308756.exte+     extern             project_a          4  COMPLETED      0:0
309486       interacti+  STD-s-96h  project_a          4     FAILED      1:0
309486.exte+     extern             project_a          4  COMPLETED      0:0
309486.0          prted             project_a          4  COMPLETED      0:0
309489       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
309489.batch      batch             project_a          4  COMPLETED      0:0
309489.exte+     extern             project_a          4  COMPLETED      0:0
309902       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
309902.batch      batch             project_a          4  COMPLETED      0:0
309902.exte+     extern             project_a          4  COMPLETED      0:0
309903       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
309903.batch      batch             project_a          4  COMPLETED      0:0
309903.exte+     extern             project_a          4  COMPLETED      0:0
310002       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
310002.batch      batch             project_a          4  COMPLETED      0:0
310002.exte+     extern             project_a          4  COMPLETED      0:0

You may want to change the date of 2025-09-25 to something more sensible when you work through this tutorial. Note that some HPC systems may limit the range of such a request to a maximum of, for example, 30 days to avoid overloading the Slurm database with overly large requests.

With the job ID you can ask sacct for information about a specific job as in

BASH

sacct --jobs=310002

OUTPUT

JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
310002       render_sn+  STD-s-96h  project_a          4  COMPLETED      0:0
310002.batch      batch             project_a          4  COMPLETED      0:0
310002.exte+     extern             project_a          4  COMPLETED      0:0

Using sacct with the --jobs flag is just another way to select which jobs we want more information about. In itself it does not provide any additional information. To get more specific data we need to explicitly ask for the information we want. As SLURM collects a broad range of data about every job, it is worth evaluating which items are the most relevant.

To reconstruct the CPU utilization reported by seff:

  • TotalCPU/CPUTime should give the percentage
  • Could also mention UserCPU and SystemCPU and discuss the difference? Both add up to TotalCPU
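
A possible sketch of this on the command line (TotalCPU, CPUTime, UserCPU, and SystemCPU are standard sacct fields; the job ID is the example from above):

BASH

sacct --jobs=310002 --format=JobID,TotalCPU,CPUTime,UserCPU,SystemCPU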

Maybe remove AveCPUFreq instead, or do we try to teach something specific about it?

Don’t forget to change the example output of all saccts in the following examples/challenges!

  • MaxRSS, AveRSS, or the Maximum or Average Resident Set Size (RSS). The RSS is the memory allocated by a program that is actually resident in the main memory of the computer. If the computer gets low on memory, the virtual memory manager can extend the apparently available memory by moving some of the data from memory to disk. This is done entirely transparently to the application, but the data that has been moved to disk is no longer resident in main memory. As a result, accessing it will be slower because it needs to be retrieved from disk first. Therefore, if the RSS is small compared to the total amount of memory the program uses, this might affect the performance of the program.
  • MaxPages, AvePages, or the Maximum or Average number of Page Faults. These quantities are related to the Resident Set Size. When the program tries to access data that is not resident in main memory, this triggers a page fault. The virtual memory manager responds to a page fault by retrieving the accessed data from disk (and potentially migrating other data to disk to make space). These operations are typically costly. Therefore, high numbers of page faults typically correspond to a significant reduction in the program's performance. For example, the CPU utilization might drop from as high as 98% to as low as 2% due to page faults. For that reason, some HPC machines are configured to kill your job if the application generates a high rate of page faults.
  • AllocCPUS is the number of CPUs allocated for the job.
  • Elapsed is the amount of wall clock time it took to complete the job. I.e. the amount of time that passed between the start and finish of the job.
  • MaxDiskRead, the Maximum amount of data read from disk.
  • MaxDiskWrite, the Maximum amount of data written to disk.
  • ConsumedEnergy, the amount of energy consumed by the job if that information was collected. The data may not be collected on your particular HPC system and is reported as 0.
  • AveCPUFreq, the average CPU frequency of all tasks in a job, given in kHz. In general, the higher the clock frequency of the processor, the faster the calculation runs. The exception is if the application is memory-bandwidth limited and the data cannot be moved to the processor fast enough to keep it busy. In that case modern hardware might throttle the frequency. This saves energy, as the power consumption scales roughly linearly with the clock frequency, but doesn't slow the calculation down, as the processor was having to wait for data anyway.

We can explicitly select the data elements that we are interested in. To see how long the job took to complete run

BASH

sacct --jobs=310002 --format=Elapsed

OUTPUT

   Elapsed
----------
  00:01:58
  00:01:58
  00:01:58
Challenge

Challenge

Request information regarding all of the above variables from sacct, including JobID. Note that the --format flag takes a comma-separated list. Also note that the result shows that more data is read than written, even though the program generates and writes an image, and reads no data at all. Why would that be?

To query all of the above variable run

BASH

sacct --jobs=310002 --format=MaxRSS,AveRSS,MaxPages,AvePages,AllocCPUS,Elapsed,MaxDiskRead,MaxDiskWrite,ConsumedEnergy,AveCPUFreq

OUTPUT

    MaxRSS     AveRSS MaxPages   AvePages  AllocCPUS    Elapsed  MaxDiskRead MaxDiskWrite ConsumedEnergy AveCPUFreq
---------- ---------- -------- ---------- ---------- ---------- ------------ ------------ -------------- ----------
                                                   4   00:01:58                                        0
    51556K     51556K      132        132          4   00:01:58        6.91M        0.72M              0         3M
         0          0        0          0          4   00:01:58        0.01M        0.00M              0         3M

Although the program we have run generates an image and writes it to a file, there is also a non-zero amount of data read. The writing part is associated with the image file the program writes. The reading part is not associated with anything that the program does, as it doesn't read anything from disk. It is instead associated with the fact that the operating system has to read the program itself and its dependencies to execute it.

  • AllocCPUS: number of CPU cores we requested for the job
  • MaxRSS = AveRSS: low fluctuation in memory, data is held throughout the whole job
  • MaxPages & AvePages: number of pages loaded into memory
  • MaxDiskRead: Data read from disk by the application, but also to start the application.

Shortcomings


While seff and sacct provide a lot of information it is still incomplete. For example, the information is accumulated for the entire calculation. Variations in the metrics as a function of time throughout the job are not available. Communication between different MPI processes is not recorded. The collection of the energy consumption depends on the hardware and system configuration at the HPC center and might not be available. We are also often missing reliable measurements for I/O via the interconnect between nodes and the parallel file system.

So while we might be able to glean some indications for different types of performance problems, for a proper analysis more detailed information is needed.

Summary


This episode introduced the SLURM tools seff and sacct to get a high level perspective on a job’s performance. As these tools just use the statistics that SLURM collected on a job as it ran, they can always be used without any special preparation.

Challenge

Challenge

So far we have considered our initial calculation using 4 cores. To run the calculation faster we could consider using more cores. Run the same calculation on 8, 16, and 32 cores as well. Collect and compare the results from sacct and see how the job performance changes.

The machine these calculations have been run on has 112 cores per node. So we can double the number of cores from 4 until 64 and stay within one node. If we go to two nodes, then some of the communication between tasks will have to go across the interconnect. At that point the performance characteristics might change in a discontinuous manner. Hence we try to avoid doing that.

Alternatively you might scale the calculation across multiple nodes, for example 2, 4, 8, 16 nodes. With 112 cores per node you would have to make sure that the calculation is large enough for such a large number of cores to make sense.

Create running_snowmen.sh with

#!/usr/bin/bash
for nn in 4 8 16 32; do
    id=$(sbatch --parsable --time=00:12:00 --nodes=1 --tasks-per-node=$nn --ntasks-per-core=1 render_snowman.sh)
    echo "ntasks $nn jobid $id"
done

Create render_snowman.sh with

#!/usr/bin/bash

# Possibly a "module load ..." command to load required libraries
# Depends on your particular HPC system

export START=$(pwd)
# Create a sub-directory for this job if it doesn't exist already
mkdir -p $START/test.$SLURM_NTASKS
cd $START/test.$SLURM_NTASKS
# The -spp flag ensures we have enough samples per pixel such that the job
# on 32 cores takes longer than 30s. Slurm by default is configured such
# that job data is collected every 30s. If the job finishes in less than
# that Slurm might fail to collect some of the data about the job.
mpirun -np $SLURM_NTASKS raytracer -width=800 -height=800 -spp 1024 -threads=1 -alloc_mode=3 -png=rendered_snowman.png

Next we submit this whole set of calculations

BASH

./running_snowmen.sh

producing

OUTPUT

ntasks 4 jobid 349291
ntasks 8 jobid 349292
ntasks 16 jobid 349293
ntasks 32 jobid 349294

After the jobs are completed we can run

BASH

sacct --jobs=349291,349292,349293,349294 \
      --format=MaxRSS,AveRSS,MaxPages,AvePages,AllocCPUS,Elapsed,MaxDiskRead,MaxDiskWrite,ConsumedEnergy,AveCPUFreq

to produce

OUTPUT

    MaxRSS     AveRSS MaxPages   AvePages  AllocCPUS    Elapsed  MaxDiskRead MaxDiskWrite ConsumedEnergy AveCPUFreq
---------- ---------- -------- ---------- ---------- ---------- ------------ ------------ -------------- ----------
                                                   4   00:09:35                                        0
   142676K    142676K        1          1          4   00:09:35        7.75M        0.72M              0       743K
         0          0        0          0          4   00:09:35        0.01M        0.00M              0      2.61M
                                                   8   00:05:01                                        0
   289024K    289024K        0          0          8   00:05:01       10.15M        1.45M              0       960K
         0          0        0          0          8   00:05:02        0.01M        0.00M              0      2.42M
                                                  16   00:02:21                                        0
   563972K    563972K       93         93         16   00:02:21       15.00M        2.94M              0      1.03M
         0          0        0          0         16   00:02:21        0.01M        0.00M              0      2.99M
                                                  32   00:01:14                                        0
  1082540K   1082540K      260        260         32   00:01:14       24.83M        6.07M              0      1.08M
         0          0        0          0         32   00:01:14        0.01M        0.00M              0         3M

Note that the elapsed time goes down as the number of cores increases, which is reasonable, as more cores can normally get the job done quicker. The amount of data read also increases, as every MPI rank has to read the executable and all associated shared libraries. The volume of data written is harder to understand. Every run produces an image file rendered_snowman.png that is about 100KB in size. This file is written by the root MPI rank only. This cannot explain the increase in data written with increasing numbers of cores. The increasing number of page faults with increasing numbers of cores suggests that paging memory to disk is responsible for the majority of data written.

Key Points
  • Schedulers provide tools for a high level view on our jobs, e.g. sacct and seff
  • Important basic performance metrics we can gather this way are:
    • CPU utilization, often expressed as the fraction of time the CPU was active relative to the elapsed time of the job
    • Memory utilization, often measured as Resident Set Size (RSS) and number of Pages
  • sacct can also provide metrics about disk I/O and energy consumption
  • Metrics through sacct are accumulated for the whole job runtime and may be too broad for more specific insight

Content from Scaling Study


Last updated on 2025-12-12 | Edit this page

Estimated time: 70 minutes

Overview

Questions

  • How many resources should be requested for a given job?
  • How does our application behave at different scales?

Objectives

After completing this episode, participants should be able to …

  • Perform a scaling study for a given application.
  • Notice different perspectives on scaling parameters.
  • Identify good working points for the job configuration.

Narrative:

  • We panic, maybe we need more resources to meet the deadline with our title picture!
  • Requesting resources with bigger systems requires a project proposal with an estimate of the resource demand

The deadline is approaching way too fast and we may not finish our project in time. Maybe requesting more resources from our clusters scheduler does the trick? How could we know if it helps and by how much?

What is Scaling?


The execution time of parallel applications changes with the number of parallel processes or threads. In a scaling study we measure how much the execution time changes by scanning a reasonable range of process counts. In a common phrasing, this approach answers how the execution time scales with the number of parallel processors.

Starting from the job script render_snowman.sbatch:

BASH

#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=200MB

# The `module load` command you had to load for building the raytracer
module load 2025 GCC/13.2.0 OpenMPI/4.1.6 buildenv/default Boost/1.83.0 CMake/3.27.6 libpng/1.6.40

time mpirun -- ./raytracer -width=800 -height=800 -spp=128 -png "$(date +%Y-%m-%d_%H%M%S).png"

we can manually run such a scaling study by submitting multiple jobs. In OpenMPI versions 4 and 5 the number of Slurm tasks is automatically picked up, so we do not set -n or -np of mpirun. We use -- to separate the arguments of mpirun – none in this case – from the MPI application raytracer and its arguments. Otherwise you may experience errors in some versions of OpenMPI 5, where mpirun misinterprets the arguments of raytracer as its own.

Callout

Scaling other resources with number of CPU cores

When scaling the resources outside of the job script, e.g. with sbatch --ntasks=X ..., as done in the following measurements, we make sure to scale other resource requirements with the number of parallel processors. In this case, --mem-per-cpu=200MB is necessary, since --mem would result in a fixed memory limit, independent of the number of processes.

For example, if each MPI process needs \(100\,\)MB, requesting \(2\,\)GB would only be enough for up to 20 MPI processes.

Forgetting a limit like this is a common pitfall in this situation.

Let’s start some measurements with \(1\), \(2\), \(4\), and \(8\) tasks:

You may need to reserve a set of resources for the course, such that enough resources for the following exercises are available. This is especially important for --exclusive access.

In that case, show how to use --reservation=reservationname to submit jobs.

It may be a good idea to point out the particular hardware of your cluster / partition to emphasize how many cores are available on a single node and when the scaling study goes beyond a single node.

OUTPUT

$ sbatch --ntasks 1 render_snowman.sbatch
Submitted batch job 16142767
$ sbatch --ntasks 2 render_snowman.sbatch
Submitted batch job 16142768
$ sbatch --ntasks 4 render_snowman.sbatch
Submitted batch job 16142769
$ sbatch --ntasks 8 render_snowman.sbatch
Submitted batch job 16142770

Now we have to wait until all four jobs are finished.

Callout

Regular update of squeue

You can use squeue --me -i 30 to get an update of all of your jobs every 30 seconds.

If you don't need a more frequent update, it is good practice to keep the interval on the order of 30 seconds to a couple of minutes, just to be nice to Slurm's server resources.

Once the jobs are finished, we can use grep to get the wall clock time of all four jobs:
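
Assuming the default slurm-<jobid>.out output file names, the command could look like this:

BASH

grep "real" slurm-*.out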

OUTPUT

slurm-16142767.out:real       2m7.218s
slurm-16142768.out:real       1m7.443s
slurm-16142769.out:real       0m32.584s
slurm-16142770.out:real       0m17.480s

The real time decreases significantly each time we double the number of Slurm tasks. From this, we feel that doubling the number of CPU cores really is a winning strategy!

Challenge

Exercise: Continue scaling study to larger values

Run the same scaling study and continue it for even larger numbers of --ntasks, e.g. 16, 32, 64, 128. So far, we have been using --nodes=1 to stay on a single node. At which point are your MPI processes distributed across more than one node? Use Slurm command line tools to find out how many CPU cores (MPI processes) are available on a single node. You may have to increase the number of nodes with --nodes if you want to go beyond that limit.

Gather your real time results and place them in a .csv file. Here is an example for our previous measurements:

ntasks,time
1,127.218
2,67.443
4,32.584
8,17.480
...

How much does each doubling of the CPU resources help with running the parallel raytracer?

You can use sinfo to find out the node names of your particular Slurm partition. Then use scontrol to show all details about a single node from that partition. It will show you the number of CPUs (cores) available on that node.
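
A minimal sketch of this lookup (the partition and node names are placeholders):

BASH

# Node names and CPU counts per node in a partition
sinfo -p <partitionname> -o "%N %c"

# Full details of a single node, including the total CPU count (CPUTot)
scontrol show node <nodename> | grep CPUTot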

ntasks,time
1,127.218
2,67.443
4,32.584
8,17.480
16,10.251
32,7.257
64,8.044
128,8.575

Using grep "real" slurm-*.out, we can see that the execution time is initially halved with each doubling of the CPU cores. However, somewhere between \(8\) and \(16\) cores, we start to see less and less improvement.

Adding more resources does not help indefinitely. At some point the overhead of managing the calculation in separate tasks outweighs the benefit of parallel calculation. There is too little to do in each task and the overhead starts to dominate.

At some point adding more CPU cores does not help us anymore.

info dump below in this section

Maybe be more specific about which overheads and how we can see them?

Adding more CPU cores can actively slow down the calculation after a certain point. The optimal point is different for each application and each configuration. It depends on the ratio between calculation, communication, and various management overheads in the whole process of running everything.

Callout

Overheads and Reliable Measurements

Many overheads, and the point at which they show up, also depend on the underlying hardware. So the sweet spot may very well differ between clusters, even if the application and configuration stay the same!

Another common issue lies within the measurements themselves. We perform a single time measurement on a worker node that is possibly shared with other jobs at the same time. What if another user runs an application that hogs shared resources like the local disk or the network interface card? In this case our measurements become somewhat non-deterministic. Running the same measurement twice may result in significantly different values. If you need reliable results, e.g. for a publication, requesting exclusive access to nodes through the sbatch flag --exclusive is the best approach. As a drawback, this typically results in longer waiting times, since whole nodes have to be reserved for the measurement jobs, even if not all resources are used.

Even on exclusive resources, the measurements cannot be 100% reliable. For example, the scheduling behavior of the Linux kernel, or access to remote resources like the parallel file system or data from the web, still affect your measurements in unpredictable ways. Therefore, the best results are achieved by taking the mean and standard deviation of repeated measurements for the same configuration. The measured minimum also has strong informative value, since it represents the best observed behavior.

Keep in mind that --exclusive will always request all resources of a given node, even if only a few cores are used. In that case, tools like seff will show worse resource utilization, since the measurements are made with respect to all allocated resources.
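
A sketch of such repeated, exclusive measurements for a single configuration (five repetitions are an arbitrary choice):

BASH

# Repeat the same measurement several times to estimate mean and spread
for i in 1 2 3 4 5; do
    sbatch --exclusive --ntasks=8 render_snowman.sbatch
done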

Scaling studies can be done with respect to different application and job parameters. For example, what is the execution time when we change the workload, e.g. a larger number of pixels, more samples per pixel, or a more complex scene? How much does the communication overhead change if we change the number of involved nodes while keeping the workload and number of tasks fixed, i.e. changing the network communication surface? Scaling studies like these can help identify pressure points that affect the application's performance.

Scaling studies typically occur in a preparation phase, where the application is evaluated with a representative example workload. Once a good configuration is found, we know the application is running close to its optimal performance, and the larger set of calculations, often called the production phase, can start.

In a similar vein, scaling studies can be a formal requirement for compute time applications on larger HPC systems. On these systems and for larger calculation campaigns it is more crucial to run efficient calculations, since the resources are typically more contested and the potential energy and carbon footprint becomes much larger.

Speedup, Efficiency, and Strong Scaling


To quantitatively and empirically study the scaling behavior of a given application, it is common to look at the speedup and efficiency with respect to adding more parallel processors.

Speedup is a metric to compare the execution times with different amounts of resources. It answers the question

How much faster is the application with \(N\) parallel processes/threads, compared to the serial execution with \(1\) process/thread?

It is defined by the comparison of wall times \(T(N)\) of the application with \(N\) parallel processes: \[S(N) = \frac{T(1)}{T(N)}\] Here, \(T(1)\) is the wall time for a sequential execution, and \(T(N)\) is the wall time for the execution with \(N\) parallel processes. For \(2\) processes, we observe a speedup of \(S(2) = \frac{127.218}{67.443} \approx 1.89\).

Efficiency in this context is defined as \[\eta(N) = \frac{S(N)}{N}\] with speedup \(S(N)\), and describes how much the speedup with \(N\) parallel processes deviates from the theoretical linear optimum.

Challenge

Exercise: Calculate Speedup and Efficiency

Extend the .csv file of your measurements from above with a speedup and efficiency column. It may look like this:

ntasks,time,speedup,efficiency
1,127.218,1.00,1.00
2,67.443,1.89,0.94
4,32.584,3.90,0.98
8,17.480,7.28,0.91
...

You may want to use any data visualization tool, e.g. python or spreadsheets, to visualize the data.

What number of processes may be a good working point for the raytracer with \(800 \times 800\) pixels and \(128\) samples per pixel?

For all of our measurements, the speedup and efficiencies are

ntasks,time,speedup,efficiency
1,127.218,1.00,1.00
2,67.443,1.89,0.94
4,32.584,3.90,0.98
8,17.480,7.28,0.91
16,10.251,12.41,0.78
32,7.257,17.53,0.55
64,8.044,15.82,0.25
128,8.575,14.84,0.12
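
One way to compute the two extra columns is a small awk command (a sketch; it assumes your measurements are stored in a file called times.csv with the ntasks,time layout shown above):

BASH

# Append speedup (T(1)/T(N)) and efficiency (speedup/ntasks) columns
awk -F, 'NR == 1 { print $0 ",speedup,efficiency"; next }
         NR == 2 { t1 = $2 }
         { s = t1 / $2; printf "%s,%.2f,%.2f\n", $0, s, s / $1 }' times.csv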

Plotting the speedup and efficiency helps with identifying a good working point:

Speedup and efficiency of strong scaling example

Running on \(16\) processes is still close to 80% efficient. The corresponding speedup is less than the theoretical optimum, which is visualized by a red line of slope \(1\).

There is no exact optimum and the best working point is open for discussion. However, it would be difficult to justify additional cores if their contribution to the speedup is only 50% efficient or even less.

If you have experience with python, you can use our python script to create the same plots as above, but for your own data. It depends on numpy, pandas, and matplotlib, so make sure to prepare a corresponding python environment.

The script expects your .csv files to be called strong.csv and weak.csv, and be placed in the same directory.

So far, we kept the workload size fixed at \(800 \times 800\) pixels and \(128\) samples per pixel for the same scene with three snowmen. The diminishing returns of adding more and more parallel processors lead to a famous observation. The speedup of a program through parallelization is limited by the execution time of the serial fraction that is not parallelizable. No application is 100% parallelizable, so adding an arbitrary number of parallel processors can only affect the parallelizable section. In the best case, the execution time gets reduced to the serial fraction of the application.

An application is said to scale strongly if adding more cores significantly reduces the execution time.

Callout

Amdahl's Law1

The speedup of a program through parallelization is limited by the execution time of the serial fraction that is not parallelizable. For a given execution time \(T(N) = s + \frac{p}{N}\), with \(s\) the serial fraction and \(p\) the parallelizable fraction of the runtime, normalized such that \(s + p = 1\), the speedup \(S\) is defined as \[S(N) = \frac{s+p}{s+\frac{p}{N}} = \frac{1}{s + \frac{p}{N}} \Rightarrow \lim_{N\rightarrow \infty} S(N) = \frac{1}{s}\]
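
As a quick illustration (numbers chosen arbitrarily): if \(s = 0.05\), i.e. only \(5\,\%\) of the runtime is serial, the speedup can never exceed \(\frac{1}{0.05} = 20\), no matter how many processors we add.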

Discussion

Discussion: When should we stop adding CPU cores?

Discuss your previous results and decide on a good working point. How many cores still usefully reduce the execution time?

What other factors could affect your decision, e.g. the available hardware and the corresponding waiting times?

If scaling is limited, why are there larger HPC systems? Weak scaling.


For a fixed problem size, we observed that adding more parallel processors can only help up to a certain point. But what if the project benefits from increasing the workload size? Does a higher resolution, more accuracy, or more statistics, etc., improve our insights and results? If that is the case, the perspective on the issue changes and adding more parallel processors can become more feasible as well. For our raytracer example, increasing the workload corresponds to more pixels, more samples per pixel, and/or a more complex scene.

Weak scaling refers to the scaling behavior of an application for a fixed workload per parallel processing unit, e.g. increasing the number of pixels by the same factor as the number of parallel processors \(N\).

Callout

Gustafson's Law2

A program scales on \(N\) parallel processors if the problem size also scales with the number of processors. With \(s\) the serial fraction and \(p\) the parallel fraction of the runtime (again normalized such that \(s + p = 1\)), the speedup \(S\) becomes \[S(N) = \frac{s+pN}{s+p} = s+pN = N+s(1-N)\]
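
Continuing the illustration from above, with a serial fraction of \(s = 0.05\) and \(N = 100\) processors, the scaled speedup becomes \(S(100) = 100 + 0.05\,(1 - 100) \approx 95\), i.e. the larger workload keeps almost all processors usefully busy.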

To scale the workload of the snowman raytracer, we can increase the number of calculated pixels by the same factor with which we increase the number of parallel processors. For one processor we have \(800 \times 800 = 640000\) pixels. That means for two processors we need a height and a width of \(\sqrt{2 \times 640000} \approx 1131.37 \approx 1131\) pixels, and similarly an increased number of pixels for --ntasks=4 and so on.

The job script could look like this:

BASH

#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=3800MB

module load 2025 GCC/13.2.0 OpenMPI/4.1.6 buildenv/default Boost/1.83.0 CMake/3.27.6 libpng/1.6.40

# Create associative array
declare -A pixel
pixel[1]="800"
pixel[2]="1131"
pixel[4]="1600"
pixel[8]="2263"
pixel[16]="3200"
pixel[32]="4526"
pixel[64]="6400"

time mpirun -- ./build/raytracer -width=${pixel[${SLURM_NTASKS}]} -height=${pixel[${SLURM_NTASKS}]} -spp=128 -threads=1 -png "$(date +%Y-%m-%d_%H%M%S).png"

Alternatively, to scale the workload of the snowman raytracer, we can multiply the samples per pixel (starting from -spp=128) by the number of parallel MPI processes, ${SLURM_NTASKS}. For a single process, the whole \(800 \times 800\) pixel picture is calculated in a single MPI process with 128 samples per pixel. Running with two MPI processes, each has to calculate half the number of pixels, but with twice the number of samples per pixel.

BASH

#!/usr/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --mem-per-cpu=500MB

module load 2025 GCC/13.2.0 OpenMPI/4.1.6 buildenv/default Boost/1.83.0 CMake/3.27.6 libpng/1.6.40

SPP=$(( SLURM_NTASKS * 128 ))

time mpirun -- ./build/raytracer -width=800 -height=800 -spp=${SPP} -threads=1 -png "$(date +%Y-%m-%d_%H%M%S).png"

Three snowmen in 800x800 with 128 samples per pixel
Three snowmen in 800x800 with 8192 samples per pixel

In direct comparison, and zooming in really close, you can see more noise in the first image, e.g. in the shadows. One could argue that we have passed the point of diminishing returns, though. Is a \(64\times\) increase in computational cost worth the observed quality improvement? For the samples per pixel, we do not seem to benefit much from weak scaling. A larger resolution, i.e. increasing the number of pixels, is the more useful dimension to scale in this case.

Increasing the resolution may be worth the effort if we have a use for a larger, more detailed picture. In practice, there is a cutoff beyond which no reasonable improvement is to be expected. This is a question about accuracy, error margins, and overall quality, which can only be answered in the specific context of each research project. If increasing the workload brings no real improvement, running a weakly scaling application at larger scale is really just wasting valuable computational time and energy.

If we increase the workload at the same rate as our number of parallel processes \(N\), our speedup is defined as \[S_{\text{weak}}(N) = \frac{T(1)}{T(N)} \times N\] since we do \(N\) times more work with \(N\) processors, compared to our reference \(T(N=1)\). Efficiency is still defined as \[\eta_{\text{weak}}(N) = \frac{S_{\text{weak}}(N)}{N} = \frac{T(1)}{T(N)}\]

Challenge

Exercise: Weak scaling

Repeat the previous scaling study and increase the number of pixels accordingly to study the raytracer's weak scaling behavior.

  • Run with 1, 2, 4, 8, 16, 32, 64 MPI processes on a single node
  • Take time measurements and consider running with --exclusive to ensure more reliable results.
  • Create a .csv file and run the plotting script
ntasks,pixel,time,speedup,efficiency
1,800,123.162
2,1131,122.562
4,1600,124.522
8,2263,125.606
...

How well does the application scale with an increasing workload size? Do you see a qualitative difference in the resulting .png files and is the increased sample-per-pixel size worth the computational costs?

ntasks,pixel,time,speedup,efficiency
1,800,123.162
2,1131,122.562
4,1600,124.522
8,2263,125.606
16,3200,125.803
32,4526,130.137
64,6400,138.636

The scaling behavior approaches an asymptotic limit, where each additional processor contributes with roughly the same efficiency to the increased workload.

Speedup and efficiency of weak scaling example

Weakly scaling jobs can make efficient use of a huge amount of resources.

The most important question is whether an increased workload produces useful results. Here, we have the rendered picture of three snowmen in 800x800 with 128 samples per pixel and three snowmen in 6400x6400 with 128 samples per pixel. The second image has a much higher resolution. However, going way beyond \(6400 \times 6400\) pixels is probably not very meaningful, unless you are trying to print the world's largest billboards or similar.

Summary


In this episode, we have seen that we can study the scaling behavior of our application with respect to different metrics, while varying its configuration. Most commonly, we study the execution time of an application with an increasing number of parallel processors. In such a scaling study, we collect comparable walltime measurements for an increasing number of Slurm tasks of a parallelizable and representative job. If a good working point is found, larger scale “production” jobs can be submitted to the HPC system.

If the application has good strong scaling behavior, adding more cores leads to an effective improvement in execution time. We observe diminishing returns of adding more cores to a fixed-size problem, so there is a (subjective) optimal number of parallel processors for a given application configuration. (Amdahl's Law)

If increasing the workload size leads to better results, maybe because of improved accuracy and quality, we can study the weak scaling behavior and increase the workload size by the same factor as the number of parallel processors.

A good working point depends on the availability of resources, specifics of the underlying hardware, the particular application, and a particular configuration for the application. For that reason, scaling studies are a common requirement for formal compute time applications, to demonstrate that a given application will run efficiently.

We can study the impact of any parameter on metrics like, for example, walltime, CPU utilization, FLOPS, memory utilization, communication, output size on disk, and so on.

If you find yourself repeating similar measurements over and over again, you may be interested in an automation approach. This can be done by creating reproducible HPC workflows using JUBE, among other tools.

Up to now, we were still working with basic metrics like the wall-clock time. In the next episode, we start with more in-depth measurements of many other aspects of our job and application.

Key Points
  • Jobs behave differently with increasing parallel resources and fixed or scaling workloads
  • Scaling studies can help to quantitatively grasp this changing behavior
  • Good working points are defined by configurations where more cores still provide sufficient speedup or improve quality through increasing workloads
  • Amdahl’s law: speedup is limited by the serial fraction of a program
  • Gustafson’s law: more resources for parallel processing still help, if larger workloads can meaningfully contribute to project results

  1. G. M. Amdahl, ‘Validity of the single processor approach to achieving large scale computing capabilities’, in Proceedings of the April 18-20, 1967, spring joint computer conference, in AFIPS ’67 (Spring). New York, NY, USA: Association for Computing Machinery, Apr. 1967, pp. 483–485. doi: 10.1145/1465482.1465560.↩︎

  2. J. L. Gustafson, ‘Reevaluating Amdahl’s law’, Commun. ACM, vol. 31, no. 5, pp. 532–533, May 1988, doi: 10.1145/42411.42415.↩︎

Content from Performance Overview


Last updated on 2025-11-11 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • Is it enough to look at a job's walltime?
  • What steps can I take to evaluate a job's performance?
  • What popular types of reports exist?

Objectives

After completing this episode, participants should be able to …

  • Explain different approaches to performance measurements.
  • Understand common terms and concepts in performance analyses.
  • Create a performance report through a third-party tool.
  • Describe what a performance report is meant for (establish baseline, documentation of issues and improvements through optimization, publication of results, finding the next thread to pull in a quest for optimization)
  • Measure the performance of central components of underlying hardware (CPU, Memory, I/O, …) (split episode?)
  • Identify which general areas of computer hardware may affect performance.

Narrative:

  • Scaling study, scheduler tools, project proposal is written and handed in
  • Maybe I can squeeze out more from my current system by trying to understand better how it behaves
  • Another colleague told us about performance measurement tools
  • We are learning more about our application
  • Aha, there IS room to optimize! Compile with vectorization

What we’re doing here:

  • Get a complete picture
  • Introduce missing metrics / definitions, and popular representations of data, e.g. Roofline
  • Relate to hardware on the same level of detail

Workflow


  • Previously checked scaling behavior by looking at walltime
  • What if we counted other things while our job is running? This could be
    • CPU utilization
    • FLOPS
    • Memory utilization
  • Two possible ways to look at this data with respect to time:
    1. tracing: over time
    2. sampling: accumulated results at the end
  • Third-party tools to measure these things - you can use them with your jobs

We go with three alternatives here; pick one and stick to it throughout your course, but highlight that there are alternatives and that learners may not have access to certain tools on every cluster.

Callout

Here you can choose between three alternative perspectives on our job:

  1. ClusterCockpit: A job monitoring service available on many of our clusters. Needs to be centrally maintained by your HPC administration team.
  2. Linaro Forge Performance Reports: A commercial application providing a single page performance overview of your job. Your cluster may have licenses available.
  3. TBD: A free, open source tool/set of tools, to get a general performance overview of your job.

Access to performance counters may require special permissions or an --exclusive allocation; this depends on the system. Check the documentation and talk to your administrators / support. Relevant settings include (a quick check is sketched below):

  • File capabilities on the measurement tool, e.g. cap_perfmon,cap_sys_ptrace,cap_syslog=ep
  • The kernel setting kernel.perf_event_paranoid
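A minimal sketch for checking these settings yourself, assuming a Linux node where you can read sysctl values and perf is in your PATH; interpretations vary between distributions, so treat the output as a starting point for a conversation with your administrators.

BASH

# Kernel restriction level for performance events
# (lower values are less restrictive; many tools need 2 or lower)
sysctl kernel.perf_event_paranoid

# File capabilities granted to the perf binary, if any
getcap "$(command -v perf)"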

Live coding:

  • Set up the main tool. How do I access it? How can I use it with my job?
  • Run snowman with 8 cores

General report


How Does Performance Relate to Hardware?


Introduce hardware on the same level of detail and with the same terms as the performance reports by ClusterCockpit, Linaro Forge, etc., as soon as they appear. Only introduce what we need, to avoid an info dump. But point to additional information that gives a complete overview -> hpc-wiki!

(Following this structure throughout the course, trying to understand the performance in these terms)

Broad dimensions of performance:

  • CPU (Front- and Backend, FLOPS)
    • Frontend: decoding instructions, branch prediction, pipeline
    • Backend: getting data from memory, cache hierarchy & alignment
    • Raw calculations
    • Vectorization
    • Out-of-order execution
  • Accelerators (e.g. GPUs)
    • More calculations
    • Offloading
    • Memory & communication models
  • Memory (data hierarchy)
    • Working memory, reading data from/to disk
    • Bandwidth of data
  • I/O (broader data hierarchy: disk, network)
    • Stored data
    • Local disk (caching)
    • Parallel fs (cluster-wide)
    • MPI communication
  • Parallel timeline (synchronization, etc.)
    • Application logic

Maybe we should either focus on components (CPUs, memory, disk, accelerators, network cards) or functional entities (compute, data hierarchy, bandwidth, latency, parallel timelines)

We shouldn’t go into too much detail here. Define broad categories where performance can be good or bad. (calculations, data transfers, application logic, research objective (is the calculation meaningful?))

Reuse categories in the same order and fashion throughout the course, i.e. point out in what area a discovered inefficiency occurs.

Introduce detail about hardware later where it is needed, e.g. NUMA for pinning and hints.

Hardware
Challenge

Exercise: Match application behavior to hardware

Which part of the computer hardware may become an issue for the following application patterns:

  1. Calculating matrix multiplications
  2. Reading data from processes on other computers
  3. Calling many different functions from many equally likely if/else branches
  4. Writing very large files (TB)
  5. Comparing many different strings to see if they match
  6. Constructing a large simulation model
  7. Reading thousands of small files for each iteration

Maybe not the best questions, also missing something for accelerators.

  1. CPU (FLOPS) and/or Parallel timeline
  2. I/O (network)
  3. CPU (Front-End)
  4. I/O (disk)
  5. (?) CPU-Backend, getting strings through the cache?
  6. Memory (size)
  7. I/O (disk)

Summary


  • General reports show direction in which to continue
    • Specialized tools may be necessary to move on

Leading question: Connection to hardware is quite deep, why does it matter? -> Drill deeper, e.g. on NUMA & pinning

Key Points
  • First things first, second things second, …
  • Profiling, tracing
  • Sampling, summation
  • Different HPC centers may provide different approaches to this workflow
  • Performance reports offer more insight into the job and application behavior

Content from Pinning


Last updated on 2025-10-31 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • What is “pinning” of job resources?
  • How can pinning improve the performance?
  • How can I see if pinning resources would help?
  • What requirement hints can I give to the scheduler?

Objectives

After completing this episode, participants should be able to …

  • Define the concept of “pinning” and how it can affect job performance.
  • Name Slurm's options for memory and CPU binding.
  • Use hints to tell Slurm how to optimize their job allocation.

Narrative:

  • We get the feeling that hardware has a lot to offer, but the rabbit hole is deep!
  • What are the “dimensions” in which we can optimize the throughput of snowman pictures per hour?
  • Can we improve how the work maps to certain CPUs / Memory regions?

What we’re doing here:

  • Introduce pinning and slurm hint options
  • Relate to hardware effects
  • Use third party performance tools to observe effects!

Stick to simple options here. Put more complex options for pinning, hints, etc. into their own episode somewhere later in the course.

Pinning is an important part of job optimization, but requires some knowledge, e.g. about the hardware hierarchies in a cluster, NUMA, etc. So it should be done after we’ve introduced different performance reports and their perspective on hardware

Maybe point to JSC pinning simulator and have similar diagrams as an independent “offline” version in this course

Binding / pinning:

  • --mem-bind=[{quiet|verbose},]<type>
  • -m, --distribution={*|block|cyclic|arbitrary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
  • --hint=: Hints for CPU- (compute_bound) and memory-bound (memory_bound), but also multithread, nomultithread
  • --cpu-bind=[{quiet|verbose},]<type> (srun)
  • Mapping of application <-> job resources (a job-script sketch combining these options follows below)
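A hedged sketch of how these options might appear together in a Slurm batch script. The resource numbers are placeholders, the raytracer invocation is taken from the exercises further down, and the exact behavior of binding options can differ between Slurm versions and site configurations; adapt everything to your own cluster.

BASH

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=2
#SBATCH --hint=nomultithread      # avoid placing tasks on hardware threads

# Bind each task to physical cores and report the resulting binding
srun --cpus-per-task=2 --cpu-bind=verbose,cores ./raytracer -width=512 -height=512 -spp=128 -threads=2 -alloc_mode=3 -png=snowman.png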

Motivation


Challenge

Exercise

Case 1: 1 thread per rank

BASH

mpirun -n 8 ./raytracer -width=512 -height=512 -spp=128 -threads=1 -alloc_mode=3 -png=snowman.png

Case 2: 2 threads per rank

BASH

mpirun -n 8 ./raytracer -width=512 -height=512 -spp=128 -threads=2 -alloc_mode=3 -png=snowman.png

Questions:

  • Do you notice any difference in runtime between the two cases?
  • Is the increase in threads providing a speedup as expected?

  • Observation: The computation times are almost the same.
  • Expected behavior: Increasing threads should ideally reduce runtime.
  • Hypothesis: Additional threads do not contribute.

How to investigate?


You can verify the actual core usage in two ways (both are sketched below):

  1. Use --report-bindings with mpirun
  2. Use the htop command on the compute node

Getting a shell on the compute node is cluster specific. It can usually be done in one of two ways:

  1. srun --pty --overlap --jobid=<jobid> /bin/bash
  2. Check on which node the job runs and log in to that node via SSH
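A hedged sketch of both routes, assuming Open MPI's mpirun and a Slurm setup that permits overlapping job steps; replace <jobid> with your actual job ID.

BASH

# Route 1: let Open MPI report where each rank and thread is bound
mpirun -n 8 --report-bindings ./raytracer -width=512 -height=512 -spp=128 -threads=2 -alloc_mode=3 -png=snowman.png

# Route 2: attach a shell to the running job and watch core usage live
squeue --me                                # find the job ID and node
srun --pty --overlap --jobid=<jobid> /bin/bash
htop                                       # observe how many cores are busy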

Challenge

Exercise

Follow any one of the options above and run again with 2 threads per rank:

BASH

mpirun -n 8 ./raytracer -width=512 -height=512 -spp=128 -threads=2 -alloc_mode=3 -png=snowman.png

Questions:

  • Did you find any justification for the hypothesis we made?

Only 8 cores are active instead of 16

Explanation:

  • Even though we requested 2 threads per MPI rank, both threads are pinned to the same core.
  • The second thread waits for the first thread to finish, so no actual thread-level parallelization is achieved.
  • Current behavior: the two threads overlap on the same core.
  • Expected behavior: the threads are pinned to separate cores.

How to achieve this?


Exercise: Understanding Process and Thread Binding


Pinning (or binding) means locking a process or thread to a specific hardware resource such as a CPU core, socket, or NUMA region. Without pinning, the operating system may move tasks between cores, which can reduce cache reuse and increase memory latency, directly diminishing performance.

In this exercise we will explore how MPI process and thread binding works. We will try binding to core, socket, and NUMA regions, and observe the resulting timings and bindings.

  • This exercise assumes the following hardware setup:
    • Dual-socket system (2 sockets, 48 cores per socket, 8 NUMA regions, 96 cores total).
    • Each MPI process can use multiple threads (-threads) for parallel execution.
  • The idea is to demonstrate oversubscription by giving more MPI processes than available sockets or NUMA regions, or by over-allocating threads per domain.
  • You are free to adjust -n and -threads based on your cluster; a quick topology check is sketched below.
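To compare the assumed layout above with your own nodes, here is a minimal sketch, assuming lscpu and numactl are installed on the compute node.

BASH

# Sockets, cores per socket, and NUMA regions of this machine
lscpu | grep -E 'Socket|Core|NUMA'

# Which CPU IDs belong to which NUMA region
numactl --hardware
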
Challenge

Exercise

Case 1: --bind-to numa

BASH

mpirun -n 8 --bind-to numa ./raytracer -width=512 -height=512 -spp=128 -threads=12 -alloc_mode=3 -png=snowman.png

Case 2: --bind-to socket

BASH

mpirun -n 4 --bind-to socket ./raytracer -width=512 -height=512 -spp=128 -threads=48 -alloc_mode=3 -png=snowman.png

Questions:

  • What is the difference between Case 1 and Case 2? Do you notice any difference in performance? How many workers are used?
  • How could you adjust process/thread counts to better utilize the hardware in Case 2?

  • MPI and thread pinning is hardware-aware.
  • If the number of processes matches the number of domains (socket or NUMA), then the number of threads should equal the cores per domain to fully utilize the node.
  • No speedup in Case 2: Oversubscription occurs because we requested 4 processes on a system with only 2 sockets.
  • Threads compete for the same cores → OpenMPI queues threads and waits until other processes finish.

Best Practices for MPI Process and Thread Pinning


Difference between Binding and Mapping

Mapping is about distributing MPI ranks across the hardware hierarchy; it determines where your processes will be placed.

Binding is locking your MPI processes/threads to a specific resource, which prevents them from being moved from one resource to another.

Mapping vs. Binding Analogy


Think of running MPI processes and threads like booking seats for a group of friends:

  • Mapping is like planning where your group will sit in the theatre or on a flight.
    • Example: You decide some friends sit in Economy, some in Premium Economy, and some in Business.
    • Similarly, --map-by distributes MPI ranks across nodes, sockets, or NUMA regions.
  • Binding is like reserving the exact seats for each friend in the planned area.
    • Example: Once the seating area is chosen, you assign specific seat numbers to each friend.
    • Similarly, --bind-to pins each MPI process or thread to a specific core or hardware unit to avoid movement.

This analogy helps illustrate why mapping defines placement and binding enforces it.

We will use --bind-to core (the smallest hardware unit) and --map-by to distribute MPI processes efficiently across sockets, NUMA regions, or nodes.

Choosing the Smallest Hardware Unit

Binding processes to the smallest unit (core) is recommended because:

  1. Exclusive use of resources
    Each process or thread is pinned to its own core, preventing multiple threads or processes from competing for the same CPU.

  2. Predictable performance
    When processes share cores, execution times can fluctuate due to scheduling conflicts. Binding to cores ensures consistent timing across runs.

  • Best practice: Always bind processes to the smallest unit (core) and spread processes evenly across the available hardware using --map-by (a combined example is sketched after this list).
  • Example options:
    • --bind-to core → binds each process to a dedicated core (avoids oversubscription).
    • --map-by socket:PE=<threads> → distributes processes across sockets, assigning <threads> cores (processing elements) per process.
    • --map-by numa:PE=<threads> → distributes processes across NUMA domains, assigning <threads> cores per process.
    • --cpus-per-rank <n> → assigns <n> cores (hardware threads) to each MPI rank, ensuring that all threads within a rank occupy separate cores.
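A hedged sketch for the dual-socket example node described earlier (2 x 48 cores, 8 NUMA regions); the exact rank, PE, and thread counts are assumptions to be adapted to your own hardware and Open MPI version.

BASH

# 8 ranks, one per NUMA region, each bound to 12 dedicated cores
mpirun -n 8 --bind-to core --map-by numa:PE=12 --report-bindings ./raytracer -width=512 -height=512 -spp=128 -threads=12 -alloc_mode=3 -png=snowman.png
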
Challenge

Exercise

Use the best practices given above for Case 1 (-n 8, -threads=1) and Case 2 (-n 8, -threads=4), and answer the following questions. Possible command lines are sketched below.
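One possible realization of the two cases, hedged because the chosen mapping granularity (per core for Case 1, per NUMA region for Case 2) is an assumption and may need adapting to your node layout.

BASH

# Case 1: 8 ranks, 1 thread each, every rank on its own core
mpirun -n 8 --bind-to core --map-by core --report-bindings ./raytracer -width=512 -height=512 -spp=128 -threads=1 -alloc_mode=3 -png=snowman.png

# Case 2: 8 ranks, 4 threads each, 4 dedicated cores per rank
mpirun -n 8 --bind-to core --map-by numa:PE=4 --report-bindings ./raytracer -width=512 -height=512 -spp=128 -threads=4 -alloc_mode=3 -png=snowman.png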

Questions:

  • How many cores do the two jobs use?
  • Did you get more workers than you requested?
  • Did you see the expected scaling when running with 4 threads?

  • 8 and 32
  • No.
  • Yes

Summary


Key Points
  • Always check how pinning works
    Use verbose reporting (e.g., --report-bindings) to see how MPI processes and threads are mapped to cores and sockets.

  • Documentation is your friend
    For OpenMPI with mpirun, consult the manual: https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php

  • Know your hardware
    Understanding the number of sockets, cores per socket, and NUMA regions on your cluster helps you make effective binding decisions.

  • Avoid oversubscription
    Assigning more threads or processes than available cores hurts performance — it causes contention and idle waits.

  • Recommended practice for OpenMPI
    Use --bind-to core along with --map-by to control placement and threads per process to maximize throughput.

Content from How to identify a bottleneck?


Last updated on 2025-09-24 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • How can I find the bottlenecks in a given job?
  • What are common workflows to evaluate performance?
  • What are some common types of bottlenecks?

Objectives

After completing this episode, participants should be able to …

  • Choose between multiple workflows to evaluate job performance.
  • Name typical performance issues.
  • Determine if their job is affected by one of these issues.

Narrative:

  • Okay, what’s slowest with creating snowman pictures?
  • Where does our system choke?

What we’re doing here:

  • What’s a bottleneck?
  • How can we identify a bottleneck?
  • “Online” and “after the fact” workflows of performance measurements (traces, accumulated results, attached to the process live, or collected after it ran); a minimal command-line contrast is sketched after this list
  • Point to additional resources of common performance/bottleneck issues, e.g. on hpc-wiki
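As a hedged illustration of these two workflows, assuming a Slurm cluster with job accounting enabled; <pid> and <jobid> are placeholders for your own process and job IDs.

BASH

# Online: watch a running process directly on the compute node
top -p <pid>

# After the fact: query Slurm accounting for the finished job's resource usage
sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,MaxRSS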

Maybe something like this already occurred before in 4. Scaling Study, or 5. Performance Overview

How to identify a bottleneck?


Summary


Leading question: We were looking at a standard configuration with CPU, Memory, Disks, Network, so far. What about GPU applications, which are very common these days?

Key Points
  • General advice on the workflow
  • Performance reports may provide an automated summary with recommendations
  • Performance metrics can be categorized by the underlying hardware, e.g. CPU, memory, I/O, accelerators.
  • Bottlenecks can show up directly, as metrics saturating at the physical limits of the hardware, or indirectly, as other metrics staying far below those limits.
  • Interpreting bottlenecks is closely related to what the application is supposed to do.
  • Relative measurements (baseline vs. change)
    • system is quiescent, fixed CPU freq + affinity, warmups, …
    • Reproducibility -> link to git course?
  • Scanning results for smoking guns
  • Any best practices etc.

Content from Performance of Accelerators


Last updated on 2025-09-24 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • What are accelerators?
  • How do they affect my job's performance?
  • How can I measure accelerator utilization?

Objectives

After completing this episode, participants should be able to …

  • Understand how performance measurements on accelerators (GPUs, FPGAs) differ from those on CPUs.
  • Understand how batch systems and performance measurement tools treat accelerators.

Narrative:

  • The deadline is creeping up, only a few ways to go!
  • Hey, we have a GPU partition! Maybe this will help us speed up the process!

What we’re doing here:

  • What changes?
  • New metrics
  • Transfer to/from accelerator
  • Different options/requirements to scheduler & performance measurement tools

Introduction


Run the same example workload on the GPU and compare.
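To answer the question of how to measure accelerator utilization during such a comparison, a minimal sketch for NVIDIA GPUs, assuming nvidia-smi is available on the GPU node; other vendors ship comparable tools.

BASH

# One-off snapshot of GPU utilization and memory use on the node
nvidia-smi

# Continuously sample utilization and memory while the job runs
nvidia-smi dmon -s um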

Don’t mention FPGAs too much; maybe just a note on what accelerators could be, besides GPUs. The goal is to keep it simple and accessible, and to focus on what’s common in most HPC systems these days.

Explain how to decide where to run something: CPU vs. small GPU vs. high-end GPU. This touches on transfer overhead, etc.

Summary


Leading question: Performance optimization is a deep topic and we are not done learning. How could I continue exploring the topic?

Key Points
  • Tools to measure GPU/FPGA performance of a job
  • Common symptoms of GPU/FPGA problems

Content from Next Steps


Last updated on 2025-10-29 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • What are other patterns of performance bottlenecks?
  • How to evaluate an application in more detail?

Objectives

After completing this episode, participants should be able to …

  • Find a collection of performance patterns on hpc-wiki.info
  • Identify next steps to take with regard to performance optimization.

Most important: enable users to translate from example workload to their own code! Guide on how to translate learning goals and key points to their situation. Additionally, provide some info on where and how to dig deeper, if there is interest (application profiling, etc.)

All ideas in this episode may need to be reworked, since they were made with the outlook in mind, not so much to help learners transfer insight

Narrative:

  • Start with picture of beautiful title slide of the talk with the snowman picture
  • Next time we want to tackle the issue way in advance
  • Approach our raytracing application more systematically, such that we can get the title slide done much quicker
  • What could we do to dive deeper in optimizing the raytracer?
  • Where can we go from here?

What we’re doing here:

  • Learning important programming concepts (parallel programming on many levels)
  • Deeper application profiling & tools to use

Next Steps


hpc-wiki.info:

  • I/O
  • CPU Front End
  • CPU Back End
  • Memory leak
  • Oversubscription
  • Underutilization

Summary


Key Points
  • There are many profilers, some are language-specific, others are vendor-related, …
  • Simple profile with exclusive resources
  • Repeated measurements for reliability