In Slurm 20.11, SchedMD has made some changes that we know will affect some NSC users. As far as NSC has been able to determine, these changes are intentional and permanent, i.e. we now need to update our jobs to match how Slurm works in the new version.
The semantics of srun have changed, and if you use srun today to launch job steps from within a job, you may need to change your job scripts to make them work with Slurm 20.11.
If you use mpprun or mpiexec.hydra to launch an MPI job and do not use srun in your job script for anything else, you are not affected by this change.
If your jobs only run on a single node and do not use srun, you are not affected by this change.
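For reference, a job script like the sketch below (the application path is just a placeholder) is not affected, since the MPI application is started with mpprun and srun is not used at all:
#!/bin/bash
#SBATCH -N2 --exclusive
# mpprun starts the MPI application; no srun job steps are launched,
# so this script behaves the same before and after the upgrade.
mpprun /somewhere/mympiapp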
In most cases, jobs using srun will still run, but fewer job steps will run concurrently, so you will lose performance. Often, this results in only one CPU core being used on each node (a slowdown of 97%).
For an application that uses mpprun to launch the main application and then uses srun to start e.g. a monitoring task on each node, the monitoring task might not start at all, or it might start before the main application and then block it from starting or from using all CPU cores.
Summary: if your application does not work, or runs slower, after the upgrade to Slurm 20.11, you are probably affected by this change and will need to modify your job. In this case, please read the rest of this page for hints on how to modify your job.
If you need help modifying your jobs to work with Slurm 20.11, please contact NSC Support.
In earlier Slurm versions, this would work as expected (run 64 concurrent tasks on the two assigned nodes until all 256 tasks have completed):
#!/bin/bash
#SBATCH -N2 --exclusive
#
for task in $(seq 1 256); do
    srun -n1 -N1 --exclusive /somewhere/myapp $task &
done
wait
With Slurm 20.11, the above script will only run two concurrent tasks (one on each node), leaving 62 of the 64 allocated CPUs idle!
With Slurm 20.11, you can instead do:
#!/bin/bash
#SBATCH -N2 --exclusive
#
for task in $(seq 1 256); do
    srun -n1 -N1 --exact /somewhere/myapp $task &
done
wait
Another option is to skip srun and use parallel and jobsh:
#!/bin/bash
#SBATCH -N2 --exclusive
#
module load parallel/20181122-nsc1
seq 1 256 | parallel --ssh=jobsh -S $(hostlist -e -s',' -d -p "$SLURM_CPUS_ON_NODE/" $SLURM_JOB_NODELIST) /somewhere/myapp {}
In this example we use GNU Parallel and ask it to run as many tasks per node as there are CPU cores on the node ($SLURM_CPUS_ON_NODE).
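If your task arguments are stored in a file rather than generated by seq, the same approach works; a sketch, assuming a hypothetical file tasklist.txt with one argument per line:
#!/bin/bash
#SBATCH -N2 --exclusive
#
module load parallel/20181122-nsc1
# Run myapp once per line in tasklist.txt, spread over all CPU cores in the job.
parallel --ssh=jobsh -S $(hostlist -e -s',' -d -p "$SLURM_CPUS_ON_NODE/" $SLURM_JOB_NODELIST) /somewhere/myapp {} :::: tasklist.txt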
Sometimes we want to launch a monitoring task, a debugger or something similar on all nodes in a job, but we don’t want CPUs to be allocated to those tasks and unavailable to the real application.
To do this, you can either use jobsh (which is designed to mimic ssh as far as possible while still using Slurm internally) or srun. If you use srun, you need to use certain options to ensure that it does not attempt to allocate CPU or memory for the task.
Example 1: use jobsh and loop over all nodes in the job
#!/bin/bash
#SBATCH -N2 --exclusive
# Start one instance of monitorapp per node in the job, but
# allocate no resources.
for node in $(hostlist -e "$SLURM_JOB_NODELIST"); do
    jobsh $node /somepath/monitorapp &
done
# Start the main application
mpprun /somepath/myapp
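If you prefer to stop the monitoring tasks yourself once the main application has finished (rather than letting them be terminated when the job ends), one possible variant of the loop above, assuming monitorapp runs until it is killed:
#!/bin/bash
#SBATCH -N2 --exclusive
# Start one instance of monitorapp per node, remember its PID, and
# terminate the monitors after the main application has finished.
pids=()
for node in $(hostlist -e "$SLURM_JOB_NODELIST"); do
    jobsh $node /somepath/monitorapp &
    pids+=($!)
done
mpprun /somepath/myapp
kill "${pids[@]}" 2>/dev/null
wait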
Example 2: use srun (with the same options jobsh would use) to launch one task per node in the job
#!/bin/bash
#SBATCH -N2 --exclusive
# Start one instance of monitorapp per node in the job, but
# allocate no resources.
srun --whole --mem-per-cpu=0 /somepath/monitorapp &
# Start the main application
mpprun /somepath/myapp
If you use srun to launch your main MPI application, you should probably switch to mpprun or mpiexec.hydra instead. Contact NSC Support for more information.
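As an illustration only (the application path is a placeholder), the change is typically a one-line swap in the job script:
#!/bin/bash
#SBATCH -N2 --exclusive
# Before: the MPI application was started with srun:
#   srun /somewhere/mympiapp
# After: start it with mpprun instead:
mpprun /somewhere/mympiapp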
It’s not all bad news… Slurm 20.11 also fixes various bugs, especially one that sometimes prevented GUI windows from being displayed when run on a compute node.
We also need to run a supported version to get security fixes, so staying at Slurm 20.02 long-term is not an option.