Running parallel jobs with mpprun

Introduction

mpprun is an NSC-provided tool for launching MPI parallel jobs. The mpprun command path is added by a default module, which is loaded when you log in to NSC clusters. The main benefit of mpprun is that it can load the correct libraries at run time. Many HPC programs are built with a number of different libraries, e.g., MKL, NetCDF, HDF5, etc., apart from the MPI libraries. Exporting the correct library paths when running the binary is cumbersome and can lead to errors that are hard to debug. Using mpprun ensures that the correct libraries are loaded when the job runs. mpprun also ensures that the job runs in an optimal manner with respect to the spawning of MPI ranks and threads and their binding to the available CPU cores.

mpprun is also integrated with a number of parallel debugger and profiler tools. It makes it easier for you to run parallel debuggers like DDT or TotalView and parallel profilers like perf-report. The usage of these tools, however, depends on the availability of the corresponding software on the specific system.

At NSC we strongly recommend that you use mpprun instead of mpirun or similar commands to start an MPI job. This step-by-step tutorial will get you started with mpprun: it shows how to use mpprun for launching parallel jobs and also how to launch profilers or debuggers through it. To get the most out of this tutorial, run the commands as shown in the code blocks.

Step 1: Log in to an NSC cluster

The first step is to log in to an NSC cluster. From a Mac or Linux machine, you start by opening a terminal window and initiating a connection with the ssh program. On Windows, you can use a program like "PuTTY" to connect using ssh. We will use the NSC cluster Triolith for this tutorial, but the examples can be run on other NSC resources as well. To connect to Triolith, run the following command from your local machine:

$ ssh -X x_username@triolith.nsc.liu.se

The -X flag in the ssh command enables X11 forwarding in your SSH client, which is needed for running graphical applications. An alternative and more efficient way to run graphical applications on the NSC clusters is via the ThinLinc client. For more details about ThinLinc-based login, see:

Running graphical applications using ThinLinc

After logging in to the cluster, run the following command:

$ module list
Currently Loaded Modulefiles:
base-config/1
snic/2
triolith/1
no-compilers-loaded/1
dotmodules/1
mpprun/2.2.8
mc/4.8.1.6
nsc-default

Here the mpprun/2.2.8 module is loaded by default. The default version of the mpprun module can change as we upgrade mpprun with bug fixes, feature additions, etc. The next step is to verify that mpprun is in your path. To do this, run:

$ which mpprun
/software/apps/mpprun/2.2.8/mpprun

As you can see, mpprun is already present in your path.

Step 2: Building code for mpprun

To run a job with mpprun you first need to compile the code according to NSC recommendations. In most cases this means loading NSC-provided modules, e.g., compiler and MPI modules, and then compiling as usual. We will use the following modules for this tutorial:

$ module load intel/16.0.1 impi/5.1.2
$ module list 
Currently Loaded Modulefiles:
base-config/1
snic/2
triolith/1
dotmodules/1
mpprun/2.2.8
mc/4.8.1.6
nsc-default
intel/16.0.1
impi/5.1.2

Now Intel compiler 16.0.1 and Intel MPI 5.1.2 are loaded in your environment. After loading the modules, the Intel compilers and the MPI compiler wrappers are available in your path, as can be seen from:

$ which ifort   ## fortran compiler
/software/apps/comp_wrapper/intel/ifort

and

$ which mpif90  ## MPI wrapper for fortran
/software/apps/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpif90

It is important to note that in the example given above mpif90 actually wraps the ifort compiler, as can be seen from:

 $ mpif90 -v
 mpif90 for the Intel(R) MPI Library 5.1.2 for Linux*
 Copyright(C) 2003-2015, Intel Corporation.  All rights reserved.
 ifort version 16.0.1

Now you can compile your MPI code. For compiling Fortran code, you can use either mpif90/mpif77 or ifort -Nmpi. To demonstrate the different features of mpprun we will use a simple test code. To run the tests, first prepare a folder for the code:

$ export MPPRUNTEST=/home/${USER}/mpprun_tutorial/$(date +%d%m%y)    
$ mkdir -p $MPPRUNTEST

You can choose other suitable folder names instead of the above if you want to. Next, you need to check your project's account name and use it when setting up your allocation:

$ projinfo $USER

This will show your project's account name. Use it to set up the following:

$ export ACCOUNT_NAME=project_account_name    ## Add your project's account name here

You can now download the sample code used in this tutorial. It is an MPI and OpenMP parallel toy code implementing a Jacobi solver.

$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpiomp.F90

To compile the code run:

$ mpif90 jacobi_mpiomp.F90 -o jacobi_mpi
$ mpif90 -openmp jacobi_mpiomp.F90 -o jacobi_mpiomp
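
As mentioned above, ifort -Nmpi can be used as an alternative to the mpif90 wrapper. A hypothetical equivalent compile of the MPI-only binary, assuming the NSC compiler wrapper is set up as shown earlier, would be:

$ ifort -Nmpi jacobi_mpiomp.F90 -o jacobi_mpi    ## alternative to the first mpif90 command above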

Of the two mpif90 commands above, the first compiles the code for an MPI-only run and the second for a hybrid MPI-OpenMP run. After compilation, the binary files jacobi_mpi and jacobi_mpiomp will have been created. To verify that the binaries are mpprun compatible, run the following command:

$ dumptag jacobi_mpi           

File name $MPPRUNTEST/jacobi_mpi
NSC tags ---------------------------------
Build date        160315
Build time        145000
Built with MPI    impi 5_1_2
Linked with       ifort 16_0_1
Tag version       5
------------------------------------------

and

$ dumptag jacobi_mpiomp

File name $MPPRUNTEST/jacobi_mpiomp
NSC tags ---------------------------------
Build date        160315
Build time        145000
Built with MPI    impi 5_1_2
Linked with       ifort 16_0_1
Tag version       5
------------------------------------------

The dumptag command shown above is an NSC-provided utility that displays useful information about the binary. mpprun uses this information to launch the binary in the correct way. If dumptag does not show information similar to the above for some other binary, then that binary was most probably not compiled with NSC-provided modules. If dumptag does not show any MPI information, such a binary cannot be run with mpprun.

Step 3: Running jobs with mpprun

For running jobs with mpprun you do not need to have the corresponding compiler and MPI modules loaded. Hence, let us first remove the compiler and MPI modules from the current environment:

$ module unload intel/16.0.1 impi/5.1.2
$ module list
Currently Loaded Modulefiles:
base-config/1
snic/2
triolith/1
dotmodules/1
mpprun/2.2.8
mc/4.8.1.6
nsc-default
no-compilers-loaded/1

After unloading the modules you can no longer find mpif90 or mpirun in your path. For launching a job you have to choose suitable SLURM allocation options and commands depending on the type of job, as explained in the following examples.

Example 1: Running a pure MPI job

Download the batch script:

$ ### example 1.1 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi.sh
$ chmod +x jacobi_mpi.sh
$ cat jacobi_mpi.sh 
#!/bin/bash

#SBATCH -J jacobi_mpi
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi
#SBATCH -e err_jacobi_mpi

mpprun jacobi_mpi

In the above script it can be seen that mpprun does not require the number of ranks to be specified on the mpprun command line. To run this script, use the sbatch command:

$ sbatch -A $ACCOUNT_NAME jacobi_mpi.sh
Submitted batch job ........
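
While the job is waiting in the queue, or running, you can check its status with the standard SLURM command squeue, for example:

$ squeue -u $USER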

The job will start on 16 MPI ranks when the resources are allocated. The output can be seen in the file out_jacobi_mpi, and errors, if any, in the file err_jacobi_mpi. There are some variations of the above script that will also work.

$ ### example 1.2 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_1.sh
$ cat jacobi_mpi_1.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_1
#SBATCH -t 00:05:00
#SBATCH -N 1
#SBATCH -o out_jacobi_mpi_1
#SBATCH -e err_jacobi_mpi_1

mpprun jacobi_mpi

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_1.sh
Submitted batch job ........

In this example script #SBATCH -N 1 will allocate a full node. If a node contains 16 CPU cores, then the scripts jacobi_mpi.sh and jacobi_mpi_1.sh are "almost" equivalent. However, SLURM treats the two cases slightly differently, and we consider it better to use #SBATCH -n XXX than #SBATCH -N YYY.
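
If you want to see exactly what SLURM allocated in each case, a minimal check (assuming the standard SLURM output environment variables are available in the job script) is to print them before the mpprun line:

## hypothetical addition to a job script: show what SLURM allocated
echo "Nodes: $SLURM_JOB_NUM_NODES   CPUs per node: $SLURM_JOB_CPUS_PER_NODE"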

The number of ranks can be specified at mpprun command line if desired. This is shown in the example given below:

$ ### example 1.3 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_2.sh
$ cat jacobi_mpi_2.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_2
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_2
#SBATCH -e err_jacobi_mpi_2

mpprun -n 16 jacobi_mpi

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_2.sh
Submitted batch job ........

It is also possible to choose other values of -n on the mpprun command line, but such a job launch may not use the resources optimally.
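
For example, a hypothetical variant of the script above that allocates 16 cores but starts only 8 MPI ranks (leaving half of the allocated cores idle) could look like this:

#!/bin/bash

#SBATCH -J jacobi_mpi_8ranks     ## hypothetical job name
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_8ranks
#SBATCH -e err_jacobi_mpi_8ranks

## start only 8 ranks on the 16 allocated cores
mpprun -n 8 jacobi_mpi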

Example 2: Running an MPI + OpenMP hybrid job

As mentioned before, it is important to choose suitable SLURM options depending on the type of job. For hybrid jobs, you need to tell SLURM that you need both ranks and threads. This is illustrated in the example below:

$ ### example 2 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpiomp.sh
$ cat jacobi_mpiomp.sh
#!/bin/bash

#SBATCH -J jacobi_mpiomp
#SBATCH -t 00:05:00
#SBATCH -n 4           ## allocate 4 MPI ranks
#SBATCH -c 4           ## allocate 4 threads/rank
#SBATCH -o out_jacobi_mpiomp
#SBATCH -e err_jacobi_mpiomp

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   ## you have to explicitly set this
mpprun jacobi_mpiomp

In the above example OMP_NUM_THREADS is set to 4. To run this example use:

$ sbatch -A $ACCOUNT_NAME jacobi_mpiomp.sh
Submitted batch job ........

In the example above 4 MPI ranks will be created. Each rank will run 4 threads.
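
The rank/thread split can be varied with the same pattern. As an illustration only (this is not one of the downloaded scripts), a variant that fills 16 cores with 2 MPI ranks and 8 threads per rank would use:

#SBATCH -n 2           ## hypothetical variant: 2 MPI ranks
#SBATCH -c 8           ## 8 threads per rank, still 16 cores in total

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpprun jacobi_mpiomp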

Example 3: Debugging a parallel program with mpprun

Using mpprun you can easily launch third-party debugger tools like Allinea DDT or Rogue Wave TotalView. mpprun makes it easy for you to get started with parallel debuggers.

For debugging a program you first need to compile the code with the -g option. We will use our example Jacobi code:

$ cd $MPPRUNTEST
$ module load intel/16.0.1 impi/5.1.2
$ mpif90 -g jacobi_mpiomp.F90 -o jacobi_mpi_dbg
$ module unload intel/16.0.1 impi/5.1.2

In debugging sessions it is usually most useful to debug the code interactively, and mpprun only supports interactive debugging. To use a debugger via mpprun, first allocate an interactive node:

$ interactive -t 01:00:00 -n 16 -A $ACCOUNT_NAME
salloc: Granted job allocation ......
srun: Job step created

Once the allocation is made you can launch the DDT debugger as shown below:

$ mpprun -ddt jacobi_mpi_dbg

Similarly you can launch the TotalView debugger by:

$ mpprun -tv jacobi_mpi_dbg

The above commands will launch the DDT or TotalView GUI, depending on whether the -ddt or -tv flag was given. Keep in mind that the -ddt and -tv flags only work in an interactive environment and not in batch mode. Hence, these flags will not work in a job script and will result in an error.

Within the DDT/TotalView GUI you can now run and debug the job. In this tutorial we will not attempt to teach you how to use the different features of DDT/TotalView. For more details about these debuggers, please see the respective user manuals:

Totalview User guide

DDT User guide

Example 4: Profiling a parallel program with perf-report

Allinea perf-report is a utility for profiling MPI applications. Using this utility you can see how your code performs and whether it uses the resources optimally. To profile a code you first need to compile it. In this case, do not use the -g option, as it can slow down the code being profiled.

$ cd $MPPRUNTEST
$ module load intel/16.0.1 impi/5.1.2
$ mpif90 jacobi_mpiomp.F90 -o jacobi_mpi_pr
$ module unload intel/16.0.1 impi/5.1.2

Next, download the job script:

$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_pr.sh
$ cat jacobi_mpi_pr.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_pr
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_pr
#SBATCH -e err_jacobi_mpi_pr

mpprun -pr jacobi_mpi_pr

The -pr flag to mpprun invokes perf-report. The job can be submitted to the queue with:

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_pr.sh
Submitted batch job ........

After the job is over, the profiles will be written to two files:

$ ls *.txt *.html
9315227_16p_1t_2016-03-23_14-51.html  9315227_16p_1t_2016-03-23_14-51.txt

In the above file names, 9315227 is the SLURM job id, 16p refers to the number of processes, 1t refers to the number of threads, and the rest is the date and time. The .txt file contains the profile information in text format, and the .html file contains the same report in HTML format. The HTML file can be opened on the cluster:

$ konqueror *.html

Or it can be copied back to your local machine and viewed there in a browser. An example profile of the Jacobi code launched via jacobi_mpi_pr.sh is shown here:

perf-report profile of the example code
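
If you prefer to view the HTML report on your local machine, you can for example copy it from the cluster with scp (run from your local machine; the path below is only a sketch and depends on where you ran the job):

$ scp x_username@triolith.nsc.liu.se:/home/x_username/mpprun_tutorial/<date>/*.html .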

For more details about perf-report see the perf-report user guide

Example 5: Passing miscellaneous flags to mpprun

mpprun supports a number of flags, which can be listed with:

$ mpprun -h

These flags are applicable to any MPI distribution, e.g., Intel MPI, OpenMPI, etc. If you want to pass an MPI-distribution-specific flag to your job, you can do so via the --pass flag. An example is shown here:

$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_pass-flag.sh
$ cat jacobi_mpi_pass-flag.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_pass-flag
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_pass-flag
#SBATCH -e err_jacobi_mpi_pass-flag

## this will fail ###
mpprun -print-rank-map jacobi_mpi

## this will run ###
mpprun --pass="-print-rank-map" jacobi_mpi

## this will run: pass two flags ###
mpprun --pass="-print-rank-map -prepend-rank" jacobi_mpi

In the above example script, -print-rank-map is a flag specific to Intel MPI; it prints the rank map for the job. However, -print-rank-map is not an mpprun flag, so if it is passed directly to mpprun the job will fail. If it is instead passed through --pass="-print-rank-map", mpprun will hand it over to the underlying Intel MPI job launcher. Run the job as usual:

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_pass-flag.sh
Submitted batch job ........

After the job has finished, check the output file out_jacobi_mpi_pass-flag.
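
For example, you can page through it directly on the cluster:

$ less out_jacobi_mpi_pass-flag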

Conclusion

After completing this tutorial you should be able to use mpprun to submit, debug and profile your jobs. Most NSC-compiled binaries are mpprun compatible. Using mpprun increases the chance that your jobs utilize the resources well. Another advantage of mpprun is that if an error occurs in your job, the NSC support team can inspect the mpprun logs and diagnose the problem more easily.

