Running parallel jobs with mpprun

Introduction

mpprun is an NSC-provided tool for launching MPI parallel jobs. The mpprun command path is added by a default module, which is loaded when you log in to NSC clusters. The main benefit of mpprun is that it can load the correct libraries at run time. Many HPC programs are built with a number of different libraries, e.g., MKL, NetCDF, HDF5, etc., apart from the MPI libraries. Exporting the correct library paths when running the binary is cumbersome and can lead to errors that are hard to debug. Using mpprun ensures that the correct libraries are loaded when the job runs. mpprun also ensures that the job runs in an optimal manner with respect to the spawning of MPI ranks and threads and their binding to the available CPU cores.

mpprun is also integrated with a number of parallel debugger and profiler tools. It makes it easier for you to run parallel debuggers like DDT or TotalView and parallel profilers like perf-report. The usage of these tools, however, depends on the availability of the corresponding software on the specific system.

At NSC we strongly recommend that you use mpprun instead of mpirun or similar commands to start an MPI job. This step-by-step tutorial will get you started with mpprun: it shows how to use mpprun for launching parallel jobs and also how to launch profilers or debuggers through it. To get the most out of this tutorial, run the commands as shown in the code blocks.

Step 1: Log in to an NSC cluster

The first step is to log in to an NSC cluster. From a Mac or Linux machine, you start by opening a terminal window and initiating a connection with the ssh program. On Windows, you can use a program like "PuTTY" to connect using ssh. We will use the NSC cluster Triolith for this tutorial, but the examples can be run on other NSC resources as well. To connect to Triolith, run the following command from your local machine:

$ ssh -X x_username@triolith.nsc.liu.se

The -X flag in the ssh command enables X11 forwarding in your SSH client, which is needed for running graphical applications. An alternative and more efficient way to run graphical applications on the NSC clusters is via the ThinLinc client. For more details about ThinLinc-based login, see:

Running graphical applications using ThinLinc

After logging in to the cluster, run the following command:

$ module list
Currently Loaded Modulefiles:
base-config/1
snic/2
triolith/1
no-compilers-loaded/1
dotmodules/1
mpprun/2.2.8
mc/4.8.1.6
nsc-default

Here the mpprun/2.2.8 module is loaded by default. The default version of the mpprun module can change as we upgrade mpprun with bug fixes, feature additions, etc. The next step is to verify that mpprun is in your path. To do this, run:

$ which mpprun
/software/apps/mpprun/2.2.8/mpprun

As you can see, mpprun is already present in your path.

Step 2: Building code for mpprun

To run a job with mpprun you first need to compile the code according to NSC recommendations. In most cases this means loading NSC-provided modules, e.g., compiler and MPI modules, and then compiling as usual. We will use the following modules for this tutorial:

$ module load intel/16.0.1 impi/5.1.2
$ module list 
Currently Loaded Modulefiles:
base-config/1
snic/2
triolith/1
dotmodules/1
mpprun/2.2.8
mc/4.8.1.6
nsc-default
intel/16.0.1
impi/5.1.2

Now Intel compiler 16.0.1 and Intel MPI 5.1.2 are loaded in your environment. After loading the modules, the Intel compilers and the MPI compiler wrappers are available in your path, as can be seen from:

$ which ifort   ## fortran compiler
/software/apps/comp_wrapper/intel/ifort

and

$ which mpif90  ## MPI wrapper for fortran
/software/apps/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin/mpif90

It is important to note that in the example given above mpif90 actually wraps the ifort compiler, as can be seen from:

 $ mpif90 -v
 mpif90 for the Intel(R) MPI Library 5.1.2 for Linux*
 Copyright(C) 2003-2015, Intel Corporation.  All rights reserved.
 ifort version 16.0.1

Now you can compile your MPI code. For compiling Fortran code, you can use either mpif90/mpif77 or ifort -Nmpi. To demonstrate the different features of mpprun we will use a simple test code. To run the tests, first prepare a folder for the code:

$ export MPPRUNTEST=/home/${USER}/mpprun_tutorial/$(date +%d%m%y)    
$ mkdir -p $MPPRUNTEST

You can choose other suitable folder names instead of the above if you want to. Next, you need to check your project's account name and use it when setting up your allocation:

$ projinfo $USER

This will show your project's account name. Use it to set up the following:

$ export ACCOUNT_NAME=project_account_name    ## Add your project's account name here

You can now download the sample code used in this tutorial. It is an MPI and OpenMP parallel toy code implementing a Jacobi solver.

$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpiomp.F90

To compile the code run:

$ mpif90 jacobi_mpiomp.F90 -o jacobi_mpi
$ mpif90 -openmp jacobi_mpiomp.F90 -o jacobi_mpiomp
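
As mentioned above, ifort -Nmpi can be used as an alternative to the mpif90 wrapper. A hypothetical equivalent compile of the MPI-only binary, assuming the NSC compiler wrapper is set up as shown earlier, would be:

$ ifort -Nmpi jacobi_mpiomp.F90 -o jacobi_mpi    ## alternative to the first mpif90 command above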

Of the two mpif90 commands above, the first compiles the code for an MPI-only run and the second for a hybrid MPI-OpenMP run. After compilation, the binary files jacobi_mpi and jacobi_mpiomp will have been created. To verify that the binaries are mpprun compatible, run the following command:

$ dumptag jacobi_mpi           

File name $MPPRUNTEST/jacobi_mpi
NSC tags ---------------------------------
Build date        160315
Build time        145000
Built with MPI    impi 5_1_2
Linked with       ifort 16_0_1
Tag version       5
------------------------------------------

and

$ dumptag jacobi_mpiomp

File name $MPPRUNTEST/jacobi_mpiomp
NSC tags ---------------------------------
Build date        160315
Build time        145000
Built with MPI    impi 5_1_2
Linked with       ifort 16_0_1
Tag version       5
------------------------------------------

The dumptag command shown above is an NSC-provided utility that displays useful information about the binary. mpprun uses this information to launch the binary in the correct way. If dumptag does not show information similar to the above for some other binary, then that binary was most probably not compiled with NSC-provided modules. If dumptag does not show any MPI information, such a binary cannot be run with mpprun.

Step 3: Running jobs with mpprun

For running jobs with mpprun you do not need to have the corresponding compiler and MPI modules loaded. Hence, let us first remove the compiler and MPI modules from the current environment:

$ module unload intel/16.0.1 impi/5.1.2
$ module list
Currently Loaded Modulefiles:
base-config/1
snic/2
triolith/1
dotmodules/1
mpprun/2.2.8
mc/4.8.1.6
nsc-default
no-compilers-loaded/1

After unloading the modules you can no longer find mpif90 or mpirun in your path. For launching a job you have to choose suitable SLURM allocation options and commands depending on the type of job, as explained in the following examples.

Example 1: Running a pure MPI job

Download the batch script:

$ ### example 1.1 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi.sh
$ chmod +x jacobi_mpi.sh
$ cat jacobi_mpi.sh 
#!/bin/bash

#SBATCH -J jacobi_mpi
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi
#SBATCH -e err_jacobi_mpi

mpprun jacobi_mpi

In the above script it can be seen that mpprun does not require the number of ranks to be specified on the mpprun command line. To run this script, use the sbatch command:

$ sbatch -A $ACCOUNT_NAME jacobi_mpi.sh
Submitted batch job ........
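
While the job is waiting in the queue, or running, you can check its status with the standard SLURM command squeue, for example:

$ squeue -u $USER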

The job will start on 16 MPI ranks when the resources are allocated. The output can be seen in the file out_jacobi_mpi, and errors, if any, in the file err_jacobi_mpi. There are some variations of the above script that will also work.

$ ### example 1.2 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_1.sh
$ cat jacobi_mpi_1.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_1
#SBATCH -t 00:05:00
#SBATCH -N 1
#SBATCH -o out_jacobi_mpi_1
#SBATCH -e err_jacobi_mpi_1

mpprun jacobi_mpi

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_1.sh
Submitted batch job ........

In this example script #SBATCH -N 1 will allocate a full node. If a node contains 16 CPU cores, then the scripts jacobi_mpi.sh and jacobi_mpi_1.sh are "almost" equivalent. However, SLURM treats the two cases slightly differently, and we consider it better to use #SBATCH -n XXX than #SBATCH -N YYY.
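
If you want to see exactly what SLURM allocated in each case, a minimal check (assuming the standard SLURM output environment variables are available in the job script) is to print them before the mpprun line:

## hypothetical addition to a job script: show what SLURM allocated
echo "Nodes: $SLURM_JOB_NUM_NODES   CPUs per node: $SLURM_JOB_CPUS_PER_NODE"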

The number of ranks can be specified at mpprun command line if desired. This is shown in the example given below:

$ ### example 1.3 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_2.sh
$ cat jacobi_mpi_2.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_2
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_2
#SBATCH -e err_jacobi_mpi_2

mpprun -n 16 jacobi_mpi

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_2.sh
Submitted batch job ........

It is also possible to choose other values of -n on the mpprun command line, but such a job launch may not use the resources optimally.
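
For example, a hypothetical variant of the script above that allocates 16 cores but starts only 8 MPI ranks (leaving half of the allocated cores idle) could look like this:

#!/bin/bash

#SBATCH -J jacobi_mpi_8ranks     ## hypothetical job name
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_8ranks
#SBATCH -e err_jacobi_mpi_8ranks

## start only 8 ranks on the 16 allocated cores
mpprun -n 8 jacobi_mpi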

Example 2: Running an MPI + OpenMP hybrid job

As mentioned before, it is important to choose suitable SLURM options depending on the type of job. For hybrid jobs, you need to tell SLURM that you need both ranks and threads. This is illustrated in the example below:

$ ### example 2 ###
$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpiomp.sh
$ cat jacobi_mpiomp.sh
#!/bin/bash

#SBATCH -J jacobi_mpiomp
#SBATCH -t 00:05:00
#SBATCH -n 4           ## allocate 4 MPI ranks
#SBATCH -c 4           ## allocate 4 threads/rank
#SBATCH -o out_jacobi_mpiomp
#SBATCH -e err_jacobi_mpiomp

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   ## you have to explicitly set this
mpprun jacobi_mpiomp

In the above example OMP_NUM_THREADS is set to 4. To run this example use:

$ sbatch -A $ACCOUNT_NAME jacobi_mpiomp.sh
Submitted batch job ........

In the example above 4 MPI ranks will be created. Each rank will run 4 threads.
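
The rank/thread split can be varied with the same pattern. As an illustration only (this is not one of the downloaded scripts), a variant that fills 16 cores with 2 MPI ranks and 8 threads per rank would use:

#SBATCH -n 2           ## hypothetical variant: 2 MPI ranks
#SBATCH -c 8           ## 8 threads per rank, still 16 cores in total

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpprun jacobi_mpiomp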

Example 3: Debugging a parallel program with mpprun

Using mpprun you can easily launch third-party debugger tools like Allinea DDT or Rogue Wave TotalView. mpprun makes it easy for you to get started with parallel debuggers.

For debugging a program you first need to compile the code with the -g option. We will use our example Jacobi code:

$ cd $MPPRUNTEST
$ module load intel/16.0.1 impi/5.1.2
$ mpif90 -g jacobi_mpiomp.F90 -o jacobi_mpi_dbg
$ module unload intel/16.0.1 impi/5.1.2

In debugging sessions it is usually most useful to debug the code interactively, and mpprun only supports interactive debugging. To use a debugger via mpprun, first allocate an interactive node:

$ interactive -t 01:00:00 -n 16 -A $ACCOUNT_NAME
salloc: Granted job allocation ......
srun: Job step created

Once the allocation is made you can launch the DDT debugger as shown below:

$ mpprun -ddt jacobi_mpi_dbg

Similarly you can launch the TotalView debugger by:

$ mpprun -tv jacobi_mpi_dbg

The above commands will launch the DDT or TotalView GUI, depending on whether the -ddt or -tv flag was given. Keep in mind that the -ddt and -tv flags only work in an interactive environment and not in batch mode. Hence, these flags will not work in a job script and will result in an error.

Within the DDT/TotalView GUI you can now run and debug the job. In this tutorial we will not attempt to teach you how to use the different features of DDT/TotalView. For more details about these debuggers, please see the respective user manuals:

Totalview User guide

DDT User guide

Example 4: Profiling a parallel program with perf-report

Allinea perf-report is a utility for profiling MPI applications. Using this utility you can see how your code performs and whether it uses the resources optimally. To profile a code you first need to compile it. In this case, do not use the -g option, as it can slow down the code being profiled.

$ cd $MPPRUNTEST
$ module load intel/16.0.1 impi/5.1.2
$ mpif90 jacobi_mpiomp.F90 -o jacobi_mpi_pr
$ module unload intel/16.0.1 impi/5.1.2

Next, download the job script:

$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_pr.sh
$ cat jacobi_mpi_pr.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_pr
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_pr
#SBATCH -e err_jacobi_mpi_pr

mpprun -pr jacobi_mpi_pr

The -pr flag to mpprun invokes perf-report. The job can be submitted to the queue with:

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_pr.sh
Submitted batch job ........

After the job is over, the profiles will be written to two files:

$ ls *.txt *.html
9315227_16p_1t_2016-03-23_14-51.html  9315227_16p_1t_2016-03-23_14-51.txt

In the above file names, 9315227 is the SLURM job id, 16p refers to the number of processes, 1t refers to the number of threads, and the rest is the date and time. The .txt file contains the profile information in text format, and the .html file contains the same report in HTML format. The HTML file can be opened on the cluster:

$ konqueror *.html

Or it can be copied back to your local machine and viewed there in a browser. An example profile of the Jacobi code launched via jacobi_mpi_pr.sh is shown here:

perf-report profile of the example code
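
If you prefer to view the HTML report on your local machine, you can for example copy it from the cluster with scp (run from your local machine; the path below is only a sketch and depends on where you ran the job):

$ scp x_username@triolith.nsc.liu.se:/home/x_username/mpprun_tutorial/<date>/*.html .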

For more details about perf-report see the perf-report user guide

Example 5: Passing miscellaneous flags to mpprun

mpprun supports a number of flags, which can be listed with:

$ mpprun -h

These flags are applicable to any MPI distribution, e.g., Intel MPI, OpenMPI, etc. If you want to pass an MPI-distribution-specific flag to your job, you can do so via the --pass flag. An example is shown here:

$ cd $MPPRUNTEST
$ wget -N https://www.nsc.liu.se/support/tutorials/mpprun/jacobi_mpi_pass-flag.sh
$ cat jacobi_mpi_pass-flag.sh
#!/bin/bash

#SBATCH -J jacobi_mpi_pass-flag
#SBATCH -t 00:05:00
#SBATCH -n 16
#SBATCH -o out_jacobi_mpi_pass-flag
#SBATCH -e err_jacobi_mpi_pass-flag

## this will fail ###
mpprun -print-rank-map jacobi_mpi

## this will run ###
mpprun --pass="-print-rank-map" jacobi_mpi

## this will run: pass two flags ###
mpprun --pass="-print-rank-map -prepend-rank" jacobi_mpi

In the above example script, -print-rank-map is a flag specific to Intel MPI; it prints the rank map for the job. However, -print-rank-map is not an mpprun flag, so if it is passed directly to mpprun the job will fail. If it is instead passed through --pass="-print-rank-map", mpprun will hand it over to the underlying Intel MPI job launcher. Run the job as usual:

$ sbatch -A $ACCOUNT_NAME jacobi_mpi_pass-flag.sh
Submitted batch job ........

After the job has finished, check the output file out_jacobi_mpi_pass-flag.
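
For example, you can page through it directly on the cluster:

$ less out_jacobi_mpi_pass-flag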

Conclusion

After completing this tutorial you should be able to use mpprun to submit, debug and profile your jobs. Most NSC-compiled binaries are mpprun compatible. Using mpprun increases the chance that your jobs utilize the resources well. Another advantage of mpprun is that if an error occurs in your job, the NSC support team can inspect the mpprun logs and diagnose the problem more easily.

