Vagn User Guide

1 About Vagn

Vagn is a Linux-based cluster with seven analysis nodes and a login node, and an attached disk storage system.

Vagn is designed primarily to process large amounts of data, not for running CPU-intensive computations.

Vagn also serves as the storage system for Ekman, a large compute cluster located at PDC at KTH in Stockholm and managed by PDC and NSC in collaboration.

Vagn is shared by three main user groups:

  • The Swedish Meteorological and Hydrological Institute (SMHI): the Rossby Centre (climate research) and SMHI FoUo (the oceanographic research group).
  • The Bert Bolin Centre for Climate Research (BBCC) at Stockholm University (which gathers expertise from three departments within Stockholm University: the Department of Meteorology (MISU), the Department of Physical Geography and Quaternary Geology, and the Department of Applied Environmental Science)
  • The Linné Flow Centre at KTH

2 Getting an account on Vagn

In order to get access to Vagn you need to be a member of one of the groups using Vagn (or somehow be affiliated with one of the groups).

  1. Talk to your user group representative (see list below)
  2. Send an email to vagnekman-support@snic.vr.se. Please include:
    • Your full name.
    • A telephone number where we can reach you.
    • Which of the Vagn user groups you belong to, e.g misu, rossby, kthmech (unless it is obvious from your email address).
  3. We will then verify that you should be given access to Vagn by contacting your user group representative. When we get an OK from your user group representative we will create your Vagn account (usually within a working day) and inform you via email.
Vagn user group representatives (as of 2011-11-18)
Group            User group representative   Email
MISU             Laurent Brodeau             laurent@misu.su.se
Rossby           Michael Kolax               michael.kolax@smhi.se
SMHI FoUo        Anders Höglund              anders.hoglund@smhi.se
KTH Mechanics    Philipp Schlatter           pschlatt@mech.kth.se

3 Accessing Vagn

3.1 Logging in

Please read this guide on security on NSC systems. It shows how to keep your account and the rest of the system secure, and also how to use SSH logins as efficiently as possible (e.g not having to type your password all the time).

When you have received a username and a password from NSC, log in to Vagn (vagn.nsc.liu.se) using SSH. SSH client software is available for most operating systems (e.g PuTTY for Windows, OpenSSH for Linux/MacOS). Remember to tell your SSH client to use your Vagn username.

Example (using OpenSSH):
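A minimal sketch, with x_abcde as a placeholder for your own username:

$ ssh x_abcde@vagn.nsc.liu.se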

You can use any SSH client that supports the SSH protocol version 2 (all modern SSH clients should do this, e.g OpenSSH, PuTTY).

Note that the first time you log in you will need to change the temporary password we sent to you to a permanent one.

Choose a good password, and do not use that password for anything else except your Vagn account.

3.2 Getting data to and from Vagn

You can transfer files to and from Vagn using e.g SFTP, scp, rsync, FFV (basically any method that uses SSH to move data).

If you need to regularly transfer large files we recommend FFV.

3.2.1 scp

scp is a simple tool that is useful for copying a single file or a few files to or from a remote system. Example - copy a local file named local-file to your home directory on Vagn:
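A minimal sketch, with x_abcde as a placeholder username (the trailing colon means your home directory on Vagn):

$ scp local-file x_abcde@vagn.nsc.liu.se: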

3.2.2 sftp

sftp is an interactive file transfer program, similar to ftp. Example:
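A minimal sketch of a session, with x_abcde as a placeholder username and local-file/remote-file as placeholder file names:

$ sftp x_abcde@vagn.nsc.liu.se
sftp> put local-file
sftp> get remote-file
sftp> quit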

There are also graphical SFTP clients available.

3.2.3 rsync

rsync is a file copying tool that can be used both locally and over the network. Its main advantage over scp and sftp is that it handles copying of whole directory trees well, and that rsync transfers can easily be restarted without having to re-transfer data. Example - copy the directory tree named local-tree to Vagn:
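A minimal sketch, with x_abcde as a placeholder username:

$ rsync -av local-tree x_abcde@vagn.nsc.liu.se: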

3.2.4 FFV

FFV is an asynchronous file transfer tool for moving large amounts of data locally or remotely. It has been developed by NSC for the purpose of simplifying moving of data between Ekman and Vagn. See http://www.nsc.liu.se/~perl/ffv/ for more information about FFV.

For additional information about scp, sftp or rsync, see the respective man page. These programs are also available for Windows and MacOS (e.g PuTTY, FileZilla).

3.2.5 Swestore

If you need to transfer data to/from SweStore, please read the SNIC knowledge base docs.

3.2.6 Moving data to and from Ekman

3.3 Running graphical applications

Some applications on Vagn (e.g Matlab) have a graphical user interface. To be able to display windows from an application running on Vagn on your own computer, you need two things:

  1. X server software installed on your computer.
    • If you run Linux, this is already taken care of
    • If you run MacOS, you might need to install and start X11.app which is included in MacOS but not always installed.
    • If you run Windows, you need to find a third-party X server software, as this is not normally included in Windows. Ask your local system administrator.
  2. Enable X11 forwarding in your SSH client. This allows windows from Vagn to be displayed on your local computer. If you use OpenSSH this is done using the "-X" option to ssh, e.g "ssh -X username@vagn.nsc.liu.se".

4 Sharing Vagn with others

4.1 Using the login node

When you first log in to Vagn (vagn.nsc.liu.se), you reach the "login node" (hostname "analys1"). This is just a small part of Vagn: a single Linux server that serves as Vagn's connection to the outside world.

It is important to know that the login node is a resource that is shared with all other Vagn users, and if it is slow or crashes all Vagn users are affected. For this reason we do not allow you to run anything but the most essential things on the login node.

On the login node, you are permitted to:

  • Run file transfers to and from Vagn
  • Manage your files on Vagn (copy, edit, delete files etc)
  • Submit batch and interactive jobs (more about that later)
  • Run small applications if you are certain that they will not use large amounts of memory or CPU. As a guideline, anything using more than 1GB of RAM or that runs on more than one CPU core should probably not be run on the login node. If you are unsure, please contact the Vagn support team (vagnekman-support@snic.vr.se) and discuss if what you need to do is suitable for the login node.

Anything not permitted to run on the login node should be run on one or more of the analysis nodes as an interactive or batch job.

4.2 Batch queuing system

There are two main reasons why Vagn has a batch queuing system just like larger clusters.

  • Making it possible to automate data processing on a large scale using batch jobs.
  • Making it possible for many users to share the analysis nodes without interfering with each other (e.g by using up all memory on a node).

Vagn uses SLURM for scheduling (deciding who and what gets to use the system at any given time) and resource management (keeping track of nodes and allocated resources, starting and stopping jobs etc).

The Vagn job queue is configured as a FIFO queue: first-come, first-served.

4.3 Interactive jobs

An interactive job is what you use if you "just want to run an application". This is what happens under the hood when you use the "interactive" command:

  1. You run "interactive", usually with some extra options to use non-default settings, e.g to request more memory or more CPU cores.
  2. The scheduling system puts your request in the queue, waiting for resources (CPU, memory or a certain node type) to become available.
  3. You wait for the job to start (on Vagn, you rarely have to wait at all due to the low utilization of the system).
  4. The scheduling system starts your job on a suitable analysis node, and reserves the amount of memory and CPU cores you requested.
  5. You are automatically logged in to an analysis node and can start working.

If your interactive session has not started after 30 seconds, all resources on Vagn are probably already in use and you will have to wait in the queue. You can check the queue status by logging in to Vagn again in another window and using the "squeue" command.

Squeue has many options (see "man squeue" for details). If you want to see who is running on Vagn right now and how much resources they use, try e.g 'squeue -o "%.6i %.9P %.8j %.8u %.8a %.8T %.10L %.5D %.4C %.10m %R"'.

Example interactive session (here I reserve 2 CPU cores and 2 GB of RAM for 4 hours):
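A minimal sketch (the prompts, node name and program name are illustrative):

[x_abcde@analys1 ~]$ interactive -n 2 --mem=2000 -t 4:00:00
[x_abcde@a2 ~]$ ./my_analysis_program
[x_abcde@a2 ~]$ exit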

Remember to end your interactive session by typing "exit". When you do that, the resources you reserved are released and become available to other users.

Note: the "interactive" command takes the same options as "sbatch", so you can read the sbatch man page to find out all the options that can be used. The most common ones are:

  • "-t HH:MM:SS" - choose for how long you want to reserve resources. The default value is 12 hours and the maximum is six days. Choose a reasonable value! If everyone always uses six days, it becomes very difficult to estimate when new jobs can start, and if you forget to end your interactive session, resources will be unavailable to other users for up to six days.
  • "-n X" - reserve X CPU cores (note: this limit is not enforced, if your application uses more CPU cores than you have reserved, other users will suffer).
  • "--mem X" - reserve X megabytes of memory (note: this limit is enforced - your application will be killed if you use more than the amount of memory you requested).

4.4 Batch jobs

A batch job is a non-interactive (no user input is possible) job. What happens during the batch job is controlled by the job script (sometimes known as "submit script").

Preparing a batch job:

  1. Copy any needed input files to Vagn.
  2. Write the job script (some examples are included below)

Submitting a batch job:

  1. Load any modules needed to run your job (you can not use "module" in the job script). The environment in the shell where you run "sbatch" will be saved and recreated when starting the job. This includes the current working directory.
  2. Submit the job to the queue (e.g "sbatch myjob.sh")
    • Job options (e.g amount of memory reserved, number of CPU cores reserved, maximum wall time etc) can either be set in the job script (by adding "#SBATCH <options>" lines) or by giving the same options to sbatch. You can put options in both locations. If an option is present in both places, the sbatch option is used.
    • The environment (current directory, loaded modules, $PATH and other environment variables) is recorded by sbatch and will be restored when the job starts.
  3. The job is now in the queue. On Vagn, it will usually start fairly quickly.

Monitoring a batch job:

  • You can monitor all your jobs, both batch and interactive, using the "squeue" command (e.g "squeue -u $USER" to see your jobs).
  • If you want to cancel a queued or running job, use the "scancel" command (e.g "scancel 12345").
  • When the job has started, the standard output and standard error from the job script (which will contain output from your application if you have not redirected it elsewhere) will be written to a file named slurm-NNNNN.out in the directory where you submitted the job (NNNNN is replaced with the job ID).

What happens when a job starts?

  1. The environment (current working directory and environment variables such as $PATH) that was set when you submitted the job is recreated on the node where the job will be started.
  2. The job script starts executing on the first node allocated to the job. If you have requested more than one node, your job script is responsible for starting your processes on all nodes in the job, e.g by using srun, ssh or an MPI launcher.
  3. The job ends when your job script ends. All processes started by the job will be terminated if they are still running. The resources allocated to the job are now free to use for other jobs.
    • Note: if you run applications in the background ("application &") from your job script, you have to make sure that the job script does not end until all background applications have ended. This can be accomplished by adding a "wait" line to the script. The wait command will cause the script to stop executing on that line until all background applications have finished.
    • Note: if your job runs for longer than the time you requested ("sbatch -t HH:MM:SS"), the job will be killed automatically.
  4. You can now fetch the output files generated by your job.

Sample job script:
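A minimal sketch (the job name, resource requests and program name are placeholders):

#!/bin/bash
#SBATCH -J myjob
#SBATCH -t 4:00:00
#SBATCH -n 1
#SBATCH --mem=4000

# The environment (working directory, loaded modules, $PATH) from the
# submitting shell is recreated before this script runs.
./my_analysis_program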

Sample job script for running several background tasks within one job:
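A minimal sketch (the task names and resource requests are placeholders); note the "wait" line described above, which keeps the script alive until all background tasks have finished:

#!/bin/bash
#SBATCH -J mytasks
#SBATCH -t 4:00:00
#SBATCH -n 4
#SBATCH -N 1-1
#SBATCH --mem=8000

# Start four independent tasks in the background, then wait for all of them.
./task1 &
./task2 &
./task3 &
./task4 &
wait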

Sample job script for running an MPI application on two nodes using 16 cores:
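A minimal sketch (mpiprog.x is a placeholder for your MPI binary); mpprun picks up the number of ranks from the SLURM environment:

#!/bin/bash
#SBATCH -J mympijob
#SBATCH -t 4:00:00
#SBATCH -N 2
#SBATCH -n 16

mpprun ./mpiprog.x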

4.5 Logins outside the interactive/batch system

In order to allow you to monitor and debug running jobs, you can login to an analysis node directly from the login node provided that you have an active job running on that analysis node.

(If you try to login to an analysis node where you do not have a job running you will get the error message "Access denied: user x_XXXXX (uid=NNNN) has no active jobs".)

This feature is only intended for monitoring and debugging running jobs! Do not start any compute jobs from this type of "direct" login! If you do, you circumvent the normal limitations on job length, memory use etc, and you will likely cause problems for other users (e.g causing the node to run out of memory and stop working).

To use this feature, find out which analysis node your job is using (use e.g "squeue -u $USER"), then run e.g "ssh a2" from the login node to login to that analysis node. You can then use normal Unix tools like "top" and "ps" to monitor your job.

Hint: It is possible to run several terminals "inside" your interactive shell in a way that still stays inside the job. Since the interactive shell is implemented using "screen" (a terminal window multiplexer) you can use all screen features.

Some common screen commands (read "man screen" for more information):
Command      What it does
Ctrl-a c     Create a new terminal inside screen
Ctrl-a w     List the terminals inside this screen
Ctrl-a "     List the terminals inside this screen as a menu
Ctrl-a K     Close the current terminal
Ctrl-a n     Go to the next terminal
Ctrl-a A     Name the current terminal
Ctrl-a h     Write terminal contents to file ("screendump")
Ctrl-a H     Start/stop logging of terminal to file

4.6 Requesting the correct amount of resources

Please try to request approximately the number of CPU cores that your application will use. If you do not, your job may be started on a node where not enough CPU cores are available for all users, and everyone using that node will then see bad performance.

Note: If you use applications that might be threaded but where you do not know how many cores they will use, you can try starting them and then running "top" and checking the %CPU column for your processes. 100% CPU is equivalent to one full CPU core, 400% is four cores, etc. If you run applications that routinely use more than one CPU core, please request an appropriate number of cores when submitting the job.

To allocate more than one CPU core on a single node, use "-n <number of cores>" and "-N1-1" (to avoid getting cores spread out over more than one node), e.g:

"interactive --mem=24000 -n 4 -N1-1 -t 8:00:00" will give you four cores and 24GB RAM on one node for 8 hours.

The same options are used for batch jobs, e.g "sbatch --mem=24000 -t 8:00:00".

Also, please try to request a suitable amount of memory for your job, i.e a little more than you think you will need, but not so much that a lot of RAM will be unused.

All the options ("--mem", "-n", etc.) for interactive and sbatch are described in the man page for sbatch (run "man sbatch" on Vagn to read it).

4.7 The noshare partition, and why you should avoid using it

If you for some reason require a whole node dedicated to one job you can use the "noshare" partition. When using the noshare partition you MUST also request a particular node type (thin=32GB/8 cores, fat=64GB/8 cores, huge=256GB RAM/32 cores) using the -C option.

E.g: "interactive -p noshare -C thin -t 8:00:00" will give you exclusive access to one 32GB node for 8 hours.

Do not use the noshare partition unless you know that you really need a whole node for your job!

Note: using --mem instead of the noshare partition will ensure that your job is started as soon as possible. Why? There is room for 22 jobs submitted with "--mem=32000" on Vagn, but only four jobs submitted with "-p noshare -C thin" (as there are only four thin nodes). Both alternatives will give you 32GB RAM, but you might have to wait longer in the queue when using noshare.

4.8 Limitations on memory and core availability

If you request more than 32186 MB RAM your job can only be run on one of the fat nodes (a6, a7 or a8), which might result in you having to wait longer for the job to start.

If you request more than 64372 MB RAM your job can only be run on one of the new fat nodes (a7 or a8), which might result in you having to wait longer for the job to start.

If you request more than 8 cores on a single node your job can only be run on one of the new fat nodes, which might increase your queue time.

Unless you are planning to run a job over multiple nodes (e.g using MPI), you should never request more than 32 cores and 256000 MB RAM (the maximum that is available in a single fat node).

4.9 Running batch jobs without annoying other users

Since Vagn is used for both interactive and batch jobs, batch job users should be careful and not run so many concurrent jobs that it becomes hard for other users to start jobs.

Note: because the queue in Vagn is a true FIFO queue, one user can easily block the entire system for other users by submitting a large volume of jobs at once. Doing this is not acceptable! Vagn must be available for interactive use, i.e it is not acceptable for most users to have to wait for several hours to get an interactive shell.

If you want to submit a lot of jobs at once without disturbing other users, you can do so by limiting the number of concurrent jobs: use job naming and the "--dependency=singleton" feature.

This example submits 6 jobs but no more than 3 can run at any one time:
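A minimal sketch (the job script names are placeholders). Jobs sharing a job name (-J) run one at a time when --dependency=singleton is used, so three distinct names give at most three concurrent jobs:

$ sbatch -J worker1 --dependency=singleton job1.sh
$ sbatch -J worker2 --dependency=singleton job2.sh
$ sbatch -J worker3 --dependency=singleton job3.sh
$ sbatch -J worker1 --dependency=singleton job4.sh
$ sbatch -J worker2 --dependency=singleton job5.sh
$ sbatch -J worker3 --dependency=singleton job6.sh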

There are currently no hard limits on the number of jobs a single user can run or submit, please use common sense!

Common sense can be assisted by checking the queue status before submitting a large volume of jobs. To get an overview of running and queued jobs, use the "squeue" command with suitable options, e.g squeue -o "%.7i %.9P %.8u %.8T %.11L %.11l %.8N %.10m"

5 Building and running your own applications on Vagn

We recommend using the Intel compilers: ifort (Fortran), icc (C), and icpc (C++).

5.1 Compiling OpenMP Applications

Example: compiling the OpenMP-program, openmp.f with ifort:

$ ifort -openmp openmp.f

Example: compiling the OpenMP-program, openmp.c with icc:

$ icc -openmp openmp.c

5.2 Compiling MPI Applications

Before compiling an MPI application you should load an MPI module. We recommend OpenMPI, which is added to your environment with the command:

$ module add openmpi

Example: compiling the MPI-program, mpiprog.f with ifort:

$ ifort -Nmpi mpiprog.f 

Here mpiprog.f is your MPI Fortran source file.

Example: compiling the MPI-program, mpiprog.c with icc:

$ icc -Nmpi mpiprog.c

5.3 Compilers and NSC compiler wrapper

When invoking any of the Intel compilers (icc, ifort, or icpc), there is a wrapper-script that looks for Vagn-specific options. Options starting with -N are used by the wrapper to affect the compilation and/or linking processes, but these options are not passed to the compiler itself.

Vagn/NSC compiler wrapper options
Option       What it does
-Nhelp       Write wrapper help
-Nverbose    Let the wrapper be more verbose
-Nmkl        Make the compiler compile and link against the currently loaded MKL module
-Nmpi        Make the compiler compile and link against the currently loaded MPI module
-Nmixrpath   Make the compiler link a program built with both icc/icpc and ifort

Example:

$ module load mkl
$ ifort -Nverbose -Nmkl -o example example.F -lmkl_lapack -lmkl -lguide -lpthread
ifort INFO: Linking with MKL mkl/10.1.0.015.
ifort INFO: -Nmkl resolved to: -I/software/intel/cmkl/10.1.0.015/include 
-L/software/intel/cmkl/10.1.0.015/lib/em64t 
-Wl,--rpath,/software/intel/cmkl/10.1.0.015/lib/em64t

5.3.1 Useful Options for the Intel Compilers

Below is a short list of useful compiler options. The manual pages "man ifort" and "man icc" contain more details, and further information is also found on the Intel Compilers page.

Intel compiler optimization options
Option    What it does
-O0       Disable optimizations.
-O1, -O2  Enable optimizations (DEFAULT).
-O3       Enable -O2 plus more aggressive optimizations that may not improve performance for all programs.
-ip       Enables interprocedural optimizations for single file compilation.
-ipo      Enables multifile interprocedural (IP) optimizations (between files). Hint: If your build process uses ar to create .a archives, you need to use xiar (Intel's implementation) instead of the system's /usr/bin/ar for an IPO build to work.

The recommended optimization settings to build binaries that work on all Vagn nodes are "-O2 -ip -xW".

Intel compiler debugging options
Option      What it does
-g          Generate symbolic debug information
-traceback  Generate extra information in the object file to allow the display of source file traceback information at run time when a severe error occurs.
-fpe<n>     Specifies floating-point exception handling at run-time
-mp         Maintains floating-point precision (while disabling some optimizations).

Intel compiler profiling options
Option      What it does
-p          Compile and link for function profiling with the UNIX gprof tool.

Intel compiler options that only apply to Fortran programs
Option           What it does
-assume byterec  Specifies (for unformatted data files) that the units for the OPEN statement RECL specifier (record length) value are in bytes, not longwords (four-byte units). For formatted files, the RECL unit is always in bytes.
-r8              Set default size of REAL to 8 bytes.
-i8              Set default size of integer variables to 8 bytes.
-zero            Implicitly initialize all data to zero.
-save            Save variables (static allocation) except local variables within a recursive routine; opposite of -auto.
-CB              Performs run-time checks on whether array subscript and substring references are within declared bounds.

Little endian to Big endian conversion in Fortran is done through the F_UFMTENDIAN environment variable. When set, the following operations are done:

  • The WRITE operation converts little endian format to big endian format.
  • The READ operation converts big endian format to little endian format.
F_UFMTENDIAN settings
Setting                        What it does
F_UFMTENDIAN=big               Convert all files.
F_UFMTENDIAN="big;little:8"    All files except those connected to unit 8 are converted.

5.4 Math libraries

5.4.1 MKL, Intel Math Kernel Library

The Intel Math Kernel Library (MKL) is available, and we strongly recommend using it. Several versions of MKL may exist, you can see which versions are available with the "module avail" command. The library includes the following groups of routines:

  • Basic Linear Algebra Subprograms (BLAS):
    • vector operations
    • matrix-vector operations
    • matrix-matrix operations
  • Sparse BLAS (basic vector operations on sparse vectors)
  • Fast Fourier transform routines (with Fortran and C interfaces). There exist wrappers for FFTW 2.x and FFTW 3.x compatibility.
  • LAPACK routines for solving systems of linear equations
  • LAPACK routines for solving least-squares problems, eigenvalue and singular value problems, and Sylvester's equations
  • ScaLAPACK routines including a distributed memory version of BLAS (PBLAS or Parallel BLAS) and a set of Basic Linear Algebra Communication Subprograms (BLACS) for inter-processor communication.
  • Vector Mathematical Library (VML) functions for computing core mathematical functions on vector arguments (with Fortran and C interfaces).

Full documentation can be found online at http://www.intel.com/software/products/mkl/ and in ${MKL_ROOT}/doc on Vagn.

  • Library structure
    The Intel MKL is located in the /software/intel/mkl directory. The MKL consists of two parts: a linear algebra package and processor-specific kernels. The former part contains LAPACK and ScaLAPACK routines and drivers that were optimized without regard to processor, so that they can be used effectively on different processors. The latter part contains processor-specific kernels such as BLAS, FFT, BLACS, and VML that were optimized for the specific processor.
  • Linking with MKL
    To use LAPACK and BLAS software you must link two libraries: MKL LAPACK and the threaded or sequential kernel. The required MKL-path is automatically added by the compiler wrapper if the option -Nmkl is added, and the appropriate MKL-module is loaded.

    This table lists the most common MKL link options. See the following chapter for examples.

    MKL options
    Option                                        What it does
    -Nmkl                                         Add required paths corresponding to the loaded MKL module.
    -lmkl_lapack                                  Use MKL LAPACK and BLAS
    -lmkl -lguide -lpthread                       Use threaded MKL
    -lmkl_intel_lp64 -lmkl_sequential -lmkl_core  Use sequential MKL (see next chapter).
  • MKL and threading
    The MKL is threaded by default, but there is also a non-threaded "sequential" version available. (The instructions here are valid for MKL 10.0 and newer, older versions worked differently.)

    Whether threaded or sequential MKL gives the best performance varies between applications. MPI applications will typically launch one MPI rank on each processor core on each node; in this case threads are not needed, as all cores are already used. However, if you use threaded MKL you can start fewer ranks per node and increase the number of threads per rank accordingly.

    The threading of MKL can be controlled at run time through the use of a few special environment variables.

    • OMP_NUM_THREADS controls how many OpenMP threads should be started by default. This variable affects all OpenMP programs including the MKL library.
    • MKL_NUM_THREADS controls how many threads MKL-routines should spawn by default. This variable affects only the MKL library, and takes precedence over any OMP_NUM_THREADS setting.
    • MKL_DOMAIN_NUM_THREADS let the user control individual parts of the MKL library. E.g. MKL_DOMAIN_NUM_THREADS="MKL_ALL=1;MKL_BLAS=2;MKL_FFT=4" would instruct MKL to use one thread by default, two threads for BLAS calculations, and four threads for FFT routines. MKL_DOMAIN_NUM_THREADS also takes precedence over OMP_NUM_THREADS.

    If the OpenMP environment variable controlling the number of threads is unset when launching an MPI application with mpprun, mpprun will by default set OMP_NUM_THREADS=1.

  • Example, dynamic linking using ifort and lapack

    Use MKL LAPACK and threaded MKL:

    $ module load mkl
    $ ifort -Nmkl -o example example.o -lmkl_lapack -lmkl -lguide -pthread
    ifort INFO: Linking with MKL mkl/10.0.2.018.
    

    Use MKL LAPACK and sequential MKL:

    $ module load mkl
    $ ifort -Nmkl -o example example.o -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
    ifort INFO: Linking with MKL mkl/10.0.2.018.
    

    Example, linking with MKL ScaLAPACK and OpenMPI. ScaLAPACK depends on BLACS, LAPACK, and BLAS (in that order), where the BLACS library also depends on an underlying MPI. Therefore, it is important to choose the correct combination of libraries in the right order when linking a program with ScaLAPACK. MKL is shipped with BLACS libraries compiled for OpenMPI and IntelMPI (the latter is not installed on Vagn). To link a program with ScaLAPACK and OpenMPI:

    $ module load mkl
    $ module load openmpi
    $ ifort -Nmkl -Nmpi -o my_binary my_code.f90 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 \
    -lmkl_lapack -lmkl -lguide -lpthread 
    ifort INFO: Linking with MPI openmpi/1.2.5-i101011.
    ifort INFO: Linking with MKL mkl/10.0.2.018.
    

    Example, linking with ScaLAPACK: alternatives to MKL and OpenMPI. By default we would recommend using the above combination (OpenMPI + MKL), but there are alternatives. It so happens that both mvapich2 and IntelMPI are derived from the same code base (mpich2), and mvapich2 can (usually) be used as a drop-in replacement for IntelMPI. Compared to the OpenMPI+MKL example above, use blacs_intelmpi instead of blacs_openmpi. I.e.:

    $ module load mkl
    $ module load mvapich2
    $ ifort -Nmkl -Nmpi -o my_binary my_code.f90 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 \
    -lmkl_lapack -lmkl -lguide -lpthread 
    ifort INFO: Linking with MPI mvapich2/1.0.2-i101011.
    ifort INFO: Linking with MKL mkl/10.0.2.018.
    

    It is also possible to use ScaliMPI by using the "vanilla" netlib ScaLAPACK and BLACS, and linking them against your LAPACK/BLAS of choice. If your choice of LAPACK/BLAS is MKL (generally the best choice):

    $ module load mkl
    $ module load openmpi
    $ sppath=/software/libs/scalapack/1.8.0/i101011
    $ blpath=/software/libs/BLACS/i101011/LIB-scamp
    $ ifort -Nmkl -Nmpi -o my_binary my_code.f90 $sppath/libscalapack.a \
    $blpath/blacsF77init_MPI-Vagn-0.a $blpath/blacs_MPI-Vagn-0.a \
    -lmkl_lapack -lmkl -lguide -lpthread
    

5.5 Executing Parallel Jobs

There are two main alternatives for developing program code that can be executed on multiple processor cores: OpenMP and MPI. OpenMP parallelization can be used for parallelization of code that is to run within a single node (with up to 8 cores), whereas MPI is used for parallelization of code that can run on single as well as multiple nodes. The two types of applications are executed differently.

5.5.1 Executing an MPI application

An MPI application is started with the command:

$ mpprun mpiprog.x

Use "mpprun --help" to get a list of options and a brief description.

Note:

  • mpprun has to be started from a SLURM job. Either write a batch script and submit it with sbatch, or start an interactive shell using the command interactive.
  • mpprun will launch a number of ranks determined from the SLURM environment variables.
  • mpprun requires an MPI binary built according to NSC recommendations in order to automatically choose the correct MPI implementation.
  • In order to explicitly choose an MPI implementation to use, invoke mpprun with the flag "--force-mpi=<MPI module>"

5.5.2 Executing an OpenMP application

The number of threads to be used by the application must be defined, and should be less than or equal to eight. You can set the number of threads in two ways: either by defining a shell environment variable before starting the application, or by calling an OpenMP library routine in the serial portion of the code.

Environment variable:

     export OMP_NUM_THREADS=N
     time openmp.x

Library routine in Fortran:

SUBROUTINE OMP_SET_NUM_THREADS(scalar_integer_expression)

Library routine in C/C++:

#include <omp.h>
void omp_set_num_threads(int num_threads)

Note: The maximum number of threads can be queried in your application by use of an external integer function.

In Fortran:

INTEGER FUNCTION OMP_GET_MAX_THREADS()

In C/C++:

#include <omp.h>
int omp_get_max_threads(void)

6 Using installed applications

6.1 Modules

We use cmod (module) to handle the environment when several versions of the same software are installed. This application sets up the correct paths to the binaries, man pages, libraries, etc. for the currently selected module.

The correct environment is set up by using the module command.

Module usage
Command                   What it does
module                    lists the available subcommands
module list               lists currently loaded modules
module avail              lists available modules
module load example       loads the environment specified in the module named example
module unload example     unloads the environment specified in the module named example

A default environment, with a set of default modules, is automatically set up when you log in.

In order to find out which version of the compiler the module ifort refers to, you can list all modules:
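$ module avail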

The note "(def)" indicates which version is the default; in the case of the Fortran compiler, it is thus version 10.1.017. Please note, however, that the choice of default module may change over time. Therefore, if you wish to re-compile part of a program and link a new executable, you may need to ensure that you are using the same version of the compiler that you used for the first build. You can switch to another version of the compiler as follows:
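A minimal sketch (the version number is a placeholder; use "module avail" to see which versions are actually installed):

$ module unload ifort
$ module load ifort/10.1.015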

If you want to know exactly what a module does when it is loaded, you can read the definition in the files located under /etc/cmod/modulefiles.

6.2 Useful environment variables for your jobs

Useful environment variables available in the job environment (interactive and batch):

$SNIC_BACKUP - The user's primary directory at the centre (the part of the Centre Storage that is backed up). Default value: /home/$USER
$SNIC_SITE - At what SNIC site am I running? Default value: nsc
$SNIC_RESOURCE - What resource am I using at this SNIC site? Default value: vagn
$SLURM_NODELIST - Nodes allocated to this job
$SLURM_JOBID - The job ID of this job

There are many other $SLURM_* variables, see the man page for sbatch for details.

6.3 Matlab

Please do not run Matlab on the login node, as it can use quite a lot of memory and CPU.

The matlab module is loaded by default, so the command "matlab" is always available.

If you want to load netcdf files from within Matlab, you need to load the netcdf module before starting matlab.

Example of running Matlab interactively:
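A minimal sketch (the prompts, memory and time values are illustrative; the netcdf module is only needed if you want to read netcdf files):

[x_abcde@analys1 ~]$ module load netcdf
[x_abcde@analys1 ~]$ interactive --mem=8000 -t 4:00:00
[x_abcde@a2 ~]$ matlab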

6.4 ParaView

Example for using ParaView on a Vagn analysis node, connecting to the GUI running on port 11111 on your local computer:

  1. Set up an SSH tunnel that forwards port 44455 on Vagn (please choose your own random port) to port 11111 on your local computer. On your local computer, run e.g "ssh -R a1:44455:localhost:11111 x_username@vagn.nsc.liu.se"
  2. Start an interactive session, e.g "interactive -N 1 --mem=8000 -t 01:00:00"
  3. On the analysis node, run e.g "pvserver --rc --use-offscreen-rendering --client-host=a1 --server-port=44455"

7 Installing software

Installing software yourself (e.g in your home directory) is generally permitted. Use common sense, e.g don't install software from untrusted sources. Also, do not start services that can introduce security problems. If you are unsure about an application, please ask vagnekman-support@snic.vr.se before installing.

Software that is available in CentOS or the EPEL package repository can generally be installed quickly, just send an email to vagnekman-support@snic.vr.se and we will in most cases grant your request.

Non-packaged software that will be used by only one or a few users should be installed and maintained by that user. If needed, NSC will assist you when installing/building software, but we prefer that applications with few users are maintained by those users. We do not have the resources to build and maintain every application someone wants to use on Vagn.

Non-packaged software that is of interest to many users and where NSC has the skills needed to actively maintain the software can be installed and maintained by NSC. Examples: ParaView, cdo, hdf5, netcdf, …

8 Storage

Users have access to several different file system types on Vagn.

File system types on Vagn
Mount point: /home
  Use for: Important data
  Size: 7 TB
  Comment: Each user has their own directory under /home. Backed up to tape every night.

Mount point: /nobackup/vagn1
  Use for: Less important data or data that can be restored by re-running calculations
  Size: 94 TB
  Comment: NOT backed up.

Mount point: /nobackup/vagn2
  Use for: Less important data or data that can be restored by re-running calculations
  Size: 408 TB
  Comment: NOT backed up.

Mount point: /scratch/local
  Use for: Local scratch data during the running of a job
  Size: 890 GB
  Comment: NOT backed up. This file system is located on the local disk of each analysis node and is not shared between nodes. Files on this file system can be removed without warning (but not while you are running a job on that node), e.g if the analysis node is reinstalled or if the disk becomes full.

Mount point: /software
  Use for: Software provided by NSC (not writable by users)
  Size: 35 GB
  Comment: This file system contains software provided by NSC and is not writable by users.

Mount point: Anything under /nobackup not listed above, e.g /nobackup/rossby15
  Use for: Follow SMHI guidelines
  Size: varies
  Comment: These are SMHI file systems from Gimle that are available on Vagn for convenience. Please note that performance on these file systems will typically be lower than on the local Vagn file systems (/nobackup/vagn1, /nobackup/vagn2 and /home).

How much data you can store on /home, /nobackup/vagn1 and /nobackup/vagn2 is limited by quotas.

On /home space is limited by a per-user quota (currently 200GiB).

On vagn1 and vagn2 there are group quotas in place that limit how much data any one user group can store. If the group quota is full, the group needs to decide who will delete or move files. Since all available disk space on Vagn is already split up between the groups, there is no point in asking NSC for more space - there is no more space to give out. If the group quota on vagn1 or vagn2 is full, please discuss this with your user group representative.

On /scratch/local there are no quotas, but remember that you share that disk with all other users on that node, and that files that you put there can and will be removed without warning when you are not running jobs on that analysis node.

Some of the SMHI file systems have quotas, some don't. Please see the Gimle User Guide or ask the NSC SMHI support group about these file systems.

9 Publishing Vagn data to non-Vagn users

Vagn is connected to the SMHI Publisher system, which allows Vagn users to copy data to a publishing server, from where it can be downloaded by users without the need for a Vagn account.

Publisher is an SMHI-funded system, but non-SMHI Vagn users may use the system for temporary file transfers. This service to non-SMHI Vagn users may be terminated if it causes problems for SMHI users.

Please read the Publisher User Guide for more information.

10 How to get help

You can contact the Vagn/Ekman support team using the email address vagnekman-support@snic.vr.se. You can use this address for anything related to Vagn, e.g

  • Asking a question
  • Telling us that something is wrong
  • Start a discussion regarding some long-term issue or future needs

Please include the following information:

  • a relevant subject line (e.g "I cannot start Matlab on Vagn")
  • your Vagn username
  • which computer/cluster the problem is related to (we need to know if your problem is on Vagn or Ekman)
  • which software you are using, including compilers (for example "ifort 9.0.032") and switches (for example "-apo")
  • a short description of the problem, specifying what actions you have performed, which results you got, and which results you expected to get.
  • for a communication problem, please include details of your own computer and network
  • if you have more than one separate problem, please send one email for each problem

You may use English or Swedish; we will try to reply in the same language. Please note that as some of our staff are not fluent in Swedish, you may sometimes get an answer in English regardless of the language of your original question.

We read email to the support address during normal office hours (approximately 08-17 local Swedish time: CET/CEST). We try to always give you some kind of answer (not counting the automated one) within two working days (but you will usually hear from us sooner than that).

You will get an automated reply from our support ticket system within a few minutes. If you want to add more information about your problem, reply to that email, that way the additional information will automatically be added to our database.

When you have a new question, please send a new email, do not reply to an old conversation. A reply to an old email might only reach the person who handled that problem, and that person could be busy, on leave etc. Sending a new email ensures that your request is seen by all support staff as soon as possible.





