Mozart User Guide

Please note that this documentation is being updated after a system reinstallation. Not all details will be accurate until the update is complete.

Description

Mozart is a Linux-based SGI Altix 3700 Bx2 supercomputer that provides a large global shared memory. The resource is intended for Swedish academic users and is equipped with application software that reflects the needs of the Swedish research community in the natural sciences. Please take your time to learn more about Mozart from the information below, and see whether the computer can expand the scope of your scientific work.

Hardware

Processors     64 × Intel Itanium 2, 1.6 GHz, 6 MB on-chip cache
Interconnect   High-bandwidth NUMAlink
Memory         512 GiB

Software

Operating System:

Debian Lenny (5.0)

Resource Manager:

SLURM

Scheduler:

SLURM

Math library:

SGI's Scientific Computing Software Library (SCSL)

Intel Math Kernel Library 10.2 (recommended)

MPI:

SGI's Message Passing Toolkit (MPT)

Applications (link)

Compilers (link)

General information

  • Status information for Mozart is available [here].
  • Use the batch queue system for running jobs.
  • As scratch directory, please use /scratch/$USER and remove scratch files after job completion.
  • Backup of /home is made every night.
  • If you use bash, do not remove the default entries in your ~/.bashrc.
  • For questions, contact support@nsc.liu.se.

Quick guide

  1. Log in to Mozart using ssh:
    ssh username@mozart.nsc.liu.se
  2. Once logged in, enable e-mail forwarding to your real e-mail address (this only has to be done once):
           echo youremail@xxx.yyy.se > ~/.forward
  3. Compile an MPI application using the preferred Intel compiler (more details):
    ifort mpiprog.f -Nmpi
  4. Run the application as a batch job:
    1. Create a submit script. This file specifies which project the job should be accounted on, how many processors you wish to use, how long you expect the job to run, how to start the application, etc. A minimal example is shown after this list.
    2. Submit the job.
      sbatch script.sh
    3. When the job is finished, an e-mail will be sent to the e-mail address specified in ~/.forward.
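
A minimal submit script might look like this (a sketch; the project name, processor count, and time limit are illustrative; see Submitting jobs below for details):

#!/bin/sh
# Project to account the job on (illustrative name)
#SBATCH -A SNIC005-06-98
# Number of processors and maximum wall-clock time
#SBATCH -n 8
#SBATCH -t 01:00:00

# Start the MPI application on the allocated processors
mpprun ./mpiprog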

Accessing the system

Log in to Mozart with ssh:

ssh username@mozart.nsc.liu.se

File transfer is available using scp and sftp:

scp ./example user@mozart.nsc.liu.se:~/documents
sftp user@mozart.nsc.liu.se
Connecting to mozart.nsc.liu.se...
sftp>

Security

When a system is compromised and passwords are stolen, the greatest damage is done when a stolen password can be used on more than one system. A user who has accounts on many different computers and whose shared password is stolen allows intruders to easily cross administrative domains and further compromise other systems.

  • DO NOT use a trivial password based on your name, account, dog's name, etc.
  • DO NOT share passwords between different systems.

Logging in to one system and then continuing from that system to a third should be avoided.


When logging into a system, read the “last login” information. If you can't verify the information, contact support@nsc.liu.se as soon as possible.

Checklist:

  • Use different passwords for different systems.

  • Do not use weak passwords.

  • Avoid chains of ssh sessions.

  • Check: “Last login: DATE from MACHINE”

SSH public-key authentication

There is an alternative to traditional passwords. This method of authentication is known as key-pair or public-key authentication. While a password is simple to understand (the secret is in your head until you give it to the ssh server which grants or denies access), a key-pair is somewhat more complicated.

Our recommendation is to use whichever method you feel comfortable with. If you invest some time to learn about key-pairs you will receive several benefits, including better security and easier work flow.

A key-pair is, as the name suggests, a pair of cryptographic keys. One of the keys is the private key (this one should be kept secure and protected with a pass phrase); the other is the public key (this one can be passed around freely, as the name suggests).

After you have created the pair, you have to copy the public key to all systems that you want to ssh to. The private key is kept as secure as possible and protected with a good pass phrase. On your laptop/workstation you use a key-agent to hold the private key while you work.

  • Can be much more secure than regular password authentication

  • Can be less secure if used incorrectly (understand before use)

  • Allows multiple logins without reentering password/pass phrase

  • Allows safer use of ssh chains

How to use SSH public-key authentication instead of regular password authentication is described in chapter 4 of SSH tips, tricks & protocol tutorial by Damien Miller.

A short description of the necessary steps involved in using SSH public-key authentication (read Damien Miller's guide above for more details; a command sketch follows the list):

  • Generate a key-pair, choose a good pass phrase and make sure private key is secure (once).

  • Put your public key into ~/.ssh/authorized_keys on desired systems.

  • Load your private key into your key-agent (ssh-add with OpenSSH).

  • Run ssh as much as you want without reentering your pass phrase, and without the risk of anyone stealing your password.
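
With OpenSSH, the steps above might look like this (a minimal sketch; the key type and host are examples):

# Generate a key-pair; choose a good pass phrase when prompted (once)
ssh-keygen -t rsa -b 4096

# Append the public key to ~/.ssh/authorized_keys on the remote system
ssh-copy-id username@mozart.nsc.liu.se

# Load the private key into your key-agent
ssh-add

# Log in without retyping the pass phrase while the agent holds the key
ssh username@mozart.nsc.liu.se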

Storage

Users have access to several file systems on Mozart:

/home              2 TB     Backed up.
/scratch           3.5 TB   Not backed up.
/nobackup/global   2.5 TB   Not backed up.

  • Use the /home/$USER directory for storage of important documents, not scratch files.
  • Use the /scratch/$USER directory for temporary storage, and delete your scratch files after job completion. Automatic cleanup of files in /scratch is performed: files that have not been modified within the last 14 days are deleted. A per-job scratch pattern is sketched after this list.
  • Use the /nobackup/global/$USER directory for large long-term storage. From a technical point of view it is a relatively robust and safe disk, but it is a slow file system and should not be used as a scratch area for your calculations.
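
In a submit script, a convenient pattern is a per-job scratch directory that is removed when the job is done (a sketch; $SLURM_JOB_ID is set by SLURM for each job, and the file names are illustrative):

#!/bin/sh
#SBATCH -n 8
#SBATCH -t 01:00:00

# Create a private scratch directory for this job
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p $SCRATCHDIR
cd $SCRATCHDIR

# Run the application with its temporary files on scratch
mpprun $HOME/bin/mpiprog

# Copy the results home and clean up
cp results.dat $HOME/
rm -rf $SCRATCHDIR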

Environment

We use something called cmod (or module) to handle the environment when several versions of the same software are installed. This application sets up the correct paths to the binaries, man-pages, libraries, etc. for the currently selected module.

The correct environment is set up with the module command. Here is a list of the most useful arguments to module:


module                  lists the available arguments
module list             lists currently loaded modules
module avail            lists the modules available for use
module load example     loads the environment specified in the module named example
module unload example   unloads the environment specified in the module named example

A default environment is automatically declared when you log in. The default modules are:


Intel 11.1
mkl 10.2

For example, with Intel MKL for Linux versions 10.2 and 9.1.023 installed, you can switch from 10.2 to 9.1.023 as follows:

andjo@mozart:~$ module avail

In directory /etc/cmod/modulefiles:

  +default                       -mkl/10.2 (def)              
  +dotmodules                    -mkl/10.2.4.032              
  -gaussian/default              -mkl/9.1.023                 
  -gaussian/g03.E01              -mkl/default                 
  -gaussian/g09.A02 (def)        -nsc/default                 
  -icc/11.1.069 (def)            -openmpi/1.4.1-i111069 (def) 
  -icc/default                   -openmpi/default             
  -idb/11.1.069 (def)            -root                        
  -idb/default                   -snic/default                
  -ifort/11.1.069 (def)          +snic/mozart                 
  -ifort/default                 -snic/strauss                
  -mkl/10.1.1.019              
andjo@mozart:~$ module list
Currently loaded modules:
  1) mkl
  2) ifort
  3) icc
  4) idb
  5) nsc
  6) snic/mozart
  7) snic
  8) dotmodules
  9) default
andjo@mozart:~$ module unload icc
andjo@mozart:~$ module unload mkl
andjo@mozart:~$ module load mkl/9.1.023
andjo@mozart:~$ module list
Currently loaded modules:
  1) ifort
  2) idb
  3) nsc
  4) snic/mozart
  5) snic
  6) dotmodules
  7) default
  8) mkl/9.1.023
andjo@mozart:~>

Tip: The environment is specified in the files located under /etc/cmod/modulefiles.

Resource Name Environment Variable

If you are using several NSC resources and copying scripts between them, it can be useful for a script to have a way of knowing what resource it is running on. You can use the NSC_RESOURCE_NAME variable for that:

username@mozart:~> echo "Running on $NSC_RESOURCE_NAME"
Running on mozart
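
For example, a submit script copied between resources can branch on the name (a sketch; the directory choices are illustrative):

# Pick a scratch area depending on which NSC resource the script runs on
case "$NSC_RESOURCE_NAME" in
    mozart) SCRATCHBASE=/scratch/$USER ;;
    *)      SCRATCHBASE=/tmp/$USER ;;
esac
mkdir -p $SCRATCHBASE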

Compiling

For a quick introduction on how to compile and run jobs, see the Quick guide.

Check Software for installed compilers and versions. Generally, the following compilers are available:


          Intel    GNU Compiler
C         icc      gcc
C++       icc      g++
Fortran   ifort    gfortran

We recommend using the Intel compilers.

Compiling OpenMP applications

Example: compiling the OpenMP program openmp.f with ifort:

ifort -openmp openmp.f

Example: compiling the OpenMP program openmp.c with icc:

icc -openmp openmp.c

Compiling MPI applications

Example: compiling the MPI program mpiprog.f with ifort:

ifort mpiprog.f -Nmpi

Example: compiling the MPI program mpiprog.c with icc:

icc mpiprog.c -Nmpi

Intel compiler, useful compiler options

Below is a short list of useful compiler options.
The manual pages "man ifort" and "man icc" contain more details, and further information is also found at the Intel homepage [here].

(a) Optimization
There are three different optimization levels in Intel's compilers:


-O0

Disable optimizations.


-O1,-O2 

Enable optimizations (DEFAULT).


-O3

Enable -O2 plus more aggressive optimizations that may not improve performance for all programs.

A recommended flag for general code is -O2, and for best performance -O3 -ip. As always, however, aggressive optimization runs a higher risk of encountering compiler limitations.
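
For example, to build a Fortran program first with the recommended general settings and then with the more aggressive ones (the file name is illustrative):

ifort -O2 myprog.f -o myprog
ifort -O3 -ip myprog.f -o myprog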

(b) Debugging


-g

Generate symbolic debug information.


-traceback

Generate extra information in the object file to allow the display of source file traceback information at runtime when a severe error occurs.


-fpe<n>

Specifies floating-point exception handling at run-time.


-mp

Maintains floating-point precision (while disabling some optimizations).

(c) Profiling


-p

Compile and link for function profiling with UNIX gprof tool.

(d) Options that only apply to Fortran programs


-assume byterecl

Specifies (for unformatted data files) that the units for the OPEN statement RECL specifier (record length) value are in bytes, not longwords (four-byte units). For formatted files, the RECL unit is always in bytes.


-r8

Set default size of REAL to 8 bytes.


-i8

Set default size of integer variables to 8 bytes.


-zero 

Implicitly initialize all data to zero.


-save

Save variables (static allocation) except local variables within a recursive routine; opposite of -auto.


-CB

Performs run-time checks on whether array subscript and substring references are within declared bounds.
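
A debugging build that combines several of these options might look like this (a sketch; the file name is illustrative):

ifort -g -traceback -CB -fpe0 myprog.f -o myprog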

(e) Miscellaneous
Little-endian to big-endian conversion in Fortran is done through the F_UFMTENDIAN environment variable. When it is set, the following operations are performed:

  • The WRITE operation converts little endian format to big endian format.
  • The READ operation converts big endian format to little endian format.


F_UFMTENDIAN=big

Convert all files.


F_UFMTENDIAN ="big;little:8" 

All files except those connected to unit 8 are converted.
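
For example, set the variable in the shell before running the program (a sketch; the program name is illustrative):

export F_UFMTENDIAN="big;little:8"   # convert all units except unit 8
./myprog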

Math libraries

Intel Math Kernel Library

MKL versions 10.2 (default), 10.1, and 9.1 are installed.

For complementary information, see the technical user notes for the Intel® Math Kernel Library for Linux [here].

The Math Kernel Library includes the following groups of routines:

  • Basic Linear Algebra Subprograms (BLAS):

    • vector operations

    • matrix-vector operations

    • matrix-matrix operations

  • Sparse BLAS (basic vector operations on sparse vectors)

  • Fast Fourier transform routines (with Fortran and C interfaces)

  • LAPACK routines for solving systems of linear equations

  • LAPACK routines for solving least-squares problems, eigenvalue and singular value problems, and Sylvester's equations

  • Vector Mathematical Library (VML) functions for computing core mathematical functions on vector arguments (with Fortran and C interfaces).

Full documentation can be found at http://www.intel.com/software/products/mkl/

Directory structure
MKL is located in $MKL_ROOT, defined at login. Semantically, MKL consists of two parts: LAPACK and processor-specific kernels. The LAPACK library contains LAPACK routines and drivers that were optimized without regard to processor type, so that they can be used effectively on different processors. The processor-specific kernels contain BLAS, FFT, CBLAS, and VML routines that were optimized for the specific processor. When linking dynamically to MKL, threading software is supplied as a separate dynamic link library, libguide.so.
Linking with MKL
To use the LAPACK and BLAS software you must link two libraries: LAPACK and one of the processor-specific kernels (i.e. libmkl). Please use -L$MKL_ROOT instead of hardcoding the path; this ensures that the correct libraries are used when switching between different mkl modules.

Example:


ld myprog.o -L$MKL_ROOT -lmkl_lapack -lmkl

Example (Dynamic linking using ifort):


ifort -L$MKL_ROOT -o example example.o  -lmkl_lapack -lmkl

SGI's Scientific Computing Software Library (SCSL)

We no longer recommend the use of SGI's mathematical library routines known as SCSL; they are mostly provided for use by old binaries. The libraries are located at /usr/lib and are included with the link command:

ifort -L/usr/local/scsl/default/lib myprog.f -lscs -lsdsm

The routines are callable from Fortran, C, and C++ programs.

The SCSL library contains:

  • BLAS
  • LAPACK
  • Signal processing library with FFTs.
  • Sparse direct solvers.
  • Sparse iterative solvers.

Executing parallel jobs

There are two main alternatives for developing program code that can execute on multiple shared-memory CPUs, namely OpenMP and MPI, and the two types of applications are executed differently.

Executing an OpenMP application

The number of threads to be used by the application must be defined, and should be less than or equal to the number of CPUs that you allocate in your job submission. You can set the number of threads in two ways: either by defining a shell environment variable before starting the application, or by calling an OpenMP library routine in the serial portion of the code.

  1. Environment variable:
    export OMP_NUM_THREADS=N
    time openmp.x
    
  2. Library routine:

    In Fortran:

    SUBROUTINE OMP_SET_NUM_THREADS(scalar_integer_expression)
    
    In C/C++:
    #include <omp.h>
    void omp_set_num_threads(int num_threads)
    
Note:
  • If not defined, the number of threads is set to 1 on Mozart (this is system dependent).
  • If the library routine is used, the call must be made in the serial portion of the code.
  • The maximum number of threads can be queried in your application with the external integer function:

    In Fortran:

    INTEGER FUNCTION OMP_GET_MAX_THREADS()
    
    In C/C++:
    #include <omp.h>
    int omp_get_max_threads(void)
    

Executing an MPI application

An MPI application is executed on N CPUs with the command mpprun in a submit script. Here is an example script:

#! /bin/sh

#SBATCH -n 2
#SBATCH -t 00:30:00

mpprun mpiprog

The script above, when submitted with sbatch, will run an MPI application on two ranks with a maximum walltime limit of 30 minutes.
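
Submitting the script and checking its status might look like this (the script name is illustrative):

sbatch script.sh
squeue -u $USER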

Submitting jobs

There are two ways to submit jobs to the batch queue system: as an interactive job or as a batch job. Interactive jobs are most useful for debugging, as you get interactive access to the input and output of the job while it is running. The normal way to run applications, however, is to submit them as batch jobs.

sbatch is the command for submitting batch jobs.

Important: sbatch takes a submit script as a parameter, not standard input or a binary application. The submit script must end its lines with a newline (\n), as is the default on Unix platforms. On Windows, line endings are a combination of a carriage return (\r) and a newline (\n); this will not work with sbatch.

Examples of useful arguments to sbatch (read the man page for additional arguments and details):


-A account_string

The project the job should be accounted on.

The account should be specified as given by the command projinfo. That is:

For large and medium-scale projects, all blanks (" ") should be removed and all "/" replaced with "-". For example, to account on the SNAC project "SNIC 005/06-98", the string "SNIC005-06-98" should be used.

For small-scale projects, the letter "p" should be added in front of the project number. For example, to account on the project "2006599", the string "p2006599" should be used.

Users who are members of only a single project with a valid allocation may omit this argument.


-o filename, -e filename

File names for standard output and standard error. By default, SLURM joins standard output and standard error in the same file (slurm-<jobid>.out); use these options to redirect them separately.


-n N

The number of processors to run the job on, where N is an integer in the interval [1,64].


-t hh:mm:ss

The expected maximum execution time for the job.


--mail-type=ALL

Send mail to the local user@mozart.nsc.liu.se when the job changes state. If not specified, no mail is sent.


--mail-user email1@ifm.liu.se,email2@other.mail.se

List of e-mail addresses to send mail to. If not specified, mail is sent to the local user@mozart.nsc.liu.se.


-J myjobname

Name of the job, consisting of up to 15 printable, non-whitespace characters, with the first character alphabetic.

Submitting batch jobs

  1. Create a submit script. This is a shell script with additional declarations of the arguments to sbatch; each argument is declared as #SBATCH argument, e.g. -J myjobname is specified as #SBATCH -J myjobname.

    Example of a submit script named sbatchsample.sh using 24 processors. The job will be accounted on the SNAC project "SNIC 005/06-98". The wall clock time is 10 minutes and an e-mail will be sent when the job exits normally or exits with an error:

    #!/bin/sh
    
    # Account the job on the SNAC project "SNIC 005/06-98"
    #SBATCH -A SNIC005-06-98
     
    # Request 24 processors for the job and request 10 minutes of wall-clock time. 
    #SBATCH -n 24
    #SBATCH -t 00:10:00 
    #SBATCH --mail-type=END,FAIL
    
    # Start the job with mpprun on the processors that the batch queue system has
    # allocated for your job.
    mpprun hello_world.icc.openmpi
  2. Submit the job by specifying the submit script as the only argument to sbatch:

    panor@mozart:~> sbatch sbatchsample.sh
  3. Check the status of the job with squeue, the native SLURM command, or with qstat; qstat -n shows information about allocated processors, and adding -f gives even more details.

    panor@mozart:~/calc/mpiprog> qstat
    Job id              Name             User             Time Use S Queue
    ------------------- ---------------- ---------------- -------- - -----
    140.mozart          pbssample.sh     panor            63:24:23 R workq
                                    
  4. Since we specified --mail-type=END,FAIL, an e-mail will be sent to the local user at Mozart when the job ends or aborts. The standard output and standard error are saved in the directory from which the job was submitted (by default as slurm-<jobid>.out).

Automatic cleanup of scratch directory

Automatic cleanup of files in /scratch is performed. Files that have not been modified within the last 14 days are deleted.

Large storage that is not deleted is located at /nobackup/global but, as the name suggests, this directory is not backed up. For large and highly secure storage, please contact support@nsc.liu.se.

Frequently used commands

Read the man pages for more information about each command listed below.

Frequently used SLURM user commands:

sbatch    Submits a job to the SLURM queuing system.
squeue    Shows the status of SLURM batch jobs.
scancel   Deletes a SLURM job from the queue.

Less frequently used job-control commands (SLURM equivalents of the old PBS commands):

scontrol update    Modifies the attributes of a job (PBS: qalter).
scontrol hold      Places a hold on a job (PBS: qhold).
scontrol requeue   Requeues (reruns) a batch job (PBS: qrerun).
scontrol release   Releases the hold on a batch job (PBS: qrls).
scancel --signal   Sends a signal to a batch job (PBS: qsig).






Page last modified: 2011-03-15 13:42
For more information contact us at info@nsc.liu.se.