If you have hard-coded the number of cores per node, please note that Triolith has 16 cores per compute node, not 8 as on the previous generation (e.g. Neolith/Kappa/Matter).
NOTE: It is not recommended to hard-code the number of cores in this way. It will break your jobs if you run on e.g. the "huge" Kappa nodes. Please use the relevant SLURM environment variables instead, e.g. $SLURM_JOB_CPUS_ON_NODE. For more information, read the sbatch man page.
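As a minimal sketch of the idea (the application name and its --threads option are placeholders, not a real program), a job script can pick up the core count from the SLURM environment instead of hard-coding 8 or 16:

```shell
#!/bin/bash
#SBATCH -t 00:10:00
#
# Hypothetical sketch: read the per-node core count from SLURM
# instead of hard-coding it. The fallback value after ":-" is only
# used when the script runs outside a SLURM job.
CORES=${SLURM_JOB_CPUS_ON_NODE:-1}
echo "Cores on this node: $CORES"
# ./myapp --threads="$CORES"   # placeholder application and option
```

This way the same script keeps working if the node size changes again.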
There is still a local scratch disk available on each node, but you can no longer write files directly to /scratch/local. Instead use the environment variable
$SNIC_TMP, which will be set to a directory that will be created for each job (and deleted when the job ends).
E.g. if your old job script looks like this

#!/bin/bash
#SBATCH -t 00:10:00
#
./myapp --tempdir=/scratch/local

then change it to

#!/bin/bash
#SBATCH -t 00:10:00
#
./myapp --tempdir=$SNIC_TMP
Note: --tempdir above is specific to this example; it is not a magic option that you can use with any application to control where it writes temporary files!
This change is being done to enable sharing of nodes between jobs.
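For applications that cannot be pointed at a temporary directory via an option, a common pattern is to stage files through $SNIC_TMP and copy results back before the job ends. A sketch, with placeholder program and file names (myapp, input.dat, output.dat):

```shell
#!/bin/bash
#SBATCH -t 00:10:00
#
# Sketch with placeholder names: stage data on the node-local
# scratch disk, run there, then copy the results back, since
# $SNIC_TMP is deleted when the job ends.
cp input.dat "$SNIC_TMP"/
cd "$SNIC_TMP"
./myapp input.dat > output.dat
# $SLURM_SUBMIT_DIR is set by SLURM to the directory you submitted from:
cp output.dat "$SLURM_SUBMIT_DIR"/
```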
By the way: there are other standardized SNIC environment variables that you can use to make your job script more portable across all SNIC HPC sites:
|Variable||Description||Example value|
|SNIC_SITE||At what SNIC site am I running?||nsc|
|SNIC_RESOURCE||At what compute resource at the SNIC site $SNIC_SITE (see above) am I running?||triolith|
|SNIC_BACKUP||Shared directory with tape backup||/home/$USERNAME|
|SNIC_NOBACKUP||Shared directory without tape backup||/proj/PROJECT/users/USERNAME (if such a directory exists)|
|SNIC_TMP||Recommended directory for best performance during a job (local disk on nodes if applicable)||set for each individual job, e.g. /scratch/local/12345|
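A job script can branch on these variables to stay portable across sites. A minimal sketch (the echoed messages are just illustration):

```shell
#!/bin/bash
# Minimal sketch: use the standardized SNIC variables so one job
# script works on several SNIC systems, and use $SNIC_TMP for
# scratch files rather than a hard-coded path.
case "$SNIC_RESOURCE" in
  triolith) echo "On Triolith (site: $SNIC_SITE), 16 cores per node" ;;
  *)        echo "On $SNIC_RESOURCE (site: $SNIC_SITE)" ;;
esac
WORKDIR=${SNIC_TMP:-/tmp}   # fall back to /tmp outside a job
```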
Sometimes it can be useful to use the "module" command in job scripts. That way, you do not have to remember to load certain modules before submitting the job.
#!/bin/bash
#SBATCH --time=10:00:00
#SBATCH --nodes=2
#SBATCH --exclusive

module load someapp/1.2.3
someapp my_input_file.dat
If you write your job scripts using /bin/bash, /bin/csh or /bin/tcsh, the "module" command is automatically available from your script. If you use /bin/ksh or /bin/sh, the module command is not available.
Take a look at the Triolith specific batch job and scheduling information.
Recompile your own applications! If you have previously compiled your own software we definitely recommend recompiling it on Triolith. See instructions on this page for how to build your applications on Triolith.
There are basically two compiler suites supported by NSC on Triolith, Intel (intel) and the GNU compiler collection (gcc). Other compilers may be installed and made accessible in due course, but support will be restricted to intel and gcc for the foreseeable future. The optimization instructions below only refer to these.
Support for the Sandybridge type of processor in Triolith is invoked by means of the -xAVX switch to all Intel compilers (icc, ifort and icpc). Alternatively, if you compile on Triolith, you can use the -xHost switch, which makes the compilation default to the highest instruction set available on the compilation host machine, effectively -xAVX on Triolith.
Code compiled this way on Triolith can only be run on Intel processors supporting the AVX instruction set; at the time of writing, only Intel Sandybridge and Ivybridge (presently consumer-level processors) based CPUs support AVX. If support for generic x86 processors (earlier Intel CPUs and AMD CPUs) is desired, the -axAVX switch can be attempted. There may be a performance penalty on Triolith when using this option, which will have to be checked on a case-by-case basis.
Regarding the global optimization level switch -O<X>, where X is between 0 and 3 for the Intel compilers, it is tempting to turn this all the way up to 3. However, this will not unequivocally yield better performing binaries; they will often perform worse than those using the default -O2, and it will unconditionally lead to more compilation trouble for any decent-sized code. If you still choose to try the -O3 switch, it is good practice to also add the -no-ipo switch, which removes many problems related to the use of interprocedural optimization.
If you have OpenMP code to compile, you also need to add the -openmp switch to enable it with the Intel compilers.
ifort -O2 -xAVX -o mybinary mycode.f90
icc -O3 -no-ipo -xAVX -o mybinary mycode.c
icpc -xHost -openmp -o mybinary mycode.cpp   # default global optimization level is "-O2"
The GNU compilers shipped with CentOS 6 (the operating system on Triolith) were released well before the Intel Sandybridge line of processors. The support for AVX is therefore not as well developed on these compilers as in the Intel compilers. There is some support however and later compiler releases can be expected to produce better performing binaries.
A good choice of GCC compiler flags on Triolith is -O3 -mavx -march=native for any installed version of GCC, either those shipped with CentOS 6 or those installed by NSC and accessible via the module system. A binary built this way will run exclusively on AVX-capable CPUs. The choice of -O3 is safe for the GCC compilers in general, as the developers are more conservative with respect to numerically less precise code generation.
If you instead desire a binary capable of running on generic x86 CPUs while retaining some tuning, similar to the Intel -axAVX switch, you could consider the switches -O3 -mtune=native -msse<X> with a suitable value for X; e.g. -msse3 should let the binary run on the vast majority of current HPC CPUs from both AMD and Intel. If your code uses OpenMP, you are advised to add the -fopenmp switch to make use of this feature.
gfortran -O3 -mavx -march=native -o mybinary mycode.f90
gcc -O3 -mtune=native -msse3 -o mybinary mycode.c
g++ -O3 -mavx -march=native -fopenmp -o mybinary mycode.cpp
The normal NSC compiler wrappers and mpprun are available, so to build and run an MPI application you only need to load a module containing an MPI (e.g. build-environment/nsc-recommended) and add the -Nmpi flag when compiling.
module add build-environment/nsc-recommended
icc -Nmpi -o myapp myapp.c
To run such an application you only need to use mpprun to start it, e.g. mpprun ./myapp.
The compiler wrapper and mpprun will handle the compiler options to build against the loaded MPI version (Intel MPI in this case, it's part of build-environment/nsc-recommended) and how to launch an MPI application built against that MPI.
Currently both Intel MPI and OpenMPI are installed and supported on Triolith.
NSC recommends Intel MPI, as it has shown the best performance for most applications. However, if your application does not work/compile with Intel MPI or gets better performance when using OpenMPI, please use that instead.
You can see which versions of Intel MPI and OpenMPI are installed by running "module avail" (look for "impi" and "openmpi").
Intel MPI is loaded in the build-environment/nsc-recommended module. To use Intel MPI with the Intel compilers, just load build-environment/nsc-recommended. To use OpenMPI, load build-environment/nsc-recommended and then load an openmpi module (which will then unload the impi module).
The recommended way to build MPI binaries at NSC is to load an MPI module (e.g
module load build-environment/nsc-recommended), then compile your application normally.
Our compiler wrappers will figure out how to link your application against the version of MPI corresponding to the module that you loaded.
Once built, we recommend that you use our MPI launcher
mpprun. When launching the binary using mpprun, you do not need to specify how many ranks to start or which MPI should be used, mpprun will figure that out from the binary and the job environment.
Important note regarding OpenMPI performance
The currently (2012-10-24) installed OpenMPI version is performance-wise close to Intel MPI only if you set the core binding yourself using extra flags to mpprun (unlike Intel MPI, where this is done by default). We are working on incorporating this into mpprun, but there are many corner cases to work out. Until this is done, we recommend that you launch your OpenMPI applications like this:
mpprun --pass="--bind-to-core --bysocket" /software/apps/vasp/5.3.2-13Sep12/openmpi/vasp-gamma
The Intel Math Kernel Library (MKL) is available, and we strongly recommend using it. Several versions of MKL may be installed; you can see which versions are available with the "module avail" command.
See Math libraries for more information.
Note: if you have used the "-N" option (e.g. -N2) on e.g. Neolith merely to get a number of full nodes, on Triolith you also need to add --exclusive (or some other means of specifying the number of cores to allocate). sbatch -N2 alone will give you a total of two cores spread out over two nodes, which is probably not what you want.
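A full-node request could thus look like this sketch (the time limit is a placeholder; add your own application line):

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH --exclusive
#SBATCH -t 00:10:00
#
# With --exclusive, -N 2 gives you all 16 cores on each of the two
# nodes (32 in total) instead of just one core per node.
echo "Cores on this node: ${SLURM_JOB_CPUS_ON_NODE:-not in a job}"
```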
If you are a member of a single project, the scheduler assumes that you will run all your jobs using that project.
When you are a member of multiple projects, however, the scheduler cannot decide for you which project to use, so you need to specify a project for each job.
If you don't specify a project when you're a member of multiple projects, you will get the following error:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Note: since the PI of a project may add new members at any time, you might suddenly find yourself a member of multiple projects, so if you want to be on the safe side, always specify which project to use, even if you are currently a member of just one.
You can specify which project to use for a job in several ways:
--account=PROJECT_NAME as an option to sbatch or interactive on the command line (e.g. interactive -A snic-123-456)
#SBATCH -A PROJECT_NAME or #SBATCH --account=PROJECT_NAME added to your job script
Note: replace PROJECT_NAME in the examples with your actual project name, which you can find in NSC Express (the Resource Manager Name line when you view a project) or by running projinfo on Triolith.