The Intel Math Kernel Library (MKL) is available, and we strongly recommend using it. Several versions of MKL may exist; you can see which versions are available with the "module avail" command. The library includes the following groups of routines:
The Intel MKL installations are located in the /software/intel directory, usually as part of an Intel Composer installation (compiler + MKL and other tools).
When you have loaded an mkl module (or the build-environment/nsc-recommended module, which contains MKL), the environment variable $MKL_ROOT will point to the MKL installation directory for that version, e.g.:
$ module load mkl/126.96.36.199
Unloading conflicting module 'mkl/10.3.10.319' before proceeding
$ echo $MKL_ROOT
/software/apps/intel/composer_xe_2013_sp1.2.144/mkl
The MKL consists of two parts: a linear algebra package and processor-specific kernels. The former contains LAPACK and ScaLAPACK routines and drivers that were optimized without regard to processor, so that they can be used effectively on different processors. The latter contains processor-specific kernels such as BLAS, FFT, BLACS, and VML that were optimized for the specific processor. It is generally best to let MKL decide for itself which kernels to use, and to use the automatic linking features, unless you are compiling a program that you intend to run on another architecture (e.g. compiling software for Kappa on Triolith).
If you want to build an application using MKL with the Intel compilers at NSC, we recommend using the flag -Nmkl (to get your application correctly tagged) and the flag -mkl. The -mkl flag is available in Intel compilers from version 11. Some examples:
ifort -Nmkl -mkl=parallel ..
will link with the (default) threaded Intel MKL. Be careful if you use this with an MPI program.
ifort -Nmkl -mkl=sequential ..
will link with the sequential version of Intel MKL. This is usually best for MPI programs.
ifort -Nmkl -mkl=cluster ..
will link with Intel MKL cluster components (sequential) that use Intel MPI. If you use this option you should also load an MPI module (e.g. module load impi).
If, for some reason, you cannot use the
-mkl flag, please read the Intel documentation to find out what linker flags you need. You will probably find the Intel MKL link line advisor very useful.
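As a rough sketch of what such an explicit link line can look like, the snippet below assembles the flags for the sequential, LP64, dynamically linked MKL, using the library names that Intel documents for MKL 10.3 and 11.x. It assumes that $MKL_ROOT has been set by an mkl module (a placeholder path is used otherwise) and that the libraries live under lib/intel64; myprog.f90 is a hypothetical source file. Always confirm the exact line for your MKL version with the link line advisor.

```shell
# Assumes MKL_ROOT was set by loading an mkl module; otherwise use a placeholder.
MKL_DIR="${MKL_ROOT:-/path/to/mkl}"

# Sequential, LP64, dynamically linked MKL (assumed layout: lib/intel64).
MKL_LIBS="-L${MKL_DIR}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm"

# myprog.f90 is a hypothetical source file. The command is printed rather than
# executed so that it can be inspected or pasted into a makefile.
echo "ifort -Nmkl -o myprog myprog.f90 ${MKL_LIBS}"
```

Keeping the flags in one variable makes it easy to swap in a different line from the advisor, for example for the threaded library.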
The MKL is threaded by default, but there is also a non-threaded "sequential" version available. (The instructions here are valid for MKL 10.0 and newer; older versions worked differently.)
Whether threaded or sequential MKL gives the best performance varies between applications. MPI applications will typically launch one MPI rank on each processor core on each node; in this case threads are not needed, as all cores are already used. However, if you use threaded MKL, you can start fewer ranks per node and increase the number of threads per rank accordingly.
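The trade-off between ranks and threads can be sketched as follows, assuming a hypothetical node with 16 cores on which you choose to start 4 MPI ranks (both numbers are illustrative and not taken from any particular cluster). Each rank then gets 16 / 4 = 4 MKL threads through MKL_NUM_THREADS, one of the run-time variables for MKL threading:

```shell
# Illustrative values only: adjust to your cluster and job.
CORES_PER_NODE=16   # assumed number of cores per node
RANKS_PER_NODE=4    # MPI ranks you choose to start per node

# Give each rank an equal share of the cores as MKL threads.
THREADS_PER_RANK=$(( CORES_PER_NODE / RANKS_PER_NODE ))
export MKL_NUM_THREADS="${THREADS_PER_RANK}"

echo "ranks per node: ${RANKS_PER_NODE}, MKL threads per rank: ${MKL_NUM_THREADS}"
```

This only has an effect if the program was linked with the threaded MKL (for example with -mkl=parallel); with the sequential library the setting is ignored.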
The threading of MKL can be controlled at run time through the use of a few special environment variables.
OMP_NUM_THREADS controls how many OpenMP threads are started by default. This variable affects all OpenMP programs, including the MKL library.
MKL_NUM_THREADS controls how many threads MKL routines spawn by default. This variable affects only the MKL library, and takes precedence over any OMP_NUM_THREADS setting.
MKL_DOMAIN_NUM_THREADS lets the user control the threading of individual parts of the MKL library. Suppose you would like to instruct MKL to use 1 thread by default, 2 threads for BLAS calculations, and 4 threads for FFT routines; then the following could be given:
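A minimal sketch of that setting for a bash-like shell, using the value syntax and domain names (MKL_DOMAIN_ALL, MKL_DOMAIN_BLAS, MKL_DOMAIN_FFT) described in Intel's MKL documentation; check the documentation for your MKL version before relying on it:

```shell
# 1 thread by default, 2 threads for BLAS, 4 threads for FFT routines.
export MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=2, MKL_DOMAIN_FFT=4"

# Show the value that programs started from this shell will see.
echo "MKL_DOMAIN_NUM_THREADS=${MKL_DOMAIN_NUM_THREADS}"
```

The variable must be exported before the program starts, for example in the job script just before the line that launches the application.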
The AMD Core Math Library (ACML) is another option for BLAS, LAPACK and FFT subroutines. You might be able to find installations of ACML on some of our clusters, but since ACML is optimized for AMD processors, we don't recommend using it, as the performance will be suboptimal.
OpenBLAS is an open source project supported by the Lab of Parallel Software and Computational Science, ISCAS in China. It provides optimized BLAS and LAPACK subroutines. It was forked from GotoBLAS2 1.13 BSD, so if you are looking for "GotoBLAS", this is probably what you should use instead, as GotoBLAS is not being developed anymore. The performance of OpenBLAS can be very good, and many of the critical subroutines like
DGEMM will match the speed of Intel MKL.
The FFTW library is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST). There are usually several versions installed on most clusters, including both the legacy 2.x series and 3.x.
If you still want to link to FFTW, find the installation directory, typically
/software/apps/fftw/version/compiler/.. and then add e.g. the following to the linking line in the makefile:
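As a hedged illustration for FFTW 3.x, the snippet below builds the include and link flags from a hypothetical FFTW_DIR variable. The path is a placeholder that you must replace with the real installation directory, and the include/ and lib/ subdirectories are an assumption based on the usual FFTW install layout; libfftw3 is the standard double-precision FFTW 3 library.

```shell
# FFTW_DIR is a hypothetical variable; the value is a placeholder, not a real
# path. Replace it with the installation directory you found.
FFTW_DIR="/software/apps/fftw/<version>/<compiler>"

# Assumed layout: headers in include/, libraries in lib/.
FFTW_FLAGS="-I${FFTW_DIR}/include -L${FFTW_DIR}/lib -lfftw3"

# Print the flags so they can be copied to the linking line in the makefile.
echo "${FFTW_FLAGS}"
```

For the legacy 2.x series the library names differ, so check the contents of the lib/ directory before copying the flags.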