Intel Math Kernel Library (MKL)

The Intel Math Kernel Library (MKL) is available, and we strongly recommend using it. Several versions of MKL may be installed; you can see which ones are available with the "module avail" command. The library includes the following groups of routines:

  • BLAS (Basic Linear Algebra Subprograms)
    • vector operations
    • matrix-vector operations
    • matrix-matrix operations
  • Sparse BLAS (basic vector operations on sparse vectors)
  • Fast Fourier transform routines (with Fortran and C interfaces). There exist wrappers for FFTW 2.x and FFTW 3.x compatibility.
  • LAPACK routines for solving systems of linear equations, least-squares problems, eigenvalue and singular value problems, and Sylvester's equations
  • ScaLAPACK routines including a distributed memory version of BLAS (PBLAS or Parallel BLAS) and a set of Basic Linear Algebra Communication Subprograms (BLACS) for inter-processor communication.
  • Vector Mathematical Library (VML) functions for computing core mathematical functions on vector arguments (with Fortran and C interfaces).

MKL library structure

The Intel MKL installations are located in the /software/intel directory, usually as part of an Intel Composer installation (compiler + MKL and other tools).

When you have loaded an mkl module (or the build-environment/nsc-recommended module, which contains MKL), the environment variable $MKL_ROOT will point to the MKL installation directory for that version, e.g.:

$ module load mkl/11.1.2.144 
Unloading conflicting module 'mkl/10.3.10.319' before proceeding
$ echo $MKL_ROOT
/software/apps/intel/composer_xe_2013_sp1.2.144/mkl
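
Before relying on the variable in a build script, it can be worth checking that it is set and points at a real directory. The following is a minimal sketch; the function name check_mkl_root is hypothetical, and the only assumption is that the loaded module exports MKL_ROOT as described above.

```shell
#!/bin/sh
# check_mkl_root is a hypothetical helper, not part of MKL or the module system.
# It reports whether MKL_ROOT is set and points at an existing directory.
check_mkl_root() {
    if [ -z "${MKL_ROOT:-}" ]; then
        echo "MKL_ROOT is not set; load an mkl module first"
        return 1
    fi
    if [ ! -d "$MKL_ROOT" ]; then
        echo "MKL_ROOT is set but is not a directory: $MKL_ROOT"
        return 1
    fi
    echo "Using MKL from $MKL_ROOT"
}

# Report the status without aborting the surrounding script on failure.
check_mkl_root || true
```

A build script could call check_mkl_root first and stop early if it returns a non-zero status.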

The MKL consists of two parts: a linear algebra package and processor-specific kernels. The former contains the LAPACK and ScaLAPACK routines and drivers, which are optimized without regard to processor so that they can be used effectively on different processors. The latter contains processor-specific kernels such as BLAS, FFT, BLACS, and VML, which are optimized for the specific processor. It is generally best to let MKL decide for itself which kernels to use, and to use the automatic linking features, unless you are compiling a program that you intend to run on another architecture (i.e. compiling software for one cluster on another).

Linking with MKL

If you want to build an application using MKL with the Intel compilers at NSC, we recommend using the flag -Nmkl (to get your application correctly tagged) and the flag -mkl=MKLTYPE. The -mkl flag is available in Intel compilers from version 11. Some examples:

ifort -Nmkl -mkl=parallel ..

will link with the (default) threaded Intel MKL. Be careful if you use this with an MPI program (see "MKL and threading" below).

ifort -Nmkl -mkl=sequential ..

will link with the sequential version of Intel MKL. This is usually best for MPI programs.

ifort -Nmkl -mkl=cluster ..

will link with the Intel MKL cluster components (sequential) that use Intel MPI. If you use this option, you should also load an MPI module (e.g. "module load impi").

If, for some reason, you cannot use the -mkl flag, please read the Intel documentation to find out what linker flags you need. You will probably find the Intel MKL link line advisor very useful.
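
For orientation, explicit link lines for dynamic linking typically look like the ones printed by the sketch below. This is an illustration only: the helper mkl_link_flags is hypothetical, the library names are the ones commonly used for the LP64 interface on 64-bit Linux with the Intel compilers, and the lib/intel64 subdirectory is an assumption about the installation layout. Always check the result against the link line advisor for your MKL version.

```shell
#!/bin/sh
# mkl_link_flags is a hypothetical helper that prints a typical dynamic link line.
# Assumptions: LP64 interface, Intel compilers, libraries under $MKL_ROOT/lib/intel64.
mkl_link_flags() {
    case "$1" in
        sequential)
            echo "-L${MKL_ROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm"
            ;;
        parallel)
            echo "-L${MKL_ROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm"
            ;;
        *)
            echo "usage: mkl_link_flags sequential|parallel" >&2
            return 1
            ;;
    esac
}

# Fall back to the example path shown earlier if no mkl module is loaded
# (for illustration only; normally the module sets MKL_ROOT).
MKL_ROOT="${MKL_ROOT:-/software/apps/intel/composer_xe_2013_sp1.2.144/mkl}"

# Print the flags; they go at the end of the link command or into a makefile.
mkl_link_flags sequential
```

The sequential and parallel variants differ only in the threading layer (mkl_sequential versus mkl_intel_thread plus the OpenMP runtime), which mirrors the choice made by -mkl=sequential and -mkl=parallel.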

MKL and threading

The MKL is threaded by default, but there is also a non-threaded "sequential" version available. (The instructions here are valid for MKL 10.0 and newer; older versions worked differently.)

Whether threaded or sequential MKL gives the best performance varies between applications. MPI applications typically launch one MPI rank on each processor core of each node; in that case threads are not needed, as all cores are already in use. However, if you use threaded MKL, you can start fewer ranks per node and increase the number of threads per rank accordingly.

The threading of MKL can be controlled at run time through the use of a few special environment variables.

OMP_NUM_THREADS controls how many OpenMP threads are started by default. This variable affects all OpenMP programs, including the MKL library.

MKL_NUM_THREADS controls how many threads MKL routines spawn by default. This variable affects only the MKL library, and takes precedence over any OMP_NUM_THREADS setting.

MKL_DOMAIN_NUM_THREADS lets the user control individual parts of the MKL library. Suppose you would like to instruct MKL to use 1 thread by default, 2 threads for BLAS calculations, and 4 threads for FFT routines; then the following could be given:

MKL_DOMAIN_NUM_THREADS="MKL_ALL=1;MKL_BLAS=2;MKL_FFT=4"

Beware: always set OMP_NUM_THREADS or MKL_NUM_THREADS if you want multithreading! If OMP_NUM_THREADS is unset when you launch an MPI application with mpprun, mpprun will by default set OMP_NUM_THREADS=1, thus disabling multithreading.
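
As a concrete illustration of trading MPI ranks for threads, the sketch below derives a thread count from the number of cores and ranks per node and exports it. The values of cores_per_node and ranks_per_node are assumptions chosen for the example; substitute the values for your cluster and job.

```shell
#!/bin/sh
# Example values only: adjust to the node type and job layout you actually use.
cores_per_node=16
ranks_per_node=4

# Give each MPI rank an equal share of the cores on the node.
threads_per_rank=$(( cores_per_node / ranks_per_node ))

# MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS for MKL routines,
# so setting both to the same value keeps MKL and other OpenMP code consistent.
export OMP_NUM_THREADS="$threads_per_rank"
export MKL_NUM_THREADS="$threads_per_rank"

echo "ranks per node: $ranks_per_node, threads per rank: $threads_per_rank"
```

The job should then be launched with the matching number of ranks per node, so that ranks times threads does not exceed the number of cores.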

ACML

The AMD Core Math Library (ACML) is another option for BLAS, LAPACK and FFT subroutines. You might be able to find installations of ACML on some of our clusters. However, ACML is optimized for AMD processors, so we do not recommend using it, as the performance will be suboptimal.

OpenBLAS / GotoBLAS

OpenBLAS is an open source project supported by the Lab of Parallel Software and Computational Science, ISCAS, in China. It provides optimized BLAS and LAPACK subroutines. It was forked from the BSD-licensed GotoBLAS2 1.13, so if you are looking for "GotoBLAS", this is probably what you should use instead, as GotoBLAS is no longer being developed. The performance of OpenBLAS can be very good, and many of the critical subroutines, such as DGEMM, will match the speed of Intel MKL.

FFTW

The FFTW library is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST). Most clusters usually have several versions installed, from both the legacy 2.x series and the 3.x series.

NSC generally recommends using the FFT routines in the Intel MKL library instead of linking to FFTW, as the performance is usually better. If your code uses FFTW 3+, you do not need to modify the source to use Intel's FFT, as MKL provides wrapper subroutines that match the FFTW interface.

If you still want to link to FFTW, find the installation directory, typically /software/apps/fftw/version/compiler/.. and then add e.g. the following to the linking line in the makefile:

-L/software/apps/fftw/3.3.2/i1214/lib -lfftw3
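
The two alternatives in this section can be compared side by side with a small sketch. The helper fft_flags and the variable FFTW_DIR are hypothetical names. The location of the FFTW3 wrapper headers under include/fftw in the MKL installation, and the include directory next to lib in the FFTW installation, are assumptions that should be verified on your cluster.

```shell
#!/bin/sh
# fft_flags is a hypothetical helper that prints compiler/linker flags
# for one of the two FFT backends discussed above.
fft_flags() {
    case "$1" in
        mkl)
            # FFTW3 wrappers from MKL; the include/fftw location is an assumption.
            echo "-Nmkl -mkl=sequential -I${MKL_ROOT}/include/fftw"
            ;;
        fftw)
            # Stand-alone FFTW; the include directory is assumed to sit next to lib.
            echo "-I${FFTW_DIR}/include -L${FFTW_DIR}/lib -lfftw3"
            ;;
        *)
            echo "usage: fft_flags mkl|fftw" >&2
            return 1
            ;;
    esac
}

# Example values for illustration; MKL_ROOT normally comes from the mkl module.
MKL_ROOT="${MKL_ROOT:-/software/apps/intel/composer_xe_2013_sp1.2.144/mkl}"
FFTW_DIR="/software/apps/fftw/3.3.2/i1214"

fft_flags mkl
fft_flags fftw
```

Keeping the choice in one place like this makes it easy to switch a makefile between the MKL wrappers and a stand-alone FFTW build when comparing performance.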