Running VASP

This tutorial will get you started with the basics of running VASP on NSC's clusters. Here, we will use the Triolith system, but the same steps also work on Gamma.

Checklist

In order to run VASP at NSC, you need to have

  • A computer time allocation on the cluster. This means that you need to be a member of an existing compute project, or apply for one yourself. This process is further described under Applying for a new project.
  • A user account on a cluster, so that you can log in using SSH. If you are a member of a project, you can request an account.
  • A software license for VASP, as it is not free software. Each license only allows a certain number of named people to use the software, and your name needs to be on the list of a license. If you are a Ph.D. student, we suggest that you check with your supervisor; most likely, they will know which VASP license you are covered by.

Log in to Triolith

The first step is to log in to Triolith. On a Mac or Linux machine, you start by opening a terminal window and initiating a connection with the ssh program. On Windows, you could use a program like "PuTTY" to connect using ssh.

$ ssh x_username@triolith.nsc.liu.se

The welcome message is then displayed and you are logged in to the so-called "login node" of the cluster. This is the place where you prepare your calculations and send jobs to the compute nodes. Please note that you share this node with all the other users logged into Triolith at that moment, and it is only a single server, so you cannot (and should not try to) run any real calculations on this server.

Last login: Thu Jul  2 11:21:20 2015 from ...
Welcome to Triolith!

PLEASE READ THE USER GUIDE: http://www.nsc.liu.se/systems/triolith/

Note: Triolith has two login nodes, triolith1 and triolith2. If this
one is unavailable you can try the other one ("ssh triolith2.nsc.liu.se").

//NSC Support <support@nsc.liu.se>
[x_username@triolith1 ~]$ 

I would recommend that the first command you run is projinfo -C, which shows the status of the projects that you are a member of.

[x_username@triolith1 ~]$ projinfo -C
You are a member of 1 active project.

SNIC 2016/XX-YY
═══════════════
Principal Investigator (PI):   Firstname Lastname
Project storage directory:     /proj/directory
Slurm account:                 snic2016-XX-YY
Current core time allocation:  ZZZZZ h/month

If you are not a member of any project, you cannot run any calculations on Triolith. projinfo will in that case tell you that you are not a member of any active project.

Preparing input files

Before setting up a new calculation, you need to determine where to store the input and output files. You have several options:

  • Your private Unix home directory /home/x_username.
  • A project storage area, usually shared with all the other project members and typically located under /proj/[projectname]/username/.
  • The possibility of using the local hard drives of the compute nodes while a job is running.

The shared project storage is located on a high-performance clustered file system, so that is where we typically recommend that you store your ongoing calculations. If you do not know where your project storage is located, try running the snicquota command; it shows how much you can store and the name of your project storage directory. The project storage is fully described in the storage documentation.

In this tutorial, we will try an example from the VASP web page, a CO molecule on a Ni (111) surface. After having moved to the project directory, we download the archived input files and decompress them:

$ cd /proj/[projectname]/username
$ wget http://www.vasp.at/vasp-workshop/examples/3_5_COonNi111_rel.tgz
$ tar xzf 3_5_COonNi111_rel.tgz
$ cd 3_5_COonNi111_rel
$ ls
INCAR  KPOINTS  POSCAR  POTCAR

You should find the four required input files for VASP: INCAR, KPOINTS, POSCAR, and POTCAR. To run the calculation, we now only need to request a compute node and invoke the actual VASP program.
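If you prepare many calculations, a quick check before requesting compute time can save a wasted queue slot. The following is a minimal shell sketch. The function name check_vasp_inputs is hypothetical and is not an NSC tool. It only tests that the four files exist and are non-empty, and it does not check whether their contents are consistent with each other.

```shell
# Sketch: check that a calculation directory contains the four VASP input
# files, and that none of them is empty, before requesting compute time.
# check_vasp_inputs is a hypothetical helper, not an NSC command.
check_vasp_inputs() {
    local dir="${1:-.}"      # directory to check; defaults to the current one
    local missing=0
    for f in INCAR KPOINTS POSCAR POTCAR; do
        if [ ! -s "$dir/$f" ]; then   # -s: file exists and is non-empty
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run check_vasp_inputs . inside 3_5_COonNi111_rel. If anything is absent, it lists the missing files and returns a non-zero exit status.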

Finding the VASP binaries

There are two ways to get hold of the VASP program: you either download the source code for the VASP program and compile it yourself on the cluster, or you use the preinstalled binaries that are available in the directory /software/apps/vasp/ on the clusters. Please note that in order to use NSC binaries of VASP, you need to tell us about your VASP license. We have a page describing how that procedure works.

Compiling the VASP program from source code is necessary if you need to make modifications or install extra add-on packages. It is quite straightforward, and NSC's support staff can provide makefiles and instructions if you need help. There is a guide for how to compile VASP on Triolith, and you can often find makefiles in the /software/apps/vasp/ directory.

For this example, we will select the latest preinstalled version of VASP. An overview of all the VASP installations we have is available on the Triolith software depository page. When this tutorial was written, version 5.3.5-01Apr14 "build02" was the most recent standard version. The paths to the three standard binaries are:

/software/apps/vasp/5.3.5-01Apr14/build02/vasp
/software/apps/vasp/5.3.5-01Apr14/build02/vasp-gamma
/software/apps/vasp/5.3.5-01Apr14/build02/vasp-noncollinear

You do not need to copy the files from there into your home or working directory. The /software directory is available on all compute nodes, and the intention is that you start these binaries directly. Note that NSC's software installation policy is to never remove old versions unless they are fundamentally broken, so you can rely on the binaries being there for the full lifetime of the cluster.

How to run the VASP program

VASP is a parallel program meant to run on many processor cores (or compute nodes) simultaneously using MPI for communication, so you should start the program with the mpprun command in your job script or in the interactive shell (see below), for example:

mpprun /software/apps/vasp/5.3.5-01Apr14/build02/vasp

Usually, you do not need to give any flags to mpprun; it will automatically figure out how many cores it should use and how to connect to the other compute nodes. Please keep in mind that mpprun is a special command that only exists on NSC's clusters. More information about mpprun is available in the NSC build environment description and the mpprun software page.

Test run the calculation

It is not advisable to run directly on the machine where you logged in (the "login node"). If you want to test your calculation before running it for real in the batch queue, you allocate one or more compute nodes for interactive use with the interactive command instead. This example is a very small calculation (7 atoms / 48 bands / 12 k-points), so we only need a single compute node. After some, hopefully short, waiting time, you will get a command shell on a compute node, where you can run VASP. Here, we are using 1 compute node, so VASP will run on 16 processor cores in parallel.

interactive -N 1 -t 00:20:00
.......
[pla@n448 3_5_COonNi111_rel]$ mpprun /software/apps/vasp/5.3.5-01Apr14/default/vasp
mpprun INFO: Starting impi run on 1 node ( 16 ranks )
 running on   16 total cores
 distrk:  each k-point on   16 cores,    1 groups
 distr:  one band on    1 cores,   16 groups
 using from now: INCAR     
 vasp.5.3.5 31Mar14 (build Apr 08 2014 11:32:36) complex                        
  
 POSCAR found :  3 types and       7 ions    ...

Check that the calculation starts without errors. This one will take around 10-12 seconds to finish on Triolith, so you can actually wait for it to finish. For a real calculation, you would likely have to stop it after a few iterations. Afterwards, collect some parameters from the OUTCAR file, such as the time required for one SCF iteration and the number of electronic bands. Use the timing of the first iterations to extrapolate the run time for the whole calculation, because you will need an estimate when you write the job script. The timings for each ionic step can be extracted from the OUTCAR file with the grep command, for example:

$ grep LOOP+ OUTCAR
LOOP+:  cpu time    3.70: real time    3.80
LOOP+:  cpu time    1.88: real time    1.88
LOOP+:  cpu time    1.59: real time    1.59
LOOP+:  cpu time    0.95: real time    0.95

This should give you some idea of the amount of time required to run the full calculation. If it runs too slowly, you will have to either use more compute nodes or adjust the settings (for example, use fewer k-points or a smaller basis set).
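The extrapolation can be scripted. The sketch below averages the "real time" values from the LOOP+ lines and multiplies the result by a number of ionic steps that you supply yourself. That number could be the NSW limit from your INCAR or your own guess. The name estimate_runtime is hypothetical. The first step is often the slowest, as in the output above, so treat the result as a rough figure that the slow first step tends to push upward.

```shell
# Sketch: average the "real time" of the LOOP+ lines in an OUTCAR and
# extrapolate to an assumed number of ionic steps.
# estimate_runtime is a hypothetical helper; nsteps is your own estimate.
estimate_runtime() {
    local outcar="$1" nsteps="$2"
    grep 'LOOP+' "$outcar" | awk -F 'real time' -v nsteps="$nsteps" '
        { total += $2; n++ }          # $2 is the text after "real time"
        END {
            if (n == 0) { print "no LOOP+ lines found"; exit 1 }
            mean = total / n
            printf "steps so far: %d, mean real time per step: %.2f s\n", n, mean
            printf "estimate for %d steps: %.1f s (%.2f h)\n", nsteps, mean * nsteps, mean * nsteps / 3600
        }'
}
```

For example, estimate_runtime OUTCAR 50 prints the number of finished ionic steps, their mean wall time, and the projected total for 50 steps.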

How many compute nodes can I use?

Part of the fun of using a supercomputing centre is that you can run on many processors in parallel and thus speed up your calculations. Unfortunately, there is a limit to how many cores you can use efficiently. What is a good number? Two rough guidelines are:

  • Number of cores < number of atoms (e.g. with 64 atoms, aim for having not more than 64 cores or 4 compute nodes)
  • Number of cores < number of bands/8, i.e. you want to have at least 8 bands per core when using band parallelization.
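The two rules of thumb can be combined into a quick upper bound. This sketch takes the smaller of the atom count and bands/8, then rounds down to whole compute nodes. It assumes 16 cores per node, as on Triolith. The function name is hypothetical, and the result is only a starting point for your own timing tests. For the CO on Ni example (7 atoms, 48 bands), it suggests a single node.

```shell
# Sketch: combine the two rules of thumb (cores <= atoms, cores <= bands/8)
# into an upper bound on whole compute nodes. Assumes 16 cores per node
# (Triolith) unless a third argument is given. Hypothetical helper name.
max_nodes_hint() {
    local natoms="$1" nbands="$2" cores_per_node="${3:-16}"
    local by_bands=$(( nbands / 8 ))           # at least 8 bands per core
    local cores=$(( natoms < by_bands ? natoms : by_bands ))
    local nodes=$(( cores / cores_per_node ))  # round down to whole nodes
    if [ "$nodes" -lt 1 ]; then
        nodes=1                                # never suggest less than one node
    fi
    echo "$nodes"
}
```

For example, max_nodes_hint 64 512 prints 4, which matches the 64-atom example in the first guideline.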

More information about this topic can be found in the article "Selecting the Right Number of Cores for a VASP Calculation" by Peter Larsson at NSC.

To get good speed when running on many compute nodes, you also need to adjust some of the input parameters in the INCAR file that influence the parallelization scheme. The two most influential ones are NCORE and KPAR, for band and k-point parallelization, respectively. For NCORE, try setting it to the number of cores that you use per compute node, which is typically 16 on Triolith.

NCORE = 16

If you have more than one k-point in the calculation, you can try parallelization over k-points. Try setting KPAR to the number of compute nodes or the number of k-points, whichever is smaller.

KPAR = min(number of compute nodes,NKPT)
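The KPAR line above is a formula, not literal INCAR syntax, so you have to work out the value yourself. Here is a minimal shell sketch that prints both tags for a given node count and number of irreducible k-points, which VASP reports as NKPTS in OUTCAR. The name parallel_tags is hypothetical, and it assumes 16 cores per node by default.

```shell
# Sketch: print the NCORE and KPAR lines suggested in the text.
# KPAR = min(number of compute nodes, NKPT); NCORE = cores per node.
# parallel_tags is a hypothetical helper; 16 cores per node is assumed.
parallel_tags() {
    local nodes="$1" nkpt="$2" cores_per_node="${3:-16}"
    local kpar=$(( nodes < nkpt ? nodes : nkpt ))
    printf 'NCORE = %d\nKPAR = %d\n' "$cores_per_node" "$kpar"
}
```

For example, parallel_tags 2 12 >> INCAR appends NCORE = 16 and KPAR = 2. Check first that your INCAR does not already set these tags.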

K-point parallelization works best with medium-sized hybrid calculations, where you have a few k-points and a lot of computational work per k-point. It does not work as well for metals; for example, you cannot expect to run a small metallic cell with thousands of k-points efficiently on 1000 compute nodes.

Running jobs in the batch queue

The best way to run many calculations is to prepare all of them at the same time, put them in Triolith's job queue, and then let the system start the jobs as soon as there are enough compute nodes available for you. To do this, you need to write a so-called job script for each job. It tells Triolith what resources you need (e.g. how many nodes for how long) and what you want to run on the nodes. The job script is typically written in bash, and a minimal job script for VASP looks like this:

#!/bin/bash
#SBATCH -J jobname
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH -t 12:00:00
#SBATCH -A snic2015-x-yyy

mpprun /software/apps/vasp/5.3.5-01Apr14/build02/vasp

It requests 1 compute node for exclusive use for 12 hours, with a specific job name. If you have a computer time allocation in several projects, you also need to specify which project the job should be charged to, which is what the -A line does.

Note that this script assumes that you submit it to the job queue from the same directory as the input files. Otherwise, you will need to add a cd command inside the script that moves to the directory with the input files before you start VASP with mpprun.
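When you prepare many calculations at once, writing the job scripts by hand gets tedious. The sketch below generates a job script like the one above in a given directory. It adds an absolute cd path and an -o option (Slurm's output-file option, where %j expands to the job number), so that the run and its slurm-JOBID.out stay next to the input files even if you submit from elsewhere. The function name, the ACCOUNT and VASP_BIN variables, and their defaults are assumptions for illustration. Replace the placeholder account with your own Slurm account from projinfo.

```shell
# Sketch: write a job script like the one above into a calculation directory.
# The absolute cd and the -o option keep the run and its slurm-JOBID.out in
# that directory even when sbatch is called from elsewhere.
# write_job_script, ACCOUNT and VASP_BIN are hypothetical names; the defaults
# are the placeholder account and binary path used in this tutorial.
write_job_script() {
    local dir="$1" name="$2" nodes="${3:-1}" walltime="${4:-12:00:00}"
    local account="${ACCOUNT:-snic2015-x-yyy}"
    local vasp="${VASP_BIN:-/software/apps/vasp/5.3.5-01Apr14/build02/vasp}"
    dir=$(cd "$dir" && pwd) || return 1   # absolute path to the directory
    cat > "$dir/job.sh" <<EOF
#!/bin/bash
#SBATCH -J $name
#SBATCH -N $nodes
#SBATCH --exclusive
#SBATCH -t $walltime
#SBATCH -A $account
#SBATCH -o $dir/slurm-%j.out

cd "$dir"
mpprun $vasp
EOF
    echo "$dir/job.sh"
}

# Usage sketch (on the login node): one script per directory, then submit.
#   for d in calc_*/; do
#       sbatch "$(write_job_script "$d" "${d%/}")"
#   done
```

The directory names calc_* in the usage comment are only an example of how you might lay out a batch of calculations.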

To send it to the job queue, use the sbatch command.

[pla@n448 3_5_COonNi111_rel]$ sbatch job.sh
Submitted batch job 6944017

The sbatch command gives you a job number that you can use to track the job and see its status in the queue system. You can do this with the squeue command.

[pla@n448 3_5_COonNi111_rel]$ squeue -u x_username
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
6944017  triolith   COonNi      pla  PD       0:00      1 (None)

Here, the job is still waiting in the queue (status "PD", pending). When it starts running, you will see that the status has changed to "R".

[pla@n448 3_5_COonNi111_rel]$ squeue -u x_username
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
6944017  triolith   COonNi      pla   R       0:05      1 n122
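In scripts, it is convenient to capture the job number directly instead of copying it by hand. This sketch parses the "Submitted batch job N" line that sbatch prints, as shown above. The name jobid_from_sbatch is hypothetical. Slurm's sbatch also has a --parsable option that prints only the job ID, which may be simpler.

```shell
# Sketch: extract the job number from sbatch's "Submitted batch job N" line.
# jobid_from_sbatch is a hypothetical helper; it reads from standard input.
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Usage on the cluster:
#   jobid=$(sbatch job.sh | jobid_from_sbatch)
#   squeue -j "$jobid"          # status of this job only
#   tail "slurm-$jobid.out"     # its output file, once it has started
```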

Note that it is possible to inspect the output files and follow the progress of the job while it is running. Typically, people look at the OSZICAR file, which provides condensed output, one line per SCF iteration.

[pla@triolith1 3_5_COonNi111_rel]$ cat OSZICAR 
N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.247881146959E+03    0.24788E+03   -0.24207E+04  1568   0.100E+03
DAV:   2    -0.349973712274E+02   -0.28288E+03   -0.25375E+03  1760   0.197E+02
DAV:   3    -0.473201701633E+02   -0.12323E+02   -0.12181E+02  1920   0.477E+01
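For a quick look at a running job, a short awk summary can be easier to read than the full OSZICAR file. This sketch assumes the line layout of a relaxation as shown in this tutorial. Electronic steps start with a label such as DAV: or RMM:, and each finished ionic step has a line whose second field is F=. Other run types, such as molecular dynamics, may lay out these lines differently. The function name oszicar_progress is hypothetical.

```shell
# Sketch: one-line progress summary of an OSZICAR file, assuming the layout
# shown above: electronic steps start with a label such as "DAV:" or "RMM:",
# and each finished ionic step has a line whose second field is "F=".
# oszicar_progress is a hypothetical helper name.
oszicar_progress() {
    awk '
        $1 ~ /^[A-Z]+:$/ { scf = $2; energy = $3 }   # latest electronic step
        $2 == "F=" { ionic = $1; free_energy = $3 }  # latest ionic step
        END {
            printf "ionic steps done: %d, last SCF iteration: %s, E = %s\n", ionic, scf, energy
            if (free_energy != "") printf "last F = %s\n", free_energy
        }' "${1:-OSZICAR}"
}
```

Running oszicar_progress in the job directory summarises ./OSZICAR. You can pass another path as the first argument.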

When the job has finished, it disappears from the job list, and you will find the full set of output files in the job directory.

[pla@triolith1 3_5_COonNi111_rel]$ ls
CHG     CONTCAR  EIGENVAL  INCAR   KPOINTS  OUTCAR  POSCAR  slurm-6944033.out  WAVECAR
CHGCAR  DOSCAR   IBZKPT    job.sh  OSZICAR  PCDAT   POTCAR  vasprun.xml        XDATCAR

If there was some kind of problem with the job, e.g. if it crashed or terminated earlier than expected, you should look inside the output file from the queue system. It is called "slurm-jobid.out" and contains what you would normally see in the terminal window when you run the program manually. In this case, everything looks OK.

[pla@triolith1 3_5_COonNi111_rel]$ tail slurm-6944017.out
DAV:   1    -0.408333066044E+02   -0.63615E-04   -0.23970E-02  1504   0.686E-01    0.385E-02
RMM:   2    -0.408342800080E+02   -0.97340E-03   -0.26240E-04   852   0.123E-01    0.435E-01
RMM:   3    -0.408333994813E+02    0.88053E-03   -0.12356E-04   741   0.824E-02    0.162E-01
RMM:   4    -0.408333266195E+02    0.72862E-04   -0.19571E-05   686   0.292E-02
   4 F= -.40833327E+02 E0= -.40828786E+02  d E =-.104236E-03
 BRION: g(F)=  0.160E-03 g(S)=  0.000E+00 retain N=  2 mean eig= 0.27
 eig:   0.281  0.255
 reached required accuracy - stopping structural energy minimisation
 writing wavefunctions
mpprun INFO: Elapsed time (h:m:s):  0:00:12.345511
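If you run many jobs, you can scan their Slurm output files for the final message. This sketch only looks for the line "reached required accuracy", which VASP prints when a structural relaxation converges, as in the output above. A single-point calculation will not print this line, so treat "no convergence message" as a prompt to read the file, not as proof of failure. The name check_job_output is hypothetical.

```shell
# Sketch: classify a finished relaxation by its slurm-JOBID.out file.
# Only the "reached required accuracy" message is checked; other run types
# need other criteria. check_job_output is a hypothetical helper name.
check_job_output() {
    local out="$1"
    if [ ! -f "$out" ]; then
        echo "no such file: $out"
        return 2
    elif grep -q 'reached required accuracy' "$out"; then
        echo "converged"
    else
        echo "no convergence message, inspect $out"
        return 1
    fi
}

# Usage sketch, from a directory holding several calculations:
#   for f in */slurm-*.out; do
#       echo "$f: $(check_job_output "$f")"
#   done
```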

This concludes the VASP tutorial.

