ABINIT vs VASP Round 2

In the second round of the ABINIT vs VASP comparison, I am using a bigger, 128-atom supercell of Li2FeSiO4 with a single k-point, as is customary for large supercells. Unlike the silicon supercell studied in round 1, this system is big enough to scale well across a few nodes of a cluster, so it tells us something about parallel scaling. These are the results using optimized settings for both VASP and ABINIT on the Matter cluster at NSC:

Speed ABINIT vs VASP for 128 atoms

The parallel scaling of ABINIT is quite remarkable. It is superlinear for 1-4 nodes, and with 16 nodes the efficiency is still above 100%. Clearly, ABINIT is superior to VASP in terms of parallel scaling.

So should we all run ABINIT and enjoy tremendous speed-ups? The astute reader will notice that I have played a trick and only shown the normalized results. The chart above does not show the actual speed, only the parallel scaling relative to a single-node run, which makes it easy to compare against ideal linear scaling. Comparing the actual speed, however, paints a different picture:

Speed ABINIT vs VASP for 128 atoms

VASP is 4x faster than ABINIT, despite its worse parallel scaling. It makes me wonder whether I missed some setting in ABINIT that magically cuts all the data structures in half. In particular, I was expecting gamma-point-only optimization options, but they do not seem to exist in ABINIT. So to make the comparison at all fair, I used the normal -DNGZhalf version of VASP. For actual calculations with the gamma-point-only version, VASP would be even faster than shown here.

Parallelization settings in ABINIT

ABINIT can parallelize over k-points, spin, bands and FFTs. Here, there is just 1 k-point and no spin polarization, so two parameters influence the parallelization: npband and npfft. The general recommendation in the ABINIT manual is to use npband = number of cores and npfft = 1 for runs of up to a few nodes, and to aim for npband >= 4*npfft for wide parallel runs. Just as NPAR needs to be optimized for VASP, different npband and npfft combinations should be tested for ABINIT. These are the values I found, and what I used to generate the comparison above:

Nodes   Best combination      Speed (jobs/h)
1       npband=8/npfft=1      0.48
2       npband=8/npfft=2      1.30
4       npband=8/npfft=4      3.00
8       npband=8/npfft=8      5.32
16      npband=8/npfft=16     10.13
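The superlinear scaling can be checked directly from the table. The short Python sketch below takes the jobs/h figures as given and computes speedup and parallel efficiency relative to the single-node run; `scaling` is just an illustrative helper name, not part of either code.

```python
# Measured ABINIT throughput from the table above: nodes -> jobs per hour.
speeds = {1: 0.48, 2: 1.30, 4: 3.00, 8: 5.32, 16: 10.13}


def scaling(speeds):
    """Return {nodes: (speedup, efficiency)} relative to the smallest run."""
    base_nodes = min(speeds)
    base_speed = speeds[base_nodes]
    result = {}
    for nodes, speed in sorted(speeds.items()):
        speedup = speed / base_speed
        # Ideal linear scaling means speedup == nodes / base_nodes,
        # so an efficiency above 1 (100%) indicates superlinear scaling.
        efficiency = speedup / (nodes / base_nodes)
        result[nodes] = (speedup, efficiency)
    return result


for nodes, (speedup, eff) in scaling(speeds).items():
    print(f"{nodes:2d} nodes: speedup {speedup:5.2f}, efficiency {eff:.0%}")
```

With these numbers the efficiency peaks at about 156% on 4 nodes and is still about 132% on 16 nodes. Note that this is purely relative: it says nothing about how fast ABINIT is compared to VASP in absolute terms.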

The general rule from the manual does not hold exactly here. Using only band parallelization is optimal on a single node, but once the run spans several nodes over the network, we need to activate FFT parallelization for the best performance. The rule seems to be:

npband = 8 (the number of cores per node?)
npfft = number of compute nodes
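The empirical rule is easy to encode and to compare against the manual's guideline. This sketch assumes 8 cores per node (consistent with the single-node run using npband=8) and that npband and npfft are the only active parallelization levels, so npband * npfft equals the total number of MPI processes; `empirical_layout` and `follows_manual_guideline` are hypothetical helper names.

```python
def empirical_layout(nodes, cores_per_node=8):
    """Observed rule: bands within a node, FFTs across nodes.

    cores_per_node=8 is an assumption matching the runs above.
    Returns (npband, npfft); their product is the total MPI process count.
    """
    return cores_per_node, nodes


def follows_manual_guideline(npband, npfft):
    """The manual's advice for wide parallel runs: npband >= 4 * npfft."""
    return npband >= 4 * npfft


for nodes in (1, 2, 4, 8, 16):
    npband, npfft = empirical_layout(nodes)
    ok = follows_manual_guideline(npband, npfft)
    print(f"{nodes:2d} nodes: npband={npband}/npfft={npfft}, "
          f"{npband * npfft} processes, npband >= 4*npfft: {ok}")
```

From 4 nodes upward, the best combinations found here violate npband >= 4*npfft, which is another sign that the manual's rule of thumb did not carry over to this system.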

The choice can be quite influential: with 16 nodes, npband=128/npfft=1 is almost three times slower than npband=8/npfft=16. But please keep in mind that these settings may not be universal.
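Since the optimum may differ between systems, it is worth repeating the scan for each new system. The sketch below enumerates every npband/npfft pair whose product equals the number of MPI processes and prints the corresponding ABINIT input lines. It assumes paral_kgb 1 (ABINIT's switch for combined k-point/band/FFT parallelization), no k-point or spin parallelization, and 16 nodes with 8 cores each; further constraints ABINIT places on these values (for instance, how nband relates to npband) are not checked, so consult the manual before submitting the jobs. The helper names are hypothetical.

```python
def candidate_layouts(nprocs):
    """All (npband, npfft) pairs with npband * npfft == nprocs,
    assuming no k-point or spin parallelization."""
    return [(npband, nprocs // npband)
            for npband in range(1, nprocs + 1)
            if nprocs % npband == 0]


def abinit_parallel_block(npband, npfft):
    """Input-file lines for one layout (paral_kgb 1 enables band/FFT parallelization)."""
    return f"paral_kgb 1\nnpband {npband}\nnpfft {npfft}\n"


nodes, cores_per_node = 16, 8  # assumed cluster layout, as above
for npband, npfft in candidate_layouts(nodes * cores_per_node):
    print(f"# npband={npband}/npfft={npfft}")
    print(abinit_parallel_block(npband, npfft))
```

For 128 processes this gives eight layouts, including both npband=128/npfft=1 and npband=8/npfft=16.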