On September 8, Intel finally lifted the veil on the new Xeon E5 server processors based on the “Haswell” architecture. These are the processors you are likely to find in new supercomputer and cluster installations over the next few years.
The main improvements are:
- Up to 18 cores per processor socket. I expect the mainstream configuration to be 10-14 cores per socket, so a typical 2-socket compute node will have 20-28 cores, and twice that number of threads with hyper-threading enabled.
- Higher memory bandwidth, with support for up to 2133 MHz DDR4 memory. Early benchmarks suggest a 40% improvement in bandwidth over Triolith-style hardware (based on the “Sandy Bridge” platform). This is especially important for electronic structure codes, which tend to be limited by memory bandwidth.
- Improved vectorization with AVX2 instructions. This can theoretically double floating-point arithmetic performance, but in reality the returns diminish beyond some point for longer vectors; we expect a gain of at most 25%. You will need to recompile your codes, or link against AVX2-enabled libraries such as Intel’s MKL, to use this feature.
- Faster single-core performance. Fortunately, the processor cores are still getting faster. Clock frequencies are not increasing, but according to Intel, the Haswell cores have about 10% higher instruction throughput per clock cycle. This comes mainly from improvements in the caches and better branch prediction, so it will not necessarily speed up an already well-tuned and vectorized code.
Further reading: a longer technical overview of the Xeon E5 v3 series processors is available at enterprisetech.com, and the older review of the Haswell microarchitecture at realworldtech.com is still relevant.
Upcoming Haswell-based systems in Sweden
So when can you get access to hardware like this as a supercomputing user in Sweden?
- PDC in Stockholm has just announced that they will be installing a new 1+ petaflops Cray XC30 system to replace the “Lindgren” Cray XE6 system. It will be based on the 16-core variant of the new processors, for a total of 32 cores per node. The system will be available to SNIC users from January 1st, 2015.
- NSC will install a new cluster dedicated to weather forecasting in late 2014, based on the 8-core variant. This system belongs to SMHI and will not be available to SNIC users, but it will be an interesting configuration, with a very good balance between compute power, memory bandwidth, interconnect performance and storage. While optimized for weather forecasting, it could also perform very well on electronic structure workloads.
I expect to be able to work on VASP installations and run benchmarks on both of these systems during the fall and winter, so please check back here later.