Berzelius is the premier AI/ML cluster at NSC. It was donated to NSC by the Knut and Alice Wallenberg Foundation in 2020 and installed in the spring of 2021. It is used by Swedish academic research groups.
NSC grants projects access to Berzelius through an application process in SUPR. Berzelius will be opened for project applications in May 2021, and general access for research projects is expected by June 2021.
Berzelius is an NVIDIA® SuperPOD consisting of 60 NVIDIA® DGX-A100 compute nodes supplied by Atos. Each DGX-A100 node is equipped with 8 NVIDIA® A100 Tensor Core GPUs, 2 AMD Epyc™ 7742 CPUs, 1 TB RAM and 15 TB of local NVMe SSD storage. The A100 GPUs have 40 GB on-board HBM2 VRAM.
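The per-node figures above can be multiplied out to cluster-wide totals. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative only and are not part of any NSC tooling.

```python
# Figures taken from the hardware description above.
NODES = 60
GPUS_PER_NODE = 8
GPU_MEMORY_GB = 40
RAM_PER_NODE_TB = 1
NVME_PER_NODE_TB = 15


def cluster_totals():
    """Multiply the per-node figures out to whole-cluster totals."""
    total_gpus = NODES * GPUS_PER_NODE
    return {
        "gpus": total_gpus,
        "gpu_memory_per_node_gb": GPUS_PER_NODE * GPU_MEMORY_GB,
        "gpu_memory_gb": total_gpus * GPU_MEMORY_GB,
        "ram_tb": NODES * RAM_PER_NODE_TB,
        "local_nvme_tb": NODES * NVME_PER_NODE_TB,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these figures the cluster has 480 GPUs in total, with 320 GB of GPU memory per node.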
The fast compute interconnect consists of eight NVIDIA® Mellanox® HDR links per node, connected in a non-blocking fat-tree topology. In addition, every node is equipped with a dedicated NVIDIA® Mellanox® HDR storage interconnect.
Shared, central storage accessible from all compute nodes of the cluster is provided by a DDN A³I storage cluster consisting of 4 AI400x all-NVMe SSD storage servers, which use the high-bandwidth interconnect end-to-end to the GPUs. The total accessible storage space is 1 PB and is shared between all projects. The aggregate read bandwidth from the storage is 192 GB/s.
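Because the storage bandwidth is shared, it can help to estimate what each node might see when many nodes read at once. The sketch below assumes an idealized even split of the 192 GB/s aggregate figure across the nodes that are actively reading; it does not model the limit of a node's own storage link, caching, or actual workload patterns, so treat the result as a rough upper-level estimate only. The function name is hypothetical.

```python
AGGREGATE_READ_GB_PER_S = 192  # aggregate read bandwidth stated above
NODES = 60
GPUS_PER_NODE = 8


def fair_share_read_bandwidth(active_nodes):
    """Return (GB/s per node, GB/s per GPU) under an assumed even split.

    This is an idealization: real throughput also depends on each node's
    storage link and on the access pattern, neither of which is modeled.
    """
    if not 1 <= active_nodes <= NODES:
        raise ValueError(f"active_nodes must be between 1 and {NODES}")
    per_node = AGGREGATE_READ_GB_PER_S / active_nodes
    return per_node, per_node / GPUS_PER_NODE


if __name__ == "__main__":
    per_node, per_gpu = fair_share_read_bandwidth(NODES)
    print(f"All nodes reading: {per_node:.1f} GB/s per node, "
          f"{per_gpu:.1f} GB/s per GPU")
```

With all 60 nodes reading simultaneously, the even split works out to roughly 3.2 GB/s per node, or 0.4 GB/s per GPU.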
Compute resources are allocated via the SLURM resource manager. User access to the system login nodes is provided via SSH and the ThinLinc remote desktop solution.
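As an illustration of how a job request might look, the sketch below renders a small Slurm batch script as text. It uses only generic Slurm options (--job-name, --nodes, --gres, --time); the options actually required on Berzelius, such as an account or project setting, are not described here and may differ, so check the NSC documentation before use. The helper name render_batch_script and the example job values are hypothetical.

```python
GPUS_PER_NODE = 8  # each DGX-A100 node has 8 GPUs


def render_batch_script(job_name, gpus, time_limit, command):
    """Render a single-node Slurm batch script as a string.

    Only generic Slurm directives are used; site-specific options
    (account, partition, reservation) are deliberately left out.
    """
    if not 1 <= gpus <= GPUS_PER_NODE:
        raise ValueError(f"gpus must be between 1 and {GPUS_PER_NODE}")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only: one GPU for one hour, running nvidia-smi.
    print(render_batch_script("gpu-check", 1, "01:00:00", "nvidia-smi"), end="")
```

The rendered text can be saved to a file and submitted with sbatch from a login node.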
All nodes have a local disk where applications can store temporary files. This disk is available to jobs as /scratch/local, has a size of 15 TB, and is shared between all jobs using the node.
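Since the local disk is shared between jobs on a node, a job may want its own subdirectory and a check of the free space before writing large temporary files. The following is a minimal sketch using only the Python standard library. The function name make_job_scratch, the directory prefix, and the fallback to the system temporary directory when /scratch/local is not writable (for example on a login node or a workstation) are assumptions for illustration, not NSC conventions.

```python
import os
import shutil
import tempfile


def make_job_scratch(base="/scratch/local", min_free_bytes=0):
    """Create a private temporary directory, preferably on node-local disk.

    Falls back to the system temp directory if `base` is not a writable
    directory (an assumption made so the sketch also runs off-cluster).
    """
    if not (os.path.isdir(base) and os.access(base, os.W_OK)):
        base = tempfile.gettempdir()

    # The disk is shared with other jobs, so check what is free right now.
    free = shutil.disk_usage(base).free
    if free < min_free_bytes:
        raise RuntimeError(
            f"only {free} bytes free in {base}, need {min_free_bytes}"
        )

    # SLURM_JOB_ID is set by Slurm inside a job; "nojob" is used otherwise.
    job_id = os.environ.get("SLURM_JOB_ID", "nojob")
    return tempfile.mkdtemp(prefix=f"job-{job_id}-", dir=base)


if __name__ == "__main__":
    scratch = make_job_scratch()
    try:
        with open(os.path.join(scratch, "example.tmp"), "w") as handle:
            handle.write("temporary data\n")
        print(f"wrote temporary file in {scratch}")
    finally:
        # Clean up explicitly so the space is returned to other jobs.
        shutil.rmtree(scratch)
```

Removing the directory at the end of the job keeps the shared disk available for other jobs on the same node.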