Using Apptainer on Berzelius

Introduction

Apptainer is an open-source containerization platform that is primarily designed for high-performance computing (HPC) and scientific computing environments. It focuses on providing secure and efficient containerization for running applications and workflows, particularly in shared and multi-user HPC clusters.

In the context of containerization technologies like Apptainer, "container" and "image" are two closely related concepts, but they have distinct meanings:

  • A container is a running instance of a container image.

  • A container image is a static, standalone package that contains all the necessary files and configurations needed to run an application or service.

For AI and ML work on Berzelius, where highly complex production environments and a high degree of user customizability are essential, NSC strongly recommends the use of a container environment. Apptainer and Enroot are the supported options, while Docker is not supported due to security considerations.

Employing a container environment offers several advantages, including enhanced portability and the ability to reproduce results across a wide range of systems, including laptops, Berzelius, and EuroHPC resources like LUMI. Additionally, it provides users with the flexibility to select their preferred operating system independently of the host environment, resulting in a more familiar and user-friendly experience.

Apptainer on Berzelius

Please be aware that Apptainer will not run from your home directory (/home/username). Apptainer image files can be large, and the home directory has a limited quota of 20 GB, so images should not be stored there. Please store your images in your project directory and run them from there.
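As a small safeguard, a wrapper script can check where an image lives before it is used. The following is a minimal sketch, not an Apptainer feature: is_under_home is a hypothetical helper, and it assumes GNU realpath is available on the system.

```shell
# Sketch: warn before using an image that lives under $HOME.
# `is_under_home` is a hypothetical helper, not part of Apptainer.

is_under_home() {
    # Resolve both paths to absolute form before comparing them.
    local target home
    target=$(realpath -m -- "$1") || return 2
    home=$(realpath -m -- "$HOME") || return 2
    case "$target/" in
        "$home"/*) return 0 ;;   # inside the home directory
        *)         return 1 ;;   # somewhere else, e.g. under /proj
    esac
}

image="${1:-pytorch_2.0.1.sif}"
if is_under_home "$image"; then
    echo "warning: $image is under \$HOME; move it to your project directory" >&2
fi
```

The check compares whole path components, so a sibling directory whose name merely starts with your home directory name is not flagged.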

Apptainer is available on both login nodes and compute nodes.

[xuan@node044 ~]$ apptainer --version
apptainer version 1.1.9-1.el8

You can check the available options and subcommands using --help:

apptainer --help

Downloading Apptainer Images

From Docker Hub repositories

Apptainer is compatible with Docker images. Docker Hub is a cloud-based platform and registry service provided by Docker. It serves as a central repository for container images, making it easy for developers to share, distribute, and collaborate on containerized applications and services. Images hosted on the hub can be downloaded using a docker:// URL as the reference.

apptainer pull pytorch_2.0.1.sif docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime

The image is stored locally as a .sif file (pytorch_2.0.1.sif, in this case).
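If you pull the same images repeatedly, for example from a setup script, it can be convenient to skip the download when the .sif file already exists. The sketch below is one possible way to do this. The helpers sif_name_for and pull_if_missing are hypothetical, and the name_tag.sif naming scheme is an assumption chosen for this example.

```shell
# Sketch: derive a local .sif filename from a docker:// reference and pull
# the image only if that file is not already present.
# `sif_name_for` and `pull_if_missing` are hypothetical helpers.

sif_name_for() {
    local ref last name tag
    ref="${1#docker://}"      # drop the URI scheme
    last="${ref##*/}"         # keep the final path component (name[:tag])
    case "$last" in
        *:*) name="${last%%:*}"; tag="${last#*:}" ;;
        *)   name="$last";       tag="latest" ;;
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
}

pull_if_missing() {
    local uri sif
    uri="$1"
    sif=$(sif_name_for "$uri")
    if [ -f "$sif" ]; then
        echo "already present: $sif"
    else
        apptainer pull "$sif" "$uri"
    fi
}
```

For example, pull_if_missing docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime would store the image as pytorch_2.0.1-cuda11.7-cudnn8-runtime.sif in the current directory.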

From NVIDIA NGC

NVIDIA GPU Cloud (NGC) is a platform and repository that provides a comprehensive set of GPU-optimized containers, pre-trained deep learning models, and AI software to accelerate and simplify AI and GPU-accelerated computing workflows.

apptainer pull tensorflow-20.03-tf2-py3.sif docker://nvcr.io/nvidia/tensorflow:20.03-tf2-py3

Building Apptainer Images

Running containers from the available public images is not the only option. In many cases, you will need to build a new image yourself.

An Apptainer definition file is a configuration file that provides instructions for building an Apptainer image. The example used in this guide is listed in full later in this section. Please refer to the Apptainer User Guide for more details.

The definition file consists of the following parts:

  • Header: The first two lines define the base image. In this case, a local copy (cuda_11.7.1-cudnn8-devel-ubuntu22.04.sif) of the image nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 from Docker Hub is used.
  • %environment defines environment variables that are available inside the container at runtime. They are not set during the build, which is why PATH is exported again in %post.
  • %post contains the commands that are executed inside the container during the build.

We set the environment variable PYTHONNOUSERSITE (Python honors any non-empty value, so 1 and True are equivalent) to instruct Python in the container to ignore the user-specific site-packages directory on the host when searching for modules and packages. This is particularly useful in a container environment, because it ensures that Python only uses packages installed inside the container and not packages in user-specific locations on the host.
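The effect of PYTHONNOUSERSITE can be checked with any Python 3 interpreter, inside or outside a container. The following is a minimal sketch that assumes python3 is on the PATH; check_user_site is a hypothetical helper that prints the value of site.ENABLE_USER_SITE.

```shell
# Sketch: show how PYTHONNOUSERSITE changes Python's user-site handling.
# Assumes python3 is on PATH; `check_user_site` is a hypothetical helper.

check_user_site() {
    # $1: value for PYTHONNOUSERSITE; an empty string means "leave it unset".
    if [ -n "$1" ]; then
        PYTHONNOUSERSITE="$1" python3 -c 'import site; print(site.ENABLE_USER_SITE)'
    else
        env -u PYTHONNOUSERSITE python3 -c 'import site; print(site.ENABLE_USER_SITE)'
    fi
}

if command -v python3 >/dev/null 2>&1; then
    echo "PYTHONNOUSERSITE unset: $(check_user_site '')"
    echo "PYTHONNOUSERSITE=True:  $(check_user_site True)"
    echo "PYTHONNOUSERSITE=1:     $(check_user_site 1)"
fi
```

With the variable set, Python reports False, meaning the user site-packages directory is not added to sys.path. The value reported when the variable is unset depends on the environment (for example, on whether a virtual environment is active).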

We first download the base image:

apptainer build cuda_11.7.1-cudnn8-devel-ubuntu22.04.sif docker://nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04

We then build the image from the following definition file, saved as pytorch_2.0.1.def.

Bootstrap: localimage
From: cuda_11.7.1-cudnn8-devel-ubuntu22.04.sif

%environment

export PATH=/opt/mambaforge/bin:$PATH
export PYTHONNOUSERSITE=True

%post

apt-get update && apt-get install -y --no-install-recommends \
git \
nano \
wget \
curl

# Install Mambaforge
cd /tmp
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh -fp /opt/mambaforge -b
rm Mambaforge*sh

export PATH=/opt/mambaforge/bin:$PATH

mamba install python==3.10 pytorch==2.0.1 torchvision torchaudio torchdata torchtext pytorch-cuda=11.7 -c pytorch -c nvidia -y

# Pin packages
cat <<EOT > /opt/mambaforge/conda-meta/pinned
pytorch==2.0.1
EOT

mamba install matplotlib scipy pandas -y

Building an Apptainer image normally requires root access. On Berzelius, you can build images as an unprivileged user, with a few restrictions, using the fakeroot feature. The fakeroot feature allows an unprivileged user to run a container as a "fake root" user by leveraging user namespace UID/GID mapping. A "fake root" user has almost the same administrative rights as root, but only inside the container and the requested namespaces.

apptainer build --fakeroot pytorch_2.0.1.sif pytorch_2.0.1.def
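When a definition file is edited repeatedly, the build command above can be wrapped so that the image is only rebuilt when needed. This is a minimal sketch under stated assumptions: needs_rebuild and build_image are hypothetical helpers, and the image is assumed to have the same base name as its definition file.

```shell
# Sketch: rebuild an image only when its definition file has changed.
# `needs_rebuild` and `build_image` are hypothetical helpers.

needs_rebuild() {
    # $1: definition file, $2: image file.
    # True if the image is missing or older than the definition file.
    [ ! -f "$2" ] || [ "$1" -nt "$2" ]
}

build_image() {
    local def sif
    def="$1"
    sif="${def%.def}.sif"     # pytorch_2.0.1.def -> pytorch_2.0.1.sif
    if needs_rebuild "$def" "$sif"; then
        # --force overwrites an existing image file without prompting.
        apptainer build --fakeroot --force "$sif" "$def"
    else
        echo "up to date: $sif"
    fi
}
```

Calling build_image pytorch_2.0.1.def then either runs the fakeroot build or reports that pytorch_2.0.1.sif is up to date.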

The image can also be built directly from a base image in a registry, without first downloading the base image to a local .sif file. You just need to change the definition header to:

Bootstrap: docker
From: nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04

Modifying Apptainer Images

You can modify an existing image to suit your requirements.

The apptainer build command provides a --sandbox flag that creates a writable directory (a sandbox) in your working directory.

apptainer build --fakeroot --sandbox pytorch_2.0.1 pytorch_2.0.1.sif

We then start an interactive session in the sandbox with the apptainer shell command. The --writable flag allows us to write files within the sandbox directory.

apptainer shell --fakeroot --writable pytorch_2.0.1/
Apptainer> mamba install jupyterlab -y 
Apptainer> apt update  
Apptainer> apt install vim -y
Apptainer> exit

Finally, we save the modified image.

apptainer build pytorch_2.0.1_v2.sif pytorch_2.0.1

Binding Directories

When running a container, we often need to access directories on the host. By default, Apptainer binds:

  • The user’s home directory ($HOME)
  • The current directory when the container is executed ($PWD)
  • System-defined paths: /tmp, /proj, etc.

You can specify the directories to bind using the --bind or -B flag.

apptainer shell -B /proj/your_proj/users/username/data:/data pytorch_2.0.1.sif

Here, the colon (:) separates the directory path on the host from the mount point inside the container.
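Several directories can be bound at once by passing a comma-separated list to -B, and a third field such as ro makes a bind read-only. The following sketch assembles such a list. join_binds is a hypothetical helper, and the results directory is a placeholder that does not appear elsewhere in this guide.

```shell
# Sketch: assemble several host:container bind pairs into one -B argument.
# `join_binds` is a hypothetical helper; the paths are placeholders.

join_binds() {
    # Join all arguments with commas: a:b c:d -> a:b,c:d
    ( IFS=,; printf '%s\n' "$*" )
}

# The input data is bound read-only (:ro); results stay writable.
bind_spec=$(join_binds \
    "/proj/your_proj/users/username/data:/data:ro" \
    "/proj/your_proj/users/username/results:/results")

# Print the resulting command instead of running it.
echo "apptainer shell -B $bind_spec pytorch_2.0.1.sif"
```

The script only prints the command, so you can inspect the bind specification before starting the container.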

Running Containers

Initializing a Shell

The apptainer shell command starts a new interactive shell inside the container. The --nv flag enables NVIDIA GPU support.

To quit the container, type exit or press Ctrl + D.

[xuan@node001 containers]$ apptainer shell --nv pytorch_2.0.1.sif
Apptainer> exit
exit
[xuan@node001 containers]$

Note that when you exit the container, all processes running in it are stopped. Changes saved in bound directories are preserved. By default, anything else written inside the container is lost.

Executing Commands

The command apptainer exec starts the container from a specified image and executes a command inside it.

[xuan@node001 containers]$ apptainer exec --nv pytorch_2.0.1.sif python -c "import torch; print('GPU Name: ' + torch.cuda.get_device_name(0))"
GPU Name: NVIDIA A100-SXM4-40GB

Running Containers in Batch Jobs

You can run containers in batch jobs. First, create a batch job script:

#!/bin/bash

#SBATCH -A your_proj_account
#SBATCH --gpus=1
#SBATCH --time 00:10:00

apptainer exec --nv -B /local_data_dir:/data apptainer_image.sif python some_script.py
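The script is then submitted to the Slurm scheduler. The sketch below writes the script above to a file and submits it with sbatch if that command is available. The file name run_container.sh is an assumption made for this example; the project account, image, and Python script are the same placeholders as above.

```shell
# Sketch: save the batch script to a file and hand it to Slurm.
# The file name run_container.sh is an assumption for this example.

job_script="run_container.sh"

cat > "$job_script" <<'EOF'
#!/bin/bash

#SBATCH -A your_proj_account
#SBATCH --gpus=1
#SBATCH --time 00:10:00

apptainer exec --nv -B /local_data_dir:/data apptainer_image.sif python some_script.py
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"        # prints the job ID on success
    squeue -u "$USER"           # list your pending and running jobs
else
    echo "sbatch not found; submit $job_script from a Berzelius login node" >&2
fi
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the job script while it is being written.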

Berzelius Container Modules

Berzelius Container Modules are lightweight wrappers that make it possible to transparently use Apptainer containers as environment modules. Please read the user guide.

Apptainer Commands Cheat Sheet

  • Downloading images

    apptainer pull pytorch_2.0.1.sif docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
  • Building images

    apptainer build --fakeroot apptainer_image.sif apptainer_image.def
  • Initializing a shell in containers

    apptainer shell --nv apptainer_image.sif
  • Executing commands in containers

    apptainer exec --nv -B /local_data_dir:/data apptainer_image.sif python some_script.py
    apptainer exec --nv -B /local_data_dir:/data apptainer_image.sif bash -c "echo 'Training a CNN' && python some_script.py"
