Using Enroot on Berzelius

Introduction

Enroot is a simple and powerful tool to turn container images into unprivileged sandboxes. It is targeted at HPC environments and integrates with the Slurm scheduler, but it can also be used as a standalone tool to run containers as an unprivileged user. Enroot is similar to Singularity, with the added benefits that users can read and write inside the container and appear as the root user within the container environment.

Please read Enroot's GitHub page for more information.

Note: Due to what we consider to be severe design flaws in Pyxis, we have decided to deprecate both Enroot and Pyxis and remove them from the cluster on 2024-01-15. Users should migrate to Apptainer. Please reach out to us if you require assistance with this.

Setting up Nvidia Credentials

This step is necessary for importing container images from Nvidia NGC.

  • Complete steps 4.1 and 4.3 and save the API key.

  • Add the API key by adding these lines to the config file at ~/.config/enroot/.credentials:

    machine nvcr.io login $oauthtoken password your_api_key
    machine authn.nvidia.com login $oauthtoken password your_api_key

    Please replace your_api_key with your real API key.

  • Set the config path by adding this line to ~/.bashrc (replace username with your own username):

    export ENROOT_CONFIG_PATH=/home/username/.config/enroot
  • Apply the change:

    source ~/.bashrc

Setting Path to User Container Storage

By default, your Enroot containers are saved in your home directory. On Berzelius, you have only 20 GB of disk space in your home directory, so please store Enroot containers in your project directory.

Create the directories for Enroot container storage:

mkdir -p /proj/your_proj/users/username/enroot/cache /proj/your_proj/users/username/enroot/data

Add the following lines to your ~/.bashrc:

export ENROOT_CACHE_PATH=/proj/your_proj/users/username/enroot/cache
export ENROOT_DATA_PATH=/proj/your_proj/users/username/enroot/data

To apply the change:

source ~/.bashrc
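
As a quick sanity check, you can confirm that the variables are set in your current shell and that the storage directories exist (the paths are the placeholders from above):

# Confirm the storage variables point at existing directories
echo "$ENROOT_CACHE_PATH"
echo "$ENROOT_DATA_PATH"
ls -ld "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH"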

Downloading Enroot Images

Enroot is compatible with Docker images. Docker Hub is a cloud-based platform and registry service provided by Docker. It serves as a central repository for container images, making it easy for developers to share, distribute, and collaborate on containerized applications and services. Images hosted on the hub can be downloaded using a docker:// URL as the reference.

  • From Docker Hub repositories

    enroot import --output pytorch_1.12.1.sqsh 'docker://pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel'

The image is stored locally as a .sqsh file (pytorch_1.12.1.sqsh, in this case).
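
If you want to confirm the import succeeded, the .sqsh file can be inspected like any other file; for example:

# The imported image is a regular file in the current directory
ls -lh pytorch_1.12.1.sqsh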

NVIDIA GPU Cloud (NGC) is a platform and repository that provides a comprehensive set of GPU-optimized containers, pre-trained deep learning models, and AI software to accelerate and simplify AI and GPU-accelerated computing workflows.

  • From Nvidia NGC

    enroot import --output nvidia_pytorch_22.09.sqsh 'docker://nvcr.io#nvidia/pytorch:22.09-py3'

Creating a Container

enroot create --name nvidia_pytorch_22.09 nvidia_pytorch_22.09.sqsh

You can check existing Enroot containers with the command enroot list.

[xuan@node025 enroot]$ enroot list
nvidia_pytorch_22.09
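
Containers created with enroot create are unpacked under $ENROOT_DATA_PATH and, with the paths configured above, count against your project storage. When a container is no longer needed, you can remove it:

# Delete the unpacked container root filesystem to free space
enroot remove nvidia_pytorch_22.09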

Binding Directories

When running a container, you may need to access directories on the host outside the container. You can specify the directories to bind using the --mount flag.

enroot start --mount /path/on/host:/path/in/container nvidia_pytorch_22.09

Here, the colon : separates the path to the directory on the host from the mounting point inside the container.
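
The --mount flag can be given multiple times to bind several directories at once. For example, with placeholder paths that you should adjust to your own project layout:

# Bind a project directory and a dataset directory into the container (illustrative paths)
enroot start --mount /proj/your_proj/users/username:/workspace/proj --mount /proj/your_proj/datasets:/workspace/datasets nvidia_pytorch_22.09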

Running Containers

Initializing a Shell

You need to be on a compute node where you have access to GPU resources to start a container.

enroot start --root --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09  
  • --root: Ask to be remapped to root inside the container
  • --rw: Make the container root filesystem writable
  • --mount: Perform a mount from the host inside the container (colon-separated)

To quit the container, type exit or hit Ctrl + D.

[xuan@node025 enroot]$ enroot start --root nvidia_pytorch_22.09

=============
== PyTorch ==
=============

NVIDIA Release 22.09 (build 44877844)
PyTorch Version 1.13.0a0+d0d6b1f

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

root@node025:/workspace# exit
exit

Executing Commands

You can also start a container and execute a command directly, without opening an interactive shell.

enroot start --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09 sh -c 'python my_script.py' 
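
For example, a quick check that the GPU and the bundled PyTorch build are visible from inside the container:

# Print the PyTorch version and whether CUDA is available inside the container
enroot start nvidia_pytorch_22.09 python -c "import torch; print(torch.__version__, torch.cuda.is_available())"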

Running Containers in Batch Jobs

You can integrate containers into a batch job submission script. First, create the batch script.

#!/bin/bash
#SBATCH -A your_proj_account
#SBATCH --gpus=1
#SBATCH --time 00:10:00

enroot start --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09 bash -c "cd /your/working/dir && python my_script.py"
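
Save the script, for example as job.sh (the filename is only illustrative), and submit it as usual:

# Submit the job and check its status in the queue
sbatch job.sh
squeue -u $USER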

With the Slurm plugin Pyxis, we can run a job on multiple nodes. See the example below.

#!/bin/bash
#SBATCH -A your_project
#SBATCH --nodes=2
#SBATCH --gpus=8
#SBATCH --ntasks-per-node=8
#SBATCH --time=0-00:10:00

srun --container-image=/path/to/your_container.sqsh --container-name=your_container --container-mounts=/path/on/host:/path/in/container --container-writable bash -c "cd /your/working/dir && python my_script.py"

Enroot Commands Cheat Sheet

  • Downloading images

    enroot import --output nvidia_pytorch_22.09.sqsh 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
  • Creating containers

    enroot create --name nvidia_pytorch_22.09 nvidia_pytorch_22.09.sqsh
  • Listing containers

    enroot list
  • Removing containers

    enroot remove nvidia_pytorch_22.09
  • Initializing a shell in containers

    enroot start --root --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09
  • Executing commands in containers

    enroot start --rw --mount /path/on/host:/path/in/container nvidia_pytorch_22.09 bash -c "cd /your/working/dir && python my_script.py"
