Using and building Containers#

Containers are software tools that abstract the environment at the application level by packaging an application together with its dependencies. They can be thought of as a sort of computer in a box, allowing you to fully abstract and customise the software environment. A container engine (such as Docker or Singularity) is all you need to run and port any application regardless of the host environment, since every dependency, from the OS down to the smallest module, can be packaged with the application. More information on containers here.

All applications on a GPU node are run through containers. Non-containerised applications (i.e. software installed directly on a GPU node) will NOT be supported.

CNSC-tech provides some pre-built containers with common DL environments that can be used out of the box and are optimised to run on Liger GPU resources.
Users can also build their own containers to customise their environment and package all their needed software dependencies and applications. This page gives information on these topics.

Using containers#

Liger GPU servers use Singularity as their container engine. The singularity command can be used to run, build and manage containers. List all the commands with singularity -h. More detailed information is available in the documentation.

Important note: all container operations have to be done from login01. Run ssh login01 once you are on Liger.

Container registry#

The ready-to-use AI container images that we provide can be found in the Docker registry of the liger-ai-tools repository.
The containers were built using Docker. The liger-ai-tools repository contains all the respective Dockerfiles. Information on the content of each container can be found in the repository's main README.

On Liger, you may find some of these containers in the following folder:

/softs/singularity/containers/ai/

PLEASE NOTE that the containers in that folder might not be up to date. If unsure, please pull the containers from the registry to be sure to use the latest version.

When running containers on a GPU node, make sure to load the latest Singularity version through the module command.

Pull containers from the liger-ai-tools registry#

On login01, the Singularity cache is in your HOME space by default. Before pulling a container from any registry, set the Singularity cache directory to SCRATCH to avoid filling up your HOME quota. You can do so by running the repository script:

source set_pull_credentials.sh

or running the following command:

export SINGULARITY_CACHEDIR=$SCRATCHDIR
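If you set the cache by hand, a small guard can help avoid silently writing to HOME when the scratch variable is missing. The sketch below is only an illustration: the function name use_scratch_cache is made up for this example, and it assumes SCRATCHDIR is defined in your Liger session as in the command above.

```shell
# Hypothetical helper: point the Singularity cache at scratch space.
# Assumes SCRATCHDIR is defined by the Liger environment.
use_scratch_cache() {
    if [ -z "${SCRATCHDIR:-}" ]; then
        echo "SCRATCHDIR is not set; cache location left unchanged" >&2
        return 1
    fi
    export SINGULARITY_CACHEDIR="$SCRATCHDIR"
    echo "Singularity cache directory: $SINGULARITY_CACHEDIR"
}

# Usage (from login01):
#   use_scratch_cache && module load singularity
```

The function returns a non-zero status when SCRATCHDIR is empty, so chaining it with && stops before anything is pulled into HOME.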

Load singularity:

module load singularity

Pull a container from the liger-ai-tools registry (no tag defaults to "latest"):

singularity pull docker://gitlab-registry.in2p3.fr/ecn-collaborations/liger-ai-tools/<container>
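To make the tag handling explicit, the following sketch builds the pull address from a container name and an optional tag, and shows the file name that singularity pull writes by default. The helper names liger_image_uri and liger_sif_name are hypothetical; the registry path is the one from the command above.

```shell
# Registry prefix taken from the pull command above.
LIGER_REGISTRY="gitlab-registry.in2p3.fr/ecn-collaborations/liger-ai-tools"

# Hypothetical helper: build the docker:// URI for a container name and optional tag.
liger_image_uri() {
    name="$1"
    tag="${2:-latest}"   # no tag given: fall back to "latest"
    echo "docker://${LIGER_REGISTRY}/${name}:${tag}"
}

# Hypothetical helper: default file name written by `singularity pull` for that image.
liger_sif_name() {
    echo "${1}_${2:-latest}.sif"
}

liger_image_uri ngc-tf2.3-fat
liger_sif_name ngc-tf2.3-fat
```

From login01 this could be used as singularity pull "$(liger_image_uri ngc-tf2.3-fat)", which should leave a file named ngc-tf2.3-fat_latest.sif in the current directory.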

Examples#

Pull the TensorFlow NGC container:

singularity pull docker://gitlab-registry.in2p3.fr/ecn-collaborations/liger-ai-tools/ngc-tf2.3-fat

Run an IPython shell session in the TensorFlow NGC container, from a GPU node:

singularity exec --nv /softs/singularity/containers/ai/ngc-tf2.3-fat_latest.sif ipython3

Run a simple shell session inside any container, from a GPU node:

singularity shell --nv /softs/singularity/containers/ai/<container>
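For non-interactive work, the same exec form can run a script instead of an interactive shell. The wrapper below is a minimal sketch, not part of the Liger tooling: run_in_container and train.py are placeholder names, and it simply checks that the image file exists before calling singularity exec --nv.

```shell
# Hypothetical wrapper: run a command inside a container image with GPU support.
run_in_container() {
    image="$1"
    shift
    if [ ! -f "$image" ]; then
        echo "Container image not found: $image" >&2
        return 1
    fi
    # --nv makes the host NVIDIA driver and GPUs visible inside the container
    singularity exec --nv "$image" "$@"
}

# Usage (on a GPU node, after `module load singularity`; train.py is a placeholder):
#   run_in_container /softs/singularity/containers/ai/ngc-tf2.3-fat_latest.sif python3 train.py
```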

Third party containers#

Singularity can also be used to pull third-party containers from an online container hub. Singularity is fully compatible with Docker, so both Singularity and Docker containers can be brought to Liger through a singularity pull. During the pull, Singularity automatically converts the Docker image to the Singularity format while preserving its internal functionality.

A good resource for AI containers is NGC (NVIDIA GPU Cloud). The platform provides useful AI containers that are optimised for NVIDIA GPU cards, such as those installed on Liger.

Examples#

Before being able to pull an external image, make sure to go through the following steps:

  • From login01, run module load singularity
  • Edit the set_pull_credentials.sh script with your details, following the in-file instructions
  • Run source set_pull_credentials.sh
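The contents of set_pull_credentials.sh are not reproduced on this page. As a rough idea of what such a script may do, Singularity reads the SINGULARITY_DOCKER_USERNAME and SINGULARITY_DOCKER_PASSWORD environment variables when pulling from a registry that requires a login. The function below is a hypothetical sketch under that assumption and is not the repository script itself.

```shell
# Sketch only: the real set_pull_credentials.sh may differ.
# Singularity reads these two variables when a docker:// registry requires a login.
set_registry_credentials() {
    if [ "$#" -ne 2 ]; then
        echo "usage: set_registry_credentials <username> <token>" >&2
        return 1
    fi
    export SINGULARITY_DOCKER_USERNAME="$1"
    export SINGULARITY_DOCKER_PASSWORD="$2"
}

# Usage for NGC, where the username is the literal string $oauthtoken
# and the password is your personal API key (placeholder below):
#   set_registry_credentials '$oauthtoken' '<your NGC API key>'
```

Keep real tokens out of shared files and shell history where possible; the single quotes around $oauthtoken prevent the shell from expanding it.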

Pull the NGC CUDA image:

singularity pull docker://nvcr.io/nvidia/cuda:latest

Pull the lolcow Docker image:

singularity pull docker://godlovedc/lolcow

Building containers#

If the provided containers do not suit your purposes or are missing some dependencies, or if you want to bundle your application tightly inside the container for easier deployment, it is advised to build your own containers. There are several ways to build a container:

  • Using either Docker or Singularity
  • From scratch or from a pre-existing image

In this section we'll describe how to use Docker to build a container from an existing image, specifically images from the NGC platform.
It is advised to avoid building images from scratch; in most cases it is a good idea to start from the images provided by NGC. These images are optimised for AI workloads on GPUs and cover a wide range of technologies. Before building your own image, check whether a suitable one already exists!

Build containers with Docker#

Detailed reference here.

This section assumes you have Docker installed on your local machine. Docker is not present on Liger, so containers cannot be built with it directly there. Since Singularity is compatible with Docker, it makes no difference which of the two technologies you build your containers with; it is just a matter of choice. The advantage of Singularity is that it is directly available on Liger, whereas Docker is much more widespread and has more online resources.

Docker can build container images from a Dockerfile: a set of simple instructions executed sequentially by docker build through the Docker command line tool.
Many instructions are available; we focus here on the most relevant ones, which are also fairly straightforward:

  • FROM: the image that will constitute the base of your container. For AI workloads, it is highly advised to pick an image from NGC.
  • RUN: run bash commands in the container environment. It is often used to install any extra dependency that might be needed.
  • COPY: copy local files and folders into the container (programs, source files, etc.)

In the docker folder of this repository we provide the Dockerfiles used to build the containers that are currently available on Liger. Snippets from the Dockerfile used to build the ngc-tf2.3-fat container are reported below with explanatory comments:

# Use the nvidia optimised tensorflow container from NGC as starting image
FROM nvcr.io/nvidia/tensorflow:20.09-tf2-py3
...
# Since the provided image does not contain tensorflow-probability, use the pip (python package manager) command to install it. Make sure pip is upgraded first
RUN python3 -m pip install -U pip && pip install tensorflow-probability==0.11.1
...

You can use the Dockerfiles provided as a guideline / starting point to make container recipes.
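To see FROM, RUN and COPY working together, the sketch below assembles a small build context on your local machine. It is an illustration only: app.py, the /opt/app destination and the my-image tag are placeholders, and the docker build step is left as a comment because it needs Docker and downloads a large base image.

```shell
# Sketch: assemble a build context that uses FROM, RUN and COPY together.
# The directory, file name, destination path and image tag are placeholders.
build_dir="$(mktemp -d)"

# A stand-in application file to COPY into the image
echo 'print("hello from the container")' > "$build_dir/app.py"

cat > "$build_dir/Dockerfile" <<'EOF'
# Base image from NGC (the one used for ngc-tf2.3-fat)
FROM nvcr.io/nvidia/tensorflow:20.09-tf2-py3
# Install an extra Python dependency
RUN python3 -m pip install tensorflow-probability==0.11.1
# Copy the application into the image
COPY app.py /opt/app/app.py
EOF

echo "Build context ready in $build_dir"
# On a machine with Docker installed:
#   docker build -t my-image "$build_dir"
```

COPY paths are resolved relative to the build context, which is why app.py sits next to the Dockerfile.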

Export Docker images to Liger#

There are two ways to export your local custom images to Liger: through the liger-ai-tools registry or via a static image.

Through Liger container registry (suggested)#

note: the image will be public (can be used by other users)

  • Log in to gitlab.in2p3.fr with your ECN credentials
  • Request access to the repository by asking us (cnsc-help@ec-nantes.fr or Slack)
  • Follow GitLab instructions to set your registry access credentials (adding tokens, docker login)
  • docker push gitlab-registry.in2p3.fr/ecn-collaborations/liger-ai-tools/<your container>
  • Log in to Liger and use Singularity to pull your container from the registry, as outlined previously (section Using containers). Remember to prefix your repository path with docker://, i.e. docker://<YOUR REPO>
  • Run your container on a GPU node.
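The tag-and-push part of these steps can be sketched as follows. This is an assumption-laden illustration: push_to_liger_registry and my-image are hypothetical names, and the registry address is taken from the pull command earlier on this page on the assumption that it is also the push target.

```shell
# Registry address as used in the pull command earlier on this page
# (assumed here to be the push target as well).
LIGER_REGISTRY="gitlab-registry.in2p3.fr/ecn-collaborations/liger-ai-tools"

# Hypothetical helper: tag a local image for the registry, then push it.
push_to_liger_registry() {
    local_image="$1"     # local image reference, e.g. my-image:latest (placeholder)
    remote_name="$2"     # name the image will have in the registry
    remote="${LIGER_REGISTRY}/${remote_name}"
    docker tag "$local_image" "$remote" && docker push "$remote"
}

# Usage (on your local machine, after `docker login gitlab-registry.in2p3.fr`):
#   push_to_liger_registry my-image:latest my-image
```

Tagging first is needed because docker push uploads to the registry named in the image reference itself.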

Static image#

note: the image will be private

  • Use docker save <IMAGE> -o <IMAGE>.tar to save your image to a tar file
  • Move the tar file to Liger through scp or WinSCP.
  • SSH into a GPU node
  • On the GPU node, load Singularity (latest version):
module load singularity
  • and run
singularity build <IMAGE>.sif docker-archive://<IMAGE>.tar
  • Run your container on a GPU node.
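The static-image route can be summarised as three commands. The sketch below only prints them for a given image so that nothing runs by accident; the helper names are hypothetical, and <user>, <liger-host> and <destination> are placeholders you need to replace with your own details.

```shell
# Hypothetical helper: derive a file-system friendly base name from an image reference,
# e.g. "my-image:latest" becomes "my-image_latest".
image_basename() {
    echo "$1" | tr '/:' '__'
}

# Print the commands for the static-image route without running them.
print_static_export_steps() {
    base="$(image_basename "$1")"
    echo "docker save $1 -o ${base}.tar"
    echo "scp ${base}.tar <user>@<liger-host>:<destination>"
    echo "singularity build ${base}.sif docker-archive://${base}.tar"
}

print_static_export_steps my-image:latest
```

The first command runs on your local machine, the second copies the archive to Liger, and the third runs on Liger after module load singularity.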

Learn more about containers#