AI Overview

Liger has a specific partition (group of servers) dedicated to Artificial Intelligence workloads. These servers are equipped with powerful GPUs (the most popular accelerators for Big Data) that can be used to speed up state-of-the-art AI computational jobs.

All AI software resources are contained in the following GitLab repository: liger-ai-tools

Ensure you are able to connect to the ICI HPC clusters. As usual, for all tests and compilation you MUST work on a compute node.

GPU resources

The following list of GPU nodes is dedicated to Artificial intelligence jobs:

| Name | Model | CPUs | RAM | GPUs | GPU RAM |
|---|---|---|---|---|---|
| turing01 | DELL C4140 | 2x Xeon Gold 6252 (24 cores @ 2.10 GHz; 48 cores in total) | 384GB | 4x GPU Nvidia Tesla V100 (Tensor cores; NVLink hyper bandwidth) | 32GB |
| viz[01-04] | bullx R421-E4 | 2x Intel Xeon E5-2680v3 (12 cores @ 2.5GHz; 24 cores in total) | 256GB | 4x 2 GPUs NVIDIA K80 | 12GB |

* Expressions in square brackets indicate a range, e.g. node[01-10] = 10 servers: node01, node02 ... node10
* Data on each row refers to a single server
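
The bracket notation used in the table can be expanded mechanically. The following is a minimal sketch in POSIX shell; `expand_range` is a hypothetical helper written for this illustration, not a tool shipped with Liger, and it assumes two-digit, zero-padded host numbers as in the table above.

```shell
# expand_range PREFIX FIRST LAST
# Hypothetical helper: prints one host name per line, with the numeric
# suffix zero-padded to two digits.
# Pass FIRST and LAST without leading zeros (1 and 4, not 01 and 04).
expand_range() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s%02d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Expand viz[01-04] into the four individual server names.
expand_range viz 1 4
```

On a node where the Slurm client tools are available, `scontrol show hostnames 'viz[01-04]'` performs the same expansion.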

Job submission

AI jobs are submitted via the Slurm batch scheduler, like all other jobs on Liger. See the other guides in these docs to learn how to use Slurm.
Example scripts and resources can be found in the repository: liger-ai-tools

Applications

The GPU resources for Artificial Intelligence are configured to host containerised applications. We therefore strongly recommend that you do not run programs directly on the servers, since the host environment (installed programs, server configuration, etc.) is unlikely to be compatible with most applications. Instead, submit all jobs via Slurm and run them through Singularity, the container engine installed on Liger.
Non-containerised applications (i.e. software installed directly on the GPU servers) will NOT be supported.
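
To illustrate how a containerised job might be submitted, here is a minimal sketch that writes a Slurm batch script to `gpu_job.sh`. It is not one of the official examples from liger-ai-tools. The partition name, the `gpu` resource name, the image path and the program path are all assumptions made for illustration and must be replaced with the values that apply to your project.

```shell
# Write a minimal batch script. Every value marked "hypothetical" is a
# placeholder, not a real Liger setting.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=ai-example
# Hypothetical partition name: check the real one with `sinfo`.
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Request one GPU; assumes the GPUs are exposed as a resource named "gpu".
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00
#SBATCH --output=ai-example-%j.out

# Hypothetical container image and program paths.
IMAGE="$HOME/containers/my-env.sif"
SCRIPT="$HOME/project/train.py"

# --nv makes the host NVIDIA driver and GPUs visible inside the container.
singularity exec --nv "$IMAGE" python "$SCRIPT"
EOF

# Check the script syntax locally without running it.
bash -n gpu_job.sh && echo "gpu_job.sh: syntax OK"
```

Once the placeholders are filled in, the script would be submitted with `sbatch gpu_job.sh`, and `squeue -u $USER` would show its state in the queue.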

We provide some pre-built containers with common DL environments that can be used out of the box and are optimised to run on NVIDIA GPUs.
To ensure your application has all the required dependencies, you can use the containers available on the system, pull external containers, or build your own. More information is available in Using and building containers.

Useful resources

This documentation does not provide tutorials on Deep Learning (DL). For that, we encourage you to take a look at:
- MIT introduction to deep learning
- Nvidia resources on deep learning, and developer site
- the courses taught at Master Datascience Paris Saclay and available on https://github.com/m2dsupsdlclass/lectures-labs