Liger has a partition (group of servers) dedicated to artificial intelligence workloads. These servers are equipped with powerful GPUs, among the most popular accelerators for big-data workloads, which can be used to speed up state-of-the-art AI computational jobs.
All AI software resources are contained in the following GitLab repository: liger-ai-tools
Make sure you are able to connect to the ICI HPC clusters. As usual for all testing and compilation, you MUST work on a computing node.
The following GPU nodes are dedicated to artificial intelligence jobs:
| Node | Model | CPU | RAM | GPU | GPU memory |
|---|---|---|---|---|---|
| turing01 | DELL C4140 | 2x Intel Xeon Gold 6252 (24 cores @ 2.10 GHz each, 48 cores in total) | 384 GB | 4x NVIDIA Tesla V100 (Tensor Cores, NVLink high-bandwidth interconnect) | |
| viz[01-04] | bullx R421-E4 | 2x Intel Xeon E5-2680 v3 (12 cores @ 2.5 GHz each, 24 cores in total) | 256 GB | 4x 2-GPU NVIDIA K80 | 12 GB |
* Expressions in square brackets indicate a range, e.g. node[01-10] = 10 servers: node01, node02, ..., node10
* Data on each row refers to a single server
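On the cluster itself, Slurm can expand this notation for you with `scontrol show hostnames 'viz[01-04]'`. For use in your own helper scripts, the range notation can be sketched in Python as follows. This is a minimal, illustrative sketch assuming a single bracket group of comma-separated numbers or `lo-hi` ranges; it does not cover every form Slurm's hostlist syntax accepts, and `expand_hostlist` is a hypothetical helper, not part of any Liger tool.

```python
from __future__ import annotations

import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple bracketed range such as 'viz[01-04]' into host names.

    Supports one bracket group containing numbers or ranges separated by
    commas (e.g. 'node[01-03,07]'). Zero padding follows the width of the
    lower bound of each range.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # no brackets: already a single host name
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # keep leading zeros, e.g. '01' -> width 2
            for i in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("viz[01-04]"))
```

For the GPU nodes above, `viz[01-04]` expands to the four servers viz01, viz02, viz03 and viz04.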
AI jobs are submitted via the Slurm batch scheduler, like all other jobs on Liger. See the other guides in these docs to learn how to use Slurm.
Example scripts and resources can be found in the repository: liger-ai-tools
The GPU resources for artificial intelligence are configured to host containerised applications. It is therefore strongly recommended not to run programs directly on the servers, since their environment (installed programs, server configuration, etc.) is unlikely to be compatible with most applications. Instead, all jobs should be submitted via Slurm and run through Singularity, the container engine installed on Liger.
Non-containerised applications (i.e. software installed directly on the GPU servers) will NOT be supported.
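Putting the two requirements together, a GPU job is a Slurm batch script whose payload runs inside a Singularity container. The Python sketch below only generates such a script as text, to show its shape. It is a hedged illustration, not the official Liger template: the partition name, image file and training command are hypothetical placeholders, and the `--gres=gpu:N` syntax assumes GPUs are exposed as a generic `gpu` resource (the GRES name configured on Liger may differ). Singularity's `--nv` flag makes the host NVIDIA driver and GPUs visible inside the container. Refer to the example scripts in liger-ai-tools for the supported settings.

```python
def make_sbatch_script(image: str, command: str, partition: str,
                       gpus: int = 1, time_limit: str = "01:00:00",
                       job_name: str = "ai-job") -> str:
    """Return the text of a Slurm batch script that runs `command`
    inside the Singularity image `image` with NVIDIA GPU support."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # Hypothetical: replace with the AI/GPU partition name used on Liger.
        f"#SBATCH --partition={partition}",
        # Assumes GPUs are configured as the generic resource 'gpu'.
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "",
        # --nv binds the host NVIDIA driver and GPU devices into the container.
        f"singularity exec --nv {image} {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_sbatch_script(
        image="my_pytorch.sif",      # hypothetical image file
        command="python train.py",   # hypothetical training script
        partition="gpu_partition",   # hypothetical partition name
    ))
```

Saved to a file such as `job.sh`, the generated script would be submitted with `sbatch job.sh`. Keeping the container call as the only command in the script means every dependency comes from the image rather than from the GPU server.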
We provide some pre-built containers with common DL environments that can be used out of the box and are optimised to run on NVIDIA GPUs.
To make sure your application has all the required dependencies, you can use the containers available on the system, pull external containers, or build your own. More information is available in Using and building containers.
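Pulling an external container is done with `singularity pull <output.sif> <source URI>`, which downloads an image (for example from a Docker registry via a `docker://` URI) and converts it to a local SIF file. The sketch below, a hypothetical helper not provided by Liger, builds that command and only executes it when asked, so it can be checked anywhere but run only where Singularity is installed (i.e. on a computing node). The NVIDIA NGC image URI is an illustrative example; `TAG` is a placeholder, and which tags exist has not been verified here.

```python
import shutil
import subprocess


def pull_image(source_uri: str, output_sif: str, run: bool = False) -> list:
    """Build (and optionally run) a `singularity pull` command that converts
    an external image such as docker://... into a local SIF file."""
    cmd = ["singularity", "pull", output_sif, source_uri]
    if run:
        # Only attempt the pull where the singularity binary is on PATH.
        if shutil.which("singularity") is None:
            raise RuntimeError("singularity not found on PATH")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Hypothetical example: replace TAG with a real tag from NVIDIA NGC.
    print(" ".join(pull_image("docker://nvcr.io/nvidia/pytorch:TAG", "pytorch.sif")))
```

The resulting `.sif` file can then be passed as the image argument of `singularity exec --nv` in a batch script.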
This documentation does not provide tutorials on Deep Learning (DL). For that, we encourage you to take a look at:
- the MIT Introduction to Deep Learning course
- NVIDIA resources on deep learning, and the NVIDIA developer site
- the courses taught at the Master Datascience Paris Saclay, available at https://github.com/m2dsupsdlclass/lectures-labs