Slurm options for GPU resources#

Job Submission#

When you submit a job with Slurm on Liger, you must specify:

A partition which defines the type of compute nodes you wish to reserve.

You may wish to set additional options in order to customise your jobs. Some of the options are:

A QoS (Quality of Service), set with --qos, which calibrates your resource needs (number of nodes, execution time, ...). If not specified, its value is the default QoS defined for your account (usually qos_gpu).
A Liger account, or project, which can be specified to access resources restricted to users of a particular project, or simply to "bill" the calculation time to a specific account. The option is --account=<account>.
A reservation, only needed to access resources that are reserved for a particular group.
The number of cores, number of nodes, etc. Note that the number of cores determines the memory allocated, according to the following formula: allocated memory = TOTAL MEM / TOTAL CORES * NUMBER OF CORES
See the Slurm website for all the available options.
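
As a rough illustration of the memory formula above, the following sketch computes the memory that would be allocated for a given number of cores. The node figures (196608 MB, 24 cores) are hypothetical values chosen for the example, not the specification of any Liger node, and allocated_mem_mb is an illustrative helper, not a Slurm command.

```shell
#!/bin/bash
# Sketch of: allocated memory = TOTAL MEM / TOTAL CORES * NUMBER OF CORES
# All node figures below are hypothetical, for illustration only.

allocated_mem_mb() {
    local total_mem_mb=$1   # total memory of the node, in MB
    local total_cores=$2    # total number of cores on the node
    local num_cores=$3      # number of cores requested by the job
    echo $(( total_mem_mb / total_cores * num_cores ))
}

# A hypothetical node with 196608 MB and 24 cores gives 8192 MB per core,
# so a 4-core job would be allocated 4 * 8192 = 32768 MB.
allocated_mem_mb 196608 24 4
```

Requesting more cores is therefore also the way to obtain more memory under this rule.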

Liger has one partition for GPU resources, called gpus.

Partition#

The Slurm partition defined on the GPU nodes:

PartitionName=gpus
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=4 MaxTime=4-04:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=turing[01-03],viz[01-04]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=YES:4
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=208 TotalNodes=7 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=8192 MaxMemPerNode=368640

This means that we have:

8192 MB of RAM per core by default (DefMemPerCPU)
12 cores per GPU
at most 368640 MB of RAM per node (MaxMemPerNode), i.e. 360 GB counting 1024 MB per GB

Note: DefMemPerCPU and MaxMemPerNode are set according to the nodes with the largest memory capacity in Liger. Other GPU nodes have less memory, so a job that tries to reserve more memory than such a node has will fail with an error.
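
To make the two memory settings concrete, here is a small sketch that extracts DefMemPerCPU and MaxMemPerNode from the partition description above and works out how many cores, at the default memory per core, reach the per-node limit. The relevant line is pasted in as a literal string so the example runs anywhere; on Liger the same text should be obtainable with scontrol show partition gpus, assuming that is how the listing above was produced. The helper get_value is hypothetical.

```shell
#!/bin/bash
# The relevant line of the partition description, copied from above.
partition_info='DefMemPerCPU=8192 MaxMemPerNode=368640'

# Hypothetical helper: print the numeric value that follows "KEY=".
get_value() {
    echo "$partition_info" | grep -o "$1=[0-9]*" | cut -d= -f2
}

def_mem_per_cpu=$(get_value DefMemPerCPU)     # MB per core
max_mem_per_node=$(get_value MaxMemPerNode)   # MB per node

# Number of cores at which the default per-core memory hits the node limit.
cores_at_limit=$(( max_mem_per_node / def_mem_per_cpu ))

echo "Default memory per core: ${def_mem_per_cpu} MB"
echo "Maximum memory per node: ${max_mem_per_node} MB"
echo "Cores that reach the limit at the default: ${cores_at_limit}"
```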

Options cheat sheet#

The Slurm options for GPU jobs are summarised below. In sbatch scripts, remember to prefix each option with #SBATCH.

Compulsory options#

For all jobs on GPUs, you must specify:

--partition=gpus

For jobs on turing01 (reserved) you must specify at least:

--reservation=turing01
--account=(1)
--qos=(2)
--nodelist=turing01

(1) one of: gpu-milcom, gpu-coquake, gpu-others, gpu-ici, gpu-og2110150
(2) one of the QoS values listed in the QoS policy below.

Even when using other nodes, we advise specifying all of the options above explicitly, so that your job runs with the settings you intend rather than with defaults.
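
Put together, the compulsory options for turing01 might look like the following batch script. This is a minimal sketch: the account gpu-others and the QoS qos_gpu are picked arbitrarily from the allowed values and must be replaced with the ones that apply to your project, and the job name and the commands in the body are placeholders.

```shell
#!/bin/bash
# Placeholder job name.
#SBATCH --job-name=gpu-example
#SBATCH --partition=gpus
#SBATCH --reservation=turing01
# Example values only: use your own account and a QoS from the QoS policy.
#SBATCH --account=gpu-others
#SBATCH --qos=qos_gpu
#SBATCH --nodelist=turing01

# Placeholder job body: report where the job is running.
# SLURM_JOB_ID is set by Slurm inside a job; fall back to "none" otherwise.
job_id="${SLURM_JOB_ID:-none}"
node_name="$(uname -n)"
echo "Job ${job_id} running on ${node_name}"
```

The script would be submitted with sbatch followed by the script file name.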

QoS policy#

Partition	Qos	Time Limit	MaxJobsPerUser
`gpus`	`qos_gpu`	20 hours	3
`gpus`	`qos_gpu-long`	100 hours	2
`gpus`	`qos_gpu-dev`	2 hours	2
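
The table can be read as a simple rule for choosing a QoS from the expected run time. The helper below, pick_qos, is a hypothetical illustration of that rule and not a Liger or Slurm tool; it ignores the MaxJobsPerUser limits.

```shell
#!/bin/bash
# Hypothetical helper: pick the QoS with the shortest time limit
# (from the table above) that still covers the requested number of hours.
pick_qos() {
    local hours=$1
    if   [ "$hours" -le 2 ];   then echo "qos_gpu-dev"
    elif [ "$hours" -le 20 ];  then echo "qos_gpu"
    elif [ "$hours" -le 100 ]; then echo "qos_gpu-long"
    else
        echo "no QoS allows ${hours} hours" >&2
        return 1
    fi
}

# A 12-hour job fits within the 20-hour limit of qos_gpu.
hours=12
echo "--qos=$(pick_qos "$hours") --time=${hours}:00:00"
```

The printed options can be placed on #SBATCH lines or passed to sbatch on the command line.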

Requesting GPUs#

To request GPU nodes:

1 node with 1 core and 1 GPU card

--gres=gpu:1

1 node with 2 cores and 2 GPU cards

--gres=gpu:2 -c2

1 node with 3 cores and 3 GPU cards, specifically of the Tesla V100 type. Note that it is always best to request at least as many CPU cores as GPUs.

--gres=gpu:V100:3 -c3
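
The requests above follow one pattern: --gres=gpu:[type:]count, together with -c set to at least the GPU count. As a sketch, the hypothetical helper gpu_request below builds such an option string with one core per GPU; it only illustrates the pattern and is not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: build the options for N GPUs (optionally of a given
# type), requesting one CPU core per GPU as advised above.
gpu_request() {
    local count=$1
    local gpu_type=${2:-}     # optional GPU type, e.g. V100
    if [ -n "$gpu_type" ]; then
        echo "--gres=gpu:${gpu_type}:${count} -c${count}"
    else
        echo "--gres=gpu:${count} -c${count}"
    fi
}

gpu_request 2         # prints: --gres=gpu:2 -c2
gpu_request 3 V100    # prints: --gres=gpu:V100:3 -c3
```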

The available GPU node configurations are shown here.

When you request GPUs, the system will set two environment variables - we strongly recommend you do not change or unset these variables:

CUDA_VISIBLE_DEVICES
GPU_DEVICE_ORDINAL
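
A job script can read these variables without modifying them, for example to check how many GPUs it was given. The sketch below counts the comma-separated entries of CUDA_VISIBLE_DEVICES. The helper count_devices is hypothetical, and the value used when the variable is unset is a made-up stand-in so that the example also runs outside a job.

```shell
#!/bin/bash
# Hypothetical helper: count the entries in a comma-separated
# device list such as "0,1,2".
count_devices() {
    if [ -z "$1" ]; then
        echo 0
        return 0
    fi
    # Put each entry on its own line, then count the non-empty lines.
    echo "$1" | tr ',' '\n' | grep -c .
}

# Copy the variable set by Slurm without changing it. Outside a job it may
# be unset, in which case "0,1,2" is used as a made-up stand-in.
visible="${CUDA_VISIBLE_DEVICES:-0,1,2}"
echo "Visible GPUs: ${visible} (count: $(count_devices "$visible"))"
```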

To your application, it will look like you have GPUs 0, 1, ... (up to as many GPUs as you requested). For example, suppose two jobs from different users, the first requesting 1 GPU card and the second requesting 3 GPU cards, happen to land on the same node gpu-08: