Useful commands and workflow reference.
Connection on the front end#
Access to the front end is done via an ssh connection:
$ ssh firstname.lastname@example.org
The resources of this interactive node are shared between all the connected users:
Reminder: interactive use of the front end is reserved exclusively for compilation and script development. The Liger front-end nodes are not equipped with GPUs and therefore cannot be used for executions requiring one or more GPUs.
To run your GPU codes interactively on GPU-equipped compute nodes such as turing01 and all the viz nodes viz[01-14], you must use one of the following two commands:
- the srun command, either to obtain a terminal on a GPU compute node within which you can execute your code, or to directly execute your code on the GPU partition;
- the salloc command, to reserve GPU resources so that you can run more than one execution consecutively.
However, if the computations require a large amount of GPU resources (in number of cores, memory, or elapsed time), you should submit a batch job.
Obtaining a terminal on a GPU compute node#
It is possible to open a terminal directly on an accelerated compute node on which the resources have been reserved for you (here, 1 GPU on the default gpu partition) by using the following command:
$ srun --pty --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread [--other-options] bash
- An interactive terminal is obtained with the --pty option.
- The reservation of physical cores is assured with the --hint=nomultithread option (no hyperthreading).
- The memory allocated for the job is proportional to the number of requested CPU cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. On turing01, the --cpus-per-task=12 option thus reserves 1/4 of the node memory per GPU. You may consult our documentation on this subject: Memory allocation on GPU partitions
- --other-options contains the usual Slurm options for job configuration (--time=, etc.): see the documentation on batch submission scripts
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account on which to charge the computing hours of the job.
- We strongly recommend that you consult our documentation detailing computing hours management on Liger to ensure that the hours consumed by your jobs are deducted from the correct allocation.
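As an illustrative back-of-the-envelope check of the proportional memory rule (the 8 GB-per-core figure is taken from the MinMemoryCPU=8G value visible in the scontrol output later in this section; the variable names are hypothetical):

```shell
# Hedged sketch: memory granted = memory-per-core × requested cores
mem_per_cpu_gb=8     # MinMemoryCPU=8G on turing01
cpus=12              # --cpus-per-task=12, i.e. 1/4 of the 48-core node
echo "$(( mem_per_cpu_gb * cpus )) GB"
```

So a 12-core, 1-GPU request on turing01 would come with roughly a quarter of the node's memory.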
The terminal is operational after the resources have been granted:
[randria@login02 ~]$ srun --pty -p gpus --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread bash
[randria@turing01 ~]$ hostname
turing01
[randria@turing01 ~]$ printenv | grep CUDA
CUDA_VISIBLE_DEVICES=0                 <-- GPU 0
[randria@turing01 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)   <-- allocated
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
[randria@turing01 ~]$ squeue -j $SLURM_JOB_ID
  JOBID PARTITION    USER ST  TIME NODES CPUS    QOS PRIORITY NODELIST(REASON) NAME
1730514      gpus randria  R  3:03     1   12 normal   396309         turing01 bash
[randria@turing01 ~]$ scontrol show job $SLURM_JOB_ID
JobId=1730514 JobName=bash
   Priority=396309 Nice=0 Account=ici QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=00:03:33 TimeLimit=01:00:00 TimeMin=N/A
   Partition=gpus AllocNode:Sid=login02:22331
   NodeList=turing01 BatchHost=turing01
   NumNodes=1 NumCPUs=12 CPUs/Task=12 ReqB:S:C:T=0:0:*:1
   MinCPUsNode=12 MinMemoryCPU=8G MinTmpDiskNode=0
   Command=bash
CUDA_VISIBLE_DEVICES=0 means that only 1 GPU has been allocated here, namely GPU 0 (if 2 GPUs had been requested, it would be 0,1). You can also use the SLURM_JOB_ID environment variable, as in the transcript above, to query your job with squeue and scontrol. The scontrol show job output includes JobState=RUNNING, which means that your session is active and running.
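Inside a job, a script can derive the number of allocated GPUs from this variable. A minimal sketch (the 0,1 value simulates what Slurm would set for a 2-GPU allocation):

```shell
# Simulate the value Slurm would set for a 2-GPU allocation
CUDA_VISIBLE_DEVICES="0,1"
# Count the comma-separated device IDs
ngpus=$(awk -F',' '{ print NF }' <<< "$CUDA_VISIBLE_DEVICES")
echo "ngpus=$ngpus"
```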
To leave the interactive mode, use the exit command:
[randria@turing01 ~]$ exit
Caution: If you do not leave the interactive mode yourself, the maximum allocation duration (by default, or as specified with the --time option) is applied and the corresponding hours are counted for the project you have specified.
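Since the full --time value can end up being charged, it is worth knowing what it amounts to. A small hedged helper (illustrative, not part of Slurm) to convert an HH:MM:SS limit into seconds:

```shell
# Convert a Slurm time limit (HH:MM:SS) to seconds
walltime="01:00:00"                 # the default TimeLimit seen above
IFS=: read -r h m s <<< "$walltime"
# 10# forces base-10 so that values like "08" are not read as octal
secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
echo "$secs seconds"
```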
Interactive execution on the GPU partition#
If you don't need to open a terminal on a compute node, it is also possible to start the interactive execution of a code on the compute nodes directly from the front end by using the following command (here, with 2 GPUs on the default gpu partition):
$ srun -p gpus --ntasks=2 --cpus-per-task=12 --gres=gpu:2 --hint=nomultithread [--other-options] ./my_executable_file
Reserving reusable resources for more than one interactive execution#
Each interactive execution started as described in the preceding section is a separate job. Like all jobs, it may be placed in a wait queue for some time if the computing resources are not available.
If you wish to run more than one interactive execution in a row, it may be worthwhile to reserve all the resources in advance so that they can be reused for the consecutive executions: you then wait only once, at the moment of the reservation, rather than for each execution separately.
Reserving resources (here, for 2 GPUs on the default gpu partition) is done via the following command:
The reservation becomes usable after the resources have been granted:
$ salloc -p gpus --ntasks=2 --cpus-per-task=12 --gres=gpu:2 --hint=nomultithread [--other-options]
salloc: Granted job allocation 1730516
You can verify that your reservation is active with the squeue command. Complete information about the status of the job can be obtained with the scontrol show job <job identifier> command.
You can then start the interactive executions with the srun command:
$ srun [--other-options] ./code
Comment: If you do not specify any option for the srun command, the options given to salloc (for example, the number of tasks) are used by default.
- After reserving resources with salloc, you are still connected to the front end (you can verify this with the hostname command). It is imperative to use the srun command so that your executions use the reserved resources.
- If you forget to cancel the reservation with scancel, the maximum allocation duration (by default, or as specified with the --time option) is applied and the corresponding hours are counted for the project you have specified. Therefore, to release the reservation, you must manually enter:
$ exit
exit
salloc: Relinquishing job allocation 1730516
Batch submission: single GPU#
To submit a single-GPU job in batch on a Liger GPU node, you must first create a submission script:
For a job with 1 GPU in default gpu partition#
#!/bin/bash
#SBATCH --job-name=single_gpu        # name of job
#SBATCH --account=<project_id>       # replace <project_id> by your project ID
#SBATCH -p gpus                      # select the gpu partition
#SBATCH --nodelist=<node>            # name of the GPU node
#SBATCH --ntasks=1                   # total number of processes (= number of GPUs here)
#SBATCH --gres=gpu:1                 # number of GPUs (1/4 of the GPUs)
#SBATCH --cpus-per-task=11           # number of cores per task (1/4 of the 4-GPU node)
# /!\ Caution: "multithread" in Slurm vocabulary refers to hyperthreading.
#SBATCH --hint=nomultithread         # hyperthreading is deactivated
#SBATCH --time=00:10:00              # maximum execution time requested (HH:MM:SS)
#SBATCH --output=gpu_single%j.out    # name of output file
#SBATCH --error=gpu_single%j.out     # name of error file (here, merged with the output file)

# clean out the modules loaded in interactive and inherited by default
module purge

# load modules
module load ...

# load singularity: container engine to execute the image
module load singularity

# bind and run your program in the container
singularity exec --nv -B <data_folder>:/workspace \
    $BASE_DIR/$IMAGE \
    <my_script>
To launch a Python script, it is necessary to replace the last line by:
# code execution
python -u script_single_gpu.py
Comment: The Python option -u (= unbuffered) deactivates the buffering of standard output, which is otherwise performed automatically by Slurm.
Submit this script via the sbatch command:#
$ sbatch single_gpu.slurm
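In a wrapper script, the job ID can be captured from sbatch's confirmation message. A sketch parsing a sample message (the job number here is illustrative):

```shell
# Sample confirmation line of the form printed by sbatch
msg="Submitted batch job 1730777"
jobid=${msg##* }     # keep the last whitespace-separated field
echo "$jobid"
```

The ID can then be fed to squeue or scontrol to follow the job.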
Slurm commands reference#
sacct: display accounting data for all jobs and job steps in the Slurm database
sacctmgr: display and modify Slurm account information
salloc: request an interactive job allocation
sattach: attach to a running job step
sbatch: submit a batch script to Slurm
scancel: cancel a job or job step or signal a running job or job step
scontrol: display (and modify when permitted) the status of Slurm entities. Entities include: jobs, job steps, nodes, partitions, reservations, etc.
sdiag: display scheduling statistics and timing parameters
sinfo: display node partition (queue) summary information
sprio: display the factors that comprise a job's scheduling priority
squeue: display the jobs in the scheduling queues, one job per line
sreport: generate canned reports from job accounting data and machine utilization statistics
srun: launch one or more tasks of an application across requested resources
sshare: display the shares and usage for each charge account and user
sstat: display process statistics of a running job step
sview: a graphical tool for displaying jobs, partitions, reservations, and Blue Gene blocks. You must add the -Y option (X11 forwarding) when connecting by SSH.
Python is an interpreted object-oriented programming language.
Python is installed on Liger in version 2 and in version 3.
Important note: Support of version 2.7 ended on 01/01/2020. Since this date, there have been no more updates, not even security patches; only version 3 is maintained.
Python is accessible with the module command:
To load version 3.7.1:
module load python/3.7.1
To load version 2.7.12:
module load python/2.7.12
If you wish to change to the other version of Python, you must open another session. An Anaconda Python package installation is available on Liger, but it is now recommended to install and build your environments as containers with Docker.
Computation hours accounting#
On Liger, computing hours are allocated per project, with separate accounting for CPU and GPU hours.
The accounting hours consumed by your jobs are determined on the basis of:
- The number of reserved physical cores × the elapsed time, for a CPU or GPU job.
- And/or the number of reserved GPUs × the elapsed time, for a GPU job.
For a GPU job, the number of reserved GPUs taken into account for the accounting is not necessarily equal to the number of GPUs requested via the Slurm directive #SBATCH --gres=gpu:.... Indeed, an adjustment can be made according to the number of CPUs also requested for the job. For example, requesting a single GPU of a node (i.e. 1/4 of its GPUs) via #SBATCH --gres=gpu:1 together with half of the CPUs of the node leads to the reservation of half of the resources of the node. The accounting is then calculated on the basis of a reservation of half of the GPUs (as if you had requested #SBATCH --gres=gpu:2 or --gres=gpu:4, depending on the type of node).
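The adjustment described above can be sketched as follows (illustrative arithmetic only, not the exact accounting code; a turing01-like node with 48 cores and 4 GPUs is assumed):

```shell
node_cpus=48; node_gpus=4
req_gpus=1; req_cpus=24              # 1 GPU requested, but half of the node's CPUs
# GPUs "equivalent" to the requested CPU fraction, rounded up
cpu_equiv_gpus=$(( (req_cpus * node_gpus + node_cpus - 1) / node_cpus ))
# charged GPUs = max(requested GPUs, CPU-equivalent GPUs)
charged_gpus=$(( req_gpus > cpu_equiv_gpus ? req_gpus : cpu_equiv_gpus ))
echo "charged for $charged_gpus GPUs"
```

Here the half-node CPU request dominates, so the job is billed as if 2 of the 4 GPUs had been reserved.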
Note also that during an execution in exclusive mode (the --exclusive option of the sbatch command or the Slurm directive #SBATCH --exclusive), you reserve all the resources of a node:
- 48 physical cores for the cpu partitions.
- 48 physical cores and 4 GPUs for the default gpu partition.
The accounting will then be on the basis of:
- The number of reserved nodes × 48 cores × elapsed time for a CPU job.
- The number of reserved nodes × 4 GPUs × elapsed time for a GPU job in the default gpu partition.
You can consult your project allocation by using the
Mybalance command or SLURM commands.
GPU monitoring: nvidia-smi#
The NVIDIA System Management Interface (
nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
This utility allows administrators to query GPU device state and, with the appropriate privileges, to modify it. It is targeted at the Tesla™, GRID™, Quadro™ and Titan X products, though limited support is also available on other NVIDIA GPUs.
nvidia-smi ships with the NVIDIA GPU display drivers on Linux. It can report query information as XML or human-readable plain text, to either standard output or a file. For more details, please refer to the nvidia-smi documentation.
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   41C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   37C    P0    53W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   42C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Querying GPU Status#
These are NVIDIA’s high-performance compute GPUs and provide a good deal of health and status information.
List all available NVIDIA devices#
$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
List certain details about each GPU#
$ nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969
1, Tesla V100-SXM2-32GB, GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4, 1562520023800
2, Tesla V100-SXM2-32GB, GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04, 1562420015554
3, Tesla V100-SXM2-32GB, GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f, 1562520023100
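The CSV format is convenient for scripting. For example, a sketch parsing a captured sample of such output with awk (only the header and one data row are reproduced here):

```shell
# A captured sample of the CSV output (header + one row)
csv='index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969'
# Print "index: name" for each data row, skipping the header
awk -F', ' 'NR > 1 { print $1 ": " $2 }' <<< "$csv"
```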
Monitor overall GPU usage with 1-second update intervals#
$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    57    42    39     0     0     0     0   877  1290
    1    54    38    38     0     0     0     0   877  1290
    2    57    38    38     0     0     0     0   877  1290
    3    57    43    41     0     0     0     0   877  1290
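The dmon output is equally easy to post-process. For instance, averaging the power-draw column over a captured sample (the four data rows shown above):

```shell
# Four captured dmon data rows (gpu pwr gtemp mtemp sm mem enc dec mclk pclk)
sample='0 57 42 39 0 0 0 0 877 1290
1 54 38 38 0 0 0 0 877 1290
2 57 38 38 0 0 0 0 877 1290
3 57 43 41 0 0 0 0 877 1290'
# Column 2 is the power draw in watts; skip any '#' comment lines
awk '!/^#/ { sum += $2; n++ } END { printf "%.2f W\n", sum / n }' <<< "$sample"
```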
Monitor per-process GPU usage with 1-second update intervals#
$ nvidia-smi pmon
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0      14835     C    45    15     0     0   python
    1      14945     C    64    50     0     0   python
    2          -     -     -     -     -     -   -
    3          -     -     -     -     -     -   -
In this case, two different Python processes are running, one on each of GPUs 0 and 1; only 2 of the 4 GPUs are in use.
Monitoring and Managing GPU Boost#
The GPU Boost feature which NVIDIA has included with more recent GPUs allows the GPU clocks to vary depending upon load (achieving maximum performance so long as power and thermal headroom are available). However, the amount of available headroom will vary by application (and even by input file!) so users should keep their eyes on the status of the GPUs. A listing of available clock speeds can be shown for each GPU on turing01 with V100:
$ nvidia-smi -q -d SUPPORTED_CLOCKS

==============NVSMI LOG==============

Timestamp                             : Mon Nov 23 18:48:39 2020
Driver Version                        : 450.51.06
CUDA Version                          : 11.0

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Supported Clocks
        Memory                        : 877 MHz
            Graphics                  : 1530 MHz
            Graphics                  : 1522 MHz
            Graphics                  : 1515 MHz
            Graphics                  : 1507 MHz
            [...180 additional clock speeds omitted...]
            Graphics                  : 150 MHz
            Graphics                  : 142 MHz
            Graphics                  : 135 MHz
As shown, the Tesla V100 GPU supports 187 different clock speeds (from 135 MHz to 1530 MHz). However, only one memory clock speed is supported (877 MHz). Some GPUs support two different memory clock speeds (one high speed and one power-saving speed). Typically, such GPUs only support a single GPU clock speed when the memory is in the power-saving speed (which is the idle GPU state). On all recent Tesla and Quadro GPUs, GPU Boost automatically manages these speeds and runs the clocks as fast as possible (within the thermal/power limits and any limits set by the administrator).
To review the current GPU clock speed (here we display 1 GPU), default clock speed, and maximum possible clock speed, run:
$ nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                             : Mon Nov 23 18:56:48 2020
Driver Version                        : 450.51.06
CUDA Version                          : 11.0

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Clocks
        Graphics                      : 1290 MHz
        SM                            : 1290 MHz
        Memory                        : 877 MHz
        Video                         : 1170 MHz
    Applications Clocks
        Graphics                      : 1290 MHz
        Memory                        : 877 MHz
    Default Applications Clocks
        Graphics                      : 1290 MHz
        Memory                        : 877 MHz
    Max Clocks
        Graphics                      : 1530 MHz
        SM                            : 1530 MHz
        Memory                        : 877 MHz
        Video                         : 1372 MHz
    Max Customer Boost Clocks
        Graphics                      : 1530 MHz
    SM Clock Samples
        Duration                      : 0.01 sec
        Number of Samples             : 4
        Max                           : 1290 MHz
        Min                           : 135 MHz
        Avg                           : 870 MHz
    Memory Clock Samples
        Duration                      : 0.01 sec
        Number of Samples             : 4
        Max                           : 877 MHz
        Min                           : 877 MHz
        Avg                           : 877 MHz
    Clock Policy
        Auto Boost                    : N/A
        Auto Boost Default            : N/A
...
Ideally, you’d like all clocks to be running at the highest speed all the time. However, this will not be possible for all applications. To review the current state of each GPU and any reasons for clock slowdowns, use the nvidia-smi -q -d PERFORMANCE command:
$ nvidia-smi -q -d PERFORMANCE

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Performance State                 : P0
    Clocks Throttle Reasons
        Idle                          : Not Active
        Applications Clocks Setting   : Not Active
        SW Power Cap                  : Not Active
        HW Slowdown                   : Not Active
            HW Thermal Slowdown       : Not Active
            HW Power Brake Slowdown   : Not Active
        Sync Boost                    : Not Active
        SW Thermal Slowdown           : Not Active
        Display Clock Setting         : Not Active
...
Reviewing System/GPU Topology and NVLink with nvidia-smi#
To properly take advantage of more advanced NVIDIA GPU features (such as GPU Direct), it is vital that the system topology be properly configured. The topology refers to how the various system devices (GPUs, InfiniBand HCAs, storage controllers, etc.) connect to each other and to the system’s CPUs. Certain topology types will reduce performance or even cause certain features to be unavailable. To help tackle such questions, nvidia-smi supports system topology and connectivity queries:
$ nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    mlx5_0  mlx5_1  CPU Affinity    NUMA Affinity
GPU0     X      NV2     NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU1    NV2      X      NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU2    NV2     NV2      X      NV2     SYS     SYS     1,3,5,7,9,11    1
GPU3    NV2     NV2     NV2      X      SYS     SYS     1,3,5,7,9,11    1
mlx5_0  NODE    NODE    SYS     SYS      X      PIX
mlx5_1  NODE    NODE    SYS     SYS     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
Reviewing this section will take some getting used to, but can be very valuable. The above configuration shows 4 Tesla V100 GPUs and 2 Mellanox EDR InfiniBand HCAs (mlx5_0 and mlx5_1): the HCAs and GPUs 0-1 are attached to the first CPU of the server, and GPUs 2-3 to the second (see the NUMA Affinity column). The CPU Affinity column lists the cores to which the topology tool recommends binding jobs for each GPU (although this will vary by application).
The NVLink connections themselves can also be queried to ensure status, capability, and health. Readers are encouraged to consult NVIDIA documentation to better understand the specifics.
$ nvidia-smi nvlink --status GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s
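From these figures, the aggregate unidirectional NVLink bandwidth per GPU follows directly (6 links at 25.781 GB/s each; simple arithmetic, shown here with awk):

```shell
# 6 NVLink links per V100, 25.781 GB/s each (per direction)
awk 'BEGIN { printf "%.3f GB/s\n", 6 * 25.781 }'
```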
$ nvidia-smi nvlink --capabilities
Printing all GPU Details#
To list all available data on a particular GPU, specify the ID of the card with -i; the -d option restricts the query to selected sections:
$ nvidia-smi -i 0 -q
$ nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE
SSH on GPU nodes#
You can connect by SSH to a GPU node that was assigned to one of your jobs in order to monitor the execution of your calculations with tools such as nvidia-smi.
When one of your jobs is running, the compute nodes assigned to it are visible with the command
squeue -j <job_id> or
squeue -u $USER :
$ squeue -u $USER
  JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
2042259      gpus my_job my_login  R 01:42     1 turing01
In this example, job #
2042259 is running on turing01. You can connect via
ssh with the following command:
$ ssh turing01
Note that you will be automatically disconnected from the node when your job is finished.
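For scripting, the node name can be pulled out of a squeue line of this form (squeue also offers -h -o %N to print only the node list directly); a minimal sketch on a captured sample line:

```shell
# Sample squeue line (last field is the node list)
line='2042259      gpus    my_job my_login  R    01:42      1 turing01'
node=$(awk '{ print $NF }' <<< "$line")
echo "$node"
```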
If you try to connect to a node on which none of your jobs is running, you will obtain the following error message:
[randria@login02 ~]$ ssh turing01
Access denied: user randria (uid=1014) has no active jobs.
Connection closed by 172.30.0.54