Useful commands and workflow reference.
Connection on the front end#
Access to the front end is done via an ssh connection:
$ ssh firstname.lastname@example.org
The resources of this interactive node are shared between all the connected users:
Reminder: interactive use of the front end is reserved exclusively for compilation and script development. The Liger front-end nodes are not equipped with GPUs and therefore cannot be used for executions requiring one or more GPUs.
To run your GPU codes interactively on GPU-equipped compute nodes such as turing01 and all the viz nodes viz[01-14], you must use one of the following two commands:
- the srun command, either to obtain a terminal on a GPU compute node within which you can execute your code, or to directly execute your code on the GPU partition;
- the salloc command, to reserve GPU resources so that you can run more than one execution consecutively.
However, if the computations require a large amount of GPU resources (in number of cores, memory, or elapsed time), you should submit a batch job.
Obtaining a terminal on a GPU compute node#
It is possible to open a terminal directly on an accelerated compute node on which the resources have been reserved for you (here, 1 GPU on the default gpu partition) by using the following command:
$ srun --pty --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread [--other-options] bash
- An interactive terminal is obtained with the --pty option.
- The reservation of physical cores is assured with the --hint=nomultithread option (no hyperthreading).
- The memory allocated for the job is proportional to the number of requested CPU cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. On turing01, the --cpus-per-task=12 option thus reserves 1/4 of the node memory per GPU. You may consult our documentation on this subject: Memory allocation on GPU partitions
- --other-options contains the usual Slurm options for job configuration (--time=, etc.): see the documentation on batch submission scripts
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account on which to charge the computing hours of the job.
- We strongly recommend that you consult our documentation detailing computing hours management on Liger to ensure that the hours consumed by your jobs are deducted from the correct allocation.
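As an illustrative back-of-the-envelope check of the proportional memory rule (the 8 GB-per-core figure is taken from the MinMemoryCPU=8G value visible in the scontrol output later in this section; the variable names are hypothetical):

```shell
# Hedged sketch: memory granted = memory-per-core × requested cores
mem_per_cpu_gb=8     # MinMemoryCPU=8G on turing01
cpus=12              # --cpus-per-task=12, i.e. 1/4 of the 48-core node
echo "$(( mem_per_cpu_gb * cpus )) GB"
```

So a 12-core, 1-GPU request on turing01 would come with roughly a quarter of the node's memory.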
The terminal is operational after the resources have been granted:
[randria@login02 ~]$ srun --pty -p gpus --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread bash
[randria@turing01 ~]$ hostname
turing01
[randria@turing01 ~]$ printenv | grep CUDA
CUDA_VISIBLE_DEVICES=0                 <-- GPU 0
[randria@turing01 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)   <-- allocated
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
[randria@turing01 ~]$ squeue -j $SLURM_JOB_ID
  JOBID PARTITION    USER ST  TIME NODES CPUS    QOS PRIORITY NODELIST(REASON) NAME
1730514      gpus randria  R  3:03     1   12 normal   396309         turing01 bash
[randria@turing01 ~]$ scontrol show job $SLURM_JOB_ID
JobId=1730514 JobName=bash
   Priority=396309 Nice=0 Account=ici QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=00:03:33 TimeLimit=01:00:00 TimeMin=N/A
   Partition=gpus AllocNode:Sid=login02:22331
   NodeList=turing01 BatchHost=turing01
   NumNodes=1 NumCPUs=12 CPUs/Task=12 ReqB:S:C:T=0:0:*:1
   MinCPUsNode=12 MinMemoryCPU=8G MinTmpDiskNode=0
   Command=bash
CUDA_VISIBLE_DEVICES=0 means that only 1 GPU has been allocated here, namely GPU 0 (if 2 GPUs had been requested, it would be 0,1). You can also use the SLURM_JOB_ID environment variable, as in the transcript above, to query your job with squeue and scontrol. The scontrol show job output includes JobState=RUNNING, which means that your session is active and running.
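Inside a job, a script can derive the number of allocated GPUs from this variable. A minimal sketch (the 0,1 value simulates what Slurm would set for a 2-GPU allocation):

```shell
# Simulate the value Slurm would set for a 2-GPU allocation
CUDA_VISIBLE_DEVICES="0,1"
# Count the comma-separated device IDs
ngpus=$(awk -F',' '{ print NF }' <<< "$CUDA_VISIBLE_DEVICES")
echo "ngpus=$ngpus"
```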
To leave the interactive mode, use the exit command:
[randria@turing01 ~]$ exit
Caution: If you do not leave the interactive mode yourself, the maximum allocation duration (by default, or as specified with the --time option) is applied and the corresponding hours are counted for the project you have specified.
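Since the full --time value can end up being charged, it is worth knowing what it amounts to. A small hedged helper (illustrative, not part of Slurm) to convert an HH:MM:SS limit into seconds:

```shell
# Convert a Slurm time limit (HH:MM:SS) to seconds
walltime="01:00:00"                 # the default TimeLimit seen above
IFS=: read -r h m s <<< "$walltime"
# 10# forces base-10 so that values like "08" are not read as octal
secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
echo "$secs seconds"
```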
Interactive execution on the GPU partition#
If you don't need to open a terminal on a compute node, it is also possible to start the interactive execution of a code on the compute nodes directly from the front end by using the following command (here, with 2 GPUs on the default gpu partition):
$ srun -p gpus --ntasks=2 --cpus-per-task=12 --gres=gpu:2 --hint=nomultithread [--other-options] ./my_executable_file
Reserving reusable resources for more than one interactive execution#
Each interactive execution started as described in the preceding section is a separate job. Like all jobs, it may be placed in a wait queue for some time if the computing resources are not available.
If you wish to run more than one interactive execution in a row, it may be worthwhile to reserve all the resources in advance so that they can be reused for the consecutive executions: you then wait only once, at the moment of the reservation, rather than for each execution separately.
Reserving resources (here, for 2 GPUs on the default gpu partition) is done via the following command:
The reservation becomes usable after the resources have been granted:
$ salloc -p gpus --ntasks=2 --cpus-per-task=12 --gres=gpu:2 --hint=nomultithread [--other-options]
salloc: Granted job allocation 1730516
You can verify that your reservation is active with the squeue command. Complete information about the status of the job can be obtained with the scontrol show job <job identifier> command.
You can then start the interactive executions with the srun command:
$ srun [--other-options] ./code
Comment: If you do not specify any option for the srun command, the options given to salloc (for example, the number of tasks) are used by default.
- After reserving resources with salloc, you are still connected to the front end (you can verify this with the hostname command). It is imperative to use the srun command so that your executions use the reserved resources.
- If you forget to cancel the reservation with scancel, the maximum allocation duration (by default, or as specified with the --time option) is applied and the corresponding hours are counted for the project you have specified. Therefore, to release the reservation, you must manually enter:
$ exit
exit
salloc: Relinquishing job allocation 1730516
Batch submission: single GPU#
To submit a single-GPU job in batch on a Liger GPU node, you must first create a submission script:
For a job with 1 GPU in default gpu partition#
#!/bin/bash
#SBATCH --job-name=single_gpu        # name of job
#SBATCH --account=<project_id>       # replace <project_id> by your project ID
#SBATCH -p gpus                      # select the gpu partition
#SBATCH --nodelist=<node>            # name of the GPU node
#SBATCH --ntasks=1                   # total number of processes (= number of GPUs here)
#SBATCH --gres=gpu:1                 # number of GPUs (1/4 of the GPUs)
#SBATCH --cpus-per-task=11           # number of cores per task (1/4 of the 4-GPU node)
# /!\ Caution: "multithread" in Slurm vocabulary refers to hyperthreading.
#SBATCH --hint=nomultithread         # hyperthreading is deactivated
#SBATCH --time=00:10:00              # maximum execution time requested (HH:MM:SS)
#SBATCH --output=gpu_single%j.out    # name of output file
#SBATCH --error=gpu_single%j.out     # name of error file (here, merged with the output file)

# clean out the modules loaded in interactive and inherited by default
module purge

# load modules
module load ...

# load singularity: container engine to execute the image
module load singularity

# bind and run your program in the container
singularity exec --nv -B <data_folder>:/workspace \
    $BASE_DIR/$IMAGE \
    <my_script>
To launch a Python script, it is necessary to replace the last line by:
# code execution
python -u script_single_gpu.py
Comment: The Python option -u (= unbuffered) deactivates the buffering of standard output, which is otherwise performed automatically by Slurm.
Submit this script via the sbatch command:#
$ sbatch single_gpu.slurm
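In a wrapper script, the job ID can be captured from sbatch's confirmation message. A sketch parsing a sample message (the job number here is illustrative):

```shell
# Sample confirmation line of the form printed by sbatch
msg="Submitted batch job 1730777"
jobid=${msg##* }     # keep the last whitespace-separated field
echo "$jobid"
```

The ID can then be fed to squeue or scontrol to follow the job.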
Slurm commands reference#
sacct: display accounting data for all jobs and job steps in the Slurm database
sacctmgr: display and modify Slurm account information
salloc: request an interactive job allocation
sattach: attach to a running job step
sbatch: submit a batch script to Slurm
scancel: cancel a job or job step or signal a running job or job step
scontrol: display (and modify when permitted) the status of Slurm entities. Entities include: jobs, job steps, nodes, partitions, reservations, etc.
sdiag: display scheduling statistics and timing parameters
sinfo: display node partition (queue) summary information
sprio: display the factors that comprise a job's scheduling priority
squeue: display the jobs in the scheduling queues, one job per line
sreport: generate canned reports from job accounting data and machine utilization statistics
srun: launch one or more tasks of an application across requested resources
sshare: display the shares and usage for each charge account and user
sstat: display process statistics of a running job step
sview: a graphical tool for displaying jobs, partitions, reservations, and Blue Gene blocks. You must add the -Y option (X11 forwarding) when connecting by SSH.
Python is an interpreted object-oriented programming language.
Python is installed on Liger in version 2 and in version 3.
Important note: Support of version 2.7 ended on 01/01/2020. Since this date, there have been no more updates, not even security patches; only version 3 is maintained.
Python is accessible with the module command:
To load version 3.7.1:
module load python/3.7.1
To load version 2.7.12:
module load python/2.7.12
If you wish to change to the other version of Python, you must open another session. An Anaconda Python package installation is available on Liger, but it is now recommended to install and build your environments as containers with Docker.
Computation hours accounting#
On Liger, computing hours are allocated per project, with separate accounting for CPU and GPU hours.
The accounting hours consumed by your jobs are determined on the basis of:
- The number of reserved physical cores × the elapsed time, for a CPU or GPU job.
- And/or the number of reserved GPUs × the elapsed time, for a GPU job.
For a GPU job, the number of reserved GPUs taken into account for the accounting is not necessarily equal to the number of GPUs requested via the Slurm directive #SBATCH --gres=gpu:.... Indeed, an adjustment can be made according to the number of CPUs also requested for the job. For example, requesting a single GPU of a node (i.e. 1/4 of its GPUs) via #SBATCH --gres=gpu:1 together with half of the CPUs of the node leads to the reservation of half of the resources of the node. The accounting is then calculated on the basis of a reservation of half of the GPUs (as if you had requested #SBATCH --gres=gpu:2 or --gres=gpu:4, depending on the type of node).
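The adjustment described above can be sketched as follows (illustrative arithmetic only, not the exact accounting code; a turing01-like node with 48 cores and 4 GPUs is assumed):

```shell
node_cpus=48; node_gpus=4
req_gpus=1; req_cpus=24              # 1 GPU requested, but half of the node's CPUs
# GPUs "equivalent" to the requested CPU fraction, rounded up
cpu_equiv_gpus=$(( (req_cpus * node_gpus + node_cpus - 1) / node_cpus ))
# charged GPUs = max(requested GPUs, CPU-equivalent GPUs)
charged_gpus=$(( req_gpus > cpu_equiv_gpus ? req_gpus : cpu_equiv_gpus ))
echo "charged for $charged_gpus GPUs"
```

Here the half-node CPU request dominates, so the job is billed as if 2 of the 4 GPUs had been reserved.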
Note also that during an execution in exclusive mode (the --exclusive option of the sbatch command or the Slurm directive #SBATCH --exclusive), you reserve all the resources of a node:
- 48 physical cores for the cpu partitions.
- 48 physical cores and 4 GPUs for the default gpu partition.
The accounting will then be on the basis of:
- The number of reserved nodes × 48 cores × elapsed time for a CPU job.
- The number of reserved nodes × 4 GPUs × elapsed time for a GPU job in the default gpu partition.
You can consult your project allocation by using the
Mybalance command or SLURM commands.
GPU monitoring: nvidia-smi#
The NVIDIA System Management Interface (
nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
This utility allows administrators to query GPU device state and, with the appropriate privileges, to modify it. It is targeted at the Tesla™, GRID™, Quadro™ and Titan X products, though limited support is also available on other NVIDIA GPUs.
nvidia-smi ships with the NVIDIA GPU display drivers on Linux. It can report query information as XML or human-readable plain text, to either standard output or a file. For more details, please refer to the nvidia-smi documentation.
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   41C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   37C    P0    53W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   42C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Querying GPU Status#
These are NVIDIA’s high-performance compute GPUs and provide a good deal of health and status information.
List all available NVIDIA devices#
$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
List certain details about each GPU#
$ nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969
1, Tesla V100-SXM2-32GB, GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4, 1562520023800
2, Tesla V100-SXM2-32GB, GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04, 1562420015554
3, Tesla V100-SXM2-32GB, GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f, 1562520023100
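The CSV format is convenient for scripting. For example, a sketch parsing a captured sample of such output with awk (only the header and one data row are reproduced here):

```shell
# A captured sample of the CSV output (header + one row)
csv='index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969'
# Print "index: name" for each data row, skipping the header
awk -F', ' 'NR > 1 { print $1 ": " $2 }' <<< "$csv"
```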
Monitor overall GPU usage with 1-second update intervals#
$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    57    42    39     0     0     0     0   877  1290
    1    54    38    38     0     0     0     0   877  1290
    2    57    38    38     0     0     0     0   877  1290
    3    57    43    41     0     0     0     0   877  1290
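The dmon output is equally easy to post-process. For instance, averaging the power-draw column over a captured sample (the four data rows shown above):

```shell
# Four captured dmon data rows (gpu pwr gtemp mtemp sm mem enc dec mclk pclk)
sample='0 57 42 39 0 0 0 0 877 1290
1 54 38 38 0 0 0 0 877 1290
2 57 38 38 0 0 0 0 877 1290
3 57 43 41 0 0 0 0 877 1290'
# Column 2 is the power draw in watts; skip any '#' comment lines
awk '!/^#/ { sum += $2; n++ } END { printf "%.2f W\n", sum / n }' <<< "$sample"
```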
Monitor per-process GPU usage with 1-second update intervals#
$ nvidia-smi pmon
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0      14835     C    45    15     0     0   python
    1      14945     C    64    50     0     0   python
    2          -     -     -     -     -     -   -
    3          -     -     -     -     -     -   -
In this case, two different Python processes are running, one on each of GPUs 0 and 1; only 2 of the 4 GPUs are in use.
Monitoring and Managing GPU Boost#
The GPU Boost feature which NVIDIA has included with more recent GPUs allows the GPU clocks to vary depending upon load (achieving maximum performance so long as power and thermal headroom are available). However, the amount of available headroom will vary by application (and even by input file!) so users should keep their eyes on the status of the GPUs. A listing of available clock speeds can be shown for each GPU on turing01 with V100:
$ nvidia-smi -q -d SUPPORTED_CLOCKS

==============NVSMI LOG==============

Timestamp                             : Mon Nov 23 18:48:39 2020
Driver Version                        : 450.51.06
CUDA Version                          : 11.0

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Supported Clocks
        Memory                        : 877 MHz
            Graphics                  : 1530 MHz
            Graphics                  : 1522 MHz
            Graphics                  : 1515 MHz
            Graphics                  : 1507 MHz
            [...180 additional clock speeds omitted...]
            Graphics                  : 150 MHz
            Graphics                  : 142 MHz
            Graphics                  : 135 MHz
As shown, the Tesla V100 GPU supports 187 different clock speeds (from 135 MHz to 1530 MHz). However, only one memory clock speed is supported (877 MHz). Some GPUs support two different memory clock speeds (one high speed and one power-saving speed). Typically, such GPUs only support a single GPU clock speed when the memory is in the power-saving speed (which is the idle GPU state). On all recent Tesla and Quadro GPUs, GPU Boost automatically manages these speeds and runs the clocks as fast as possible (within the thermal/power limits and any limits set by the administrator).
To review the current GPU clock speed (here we display 1 GPU), default clock speed, and maximum possible clock speed, run:
$ nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                             : Mon Nov 23 18:56:48 2020
Driver Version                        : 450.51.06
CUDA Version                          : 11.0

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Clocks
        Graphics                      : 1290 MHz
        SM                            : 1290 MHz
        Memory                        : 877 MHz
        Video                         : 1170 MHz
    Applications Clocks
        Graphics                      : 1290 MHz
        Memory                        : 877 MHz
    Default Applications Clocks
        Graphics                      : 1290 MHz
        Memory                        : 877 MHz
    Max Clocks
        Graphics                      : 1530 MHz
        SM                            : 1530 MHz
        Memory                        : 877 MHz
        Video                         : 1372 MHz
    Max Customer Boost Clocks
        Graphics                      : 1530 MHz
    SM Clock Samples
        Duration                      : 0.01 sec
        Number of Samples             : 4
        Max                           : 1290 MHz
        Min                           : 135 MHz
        Avg                           : 870 MHz
    Memory Clock Samples
        Duration                      : 0.01 sec
        Number of Samples             : 4
        Max                           : 877 MHz
        Min                           : 877 MHz
        Avg                           : 877 MHz
    Clock Policy
        Auto Boost                    : N/A
        Auto Boost Default            : N/A
...
Ideally, you’d like all clocks to be running at the highest speed all the time. However, this will not be possible for all applications. To review the current state of each GPU and any reasons for clock slowdowns, use the nvidia-smi -q -d PERFORMANCE command:
$ nvidia-smi -q -d PERFORMANCE

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Performance State                 : P0
    Clocks Throttle Reasons
        Idle                          : Not Active
        Applications Clocks Setting   : Not Active
        SW Power Cap                  : Not Active
        HW Slowdown                   : Not Active
            HW Thermal Slowdown       : Not Active
            HW Power Brake Slowdown   : Not Active
        Sync Boost                    : Not Active
        SW Thermal Slowdown           : Not Active
        Display Clock Setting         : Not Active
...
Reviewing System/GPU Topology and NVLink with nvidia-smi#
To properly take advantage of more advanced NVIDIA GPU features (such as GPU Direct), it is vital that the system topology be properly configured. The topology refers to how the various system devices (GPUs, InfiniBand HCAs, storage controllers, etc.) connect to each other and to the system’s CPUs. Certain topology types will reduce performance or even cause certain features to be unavailable. To help tackle such questions, nvidia-smi supports system topology and connectivity queries:
$ nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    mlx5_0  mlx5_1  CPU Affinity    NUMA Affinity
GPU0     X      NV2     NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU1    NV2      X      NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU2    NV2     NV2      X      NV2     SYS     SYS     1,3,5,7,9,11    1
GPU3    NV2     NV2     NV2      X      SYS     SYS     1,3,5,7,9,11    1
mlx5_0  NODE    NODE    SYS     SYS      X      PIX
mlx5_1  NODE    NODE    SYS     SYS     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
Reviewing this section will take some getting used to, but can be very valuable. The above configuration shows 4 Tesla V100 GPUs and 2 Mellanox EDR InfiniBand HCAs (mlx5_0 and mlx5_1): the HCAs and GPUs 0-1 are attached to the first CPU of the server, and GPUs 2-3 to the second (see the NUMA Affinity column). The CPU Affinity column lists the cores to which the topology tool recommends binding jobs for each GPU (although this will vary by application).
The NVLink connections themselves can also be queried to ensure status, capability, and health. Readers are encouraged to consult NVIDIA documentation to better understand the specifics.
$ nvidia-smi nvlink --status GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s
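From these figures, the aggregate unidirectional NVLink bandwidth per GPU follows directly (6 links at 25.781 GB/s each; simple arithmetic, shown here with awk):

```shell
# 6 NVLink links per V100, 25.781 GB/s each (per direction)
awk 'BEGIN { printf "%.3f GB/s\n", 6 * 25.781 }'
```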
$ nvidia-smi nvlink --capabilities
Printing all GPU Details#
To list all available data on a particular GPU, specify the ID of the card with -i; the -d option restricts the query to selected sections:
$ nvidia-smi -i 0 -q
$ nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE
SSH on GPU nodes#
You can connect by SSH to a GPU node that was assigned to one of your jobs in order to monitor the execution of your calculations with tools such as nvidia-smi.
When one of your jobs is running, the compute nodes assigned to it are visible with the command
squeue -j <job_id> or
squeue -u $USER :
$ squeue -u $USER
  JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
2042259      gpus my_job my_login  R 01:42     1 turing01
In this example, job #
2042259 is running on turing01. You can connect via
ssh with the following command:
$ ssh turing01
Note that you will be automatically disconnected from the node when your job is finished.
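For scripting, the node name can be pulled out of a squeue line of this form (squeue also offers -h -o %N to print only the node list directly); a minimal sketch on a captured sample line:

```shell
# Sample squeue line (last field is the node list)
line='2042259      gpus    my_job my_login  R    01:42      1 turing01'
node=$(awk '{ print $NF }' <<< "$line")
echo "$node"
```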
If you try to connect to a node on which none of your jobs is running, you will obtain the following error message:
[randria@login02 ~]$ ssh turing01
Access denied: user randria (uid=1014) has no active jobs.
Connection closed by 172.30.0.54