Useful commands and workflow reference.
Connection on the front end#
Access to the front end is done via an ssh connection:
$ ssh firstname.lastname@example.org
The resources of this interactive node are shared between all the connected users:
Reminder: interactive use of the front end is reserved exclusively for compilation and script development. The Liger front-end nodes are not equipped with GPUs and therefore cannot be used for executions requiring one or more GPUs.
To run your GPU codes interactively on GPU-equipped compute nodes such as turing01 and all the viz nodes viz[01-14], you must use one of the following two commands:
- the srun command, either to obtain a terminal on a GPU compute node within which you can execute your code, or to directly execute your code on the GPU partition;
- the salloc command, to reserve GPU resources so that you can run more than one execution consecutively.
However, if the computations require a large amount of GPU resources (in number of cores, memory, or elapsed time), you should submit a batch job.
Obtaining a terminal on a GPU compute node#
It is possible to open a terminal directly on an accelerated compute node on which the resources have been reserved for you (here, 1 GPU on the default gpu partition) by using the following command:
$ srun --pty --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread [--other-options] bash
- An interactive terminal is obtained with the --pty option.
- The reservation of physical cores is assured with the --hint=nomultithread option (no hyperthreading).
- The memory allocated for the job is proportional to the number of requested CPU cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. On turing01, the --cpus-per-task=12 option thus reserves 1/4 of the node memory per GPU. You may consult our documentation on this subject: Memory allocation on GPU partitions
- --other-options contains the usual Slurm options for job configuration (--time=, etc.): see the documentation on batch submission scripts
- For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account on which to charge the computing hours of the job.
- We strongly recommend that you consult our documentation detailing computing hours management on Liger to ensure that the hours consumed by your jobs are deducted from the correct allocation.
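As an illustrative back-of-the-envelope check of the proportional memory rule (the 8 GB-per-core figure is taken from the MinMemoryCPU=8G value visible in the scontrol output later in this section; the variable names are hypothetical):

```shell
# Hedged sketch: memory granted = memory-per-core × requested cores
mem_per_cpu_gb=8     # MinMemoryCPU=8G on turing01
cpus=12              # --cpus-per-task=12, i.e. 1/4 of the 48-core node
echo "$(( mem_per_cpu_gb * cpus )) GB"
```

So a 12-core, 1-GPU request on turing01 would come with roughly a quarter of the node's memory.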
The terminal is operational after the resources have been granted:
[randria@login02 ~]$ srun --pty -p gpus --ntasks=1 --cpus-per-task=12 --gres=gpu:1 --hint=nomultithread bash
[randria@turing01 ~]$ hostname
turing01
[randria@turing01 ~]$ printenv | grep CUDA
CUDA_VISIBLE_DEVICES=0                 <-- GPU 0
[randria@turing01 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)   <-- allocated
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
[randria@turing01 ~]$ squeue -j $SLURM_JOB_ID
  JOBID PARTITION    USER ST  TIME NODES CPUS    QOS PRIORITY NODELIST(REASON) NAME
1730514      gpus randria  R  3:03     1   12 normal   396309         turing01 bash
[randria@turing01 ~]$ scontrol show job $SLURM_JOB_ID
JobId=1730514 JobName=bash
   Priority=396309 Nice=0 Account=ici QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   RunTime=00:03:33 TimeLimit=01:00:00 TimeMin=N/A
   Partition=gpus AllocNode:Sid=login02:22331
   NodeList=turing01 BatchHost=turing01
   NumNodes=1 NumCPUs=12 CPUs/Task=12 ReqB:S:C:T=0:0:*:1
   MinCPUsNode=12 MinMemoryCPU=8G MinTmpDiskNode=0
   Command=bash
CUDA_VISIBLE_DEVICES=0 means that only 1 GPU has been allocated here, namely GPU 0 (if 2 GPUs had been requested, it would be 0,1). You can also use the SLURM_JOB_ID environment variable, as in the transcript above, to query your job with squeue and scontrol. The scontrol show job output includes JobState=RUNNING, which means that your session is active and running.
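Inside a job, a script can derive the number of allocated GPUs from this variable. A minimal sketch (the 0,1 value simulates what Slurm would set for a 2-GPU allocation):

```shell
# Simulate the value Slurm would set for a 2-GPU allocation
CUDA_VISIBLE_DEVICES="0,1"
# Count the comma-separated device IDs
ngpus=$(awk -F',' '{ print NF }' <<< "$CUDA_VISIBLE_DEVICES")
echo "ngpus=$ngpus"
```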
To leave the interactive mode, use the exit command:
[randria@turing01 ~]$ exit
Caution: If you do not leave the interactive mode yourself, the maximum allocation duration (by default, or as specified with the --time option) is applied and the corresponding hours are counted for the project you have specified.
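Since the full --time value can end up being charged, it is worth knowing what it amounts to. A small hedged helper (illustrative, not part of Slurm) to convert an HH:MM:SS limit into seconds:

```shell
# Convert a Slurm time limit (HH:MM:SS) to seconds
walltime="01:00:00"                 # the default TimeLimit seen above
IFS=: read -r h m s <<< "$walltime"
# 10# forces base-10 so that values like "08" are not read as octal
secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
echo "$secs seconds"
```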
Interactive execution on the GPU partition#
If you don't need to open a terminal on a compute node, it is also possible to start the interactive execution of a code on the compute nodes directly from the front end by using the following command (here, with 2 GPUs on the default gpu partition):
$ srun -p gpus --ntasks=2 --cpus-per-task=12 --gres=gpu:2 --hint=nomultithread [--other-options] ./my_executable_file
Reserving reusable resources for more than one interactive execution#
Each interactive execution started as described in the preceding section is a separate job. Like all jobs, it may be placed in a wait queue for some time if the computing resources are not available.
If you wish to run more than one interactive execution in a row, it may be worthwhile to reserve all the resources in advance so that they can be reused for the consecutive executions: you then wait only once, at the moment of the reservation, rather than for each execution separately.
Reserving resources (here, for 2 GPUs on the default gpu partition) is done via the following command:
The reservation becomes usable after the resources have been granted:
$ salloc -p gpus --ntasks=2 --cpus-per-task=12 --gres=gpu:2 --hint=nomultithread [--other-options]
salloc: Granted job allocation 1730516
You can verify that your reservation is active with the squeue command. Complete information about the status of the job can be obtained with the scontrol show job <job identifier> command.
You can then start the interactive executions with the srun command:
$ srun [--other-options] ./code
Comment: If you do not specify any option for the srun command, the options given to salloc (for example, the number of tasks) are used by default.
- After reserving resources with salloc, you are still connected to the front end (you can verify this with the hostname command). It is imperative to use the srun command so that your executions use the reserved resources.
- If you forget to cancel the reservation with scancel, the maximum allocation duration (by default, or as specified with the --time option) is applied and the corresponding hours are counted for the project you have specified. Therefore, to release the reservation, you must manually enter:
$ exit
exit
salloc: Relinquishing job allocation 1730516
Batch submission: single GPU#
To submit a single-GPU job in batch on a Liger GPU node, you must first create a submission script:
For a job with 1 GPU in default gpu partition#
#!/bin/bash
#SBATCH --job-name=single_gpu        # name of job
#SBATCH --account=<project_id>       # replace <project_id> by your project ID
#SBATCH -p gpus                      # select the gpu partition
#SBATCH --nodelist=<node>            # name of the GPU node
#SBATCH --ntasks=1                   # total number of processes (= number of GPUs here)
#SBATCH --gres=gpu:1                 # number of GPUs (1/4 of the GPUs)
#SBATCH --cpus-per-task=11           # number of cores per task (1/4 of the 4-GPU node)
# /!\ Caution: "multithread" in Slurm vocabulary refers to hyperthreading.
#SBATCH --hint=nomultithread         # hyperthreading is deactivated
#SBATCH --time=00:10:00              # maximum execution time requested (HH:MM:SS)
#SBATCH --output=gpu_single%j.out    # name of output file
#SBATCH --error=gpu_single%j.out     # name of error file (here, merged with the output file)

# clean out the modules loaded in interactive and inherited by default
module purge

# load modules
module load ...

# load singularity: container engine to execute the image
module load singularity

# bind and run your program in the container
singularity exec --nv -B <data_folder>:/workspace \
    $BASE_DIR/$IMAGE \
    <my_script>
To launch a Python script, it is necessary to replace the last line by:
# code execution
python -u script_single_gpu.py
Comment: The Python option -u (= unbuffered) deactivates the buffering of standard output, which is otherwise performed automatically by Slurm.
Submit this script via the sbatch command:#
$ sbatch single_gpu.slurm
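In a wrapper script, the job ID can be captured from sbatch's confirmation message. A sketch parsing a sample message (the job number here is illustrative):

```shell
# Sample confirmation line of the form printed by sbatch
msg="Submitted batch job 1730777"
jobid=${msg##* }     # keep the last whitespace-separated field
echo "$jobid"
```

The ID can then be fed to squeue or scontrol to follow the job.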
Slurm commands reference#
sacct: display accounting data for all jobs and job steps in the Slurm database
sacctmgr: display and modify Slurm account information
salloc: request an interactive job allocation
sattach: attach to a running job step
sbatch: submit a batch script to Slurm
scancel: cancel a job or job step or signal a running job or job step
scontrol: display (and modify when permitted) the status of Slurm entities. Entities include: jobs, job steps, nodes, partitions, reservations, etc.
sdiag: display scheduling statistics and timing parameters
sinfo: display node partition (queue) summary information
sprio: display the factors that comprise a job's scheduling priority
squeue: display the jobs in the scheduling queues, one job per line
sreport: generate canned reports from job accounting data and machine utilization statistics
srun: launch one or more tasks of an application across requested resources
sshare: display the shares and usage for each charge account and user
sstat: display process statistics of a running job step
sview: a graphical tool for displaying jobs, partitions, reservations, and Blue Gene blocks. You must add the -Y option (X11 forwarding) when connecting by SSH.
Python is an interpreted object-oriented programming language.
Python is installed on Liger in version 2 and in version 3.
Important note: Support of version 2.7 ended on 01/01/2020. Since this date, there have been no more updates, not even security patches; only version 3 is maintained.
Python is accessible with the module command:
To load version 3.7.1:
module load python/3.7.1
To load version 2.7.12:
module load python/2.7.12
If you wish to change to the other version of Python, you must open another session. An Anaconda Python package installation is available on Liger, but it is now recommended to install and build your environments as containers with Docker.
Computation hours accounting#
On Liger, computing hours are allocated per project, with separate accounting for CPU and GPU hours.
The accounting hours consumed by your jobs are determined on the basis of:
- The number of reserved physical cores × the elapsed time, for a CPU or GPU job.
- And/or the number of reserved GPUs × the elapsed time, for a GPU job.
For a GPU job, the number of reserved GPUs taken into account for the accounting is not necessarily equal to the number of GPUs requested via the Slurm directive #SBATCH --gres=gpu:.... Indeed, an adjustment can be made according to the number of CPUs also requested for the job. For example, requesting a single GPU of a node (i.e. 1/4 of its GPUs) via #SBATCH --gres=gpu:1 together with half of the CPUs of the node leads to the reservation of half of the resources of the node. The accounting is then calculated on the basis of a reservation of half of the GPUs (as if you had requested #SBATCH --gres=gpu:2 or --gres=gpu:4, depending on the type of node).
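The adjustment described above can be sketched as follows (illustrative arithmetic only, not the exact accounting code; a turing01-like node with 48 cores and 4 GPUs is assumed):

```shell
node_cpus=48; node_gpus=4
req_gpus=1; req_cpus=24              # 1 GPU requested, but half of the node's CPUs
# GPUs "equivalent" to the requested CPU fraction, rounded up
cpu_equiv_gpus=$(( (req_cpus * node_gpus + node_cpus - 1) / node_cpus ))
# charged GPUs = max(requested GPUs, CPU-equivalent GPUs)
charged_gpus=$(( req_gpus > cpu_equiv_gpus ? req_gpus : cpu_equiv_gpus ))
echo "charged for $charged_gpus GPUs"
```

Here the half-node CPU request dominates, so the job is billed as if 2 of the 4 GPUs had been reserved.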
Note also that during an execution in exclusive mode (the --exclusive option of the sbatch command or the Slurm directive #SBATCH --exclusive), you reserve all the resources of a node:
- 48 physical cores for the cpu partitions.
- 48 physical cores and 4 GPUs for the default gpu partition.
The accounting will then be on the basis of:
- The number of reserved nodes × 48 cores × elapsed time for a CPU job.
- The number of reserved nodes × 4 GPUs × elapsed time for a GPU job in the default gpu partition.
You can consult your project allocation by using the
Mybalance command or SLURM commands.
GPU monitoring: nvidia-smi#
The NVIDIA System Management Interface (
nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
This utility allows administrators to query GPU device state and, with the appropriate privileges, to modify it. It is targeted at the Tesla™, GRID™, Quadro™ and Titan X products, though limited support is also available on other NVIDIA GPUs.
nvidia-smi ships with the NVIDIA GPU display drivers on Linux. It can report query information as XML or human-readable plain text, to either standard output or a file. For more details, please refer to the nvidia-smi documentation.
# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   41C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   37C    P0    53W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   42C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Querying GPU Status#
These are NVIDIA’s high-performance compute GPUs and provide a good deal of health and status information.
List all available NVIDIA devices#
$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f)
List certain details about each GPU#
$ nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969
1, Tesla V100-SXM2-32GB, GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4, 1562520023800
2, Tesla V100-SXM2-32GB, GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04, 1562420015554
3, Tesla V100-SXM2-32GB, GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f, 1562520023100
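The CSV format is convenient for scripting. For example, a sketch parsing a captured sample of such output with awk (only the header and one data row are reproduced here):

```shell
# A captured sample of the CSV output (header + one row)
csv='index, name, uuid, serial
0, Tesla V100-SXM2-32GB, GPU-5a80af23-787c-cbcb-92de-c80574883c5d, 1562720002969'
# Print "index: name" for each data row, skipping the header
awk -F', ' 'NR > 1 { print $1 ": " $2 }' <<< "$csv"
```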
Monitor overall GPU usage with 1-second update intervals#
$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    57    42    39     0     0     0     0   877  1290
    1    54    38    38     0     0     0     0   877  1290
    2    57    38    38     0     0     0     0   877  1290
    3    57    43    41     0     0     0     0   877  1290
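The dmon output is equally easy to post-process. For instance, averaging the power-draw column over a captured sample (the four data rows shown above):

```shell
# Four captured dmon data rows (gpu pwr gtemp mtemp sm mem enc dec mclk pclk)
sample='0 57 42 39 0 0 0 0 877 1290
1 54 38 38 0 0 0 0 877 1290
2 57 38 38 0 0 0 0 877 1290
3 57 43 41 0 0 0 0 877 1290'
# Column 2 is the power draw in watts; skip any '#' comment lines
awk '!/^#/ { sum += $2; n++ } END { printf "%.2f W\n", sum / n }' <<< "$sample"
```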
Monitor per-process GPU usage with 1-second update intervals#
$ nvidia-smi pmon
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0      14835     C    45    15     0     0   python
    1      14945     C    64    50     0     0   python
    2          -     -     -     -     -     -   -
    3          -     -     -     -     -     -   -
In this case, two different Python processes are running, one on each of GPUs 0 and 1; only 2 of the 4 GPUs are in use.
Monitoring and Managing GPU Boost#
The GPU Boost feature which NVIDIA has included with more recent GPUs allows the GPU clocks to vary depending upon load (achieving maximum performance so long as power and thermal headroom are available). However, the amount of available headroom will vary by application (and even by input file!) so users should keep their eyes on the status of the GPUs. A listing of available clock speeds can be shown for each GPU on turing01 with V100:
$ nvidia-smi -q -d SUPPORTED_CLOCKS

==============NVSMI LOG==============

Timestamp                             : Mon Nov 23 18:48:39 2020
Driver Version                        : 450.51.06
CUDA Version                          : 11.0

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Supported Clocks
        Memory                        : 877 MHz
            Graphics                  : 1530 MHz
            Graphics                  : 1522 MHz
            Graphics                  : 1515 MHz
            Graphics                  : 1507 MHz
            [...180 additional clock speeds omitted...]
            Graphics                  : 150 MHz
            Graphics                  : 142 MHz
            Graphics                  : 135 MHz
As shown, the Tesla V100 GPU supports 187 different clock speeds (from 135 MHz to 1530 MHz). However, only one memory clock speed is supported (877 MHz). Some GPUs support two different memory clock speeds (one high speed and one power-saving speed). Typically, such GPUs only support a single GPU clock speed when the memory is in the power-saving speed (which is the idle GPU state). On all recent Tesla and Quadro GPUs, GPU Boost automatically manages these speeds and runs the clocks as fast as possible (within the thermal/power limits and any limits set by the administrator).
To review the current GPU clock speed (here we display 1 GPU), default clock speed, and maximum possible clock speed, run:
$ nvidia-smi -q -d CLOCK

==============NVSMI LOG==============

Timestamp                             : Mon Nov 23 18:56:48 2020
Driver Version                        : 450.51.06
CUDA Version                          : 11.0

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Clocks
        Graphics                      : 1290 MHz
        SM                            : 1290 MHz
        Memory                        : 877 MHz
        Video                         : 1170 MHz
    Applications Clocks
        Graphics                      : 1290 MHz
        Memory                        : 877 MHz
    Default Applications Clocks
        Graphics                      : 1290 MHz
        Memory                        : 877 MHz
    Max Clocks
        Graphics                      : 1530 MHz
        SM                            : 1530 MHz
        Memory                        : 877 MHz
        Video                         : 1372 MHz
    Max Customer Boost Clocks
        Graphics                      : 1530 MHz
    SM Clock Samples
        Duration                      : 0.01 sec
        Number of Samples             : 4
        Max                           : 1290 MHz
        Min                           : 135 MHz
        Avg                           : 870 MHz
    Memory Clock Samples
        Duration                      : 0.01 sec
        Number of Samples             : 4
        Max                           : 877 MHz
        Min                           : 877 MHz
        Avg                           : 877 MHz
    Clock Policy
        Auto Boost                    : N/A
        Auto Boost Default            : N/A
...
Ideally, you’d like all clocks to be running at the highest speed all the time. However, this will not be possible for all applications. To review the current state of each GPU and any reasons for clock slowdowns, use the nvidia-smi -q -d PERFORMANCE command:
$ nvidia-smi -q -d PERFORMANCE

Attached GPUs                         : 4
GPU 00000000:18:00.0
    Performance State                 : P0
    Clocks Throttle Reasons
        Idle                          : Not Active
        Applications Clocks Setting   : Not Active
        SW Power Cap                  : Not Active
        HW Slowdown                   : Not Active
            HW Thermal Slowdown       : Not Active
            HW Power Brake Slowdown   : Not Active
        Sync Boost                    : Not Active
        SW Thermal Slowdown           : Not Active
        Display Clock Setting         : Not Active
...
Reviewing System/GPU Topology and NVLink with nvidia-smi#
To properly take advantage of more advanced NVIDIA GPU features (such as GPU Direct), it is vital that the system topology be properly configured. The topology refers to how the various system devices (GPUs, InfiniBand HCAs, storage controllers, etc.) connect to each other and to the system’s CPUs. Certain topology types will reduce performance or even cause certain features to be unavailable. To help tackle such questions, nvidia-smi supports system topology and connectivity queries:
$ nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    mlx5_0  mlx5_1  CPU Affinity    NUMA Affinity
GPU0     X      NV2     NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU1    NV2      X      NV2     NV2     NODE    NODE    0,2,4,6,8,10    0
GPU2    NV2     NV2      X      NV2     SYS     SYS     1,3,5,7,9,11    1
GPU3    NV2     NV2     NV2      X      SYS     SYS     1,3,5,7,9,11    1
mlx5_0  NODE    NODE    SYS     SYS      X      PIX
mlx5_1  NODE    NODE    SYS     SYS     PIX      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
Reviewing this section will take some getting used to, but can be very valuable. The above configuration shows 4 Tesla V100 GPUs and 2 Mellanox EDR InfiniBand HCAs (mlx5_0 and mlx5_1): the HCAs and GPUs 0-1 are attached to the first CPU of the server, and GPUs 2-3 to the second (see the NUMA Affinity column). The CPU Affinity column lists the cores to which the topology tool recommends binding jobs for each GPU (although this will vary by application).
The NVLink connections themselves can also be queried to ensure status, capability, and health. Readers are encouraged to consult NVIDIA documentation to better understand the specifics.
$ nvidia-smi nvlink --status GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-5a80af23-787c-cbcb-92de-c80574883c5d) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-233f07d9-5e4c-9309-bf20-3ae74f0495b4) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-a1a1cbc1-8747-d8cd-9028-3e2db40deb04) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-8d5f775d-70d9-62b2-b46c-97d30eea732f) Link 0: 25.781 GB/s Link 1: 25.781 GB/s Link 2: 25.781 GB/s Link 3: 25.781 GB/s Link 4: 25.781 GB/s Link 5: 25.781 GB/s
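From these figures, the aggregate unidirectional NVLink bandwidth per GPU follows directly (6 links at 25.781 GB/s each; simple arithmetic, shown here with awk):

```shell
# 6 NVLink links per V100, 25.781 GB/s each (per direction)
awk 'BEGIN { printf "%.3f GB/s\n", 6 * 25.781 }'
```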
$ nvidia-smi nvlink --capabilities
Printing all GPU Details#
To list all available data on a particular GPU, specify the ID of the card with -i; the -d option restricts the query to selected sections:
$ nvidia-smi -i 0 -q
$ nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE
SSH on GPU nodes#
You can connect by SSH to a GPU node that was assigned to one of your jobs in order to monitor the execution of your calculations with tools such as nvidia-smi.
When one of your jobs is running, the compute nodes assigned to it are visible with the command
squeue -j <job_id> or
squeue -u $USER :
$ squeue -u $USER
  JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
2042259      gpus my_job my_login  R 01:42     1 turing01
In this example, job #
2042259 is running on turing01. You can connect via
ssh with the following command:
$ ssh turing01
Note that you will be automatically disconnected from the node when your job is finished.
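For scripting, the node name can be pulled out of a squeue line of this form (squeue also offers -h -o %N to print only the node list directly); a minimal sketch on a captured sample line:

```shell
# Sample squeue line (last field is the node list)
line='2042259      gpus    my_job my_login  R    01:42      1 turing01'
node=$(awk '{ print $NF }' <<< "$line")
echo "$node"
```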
If you try to connect to a node on which none of your jobs is running, you will obtain the following error message:
[randria@login02 ~]$ ssh turing01
Access denied: user randria (uid=1014) has no active jobs.
Connection closed by 172.30.0.54