HPC cluster basics

SLURM: Simple Linux Utility for Resource Management

  • A SLURM job script is a simple text file listing all the requirements for running your job:
    • Memory requirement
    • Desired number of processors
    • Length of time you want to run the job
    • Type of queue you want to use (optional)
    • Where to write output and error files
    • Name for your job while running on HPC

Job Script Basics

A typical job script will look like this:

#!/bin/bash
#SBATCH --job-name=JobName
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --mem=128G
#SBATCH --mail-user=netid@gmail.com
#SBATCH --mail-type=BEGIN,END
#SBATCH --error=JobName.%J.err
#SBATCH --output=JobName.%J.out

# Load the software the job needs (replace modulename with a real module)
module load modulename


Lines starting with #SBATCH are directives that tell the SLURM resource manager which resources to request for your job; the shell itself treats them as comments. Some important options are as follows:

Option            Example                                Description
--nodes           #SBATCH --nodes=1                      Number of nodes
--cpus-per-task   #SBATCH --cpus-per-task=16             Number of CPUs per task
--time            #SBATCH --time=HH:MM:SS                Total time requested for your job
--output          #SBATCH --output=filename              Write STDOUT to a file
--error           #SBATCH --error=filename               Write STDERR to a file
--mail-user       #SBATCH --mail-user=user@domain.edu    Email address to send notifications
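
The directives described above can be checked before a job is submitted. The following is a minimal sketch, assuming a script named job.sub (a hypothetical file name used only for illustration): it writes a small job script and then lists just the lines SLURM will read as directives.

```shell
#!/bin/bash
# Write a small example job script. job.sub is a hypothetical name.
cat > job.sub <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --output=JobName.%J.out
#SBATCH --error=JobName.%J.err

echo "Running on $(hostname)"
EOF

# Show only the directive lines, i.e. the resources the script requests.
grep '^#SBATCH' job.sub
```

On a cluster, running sbatch --test-only job.sub should validate the script without actually submitting it, which is a cheap way to catch a mistyped option.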

Job Management Commands

Job status commands

Command             Function
sinfo -a            list all queues
squeue              list all jobs
squeue -u userid    list jobs for userid
squeue -t R         list running jobs

Let’s go ahead and give these job management commands a try.

sinfo -a
squeue -t R
# Pick a user name from the squeue output and list all of that user's jobs with the -u option
squeue -u first.lastname
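
When a user has many jobs, a per-state summary can be easier to read than the full listing. The sketch below is one possible helper, not part of SLURM: count_jobs_by_state is a hypothetical function name, and it assumes squeue is available on the machine where it is called.

```shell
#!/bin/bash
# count_jobs_by_state (hypothetical helper): summarize one user's jobs by state.
# Usage: count_jobs_by_state [userid]   (defaults to the current user)
count_jobs_by_state() {
    local user="${1:-$USER}"
    # -h drops the header line; -o '%T' prints only the job state
    # (for example RUNNING or PENDING), one job per line.
    squeue -h -u "$user" -o '%T' | sort | uniq -c
}
```

Calling count_jobs_by_state first.lastname prints one line per state with the number of jobs in that state.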

Those two commands can print a lot of information. I have created some useful aliases that change the output to something more informative.

alias sq='squeue -o "%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R"'
alias si='sinfo -o "%20P %5D %14F %8z %10m %10d %11l %16f %N"'

In the si output, (A/I/O/T) stands for the number of allocated/idle/other/total nodes.

You can place those aliases in your ~/.bashrc file, and they will load automatically every time you log in.

Exercise: Add the two aliases above to your ~/.bashrc file

nano ~/.bashrc
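
As an alternative to editing the file by hand, the aliases can be appended from the command line. This is a minimal sketch under stated assumptions: add_line_once and install_aliases are hypothetical helper names, and the target file is passed as an argument so the functions can be tried on a scratch file first. Each line is added only if it is not already present, so running it twice does not create duplicates.

```shell
#!/bin/bash
# add_line_once FILE LINE (hypothetical helper):
# append LINE to FILE unless an identical line is already there.
add_line_once() {
    local file="$1" line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# install_aliases FILE (hypothetical helper): add the sq and si aliases to FILE.
install_aliases() {
    local file="$1"
    add_line_once "$file" "alias sq='squeue -o \"%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R\"'"
    add_line_once "$file" "alias si='sinfo -o \"%20P %5D %14F %8z %10m %10d %11l %16f %N\"'"
}
```

After defining the functions, install_aliases ~/.bashrc followed by source ~/.bashrc makes sq and si available in the current session.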

Job scheduling commands

Command   Function                  Basic Usage        Example
sbatch    submit a slurm job        sbatch [script]    $ sbatch job.sub
scancel   delete slurm batch job    scancel [job_id]   $ scancel 123456
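
Since scancel needs the job ID that sbatch reports, it can be convenient to capture that ID at submission time. The sketch below assumes a hypothetical wrapper named submit_job; it relies on the --parsable option of sbatch, which prints the job ID (optionally followed by a semicolon and the cluster name) instead of the usual "Submitted batch job" message.

```shell
#!/bin/bash
# submit_job SCRIPT (hypothetical helper): submit SCRIPT and print only the job ID.
submit_job() {
    local script="$1"
    local out
    out=$(sbatch --parsable "$script") || return 1
    # --parsable prints "jobid" or "jobid;cluster"; keep the part before ';'.
    printf '%s\n' "${out%%;*}"
}
```

A typical use would be jobid=$(submit_job job.sub), then squeue -j "$jobid" to check on the job and scancel "$jobid" to delete it.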

Interactive Session

To start an interactive session, execute the following:

# This command requests 1 node and 1 task (1 CPU) in the brief-low queue for 00 hours: 01 minutes: 00 seconds

salloc -N 1 -n 1 -p brief-low -t 00:01:00

# You can exit out of an interactive node by typing exit and hitting return

Interactive sessions are very helpful when you need more computing power than your laptop or desktop to wrangle the data or to test new software prior to submitting a full batch script.
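
Because the prompt inside an interactive session can look much like the login node, it may help to confirm where you are before starting heavy work. This is a minimal sketch: in_slurm_allocation is a hypothetical function name, and it relies on the SLURM_JOB_ID environment variable, which SLURM sets inside a job or allocation.

```shell
#!/bin/bash
# in_slurm_allocation (hypothetical helper):
# succeeds when the shell is running inside a SLURM job or allocation.
in_slurm_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_allocation; then
    echo "Inside SLURM job $SLURM_JOB_ID"
else
    echo "Not inside a SLURM allocation"
fi
```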