HPC cluster basics
SLURM: Simple Linux Utility for Resource Management
- A simple text file with all the requirements for running your job
- Memory requirement
- Desired number of processors
- Length of time you want to run the job
- Type of queue you want to use (optional)
- Where to write output and error files
- Name for your job while running on HPC
Job Script Basics
A typical job script will look like this:
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --mem=128G
#SBATCH --mail-user=email@example.com
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --error=JobName.%J.err
#SBATCH --output=JobName.%J.out

cd $SLURM_SUBMIT_DIR
module load modulename

your_commands_go_here
```
Lines starting with `#SBATCH` are directives for the SLURM resource manager, requesting resources from the HPC system. Some important options are as follows:
|Option|Description|
|---|---|
|`--nodes`|Number of nodes|
|`--cpus-per-task`|Number of CPUs per node|
|`--time`|Total time requested for your job|
|`--output`|STDOUT to a file|
|`--error`|STDERR to a file|
|`--mail-user`|Email address to send notifications|
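Because `#SBATCH` lines are ordinary shell comments, the body of a job script also runs as plain bash. A minimal sketch of creating and checking a script (the filename `job.sub` and the `echo` command are illustrative, not part of a real cluster workflow):

```bash
# Write a minimal job script to a file (the name job.sub is an assumption).
cat > job.sub <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
echo "running on $(hostname)"
EOF

# On the cluster you would submit it with:  sbatch job.sub
# Since #SBATCH lines are comments, the body also runs as ordinary bash:
bash job.sub
```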
Job Management Commands
|Command|Purpose|
|---|---|
|`sinfo -a`|list all queues|
|`squeue`|list all jobs|
|`squeue -u userid`|list jobs for userid|
|`squeue -t R`|list running jobs|
Let’s go ahead and give these job management commands a try.
```bash
sinfo -a
squeue
squeue -t R
# pick a name you saw when you typed squeue and specify all the jobs
# by that person with the following option
squeue -u first.lastname
```
Those two commands can produce a lot of information. I have created some useful aliases that change the output to something more informative.
```bash
alias sq='squeue -o "%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R"'
alias si='sinfo -o "%20P %5D %14F %8z %10m %10d %11l %16f %N"'
```
You can place those aliases into your ~/.bashrc file and they will automatically load every time you log in (or run `source ~/.bashrc` to load them in your current session).
Exercise: Add the two aliases above to your ~/.bashrc file:
```bash
nano ~/.bashrc
```
Job Scheduling Commands
|Action|Command|Example|
|---|---|---|
|submit a SLURM job|`sbatch [script]`|`$ sbatch job.sub`|
|delete a SLURM batch job|`scancel [job_id]`|`$ scancel 123456`|
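A handy pattern is capturing the job ID at submission time so you can monitor or cancel the job later. `sbatch --parsable` prints only the job ID; the sketch below simulates that output (the ID 123456 is made up) so the shell logic is visible without a cluster:

```bash
# On a real cluster:  jobid=$(sbatch --parsable job.sub)
# Simulated here with a made-up job ID:
jobid="123456"          # stand-in for the output of sbatch --parsable
echo "Submitted batch job $jobid"
# Later you can check or cancel it:
#   squeue -j "$jobid"
#   scancel "$jobid"
```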
To start an interactive session, execute the following:
```bash
# this command will give 1 node with 1 CPU in the brief-low queue
# for a time of 00 hours : 01 minutes : 00 seconds
salloc -N 1 -n 1 -p brief-low -t 00:01:00

# You can exit out of an interactive node by typing exit and hitting return
```
Interactive sessions are very helpful when you need more computing power than your laptop or desktop to wrangle the data or to test new software prior to submitting a full batch script.
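Inside any job, batch or interactive, SLURM exports environment variables describing the allocation; the job script above already uses one, `$SLURM_SUBMIT_DIR`. A small sketch with fallback values so it also runs outside a cluster:

```bash
# These variables are set by SLURM inside an allocation; the :- fallbacks
# let the snippet run on a machine without SLURM as well.
echo "Job ID:     ${SLURM_JOB_ID:-not inside a SLURM job}"
echo "CPUs/task:  ${SLURM_CPUS_PER_TASK:-unset}"
echo "Submit dir: ${SLURM_SUBMIT_DIR:-$PWD}"
```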