Slurm has been upgraded to version 22.05
We’ve just upgraded the version of Slurm powering our School’s Slurm Compute Cluster from version 20.11 to version 22.05. Read on to find out what’s new, and learn about a change that may affect a very small number of users.
New scancel --me option
Slurm’s scancel command has gained a handy new --me option. This allows you to easily cancel all of your current Slurm jobs, whether they’re already running or waiting in the queue.
This can be really useful if you’ve just submitted a bunch of jobs and then realised you’ve made a wee mistake somewhere. I’m always submitting a bunch of broken jobs, so this is handy to have!
Use it as follows:
scancel --me
Obviously use this with care – it really will kill all of your jobs and it doesn’t ask for confirmation first!
Note also that it’s not possible for users to kill other people’s jobs.
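Of course, you can still cancel a single job by giving scancel a specific Job ID, for example (the Job ID below is just made up):
scancel 642000
The new --me option is simply the quick way to sweep up everything of yours in one go.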
Recovering a past job’s batch script
This new version of Slurm supports storing every job’s batch script permanently in our accounting database, and we’ve enabled this feature as we think it’s really useful.
This means that you can now recover the batch script from one of your previous jobs via the following command:
scontrol write batch_script [job_id]
This will save the batch script to a file called slurm-[job_id].sh.
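For example, if one of your earlier jobs had the (made-up) Job ID 642123, you could recover its script with:
scontrol write batch_script 642123
which would write the script out as slurm-642123.sh in your current directory.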
This feature probably won’t be something you use very often, but it could perhaps be handy if you need to look back at an old job and can’t remember where you put its batch script, or if you subsequently deleted it.
Note that this will only work for jobs submitted after today’s upgrade, i.e. jobs with Job ID greater than 641900.
Change to Slurm’s handling of --cpus-per-task (-c) in batch scripts
This new version of Slurm has made a change to how the --cpus-per-task (or simply -c) option works inside batch scripts.
If you’re using this in a batch script to launch a code with Slurm’s srun command, then you may now need to add the following to your batch script before your srun command:
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
The most common usage of --cpus-per-task on our cluster is to run multi-threaded codes – typically using OpenMP. In these cases, you’ll usually be setting the OMP_NUM_THREADS variable in a similar way to the line above, and this will continue to be sufficient to run your code.
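As a rough sketch, an OpenMP batch script that launches its code via srun might now look something like this (the 8 CPUs and the ./my_openmp_code program name are just placeholders):
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# Tell OpenMP how many threads to use
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# New in Slurm 22.05: needed so that srun also picks up the CPU count
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun ./my_openmp_code
If you run your executable directly rather than via srun, the OMP_NUM_THREADS line alone remains sufficient.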
So, if you’re running an OpenMP code by following the examples on our Slurm Batch Script Examples page, you won’t need to make any changes here. In particular, I’ve tested all of the examples on that page and didn’t need to change any of them for this new version of Slurm, so hopefully this change will only affect some more esoteric jobs. Indeed, most current jobs on our cluster are either serial or MPI (controlled via the --ntasks or -n option) and are not affected by this change.
Full list of changes
A full list of the changes introduced in this new version (22.05), and in the intermediate version (21.08) that we’ve skipped over today, can be found on Slurm’s official release notes pages below:
Most of this is aimed at Slurm administrators and is fairly technical, and a lot of the things mentioned there aren’t relevant to our wee cluster.
What is our Slurm Compute Cluster?
The School’s Slurm Compute Cluster is a general-purpose computing facility in the School of Physics & Astronomy, providing a growing pool of fairly powerful Linux computers for efficiently running “bigger” computations.
Documentation and guidance on using the cluster can be found via the following link:
Help!
If you have any questions about this upgrade or our Slurm Compute Cluster, please contact us:
- You can email the School Helpdesk: sopa-helpdesk@ed.ac.uk
- Alternatively, you can post in the SoPA Research Computing space in Teams.