Tweaks to our Slurm utilities
We’ve made a couple of minor improvements to the handy seff, sinfo-nodes & sinfo-partitions Slurm utilities. Read on for more details.
Improvements to seff
Our wee seff tool generates a summary report about a job on our Slurm Compute Cluster once the job has completed.
This report includes details of the job’s resource usage, which can be really handy for understanding how your jobs behave so that you can tune your resource requests for future jobs.
The seff tool is bundled with Slurm, but we’ve already modified it quite heavily to make it work better with our cluster and to make the output a bit more helpful.
And we’ve just made some further improvements! These are described below.
Energy usage is now measured in kilowatt hours (kWh)
Our seff report gives you an estimate of how much energy was consumed by the compute node(s) that ran your job.
This previously reported your energy usage in kilojoules or megajoules.
We’ve now changed it to report in kilowatt hours (kWh), which is probably more meaningful to most people – especially in light of the recent energy price crisis!
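The conversion between the two units is straightforward: 1 kWh is 3.6 MJ (3600 kJ), so a kilojoule figure just gets divided by 3600. A wee sketch of the arithmetic (the 5400 kJ figure is purely illustrative, not real job data):

```shell
# Convert an energy figure from kilojoules to kilowatt hours.
# 1 kWh = 3.6 MJ = 3600 kJ, so divide the kJ value by 3600.
energy_kj=5400   # illustrative value only
energy_kwh=$(awk -v kj="$energy_kj" 'BEGIN { printf "%.2f", kj / 3600 }')
echo "${energy_kj} kJ = ${energy_kwh} kWh"
```

So a job that a previous seff report showed as using 5400 kJ would now be reported as using 1.50 kWh.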
(Hopefully) better CPU time measurements for jobs that don’t complete successfully
We had noticed that seff almost always under-reports CPU usage for jobs that didn’t complete successfully – for example, if the job exceeded its time limit or got cancelled. We think this is due to a bug in how Slurm records its accounting data.
We’ve therefore changed seff to use an alternative CPU usage calculation in these cases, which should result in more helpful CPU usage data. We’ve added a wee note to the seff report output to let you know when it’s using this alternative calculation.
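We won’t go into the exact alternative formula here, but the general idea behind CPU-efficiency figures like seff’s is simple: compare the CPU time the job actually used against the CPU time that was available to it (wall-clock time multiplied by the number of allocated CPUs). A rough sketch of that general calculation, with made-up illustrative numbers:

```shell
# Sketch of the general CPU-efficiency idea behind reports like seff's:
# CPU time used, divided by CPU time available
# (wall-clock elapsed time x number of allocated CPUs).
# All figures below are made-up illustrative values, not real job data.
total_cpu_seconds=7200   # total CPU time consumed by the job
elapsed_seconds=3600     # job wall-clock time
alloc_cpus=4             # CPUs allocated to the job
cpu_efficiency=$(awk -v used="$total_cpu_seconds" \
                     -v wall="$elapsed_seconds" \
                     -v n="$alloc_cpus" \
  'BEGIN { printf "%.1f", 100 * used / (wall * n) }')
echo "CPU efficiency: ${cpu_efficiency}%"
```

In this illustration the job kept its 4 CPUs busy for half the available CPU time, giving 50% efficiency. A low figure here often means you asked for more CPUs than your job could make use of.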
Changes to the sinfo-nodes & sinfo-partitions helper script
We provide a couple of handy helper scripts, sinfo-nodes & sinfo-partitions, which wrap around Slurm’s core sinfo utility to give you handy summary details about our own Slurm compute nodes & partitions. They include information that makes sense in our wee Cluster, and leave out stuff that’s not so useful for us.
We’ve just made the following minor tweaks to these scripts:
- We’ve added an AVAIL_FEATURES column to sinfo-nodes. This gives you details about the “features” available on each specific node. Currently these features provide details about the type of CPU on each node.
- We’ve dropped the AVAIL column from both sinfo-nodes & sinfo-partitions. This was never very useful on our cluster.
- We’re now showing details of all user-facing partitions and nodes, including nodes & partitions that you might not currently have access to.
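If you want your job to run only on nodes with a particular feature, you can request it via Slurm’s standard --constraint option. A hypothetical job-script fragment (the feature name below is just a placeholder – check the AVAIL_FEATURES column of sinfo-nodes for the names actually defined on our cluster):

```shell
#!/bin/bash
# Hypothetical job script fragment: restrict the job to nodes that
# advertise a particular feature, using Slurm's --constraint option.
# "some_cpu_feature" is a placeholder, not a real feature name --
# see the AVAIL_FEATURES column of sinfo-nodes for the real names.
#SBATCH --constraint=some_cpu_feature
#SBATCH --time=00:10:00

srun my_program
```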
About our Slurm Compute Cluster
The Slurm Compute Cluster is a general-purpose computing facility in the School of Physics & Astronomy, providing a growing pool of fairly powerful Linux computers for efficiently running “bigger” computations.
Documentation and guidance on using the cluster can be found via the following link:
If you have any questions about these changes or our Slurm Compute Cluster, please contact us:
- You can email the School Helpdesk: email@example.com
- Alternatively, you can post in the SoPA Research Computing space in Teams.