Tweaks to our Slurm utilities
We’ve made a couple of minor improvements to the handy seff, sinfo-nodes & sinfo-partitions Slurm utilities. Read on for more details.
Improvements to seff
Our wee seff tool generates a summary report about a job on our Slurm Compute Cluster once the job has completed.
This report includes details of the job’s resource usage, which can be really handy for understanding how your jobs behave so that you can tune your resource requests for future jobs.
The seff tool is bundled with Slurm, but we’ve already modified it quite heavily to make it work better with our cluster and to make the output a bit more helpful.
And we’ve just made some further improvements! These are described below.
Energy usage is now measured in kilowatt hours (kWh)
Our seff report gives you an estimate of how much energy was consumed by the compute node(s) that ran your job.
This previously reported your energy usage in kilojoules or megajoules.
We’ve now changed it to report in kilowatt hours (kWh), which is probably more meaningful to most people – especially in light of the recent energy price crisis!
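The conversion between the two units is straightforward: 1 kWh is 3.6 MJ (3600 kJ), so a kilojoule figure just gets divided by 3600. A wee sketch of the arithmetic (the 5400 kJ figure is purely illustrative, not real job data):

```shell
# Convert an energy figure from kilojoules to kilowatt hours.
# 1 kWh = 3.6 MJ = 3600 kJ, so divide the kJ value by 3600.
energy_kj=5400   # illustrative value only
energy_kwh=$(awk -v kj="$energy_kj" 'BEGIN { printf "%.2f", kj / 3600 }')
echo "${energy_kj} kJ = ${energy_kwh} kWh"
```

So a job that a previous seff report showed as using 5400 kJ would now be reported as using 1.50 kWh.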
(Hopefully) better CPU time measurements for jobs that don’t complete successfully
We had noticed that seff almost always under-reports CPU usage for jobs that didn’t complete successfully – for example, if the job exceeded its time limit or got cancelled. We think this is due to a bug in how Slurm records its accounting data.
We’ve therefore changed seff to use an alternative CPU usage calculation in these cases, which should result in more helpful CPU usage data. We’ve added a wee note to the seff report output to let you know when it’s using this alternative calculation.
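We won’t go into the exact alternative formula here, but the general idea behind CPU-efficiency figures like seff’s is simple: compare the CPU time the job actually used against the CPU time that was available to it (wall-clock time multiplied by the number of allocated CPUs). A rough sketch of that general calculation, with made-up illustrative numbers:

```shell
# Sketch of the general CPU-efficiency idea behind reports like seff's:
# CPU time used, divided by CPU time available
# (wall-clock elapsed time x number of allocated CPUs).
# All figures below are made-up illustrative values, not real job data.
total_cpu_seconds=7200   # total CPU time consumed by the job
elapsed_seconds=3600     # job wall-clock time
alloc_cpus=4             # CPUs allocated to the job
cpu_efficiency=$(awk -v used="$total_cpu_seconds" \
                     -v wall="$elapsed_seconds" \
                     -v n="$alloc_cpus" \
  'BEGIN { printf "%.1f", 100 * used / (wall * n) }')
echo "CPU efficiency: ${cpu_efficiency}%"
```

In this illustration the job kept its 4 CPUs busy for half the available CPU time, giving 50% efficiency. A low figure here often means you asked for more CPUs than your job could make use of.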
Changes to the sinfo-nodes & sinfo-partitions helper script
We provide a couple of handy helper scripts, sinfo-nodes & sinfo-partitions, which wrap around Slurm’s core sinfo utility to give you handy summary details about our own Slurm compute nodes & partitions. They include information that makes sense in our wee Cluster, and leave out stuff that’s not so useful for us.
We’ve just made the following minor tweaks to these scripts:
- We’ve added an AVAIL_FEATURES column to sinfo-nodes. This gives you details about the “features” available on each specific node. Currently these features provide details about the type of CPU on each node.
- We’ve dropped the AVAIL column from both sinfo-nodes & sinfo-partitions. This was never very useful on our cluster.
- We’re now showing details of all user-facing partitions and nodes, including nodes & partitions that you might not currently have access to.
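If you want your job to run only on nodes with a particular feature, you can request it via Slurm’s standard --constraint option. A hypothetical job-script fragment (the feature name below is just a placeholder – check the AVAIL_FEATURES column of sinfo-nodes for the names actually defined on our cluster):

```shell
#!/bin/bash
# Hypothetical job script fragment: restrict the job to nodes that
# advertise a particular feature, using Slurm's --constraint option.
# "some_cpu_feature" is a placeholder, not a real feature name --
# see the AVAIL_FEATURES column of sinfo-nodes for the real names.
#SBATCH --constraint=some_cpu_feature
#SBATCH --time=00:10:00

srun my_program
```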
About our Slurm Compute Cluster
The Slurm Compute Cluster is a general-purpose computing facility in the School of Physics & Astronomy, providing a growing pool of fairly powerful Linux computers for efficiently running “bigger” computations.
Documentation and guidance on using the cluster can be found via the following link:
If you have any questions about these changes or our Slurm Compute Cluster, please contact us:
- You can email the School Helpdesk: email@example.com
- Alternatively, you can post in the SoPA Research Computing space in Teams.