Linux & Scientific Computing – Physics & Astronomy

Updates and news about Scientific Computing and Linux computational facilities in the School of Physics & Astronomy

New Slurm compute nodes available now/soon!

We recently purchased 7 new nodes for our Slurm Compute Cluster. 4 of these are available to use now. The remaining 3 will be available soon. We’ve also repurposed 3 nodes from our old SGE Compute Cluster. Read on for more details.

What’s happening?

We’re pleased to announce that we’ve just added four brand new compute nodes to our Slurm Compute Cluster.

These new nodes are called phcompute026-029. They’ve been added to the “long” partition and are available to use now.

All Slurm Compute Cluster users can use these new nodes. (If you don’t have Slurm access yet, please ask us to set up an account for you. We can add any member of staff or PhD student in the School of Physics & Astronomy. We can also add advanced Undergraduate & MSc students who are taking computational projects that might benefit from Slurm access, and occasionally collaborators from other Schools or institutions on a case by case basis.)

You don’t need to change your Slurm batch scripts to use these new nodes – if you’re already submitting jobs to the long partition, your jobs may now run on one of these nodes whenever it has enough free resources.
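For example, a minimal batch script targeting the long partition might look something like the sketch below – the job name, resource requests and program name are just placeholders to adapt to your own workload:

    #!/bin/bash
    #SBATCH --job-name=my_job        # placeholder job name
    #SBATCH --partition=long         # the partition that now includes phcompute026-029
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4        # adjust to match your code
    #SBATCH --mem=8G                 # adjust to match your code
    #SBATCH --time=24:00:00          # adjust to match your code

    ./my_program                     # placeholder for your own executable

You’d submit this with sbatch in the usual way; Slurm then places the job on any node in the long partition (old or new) with enough free resources.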

We have a further three identical nodes (phcompute030-032) coming online soon. I’m currently hoarding these in order to finish off some testing & benchmarking… but I’ll hopefully be all done by the end of February. I’m contemplating making these three nodes available initially for users who’d like to do some guided benchmarking & optimisation of their codes and workflows. So they’ll probably have slightly restricted access at first, before becoming generally available in the same way as phcompute026-029.

About phcompute026-032

Our 7 new nodes have the same basic configuration as most of our existing nodes – in particular they all have AMD EPYC CPUs, 64 CPU cores per node and 256 GB RAM per node.

However, these new nodes have newer 3rd generation EPYC 7543 CPUs, compared to the 2nd generation EPYC 7452 CPUs in the nodes we bought as phcompute001-020.

As you’d maybe expect, the new nodes are a bit higher spec than the original nodes. They have a faster clock speed (2.8 GHz – 3.7 GHz, as compared to 2.35 GHz – 3.35 GHz) and twice as much Level 3 memory cache. So we’re seeing slightly better performance for both serial and parallel codes so far, but I’d like to see more benchmarking and comparison between our newer and older compute nodes.
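If you’d like to compare the older and newer nodes yourself, Slurm can report each node’s CPU count, memory, load and state. As a rough sketch (the exact output format depends on the Slurm version):

    # One-line summary of every node in the long partition
    sinfo --partition=long --Node --long

    # Full details (CPUs, memory, load, features) for one of the new nodes
    scontrol show node phcompute026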

On the other hand, these new CPUs are also much hungrier for power than the original CPUs! So we’ve had to upgrade the power supplies for the server rack that houses all of our compute stuff, which is why it has taken a bit of time to get these ones up and running.

Also… three new old nodes: phcomputecl1-3

Following the recent decommissioning of our legacy SGE Compute Cluster, we’ve decided to reuse the 3 best compute nodes from the SGE cluster in our Slurm Cluster, as they’re actually pretty decent.

These old nodes were previously called phcomputecm01-03, but we’ve renamed them as phcomputecl1-3 and rebuilt them to use our Ubuntu Linux platform so that they have exactly the same software stack as our other Slurm compute nodes (and indeed all of our current compute hosts).

These nodes have Intel CPUs with 32 cores (for phcomputecl1-2) and 36 cores (phcomputecl3) respectively, so they’re quite different from the rest of the AMD EPYC nodes in our Slurm cluster. I’ve therefore created a brand new partition called legacy specifically to house these nodes.

You’ll need to explicitly submit jobs to this partition if you want to use these older nodes. You can do this via the --partition=legacy option in your batch scripts and/or sbatch command. (Note that if you’re not bothered which node you get, you can submit jobs to run on either the long or legacy partition with --partition=long,legacy.)
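As a sketch, the relevant lines would look like one of the following (myscript.sh is just a placeholder for your own batch script; everything else in it stays the same):

    # In your batch script: run only on the legacy nodes (phcomputecl1-3)
    #SBATCH --partition=legacy

    # Or let Slurm start the job on whichever partition frees up first
    #SBATCH --partition=long,legacy

    # Equivalent on the command line, overriding whatever the script says
    sbatch --partition=legacy myscript.sh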

Finally, note that these 3 legacy nodes are now out of their hardware warranty period and we’ll be scrapping them when they fail. So don’t get too attached to them!

About our Slurm Compute Cluster

The Slurm Compute Cluster is a general purpose computing facility in the School of Physics & Astronomy, providing a growing pool of fairly powerful Linux computers for efficiently running “bigger” computations.

Documentation and guidance on using the cluster can be found via the following link:

Slurm Compute Cluster

Help!

If you have any questions about these new nodes or our Slurm Compute Cluster, please contact us:

  • You can email the School Helpdesk: sopa-helpdesk@ed.ac.uk
  • Alternatively, you can post in the SoPA Research Computing space in Teams.