Magnus Hagdorn

Research Software Engineer

June Ceph Update – the Queen’s CURRY

We had the opportunity to try out a cold start of all our servers in the College Server Room over the weekend. We shut down all machines on Friday 11th June so that contractors could work on the electricity supply on Saturday.

From my perspective the most interesting part was shutting down the ceph cluster, because we hadn't done that since we started using CephFS for general file serving. During the day on Friday we reduced the number of active MDS daemons to 1. This took a few minutes, until all data had been flushed. After six on Friday evening we shut down all compute boxes and VMs, which reduced the number of CephFS clients, and then shut down the virtualisation service as well. Finally, we took the CephFS offline:
ceph fs set one down true
This again took a few minutes while all metadata were flushed to the ceph metadata pool. Next we prepared the ceph cluster itself for shutdown by setting the following OSD flags:
ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover

followed by
ceph osd set norebalance
ceph osd set nodown
ceph osd set pause

We then switched off the ceph nodes. Finally, we shut down the two remaining servers, which provide DHCP and DNS services. The whole procedure took about an hour.
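
The ceph commands from the shutdown can be collected into a small shell function so that the order is not forgotten next time. What follows is a minimal sketch under stated assumptions, not the procedure as it was actually run. It assumes the file system is called `one`, as in the commands above, and adds a hypothetical `DRY_RUN` switch that prints each command instead of executing it. It also does not pause between steps. In practice each step was given a few minutes to complete, and the CephFS clients and virtualisation service were shut down before the file system was taken offline.

```shell
#!/bin/sh
# Sketch of the ceph side of a cold shutdown, in the order used in the post.
# Assumption: the CephFS file system is called "one".
# Hypothetical switch: with DRY_RUN=1 the commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

ceph_cold_shutdown() {
    fs_name="${1:-one}"
    # Shrink to a single active MDS. This sketch does not wait for the
    # extra ranks to stop; `ceph fs status` shows which ranks are active.
    run ceph fs set "$fs_name" max_mds 1 || return 1
    # Take the file system offline so metadata is flushed to the metadata pool
    run ceph fs set "$fs_name" down true || return 1
    # Stop the cluster reacting to OSDs going away, then pause client I/O
    for flag in noout nobackfill norecover norebalance nodown pause; do
        run ceph osd set "$flag" || return 1
    done
}
```

After sourcing the file, calling `ceph_cold_shutdown one` with `DRY_RUN=1` set prints the eight commands without touching the cluster, which is a cheap way to review the sequence before running it for real.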

The electrical work was carried out on Saturday morning and was completed by 14:00. Time to switch everything back on. This is where we hit the first issue. All machines had been switched off, including the DHCP servers. Our remote management controllers get their IP addresses via DHCP, so we couldn't switch the servers on remotely. Ah well, nothing a bike ride couldn't fix. The DHCP/DNS servers came up fine. Next I switched on the ceph nodes. Once all ceph nodes were up again, I unset the OSD flags that we had set to shut down the ceph cluster. Finally, we brought the CephFS back online:
ceph fs set one down false
This procedure was nice and quick and took about 5 minutes. Next on the list was the virtualisation service, which came up without any problems. This also started all the virtual machines, including the file server frontends. We used this opportunity to switch the frontends to Ubuntu. Finally, we switched on all compute boxes. We were back in business within an hour of the cold start.
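
The start-up on the ceph side is the mirror image of the shutdown. A matching sketch, under the same assumptions (file system `one`, hypothetical `DRY_RUN` switch), is shown below. It clears the OSD flags and then brings the CephFS back online. The post only says the settings were reversed. Clearing them in the opposite order to how they were set is a choice made for this sketch. It also assumes all ceph nodes are already powered on and their OSDs have rejoined the cluster.

```shell
#!/bin/sh
# Sketch of the ceph side of a cold start, after all ceph nodes are up.
# Assumption: the CephFS file system is called "one".
# Hypothetical switch: with DRY_RUN=1 the commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

ceph_cold_start() {
    fs_name="${1:-one}"
    # Clear the OSD flags, last-set first, so client I/O is unpaused first
    for flag in pause nodown norebalance norecover nobackfill noout; do
        run ceph osd unset "$flag" || return 1
    done
    # Bring the file system back online so the MDS can serve clients again
    run ceph fs set "$fs_name" down false || return 1
}
```

Checking `ceph health` before starting the virtualisation service and the CephFS clients is a sensible extra step. It reports a warning for as long as any of these flags are still set.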

We spotted one more problem. We use NFS to export our storage to Linux desktops. Root could access it, but users ran into locking issues. We are using an active-active NFS-Ganesha server cluster. It turned out that its shared state DB was in an error state because we had removed the old SL7 servers. Once this was resolved, NFS worked again.

All in all the super CURRY worked out quite well. Decoupling the file servers from the storage is a huge benefit: it allowed us to switch OS without having to transfer any data. My impression is that the new Ubuntu-based file server frontends are snappier as well.
