Replacement of failed SAN controller
We have received a replacement for the SAN controller that failed following the power cut on the 11th of November. We had to abort our plan to replace it yesterday (Monday 18th) after we discovered there was not enough slack in the cables to manoeuvre the failed controller out and the replacement controller in. Unfortunately, this means we will have to power off the SAN box to do the replacement.
This SAN box (ifevo4, as we call it) currently serves about 50 TB of data, all of which will be unavailable while it is powered down to replace the controller. Given the disruption this may cause, we plan to do the work starting at 10:30am on Sunday 24th November.
To minimise the disruption, on servers that have both local disk storage and SAN-mounted storage we will unmount the SAN storage and leave the local disk data available. This means that, apart from a couple of short breaks (a couple of minutes each), most home directories will remain available.
The rest of the SAN-mounted data (mostly group space) will be unavailable for the duration of the controller swap, which should take between 30 minutes and an hour.
To check whether your home directory is on local disk, run the “homedir” command. If it says your home directory is on a /vicepa, /vicepb or /vicepc partition, then you will be fine (apart from the brief interruptions). For example, in my case:
neilb> homedir
neilb (Neil Brown) : nessie/vicepc : /afs/inf.ed.ac.uk/user/n/neilb : free 162.2G (used 64%)
I’m on server “nessie” and partition “/vicepc”, so I should be fine. We realise that some users are still on SAN-mounted space, and between now and Sunday we’ll be moving those we can to local disk.
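If you would rather script the check (say, for several accounts), something like the following Python sketch may help. It assumes the “homedir” output is a single line with fields separated by " : ", as in the example above, and that /vicepa, /vicepb and /vicepc are the only local-disk partitions. The function names are made up for illustration and are not part of any existing tool.

```python
# Partitions named in this announcement as being on local disk.
LOCAL_PARTITIONS = {"vicepa", "vicepb", "vicepc"}


def parse_homedir_line(line):
    """Return (server, partition) from one line of homedir output.

    Assumed layout (taken from the example in this post):
        user (Full Name) : server/partition : path : free ...
    """
    fields = [field.strip() for field in line.split(" : ")]
    if len(fields) < 2 or "/" not in fields[1]:
        raise ValueError(f"unrecognised homedir output: {line!r}")
    server, partition = fields[1].split("/", 1)
    return server, partition


def is_on_local_disk(line):
    """True if the partition is one of the local-disk partitions."""
    _, partition = parse_homedir_line(line)
    return partition in LOCAL_PARTITIONS


example = ("neilb (Neil Brown) : nessie/vicepc : "
           "/afs/inf.ed.ac.uk/user/n/neilb : free 162.2G (used 64%)")
print(parse_homedir_line(example))  # ('nessie', 'vicepc')
print(is_on_local_disk(example))    # True
```

Any partition name outside that set is treated as SAN-mounted, so a result of False means the home directory is likely to be affected for the whole of the controller swap.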
All other networked file space will be unavailable during the replacement, e.g. everything under /group or /afs/inf.ed.ac.uk/group.
If there are any major problems with the planned date and time, please get in touch as soon as possible. Bear in mind, though, that the longer we run on a single controller, the bigger the risk of further unplanned failures.
Neil
Services Unit