Forum power failure on 11th November
We suffered a power failure in the Informatics Forum on Monday 11th November 2013, starting at about 11am and ending at about 1pm. We still have no information from Estates and Buildings as to why this failure occurred.
Many users were surprised to find that, whilst power to their desktops and the network wasn’t interrupted, many of the School’s servers shut down shortly after 11am.
Emergency power for servers based in the Forum is provided by a pair of UPSes. These are primarily intended to allow us to weather short power interruptions (e.g. a few minutes) and to shut down servers cleanly during longer ones. When both UPSes are fully functional, we have around 45 minutes of runtime on battery (given our current power load). Unfortunately, one of the UPSes has been out of action for a number of months, reducing our runtime on battery to 20 minutes.
Emergency power for offices and the network is provided by a single building UPS. This has a runtime of around 3 hours on battery, given our current power load. It is worth noting that the energy overhead of the building UPS is quite high, and consideration is being given to withdrawing it from service. No other University building has this level of cover for offices.
When power was reinstated at 1pm, the majority of services resumed reasonably quickly. However, the hardware failure of a disk controller in one of the storage arrays had a knock-on effect on a number of services, such as AFS. Power to some less critical services (e.g. the Hadoop cluster) wasn’t restored immediately, in case the power dropped again.
You can read our post mortem here.
Thanks for the information about this, Alastair.
From Bill Bordass’ energy report, the estimated energy overhead of the building UPS is 934 kWh a day, based on the experiment from January to March 2013 when the building UPS was turned off. This equates to 17.8% of the Forum’s energy consumption in 2012 (1,918,673 kWh), or about 500 kg CO2 per day or £34,091 per year. Some of the overhead is due to the cooling required by the UPS. In fact the savings might not be quite that great, as the building UPS supplies the cooling for the server room, so a small UPS would be needed to take over the server room cooling. But I think the smaller UPS could run in a less energy-intensive mode, because the server room cooling could cope with a delay of the order of seconds for the UPS to kick in.
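As a rough check, the figures quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the 934 kWh daily overhead holds constant over a 365-day year, and the unit price and emission factor it prints are back-calculated from the quoted totals rather than taken from the report. The function and constant names are made up for illustration.

```python
# Figures quoted in the comment above.
OVERHEAD_KWH_PER_DAY = 934
FORUM_KWH_2012 = 1_918_673
QUOTED_COST_GBP_PER_YEAR = 34_091
QUOTED_CO2_KG_PER_DAY = 500

# Assumption (not from the report): constant overhead over a 365-day year.
DAYS_PER_YEAR = 365


def annual_overhead_kwh() -> int:
    """Yearly energy overhead of the building UPS, in kWh."""
    return OVERHEAD_KWH_PER_DAY * DAYS_PER_YEAR


def share_of_consumption() -> float:
    """Overhead as a fraction of the Forum's 2012 consumption."""
    return annual_overhead_kwh() / FORUM_KWH_2012


def implied_price_gbp_per_kwh() -> float:
    """Unit price implied by the quoted yearly cost (back-calculated)."""
    return QUOTED_COST_GBP_PER_YEAR / annual_overhead_kwh()


def implied_co2_kg_per_kwh() -> float:
    """Emission factor implied by the quoted daily CO2 (back-calculated)."""
    return QUOTED_CO2_KG_PER_DAY / OVERHEAD_KWH_PER_DAY


if __name__ == "__main__":
    print(f"Annual overhead: {annual_overhead_kwh():,} kWh")
    print(f"Share of 2012 consumption: {share_of_consumption():.1%}")
    print(f"Implied price: £{implied_price_gbp_per_kwh():.2f} per kWh")
    print(f"Implied CO2 factor: {implied_co2_kg_per_kwh():.3f} kg per kWh")
```

On these assumptions the annual overhead comes to 340,910 kWh, which is consistent with the quoted 17.8% and implies a unit price of about 10p per kWh.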
With my energy co-ordinator hat on, I’d say that the benefit we get from the building UPS isn’t worth the cost or the CO2 emissions.
From the point of view of the School, however, it is rational to keep the UPS on, as the electricity is on the University’s budget, not ours. I understand that energy devolution has been put on hold for an indefinite period.
From the University’s point of view, turning off our UPS would make sense in the context of the aim to reduce emissions by 3% a year. However, E&B seem to be pretty busy at the moment, so I don’t know when any UPS replacement might happen.