What can a global IT outage teach us as a society?

On Friday 19 July we had a global IT outage which impacted hospitals, trains, air travel, supermarkets and many other businesses. It appears to have been caused by a small, inconspicuous update to CrowdStrike's anti-virus software for Windows operating systems, which seemed to 'break' Windows. Although the problem has since been fixed, lingering effects remain. The outage was so widespread, and the fix seems to need human intervention, most likely at each affected device itself.

This is not my area of expertise, and I'm not going to go into the details of how it happened. However, I do think it has something to teach us all. I watched with a mix of interest and horror as it unfolded, before the cause was known. Thank goodness, we weren't impacted.

It was bad, but it could have been a lot worse.

First of all, Linux/Unix and Apple devices were not impacted at all, so although the outage was widespread, many businesses and services escaped it, either because they used non-Windows devices or because they didn't use CrowdStrike. It's good not to have all of your eggs in one basket. Having choice probably makes things more expensive, but it gives you some room to manoeuvre when things go wrong. I think it's critical to weigh up the pros and cons, though, and find the best middle ground.

Secondly, it's super important to have business continuity plans in place wherever possible. We are so reliant on technology these days: what happens when it disappears? In the early hours of Friday, when it looked like Microsoft was taking action (with no explanation of what the possible impact was), we started to consider other ways of communicating as a team (Slack/WhatsApp), just in case Teams and email went down. At least then we would be able to talk to each other and coordinate things should the worst happen.

In this day and age, cloud providers roll out changes so regularly that we barely notice them. These changes happen all the time, often to keep us safe, and have likely kept us secure and saved us from many days like Friday in the past (we just haven't noticed, because nothing broke). So the risk of not getting those regular updates and patches is far bigger, even though in this case an update did cause a problem. This will happen sometimes, and it's important that we understand that and put mitigations in place where possible, prioritising situations where lives are at risk or people are unsafe.

And as a fellow IT person, I feel terrible for the poor soul(s) who put that update out, completely unaware of the chaos it would cause.

Jul 22, 2024

What can a global IT outage teach us as a society? / The witterings and musings of a learning technologist by blogadmin is licensed under a Creative Commons Attribution CC BY 3.0
