Potential Problem Analysis – What could possibly go wrong?
Many service management disciplines encounter risk as, in the real world, perfect knowledge is not possible – much of what we do will involve a degree of uncertainty.
Whether we are implementing changes to complex systems, devising disaster scenarios for continuity planning, attempting to respond to a major incident with incomplete information or predicting technology advancements, we are predicting the future with various degrees of confidence.
ITIL and PRINCE2 have a best-practice stablemate in the form of M_o_R (Management of Risk, and an irritatingly formatted acronym…), and the University has its own guidance on risk management. However, the Kepner-Tregoe approach is in harmony with both.
Please note that risk analysis applies equally to opportunities and threats – sometimes a desirable outcome is not guaranteed and needs to be secured, just as a threat might need to be avoided.
1. Identify potential problems
We need to ask what might go wrong with our intended course of action, or what opportunity might arise – but I’ll mercifully stop mentioning the upside alongside every threat from here on! M_o_R would insist that all the potentialities be listed and only then whittled down to those worth developing further (based on impact, probability and proximity), but for our operational purposes an exhaustive listing of risks often isn’t required. Tools such as PESTLE analysis (and its extensions) can help ensure that nothing is overlooked.
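As a rough illustration of whittling a list down by impact, probability and proximity, here is a minimal Python sketch. The 1 to 5 impact scale, the "exposure" score (impact multiplied by probability), the thresholds and the example risks are all assumptions made up for this illustration; they are not taken from M_o_R or Kepner-Tregoe.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    impact: int          # hypothetical scale: 1 (minor) to 5 (severe)
    probability: float   # estimated likelihood, 0.0 to 1.0
    proximity_days: int  # how soon the risk could materialise


def exposure(risk: Risk) -> float:
    """Simple illustrative score: impact weighted by probability."""
    return risk.impact * risk.probability


def shortlist(risks, min_exposure=1.0, horizon_days=90):
    """Keep risks that score at least min_exposure and could occur within
    horizon_days, ordered from highest exposure to lowest."""
    kept = [
        r for r in risks
        if exposure(r) >= min_exposure and r.proximity_days <= horizon_days
    ]
    return sorted(kept, key=exposure, reverse=True)


# Invented example risks for a planned change.
risks = [
    Risk("Database migration script fails", impact=5, probability=0.3, proximity_days=7),
    Risk("Key engineer unavailable on the day", impact=3, probability=0.2, proximity_days=7),
    Risk("Vendor withdraws support", impact=4, probability=0.5, proximity_days=365),
]

for r in shortlist(risks):
    print(f"{r.description}: exposure {exposure(r):.1f}")
```

With these invented numbers only the migration failure survives the cut: the engineer's absence scores too low, and the vendor risk is too far off to matter for this change.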
2. Identify likely causes
Each of the risks (potential problems) will have causes – note the plural, as there are likely to be multiple combining or contributory causes of something going wrong. Identifying the multiple causes of the potential problem gives multiple targets for the next step. A fire requires fuel, oxygen and a source of ignition, but removing any one of them will prevent it.
3. Take preventative action
Having identified the causes of the potential problem, we can now plan to avoid or reduce them. Risks that can’t be avoided completely can still have their impact reduced before they happen (e.g. don’t put all your eggs in one basket!). M_o_R includes other risk response options which are less operationally useful – “risk transfer” (this is not simply passing the buck; often it means some form of insurance) and “risk sharing” (spreading the risk between more entities). Then there’s the classic “risk acceptance”…
4. Plan contingent action
Stopping the risk being realised may be the most desirable option, but may not be possible or cost-effective. Sometimes we have to hope that it won’t happen, but prepare for it in case it does… This may be the classic rollback plan, or it may be some form of mitigation. Mitigation can be considered an impact reduction action taken after the fact. Every first aid response or fire evacuation plan is a contingent action – the University may aim to prevent staff injury and fires, but would be negligent to ignore such contingency plans in case the prevention fails!
Documented, tested contingency plans are the gold standard – as Matt often says, “we’ll restore from backup” is an aspiration, not a plan!
5. Set triggers
Whilst this might be considered part of the contingency planning, it’s often overlooked, so it is worth drawing out separately – how will we know that the contingency actions need to be taken? At what point will the rollback be triggered? Deciding the triggers in advance will save the hassle of the “we should press on” versus “we should go back” argument in the heat of the crisis. As always, SMART triggers will be easier to act on than vague statements.
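To make the idea of a pre-agreed trigger concrete, here is a small Python sketch of a rollback trigger for a change. The class name, the error-rate threshold and the go/no-go time are hypothetical values chosen for illustration; a real trigger would use whatever measure and deadline the team agrees beforehand.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RollbackTrigger:
    max_error_rate: float    # assumed measure, e.g. 0.05 = 5% of requests failing
    go_no_go_time: datetime  # latest time the change may still be incomplete

    def fired(self, error_rate: float, change_complete: bool, now: datetime) -> bool:
        """Return True when the agreed conditions say 'roll back now'."""
        # Condition 1: the service is visibly worse than the agreed limit.
        if error_rate > self.max_error_rate:
            return True
        # Condition 2: the deadline has passed and the work is unfinished.
        return (not change_complete) and now >= self.go_no_go_time


# Hypothetical trigger agreed before the change window opens.
trigger = RollbackTrigger(max_error_rate=0.05,
                          go_no_go_time=datetime(2024, 1, 1, 22, 0))

# Half an hour past the deadline with the change still unfinished.
if trigger.fired(error_rate=0.01, change_complete=False,
                 now=datetime(2024, 1, 1, 22, 30)):
    print("Trigger fired: start the rollback plan")
```

Because both conditions are measurable and were written down in advance, the decision during the change is a lookup and not a debate.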
Despite promising not to, I can’t resist coming back to opportunities that should be exploited or maximised – as it allows me to end with ABBA…