Problem Analysis: Insensibly one begins to twist facts…
Problem Analysis is probably the best known of the Kepner-Tregoe thinking processes, and the one referenced in the ITIL textbooks. Almost certainly your current job description includes a “problem solving” section, yet I suspect that problem solving rarely proceeds via a systematic process and instead is often intuitive and sporadic. How many times has a problem been investigated superficially before being dropped into the “too difficult” bin? Or where the initial hunch, “I know what’s wrong! We just need to…” has led to a dead end, delay or even the situation being made worse?
I never guess. It is a shocking habit — destructive to the logical faculty.
Our existing Problem Management process (unsurprisingly!) covers the management of problems and known errors, but gives no guidance on how to investigate problems, so it’s probably worth spending some time on the practicalities.
As I mentioned in my first Kepner-Tregoe blog post, I once considered Problem Analysis as just a description of the scientific method, but that’s an oversimplification.
Counter-intuitively, the goal in a systematic problem investigation is to slow down – to prevent jumping to conclusions.
It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
1. Describe Problem
Never trust to general impressions… but concentrate yourself upon details.
The first step is to state the problem and describe it as specifically, completely and factually as possible. Having a precise statement of the problem frames all subsequent work, and can (mis-)direct efforts – consider “puddle on the bathroom floor” versus “bathroom sink leaking”!
Describing the problem means stepping through “is and is not”.
What specifically is happening, and what is not happening
Where is impacted, and where is not
When is it happening and when is it not (including since when!)
Extent/Scope: how many, what size, what’s the trend, and the inverse of these
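One way to keep yourself honest about the “is and is not” pairs is to write them down in a fixed structure and see which ones are still empty. The following is only a minimal sketch in Python: the class names (Dimension, ProblemSpec), the gaps() helper and the sample entries are my own illustrations, not part of the Kepner-Tregoe material.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """One row of the description: what IS observed and what IS NOT."""
    name: str
    is_: list[str] = field(default_factory=list)
    is_not: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A row is only useful once both sides have been filled in.
        return bool(self.is_) and bool(self.is_not)


@dataclass
class ProblemSpec:
    """A problem statement plus the four 'is and is not' dimensions."""
    statement: str
    dimensions: dict[str, Dimension] = field(default_factory=lambda: {
        name: Dimension(name) for name in ("what", "where", "when", "extent")
    })

    def gaps(self) -> list[str]:
        """Names of dimensions still missing an IS or an IS NOT entry."""
        return [d.name for d in self.dimensions.values() if not d.is_complete()]


# Hypothetical entries, loosely following the bathroom example above.
spec = ProblemSpec("Bathroom sink leaking")
spec.dimensions["what"].is_.append("water pooling under the sink")
spec.dimensions["what"].is_not.append("water under the bath or toilet")
spec.dimensions["when"].is_.append("after the tap has been running")

print(spec.gaps())  # → ['where', 'when', 'extent']
```

Note that “when” is still reported as a gap: it has an “is” but no “is not”, and it is the contrast between the two that makes the description useful.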
2. Identify Possible Causes
One should always look for a possible alternative, and provide against it.
Only now should possible causes be listed. Knowledge and experience will provide possible causes, as will the Problem description – what stands out? What’s distinctive (“it’s only impacting…”) or what’s changed? Several possible alternatives are likely to present themselves, and all possible causes should be included (again to prevent jumping to conclusions).
3. Evaluate Possible Causes
…when you have eliminated the impossible, whatever remains, however improbable, must be the truth…
The possible causes can be compared to the description. For each possible cause, does it fully explain every “is and is not” pair? Are there things that the possible cause does not explain? Perhaps there’s something that can only be explained if something else is also happening or true? If this is the cause, what other evidence might there be to support it? Following all of these questions might cause new lines of enquiry to be opened up, either to check conflicting information or to gather additional data points.
The goal here is to identify the most probable cause: the one that best explains the evidence.
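The comparison of causes against the description can be sketched as a simple tally. This is an illustration under my own assumptions rather than a Kepner-Tregoe tool: the Observation class, the evaluate() function and the sample causes are all hypothetical, and deciding whether a cause really explains an “is and is not” pair remains a human judgement that the code merely records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One 'is and is not' pair from the problem description."""
    dimension: str  # "what", "where", "when" or "extent"
    is_: str
    is_not: str


def evaluate(causes, observations):
    """Rank causes by how many pairs they leave unexplained (fewest first).

    `causes` maps a cause name to the set of observations that the
    investigator judges it to explain fully.
    """
    results = []
    for cause, explained in causes.items():
        unexplained = [o for o in observations if o not in explained]
        results.append((cause, unexplained))
    return sorted(results, key=lambda r: len(r[1]))


# Hypothetical data for the leaking sink.
where = Observation("where", "under the sink", "elsewhere in the bathroom")
when = Observation("when", "only while the tap runs", "when the tap is off")

causes = {
    "condensation on cold pipes": {where},
    "cracked waste pipe": {where, when},
}

for cause, unexplained in evaluate(causes, [where, when]):
    print(cause, "| unexplained:", [o.dimension for o in unexplained])
```

Anything left in the unexplained list is a prompt for the questions above: is the information wrong, is something else also true, or is this simply not the cause?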
4. Confirm True Cause
We balance probabilities and choose the most likely. It is the scientific use of the imagination.
With a probable cause in mind, it needs to be verified; assumptions need to be checked, predictions confirmed, observations or experiments made, or simply a fix implemented to see if it succeeds! Note that this last “suck it and see” is not a blind leap in the dark but rather a considered response on the basis of the evidence!
At this stage, we may rule out the possible cause – this is a positive result! Additional data will likely be gained for the problem description and new “is and is not” information gleaned for repeating the steps.
5. Think Beyond the Fix
Education never ends, Watson. It is a series of lessons with the greatest for the last.
Having established the cause, what else? If we document a workaround tailored for this problem, we now have a Known Error that can be used by service desks to reduce the impact to our users. We may also wish to implement other mitigation to reduce the severity or frequency of incidents arising from this problem.
Can it be prevented from recurring or eliminated permanently? If so, a Request for Change should be raised.
Finally, perhaps this problem or a similar problem exists elsewhere (yet to be reported)? Proactive problem management can now seek this out on the basis of what we’ve learned.
“Elementary, my dear Watson”, as Holmes never said!