Burning down risk - risk management in Scrum
Written by Ed Willis   
Thursday, 08 March 2012 00:00

How can we improve our management of risk in Scrum?  How can we do so without slowing the pace of our Scrum project with excess overhead?

Risk assessment - you're doing it wrong!

A friend once said to me that Scrum projects don’t really manage risk but rather wait until risks become issues before dealing with them.  In a sense, I agree with him – the iterative nature of Scrum development collapses the waveform of risk probability, thus either removing risks entirely or turning them into issues.  To that extent, Scrum already addresses risk.  These Scrum practices all work to reduce risk on the project:

  • incremental and iterative development
  • demos
  • retrospectives

In User Stories Applied, Mike Cohn offers this advice on user story prioritization:

But, even when going after the juicy bits first, we still need to consider risk when prioritizing stories.  Many developers have a tendency to want to do the riskiest stories first.  Sometimes this is appropriate but the decision must still be made by the customer.

One important thing to realize is that Scrum’s ability to bring risky features forward in time, and develop them to a shippable state well in advance of overall project completion, offers risk mitigation possibilities that represent a sea change over those provided by waterfall lifecycles.

So this article is not aiming to perfectly manage risk but rather to improve on what Scrum already provides, while simultaneously ensuring any new practices can fit into the process in a natural manner that does not significantly increase overhead.

A sampling of existing techniques

Risk management has been tackled more than a couple of times in Scrum thought.  These few examples seem fairly representative of current approaches.

From the Agile Body of Knowledge site comes this approach:

Risk burndown graph

This is a straightforward adaptation of the older technique of measuring risk on two axes (impact and probability) using arbitrary units and then multiplying the two values together to arrive at a single quantity for comparing risks against one another.  Once each risk is quantified, summing these values yields an overall risk exposure.  Measuring risk exposure per sprint (or burning down risk) is a natural use of this measure in a Scrum context.

This approach is useful for these reasons:

  • It uses the burndown chart the team and stakeholders are already familiar with.
  • It builds on the iterative nature of the Scrum lifecycle.

But it falls a little short in these areas:

  • Total risk exposure has no meaningful units associated with it.  Other than big numbers being bad, it’s hard to interpret the value.
  • It’s poorly integrated into the rest of our Scrum measures – for example, if the release burndown looks great but risk exposure is off the charts, where does the project stand overall?

That first problem is neatly addressed by choosing an intuitive unit of risk – for example, weeks of schedule or effort over-run is a unit of risk impact that most projects would find meaningful.  The first place I came across a technique like this was in Steve McConnell’s Rapid Development.  In it, he lays out an approach where risks are enumerated and then assessed for probability of occurrence and impact to schedule.  Multiplying those estimates and then summing across the set of risks yields a total risk exposure in weeks of schedule over-run.

| Risk | Probability | Impact (weeks) | Risk exposure (weeks) |
|---|---|---|---|
| Productivity loss to the team due to support requests may be higher than expected. | 66% | 10 | 6.6 |
| The reporting server integration feature (and required web services API) is currently out of scope – if we add it back in, it will be very expensive to build and will exert continuing costs on other features from that point forward. | 20% | 10 | 2 |
| Dramatically new UI presentation does not satisfy beta users leading to rework. | 33% | 15 | 5 |
| Attaining interface compatibility with our partner’s product takes longer to achieve than planned which increases the effort required for features X, Y and Z. | 50% | 4 | 2 |
| Cutting edge library features don’t work as expected, requiring work-arounds to be developed which increases the effort required for features U and V. | 33% | 6 | 2 |
| Feature T’s estimate is based on the assumption that we can implement it entirely in python.  If that proves impossible given performance needs, we will have to implement parts of it in native extensions, which will be more expensive. | 80% | 6 | 4.8 |
| Total risk exposure | | | 22.4 |
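The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration: the risk labels are shortened versions of the rows above, and the `exposure` helper is a name made up for this example. Because it multiplies the raw percentages without rounding each row, its total comes out slightly below the table's 22.4, which appears to round each row (and treat 33% as one third) before summing.

```python
# Risk exposure = probability x impact, summed over all risks.
# Tuples are (shortened label, probability, impact in weeks) from the table above.
risks = [
    ("Support requests reduce productivity", 0.66, 10),
    ("Reporting server integration added back", 0.20, 10),
    ("New UI does not satisfy beta users", 0.33, 15),
    ("Partner interface compatibility slips", 0.50, 4),
    ("Cutting edge library work-arounds", 0.33, 6),
    ("Feature T needs native extensions", 0.80, 6),
]


def exposure(probability, impact):
    """Hypothetical helper: expected schedule over-run for one risk."""
    return probability * impact


total_weeks = sum(exposure(p, i) for _, p, i in risks)

for label, p, i in risks:
    print(f"{label}: {exposure(p, i):.1f} weeks")
print(f"Total risk exposure: {total_weeks:.1f} weeks")
```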

Compared to the earlier technique, this approach does a better job of concretizing the potential impact of the risks – so the team and stakeholders get a much better sense of how bad the current risk exposure is.  But it doesn’t reflect the iterative nature of the Scrum lifecycle.

Mike Cohn presents an alternative that essentially merges the two approaches above.  He quantifies risk exposure as shown above but tracks it over time in a burndown chart.

Risk Burndown

This is much better but still suffers from being a separate view into the release from the release burndown, where it’s more likely to be ignored.  In addition, the separate presentation makes it harder to get an overall feel for project status when the release and risk burndown charts are saying very different things.

A modest improvement

Building on Mike’s approach, I propose the following:

  • As the team grooms the product backlog, they add items to it that represent risks and call out specific product backlog items that are risky enough to warrant risk assessment.
  • For each risk or risky product backlog item, the team assesses:
    • the probability that the risk will be realized using a small subset of probability values (e.g. 10%, 20%, 33%, 50%, 66%, 80%, 90%), and
    • the impact of the risk using story points exactly as they use them for sizing stories.
  • Total risk is measured in total story points of risk.
  • Remaining risk and product backlog items versus the plan are tracked in the same release burndown chart.

Building risk assessment into the Scrum process

Risk assessment fits most naturally into the release planning and product backlog grooming sessions the team already holds.

During these sessions, keep an eye out for these kinds of problems:

  • The team is really struggling to come up with a size estimate on a product backlog item – they keep asking different variants of the question “what if” during the discussion.
  • The overall release plan is still viewed as an optimistic projection even though the team has been careful to base it on their actual performance.

The former suggests that the risk of the product backlog item be assessed independently of its size.  The team can assume a given approach will work when sizing the item but then address the possibility of that approach not being acceptable in a separate risk assessment.  For example, consider this risk described above:

“Cutting edge library features don’t work as expected, requiring work-arounds to be developed which increases the effort required for features U and V.”

We could size stories U and V assuming this risk is not realized and then separately size the impact and probability of the risk.

Regarding the second problem – skepticism about the release plan – explore the team’s reservations and try to capture them as overall project risks.  These risks aren’t associated with any particular story but can still be assessed for impact and probability as described earlier.

Sizing risk is only a little harder than sizing features.  Some useful questions to help move the discussion along include:

  • What would you like to have in your back pocket to deal with that issue if it actually occurred?
  • If developing the feature from scratch cost X, how bad would redoing it be?

One important thing to keep in the back of your mind is to avoid what Alistair Cockburn has termed “complete-ism” – only assess risk in particularly meaningful cases.  Certainly, product backlog items with risk assessments associated with them should be in the minority.

Lastly, much as story point estimation cuts down on debate by restricting estimates to a specific set of values (e.g. the Fibonacci sequence or the powers of two), we would be wise to reduce the risk probability values to a smaller set – for example 10%, 20%, 33%, 50%, 66%, 80% and 90%.
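In practice the team would simply pick a value from the list during the discussion. If the backlog lives in a tool or spreadsheet, though, the same restriction could be enforced mechanically. The sketch below assumes that setup; `ALLOWED_PROBABILITIES` and `snap_probability` are hypothetical names, and snapping to the nearest allowed value is one possible policy rather than something the technique prescribes.

```python
# The restricted set of probability values suggested in the text.
ALLOWED_PROBABILITIES = (0.10, 0.20, 0.33, 0.50, 0.66, 0.80, 0.90)


def snap_probability(estimate: float) -> float:
    """Return the allowed probability closest to a raw estimate (assumed policy)."""
    if not 0.0 <= estimate <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return min(ALLOWED_PROBABILITIES, key=lambda allowed: abs(allowed - estimate))


# A raw guess of 40% is closer to 33% than to 50%.
print(snap_probability(0.40))
```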

Returning to the earlier example risks, we might end up with the following entries in our product backlog:

| Product backlog item | Size (SP) | Probability | Impact (SP) | Risk Exposure (SP) |
|---|---|---|---|---|
| Productivity loss to the team due to support requests may be higher than expected. | | 66% | 8 | 5.28 |
| The reporting server integration feature (and required web services API) is currently out of scope – if we add it back in, it will be very expensive to build and will exert continuing costs on other features from that point forward. | | 20% | 8 | 1.6 |
| Dramatically new UI presentation does not satisfy beta users leading to rework. | | 33% | 16 | 5.33 |
| Feature X | 2 | | | |
| Feature Y | 8 | | | |
| Feature Z | 4 | | | |
| Attaining interface compatibility with our partner’s product takes longer to achieve than planned which increases the effort required for features X, Y and Z. | | 50% | 4 | 2 |
| Feature U | 8 | | | |
| Feature V | 4 | | | |
| Cutting edge library features don’t work as expected, requiring work-arounds to be developed which increases the effort required for features U and V. | | 33% | 8 | 2.67 |
| Feature T | 8 | | | |
| Feature T’s estimate is based on the assumption that we can implement it entirely in python.  If that proves impossible given performance needs, we will have to implement parts of it in native extensions, which will be more expensive. | | 80% | 4 | 3.2 |
| Totals | 34 | | | 20 |
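One way to picture a backlog that mixes features and risks is as a list of records where any of the size, probability and impact fields may be empty. The sketch below is an assumption about how such a backlog might be modelled, not the layout of any particular tool: `BacklogItem` and its field names are invented for illustration and the titles are abbreviated. Using 0.33 for 33% gives 5.28 and 2.64 for two rows where the table shows 5.33 and 2.67 (it seems to treat 33% as one third), but both totals agree with the table after rounding.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    """Hypothetical record for one product backlog entry (feature or risk)."""
    title: str
    size_sp: int = 0          # planned work in story points; 0 for pure risk items
    probability: float = 0.0  # chance the risk is realized; 0 if no risk assessed
    impact_sp: int = 0        # cost in story points if the risk is realized

    @property
    def exposure_sp(self) -> float:
        return self.probability * self.impact_sp


backlog = [
    BacklogItem("Risk: support requests", probability=0.66, impact_sp=8),
    BacklogItem("Risk: reporting server integration", probability=0.20, impact_sp=8),
    BacklogItem("Risk: new UI rework", probability=0.33, impact_sp=16),
    BacklogItem("Feature X", size_sp=2),
    BacklogItem("Feature Y", size_sp=8),
    BacklogItem("Feature Z", size_sp=4),
    BacklogItem("Risk: partner interface compatibility", probability=0.50, impact_sp=4),
    BacklogItem("Feature U", size_sp=8),
    BacklogItem("Feature V", size_sp=4),
    BacklogItem("Risk: cutting edge library work-arounds", probability=0.33, impact_sp=8),
    BacklogItem("Feature T", size_sp=8),
    BacklogItem("Risk: Feature T native extensions", probability=0.80, impact_sp=4),
]

total_size = sum(item.size_sp for item in backlog)
total_risk = sum(item.exposure_sp for item in backlog)
print(f"Backlog: {total_size} SP, risk exposure: {total_risk:.1f} SP")
```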

A worked example

Note that the charts below were generated by an MS Excel product backlog template I’ve developed, which can be obtained here.

Let’s close the discussion with an example that shows how building risk assessment into the Scrum process can help the team monitor risk and adjust its plans accordingly.

The team and the Product Owner meet to develop and groom the initial product backlog.  At the beginning of the first sprint, the release burndown looks like so:

Initial release and risk burndown

Total remaining risk and product backlog (both measured in story points) are baselined using totals taken from this initial session.  The planned burndown uniformly burns down both through the planned duration of the project.
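A uniform planned burndown is just a straight line from the baseline down to zero. The sketch below shows one way to compute it; `planned_burndown` is a hypothetical helper, the baselines of 34 and 20 story points are borrowed from the example backlog table earlier in the article rather than from the charts, and the four-sprint release length is an assumption made purely for illustration.

```python
def planned_burndown(baseline: float, sprints: int) -> list[float]:
    """Planned remaining amount at the start and after each sprint (linear plan)."""
    if sprints <= 0:
        raise ValueError("sprints must be positive")
    per_sprint = baseline / sprints
    return [baseline - per_sprint * n for n in range(sprints + 1)]


# Assumed inputs: 34 SP of backlog, 20 SP of risk, 4 planned sprints.
planned_backlog = planned_burndown(34, sprints=4)
planned_risk = planned_burndown(20, sprints=4)

for n, (b, r) in enumerate(zip(planned_backlog, planned_risk)):
    print(f"after sprint {n}: backlog {b:.1f} SP, risk {r:.1f} SP")
```

Plotting both lists on the same axes gives the two plan lines that actual remaining backlog and remaining risk are compared against each sprint.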

For the first sprint, the team delivers some risky features – both the remaining risk and the remaining story points are reduced as a result.

Sprint 1

Note that the SP Delivered and Risk Resolved points on the chart show the outcome of the first sprint – so the team delivered a little over 20 story points of product backlog and resolved 10 story points of risk.

In sprint 2, the team delivered only features and did not address any product backlog risk.  In addition, new risk was added to the product backlog.  Product backlog delivered in the sprint jumped to around 30 story points, but remaining risk increased.  Overall, though, the team is ahead of plan.

Sprint 2

In sprint 3, subsequent product backlog grooming has uncovered additional features, revised some risk assessments upwards and uncovered new risks.  Delivered product backlog fell to 22 story points and, as was the case in the previous sprint, no product backlog risk was resolved.

Sprint 3

At this point the Product Owner and the team deem the risk exposure excessive and plan their fourth sprint accordingly.  They plan spikes into risky features to better assess what the real risk is, broaden their testing to leave less chance of defects escaping into later sprints and undertake other measures intended to mitigate, remove or more precisely assess product backlog risk.  Total story points of product backlog delivered in the sprint fall to less than 20 – their lowest level thus far – but resolved risk jumps to nearly 40.  As a result, both overall remaining product backlog and remaining risk are in line with the plan.

Sprint 4
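The bookkeeping behind these charts can be sketched as a running update of two totals. Everything here is illustrative: `SprintOutcome` and `track` are invented names, the delivered and resolved figures only approximate the values quoted in the text, and the starting totals and the amounts of scope and risk added in sprints 2 and 3 are made-up numbers, since the article gives them only in its charts.

```python
from dataclasses import dataclass


@dataclass
class SprintOutcome:
    """Hypothetical summary of what one sprint did to the backlog."""
    delivered_sp: float        # product backlog delivered
    risk_resolved_sp: float    # risk retired (mitigated, removed or re-assessed down)
    scope_added_sp: float = 0  # new backlog uncovered during grooming
    risk_added_sp: float = 0   # new or upwardly revised risk


def track(backlog_sp, risk_sp, outcomes):
    """Yield (remaining backlog, remaining risk) after each sprint."""
    for outcome in outcomes:
        backlog_sp += outcome.scope_added_sp - outcome.delivered_sp
        risk_sp += outcome.risk_added_sp - outcome.risk_resolved_sp
        yield backlog_sp, risk_sp


# Assumed data: baselines of 150 SP backlog and 60 SP risk are invented,
# as are the scope_added_sp and risk_added_sp values.
outcomes = [
    SprintOutcome(delivered_sp=20, risk_resolved_sp=10),
    SprintOutcome(delivered_sp=30, risk_resolved_sp=0, risk_added_sp=15),
    SprintOutcome(delivered_sp=22, risk_resolved_sp=0, scope_added_sp=10, risk_added_sp=15),
    SprintOutcome(delivered_sp=18, risk_resolved_sp=40),
]

history = list(track(backlog_sp=150, risk_sp=60, outcomes=outcomes))
for sprint, (backlog_left, risk_left) in enumerate(history, start=1):
    print(f"sprint {sprint}: backlog {backlog_left} SP, risk {risk_left} SP")
```

With these assumed numbers, remaining risk climbs through sprints 2 and 3 even as backlog falls, and drops sharply once sprint 4 is planned around risk reduction, which is the pattern the example describes.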

I'd like to thank Selaine Henriksen for her help in editing this article.

 

Last Updated on Thursday, 15 March 2012 14:46