Evening the Odds with Monte Carlo Forecasts

by Hunter Tammaro

If you've ever been in a leadership role on a project or product rollout, you've probably been asked for a forecast. People forecast for many reasons: budgeting, timing future projects, balancing the load on team members with specialized skills, or timing the rollout of marketing efforts. Traditional forecasts that provide a single "done" date are surprisingly ill-suited for these decisions. They don't give you a sense of how confident you can be in the target date, and it's hard to come up with risk mitigation strategies if you don't know whether you're likely to miss your target by a day, a week, a month, or a year.

The Monte Carlo technique takes a different approach. Instead of providing a single outcome that we pretend is definitive, Monte Carlo presents a range of possibilities and their probabilities. It does this through repeated random sampling of historical data. For project forecasts, that means we won't get just one date; we'll know how likely it is that our project will finish on any date within the range of possibilities.

This kind of result gives you a lot of power when planning a project, but it can make some people uncomfortable. Expressing uncertainty might seem to defeat the purpose of forecasting, but the uncertainty was always there—Monte Carlo simply brings it into the light and allows you to make informed decisions about it. Just as Agile embraces uncertainty in requirements and uses it to unlock the creativity of teams, Monte Carlo gives you a way to do the same for your forecasts.

To illustrate, I will step through how to forecast a completion date for a software team using Monte Carlo. Assume the team has a series of product backlog items (PBIs) they are working against.

Collecting Data

The first step with any forecasting technique is to gather data on your past performance. You want a measure that describes both the rate at which the team completes work and the variability in that rate. One such measure is takt time: the time from when a team completes one PBI until it completes the next one. Often, this is measured in work days. If you think of the backlog as the team's to-do list, the takt times tell us how often items get crossed off that list. (If you're familiar with the term's usage in Lean - the average time between production starts required to meet demand - its usage here is roughly the individual times that make up that average.)

For instance, let's say a team completes one PBI on Monday and another on Tuesday: the takt time between these is one day. If it completes one item on Friday and the next on Monday, I still count that as one day. If it completes two items on the same day, the takt time between them is zero. It's fine for this to happen occasionally - though if your team often completes multiple items on the same day, you should consider another measure.
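The counting rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the completion dates are made up, the helper names work_days_between and takt_times are my own, and the work week is assumed to run Monday through Friday with no holidays.

```python
from datetime import date, timedelta


def work_days_between(start: date, end: date) -> int:
    """Count the work days (Mon-Fri) after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            days += 1
    return days


def takt_times(completion_dates):
    """Return the work-day gap between each pair of consecutive completions."""
    ordered = sorted(completion_dates)
    return [work_days_between(a, b) for a, b in zip(ordered, ordered[1:])]


# Hypothetical completion dates, one per finished PBI (not real team data).
completions = [
    date(2024, 1, 8),   # Monday
    date(2024, 1, 9),   # Tuesday
    date(2024, 1, 12),  # Friday
    date(2024, 1, 15),  # Monday
    date(2024, 1, 15),  # Monday again: two items finished the same day
]

print(takt_times(completions))  # [1, 3, 1, 0]
```

Note that the Friday-to-Monday gap counts as one day and the two same-day completions produce a takt time of zero, matching the rules described above.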

Here's an example timeline for a team completing six items, and the five takt times measured:

Takt time examples

From this, we've learned that our team completes PBIs anywhere from the same day as the last completed item to three days later, with the next day being the most likely.

Running a Simulation

Now we want to know what the future could hold for this team. To do that, for each of the remaining PBIs in the backlog, we randomly select one of the takt times we've just measured. (It's okay if some of our takt times are selected more than once - it's even okay if some aren't selected at all. What matters is that we choose them randomly.) Then we simply add up the selected takt times to determine the total remaining time. This is one simulation of the team's future development pace.

For instance, if our team above has six more PBIs to complete, that could look like this:

Example Monte Carlo simulation (1 iteration)

In this case, we ended up with a total of 10 days. That is, if the team continues to complete PBIs at the same rate it has in the past, it could take 10 days to finish all six left in its backlog. This is just one possible future for the team, and it's where traditional forecasts stop. But we want to go further.
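A single simulation like this one takes very little code. The sketch below is one possible Python version, not a definitive implementation. The five takt times are assumed values chosen to be consistent with the description above (a range of zero to three days, with one day the most common), and the function name simulate_once is hypothetical.

```python
import random

# Assumed sample of five measured takt times, in work days (illustrative only).
measured_takt_times = [1, 0, 1, 3, 2]


def simulate_once(takt_times, remaining_items, rng=random):
    """Pick one takt time at random per remaining PBI and add them up."""
    # choices() samples with replacement, so a takt time may be picked
    # several times or not at all.
    picks = rng.choices(takt_times, k=remaining_items)
    return sum(picks)


rng = random.Random(42)  # seeded only so the run can be repeated
print("Simulated days to finish six PBIs:", simulate_once(measured_takt_times, 6, rng))
```

Sampling with replacement is what lets a single takt time appear more than once in the same simulation.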

Running Lots of Simulations

We want to know about every possible future for our team, so we repeat the simulation over and over, saving the outcome each time. Each simulation randomly selects a different set of takt times and gives us a different result. Continuing our example, here are four more simulations we might run:

Example Monte Carlo simulation (4 iterations)

Each of these represents another way the takt times we measured could be combined for our next six PBIs, and therefore each is a possible future for our team. We would continue to run more simulations until we're confident we've explored the whole space of possibilities.

How many simulations does that take? Unfortunately, there's no easy answer. It depends on the size of the backlog you're simulating, the range of the takt times you've measured, and the precision you need in your result. But because computers typically handle the simulating for you, I advise people to aim high. I usually start with around 100 simulations and add more if the results seem "noisy."

Now that we know what's possible, all that remains is to determine which possibilities are the most likely. We do this by checking how often each outcome occurs in our simulations. The more we see a particular result among our simulations, the more likely that result is in reality. If we continue to run 100 simulations for our example case, we might find this:

Results of Monte Carlo simulation (100 iterations)

Here we see that, though it's possible our team could finish its backlog in four days or less, it's not likely - only four of our 100 simulations had that result. The most likely outcome is actually seven to eight days because 27 out of the 100 simulations were in that range. Of course, for project forecasts, we care more about not missing the forecast than we do about meeting it precisely, so I also measure the cumulative percentage of simulations that complete in that time or sooner. We can tell, for instance, that even though seven to eight days is the single most likely timeframe for our team to complete its backlog, it only has a 55% chance of finishing by then. If I were planning a release or reporting to a client, I'd be more likely to communicate the 11-12 day window, where there's a 94% chance the team will be done.
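Here is a rough Python sketch of the whole process: run many simulations, count how often each outcome occurs, and track the cumulative percentage. The takt times are assumed sample values rather than real measurements, the function names are my own, and the output is listed per day rather than grouped into two-day windows, so its numbers will not match the figures quoted above.

```python
import random
from collections import Counter

# Assumed sample of measured takt times, in work days (illustrative only).
measured_takt_times = [1, 0, 1, 3, 2]


def run_simulations(takt_times, remaining_items, runs, seed=None):
    """Run `runs` independent simulations and return each total in days."""
    rng = random.Random(seed)
    return [sum(rng.choices(takt_times, k=remaining_items)) for _ in range(runs)]


def summarize(outcomes):
    """Return (days, count, cumulative percent) rows, sorted by days."""
    counts = Counter(outcomes)
    total = len(outcomes)
    rows, running = [], 0
    for days in sorted(counts):
        running += counts[days]
        rows.append((days, counts[days], 100 * running / total))
    return rows


def days_for_confidence(rows, target_percent):
    """Smallest number of days by which at least target_percent of runs finished."""
    for days, _, cumulative in rows:
        if cumulative >= target_percent:
            return days
    return rows[-1][0]


outcomes = run_simulations(measured_takt_times, remaining_items=6, runs=100, seed=1)
rows = summarize(outcomes)
for days, count, cumulative in rows:
    print(f"{days:>2} days: {count:>3} runs, {cumulative:5.1f}% finish by then")
print("Days needed for 90% confidence:", days_for_confidence(rows, 90))
```

Raising runs and comparing the tables is a quick way to see whether 100 simulations is enough or the results are still noisy.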

Other Approaches

Predicting a date using takt times isn't the only way to apply Monte Carlo principles. The specifics of the forecast can be tweaked to fit any context. For example, what if your team is very large and regularly completes multiple PBIs on the same day? Takt time would almost always be zero in this case, making it a poor measure of how the team completes work. Here we could build a forecast around throughput - the number of items the team completes each day. After measuring throughputs for a while, we'd simulate by randomly selecting throughputs until their sum was enough to cover our projected backlog, and then summarize our results as before.
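A throughput-based forecast might look something like the following Python sketch. The daily throughput values and the backlog size of 40 are invented for illustration, and simulate_days_to_finish is a hypothetical name.

```python
import random

# Assumed daily throughputs: items completed on each measured work day
# (illustrative only).
daily_throughputs = [2, 0, 3, 1, 4, 2, 0, 3]


def simulate_days_to_finish(throughputs, backlog_size, rng):
    """Draw a random daily throughput until the backlog is covered; return days used."""
    if max(throughputs) <= 0:
        raise ValueError("at least one measured throughput must be positive")
    done, days = 0, 0
    while done < backlog_size:
        done += rng.choice(throughputs)
        days += 1
    return days


rng = random.Random(7)  # seeded only so the run can be repeated
outcomes = sorted(
    simulate_days_to_finish(daily_throughputs, 40, rng) for _ in range(1000)
)
print("Fastest simulated finish:", outcomes[0], "days")
print("Median simulated finish:", outcomes[len(outcomes) // 2], "days")
print("Slowest simulated finish:", outcomes[-1], "days")
```

The sorted outcomes can then be counted and summarized the same way as the takt-time results.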

Or imagine a team trying to forecast how many PBIs they expect to complete before a hard deadline. Rather than defining our simulations by the size of the backlog, simulations could keep picking takt times until they hit the set date, and count up how many simulated PBIs were able to fit. Then our results would tell us how likely each backlog size was to be completed in time.
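As a sketch of the deadline-driven variant in Python, assuming the same kind of takt-time sample as before (the values, the 10-day deadline, and the function names are all made up for illustration):

```python
import random

# Assumed sample of measured takt times, in work days (illustrative only).
measured_takt_times = [1, 0, 1, 3, 2]


def simulate_items_by_deadline(takt_times, days_available, rng):
    """Keep picking takt times until the deadline passes; return the items that fit."""
    if max(takt_times) <= 0:
        raise ValueError("at least one measured takt time must be positive")
    elapsed, completed = 0, 0
    while True:
        elapsed += rng.choice(takt_times)
        if elapsed > days_available:
            return completed
        completed += 1


def chance_of_finishing(outcomes, backlog_size):
    """Percent of simulations that completed at least backlog_size items."""
    return 100 * sum(1 for n in outcomes if n >= backlog_size) / len(outcomes)


rng = random.Random(3)  # seeded only so the run can be repeated
outcomes = [simulate_items_by_deadline(measured_takt_times, 10, rng) for _ in range(1000)]
for size in (4, 6, 8, 10):
    print(f"Backlog of {size:>2}: {chance_of_finishing(outcomes, size):5.1f}% chance of fitting")
```

Here an item that finishes exactly on the deadline day counts as fitting; a stricter cutoff would use >= in the comparison instead.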

The Monte Carlo method is endlessly flexible, and gives us a much more insightful forecast than traditional methods. I hope you'll give it a try!