Why quantifying modeling risk matters
When testing the ideas that eventually led to River, I spoke with many analysts who build decision-support models in Excel. Almost everyone told me the same thing:
- They know there’s a significant risk of errors.
- They wish there were better tools.
- But they assume that with enough late nights debugging, they can reduce the risk to an acceptable level.
The question is: is that assumption actually true?
As a scientist, I believe the only way to answer that is to define what the risk is, and then find a reasonable way to quantify it.
What do we mean by modeling risk?
For decision-support models, risk means:
The probability that the model leads us to make the wrong decision
A model may produce wrong insights if:
- The input data is inaccurate.
- The conceptual logic is not a good representation of reality.
- The model implementation is incorrect.
- The outputs are misinterpreted (e.g., scenario analysis mistakes).
In this article, we focus only on #3: errors introduced during model construction, assuming the other steps are sound.
Sources of modeling risk
Errors tend to come from three closely related factors:
| Source | Description |
| --- | --- |
| Structure | The overall layout makes it difficult to verify the logic. |
| Calculations | Mistakes in formulas or data handling. |
| Robustness | The model behaves incorrectly under new or edge-case inputs. |
In practice, these tend to move together:
A more complex structure → harder to understand formulas → harder to anticipate edge cases.
So instead of treating them separately, we can approximate:
The risk of error increases with the number of parameters the model exposes to users.
A parameter is anything the user can change: a cell reference, a hard-coded number, a lookup target, etc. Every one of these is a chance for things to go wrong.
How does this apply in Excel?
Excel models are built cell-by-cell. Every formula contains parameters that must be correct, and every reference is an opportunity for an error.
Example:
=VLOOKUP(A5,D5:F6,2,FALSE)
This formula has 8 parameters:
- The lookup value (A5): 2 (one cell address = a row plus a column reference)
- The table range (D5:F6): 4 (two cell addresses)
- The column index (2): 1
- The infamous match flag (FALSE): 1
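To make this counting concrete, here's a minimal sketch in Python of a heuristic parameter counter, using the convention above (a lone cell address counts as 2, a range as 4, each literal argument as 1). The regex and the counting rules are illustrative simplifications of my own, not a real Excel formula parser:

```python
import re

# Heuristic parameter counter for an Excel formula string (an
# illustrative assumption, not a real parser): a range counts as 4
# (two cell addresses), a lone cell as 2 (row + column), a literal as 1.
CELL = r"\$?[A-Z]{1,3}\$?\d+"  # e.g. A5, $D$5

def count_parameters(formula: str) -> int:
    body = formula.lstrip("=")
    range_pattern = rf"{CELL}:{CELL}"
    count = 4 * len(re.findall(range_pattern, body))  # e.g. D5:F6
    body = re.sub(range_pattern, "", body)            # avoid double counting
    count += 2 * len(re.findall(CELL, body))          # e.g. A5
    body = re.sub(CELL, "", body)
    count += len(re.findall(r"\b\d+(?:\.\d+)?\b|TRUE|FALSE", body))  # literals
    return count

print(count_parameters("=VLOOKUP(A5,D5:F6,2,FALSE)"))  # -> 8
```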
You can see how this adds up quickly once formulas are repeated across rows, time periods, and scenarios.
This is why even simple models can contain hundreds of parameters, and real-world models often contain tens or hundreds of thousands.
A simple example
A basic valuation model — historical P&L in, assumptions applied, future cashflows projected, then discounted — might contain around 500 parameters (download link if you're interested).
Let's assume the probability of a mistake in any single parameter after careful review is very low, say one in a million. That figure breaks down as:
- An error is made in a parameter 1% of the time.
- Careful review misses 1% of those errors.
- Quality and sanity checks miss 1% of what remains.
Multiplying these out gives 1% × 1% × 1% = one in a million.
So for this simple valuation model, the probability of at least one error is approximately:
500 parameters × 1-in-1,000,000 chance of error per parameter = 0.05%
For a small model, this is manageable. Errors, if they occur, are usually spotted by normal sanity checks.
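For those who want to check the arithmetic, here's a minimal sketch in Python, taking the one-in-a-million per-parameter rate as given. It also computes the exact probability of at least one error, 1 - (1 - p)^n, which the simple n × p estimate approximates well as long as n × p stays small:

```python
# Per-parameter error rate: 1% of parameters are mistyped, review misses
# 1% of those, quality checks miss 1% of what remains (from the text).
p = 0.01 * 0.01 * 0.01   # = 1e-6
n = 500                  # parameters in the simple valuation model

linear = n * p                # simple estimate used in the article
exact = 1 - (1 - p) ** n      # exact probability of at least one error

print(f"linear: {linear:.4%}, exact: {exact:.4%}")  # both ~0.05%
```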
But what about a real model?
Most real-world models are not small. They scale roughly like this:
| Model Size | Typical use case | Typical scale |
| --- | --- | --- |
| Small | Basic forecasting | A few thousand formulas |
| Medium | Full valuation models with scenarios and sensitivities | Tens of thousands of formulas |
| Large | System simulation / multi-scenario analysis | ~100,000 formulas |
| Very large | Full-sector simulations, complex policy models | ~1,000,000 formulas |
The size grows mainly because formulas must be repeated for every row, every month, or every scenario (50 line items × 120 months × 5 scenarios is already 30,000 formulas), and each repetition brings more places for errors to hide.
Staying below a safe risk threshold
Even if we assume a very low error rate per parameter, the probability of a mistake still grows quickly as models become larger.
Below we calculate the risk using our error rate of 1 in 1,000,000 per parameter, assuming 4 parameters per formula.
| Size | Typical # of formulas | # of parameters | Probability of error in the model | Risk |
| --- | --- | --- | --- | --- |
| Small | 3,000 | 12,000 | ~1% | Manageable risk — errors can usually be caught before they cause damage. |
| Medium | 25,000 | 100,000 | ~10% | Meaningful risk of hidden errors. |
| Large | 100,000 | 400,000 | ~40% | Almost certainly contains errors somewhere, even if they are not obvious at first. |
| Very large | 1,000,000 | 4,000,000 | ~100% (about 4 expected errors) | Certain to contain many errors. |
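Here's a sketch that recomputes the table under the same assumptions (1-in-1,000,000 per parameter, 4 parameters per formula), using the exact probability 1 - (1 - p)^n; the linear estimate n × p overshoots once the expected number of errors approaches 1, which is why the naive figure for very large models would exceed 100%:

```python
p = 1e-6  # per-parameter error rate (assumption from the text)
sizes = {"Small": 3_000, "Medium": 25_000, "Large": 100_000,
         "Very large": 1_000_000}

for name, formulas in sizes.items():
    params = 4 * formulas              # 4 parameters per formula
    exact = 1 - (1 - p) ** params      # P(at least one error)
    expected = params * p              # expected number of errors
    print(f"{name:>10}: {params:>9,} params -> "
          f"P(error) ~ {exact:.1%}, expected errors = {expected:.2f}")
```

The exact probabilities come out slightly lower than the linear estimates (about 33% for large models and 98% for very large ones), but the conclusion is the same: at scale, errors are a near-certainty.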
Even with very optimistic assumptions on review quality, the chance of hidden errors grows quickly with scale. Medium and large models almost always contain mistakes — they just often don’t matter immediately.
But they tend to surface when:
- New data behaves differently
- Scenarios push beyond typical bounds
- Someone new inherits the file
This is where organizations get unpleasant surprises.
If we consider a 1% error risk acceptable, that corresponds to roughly 3,000 formulas – about the size of a model with 3–4 reasonably sized worksheets.
Beyond that, risk increases sharply unless something fundamental about the modeling approach changes.
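As a cross-check on that threshold, here's a sketch inverting the calculation under the same assumptions, solving 1 - (1 - p)^n ≤ 1% for the maximum number of parameters and formulas:

```python
from math import log

p = 1e-6                # per-parameter error rate (assumption from the text)
acceptable_risk = 0.01  # 1% chance of at least one error

# 1 - (1 - p)^n <= acceptable_risk  =>  n <= log(1 - risk) / log(1 - p)
max_params = log(1 - acceptable_risk) / log(1 - p)
max_formulas = max_params / 4   # 4 parameters per formula

print(f"max parameters: {max_params:,.0f}")    # ~10,050
print(f"max formulas:   {max_formulas:,.0f}")  # ~2,513 -- roughly 3,000
```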
Can we reduce this risk in Excel?
Best practices help:
- Use clear, modular structure (the FAST standard)
- Use Named Ranges or Tables instead of cell references (e.g., `=VLOOKUP(A5, PriceTable, 2, FALSE)` exposes a single name where `D5:F6` exposes four parameters)
- Implement data quality and sanity checks
But these techniques still require discipline, still depend on manual review, and do not prevent parameter proliferation.
To reduce modeling risk meaningfully, we need to reduce the number of exposed parameters.
How River changes this
The River version of our simple model has around 20 parameters, instead of 500.
This is possible because:
- References are by columns, not cells
- Formulas apply uniformly across rows (no fill-down errors)
- Relationships are visually explicit
- Data quality checks are automatic
- Many parameters simply don’t need to exist in the first place
Even better, as data size grows, parameter count in River stays almost constant, while in Excel it grows linearly.
The result is a 30× to 400× reduction in risk, depending on model size. And for very large models, the reduction can be over 1,000×.
This means that for all models commonly developed in business, River brings risk level back to well below 1%.
Conclusion
Modeling risk in Excel is real, significant, and generally underappreciated.
The number of parameters — and therefore the number of opportunities for error — tends to grow rapidly as models become more realistic.
If we care about making reliable, defensible decisions, we need modeling environments that:
- Make structure transparent
- Eliminate unnecessary parameters
- Reduce the burden on manual checking
River was designed to do exactly that. Even if you continue working in Excel, adopting these principles will improve reliability.