Why quantifying modeling risk matters
When testing the ideas that eventually led to River, I spoke with many analysts who build decision-support models in Excel. Almost everyone told me the same thing:
- They know there’s a significant risk of errors.
- They wish there were better tools.
- But they assume that with enough late nights debugging, they can reduce the risk to an acceptable level.
The question is: is that assumption actually true?
As a scientist, I believe the only way to answer that is to define what the risk is, and then find a reasonable way to quantify it.
What do we mean by modeling risk?
For decision-support models, risk means:
The probability that the model leads us to make the wrong decision
A model may produce wrong insights if:
- The input data is inaccurate.
- The conceptual logic is not a good representation of reality.
- The model implementation is incorrect.
- The outputs are misinterpreted (e.g., scenario analysis mistakes).
In this article, we focus only on #3: errors introduced during model construction, assuming the other steps are sound.
Sources of modeling risk
Errors tend to come from three closely related factors:
| Source | Description |
| --- | --- |
| Structure | The overall layout makes it difficult to verify the logic. |
| Calculations | Mistakes in formulas or data handling. |
| Robustness | The model behaves incorrectly under new or edge-case inputs. |
In practice, these tend to move together:
A more complex structure → harder to understand formulas → harder to anticipate edge cases.
So instead of treating them separately, we can approximate:
The risk of error increases with the number of parameters the model exposes to users.
A parameter is anything the user can change: a cell reference, a hard-coded number, a lookup target, etc. Every one of these is a chance for things to go wrong.
How does this apply in Excel?
Excel models are built cell-by-cell. Every formula contains parameters that must be correct, and every reference is an opportunity for an error.
Example:
=VLOOKUP(A5,D5:F6,2,FALSE)
This formula has 8 parameters:
- The lookup value (A5): 2 (one cell address = a row plus a column reference)
- The table range (D5:F6): 4 (two cell addresses)
- The column index (2): 1
- The infamous match flag (FALSE): 1
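To make this counting concrete, here's a minimal sketch in Python of a heuristic parameter counter, using the convention above (a lone cell address counts as 2, a range as 4, each literal argument as 1). The regex and the counting rules are illustrative simplifications of my own, not a real Excel formula parser:

```python
import re

# Heuristic parameter counter for an Excel formula string (an
# illustrative assumption, not a real parser): a range counts as 4
# (two cell addresses), a lone cell as 2 (row + column), a literal as 1.
CELL = r"\$?[A-Z]{1,3}\$?\d+"  # e.g. A5, $D$5

def count_parameters(formula: str) -> int:
    body = formula.lstrip("=")
    range_pattern = rf"{CELL}:{CELL}"
    count = 4 * len(re.findall(range_pattern, body))  # e.g. D5:F6
    body = re.sub(range_pattern, "", body)            # avoid double counting
    count += 2 * len(re.findall(CELL, body))          # e.g. A5
    body = re.sub(CELL, "", body)
    count += len(re.findall(r"\b\d+(?:\.\d+)?\b|TRUE|FALSE", body))  # literals
    return count

print(count_parameters("=VLOOKUP(A5,D5:F6,2,FALSE)"))  # -> 8
```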
You can see how this adds up quickly once formulas are repeated across rows, time periods, and scenarios.
This is why even simple models can contain hundreds of parameters, and real-world models often contain tens or hundreds of thousands.
A simple example
A basic valuation model — historical P&L in, assumptions applied, future cashflows projected, then discounted — might contain around 500 parameters (download link if you're interested).
Let's assume the probability of a mistake in any single parameter after careful review is very low, say one in a million. That figure breaks down as:
- An error is made in a parameter 1% of the time.
- Careful review misses 1% of those errors.
- Quality and sanity checks miss 1% of what remains.
Multiplying these out gives 1% × 1% × 1% = one in a million.
So for this simple valuation model, the probability of at least one error is approximately:
500 parameters × 1-in-1,000,000 chance of error per parameter = 0.05%
For a small model, this is manageable. Errors, if they occur, are usually spotted by normal sanity checks.
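For those who want to check the arithmetic, here's a minimal sketch in Python, taking the one-in-a-million per-parameter rate as given. It also computes the exact probability of at least one error, 1 - (1 - p)^n, which the simple n × p estimate approximates well as long as n × p stays small:

```python
# Per-parameter error rate: 1% of parameters are mistyped, review misses
# 1% of those, quality checks miss 1% of what remains (from the text).
p = 0.01 * 0.01 * 0.01   # = 1e-6
n = 500                  # parameters in the simple valuation model

linear = n * p                # simple estimate used in the article
exact = 1 - (1 - p) ** n      # exact probability of at least one error

print(f"linear: {linear:.4%}, exact: {exact:.4%}")  # both ~0.05%
```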
But what about a real model?
Most real-world models are not small. They scale roughly like this:
| Model Size | Typical use case | Typical scale |
| --- | --- | --- |
| Small | Basic forecasting | A few thousand formulas |
| Medium | Full valuation models with scenarios and sensitivities | Tens of thousands of formulas |
| Large | System simulation / multi-scenario analysis | ~100,000 formulas |
| Very large | Full-sector simulations, complex policy models | ~1,000,000 formulas |
The size grows mainly because formulas must be repeated for every row, every month, or every scenario (50 line items × 120 months × 5 scenarios is already 30,000 formulas), and each repetition brings more places for errors to hide.
Staying below a safe risk threshold
Even if we assume a very low error rate per parameter, the probability of a mistake still grows quickly as models become larger.
Below we calculate the risk using our error rate of 1 in 1,000,000 per parameter, assuming 4 parameters per formula.
| Size | Typical # of formulas | # of parameters | Probability of error in the model | Risk |
| --- | --- | --- | --- | --- |
| Small | 3,000 | 12,000 | ~1% | Manageable risk — errors can usually be caught before they cause damage. |
| Medium | 25,000 | 100,000 | ~10% | Meaningful risk of hidden errors. |
| Large | 100,000 | 400,000 | ~40% | Almost certainly contains errors somewhere, even if they are not obvious at first. |
| Very large | 1,000,000 | 4,000,000 | ~100% (about 4 expected errors) | Certain to contain many errors. |
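Here's a sketch that recomputes the table under the same assumptions (1-in-1,000,000 per parameter, 4 parameters per formula), using the exact probability 1 - (1 - p)^n; the linear estimate n × p overshoots once the expected number of errors approaches 1, which is why the naive figure for very large models would exceed 100%:

```python
p = 1e-6  # per-parameter error rate (assumption from the text)
sizes = {"Small": 3_000, "Medium": 25_000, "Large": 100_000,
         "Very large": 1_000_000}

for name, formulas in sizes.items():
    params = 4 * formulas              # 4 parameters per formula
    exact = 1 - (1 - p) ** params      # P(at least one error)
    expected = params * p              # expected number of errors
    print(f"{name:>10}: {params:>9,} params -> "
          f"P(error) ~ {exact:.1%}, expected errors = {expected:.2f}")
```

The exact probabilities come out slightly lower than the linear estimates (about 33% for large models and 98% for very large ones), but the conclusion is the same: at scale, errors are a near-certainty.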
Even with very optimistic assumptions on review quality, the chance of hidden errors grows quickly with scale. Medium and large models almost always contain mistakes — they just often don’t matter immediately.
But they tend to surface when:
- New data behaves differently
- Scenarios push beyond typical bounds
- Someone new inherits the file
This is where organizations get unpleasant surprises.
If we consider a 1% error risk acceptable, that corresponds to roughly 3,000 formulas – about the size of a model with 3–4 reasonably sized worksheets.
Beyond that, risk increases sharply unless something fundamental about the modeling approach changes.
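As a cross-check on that threshold, here's a sketch inverting the calculation under the same assumptions, solving 1 - (1 - p)^n ≤ 1% for the maximum number of parameters and formulas:

```python
from math import log

p = 1e-6                # per-parameter error rate (assumption from the text)
acceptable_risk = 0.01  # 1% chance of at least one error

# 1 - (1 - p)^n <= acceptable_risk  =>  n <= log(1 - risk) / log(1 - p)
max_params = log(1 - acceptable_risk) / log(1 - p)
max_formulas = max_params / 4   # 4 parameters per formula

print(f"max parameters: {max_params:,.0f}")    # ~10,050
print(f"max formulas:   {max_formulas:,.0f}")  # ~2,513 -- roughly 3,000
```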
Can we reduce this risk in Excel?
Best practices help:
- Use clear, modular structure (the FAST standard)
- Use Named Ranges or Tables instead of cell references (e.g., `=VLOOKUP(A5, PriceTable, 2, FALSE)` exposes a single name where `D5:F6` exposes four parameters)
- Implement data quality and sanity checks
But these techniques still require discipline, still depend on manual review, and do not prevent parameter proliferation.
To reduce modeling risk meaningfully, we need to reduce the number of exposed parameters.
How River changes this
The River version of our simple model has around 20 parameters, instead of 500.
This is possible because:
- References are by columns, not cells
- Formulas apply uniformly across rows (no fill-down errors)
- Relationships are visually explicit
- Data quality checks are automatic
- Many parameters simply don’t need to exist in the first place
Even better, as data size grows, parameter count in River stays almost constant, while in Excel it grows linearly.
The result is a 30× to 400× reduction in risk, depending on model size. And for very large models, the reduction can be over 1,000×.
This means that for all models commonly developed in business, River brings risk level back to well below 1%.
Conclusion
Modeling risk in Excel is real, significant, and generally underappreciated.
The number of parameters — and therefore the number of opportunities for error — tends to grow rapidly as models become more realistic.
If we care about making reliable, defensible decisions, we need modeling environments that:
- Make structure transparent
- Eliminate unnecessary parameters
- Reduce the burden on manual checking
River was designed to do exactly that. Even if you continue working in Excel, adopting these principles will improve reliability.