Agents Can’t Replace Monte Carlo. They Can Make It Honest.
Why the AI-forecasting crowd is selling a downgrade, and what the interesting configuration actually looks like.
Why agentic AI fails as a replacement for Monte Carlo simulation in agile forecasting
How confident-sounding prose hides the calibration problem LLMs are structurally bad at
The hybrid configuration where agents complement probabilistic forecasting instead of replacing it
A staunch MCS advocate’s changed position after months of agentic workflow experimentation
Monte Carlo gives you a distribution. An agent gives you prose that feels like a distribution.
Sit with that for a second, because it’s the whole argument. Every vendor currently pitching AI-powered delivery forecasting is, whether they know it or not, selling you the second thing and calling it the first.
And if you’re a producer who’s ever had to defend a ship date to a publisher with real money on the line, you should find that terrifying.
I’ve earned the right to say this the hard way. I’ve spent years on the pro-Monte Carlo side of this argument, defending probabilistic forecasting to anyone who’d listen and several people who wouldn’t.
I’ve written about Actionable Agile, about throughput histograms, about the Troy Magennis school of Monte Carlo simulation, about 85th percentiles and the difference between a forecast and a wish. I’ve pushed back on producers who wanted to replace MCS with gut-feel spreadsheets, and I’ve pushed back harder on executives who wanted to replace it with a single-point estimate because distributions made them uncomfortable.
MCS is one of the few things in agile forecasting that actually works, and I’ve been loud about that for a long time.
So this isn’t a hit piece from someone who never understood what Monte Carlo was doing. It’s the opposite.
The reason I’m writing it is that for the past several months I’ve been elbow-deep in agentic workflows in production contexts, and the experience has started doing something uncomfortable to how I think about forecasting. Not because agents turn out to be secretly better at the maths. They’re worse.
The discomfort is more interesting than that, and it took me a while to see it clearly enough to put into words.
Here’s the thing that should make any serious forecasting practitioner nervous. An agent that tells you “about nine sprints” in the same confident tone whether it reasoned carefully about your throughput history or pattern-matched on the vibes of the ticket titles is not a forecasting tool. It’s a liability pretending to be one.
And this is not my crank opinion. OpenAI’s September 2025 paper on hallucination argued that the way models are trained, and above all the right-or-wrong scoring of the benchmark leaderboards that shape them, rewards confident guessing over admitting uncertainty. Models learn to bluff because bluffing scores points.
The literature now has a term of art for the specific failure mode: high-confidence hallucinations. Outputs where the model’s probability distribution is sharply peaked around the wrong answer, immune to the entropy-based detection methods that catch the more obvious kinds of wrongness. The model is certain, fluent, and incorrect.
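To see why a confidently wrong answer slips past an entropy check, here is a toy sketch: a detector that flags an output only when its distribution over candidate answers has high Shannon entropy. The probabilities are invented for illustration, and `flag_uncertain` and its 1.0-bit threshold are hypothetical, not the settings of any published detection method.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_uncertain(probs, threshold_bits=1.0):
    """Hypothetical detector: flag an output only when its entropy is high."""
    return shannon_entropy(probs) > threshold_bits

# Invented distributions over four candidate answers; assume index 0 is correct.
hedging = [0.30, 0.25, 0.25, 0.20]            # spread out: visibly unsure
confidently_wrong = [0.02, 0.95, 0.02, 0.01]  # sharply peaked on index 1, the wrong answer

for name, probs in [("hedging", hedging), ("confidently_wrong", confidently_wrong)]:
    print(name, round(shannon_entropy(probs), 3), "flagged:", flag_uncertain(probs))
```

The hedging distribution gets flagged; the confidently wrong one sails through, because low entropy measures how certain the model is, not whether it is right.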
Put that in front of a publisher and call it a forecast. I’ll wait.
Monte Carlo’s output is a distribution. An LLM’s output is prose that feels like one. That’s a downgrade dressed up as an upgrade.
This would be an academic curiosity if the market weren’t already moving. Digital.ai’s 2025 State of Agile report has AI adoption in agile workflows at 84%, up from 68% the year before, with early adopters experimenting with agentic systems that act autonomously. The same report notes that only 49% of organisations have governance guardrails in place.
Adoption is outpacing oversight by roughly 1.7 to 1 (84% against 49%), and the vendor ecosystem is happily filling the gap. Planisware markets Oscar for “forecast generation, analysis, and natural-language insights”. Wrike sells predictive risk scoring that “dynamically updates project risk scores” and “proposes fixes”. Forecast, the PSA platform, is publicly building out MCP (Model Context Protocol) and “agentic AI” capability for delivery prediction.
I’m not picking on any of these companies individually. I’m pointing at the category. The category is being built in public, sold into enterprises, and pitched at producers, while the calibration problem underneath it remains structurally unsolved.
Now here’s the part where I stop sharpening the knife and say the thing that actually matters, because a pure takedown would be cheap and also wrong.
The hallucinated-confidence problem disqualifies agents from one specific job: outputting numbers that anyone should bet on. It does not disqualify them from everything.
And the work I’ve been doing with agents in production has convinced me that there’s a different job they can do that Monte Carlo structurally cannot, and that’s where the real conversation should be.
Monte Carlo is a dumb, honest machine. That’s its virtue. You feed it historical throughput, it resamples, it tells you what the numbers say.
It does not know that your tech lead just quit. It does not know that the three tickets you’re about to pull all touch a save-game serialisation system nobody on the team fully understands. It does not know that the external API your milestone depends on went through a breaking change last Thursday that no one’s logged yet.
MCS treats work items as interchangeable draws from an urn, and that assumption is load-bearing for the whole statistical apparatus. When reality diverges from the assumption, MCS can’t tell you. It wasn’t built to.
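For anyone who hasn’t seen the mechanics, here is a minimal sketch of throughput-resampling Monte Carlo, assuming per-sprint throughput counts and a fixed backlog. The history, the 45-item backlog and the function names are invented for illustration; real tools add options, but the core loop is this: draw a past sprint’s throughput at random, subtract it from the remaining work, count sprints until done, and repeat thousands of times.

```python
import math
import random

def simulate_sprints_to_finish(throughput_history, backlog, rng):
    """One trial: resample past sprint throughput until the backlog is empty."""
    remaining, sprints = backlog, 0
    while remaining > 0:
        # Each sprint is an independent draw from history: the "urn" assumption.
        remaining -= rng.choice(throughput_history)
        sprints += 1
    return sprints

def monte_carlo_forecast(throughput_history, backlog, trials=10_000, seed=42):
    """Run many trials and return the sorted number of sprints each one needed."""
    if not any(t > 0 for t in throughput_history):
        raise ValueError("history needs at least one sprint with throughput > 0")
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return sorted(
        simulate_sprints_to_finish(throughput_history, backlog, rng)
        for _ in range(trials)
    )

def percentile(sorted_outcomes, pct):
    """Nearest-rank percentile: pct% of trials finished in this many sprints or fewer."""
    rank = max(1, math.ceil(pct * len(sorted_outcomes) / 100))
    return sorted_outcomes[rank - 1]

history = [3, 5, 4, 6, 2, 5, 7, 4]  # invented items-completed-per-sprint data
outcomes = monte_carlo_forecast(history, backlog=45)
print("50th percentile:", percentile(outcomes, 50), "sprints")
print("85th percentile:", percentile(outcomes, 85), "sprints")
```

Notice what the simulation never looks at: who is on leave, which system a ticket touches, what broke last Thursday. Every draw is just a number from the urn.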
Good producers already adjust for these signals in their heads. Every experienced producer I know has a mental model that looks something like “the burndown says we’re fine but I’m overriding it to amber because Dave is on leave in sprint four and the combat module has a 2x historical cycle-time penalty.”
That override is the qualitative layer sitting on top of statistical forecasting, and it’s always been invisible, unauditable, and lost the moment the producer leaves the room.
An agent, pointed at the right job, can make that layer legible for the first time.
The agent’s job is to tell you why this week’s distribution is lying to you. Leave the ship date to Monte Carlo.
What that looks like in practice is less sexy than the vendors want to admit, which is part of why they keep trying to sell the other thing.
An agent sits alongside your Monte Carlo setup. MCS stays exactly where it is, untouched, still the thing you show the exec, still the probabilistic backbone. The agent’s job is to read the tickets, the commit history, the Slack threads, the dependency graph, the actual content of the work, and surface the qualitative signals that should make you distrust this week’s distribution.
Not generate the forecast. Flag when to mistrust it.
“Your 85th percentile says nine sprints, but three of your next ten tickets touch the save system your senior engineer flagged as fragile in October, and she’s on leave in sprint four.” That’s a sentence a statistical model cannot produce and a good producer already thinks. The agent’s contribution is making it explicit, auditable, and repeatable across a team that doesn’t have a senior producer sitting in every standup.
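A minimal sketch of that division of labour, assuming the Monte Carlo percentile has already been computed and the agent’s output arrives as a list of plain-language flags. `Flag`, `ForecastBrief` and the example flag text are hypothetical, and the agent itself (whatever reads the tickets, commits and Slack threads) is not shown. The one rule the sketch encodes is the one argued here: flags sit alongside the number as auditable annotations and never change it.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Flag:
    """A qualitative signal an agent might surface (hypothetical structure)."""
    summary: str   # plain-language reason to distrust this week's distribution
    evidence: str  # where the signal came from, so a human can audit it

@dataclass
class ForecastBrief:
    p85_sprints: int  # comes from Monte Carlo; render() reports it as-is
    flags: list[Flag] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"85th percentile: {self.p85_sprints} sprints (Monte Carlo)."]
        if not self.flags:
            lines.append("No qualitative flags this week.")
        for flag in self.flags:
            lines.append(f"Worth checking: {flag.summary} [source: {flag.evidence}]")
        return "\n".join(lines)

brief = ForecastBrief(p85_sprints=9)
brief.flags.append(Flag(
    summary="3 of the next 10 tickets touch the save system flagged as fragile; "
            "its owner is on leave in sprint 4",
    evidence="ticket links, October review thread, leave calendar",
))
print(brief.render())
```

The number and the prose travel together but stay separate, so a reader can always tell which part came from the statistics and which part is a hedged, checkable hunch.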
The reason this configuration works is subtle, and it’s the one thing I want anyone reading this to take away. The agent’s core weakness, confident-sounding prose with uncertain grounding, stops being a bug in this setup, because the agent is no longer being asked to output a number.
The prose is the point. The qualitative signal is the deliverable. Hedging, uncertainty, “this might be worth looking at” language, all of it becomes appropriate rather than disqualifying.
You’ve moved the agent from a job it’s structurally bad at, quantitative forecasting with calibrated uncertainty, to a job it’s good at, surfacing contextual signals from unstructured data. And Monte Carlo keeps doing the thing only Monte Carlo can do, which is give you an honest distribution you can defend in a grown-up conversation about risk.
Monte Carlo gives you a distribution. An agent gives you prose that feels like one.
Replace MCS with an agent and you lose the thing that made probabilistic forecasting trustworthy to non-technical stakeholders in the first place. Augment MCS with an agent pointed at the signals MCS was never designed to see, and you get something neither tool could deliver alone.
The vendors selling you the first version are selling you a downgrade. The configuration that actually works is the one where neither tool is pretending to be the other.