The calibration tax: Why data teams can't just "move faster"
For all that AI speeds up, it doesn't solve risk calibration
“Why can’t we just move faster?” Heard that before? Take a short trip with me.
Scene 1: A data scientist on an AI product team has an idea for a model change that could improve a customer-facing decision. The change is small. They think it’s safe. They also assume it needs review, because everything must be tested before it ships. That’s the norm. Two weeks later, the change ships. It was always going to be approved. “Couldn’t we have just done this faster?”
Scene 2: A different team moves quickly on a model retrain that subtly shifts a customer-facing behavior. No one stopped them, because no one was sure they needed to, because “speed is what matters”. A month later, it becomes clear something should have been escalated. The retro lasts a week. Everyone agrees that the process failed and that experiments are necessary to deal with risk.
Here’s the surprise: this was the same team and the same data scientist, months apart. How? The team didn’t have a clear answer to “where is the line?” One paid the velocity tax up front. The other paid it later, in trust. Both were perceived as misses, yet neither was really about the team itself.
“Why can’t we move faster?” It isn’t for lack of desire. What looks like a speed problem in most AI data teams is actually something else underneath: a risk-calibration problem. The risk model the company runs on is implicit, inconsistent, and unevenly distributed across functions. Velocity dies in the gap between what people are technically allowed to do and what they think they’re allowed to do.
The invisible “risk line”
We talk about risk in AI data products like it is an external exposure the company manages: regulators, customers, model failure modes, drift, and fairness violations. That framing is right but partial. The bigger cost, in most companies I’m familiar with, is internal. People do not know where the line is, so they treat every line as if it could be the live one.
The standard answer is to build a risk framework. Most companies have one. The framework lives inside a risk or compliance function, but that is not enough: it (a) rarely reaches the people doing the work, and (b) when it does, it doesn’t arrive in the vocabulary they use for that work. The data scientist above was not blocked by the framework. They were blocked because they couldn’t read it from where they were standing.
That distinction matters more than it sounds. The fix to risk-as-exposure is the framework. The fix to risk-as-friction is shared calibration: a map that says, with usable specificity, what the team can ship without escalation, what needs a second pair of eyes, and what genuinely requires the executive layer to weigh in. Without that map, your data team pays a tax every week, whipsawed between being too slow and taking too much risk.
The cost of not defining the line
What does that cost look like? Here are four forms it takes, in roughly increasing order of how hard they are to see.
Over-escalation. The default-conservative tax. When in doubt, ask. The ask is cheap for the individual making it, and expensive for the company in aggregate. Mature data orgs accumulate enormous amounts of this without noticing.
Shadow routes. When the official path feels too slow, people find workarounds. They route through informal channels, ship under different banners, or quietly batch changes so any review feels lighter. The work gets done. The institutional learning that would have come from doing it in the open is lost.
Hedged product decisions. No single call is wrong, but the cumulative drift is large. Models get more conservative than they need to be. Thresholds creep. Features that would have been bold get filed down. You do not see the tax in any one decision. You see it in the trajectory.
Velocity loss with no signature. This is the worst kind, because no one can attribute it to a specific decision. The team feels slow. Reviews go in circles. People say things like “we should be moving faster” without a target for the energy. This is the calibration tax in its purest form. You cannot manage what you have not measured, and you cannot measure what you have not calibrated.
What “explicit” risk definition actually looks like
Here’s the framework most teams actually need in order to “move faster,” with risk calibration aligned across ICs, managers, and the executive team: four bands. In practice, teams also need examples from their own domains that fall into each band so the framework becomes “real” for them.
1. Routine. The team ships without escalation. The risk is well-understood and well-bounded.
2. Reviewed. A second pair of eyes inside the function is sufficient. The risk is understood, but the decision benefits from another perspective.
3. Escalated. The executive layer weighs in. The risk is consequential, or the precedent is.
4. Unknown. We know we do not know. This band exists, it gets named, and the response is a structured investigation rather than vague caution.
The map gets revisited on a real cadence. Risk that was unknown last quarter and known now moves down. The risk that has become more consequential as the business has grown moves up. It is a living artifact, not a quarterly compliance deliverable.
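To make this concrete, here is a minimal sketch of how a team might encode such a map so an IC can answer “where does my change sit?” without a meeting. The band names come from the list above; the example change types, the RISK_MAP registry, and the band_for helper are hypothetical illustrations of the idea, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Band(Enum):
    """The four bands from the framework above."""
    ROUTINE = "routine"      # ship without escalation
    REVIEWED = "reviewed"    # second pair of eyes inside the function
    ESCALATED = "escalated"  # executive layer weighs in
    UNKNOWN = "unknown"      # named unknown -> structured investigation


@dataclass
class MapEntry:
    change_type: str     # a class of change, described in the team's own vocabulary
    band: Band
    rationale: str       # why it sits in this band
    last_reviewed: date  # when the entry was last revisited


# Hypothetical entries -- each team fills this in with examples from its own domain.
RISK_MAP = [
    MapEntry("feature threshold tweak within tested range", Band.ROUTINE,
             "well-understood, bounded blast radius", date(2024, 4, 1)),
    MapEntry("model retrain that shifts customer-facing behavior", Band.REVIEWED,
             "understood, but benefits from a second perspective", date(2024, 4, 1)),
    MapEntry("new data source with unclear consent terms", Band.ESCALATED,
             "consequential precedent for the business", date(2024, 4, 1)),
]


def band_for(change_type: str) -> Band:
    """Look up a change type; anything not on the map is, by definition, UNKNOWN."""
    for entry in RISK_MAP:
        if entry.change_type == change_type:
            return entry.band
    return Band.UNKNOWN


if __name__ == "__main__":
    print(band_for("model retrain that shifts customer-facing behavior"))  # Band.REVIEWED
    print(band_for("something nobody has thought about yet"))              # Band.UNKNOWN
```

The point is not the data structure. The point is that anything absent from the map is loudly labeled unknown rather than silently assumed to be routine, and the `last_reviewed` field is what makes the revisit cadence enforceable.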
Risk tolerance must translate
“Who owns risk?” In many companies, this isn’t clear. Some will tell you it is legal, or the risk function. Some will tell you it is whatever the exec team decides. In all of them, risk isn’t codified in a way that lets data, product, and engineering teams make tradeoffs on the margin.
Every IC, every manager, every executive, and every board member should be able to describe, in their own words, what we know, what we do not, and where we are deliberately taking risk. The vocabulary has to be the same. The bands have to be the same. The cadence has to be the same. When that lands, three things happen.
1. The data scientist in the opening scene knows their change is routine and ships it.
2. The data scientist running the retrain knows the model behavior shift sits in the reviewed band, and pulls in a peer before deploying.
3. The executive layer spends its time on the genuinely uncertain band, where its judgment is actually load-bearing.
In this world, the data team and the risk function and the product layer and the executive layer all share a calibrated model of what the company knows. That shared model is the missing operating layer.
Why this enables velocity and where to start
This might sound like more process, and more process slows things down. My experience runs the other way. Companies with calibrated, shared risk maps move faster on routine work and more deliberately on consequential work. The 80% of decisions that should be routine actually are. The 20% that should be deliberate are visible, rather than buried under the same conservative posture as everything else.
So, where to start? Write the map down. Treat it as a living document with a real owner and a real cadence. Run risk reviews in the same room as product reviews, not in a parallel track. The same vocabulary, the same people, the same artifacts.
Teach the vocabulary widely enough that any IC can describe their own work in it. The test is whether you can walk into a team and have three people tell you the same story about where they sit on the map.
In retros, flag the absence of clarity rather than the presence of risk. The question “Did we know this was in the reviewed band before we shipped?” is more useful than “Should we have been more careful?”
Move things down the map deliberately as understanding grows. Make that move visible, so the org sees that the calibration is real and that being more cautious last quarter does not bind us to being equally cautious this quarter.
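One way to make those moves visible, sketched here as a continuation of the hypothetical map above: keep an append-only transition log next to the map, so a change of band is a recorded decision with an owner and a rationale rather than a quiet edit. The field names are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BandTransition:
    change_type: str
    old_band: str
    new_band: str
    decided_on: date
    decided_by: str   # the map's real owner
    rationale: str    # what we learned that justifies the move


# Example: a risk that was unknown last quarter is now understood and moves down.
TRANSITIONS = [
    BandTransition(
        change_type="prompt template changes for the support assistant",
        old_band="unknown",
        new_band="reviewed",
        decided_on=date(2024, 7, 1),
        decided_by="risk-map owner",
        rationale="two quarters of shadow evaluation; failure modes are bounded",
    ),
]
```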
The velocity gap in most AI data orgs is not a permission problem at the people level. It is a clarity problem at the leadership level. Once we take that seriously, we can “move faster” in the ways that make sense for the company.

Though I come from an academic background, I can imagine a team facing a strict deadline that over-cleans its data just to ensure the model passes backtesting. The problem is that this trades a backtest pass for structural risk: over-cleaning can produce a sharp likelihood surface that is not robust when the data are resampled. I’ve seen this with pre-processing for GJR-GARCH models. Ironically, even if the change ships quickly, the brittle model can slow or stop progress when a regime shift arrives down the road.
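To illustrate the kind of check that catches this before it ships, here is a minimal sketch, assuming the `arch` package and a daily return series: refit a GJR-GARCH(1,1) on moving-block bootstrap resamples of the cleaned data and look at how much the fitted parameters swing. Wide swings suggest the likelihood surface is sharp in a fragile way and the backtest pass may not survive a regime shift. The resample counts, block size, and any variable names such as `cleaned` are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np
import pandas as pd
from arch import arch_model


def block_bootstrap(returns: np.ndarray, block_size: int,
                    rng: np.random.Generator) -> np.ndarray:
    """Moving-block bootstrap: resample contiguous blocks to preserve local dependence."""
    n = len(returns)
    n_blocks = int(np.ceil(n / block_size))
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [returns[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]


def refit_stability(returns: pd.Series, n_resamples: int = 50,
                    block_size: int = 25) -> pd.DataFrame:
    """Refit a GJR-GARCH(1,1) on bootstrap resamples and collect parameter estimates."""
    rng = np.random.default_rng(0)
    rows = []
    for _ in range(n_resamples):
        sample = block_bootstrap(returns.to_numpy(), block_size, rng)
        # o=1 adds the asymmetry (GJR) term to a standard GARCH(1,1).
        res = arch_model(sample, vol="GARCH", p=1, o=1, q=1, dist="t").fit(disp="off")
        rows.append(res.params)
    return pd.DataFrame(rows)


# Usage sketch: `cleaned` is the over-cleaned return series that passed the backtest.
# params = refit_stability(cleaned)
# print(params.describe())  # large spreads in alpha/gamma/beta suggest a fragile fit
```

If the team had a calibrated map, a change like this would sit in the reviewed band, and a check along these lines is exactly what the second pair of eyes would ask for.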