The Centaur, Not the Oracle: Why the Best AI Coach Still Needs a Human

There’s a story from chess that I keep coming back to as I build Afitpilot.

Everyone remembers 1997 — the year IBM’s Deep Blue beat Garry Kasparov, and the press declared that the machines had finally won. Nate Silver tells that story brilliantly in The Signal and the Noise, including the detail that the move which rattled Kasparov most, the one that convinced him the computer was operating on some higher plane, was almost certainly a software bug. The machine got confused, played a near-random move, and a world champion read genius into noise.

But the more interesting story comes after 1997, and it’s the one that actually matters for anyone building with AI today.

Weak human + good machine + good process

Once engines could clearly out-calculate humans, chess didn’t simply hand the crown to the computers. A new format appeared — “freestyle” or “advanced” chess — where humans were allowed to consult engines while they played. And the results were genuinely surprising.

The winners weren’t the grandmasters. They weren’t the strongest supercomputers running alone, either. The teams that kept winning were often amateurs with ordinary machines, and what set them apart was more valuable than either raw human skill or raw compute: a process.

Figure 01. Kasparov’s claim: the workflow outranks the talent. Three contestants: an amateur running an engine with a sharp process, the strongest engine playing alone, and a grandmaster running an engine with a sloppy process. Illustrative: the point is the order Kasparov observed in freestyle chess, not the exact scores.

Kasparov summarised it as something close to a formula. A weak human player, plus a machine, plus a good process could beat a strong human player with a machine and an inferior process — and could beat the strongest engine running on its own. The edge wasn’t intelligence. It was the workflow connecting the two.

This is the “centaur” model, and it’s the idea Silver points toward in the book: the best outcomes don’t come from humans alone, or AI alone, but from the partnership. The machine calculates faster than you ever will. The human decides what actually matters.

What each side is actually good at

Once you see it, you can’t unsee it. Humans and machines aren’t competing for the same job — they’re good at almost completely different things.

The machine is an amplifier of probabilistic reasoning. It aggregates patterns, holds enormous context, stays consistent across thousands of decisions, never gets tired, never forgets. It is brilliant at “given everything I’ve seen, here is the likely answer.”

The human is good at the parts the machine has no real grip on. Framing the problem in the first place. Reading context. Noticing when something is weird — the anomaly, the outlier, the thing that doesn’t fit. Assigning meaning. Adapting the goal when the situation changes. And, crucially, sensing when the model is simply wrong.

The machine tells you what’s probable. The human knows when “probable” is beside the point.

The trap every AI startup is walking into

Here’s why this is on my mind. The dominant instinct in AI right now — and I feel the pull of it constantly — is to build the oracle. The fully autonomous system that knows everything. “The AI coach has all the answers. Just trust it.”

It’s a seductive pitch and, in coaching specifically, it’s wrong.

Coaching is a domain where the relevant information is split across three places that don’t talk to each other cleanly:

The athlete owns the subjective experience — how the session actually felt, the niggle that isn’t an injury yet, the bad week at work that’s quietly wrecking recovery.
The coach owns intuition and pattern recognition built over years — the read on someone’s mental state, the judgment call on when to push and when to back off.
The AI owns memory, scale, probability, and consistency — fatigue trends, load calculations, the ability to track hundreds of athletes continuously without dropping a thread.

An oracle pretends one of those three can absorb the other two. It can’t. The strongest system isn’t replacement — it’s division of labour:

AI handles pattern aggregation, fatigue and load trends, memory, consistency, and first-draft suggestions. The human handles interpretation, edge cases, psychology, trade-offs, and strategic judgment.

That’s centaur coaching. And it’s the architecture I’m actually building toward.

My own frustrations were trying to tell me this

The funny thing is that the case for the centaur model was hiding inside my own bug reports.

When I lean too hard on the LLM, it produces plans that are plausible but generic — competent on average, occasionally confidently wrong. Roughly one in five outputs needs a human correction. For a long time I treated that number as a problem to be engineered away: get it to zero, ship the oracle.

But flip it around. Humans alone produce excellent individual judgments and cannot scale that judgment across hundreds of athletes, week after week, without burning out. The machine alone scales beautifully and quietly makes mistakes no experienced coach would. Neither is the answer on its own.

Figure 02. Where each approach breaks: judgment doesn’t scale, the machine does, the centaur holds. Three curves (solo coach, AI alone, centaur) plotted against the number of athletes. A solo coach gives brilliant attention until there are too many athletes to track. The AI alone scales flat but plateaus below an expert and quietly errs. Illustrative curves. The AI line carries a quiet ~20% rate of plans needing correction, which is exactly what the centaur sends back to the coach.

The leverage isn't human judgment or machine consistency. It's human judgment times machine consistency. That correction rate isn't a failure of the system — handled well, it's the system. The whole point of the Timeline and the human-in-the-loop workflow is to make those corrections cheap, fast, and visible, so the coach's attention lands exactly where the machine is least trustworthy.
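As a rough sketch of what "attention lands where the machine is least trustworthy" could look like in code: each AI draft carries a confidence score, and only drafts below a threshold go to the coach, least confident first. The `DraftPlan` type, the `confidence` field, and the 0.8 threshold are all assumptions made up for illustration, not Afitpilot's actual data model.

```python
from dataclasses import dataclass


@dataclass
class DraftPlan:
    athlete: str
    summary: str
    confidence: float  # hypothetical 0..1 score attached to each AI draft


def triage(drafts, threshold=0.8):
    """Split drafts into auto-approved plans and a coach review queue.

    The review queue is sorted least-confident first, so the coach's
    attention goes to the plans the machine is least sure about.
    """
    approved = [d for d in drafts if d.confidence >= threshold]
    review = sorted(
        (d for d in drafts if d.confidence < threshold),
        key=lambda d: d.confidence,
    )
    return approved, review


# Made-up drafts for five athletes.
drafts = [
    DraftPlan("A", "5x5 squat, hold load", 0.93),
    DraftPlan("B", "add tempo run", 0.61),
    DraftPlan("C", "deload week", 0.88),
    DraftPlan("D", "double session Friday", 0.42),
    DraftPlan("E", "repeat last week", 0.95),
]

approved, review = triage(drafts)
print([d.athlete for d in review])  # ['D', 'B']
print(len(approved))                # 3
```

The design choice worth noticing is that the threshold is a dial, not a verdict: lower it and the coach sees fewer plans, raise it and more of the machine's work gets a human read.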

Silver's deeper point: think like a fox

There's a second thread in The Signal and the Noise that maps onto coaching almost too neatly. Silver's broader thesis is deeply Bayesian: good prediction is iterative, uncertainty is unavoidable, models have to keep updating as new evidence arrives, and overconfidence is the thing that quietly kills accuracy. The best forecasters, in his framing, are "foxes" who hold many small, revisable hypotheses — not "hedgehogs" with one big confident theory they defend to the death.

Figure 03. Fox vs hedgehog: the fixed plan vs the one that updates. A 12-week block. The amber line is what the athlete can actually absorb each week (the evidence). The static plan ignores it; the adaptive plan tracks it, and overreach (red) piles up under the plan that won't bend. Illustrative simulation. "Readiness" stands in for the feedback a coach reads (sRPE, sleep, a rough week) that should update the next prescription.

Adaptive coaching is fox thinking. An athlete's response to training is probabilistic, not deterministic. Every plan is really a hypothesis. Feedback — RPE, how the session went, whether the load landed — is evidence that should update the priors. A good system recalibrates continuously and is honest about what it doesn't yet know.
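One minimal way to make "every plan is a hypothesis" concrete is a Beta-Bernoulli update: treat "the athlete absorbs this load" as a probability, start from a prior, and revise it after each session. The `LoadBelief` class, the uniform prior, and the pass/fail reading of session feedback are simplifying assumptions for illustration; real feedback such as RPE is graded, not binary.

```python
from dataclasses import dataclass


@dataclass
class LoadBelief:
    """Beta(alpha, beta) belief that the athlete absorbs the current load."""

    alpha: float = 1.0  # prior pseudo-count of sessions that landed well
    beta: float = 1.0   # prior pseudo-count of sessions that did not

    def update(self, absorbed: bool) -> None:
        # Conjugate Bayesian update: each observation adds one count.
        if absorbed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean probability that the load is absorbed.
        return self.alpha / (self.alpha + self.beta)


belief = LoadBelief()  # uniform prior: no opinion yet
for absorbed in [True, True, False, True]:
    belief.update(absorbed)

print(round(belief.mean, 2))  # 0.67
```

After four sessions the estimate has moved from 0.5 to about 0.67, and it is still loose enough for one more bad session to shift it noticeably, which is the honest state of knowledge after so little evidence.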

By that standard, the static 12-week PDF plan is pure hedgehog: one confident theory, written once, defended against all incoming evidence until the block ends. An adaptive system that admits uncertainty and updates is the fox. I know which one I'd want coaching me.
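The difference can be shown with a toy simulation. Both plans start from the same weekly load; the static plan never changes, while the adaptive plan moves each new prescription toward what the athlete absorbed the week before. The capacity numbers, the 0.5 update rate, and the definition of overreach as prescribed load above capacity are all assumptions chosen for illustration.

```python
def overreach(prescribed, capacity):
    """Total load prescribed above what the athlete could absorb."""
    return sum(max(0.0, p - c) for p, c in zip(prescribed, capacity))


def static_plan(start, weeks):
    # Hedgehog: one prescription, written once, never revised.
    return [start] * weeks


def adaptive_plan(start, capacity, rate=0.5):
    # Fox: each week's prescription moves toward last week's observed capacity.
    plan = [start]
    for observed in capacity[:-1]:
        plan.append(plan[-1] + rate * (observed - plan[-1]))
    return plan


# Hypothetical weekly capacity (arbitrary units); weeks 5 to 7 are a hard patch.
capacity = [100, 100, 100, 100, 70, 70, 70, 100, 100, 100, 100, 100]

static = static_plan(100, len(capacity))
adaptive = adaptive_plan(100, capacity)

print(overreach(static, capacity))    # 90.0
print(overreach(adaptive, capacity))  # 52.5
```

The adaptive plan still overreaches in the first hard week, because it can only react to evidence it has already seen. What it avoids is compounding the error for the rest of the hard patch.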

The bet

So this is the bet underneath the product. The winning systems of the next decade probably won't be the ones that replace people. They'll be the ones that make high-agency humans dramatically more effective — that take a good coach and give them the reach of a hundred.

Not the oracle. The centaur.

The machine calculates. The human decides what matters. Build the seam between them well, and that's the whole edge.

References

Kasparov, Garry. "The Chess Master and the Computer." The New York Review of Books, 11 February 2010. https://www.nybooks.com/articles/2010/02/11/the-chess-master-and-the-computer/ — Backs the freestyle-chess result and the exact formula. Kasparov's wording: weak human plus machine plus a better process beat a strong computer alone and, more surprisingly, a strong human plus machine running an inferior process. The winners weren't grandmasters with top hardware but a pair of amateur Americans running three computers, "coaching" them to look deep into positions.
Silver, Nate. The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. Penguin Press, 2012. — Backs the Deep Blue / 1997 narrative, the Bayesian forecasting thesis, and the fox-vs-hedgehog framing as applied to prediction.
Tetlock, Philip E. Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press, 2005. — The underlying forecasting research. The fox/hedgehog image in Silver's book is borrowed from Tetlock, who in turn adapted it from Isaiah Berlin. The evidence that foxes out-forecast hedgehogs comes from this work, not from Silver.
Berlin, Isaiah. The Hedgehog and the Fox: An Essay on Tolstoy's View of History. 1953. — Origin of the metaphor in its modern form. Berlin took the image from a fragment attributed to the Greek poet Archilochus and applied it to Tolstoy.