
April 2, 2026

AI Mistakes vs Human Mistakes: Why AI Can Fix Errors Forever

We are more willing to tolerate repeated human error than a single visible machine failure. That instinct made sense in the industrial age. It becomes much less useful in the age of AI.

The Mistake We Fear More Than Our Own

There is a line of reasoning Neil deGrasse Tyson used in a discussion about self-driving cars that has stayed with me because it exposes something deeply irrational about the way we evaluate technological risk. His point, paraphrased cleanly, was that people are often more afraid of the rare mistake an autonomous system might make than of the endless stream of mistakes human beings already make every day. The difference, however, is that a human error tends to remain part of the statistical background of life, while a software error can be identified, corrected, and, in the best case, eliminated permanently.

That observation matters far beyond autonomous driving. It reaches into the center of how organizations are currently struggling with AI. We still react to machine mistakes as if they were moral failures, while we react to human mistakes as if they were unfortunate but natural. We tolerate recurrence when it comes from people. We demand near-perfection when it comes from systems. That asymmetry is emotionally understandable, but strategically it has become a terrible habit.

It is also one of the main reasons AI adoption still feels so messy in practice. Companies do not only have to decide whether a system is capable. They also have to decide what kind of failure they are willing to live with. The old world was built around recurring, distributed, human imperfection. The new world increasingly offers something different: centralized, observable, and potentially fixable error.

That changes the whole conversation.

AI Mistakes vs Human Mistakes: The Difference That Changes Everything

The easiest way to understand this shift is to step away from AI for a moment and go back to something much more familiar. Before spreadsheets became normal, long tables of numbers had to be calculated manually. If someone skipped a line, copied a value into the wrong field, mistyped a number, or forgot a step in the logic, the result was wrong. The mistake could be detected and corrected, but the real limitation was that nothing prevented the same person, or another person, from making exactly the same mistake again the next day.

Excel changed that logic completely. Once the formula was right, the system stopped reproducing the original error. If there was a problem, it usually lived in the formula itself. And once the formula was fixed, that particular mistake was retired. Not reduced. Not made less likely. Retired.
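To make "retired, not reduced" concrete outside of a spreadsheet, here is a minimal sketch in Python (my choice of illustration, not part of the original Excel example): the calculation lives in one function, and a regression test pins the corrected behavior so the original mistake cannot quietly come back. The column names and the discount rule are invented.

```python
def line_total(quantity: int, unit_price: float, discount: float = 0.0) -> float:
    """Deterministic 'formula': the same inputs always produce the same total."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be a fraction between 0 and 1")
    return round(quantity * unit_price * (1.0 - discount), 2)


def test_discount_is_a_fraction_not_a_percentage():
    # The manual-era mistake: treating 10 as "10 percent" instead of 0.10.
    # Once the formula is fixed, this test retires that error permanently.
    assert line_total(3, 19.99, discount=0.10) == 53.97
```

The test is the part that matters: it is the mechanism by which the mistake stops being part of the statistical background and becomes a closed case.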

This idea connects directly to earlier thoughts on the Trust Tax in Agentic AI, where the real challenge is not capability, but the willingness to rely on outcomes.

That distinction is more profound than it sounds. Human work is full of recurring error because human behavior is recurring error. People get tired. They rush. They misread things. They forget steps. They improvise badly. They overestimate their attention. They repeat habits that once worked and no longer do. Most organizations have built their control systems around this fact. Review structures, approval processes, escalation paths, four-eyes principles, and layers of management all exist because recurring human imperfection is treated as a given.

Software changed some of that already. AI has the potential to change much more of it. But only if we understand that the real shift is not merely one of speed or capability. It is a shift in the nature of mistakes themselves.

Then Generative AI Made It Complicated Again

This is where things become much less comfortable.

In deterministic systems, the phrase “fix once, fix forever” feels intuitive. In probabilistic systems, it suddenly becomes much harder to say with confidence. I am connecting more and more MCP (Model Context Protocol) servers to ChatGPT to streamline parts of my own workflow. In theory, this should be a very clean progression. If a task is clear and the tools are connected, the assistant should increasingly behave like a useful extension of my process. In practice, however, the experience is still highly uneven.

I could blindly tell it to solve tasks for me, but the results are still too hit-and-miss for that. Drafting a presentation still often leads to robotic language. I find myself reapplying the same guardrails, restating the same expectations, and correcting the same tonal drift. That does not mean the system is useless. Far from it. It means the promise of “fix it once, fix it forever” feels much more distant in generative workflows than it did in classical software.

A recent example made that painfully obvious. I asked ChatGPT to update a table in Coda.io by comparing it to a CSV and using the CSV as the source of truth. Instead of cleanly reconciling the data, it mixed the file I had provided with topics from earlier conversations and started creating complete line items on its own. Some of those entries had no basis in the current task at all. The result was not just wrong. It was wrong with confidence.

That matters, because visible confidence changes the emotional texture of failure. A mistyped number in Excel is annoying. A system inventing rows in a table while behaving as if it is helping you feels like a breach of trust. It makes you question not only the current output, but also every other place where the system may be making mistakes you cannot see as clearly.

That is exactly where the old comfort of software breaks down.

The Difference We Do Not Explain Well Enough

A large part of the confusion comes from how badly most people still distinguish deterministic and probabilistic behavior in plain language.

A deterministic system is like a train route between Munich and Berlin. The switches are known. The path is predefined. If the inputs are the same, the result will be the same every single time. The strength of the system lies in repeatability. The weakness appears when the world introduces variables that were not explicitly built into the route.

A probabilistic system behaves differently. Imagine the train breaks down because of bad weather. Suddenly the next step is not fixed. Some passengers stay because waiting may still be faster than changing twice. Others reroute because they believe the delay will get worse. Some decisions depend on crowding, comfort, urgency, or weather conditions elsewhere. The path is no longer determined by a fixed sequence of switches. It is chosen in context.

That is much closer to how generative and agentic systems operate. They do not simply follow tracks. They navigate possibilities under constraints.
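The same contrast fits in a few lines of code. This is a deliberately toy sketch, not a model of how any particular AI system works: the route function is deterministic, while the reroute function samples from weighted options, so identical inputs can lead to different decisions.

```python
import random

STOPS = ["München", "Nürnberg", "Erfurt", "Halle", "Berlin"]

def deterministic_route(origin: str, destination: str) -> list[str]:
    # Same inputs, same output, every time: the switches are predefined.
    i, j = STOPS.index(origin), STOPS.index(destination)
    return STOPS[i:j + 1]

def probabilistic_reroute(delay_minutes: int) -> str:
    # The next step is chosen in context, not read off a fixed track plan.
    weights = {"wait": max(1, 60 - delay_minutes), "reroute": delay_minutes}
    options, probs = zip(*weights.items())
    return random.choices(options, weights=probs, k=1)[0]

print(deterministic_route("München", "Berlin"))  # always the same path
print(probabilistic_reroute(delay_minutes=45))   # may differ from run to run
```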

This also ties into the idea of Cognitive Leverage, where the goal is not perfect execution, but amplified decision-making.

This is also where my own Coda mistake becomes useful as an example. The problem was not only that the model failed. The problem was that I had tried to force probabilistic behavior into deterministic expectations. If I had defined the task more narrowly and more explicitly, something like “analyse each line item, compare it with the CSV, then correct anything using the CSV as the source of truth,” the result would likely have been much better. I was asking for zero tolerance while still operating inside a system that needed better boundaries.
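Framed deterministically, the task I actually wanted looks like a small reconciliation script rather than an open-ended request. The sketch below is an assumption about how that boundary could be expressed, not what actually ran: it expects both sides to share an "id" key column, compares only the columns present in the CSV, corrects values where they differ, and never invents rows. The Coda API call is left out; the table is represented as a plain list of dictionaries.

```python
import csv

def reconcile(table_rows: list[dict], csv_path: str, key: str = "id") -> list[dict]:
    """Return corrections for table rows, using the CSV as the source of truth."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        truth = {row[key]: row for row in csv.DictReader(f)}

    corrections = []
    for row in table_rows:
        source = truth.get(row[key])
        if source is None:
            continue  # unknown to the CSV: flag for a human, do not fabricate data
        changed = {col: source[col] for col in source
                   if col != key and row.get(col) != source[col]}
        if changed:
            corrections.append({key: row[key], **changed})
    return corrections
```

Everything the script is allowed to change ends up in the corrections list, which is the deterministic version of the guardrail my prompt failed to state.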

That does not make probabilistic systems inferior. Quite the opposite. In the right contexts, they do not just outperform deterministic systems. They make certain tasks possible that would have been practically unreachable before. If I want a twenty-five-slide draft of a presentation created within guardrails, with sources, structure, and suggestions that still leave room for my judgment, a deterministic workflow will not get me there in any realistic amount of time. A probabilistic one can.

But this also means we need a different mental model. The promise is no longer “this will never deviate.” The promise becomes “this can be guided, evaluated, corrected, and made increasingly reliable.” That is less emotionally satisfying. It is also far more powerful.

Why Trust Breaks Faster Than It Builds

This is where the Trust Tax returns with full force.

Privately, I still do not have systems that I am willing to leave entirely unverified before publishing or committing results. That is not because I am philosophically opposed to AI. It is because my current threshold of trust is not yet there. I understand, at least intellectually, that deeper technical knowledge of probabilistic behavior would likely make this easier, and thankfully I can draw on a broader set of skills in the organizational context to get there. I also understand that my own prompting discipline can improve. But those truths do not change the lived experience that trust is fragile when errors come wrapped in conviction.

That fragility is one of the defining characteristics of the current AI moment. Systems are often good enough to impress and unreliable enough to unsettle. They save enough time to become attractive and create enough uncertainty to make full delegation uncomfortable. They can feel like extraordinary collaborators one moment and slightly unhinged interns the next.

That last part is important. Reviewing AI output is, in many situations, not fundamentally different from reviewing the work of a less experienced colleague. If I ask a junior team member to draft something for me, I will still validate it before it goes out. I do not consider that an indictment of the person. It is simply responsible supervision. The strange thing is that we often accept this calmly for humans while treating the same pattern in AI as proof that the system cannot be trusted.

Some of that is reasonable. Some of it is not.

We forgive recurring human mistakes because they feel familiar. We distrust machine mistakes because they feel alien. But familiarity is not the same thing as quality.

Quality Engineering Becomes Much More Important Than It Sounds

A recent discussion on my previous post about the Trust Tax sharpened this point in exactly the right direction. A work colleague, Sebastian, argued that Quality Engineering is no longer a supporting function in the context of Agentic AI. It becomes a core enabler. He was right.

If scaling AI means scaling human supervision alongside it, then little has actually been solved. The promise of leverage collapses under the weight of verification. A system that still requires a human shadow for every significant action is not yet a true force multiplier. It is just a new type of dependency.

What makes the role of Quality Engineering newly important is that it can no longer be understood as a downstream inspection function. It is not enough to look at the output and ask whether it seems acceptable. Quality has to be embedded into the conditions of execution from the start. Guardrails, evaluation logic, fallback behavior, expected thresholds, edge cases, escalation paths, and the definition of acceptable deviation all have to be built into the operating model.
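What "quality embedded into the conditions of execution" can mean in practice is easiest to show as a thin wrapper around the generation step. The checks, threshold, retry budget, and escalation path below are placeholders for whatever evaluation logic fits a real task; this is a sketch of the pattern, not anyone's product API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    score: float          # aggregate of the individual checks, 0.0 to 1.0
    failures: list[str]   # names of the checks that did not pass

def evaluate(output: str, checks: dict[str, Callable[[str], bool]]) -> Verdict:
    failures = [name for name, check in checks.items() if not check(output)]
    return Verdict(1.0 - len(failures) / max(len(checks), 1), failures)

def guarded_step(generate: Callable[[], str],
                 checks: dict[str, Callable[[str], bool]],
                 threshold: float = 0.9,
                 max_retries: int = 2) -> str:
    """Run generation inside the guardrail: retry below threshold, escalate otherwise."""
    for _ in range(max_retries + 1):
        output = generate()
        verdict = evaluate(output, checks)
        if verdict.score >= threshold:
            return output
    # Fallback: hand the case to a human instead of shipping a low-confidence result.
    raise RuntimeError(f"Escalation required; failing checks: {verdict.failures}")
```

The threshold and the retry budget are where "acceptable deviation" stops being a vague phrase and becomes an explicit, reviewable setting.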

That does not put me in disagreement with Sebastian. It extends his point. If Quality Engineering is understood not only as ensuring that the system behaves correctly but also as understanding what expected human behavior and acceptable outcome should look like, then it becomes one of the most critical translation layers in the entire AI stack.

In classical digital systems, QE often protected the release. In agentic systems, QE increasingly protects the trust model.

We Are Still Waiting for the Excel Moment of Probabilistic AI

This may be the most useful way to frame where we are.

Classical software already taught us what it means to retire a mistake. Spreadsheet logic did not eliminate human work, but it did eliminate whole classes of repetitive human error. Once the formula was correct, the recurrence stopped.

With probabilistic systems, we are not there yet in the same clean way. We are still in a stage where many organizations can see the upside but cannot yet rely on the repeatability. The initial investment is high, both financially and behaviorally. The old calculation of work hours multiplied by hourly rate versus result no longer describes the economics well. Now the equation includes fixed development cost, variable cost for usage, evaluation effort, verification burden, and a delayed payoff that only appears after the system has learned enough about the problem space and after the organization has learned enough about the system.
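To show why the break-even point moves, here is a deliberately simple model. Every number in it is invented for illustration; the point is the shape of the equation, not the figures.

```python
def months_to_break_even(build_cost: float,
                         monthly_usage_cost: float,
                         monthly_verification_cost: float,
                         monthly_hours_saved: float,
                         hourly_rate: float) -> float:
    """Months until recurring savings have paid back the fixed investment."""
    net_monthly_gain = (monthly_hours_saved * hourly_rate
                        - monthly_usage_cost
                        - monthly_verification_cost)
    if net_monthly_gain <= 0:
        return float("inf")  # at these numbers, the system never pays back
    return build_cost / net_monthly_gain

# Hypothetical figures: 40,000 to build, 500 per month usage, 2,000 per month
# verification effort, 60 hours saved per month at a rate of 120 per hour.
print(months_to_break_even(40_000, 500, 2_000, 60, 120))  # about 8.5 months
```

The verification line is the one classical ROI calculations leave out, and it is the one that shrinks, or does not, as mistakes move out of the recurring human layer.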

This is one reason why so many AI initiatives feel disappointing at first. The tradeoffs arrive early. The leverage comes later. And because the field is still so new for many companies, the immediate experience is often one of increased effort rather than relief. That does not mean the economics are bad. It means the break-even point is misunderstood.

The companies that experiment early are not necessarily winning because they are already extracting maximum value. They are winning because they are learning sooner where value actually lives. They are discovering what can be trusted, what still needs verification, what must remain human, and which mistakes can eventually be retired instead of merely managed.

That is not a side effect of experimentation. It is the point of it.

The Real Risk Is Not That AI Fails, But That It Fails Differently

This is also why the discussion becomes much more serious once we move from generative assistance into agentic execution.

Used badly, generative AI may give you a bad paragraph, an ugly slide, an invented reference, or a tone that feels strangely robotic. That is inconvenient and sometimes embarrassing, but usually recoverable. Used badly, agentic AI can do damage in the real world. It can move too early, act on bad assumptions, create a false signal of certainty, or trigger consequences that are much harder to unwind.

Journalism offers some of the clearest recent examples of this failure pattern. AI-assisted writing that was inadequately validated before publication did not just create factual errors. It caused reputational damage. In other professional contexts, undervalidated AI output has led to deliverables so unreliable that they damaged client trust and, in some cases, the commercial relationship itself. I do not need to name names for the pattern to be obvious.

That is why agentic systems should still be considered generally dangerous in the current phase. Not because the technology is inherently malevolent, but because it is still widely misunderstood. GenAI used badly often gives you something you do not like. Agentic AI used badly can harm organizations in ways that have operational, financial, or reputational consequences.

The right response is not fear. It is better onboarding. A new employee is not allowed to operate without context, oversight, and boundaries simply because they are enthusiastic. An agentic system should be treated with at least the same seriousness. Guardrails, observation, staged responsibility, and carefully defined thresholds are not bureaucratic obstacles. They are the conditions under which trust can become rational.
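Staged responsibility can be made just as explicit as an onboarding plan for a new employee. The stages and the spending threshold below are illustrative, not a standard; the point is that the boundary is written down and enforced rather than implied.

```python
from enum import Enum

class Stage(Enum):
    DRAFT_ONLY = 1         # the agent proposes, a human executes
    APPROVE_TO_ACT = 2     # the agent executes, but only after explicit approval
    ACT_WITHIN_LIMITS = 3  # the agent acts autonomously inside hard thresholds

def may_execute(stage: Stage, action_cost: float, approved: bool,
                autonomous_limit: float = 100.0) -> bool:
    """Gate every real-world action on the agent's current stage and defined limits."""
    if stage is Stage.DRAFT_ONLY:
        return False
    if stage is Stage.APPROVE_TO_ACT:
        return approved
    return action_cost <= autonomous_limit  # anything larger still escalates to a human
```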

The Burden of Verification Is the Real ROI Test

This is where many business cases for AI quietly become incoherent.

If I need to approve more output from an AI system than I would from a capable intern, the system is increasing my burden rather than reducing it. In that case, it may still be an impressive demo, but it is not yet economic leverage.

That is why the wrong question is often “Can the AI do it?” The better question is “What does it cost me to trust what it does?” That cost is not just financial. It includes attention, cognitive load, review cycles, approval effort, process redesign, and the emotional cost of uncertainty.

For many organizations, the early phase of AI adoption therefore means more work, not less. Development must happen. Experiments must be structured. Failures must be observed. Evaluation logic must be defined. Human verification remains necessary. The payoff does not come from replacing effort instantly. It comes from eventually moving mistakes out of the recurring human layer and into a correctable systems layer.

That is a longer game than most boardroom narratives suggest. But it is also the more realistic one.

The Organizations That Benefit Will Think About Mistakes Differently

The deeper implication is not that AI is magical. It is that it changes the strategic meaning of error.

In the old world, mistakes were a management problem. You trained people, added controls, improved process discipline, and hoped recurrence would be reduced. In the new world, at least for parts of the system, mistakes can increasingly become an engineering problem. That does not mean all mistakes disappear. It means that some categories of mistakes no longer have to remain part of the normal statistical background of work.

This is why the Tyson observation lands so well outside the context of self-driving cars. We are still emotionally calibrated to a world where human imperfection feels normal and machine imperfection feels intolerable. But as AI systems mature, that instinct becomes less useful. The more important question is no longer who made the mistake. It is whether the system is capable of learning from it in a way that permanently improves future execution.

Most companies still underestimate AI because they think the real challenge is technological. In many cases it is not. The harder challenge is behavioral. Leaders need the patience to tolerate an initial investment before leverage appears. Teams need to develop a new relationship to supervision, trust, and quality. Organizations need to stop treating every machine mistake as a final verdict while continuing to accept recurring human imperfection as the price of doing business.

That mindset will become expensive.

Fix Once, Fix Forever

The optimistic version of all of this is not that AI is already reliable enough to be trusted blindly. It is that we are entering a world in which trust can increasingly be built on the retirement of errors rather than on the management of repetition.

We are not fully there yet, especially not in generative and agentic contexts. The Excel moment of probabilistic systems still lies ahead. But the direction is already visible. The organizations that will benefit most are not the ones waiting for perfect certainty. They are the ones learning early which mistakes belong to humans, which mistakes belong to systems, and which mistakes can be eliminated entirely once the right guardrails and quality logic are in place.

That is a very different future from the one most people imagine when they hear AI discussed in public. It is less about sudden replacement and more about changing the economics of error.

Humans will continue to make mistakes. That is not going away. But if we treat AI seriously enough, some mistakes no longer need to remain part of the system forever. And once that becomes normal, we will have to admit something uncomfortable.

We were never actually good at living with repeated failure.

We were just used to it.
