The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?

Authors: Alexander Hägele¹², Aryo Pradipta Gema¹³, Henry Sleight⁴, Ethan Perez⁵, Jascha Sohl-Dickstein⁵

¹Anthropic Fellows Program ²EPFL ³University of Edinburgh ⁴Constellation ⁵Anthropic

Published: February 2026

Resources: Paper, Code

Research Context: Completed as part of the first Anthropic Fellows Program during Summer 2025.

tl;dr

The authors decompose frontier reasoning model errors into bias and variance components. Their key finding: "as tasks get harder and reasoning gets longer, model failures become increasingly dominated by incoherence rather than systematic misalignment."

Introduction

The paper examines two contrasting failure modes for advanced AI systems: coherent pursuit of misaligned goals versus incoherent, self-undermining behavior. It builds on prior "hot mess theory" research, which found that more intelligent entities are subjectively judged to be less coherent. The central research question asks whether model failures increasingly resemble unsystematic errors as intelligence and task difficulty increase.

Measuring Incoherence: A Bias-Variance Decomposition

The framework uses classical decomposition where:

  • Error = Bias² + Variance
  • Bias represents systematic, consistent errors
  • Variance captures unpredictable, variable errors

Error incoherence is calculated as: Variance / Error, yielding values between 0 (completely systematic) and 1 (entirely random).
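As a rough illustration of this metric (not the authors' implementation), the decomposition can be sketched for repeated scalar predictions against a known target; the function name and toy data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def error_incoherence(samples, target):
    """Split mean squared error into bias^2 + variance and
    return the incoherence ratio variance / error."""
    samples = np.asarray(samples, dtype=float)
    bias_sq = (samples.mean() - target) ** 2   # systematic component
    variance = samples.var()                   # unsystematic component
    error = bias_sq + variance                 # total mean squared error
    return variance / error                    # 0 = systematic, 1 = random

# A consistently wrong model: always answers 7 when the target is 5.
systematic = [7.0] * 100
# A "hot mess": right on average, but scattered.
scattered = 5.0 + rng.normal(0.0, 2.0, size=100)

print(error_incoherence(systematic, 5.0))  # -> 0.0 (purely systematic)
print(error_incoherence(scattered, 5.0))   # near 1.0 (mostly variance)
```

Note that the ratio is undefined when total error is zero; a perfectly accurate model has no failures to classify.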

Key Findings

Finding 1: Longer Reasoning Increases Incoherence

Across all tested tasks and models, extended reasoning sequences produce increasingly incoherent errors, measurable through reasoning tokens, agent actions, or optimizer steps.

Finding 2: Complex Relationship Between Intelligence and Coherence

Results vary by context:

  • Synthetic tasks show increased incoherence with model size
  • Expert surveys indicate larger models behave less coherently
  • Benchmark performance is mixed: on easy tasks, larger models make more coherent errors, while on the hardest tasks incoherence is unchanged or increased

Finding 3: Natural Overthinking Dominates Reasoning Budgets

When models spontaneously reason longer than needed ("overthinking"), incoherence spikes dramatically. By contrast, deliberately allocating a larger reasoning budget produces only modest changes in coherence.

Finding 4: Ensembling Reduces Incoherence

Aggregating multiple samples decreases variance, producing more coherent behavior, though practical limitations exist for irreversible real-world tasks.
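A minimal sketch of why aggregation helps, under the same bias-variance framing (the noisy "model" here is hypothetical): averaging k independent samples shrinks the variance term by roughly 1/k while leaving the bias untouched, so the incoherence ratio drops.

```python
import numpy as np

rng = np.random.default_rng(1)
target = 5.0

# Hypothetical noisy model: biased by +1, with large per-sample noise.
def sample_model(n):
    return target + 1.0 + rng.normal(0.0, 3.0, size=n)

def mse_parts(preds, target):
    """Return (bias^2, variance) of a batch of predictions."""
    return (preds.mean() - target) ** 2, preds.var()

# Single samples vs. ensembles of 16 averaged samples.
singles = sample_model(2000)
ensembles = sample_model(2000 * 16).reshape(2000, 16).mean(axis=1)

for name, preds in [("single", singles), ("ensemble-16", ensembles)]:
    b2, var = mse_parts(preds, target)
    print(f"{name}: incoherence = {var / (b2 + var):.2f}")
```

The ensemble's errors are smaller and, more importantly here, more systematic: a larger share of its remaining error is bias. This also illustrates the practical caveat in the text, since averaging 16 attempts is only possible when each attempt is cheap and reversible.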

Why Expect Incoherence?

The paper argues large transformer models function as dynamical systems rather than optimizers. "Constraining a generic dynamical system to act as a coherent optimizer is extremely difficult." Training models to optimize becomes increasingly challenging without guaranteed scaling benefits.

The Synthetic Optimizer Experiment

Researchers trained transformers to predict steepest descent optimization steps:

  • Larger models reduce bias faster than variance
  • Incoherence grows with trajectory length
  • Models learn correct objectives before learning reliable pursuit
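The trajectory-length effect can be reproduced in a toy simulation (my sketch, not the paper's experiment): an imperfect optimizer on a quadratic whose per-step error is modeled as small additive noise. Far from the optimum, error is dominated by systematic bias toward the starting point; near the optimum, accumulated noise dominates, so incoherence rises along the trajectory.

```python
import numpy as np

rng = np.random.default_rng(2)

# Quadratic objective f(x) = 0.5 * x^2; steepest descent step is -lr * x.
lr = 0.1
steps, rollouts = 50, 500

# Hypothetical imperfect learned optimizer: the true descent direction
# plus small per-step prediction noise (standing in for model error).
x = np.ones(rollouts)
target = 0.0  # minimizer of the quadratic
incoherence = []
for t in range(steps):
    noise = rng.normal(0.0, 0.02, size=rollouts)
    x = x - lr * x + noise  # noisy approximation of the exact step
    bias_sq = (x.mean() - target) ** 2
    var = x.var()
    incoherence.append(var / (bias_sq + var))

# Early errors are mostly systematic (still far from the optimum);
# later, residual error is dominated by accumulated noise.
print(f"step 1:  incoherence = {incoherence[0]:.3f}")
print(f"step 50: incoherence = {incoherence[-1]:.3f}")
```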

Implications for AI Safety

  1. Variance dominates on extended tasks - difficult problems show predominantly incoherent failures
  2. Scale doesn't guarantee coherent errors - larger models improve accuracy without reducing incoherence
  3. Reward hacking becomes relatively more important - if AI behaves as a hot mess rather than a coherent optimizer of the wrong goal, training-time goal specification matters more
  4. Framework limitations - reliable measurement requires well-defined targets, limiting assessment of open-ended or hidden objectives

Conclusion

Using bias-variance analysis, the researchers found that longer reasoning consistently increases error incoherence and that greater intelligence does not guarantee more coherent failures. They argue these findings should inform AI risk discussions and the prioritization of safety research.

Acknowledgements

The team thanks Andrew Saxe, Brian Cheung, Kit Frasier-Taliente, Igor Shilov, Stewart Slocum, Aidan Ewart, David Duvenaud, and Tom Adamczewski for contributions.