Why Earth System Models Fail in Practice

Earth system models (ESMs) are often evaluated on their ability to reproduce historical climate statistics or large-scale mean states. In practice, however, their most important role is not hindcasting but providing probabilistic structure for assessing future risk.

This creates a fundamental tension: models can be “good” in a climatological sense while still being structurally weak in the variables that matter for prediction under change.

Three recurring failure modes stand out:

1. Structural incompleteness

Many key processes are either parameterised or absent altogether, especially in land surface and disturbance systems.

These are not second-order effects; they shape system-level feedbacks.

2. Emergent behaviour without constraint

ESMs often produce emergent dynamics that are not directly constrained by observations. This is especially true in coupled carbon–climate feedbacks, where compensating errors can yield plausible global fluxes for the wrong reasons.
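The compensating-error problem can be made concrete with a toy calculation. The sketch below is illustrative only: the component names (`uptake`, `release`) and all numbers are hypothetical, in arbitrary units, and are not taken from any model or dataset. It shows a net flux that matches a reference exactly while both underlying components carry the same large bias.

```python
# Hypothetical component fluxes in arbitrary units (illustrative, not real data).
reference = {"uptake": 120.0, "release": 117.0}
model = {"uptake": 130.0, "release": 127.0}


def net_flux(components):
    """Net flux as uptake minus release."""
    return components["uptake"] - components["release"]


def component_errors(model_components, reference_components):
    """Per-component bias (model minus reference)."""
    return {
        name: model_components[name] - reference_components[name]
        for name in reference_components
    }


# The net flux agrees with the reference...
net_error = net_flux(model) - net_flux(reference)

# ...but each component is biased by the same amount, so the errors cancel.
errors = component_errors(model, reference)

print("net flux error:", net_error)
print("component errors:", errors)
```

A benchmark that scores only the net flux would rate this toy model as perfect. Scoring the components separately exposes the cancellation, which is one reason component-level constraints matter for feedback behaviour.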

3. Evaluation mismatch

Most benchmarking focuses on present-day climatology. Predictive skill, however, depends on the transient response to forcing trajectories, which is a fundamentally different target.

The result is a paradox: models can appear skillful in evaluation metrics while still being poorly calibrated for forward prediction.
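A minimal sketch of this paradox, assuming a deliberately simple linear response to forcing: two hypothetical models (`model_a`, `model_b`) share the same reference state and therefore score identically against a climatology benchmark, yet they diverge once forcing departs from reference conditions. The function `toy_response`, the parameter values, and the units are all assumptions made for illustration.

```python
def toy_response(reference_state, sensitivity, forcing, reference_forcing=0.0):
    """Linear toy response: state changes in proportion to the forcing anomaly."""
    return reference_state + sensitivity * (forcing - reference_forcing)


# Two hypothetical models: same reference state, different sensitivity.
model_a = {"reference_state": 10.0, "sensitivity": 0.5}
model_b = {"reference_state": 10.0, "sensitivity": 1.0}

observed_climatology = 10.0  # hypothetical benchmark value, arbitrary units
future_forcing = 4.0         # hypothetical forcing anomaly, arbitrary units


def climatology_bias(model):
    """Bias against the benchmark, evaluated at reference forcing."""
    return toy_response(**model, forcing=0.0) - observed_climatology


bias_a = climatology_bias(model_a)
bias_b = climatology_bias(model_b)

# Under forcing, the projections separate even though the biases are identical.
projection_a = toy_response(**model_a, forcing=future_forcing)
projection_b = toy_response(**model_b, forcing=future_forcing)
projection_spread = projection_b - projection_a

print("climatology bias:", bias_a, bias_b)
print("projections under forcing:", projection_a, projection_b)
```

The climatology metric cannot distinguish the two models because it never probes the sensitivity term. Only an evaluation that exercises the forced response would separate them.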

A more honest framing is that ESMs are not predictive instruments in a strict sense, but evolving hypothesis spaces for Earth system behaviour.

The question then becomes not whether they are “right”, but how reliably they structure uncertainty about future states.
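One way to make "reliably structuring uncertainty" measurable is to check how often verifying observations fall inside the ensemble envelope, and to compare that with the rate expected if the observation were statistically exchangeable with the members. For an ensemble of n members that expected rate is (n - 1) / (n + 1). The sketch below assumes this simple envelope test; the function names and the toy numbers are hypothetical and chosen only to illustrate the calculation.

```python
def envelope_coverage(ensembles, observations):
    """Fraction of observations lying within the min-max range of their ensemble."""
    inside = sum(
        min(members) <= obs <= max(members)
        for members, obs in zip(ensembles, observations)
    )
    return inside / len(observations)


def expected_coverage(n_members):
    """Coverage expected if the observation is exchangeable with the members."""
    return (n_members - 1) / (n_members + 1)


# Hypothetical 4-member ensembles for four cases, with made-up verifying values.
ensembles = [
    [1.0, 1.2, 1.4, 1.6],
    [2.0, 2.1, 2.3, 2.2],
    [0.5, 0.9, 0.7, 0.6],
    [3.0, 3.4, 3.1, 3.2],
]
observations = [1.9, 2.2, 1.0, 3.3]

coverage = envelope_coverage(ensembles, observations)
target = expected_coverage(4)

print("empirical coverage:", coverage)
print("expected coverage:", target)
```

Coverage well below the expected rate suggests an overconfident (too narrow) ensemble, and coverage well above it suggests an overdispersive one. Four cases are far too few to draw that conclusion here; a real assessment would need many independent verification cases.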