Goodhart’s Law

“When a measure becomes a target, it ceases to be a good measure.” — popularised form (Marilyn Strathern, 1997), after Charles Goodhart (1975)

The Claim

A metric that correlates with a desired outcome tends to lose that correlation once it becomes a target. Optimising for the measure decouples it from the thing it was supposed to indicate.

The short form: you get what you measure, not what you wanted.

Why It Holds

Any metric is a proxy. Before it is a target, behaviour is generated by whatever actually causes good outcomes, and the metric incidentally reflects that. Once the metric is a target, behaviour is generated to move the metric — and the cheapest way to move a metric is rarely the thing that produced the correlation.

The result is a predictable drift:

  1. Metric is correlated with outcome. (Both flow from good underlying work.)
  2. Metric is promoted to target. Rewards attached.
  3. Behaviour shifts toward moving the metric directly.
  4. The cheap paths to moving the metric (gaming, narrow optimisation, lying) don’t produce the underlying outcome.
  5. The metric goes up; the outcome doesn’t. The correlation breaks.
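
The drift can be made concrete with a toy simulation. This is a minimal sketch under invented assumptions, not a model taken from the sources: each hypothetical agent splits one unit of effort between real work and gaming, `GAMING_YIELD` sets how much faster gaming moves the metric than real work does, and `inclination` is a randomly drawn share of effort that goes to gaming once the metric is targeted.

```python
import random

# Assumption: one unit of gaming effort moves the metric more than
# one unit of real work does (real work yields at most 1.5 here).
GAMING_YIELD = 3.0


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean(xs):
    return sum(xs) / len(xs)


def simulate(targeted, n_agents=200, seed=0):
    """Return (metrics, outcomes) for a population of hypothetical agents."""
    rng = random.Random(seed)
    metrics, outcomes = [], []
    for _ in range(n_agents):
        skill = rng.uniform(0.5, 1.5)        # value produced per unit of real work
        inclination = rng.uniform(0.0, 1.0)  # share of effort the agent would game
        # Steps 1-2: with no reward attached, nobody bothers to game.
        # Step 3: once targeted, effort shifts toward moving the number.
        gaming_share = inclination if targeted else 0.0
        outcome = skill * (1.0 - gaming_share)          # only real work counts
        metric = outcome + GAMING_YIELD * gaming_share  # gaming inflates the metric
        metrics.append(metric)
        outcomes.append(outcome)
    return metrics, outcomes


if __name__ == "__main__":
    for targeted in (False, True):
        metrics, outcomes = simulate(targeted)
        print(
            f"targeted={targeted}: mean metric={mean(metrics):.2f}, "
            f"mean outcome={mean(outcomes):.2f}, "
            f"correlation={pearson(metrics, outcomes):.2f}"
        )
```

Under these assumptions the targeted run reports a higher mean metric, a lower mean outcome, and a correlation that has fallen from essentially 1 to below zero, which is steps 4 and 5 in miniature. The sizes of the effects depend entirely on the invented parameters.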

The Software-Engineering Version

Every metric you might plausibly grade engineers on has a textbook failure mode:

Metric                      | How it breaks
Lines of code               | Verbosity, copy-paste, refactoring aversion.
Commits per week            | Trivial commits, reverted-to-unreverted churn.
Tickets closed              | Small easy tickets preferred; hard ones left to rot.
Unit test count             | Trivial tautological tests; real assertions weakened.
Code coverage %             | Tests that exercise code without checking behaviour.
Code review turnaround      | Rubber-stamp approvals.
Bug count (lower is better) | Reclassifying bugs as “feature requests.”
On-call incidents           | Raising the threshold for “incident”; suppressing alerts.

The pattern is universal: every metric generates a cheap strategy that moves the number without moving the underlying quality.
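
The code-coverage failure mode is easy to demonstrate. The sketch below uses a hypothetical `apply_discount` function with a deliberate bug and two hand-rolled tests; all names are invented for illustration. `coverage_only_test` executes every line and both branches, so a coverage tool would count the function as fully covered, yet it asserts nothing. `behavioural_test` checks the result.

```python
def apply_discount(price, percent):
    """Hypothetical function under test, with a deliberate bug."""
    if percent > 100:
        raise ValueError("percent must be at most 100")
    return price - percent  # bug: should be price * (100 - percent) / 100


def coverage_only_test():
    # Runs both branches of apply_discount but never checks a result.
    apply_discount(200, 10)
    try:
        apply_discount(200, 150)
    except ValueError:
        pass


def behavioural_test():
    # Checks what the caller actually cares about: 10% off 200 is 180.
    assert apply_discount(200, 10) == 180


def passes(test):
    """Return True if the test function runs without an AssertionError."""
    try:
        test()
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    print("coverage_only_test passes:", passes(coverage_only_test))
    print("behavioural_test passes:", passes(behavioural_test))
```

The coverage-only test passes and the behavioural test fails, even though both exercise exactly the same lines. A team graded on coverage percentage is rewarded equally for either.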

In This Wiki

  • Directly a principal-agent-problem. Agents optimise the measured outcome; principals want the underlying outcome. The gap between the two is the space in which Goodhart operates. Naval’s framing of principal-agent aligns exactly.
  • A flavour of reinforcement. You are training the agent on the reward signal you provide. If the reward signal is a proxy, you will get behaviour that maximises the proxy. This is Skinner’s operant conditioning on an organisational scale.
  • Connects to goal-gradient-effect. Proximity to a quantified target accelerates effort near the threshold — but it’s effort toward the number, not the outcome. Quarterly sales targets, engineering OKRs, and weight-loss apps all exhibit the gradient acceleration and the gaming.
  • Explains why nudge-theory needs care. Choice architecture that targets an easily-gameable measure breaks the same way. Thaler’s school-cafeteria nudges worked because “fruit consumption” is hard to game; test-score nudges have the opposite record.
  • Connects to wysiati. If the only thing visible to decision-makers is the metric, the metric is the world. Goodhart is WYSIATI weaponised by incentive design.
  • Antidote from first-principles-thinking. Ask what you actually want, not what you can cheaply measure. The right answer is usually “multiple independent signals, weighted by judgment” rather than a single KPI.

Campbell’s Law

Donald Campbell (1976) stated a nearly identical principle in social science: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” Campbell’s formulation arose in the social sciences; Goodhart’s in monetary policy. They have since merged in popular usage.

Taleb’s Corollary

Nassim Taleb’s version: any metric reported by a party who benefits from gaming it is uninformative. The implication is that you should trust only metrics reported by adversaries or gathered by external measurement.

The Gilb Counter-Weight

Gilb’s Law (“anything that needs to be quantified can be measured in some way better than not measuring it”) is the optimistic complement. Gilb + Goodhart together: measure, but don’t target; or target multiple independent measures so no single one becomes cheap to game.

Practical Defences

  1. Don’t grade on a single metric. Multiple correlated signals force behaviour into the region where all of them move together — closer to the underlying outcome.
  2. Rotate metrics. Metrics lose signal once optimised. Change them before they’re gamed.
  3. Measure outputs, not inputs. Lines of code is an input; user-visible quality is an output. Outputs are harder to fake.
  4. Use judgment. The best defence is one Naval, Munger, and Goodhart would all endorse: small numbers of high-quality evaluators forming considered opinions. Expensive, unscalable, works.
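
One way to read defences 1 and 4 together is sketched below. It is an illustration under assumptions, not a recommended scoring scheme: the signal names, the 0-to-1 normalisation, the `min` aggregation, and the `threshold` value are all hypothetical. Scoring by the weakest signal means inflating a single number does not help, and a large spread between signals is handed to a human evaluator rather than resolved by formula.

```python
def composite_score(signals):
    """Score by the weakest normalised signal (each value in 0..1)."""
    return min(signals.values())


def needs_human_review(signals, threshold=0.3):
    """Flag cases where one signal has run far ahead of the others."""
    values = list(signals.values())
    return max(values) - min(values) > threshold


# Hypothetical normalised signals; higher is better for all three.
honest = {"coverage": 0.70, "defect_free_rate": 0.65, "review_depth": 0.60}
gamed = {"coverage": 0.99, "defect_free_rate": 0.65, "review_depth": 0.60}

if __name__ == "__main__":
    for name, signals in (("honest", honest), ("gamed", gamed)):
        print(name, composite_score(signals), needs_human_review(signals))
```

Pushing coverage from 0.70 to 0.99 leaves the composite score unchanged and trips the review flag. The aggregation itself can still be gamed if it becomes the sole target, which is why the flag hands off to judgment instead of adjusting the score.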

Sources

  • source—laws-of-software-engineering — in the Planning cluster.
  • Charles Goodhart, “Problems of Monetary Management: the UK Experience” (1975) — original.
  • Marilyn Strathern (1997) — popularised “when a measure becomes a target” phrasing.
  • Donald Campbell (1976) — parallel formulation in social science.