Murphy’s Law

“Anything that can go wrong, will go wrong.”

The Claim

At sufficient scale, every possible failure mode occurs. If a failure is possible, it is not a question of whether it will happen, only of when and in what combination with others.

Why It Holds

Three compounding mechanisms:

  1. Combinatorial exposure. A system that runs a trillion operations per year will hit any one-in-a-billion failure roughly a thousand times. What is “rare” in a developer’s head is common at production scale.
  2. Correlated failures. Murphy is stronger than independence would predict, because failures often share a cause. A power outage takes down the primary and its “independent” backup. A bad deployment breaks all regions at once. Failures cluster; they do not arrive as independent Poisson events.
  3. Murphy arrives at the worst time. There is a real statistical effect behind the folk version: a failure is more likely to occur, and more likely to be noticed, during a high-stakes moment than during routine operation, because high-stakes moments involve more load, more components, and more attention. “Any system under observation fails during the demo” is folklore, but folklore with a plausible mechanism behind it.
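
The arithmetic behind the first two mechanisms can be sketched in a few lines of Python. The function names and the shared-cause model are illustrative assumptions rather than anything defined in this wiki: operations are treated as independent for the exposure figures, and the correlated case assumes a single shared cause that takes out both the primary and the backup.

```python
import math


def expected_failures(operations: float, p_failure: float) -> float:
    """Expected number of failures across independent operations (n * p)."""
    return operations * p_failure


def prob_at_least_one(operations: float, p_failure: float) -> float:
    """P(at least one failure) = 1 - (1 - p)^n, computed in a numerically stable way."""
    return -math.expm1(operations * math.log1p(-p_failure))


def prob_both_down(p_each: float, p_shared: float) -> float:
    """P(primary and backup are both down).

    Assumed model: with probability p_shared a common cause (for example a
    power outage) takes out both; otherwise each fails independently with
    probability p_each.
    """
    return p_shared + (1.0 - p_shared) * p_each * p_each


# A trillion operations at one-in-a-billion odds: roughly 1000 failures.
print(expected_failures(1e12, 1e-9))

# Even a billion operations makes at least one failure more likely than not.
print(prob_at_least_one(1e9, 1e-9))

# Independent replicas versus replicas that share a 1-in-10,000 common cause.
print(prob_both_down(1e-3, 0.0))
print(prob_both_down(1e-3, 1e-4))
```

In this model, a shared cause with probability 1e-4 dominates the independent term of 1e-6, so the "redundant" pair is about a hundred times less reliable than the independence assumption suggests.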

The Finagle / Sod Strengthening

A stronger version, sometimes called Finagle’s Law or Sod’s Law: anything that can go wrong will go wrong, at the worst possible time. This adds an adversarial twist: failures are not randomly distributed but clustered around moments of maximum consequence. The engineering interpretation is not superstition; it is a point about conditional probability. Conditional on you caring about the outcome (demo, launch, on-call incident), the probability that a failure is detected rises, because more people are watching more things more carefully.
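
A small worked example makes the conditional-probability point concrete. Every number below is invented for illustration, and the function name is hypothetical: it assumes faults occur at some hourly rate and are noticed with some probability, both of which are higher during high-stakes hours.

```python
def observed_failure_share(hours_routine: float, hours_high_stakes: float,
                           fault_rate_routine: float, fault_rate_high_stakes: float,
                           p_detect_routine: float, p_detect_high_stakes: float) -> float:
    """Fraction of *observed* failures that fall in high-stakes hours."""
    seen_routine = hours_routine * fault_rate_routine * p_detect_routine
    seen_high = hours_high_stakes * fault_rate_high_stakes * p_detect_high_stakes
    return seen_high / (seen_routine + seen_high)


# Illustrative week: 160 routine hours and 8 high-stakes hours.
# Assumed: load triples the fault rate, and scrutiny lifts detection from 20% to 90%.
share = observed_failure_share(
    hours_routine=160, hours_high_stakes=8,
    fault_rate_routine=0.1, fault_rate_high_stakes=0.3,
    p_detect_routine=0.2, p_detect_high_stakes=0.9,
)
time_share = 8 / (160 + 8)
print(f"high-stakes share of time: {time_share:.1%}")
print(f"high-stakes share of observed failures: {share:.1%}")
```

With these assumed inputs, high-stakes hours are under 5% of the week but account for about 40% of the failures anyone actually sees, which is the "worst possible time" effect without any adversary.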

In This Wiki

  • The counterpart of Postel’s Law. Murphy tells you failures will happen; Postel tells you how to be robust when they do. “Be conservative in what you send, be liberal in what you accept” is Postel’s engineering answer to Murphy.
  • Connects to law-of-unintended-consequences. Both are about the surprise-space of complex systems. Murphy is the retail version; unintended consequences is the wholesale version (not just individual failures but emergent ones).
  • Justifies the testing-pyramid. You cannot predict which paths will fail; you can only make finding out cheap by running tests that cover the space.
  • Underpins bus-factor. Murphy’s Law applied to people: the key engineer will get hit by a bus, because the population of “key engineers” is large enough that even a single-digit annual percentage chance per person hits someone every year.
  • A counterweight to Sturgeon’s Law. Sturgeon says 90% of everything is crap; Murphy says the other 10% will eventually go crap too, given time and scale.
  • Relates to broken-windows-theory. Small failures, left unrepaired, signal tolerance and invite larger ones. Murphy guarantees they will arrive.

Practical Corollaries in Software

  1. Assume every network call can fail. Retry, timeout, circuit-break.
  2. Assume every disk write can be partial. Atomicity is a property you must design, not assume.
  3. Assume every user input is hostile. Defence-in-depth, even on “internal” endpoints.
  4. Rehearse failures. Chaos engineering (Netflix’s Chaos Monkey, Google’s DiRT exercises) makes Murphy a scheduled event rather than a surprise. This is inversion applied to reliability: instead of hoping failures don’t happen, deliberately cause them.
  5. Runbooks for the worst case. Every dependency your service has will be unavailable at some point; every storage system will be corrupted at some point; every key will be leaked at some point. Design the response before the event.
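
Corollaries 1 and 2 can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a production pattern: `call_with_retries` and `atomic_write` are hypothetical helper names, the wrapped operation is assumed to enforce its own timeout, and a circuit breaker (which would stop calling a dependency after repeated failures) is left out for brevity.

```python
import os
import random
import tempfile
import time


def call_with_retries(operation, attempts=4, base_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient errors with capped exponential
    backoff and random jitter. Re-raises the last error when attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter avoids synchronized retries


def atomic_write(path, data: bytes) -> None:
    """Replace the file at path so readers see the old or the new content,
    never a partial write: write a temp file in the same directory, fsync it,
    then rename it over the target with os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The temp file is created in the target's directory because a rename is only atomic within one filesystem. Surviving a power loss may additionally require an fsync on the containing directory, which this sketch omits.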

The Optimist’s Reading

Murphy’s Law is often read as pessimistic folklore. Its engineering value is the opposite: it demands that you treat rare events as inevitable and design for them in advance. The people who take Murphy seriously build systems that survive the events that sink the people who don’t.

Sources

  • source—laws-of-software-engineering — in the Quality cluster.
  • Named after Edward A. Murphy Jr., a USAF engineer involved in the 1949 “Gee Whiz” rocket-sled deceleration tests. The aphorism pre-dates Murphy’s naming; the engineering-design reading was codified by the US Air Force’s reliability programmes.