Probability Theory
The mathematical framework for reasoning about uncertainty. Built from three axioms (Kolmogorov, 1933), probability theory provides the rigorous foundation for statistics, decision-making, risk assessment, and much of modern science.
The Axiomatic Foundation
Probability is defined on a sample space S (all possible outcomes) and assigns a number P(A) to each event A (subset of S). Three axioms govern everything:
- Non-negativity: P(A) ≥ 0
- Normalization: P(S) = 1
- Additivity: P(A ∪ B) = P(A) + P(B) when A ∩ B = ∅ (Kolmogorov's full axiom extends this to countably many pairwise disjoint events)
From these three rules, all of probability follows — complements (P(A′) = 1 − P(A)), inclusion-exclusion, conditional probability, independence, distributions, the law of large numbers.
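These derived rules can be checked concretely on a finite sample space. A minimal sketch in Python, using one roll of a fair die (the events A and B are illustrative choices, not from the source):

```python
from fractions import Fraction

# Sample space for one roll of a fair die; every outcome equally likely.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # roll is even
B = {1, 2, 3}   # roll is at most three

assert P(S) == 1                            # normalization
assert all(P({w}) >= 0 for w in S)          # non-negativity
assert P(S - A) == 1 - P(A)                 # complement rule, derived from the axioms
assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion (| is set union here)
```

Exact rational arithmetic via `Fraction` keeps the equalities literal rather than approximate.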
This axiomatic approach is itself a powerful example of first principles thinking: rather than asking “what is probability really?”, Kolmogorov said “here are the rules it must obey” and built the entire theory from there.
Core Concepts
Conditional Probability
P(A | B) = P(A ∩ B) / P(B) — the probability of A given that B has occurred.
This leads to two foundational results:
- Theorem of Total Probability: break a problem into exhaustive cases Bᵢ, compute each, and sum: P(A) = Σᵢ P(A | Bᵢ) P(Bᵢ)
- Bayes’ Theorem: P(A | B) = P(B | A) · P(A) / P(B) — how to update beliefs when new evidence arrives
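Both results can be exercised on one small worked example. The factory below is hypothetical, with numbers chosen only to make the arithmetic visible:

```python
from fractions import Fraction

# Hypothetical factory: machine 1 makes 60% of items, machine 2 makes 40%,
# with assumed defect rates of 1% and 5% respectively.
prior = {"m1": Fraction(60, 100), "m2": Fraction(40, 100)}
p_def = {"m1": Fraction(1, 100), "m2": Fraction(5, 100)}

# Theorem of Total Probability: P(D) = sum of P(D | m) * P(m) over the machines.
p_defective = sum(p_def[m] * prior[m] for m in prior)   # 13/500 = 0.026

# Bayes' Theorem: P(m | D) = P(D | m) * P(m) / P(D).
posterior = {m: p_def[m] * prior[m] / p_defective for m in prior}
# posterior["m2"] = 10/13: a defective item most likely came from machine 2,
# even though machine 2 produces the minority of items.
```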
Independence
Events A and B are independent if P(A ∩ B) = P(A) · P(B). For random variables, independence means the joint distribution factors into the product of marginals.
Independence is the key simplifying assumption in probability. When things are independent, variances add, expectations of products factor, and complex systems become tractable.
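The factoring condition can be verified by enumeration on two fair dice; the events below are illustrative:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, all 36 equally likely.
S = list(product(range(1, 7), repeat=2))

def P(pred):
    """Probability of the event {w in S : pred(w)}."""
    return Fraction(sum(1 for w in S if pred(w)), len(S))

def first_even(w):  return w[0] % 2 == 0
def second_six(w):  return w[1] == 6

# Events determined by different dice: the joint probability factors.
assert P(lambda w: first_even(w) and second_six(w)) == P(first_even) * P(second_six)

def small_first(w): return w[0] <= 2
def small_sum(w):   return w[0] + w[1] <= 4

# Events that share information about the same die are generally dependent.
assert P(lambda w: small_first(w) and small_sum(w)) != P(small_first) * P(small_sum)
```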
Random Variables and Distributions
A random variable maps outcomes to numbers. Its behavior is captured by:
- PMF (discrete): P(X = x) for each value x
- PDF (continuous): f(x) where P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
- Expected value: E(X) — the long-run average
- Variance: Var(X) = E(X²) − [E(X)]² — how spread out the values are
Key distributions include Bernoulli, Binomial, Poisson, Normal, and Exponential — each modeling a different type of random phenomenon.
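For a discrete variable, the expected value and variance fall straight out of the PMF. A sketch for a fair die:

```python
from fractions import Fraction

# PMF of a fair die: X takes each value 1..6 with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

E = sum(x * p for x, p in pmf.items())         # E(X) = 7/2
E_sq = sum(x * x * p for x, p in pmf.items())  # E(X^2) = 91/6
Var = E_sq - E * E                             # Var(X) = E(X^2) - [E(X)]^2 = 35/12
```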
Covariance and Correlation
- Covariance measures how two variables move together: Cov(X,Y) = E(XY) − E(X)E(Y)
- Correlation normalizes covariance to [−1, 1]: a unit-free measure of linear association
- Independent variables have zero covariance, but zero covariance does not guarantee independence
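The last point has a classic counterexample: take X uniform on {−1, 0, 1} and Y = X². Y is completely determined by X, yet their covariance is zero:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X.
support = [-1, 0, 1]
p = Fraction(1, 3)                          # P(X = x) for each x in the support

E_X  = sum(x * p for x in support)          # 0 by symmetry
E_Y  = sum(x * x * p for x in support)      # E(X^2) = 2/3
E_XY = sum(x * x * x * p for x in support)  # E(X^3) = 0 by symmetry

cov = E_XY - E_X * E_Y
assert cov == 0
# Yet P(X=1, Y=0) = 0 while P(X=1) * P(Y=0) = 1/9, so X and Y are dependent.
```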
Why It Matters Beyond Mathematics
As a Mental Model
Charlie Munger places probability — specifically “the elementary math of permutations and combinations” originating from Fermat and Pascal — among his most important mental models. The ability to think probabilistically about outcomes, expected values, and base rates is foundational to good judgment.
“If you don’t get this elementary, but mildly unnatural, mathematics of elementary probability into your repertoire, then you go through a long life like a one-legged man in an ass-kicking contest.” — Charlie Munger
Decision-Making Under Uncertainty
Naval Ravikant’s concept of judgment — “knowing the long-term consequences of your actions” — is fundamentally probabilistic. Good judgment means weighing uncertain outcomes by their probabilities and magnitudes (expected value), not just considering the most likely outcome.
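As a toy illustration of why the most likely outcome is not enough (the numbers are invented):

```python
# Hypothetical bet: 90% chance to win $10, 10% chance to lose $200.
outcomes = [(0.9, 10), (0.1, -200)]

# Expected value weighs every outcome by both probability and magnitude.
ev = sum(p * v for p, v in outcomes)
# Winning is the most likely outcome, yet ev = -11.0: a bet to decline.
```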
Cognitive Biases as Probability Errors
Many cognitive biases documented in behavioral-psychology are systematic errors in probabilistic reasoning:
- Base rate neglect: Ignoring prior probabilities when evaluating evidence (the clinical test problem — see source—notes-on-probability)
- anchoring-bias: Failing to properly update from a prior
- Conjunction fallacy: Judging P(A ∩ B) > P(A), violating basic axioms
- Gambler’s fallacy: Assuming independent events are dependent
- loss-aversion: Weighting losses more heavily than equivalent gains when evaluating uncertain outcomes
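The clinical test problem can be run through Bayes' Theorem directly. The numbers below are illustrative, not taken from the source notes:

```python
from fractions import Fraction

# Illustrative clinical test: 1% prevalence, 95% sensitivity, 95% specificity.
prior = Fraction(1, 100)            # P(disease)
sensitivity = Fraction(95, 100)     # P(positive | disease)
false_positive = Fraction(5, 100)   # P(positive | no disease)

p_positive = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / p_positive   # P(disease | positive)

# posterior = 19/118, about 0.16. Base rate neglect expects something near
# 0.95, because it ignores how rare the disease is to begin with.
```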
Munger’s “Psychology of Human Misjudgment” is essentially a catalog of ways humans fail at probabilistic reasoning, making probability theory both the diagnosis and the cure.
The Bayesian Worldview
Bayes’ Theorem extends beyond a mathematical formula into a way of thinking: start with a prior belief, update it as evidence arrives, and let the math tell you how much to update. This connects to:
- fallibilism — all knowledge is conjectural and subject to revision (Bayesian updating is the mathematical formalization of this)
- first-principles-thinking — base your reasoning on evidence and logic, not received wisdom
- high-agency — the willingness to update beliefs and act on better information
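The update loop is mechanical: each posterior becomes the next prior. A sketch with an invented coin, comparing a biased hypothesis (P(heads) = 0.8) against a fair one:

```python
def update(prior_biased, heads):
    """One Bayes step: update P(biased) after observing a single flip."""
    like_biased = 0.8 if heads else 0.2    # P(observation | biased coin)
    like_fair = 0.5                        # P(observation | fair coin)
    evidence = like_biased * prior_biased + like_fair * (1 - prior_biased)
    return like_biased * prior_biased / evidence

belief = 0.5                               # start undecided between the hypotheses
for flip in [True, True, True, False, True]:
    belief = update(belief, flip)          # yesterday's posterior, today's prior
# belief is now about 0.72: four heads in five flips favor the biased coin,
# and the math, not intuition, dictates by exactly how much.
```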
Historical Origins
Probability theory emerged from the Fermat-Pascal correspondence (1654), where Blaise Pascal and Pierre de Fermat solved the “Problem of Points” — how to fairly divide stakes in an interrupted gambling game. This practical problem gave birth to an entire mathematical discipline.
Key milestones:
- 1654: Fermat-Pascal correspondence (combinatorial probability)
- 1713: Bernoulli’s Ars Conjectandi (law of large numbers)
- 1763: Bayes’ Theorem published posthumously
- 1812: Laplace’s Théorie analytique des probabilités (systematic treatment)
- 1933: Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung (axiomatic foundation)
Connections
- bayes-theorem — The most practically important result; connects probability to epistemology
- mental-models — Probability as one of Munger’s foundational models
- judgment — Probabilistic reasoning as the basis of good judgment
- behavioral-psychology — Cognitive biases as systematic probability errors
- first-principles-thinking — Kolmogorov’s axiomatic approach as first-principles reasoning
- fallibilism — Bayesian updating as mathematical fallibilism
- large-language-models — Built on probability distributions over token sequences
- transformer-architecture — Attention weights are softmax-normalized, forming a probability distribution over positions
Sources
- source—notes-on-probability — Peter J. Cameron’s lecture notes (primary mathematical source)
- source—poor-charlies-almanack — Munger on probability as a foundational mental model