Probability Theory
The mathematical framework for reasoning about uncertainty. Built from three axioms (Kolmogorov, 1933), probability theory provides the rigorous foundation for statistics, decision-making, risk assessment, and much of modern science.
The Axiomatic Foundation
Probability is defined on a sample space S (all possible outcomes) and assigns a number P(A) to each event A (subset of S). Three axioms govern everything:
- Non-negativity: P(A) ≥ 0
- Normalization: P(S) = 1
- Additivity: P(A ∪ B) = P(A) + P(B) when A ∩ B = ∅ (Kolmogorov's full axiom extends this to countably many pairwise disjoint events)
From these three rules, all of probability follows — complements (P(A′) = 1 − P(A)), inclusion-exclusion, conditional probability, independence, distributions, the law of large numbers.
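These derived rules can be checked concretely on a finite sample space. A minimal sketch in Python, using one roll of a fair die (the events A and B are illustrative choices, not from the source):

```python
from fractions import Fraction

# Sample space for one roll of a fair die; every outcome equally likely.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # roll is even
B = {1, 2, 3}   # roll is at most three

assert P(S) == 1                            # normalization
assert all(P({w}) >= 0 for w in S)          # non-negativity
assert P(S - A) == 1 - P(A)                 # complement rule, derived from the axioms
assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion (| is set union here)
```

Exact rational arithmetic via `Fraction` keeps the equalities literal rather than approximate.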
This axiomatic approach is itself a powerful example of first principles thinking: rather than asking “what is probability really?”, Kolmogorov said “here are the rules it must obey” and built the entire theory from there.
Core Concepts
Conditional Probability
P(A | B) = P(A ∩ B) / P(B) — the probability of A given that B has occurred.
This leads to two foundational results:
- Theorem of Total Probability: break a problem into exhaustive cases Bᵢ, compute each, and sum: P(A) = Σᵢ P(A | Bᵢ) P(Bᵢ)
- Bayes’ Theorem: P(A | B) = P(B | A) · P(A) / P(B) — how to update beliefs when new evidence arrives
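Both results can be exercised on one small worked example. The factory below is hypothetical, with numbers chosen only to make the arithmetic visible:

```python
from fractions import Fraction

# Hypothetical factory: machine 1 makes 60% of items, machine 2 makes 40%,
# with assumed defect rates of 1% and 5% respectively.
prior = {"m1": Fraction(60, 100), "m2": Fraction(40, 100)}
p_def = {"m1": Fraction(1, 100), "m2": Fraction(5, 100)}

# Theorem of Total Probability: P(D) = sum of P(D | m) * P(m) over the machines.
p_defective = sum(p_def[m] * prior[m] for m in prior)   # 13/500 = 0.026

# Bayes' Theorem: P(m | D) = P(D | m) * P(m) / P(D).
posterior = {m: p_def[m] * prior[m] / p_defective for m in prior}
# posterior["m2"] = 10/13: a defective item most likely came from machine 2,
# even though machine 2 produces the minority of items.
```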
Independence
Events A and B are independent if P(A ∩ B) = P(A) · P(B). For random variables, independence means the joint distribution factors into the product of marginals.
Independence is the key simplifying assumption in probability. When things are independent, variances add, expectations of products factor, and complex systems become tractable.
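The factoring condition can be verified by enumeration on two fair dice; the events below are illustrative:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, all 36 equally likely.
S = list(product(range(1, 7), repeat=2))

def P(pred):
    """Probability of the event {w in S : pred(w)}."""
    return Fraction(sum(1 for w in S if pred(w)), len(S))

def first_even(w):  return w[0] % 2 == 0
def second_six(w):  return w[1] == 6

# Events determined by different dice: the joint probability factors.
assert P(lambda w: first_even(w) and second_six(w)) == P(first_even) * P(second_six)

def small_first(w): return w[0] <= 2
def small_sum(w):   return w[0] + w[1] <= 4

# Events that share information about the same die are generally dependent.
assert P(lambda w: small_first(w) and small_sum(w)) != P(small_first) * P(small_sum)
```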
Random Variables and Distributions
A random variable maps outcomes to numbers. Its behavior is captured by:
- PMF (discrete): P(X = x) for each value x
- PDF (continuous): f(x) where P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
- Expected value: E(X) — the long-run average
- Variance: Var(X) = E(X²) − [E(X)]² — how spread out the values are
Key distributions include Bernoulli, Binomial, Poisson, Normal, and Exponential — each modeling a different type of random phenomenon.
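For a discrete variable, the expected value and variance fall straight out of the PMF. A sketch for a fair die:

```python
from fractions import Fraction

# PMF of a fair die: X takes each value 1..6 with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

E = sum(x * p for x, p in pmf.items())         # E(X) = 7/2
E_sq = sum(x * x * p for x, p in pmf.items())  # E(X^2) = 91/6
Var = E_sq - E * E                             # Var(X) = E(X^2) - [E(X)]^2 = 35/12
```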
Covariance and Correlation
- Covariance measures how two variables move together: Cov(X,Y) = E(XY) − E(X)E(Y)
- Correlation normalizes covariance to [−1, 1]: a unit-free measure of linear association
- Independent variables have zero covariance, but zero covariance does not guarantee independence
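The last point has a classic counterexample: take X uniform on {−1, 0, 1} and Y = X². Y is completely determined by X, yet their covariance is zero:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X.
support = [-1, 0, 1]
p = Fraction(1, 3)                          # P(X = x) for each x in the support

E_X  = sum(x * p for x in support)          # 0 by symmetry
E_Y  = sum(x * x * p for x in support)      # E(X^2) = 2/3
E_XY = sum(x * x * x * p for x in support)  # E(X^3) = 0 by symmetry

cov = E_XY - E_X * E_Y
assert cov == 0
# Yet P(X=1, Y=0) = 0 while P(X=1) * P(Y=0) = 1/9, so X and Y are dependent.
```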
Why It Matters Beyond Mathematics
As a Mental Model
Charlie Munger places probability — specifically “the elementary math of permutations and combinations” originating from Fermat and Pascal — among his most important mental models. The ability to think probabilistically about outcomes, expected values, and base rates is foundational to good judgment.
“If you don’t get this elementary, but mildly unnatural, mathematics of elementary probability into your repertoire, then you go through a long life like a one-legged man in an ass-kicking contest.” — Charlie Munger
Decision-Making Under Uncertainty
Naval Ravikant’s concept of judgment — “knowing the long-term consequences of your actions” — is fundamentally probabilistic. Good judgment means weighing uncertain outcomes by their probabilities and magnitudes (expected value), not just considering the most likely outcome.
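As a toy illustration of why the most likely outcome is not enough (the numbers are invented):

```python
# Hypothetical bet: 90% chance to win $10, 10% chance to lose $200.
outcomes = [(0.9, 10), (0.1, -200)]

# Expected value weighs every outcome by both probability and magnitude.
ev = sum(p * v for p, v in outcomes)
# Winning is the most likely outcome, yet ev = -11.0: a bet to decline.
```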
Cognitive Biases as Probability Errors
Many cognitive biases documented in behavioral-psychology are systematic errors in probabilistic reasoning:
- Base rate neglect: Ignoring prior probabilities when evaluating evidence (the clinical test problem — see source—notes-on-probability)
- anchoring-bias: Failing to properly update from a prior
- Conjunction fallacy: Judging P(A ∩ B) > P(A), violating basic axioms
- Gambler’s fallacy: Assuming independent events are dependent
- loss-aversion: Weighting losses more heavily than equivalent gains when evaluating uncertain outcomes
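The clinical test problem can be run through Bayes' Theorem directly. The numbers below are illustrative, not taken from the source notes:

```python
from fractions import Fraction

# Illustrative clinical test: 1% prevalence, 95% sensitivity, 95% specificity.
prior = Fraction(1, 100)            # P(disease)
sensitivity = Fraction(95, 100)     # P(positive | disease)
false_positive = Fraction(5, 100)   # P(positive | no disease)

p_positive = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / p_positive   # P(disease | positive)

# posterior = 19/118, about 0.16. Base rate neglect expects something near
# 0.95, because it ignores how rare the disease is to begin with.
```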
Munger’s “Psychology of Human Misjudgment” is essentially a catalog of ways humans fail at probabilistic reasoning, making probability theory both the diagnosis and the cure.
The Bayesian Worldview
Bayes’ Theorem extends beyond a mathematical formula into a way of thinking: start with a prior belief, update it as evidence arrives, and let the math tell you how much to update. This connects to:
- fallibilism — all knowledge is conjectural and subject to revision (Bayesian updating is the mathematical formalization of this)
- first-principles-thinking — base your reasoning on evidence and logic, not received wisdom
- high-agency — the willingness to update beliefs and act on better information
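The update loop is mechanical: each posterior becomes the next prior. A sketch with an invented coin, comparing a biased hypothesis (P(heads) = 0.8) against a fair one:

```python
def update(prior_biased, heads):
    """One Bayes step: update P(biased) after observing a single flip."""
    like_biased = 0.8 if heads else 0.2    # P(observation | biased coin)
    like_fair = 0.5                        # P(observation | fair coin)
    evidence = like_biased * prior_biased + like_fair * (1 - prior_biased)
    return like_biased * prior_biased / evidence

belief = 0.5                               # start undecided between the hypotheses
for flip in [True, True, True, False, True]:
    belief = update(belief, flip)          # yesterday's posterior, today's prior
# belief is now about 0.72: four heads in five flips favor the biased coin,
# and the math, not intuition, dictates by exactly how much.
```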
Historical Origins
Probability theory emerged from the Fermat-Pascal correspondence (1654), where Blaise Pascal and Pierre de Fermat solved the “Problem of Points” — how to fairly divide stakes in an interrupted gambling game. This practical problem gave birth to an entire mathematical discipline.
Key milestones:
- 1654: Fermat-Pascal correspondence (combinatorial probability)
- 1713: Bernoulli’s Ars Conjectandi (law of large numbers)
- 1763: Bayes’ Theorem published posthumously
- 1812: Laplace’s Théorie analytique des probabilités (systematic treatment)
- 1933: Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung (axiomatic foundation)
Connections
- bayes-theorem — The most practically important result; connects probability to epistemology
- mental-models — Probability as one of Munger’s foundational models
- judgment — Probabilistic reasoning as the basis of good judgment
- behavioral-psychology — Cognitive biases as systematic probability errors
- first-principles-thinking — Kolmogorov’s axiomatic approach as first-principles reasoning
- fallibilism — Bayesian updating as mathematical fallibilism
- large-language-models — Built on probability distributions over token sequences
- transformer-architecture — Attention weights are softmax-normalized, forming a probability distribution over positions
Sources
- source—notes-on-probability — Peter J. Cameron’s lecture notes (primary mathematical source)
- source—poor-charlies-almanack — Munger on probability as a foundational mental model