Notes on Probability — Peter J. Cameron

University lecture notes for MAS108 (Probability I) at Queen Mary, University of London, December 2000. A rigorous first-semester course building probability from Kolmogorov’s axioms through random variables, standard distributions, and joint distributions.

Key Ideas

1. Axiomatic Probability (Chapter 1)

Probability is built from three axioms (Kolmogorov, 1933):

  1. P(A) ≥ 0 for any event A
  2. P(S) = 1 where S is the entire sample space
  3. P(A ∪ B) = P(A) + P(B) when A and B are disjoint (additivity)

Everything else — complements, inclusion-exclusion, independence — is derived from these three. This axiomatic approach sidesteps the philosophical question “What is probability?” and instead says: here are the rules; let’s see what follows.

The inclusion-exclusion principle generalizes additivity to overlapping events: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
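Inclusion-exclusion can be checked mechanically on any small equiprobable sample space. A minimal sketch in Python (the die-roll events here are my own illustrative choice, not from the notes):

```python
# Inclusion-exclusion on an equiprobable sample space: a single die roll.
# A = "even", B = "at least 4".
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}

def P(event):
    """Probability of an event under the equiprobable measure on S."""
    return Fraction(len(event), len(S))

lhs = P(A | B)                      # P(A ∪ B)
rhs = P(A) + P(B) - P(A & B)        # P(A) + P(B) − P(A ∩ B)
assert lhs == rhs == Fraction(2, 3)
```

Using `Fraction` keeps every probability exact, so the identity holds with `==` rather than a floating-point tolerance.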

2. Sampling and Combinatorics (Chapter 1)

Four fundamental sampling modes:

|                     | Ordered    | Unordered        |
|---------------------|------------|------------------|
| With replacement    | nᵏ         | n+k−1 choose k   |
| Without replacement | n!/(n−k)!  | n choose k       |

The binomial coefficient n choose k = n! / (k!(n-k)!) is the workhorse of discrete probability. Cameron emphasizes that choosing the right sampling model is often the hardest part of a problem.
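The four counting formulas can be evaluated directly with the standard library; a quick sketch (the values n = 5, k = 3 are arbitrary):

```python
import math

n, k = 5, 3

counts = {
    ("ordered", "with replacement"):      n ** k,                   # n^k
    ("ordered", "without replacement"):   math.perm(n, k),          # n!/(n−k)!
    ("unordered", "with replacement"):    math.comb(n + k - 1, k),  # n+k−1 choose k
    ("unordered", "without replacement"): math.comb(n, k),          # n choose k
}
# For n = 5, k = 3: 125, 60, 35, 10 respectively.
```

`math.perm` and `math.comb` (Python ≥ 3.8) avoid the overflow and rounding issues of computing factorials by hand.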

3. Independence (Chapter 1)

Two events A and B are independent if P(A ∩ B) = P(A) · P(B). This is the definition — not “one has no influence on the other.” Cameron warns: independence is surprisingly hard to detect by inspection, and should only be assumed when explicitly given or when dealing with independent physical processes (separate coin tosses, die rolls).

Mutual independence of n events requires all 2ⁿ − n − 1 subset conditions to hold, not just pairwise independence.
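The gap between pairwise and mutual independence can be exhibited with the classic two-coin example (my choice of illustration, not Cameron's): each pair of events passes the product test, but the triple fails.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))       # four equally likely outcomes
A = {s for s in S if s[0] == "H"}       # first coin heads
B = {s for s in S if s[1] == "H"}       # second coin heads
C = {s for s in S if s[0] == s[1]}      # both coins match

def P(E):
    return Fraction(len(E), len(S))

# Pairwise independence holds...
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ...but the triple condition fails: P(A ∩ B ∩ C) = 1/4, product = 1/8.
assert P(A & B & C) != P(A) * P(B) * P(C)
```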

4. Conditional Probability and Bayes’ Theorem (Chapter 2)

Conditional probability: P(A | E) = P(A ∩ E) / P(E)

Theorem of Total Probability: If B₁, …, Bₙ partition S, then P(A) = Σ P(A | Bᵢ) · P(Bᵢ)

Bayes’ Theorem:

P(A | B) = P(B | A) · P(A) / P(B)

The clinical test example is the most striking illustration: a test that is 99% sensitive and 95% specific, applied to a disease with 0.1% prevalence, yields only a 1.94% chance the patient is actually a carrier given a positive result. This is because the vast majority of positives come from the 99.9% of non-carriers who have a 5% false positive rate. The base rate dominates.
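The clinical test numbers follow from one application of total probability and one of Bayes' Theorem; a sketch using the figures above:

```python
prevalence  = 0.001   # P(carrier) — 0.1% base rate
sensitivity = 0.99    # P(positive | carrier)
specificity = 0.95    # P(negative | non-carrier)

# Total probability: P(+) = P(+|carrier)P(carrier) + P(+|non-carrier)P(non-carrier)
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes: P(carrier | +) = P(+|carrier) · P(carrier) / P(+)
p_carrier_given_pos = sensitivity * prevalence / p_pos
# ≈ 0.0194, i.e. about 1.94%
```

Almost all of the denominator `p_pos` comes from the false-positive term, which is why the posterior stays so low despite the 99% sensitivity.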

“There is a very big difference between P(A | B) and P(B | A).”

This connects directly to base rate neglect — one of the most common cognitive errors in behavioral psychology.

Birthday Paradox: Using iterated conditional probability, Cameron shows that with 23+ people in a room, the chance of a shared birthday exceeds 50%. The calculation uses the chain rule: P(all different) = (1 − 1/365)(1 − 2/365)···(1 − (n−1)/365).
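The chain-rule product is a three-line loop; a sketch (assuming 365 equally likely birthdays, as the notes do):

```python
def p_shared_birthday(n: int) -> float:
    """P(at least two of n people share a birthday), 365 equally likely days."""
    p_all_different = 1.0
    for i in range(1, n):
        p_all_different *= 1 - i / 365   # chain rule: (1 − 1/365)(1 − 2/365)···
    return 1 - p_all_different

# The crossing point: 22 people fall short of 50%, 23 exceed it.
assert p_shared_birthday(22) < 0.5 < p_shared_birthday(23)
```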

5. Discrete Random Variables (Chapter 3)

A random variable is a function from sample space to real numbers. Key quantities:

  • Expected value: E(X) = Σ xᵢ · P(X = xᵢ) — the “long-run average”
  • Variance: Var(X) = E(X²) − [E(X)]² — measures spread
  • Key property: For independent X, Y: E(XY) = E(X)·E(Y) and Var(X+Y) = Var(X) + Var(Y)
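Both formulas can be computed exactly from a pmf; a sketch for a fair die (my example, not from the notes):

```python
from fractions import Fraction

# Fair six-sided die: the pmf assigns 1/6 to each face.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

E   = sum(x * p for x, p in pmf.items())        # E(X)  = Σ xᵢ · P(X = xᵢ)
E2  = sum(x**2 * p for x, p in pmf.items())     # E(X²)
Var = E2 - E**2                                 # Var(X) = E(X²) − [E(X)]²

assert E == Fraction(7, 2)      # 3.5
assert Var == Fraction(35, 12)
```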

Standard discrete distributions:

| Distribution           | Parameters | E(X)  | Var(X)                        | Use case                        |
|------------------------|------------|-------|-------------------------------|---------------------------------|
| Bernoulli(p)           | p          | p     | p(1−p)                        | Single trial, success/failure   |
| Binomial(n,p)          | n, p       | np    | np(1−p)                       | Count of successes in n trials  |
| Hypergeometric(n,M,N)  | n, M, N    | nM/N  | (nM/N)(1−M/N)(N−n)/(N−1)      | Sampling without replacement    |
| Geometric(p)           | p          | 1/p   | (1−p)/p²                      | Trials until first success      |
| Poisson(λ)             | λ          | λ     | λ                             | Rare events in a fixed interval |
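The tabulated means and variances can be sanity-checked by summing over the pmf; a sketch for the binomial case (parameters n = 10, p = 0.3 are arbitrary):

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var  = sum(k**2 * binom_pmf(k, n, p) for k in range(n + 1)) - mean**2

assert abs(mean - n * p) < 1e-9           # matches np
assert abs(var - n * p * (1 - p)) < 1e-9  # matches np(1−p)
```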

6. Continuous Random Variables (Chapter 3)

Continuous r.v.s use a probability density function (PDF) instead of a mass function. P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx.

Standard continuous distributions:

| Distribution    | Parameters | E(X)    | Var(X)    |
|-----------------|------------|---------|-----------|
| Uniform(a,b)    | a, b       | (a+b)/2 | (b−a)²/12 |
| Normal N(μ,σ²)  | μ, σ²      | μ       | σ²        |
| Exponential(λ)  | λ          | 1/λ     | 1/λ²      |

The normal distribution is the most important: the bell curve f(x) = (1/(σ√(2π))) exp(−(x−μ)²/(2σ²)). Any normal can be standardized to N(0,1) via Z = (X − μ)/σ.
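Standardization is how normal probabilities are computed in practice: convert to Z, then look up the standard normal CDF Φ, which the standard library expresses via the error function. A sketch (the helper names `phi` and `normal_prob` are my own):

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF Φ(z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a ≤ X ≤ b) for X ~ N(mu, sigma²), by standardizing Z = (X − mu)/sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# About 68.27% of the mass lies within one standard deviation of the mean.
assert abs(normal_prob(-1, 1, 0, 1) - 0.6827) < 1e-3
```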

7. Joint Distributions, Covariance, Correlation (Chapter 4)

  • Covariance: Cov(X,Y) = E(XY) − E(X)·E(Y) — measures linear association
  • Correlation: corr(X,Y) = Cov(X,Y) / √(Var(X)·Var(Y)) — normalized to [−1, 1]
  • Independent r.v.s have Cov = 0 (but Cov = 0 does not imply independence)
  • Conditional expectation: E(X | Y = y) — the expected value of X given knowledge of Y
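Covariance and correlation fall straight out of a joint pmf; a sketch on a small hypothetical joint distribution (the numbers are mine, chosen only so the marginals are easy to check):

```python
import math

# Joint pmf for (X, Y) on {0,1}×{0,1}; probabilities sum to 1.
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

EX   = sum(x * p for (x, y), p in joint.items())
EY   = sum(y * p for (x, y), p in joint.items())
EXY  = sum(x * y * p for (x, y), p in joint.items())
VarX = sum(x**2 * p for (x, y), p in joint.items()) - EX**2
VarY = sum(y**2 * p for (x, y), p in joint.items()) - EY**2

cov  = EXY - EX * EY                      # Cov(X,Y) = E(XY) − E(X)E(Y)
corr = cov / math.sqrt(VarX * VarY)       # normalized to [−1, 1]
# Here cov = 0.05 > 0, so X and Y are positively associated (and not independent).
```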

Historical Note

Cameron traces probability’s origins to the Fermat-Pascal correspondence (mid-17th century) — two mathematicians solving the “Problem of Points” (how to fairly divide stakes in an interrupted game of chance). This is exactly the historical lineage Munger cites when listing “elementary math of permutations and combinations” as one of his most important mental models.

Connections

  • probability-theory — The concept page synthesizing probability across sources
  • bayes-theorem — The most practically important theorem in the notes; connects to decision-making under uncertainty
  • mental-models — Munger explicitly lists Fermat/Pascal probability as a foundational model
  • behavioral-psychology — Base rate neglect (the clinical test example) is one of the key cognitive biases
  • anchoring-bias — Base rate neglect is a form of anchoring on the test result rather than the prior probability
  • judgment — Bayesian reasoning is the mathematical framework for good judgment under uncertainty
  • large-language-models — Modern LLMs implicitly encode probabilistic reasoning; probability distributions are fundamental to their architecture

Source Details

  • Author: Peter J. Cameron
  • Institution: Queen Mary, University of London
  • Course: MAS108, Probability I
  • Date: December 2000
  • Pages: 94
  • Raw file: raw/probability.pdf