I’m reviewing sample spaces and I’m unsure how to define them when I only care about a summary of the experiment.
Example: toss a fair coin three times. I only care about the number of heads, so I set the sample space to S = {0,1,2,3}. Then I figured each outcome should be equally likely (1/4 each), so P(at least one head) = 3/4. But the solution I saw lists all 8 sequences (HHH, HHT, …, TTT) and gets a different result.
Why is my sample space not valid? If I’m only observing the number of heads, shouldn’t those four values be the elementary outcomes, and equally likely? Does a sample space have to have equally likely outcomes to be “correct”? Or must it always include the most detailed outcomes even if I don’t observe them?
I tried drawing a tree diagram and grouping sequences by the same number of heads, but then the groups are different sizes, which seems to break the 1/4 idea. Not sure if that’s the right way to think about it. I also tried the same approach with the sum of two dice (sample space {2,…,12}) and ran into the same issue, so I’m probably missing a principle.
Any help appreciated!

3 Responses
Great question! Your sample space {0,1,2,3} is absolutely valid if you only care about the number of heads, but the trap is assuming those four outcomes are equally likely. The underlying eight sequences (HHH, HHT, …, TTT) are equally likely, and when you “compress” them to the number of heads, the groups have sizes 1, 3, 3, 1, so the probabilities are 1/8, 3/8, 3/8, 1/8 for 0, 1, 2, 3 heads respectively. That’s why P(at least one head) = 1 − P(0 heads) = 1 − 1/8 = 7/8, not 3/4. In fancy-but-helpful terms, you can think of the number of heads N as a function from the detailed space to {0,1,2,3}; it “pushes forward” the uniform measure to the binomial distribution P(N=k) = C(3,k)/2^3. So no, a sample space does not have to have equally likely outcomes to be correct, and it doesn’t have to include the most detailed outcomes either; you just need to assign the right probabilities on whatever space you choose. Your tree diagram instinct was spot on: the unequal group sizes are exactly why the probabilities aren’t uniform. Same story with sums of two dice: the sample space {2,…,12} is fine, but the probabilities are 1,2,3,4,5,6,5,4,3,2,1 out of 36, not 1/11 each. I might be over-nerding it a bit, but the key principle is: compress if you like, just carry along the correct weights from the finer description.
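If it helps to see the “push forward” step mechanically, here is a minimal Python sketch, assuming a fair coin and three independent tosses as in the question. The variable names (`sequences`, `counts`, `dist`) are just my own choices for illustration. It enumerates the eight sequences, groups them by number of heads, and compares the result with C(3,k)/2^3.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

# The detailed sample space: all 8 equally likely sequences of three tosses.
sequences = list(product("HT", repeat=3))

# Group the sequences by N = number of heads (sizes of the "buckets").
counts = Counter(seq.count("H") for seq in sequences)

# Each sequence has weight 1/8, so a bucket's probability is its size over 8.
dist = {k: Fraction(counts[k], len(sequences)) for k in sorted(counts)}

for k, p in dist.items():
    # Second column is the binomial formula C(3,k)/2^3 for comparison.
    print(k, p, Fraction(comb(3, k), 2**3))

# P(at least one head) via the complement of "zero heads".
p_at_least_one = 1 - dist[0]
print(p_at_least_one)
```

Using `Fraction` keeps everything exact, so the numbers can be compared directly with the hand calculation rather than with rounded decimals.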
You’re totally allowed to use S = {0,1,2,3} as the sample space if you only record the number of heads; the catch is that those four outcomes aren’t equally likely. What you’ve really done is “merge” the 8 equally likely sequences into 4 buckets: 0 comes from {TTT} (1 way), 1 from {HTT, THT, TTH} (3 ways), 2 from {HHT, HTH, THH} (3 ways), and 3 from {HHH} (1 way). So the correct probabilities on {0,1,2,3} are P(0)=1/8, P(1)=3/8, P(2)=3/8, P(3)=1/8, and then P(at least one head)=1−P(0)=7/8. A sample space doesn’t have to have equally likely outcomes to be “correct,” and it doesn’t have to list the most detailed outcomes either; you just need mutually exclusive, exhaustive outcomes with the right probabilities. In math-speak, you can think of “number of heads” as a random variable on the detailed space, and when you switch to its values {0,1,2,3}, the probabilities come along by summing over the sequences that map there (the binomial counts C(3,k)). Your tree diagram instinct was spot on: the groups are different sizes, which is exactly why the probabilities on {0,1,2,3} aren’t 1/4 each. Same story for the sum of two dice: {2,…,12} is a fine sample space, but the probabilities are uneven (1,2,3,4,5,6,5,4,3,2,1 over 36). Tiny analogy: it’s like scooping a handful of mixed candy and then only caring about the color. If there are more red pieces in the bag, “red” won’t be a 1/4 chance just because there are four colors; it depends on how many actual candies got lumped into that color.
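For the dice version, the same bucket-counting can be checked with a short Python sketch, assuming two fair six-sided dice rolled independently (the names `rolls`, `ways`, and `sum_dist` are made up for this example). It lists the 36 equally likely ordered pairs and counts how many land on each sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The detailed sample space: 36 equally likely ordered pairs (first die, second die).
rolls = list(product(range(1, 7), repeat=2))

# Number of ordered pairs that produce each sum from 2 to 12.
ways = Counter(a + b for a, b in rolls)

# Probabilities on the coarse sample space {2, ..., 12}: bucket size over 36.
sum_dist = {s: Fraction(ways[s], len(rolls)) for s in range(2, 13)}

for s, p in sum_dist.items():
    print(s, ways[s], p)
```

The middle column is the number of ways to reach each sum, so it is the thing to compare against the 1,2,3,4,5,6,5,4,3,2,1 pattern.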
It’s totally fine to use S = {0,1,2,3}; just remember “number of heads” is a summary that many different flip-sequences can lead to (like several roads to the same cafe), so those four values aren’t equally likely, and their probabilities must be inherited from the 8 equally likely sequences. Worked example: 0 heads (TTT) = 1/8, 1 head (HTT, THT, TTH) = 3/8, 2 heads (HHT, HTH, THH) = 3/8, 3 heads (HHH) = 1/8, so P(at least one head) = 1 − 1/8 = 7/8.