I’m revising my stats fundamentals and my brain keeps buffering on box plots. I think I get the five-number summary, but I’m wobbly on where the whiskers are supposed to end when there are outliers.
Example I’m practicing with: 2, 3, 3, 4, 5, 5, 6, 9, 10, 10, 11, 30.
My attempt:
– Median = (5 + 6)/2 = 5.5
– Q1 = median of the lower six = (3 + 4)/2 = 3.5
– Q3 = median of the upper six = (10 + 10)/2 = 10
– IQR = 10 − 3.5 = 6.5
– Fences: Q1 − 1.5·IQR = −6.25, Q3 + 1.5·IQR = 19.75
So I’m guessing 30 is an outlier.
Here’s where I’m stuck: do the whiskers go to the min and max of the data (2 and 30) and then I mark 30 as an outlier, or do the whiskers stop at the most extreme non-outlier values (2 and 11) and then 30 is a separate point? I sketched the box from 3.5 to 10 with a line at 5.5, but I can’t decide where the top whisker should end.
Tiny tangent that might be the real culprit: different sources give me different quartiles. A calculator gave me Q1 = 3.75 for this same set (not 3.5), which shifts the fences a bit. For box plots by hand, which quartile convention should I use so I’m not marked wrong? And if a data value lands exactly on a fence, is that counted as an outlier or included with the whisker?
How do I apply the rule consistently here? I’m trying to strengthen my fundamentals and keep getting tripped up by these edge cases.

3 Responses
I picture a box plot like a little courtyard with a fence: the box is the house (from Q1 to Q3), the median is the front door, and the whiskers are the paths that stretch out until they hit the fence. The fence lives at Q1 − 1.5·IQR and Q3 + 1.5·IQR. The usual rule is: whiskers go to the most extreme data values that are still inside the fence, and anything past the fence gets plotted as a separate dot. For your data, Q1 = 3.5, Q3 = 10, so IQR = 6.5 and the fences are −6.25 and 19.75. That makes 30 an outlier, so the whiskers should stop at 2 on the low end and 11 on the high end, and 30 gets its own dot. If a value lands exactly on a fence, the usual Tukey reading is that only values strictly beyond the fence are outliers, so it stays with the whisker; some people flag it as an outlier instead (I usually do that to be “safe”). Pick one rule and stick to it for consistency. A few hand-drawn box plots even draw the whisker out to the fence value itself, even if no data fall there; that’s less common, so check what your course expects, and outliers are still shown as separate points either way.
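If you want to check the arithmetic, here is a minimal Python sketch of the rule above. The function name `box_plot_summary` and its parameter `k` are made up for illustration, and it assumes the median-of-halves quartile rule with the middle value left out of both halves when the count is odd; your course may use a different convention.

```python
from statistics import median


def box_plot_summary(values, k=1.5):
    """Quartiles, fences, whisker ends and outliers for a Tukey-style box plot.

    Assumption: quartiles are medians of the lower and upper halves, and for
    an odd count the middle value belongs to neither half.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])        # median of the lower half
    q3 = median(xs[n - half:])    # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme data values still inside the fences.
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median(xs),
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


data = [2, 3, 3, 4, 5, 5, 6, 9, 10, 10, 11, 30]
summary = box_plot_summary(data)
print(summary["fences"])    # (-6.25, 19.75)
print(summary["whiskers"])  # (2, 11)
print(summary["outliers"])  # [30]
```

The whisker ends come from the data, not from the fences: the fences only decide which points count as "inside".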
About your tangent: quartiles do come in multiple flavors (median-of-halves, Tukey’s hinges, “inclusive/exclusive” calculator versions), which is why you saw Q1 = 3.75 instead of 3.5. For hand work, match your course’s convention or clearly state the one you’re using; the outlier call for a very large value like 30 won’t flip under any common method here. Quick mini-example to cement it: take 1, 2, 3, 4, 100, and use median-of-halves with the middle value left out of both halves. Median = 3; Q1 = (1 + 2)/2 = 1.5; Q3 = (4 + 100)/2 = 52; IQR = 50.5; fences are 1.5 − 75.75 = −74.25 and 52 + 75.75 = 127.75. No outliers under this convention, so the whiskers run all the way to 1 and 100. It’s a nice reminder that “big” doesn’t automatically mean “outlier”: it only counts if it’s past the fence. It also shows why the convention matters on tiny data sets: a method that includes the median in both halves gives Q1 = 2 and Q3 = 4, which puts the upper fence at 7 and makes 100 an outlier.
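To see how much the quartile convention moves things, here is a rough comparison in Python. It uses `statistics.quantiles` from the standard library for the two interpolating methods; the helper names `median_of_halves`, `quartile_conventions` and `upper_fence` are made up for this sketch, and matching the "inclusive" method to a particular calculator is an assumption, not something I can confirm for your device.

```python
from statistics import median, quantiles


def median_of_halves(values):
    """Q1 and Q3 as medians of the halves (middle value excluded if n is odd)."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[len(xs) - half:])


def quartile_conventions(values):
    """Return (Q1, Q3) under three conventions for side-by-side comparison."""
    xs = sorted(values)
    inc = quantiles(xs, n=4, method="inclusive")  # [Q1, median, Q3]
    exc = quantiles(xs, n=4, method="exclusive")
    return {
        "median of halves": median_of_halves(xs),
        "inclusive interpolation": (inc[0], inc[2]),
        "exclusive interpolation": (exc[0], exc[2]),
    }


def upper_fence(q1, q3, k=1.5):
    return q3 + k * (q3 - q1)


for data in ([2, 3, 3, 4, 5, 5, 6, 9, 10, 10, 11, 30], [1, 2, 3, 4, 100]):
    print(data)
    for name, (q1, q3) in quartile_conventions(data).items():
        fence = upper_fence(q1, q3)
        flagged = [x for x in data if x > fence]
        print(f"  {name}: Q1={q1}, Q3={q3}, upper fence={fence}, above fence={flagged}")
```

On the 12-value set, the inclusive method gives Q1 = 3.75 and all three conventions leave 30 above the upper fence. On 1, 2, 3, 4, 100, the inclusive method gives Q1 = 2 and Q3 = 4, so 100 is flagged there but not under the other two.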
In a Tukey boxplot the whiskers extend to the most extreme data values that are within the 1.5×IQR fences (values beyond are plotted as outliers), and a value exactly on a fence is included with the whisker, not treated as an outlier.
Example: using Tukey quartiles for your data gives Q1=3.5, Q3=10, IQR=6.5, fences −6.25 and 19.75, so the whiskers end at 2 and 11 and 30 is plotted separately; different tools use different quartile conventions, so when working by hand state your convention (e.g., Tukey) and apply it consistently.
You’ve got the right idea. In a Tukey (modified) box plot, the whiskers extend to the most extreme data values that are not beyond the fences, not to the min and max regardless. For your data, with Q1 = 3.5, Q3 = 10, IQR = 6.5, the fences are −6.25 and 19.75, so 30 is an outlier. The upper whisker stops at the largest value ≤ 19.75, which is 11; the lower whisker stops at the smallest value ≥ −6.25, which is 2. You don’t draw to the fence unless a data point lies exactly there. A value exactly on a fence is not an outlier; it’s included in the whisker. The quartile discrepancy (Q1 = 3.5 vs 3.75) comes from different conventions: “median of halves” (Tukey’s hinges) gives 3.5, while a common interpolation method gives 3.75. Both put 30 beyond the upper fence here, so the whiskers are still 2 and 11. For hand work, use the convention your course specifies; if none is given, Tukey’s hinges are standard and easy to execute, and it’s good practice to state the rule you used. Does your class or software specify a quartile method, and do they ask for “modified” box plots (with 1.5·IQR outliers) explicitly?
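For the on-the-fence edge case, a small sketch may make the two readings concrete. The function `whisker_ends` and its `fence_is_inside` flag are hypothetical names for illustration, and the data set is one I made up so that a value lands exactly on the upper fence (Q1 = 4 and Q3 = 8 whether or not the median is included in the halves).

```python
def whisker_ends(values, q1, q3, k=1.5, fence_is_inside=True):
    """Whisker ends and outliers, given quartiles you have already computed.

    fence_is_inside=True  : a value exactly on a fence stays with the whisker.
    fence_is_inside=False : a value exactly on a fence is flagged as an outlier.
    """
    xs = sorted(values)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr

    def is_inside(x):
        return lo <= x <= hi if fence_is_inside else lo < x < hi

    inside = [x for x in xs if is_inside(x)]
    outliers = [x for x in xs if not is_inside(x)]
    return (inside[0], inside[-1]), outliers


# Made-up data: Q1 = 4, Q3 = 8, IQR = 4, so the fences are -2 and 14.
tie_data = [2, 4, 4, 6, 8, 8, 14]
print(whisker_ends(tie_data, 4, 8))                         # ((2, 14), [])
print(whisker_ends(tie_data, 4, 8, fence_is_inside=False))  # ((2, 8), [14])
```

The default matches the rule described above (a value on the fence is not an outlier); the flag is only there to show what changes if a course uses the stricter reading.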