I’m revising my statistics fundamentals and box plots keep doing a little dance in my brain. I thought they were straightforward, but I’m stuck on the quartiles and whiskers. When I split the data to find Q1 and Q3, am I supposed to include the median in the lower/upper halves or exclude it? I keep making a mistake when the number of data points is even vs. odd, and different sources seem to use different rules.
Also, the whiskers. Sometimes I see them go all the way to the min and max, and other times they stop at some 1.5×IQR boundary and then there are little dots for outliers. Which convention should I actually use when I’m practicing or taking a test? If I’m just given a box plot with no notes, how can I tell which rule was used?
Another thing: what happens with lots of repeated values? Do quartiles have to land on actual data points, or can they be between them? And if I only have a frequency table (or grouped data), is there a sensible way to draw a box plot without the raw list? If yes, what’s the usual method people expect?
Follow-up: when I’m trying to read skew from a box plot, is a longer upper whisker always a sign of right skew, or could a couple of outliers mess with that picture? I’d love a simple rule-of-thumb or checklist I can stick to while I strengthen my basics.
3 Responses
Great questions – box plots look simple, but there are a couple of “choose-your-own-adventure” rules hiding under the hood! Two common conventions for Q1 and Q3 are: (a) Tukey’s hinges (for odd n, include the median in both halves; for even n, just split), and (b) the “exclusive median” approach (for odd n, exclude the median from both halves; for even n, just split). In both, Q1 and Q3 are the medians of the two halves. Software often uses interpolating quantile formulas instead, which can give slightly different values again. All of these are valid; pick one and be consistent (and on a test, follow your course’s stated convention). Quartiles don’t have to be actual data values – with interpolation, or when a half has an even count, they can sit between data points – and repeated values often make quartiles land right on that repeated number (sometimes even collapsing the box if IQR=0). Quick example: data 1,2,3,4,5,6,7,8,30 (n=9). Median is 5. Tukey hinges: Q1=median of 1..5=3, Q3=median of 5..30=7, so IQR=4. Exclusive-median: Q1=median of 1..4=(2+3)/2=2.5, Q3=median of 6..30=(7+8)/2=7.5, so IQR=5. For even n the two median-of-halves methods agree because there’s no single middle point to include/exclude.
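If it helps to check the arithmetic, here is a minimal Python sketch of the two median-of-halves rules described above. The function name `quartiles` and the `include_median` flag are made up for illustration; this is one way to write it, not a standard library routine.

```python
from statistics import median

def quartiles(values, include_median):
    """Return (Q1, median, Q3) using the median-of-halves approach.

    include_median=True  -> Tukey's hinges (odd n: median joins both halves)
    include_median=False -> exclusive rule (odd n: median joins neither half)
    For even n the flag has no effect; the sorted data simply split in half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = data[:half + 1], data[half:]
    elif n % 2 == 1:
        lower, upper = data[:half], data[half + 1:]
    else:
        lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(quartiles(data, include_median=True))   # (3, 5, 7)
print(quartiles(data, include_median=False))  # (2.5, 5, 7.5)
```

Running it on an even-length list with either flag gives the same answer, which matches the point about even n.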
Whiskers come in two flavors: (1) min-to-max, or (2) Tukey-style fences at Q1−1.5×IQR and Q3+1.5×IQR, with points beyond plotted as outliers. Using the example above, Tukey hinges give fences −3 and 13; the upper whisker stops at 8 and 30 is an outlier dot. With the exclusive-median quartiles, fences are −5 and 15; still the upper whisker stops at 8 and 30 is an outlier. In a min–max plot, the whiskers would stretch to 1 and 30 and there’d be no outlier dots. Which to use? Unless told otherwise, I’d default to the Tukey 1.5×IQR style (it’s the most diagnostic). If you’re handed a plot with no notes, outlier dots are the giveaway that 1.5×IQR fences were used; if there are no dots, you can’t always tell – both conventions can land whiskers on the min/max when there are no outliers.
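The fence and whisker logic can be sketched in a few lines of Python. The helper name `tukey_whiskers` is hypothetical, and the quartiles are passed in by hand so you can try either convention.

```python
def tukey_whiskers(values, q1, q3, k=1.5):
    """Return (fences, whisker_ends, outliers) for Tukey-style whiskers.

    Fences sit at Q1 - k*IQR and Q3 + k*IQR. Whiskers end at the most
    extreme data points that are still inside the fences.
    """
    data = sorted(values)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    outliers = [x for x in data if x < lo_fence or x > hi_fence]
    return (lo_fence, hi_fence), (inside[0], inside[-1]), outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(tukey_whiskers(data, q1=3, q3=7))      # ((-3.0, 13.0), (1, 8), [30])
print(tukey_whiskers(data, q1=2.5, q3=7.5))  # ((-5.0, 15.0), (1, 8), [30])
# Min-max style needs no fences: the whiskers are just min(data) and max(data).
```

Note that the whisker ends at 8, the last data point inside the fence, not at the fence value itself.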
Reading skew: a handy rule of thumb is to look at the box first, then the in-fence whiskers. Right-skew tends to show the median left of center in the box and a longer upper whisker; left-skew is the mirror image. A few extreme values can fake a “long whisker” in the min–max style, but in Tukey-style they appear as separate dots – don’t let one or two dots override the story told by the box and the whiskers inside the fences. For frequency tables/grouped data, the usual approach is to use cumulative frequencies to locate the 25th, 50th, and 75th percentiles (e.g., at 0.25N, 0.50N, 0.75N) and linearly interpolate within the containing class; plot those as your quartiles and proceed with whichever whisker rule you’re using. Happy box-spotting – once you lock in your convention, the patterns really start to pop!
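For the grouped-data method, a rough Python sketch of the cumulative-frequency interpolation might look like this. The function name `grouped_quantile` and the frequency table are invented for illustration, and it assumes contiguous classes given as (lower bound, upper bound, frequency) with values spread evenly inside each class.

```python
def grouped_quantile(classes, p):
    """Estimate the p-th quantile (0 <= p <= 1) from grouped data.

    classes: list of (lower_bound, upper_bound, frequency), sorted ascending.
    Locates the class containing position p*N in the cumulative frequencies,
    then interpolates linearly within that class.
    """
    total = sum(freq for _, _, freq in classes)
    target = p * total
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("p must be between 0 and 1")

# Hypothetical frequency table with N = 20
table = [(0, 10, 4), (10, 20, 8), (20, 30, 6), (30, 40, 2)]
q1 = grouped_quantile(table, 0.25)   # 11.25
med = grouped_quantile(table, 0.50)  # 17.5
q3 = grouped_quantile(table, 0.75)   # 25.0
print(q1, med, q3)
```

With only a table, the true min and max are unknown, so whiskers usually go to the lowest and highest class boundaries or get clipped by the fences.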
You’re not alone: box plots do a little shimmy in my head too. My go-to rule (though I admit I still second-guess it) is: sort the data, then if n is odd, exclude the median and take the medians of the lower and upper halves as Q1 and Q3; if n is even, just split in half. Some books include the median in both halves when n is odd (Tukey’s hinges), and I think either convention is fine as long as you’re consistent; on exams they usually accept either if your method is clear. For whiskers, there are two common setups: all the way to min/max, or out to the last points within 1.5×IQR with any beyond plotted as “outliers.” If nothing is said, I usually assume the 1.5×IQR version (not sure that’s universal!), and the presence of outlier dots is a giveaway; without dots, you sometimes can’t tell. Repeats are okay: quartiles can land between data points by interpolation, though when sketching quickly I sometimes just pick the nearest data value. From a frequency table, the usual classroom method is to find cumulative frequencies and locate the 0.25(n+1), 0.5(n+1), 0.75(n+1) positions inside the relevant classes via linear interpolation (some courses use 0.25n, 0.5n, 0.75n for grouped data instead, so check which yours expects), then draw the box; whiskers by either rule. For skew, a longer upper whisker and the median sitting closer to Q1 suggest right skew, but a few high outliers can fake that signal, so check both whisker lengths and where the median sits in the box. Nice refresher: https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/box-and-whisker-plots
I like a simple checklist. Split your data, excluding the median when there’s an odd count (include it in neither half) and just halving when even. Quartiles don’t have to be actual data points (ties are fine; interpolate when needed). Unless told otherwise, use Tukey’s rule, where whiskers extend to the most extreme points within 1.5×IQR of the box and dots beyond are outliers. If you see dots, it’s 1.5×IQR; if not, the whiskers probably hit min/max. Quick example: for 1,2,3,4,5,6,7, you get Q1=2, median=4, Q3=6, IQR=4, and fences at −4 and 12, so whiskers reach 1 and 7. For skew, a longer upper whisker with the median sitting low in the box suggests right-skew, but a couple of high outliers can fake that, so glance at both whisker lengths and where the median sits inside the box.
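One way to double-check the 1 to 7 example is Python's built-in `statistics.quantiles`. A caveat: its `"exclusive"` method is an interpolating formula that happens to match the exclude-the-median rule for this dataset; it is not the same rule in general, so treat this as a sanity check rather than a definition.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into quarters.
q1, med, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(q1, med, q3, iqr, fences)  # 2.0 4.0 6.0 4.0 (-4.0, 12.0)

# Every point lies inside the fences, so the whiskers reach min and max.
print(min(data), max(data))  # 1 7
```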