Statistics – scatter graphs

3 Responses

Sofia Marin says:

May 8, 2026 at 2:06 pm

Short answer: for the ordinary least-squares regression of y on x (with an intercept), the fitted line always goes through the mean point (x̄, ȳ); that’s a theorem. When you’re drawing a line “by eye” for exam-style scatter graphs, it’s a good habit (not a law) to aim the line so it passes near that centroid, because you’re trying to mimic the regression line. Just don’t contort the slope to hit it if the overall fit then looks worse.

About “balancing”: least squares balances vertical residuals in the sense that their sum is zero (and, more strongly, it minimizes the sum of squared vertical residuals), not that there are equal numbers of points above and below. So don’t count dots; think “equal total up-ness and down-ness,” with big deviations counting more. And yes, the distance used is vertical (in y), not perpendicular. The perpendicular idea is a different method (orthogonal regression); its line also passes through (x̄, ȳ), but it will, in general, have a different slope from the least-squares line.

Tiny example: points (1,2), (2,3), (3,5), (4,4) have x̄=2.5, ȳ=3.5; the least-squares line is y=0.8x+1.5, which passes through (2.5,3.5). The residuals are −0.3, −0.1, +1.1, −0.7: three below, one above, yet they add to 0.

For your sketching mental model: draw a straight line that threads the cloud, roughly through (x̄, ȳ), and choose the slope so the vertical residuals visually “balance” in total size, not in count.
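If you want to check the tiny example yourself, here is a minimal sketch in plain Python. The function name `least_squares_line` is just something made up for illustration; it uses the usual textbook formulas (slope = Sxy / Sxx, intercept = ȳ − slope × x̄) on the four points above.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: sums of products/squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # forces the line through (x_bar, y_bar)
    return slope, intercept


xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
slope, intercept = least_squares_line(xs, ys)

# Vertical residuals: observed y minus predicted y
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(round(slope, 3), round(intercept, 3))   # 0.8 1.5
print([round(r, 3) for r in residuals])       # [-0.3, -0.1, 1.1, -0.7]
print(abs(sum(residuals)) < 1e-9)             # True
```

The residuals are rounded before printing only to hide floating-point noise; the sum is checked against a small tolerance for the same reason.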

Ruby Shaw says:

May 9, 2026 at 2:05 pm

Short answer: the least-squares regression line (the calculated one) always goes through the mean point (x̄, ȳ) and measures “error” vertically. If you’re drawing a by-eye best-fit line for an exam, you don’t have to nail it exactly, but aiming to pass near (x̄, ȳ) is a great anchor, and you should think in terms of vertical residuals, not perpendicular ones. The “errors balance” idea for least squares means the sum of the vertical residuals is zero, not that you must have the same number of points above and below. You can absolutely have more points above than below as long as the positives and negatives cancel overall. Here’s a quick illustration of that: residuals +4, +1, −5 sum to 0 even though two are above and one is below.

With your numbers, you’re already doing the right thing. You had x̄ = 4 and ȳ ≈ 61.4, and a by-eye slope of about 6.33. If you want the line to pass through the mean point (mimicking the regression property), just set the intercept to ȳ − slope × x̄ ≈ 61.4 − 6.33×4 ≈ 36.1, which gives y ≈ 6.33x + 36.1. That line predicts 67.7 at x = 5, so the residual for the point (5, 68) is 68 − 67.7 = +0.3, a vertical difference. Perpendicular distances would produce a different “best” line (that’s orthogonal regression), which isn’t what standard school/exam regression uses. For a tidy mental model in exam sketches: pick a slope that makes the vertical deviations look roughly balanced in size above and below, and let the line go near (x̄, ȳ). If you want to compute its equation from your sketch, read off two convenient points on your drawn line (not necessarily data points) to get the slope, then adjust the intercept so it goes through (x̄, ȳ).
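Here is the same arithmetic as a small Python sketch, using only the numbers quoted above (x̄ = 4, ȳ ≈ 61.4, by-eye slope 6.33, and the data point (5, 68)). The helper name `intercept_through_mean` is hypothetical, chosen just for this illustration.

```python
def intercept_through_mean(slope, x_bar, y_bar):
    """Intercept that makes the line y = slope*x + b pass through (x_bar, y_bar)."""
    return y_bar - slope * x_bar


slope = 6.33             # by-eye slope read off the sketch
x_bar, y_bar = 4, 61.4   # mean point from the question

intercept = intercept_through_mean(slope, x_bar, y_bar)
predicted = slope * 5 + intercept   # line's prediction at x = 5
residual = 68 - predicted           # vertical residual for the point (5, 68)

print(round(intercept, 1))  # 36.1
print(round(predicted, 1))  # 67.7
print(round(residual, 1))   # 0.3
```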

If you’d like a proof-y peek at why least squares passes through (x̄, ȳ) and has residuals that sum to zero, this Khan Academy explainer is nice: https://www.khanacademy.org/math/ap-statistics/bivariate-data-ap/least-squares-regression/v/why-the-least-squares-regression-line-always-passes-through-the-mean-of-x-and-y

Declan Hughes says:

May 11, 2026 at 2:05 pm

Short answer: the exact least-squares regression line (with an intercept) always goes through the mean point (x̄, ȳ). If you’re drawing a by-eye “best-fit” line for an exam, you don’t have to force it exactly through that dot, but using (x̄, ȳ) as an anchor is a smart trick. Eyeball a sensible slope from the overall trend, then set the intercept to b = ȳ − m x̄ so your line goes through the mean. That’s exactly what you did, and it’s a solid method.
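For the curious, a short sketch of why the exact least-squares line has this property, writing m for the slope, b for the intercept and n for the number of points (standard notation, not anything specific to this thread). Least squares picks m and b to minimize the sum of squared vertical residuals; setting the derivative with respect to b to zero gives both "residuals sum to zero" and "the line passes through the mean point":

```latex
\begin{align*}
S(m, b) &= \sum_{i=1}^{n} (y_i - m x_i - b)^2 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} (y_i - m x_i - b) = 0 \\
\Longrightarrow \quad \sum_{i=1}^{n} (y_i - m x_i - b) &= 0 \\
\Longrightarrow \quad \bar{y} &= m \bar{x} + b
\end{align*}
```

The last line comes from dividing the previous one by n, and rearranging it gives b = ȳ − m x̄.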

On “balancing errors”: for least squares, it’s the sum of vertical residuals that equals zero, not “equal numbers of points above and below.” Counting points is a dodgy heuristic: one big outlier can outweigh five tiny residuals. Think in terms of vertical deviations (because you’re predicting y from x), not perpendicular distances. Perpendicular distances are for a different beast (orthogonal/Deming regression), not what school exams usually want. So the right mental model: draw a straight line through the middle of the cloud, roughly around (x̄, ȳ), with the total vertical “ups” and “downs” looking about balanced, and read the slope using two well-separated points on your drawn line.
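To make the "count versus total" point concrete, here is a tiny Python sketch with made-up residuals (these numbers are purely illustrative, not from your data): one point sits 5 units above the line and five points sit 1 unit below it.

```python
# Hypothetical vertical residuals: one large positive, five small negatives
residuals = [5.0, -1.0, -1.0, -1.0, -1.0, -1.0]

above = sum(1 for r in residuals if r > 0)   # points above the line
below = sum(1 for r in residuals if r < 0)   # points below the line
total = sum(residuals)                       # what least squares balances

print(above, below, total)  # 1 5 0.0
```

The counts are lopsided (1 above, 5 below), but the residuals still sum to zero, which is the sense in which the errors balance.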

Hope this helps!


Does the line of best fit on a scatter graph have to go through the mean point?
