Does the line of best fit on a scatter graph have to go through the mean point?

I’m practicing scatter graphs and I’m weirdly obsessed with getting the line of best fit “right.” I plotted hours studied (x) vs test score (y), and the points look pretty linear (which is satisfying!).

Here’s where I’m stuck: I read that the least squares regression line always goes through the mean point (x̄, ȳ). Is that supposed to be true even when I’m just drawing a best-fit line by eye on paper? Should I force my line to pass through the mean dot, or is that only for the calculated regression line?

My attempt: from my data I got x̄ = 4 and ȳ ≈ 61.4. I drew a line through (4, 61.4), then estimated the slope using two points near the edges and ended up with something like y ≈ 6.33x + 36.1. For x = 5 this predicts ≈ 67.7, and for the actual point (5, 68) I called the residual 68 − 67.7 ≈ 0.3 (so, above the line). That part seems okay.

But I’m confused about two things:
– When people say the errors should “balance,” do they mean equal numbers of points above and below the line, or that the sum of vertical residuals should be zero? I kept trying to make the counts equal and my line started looking wrong.
– For the “distance from the line,” should I be using the vertical difference (in y) or the shortest (perpendicular) distance? I first used perpendicular distances and got different residuals.

What’s the right mental model for exam-style, by-eye scatter graphs here? Aim for (x̄, ȳ)? Try to balance vertical residuals? Or am I mixing up the exact regression rules with the eyeballed approach?

Any help appreciated!

3 Responses

  1. Short answer: for the ordinary least-squares regression of y on x (with an intercept), the fitted line always goes through the mean point (x̄, ȳ); that’s a theorem. When you’re drawing a line “by eye” for exam-style scatter graphs, it’s a good habit (not a law) to aim the line so it passes near that centroid, because you’re trying to mimic the regression line. Just don’t contort the slope to hit it if the overall fit then looks worse.

    About “balancing”: least squares balances vertical residuals in the sense that their sum is zero (and, more strongly, it minimizes the sum of squared vertical residuals), not that there are equal numbers of points above and below. So don’t count dots; think “equal total up-ness and down-ness,” with big deviations counting more. And yes, the distance used is vertical (in y), not perpendicular. The perpendicular idea is a different method (orthogonal regression); its line also passes through (x̄, ȳ), but it generally has a different slope from the least-squares line.

    Tiny example: points (1,2), (2,3), (3,5), (4,4) have x̄=2.5, ȳ=3.5; the least-squares line is y=0.8x+1.5, which passes through (2.5,3.5). Residuals are about −0.3, −0.1, +1.1, −0.7: three below, one above, yet they add to 0. For your sketching mental model: draw a straight line that threads the cloud, roughly through (x̄, ȳ), and choose the slope so the vertical residuals visually “balance” in total size, not in count.
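
The tiny example above can be checked numerically. This is a minimal sketch in plain Python using the standard slope formula S_xy / S_xx; the variable names are illustrative and not from the thread.

```python
# Check the four-point example: least-squares line, mean point, residuals.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope = S_xy / S_xx; the intercept is then fixed by the mean point.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Vertical residuals: actual y minus the y predicted by the line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
n_above = sum(1 for r in residuals if r > 0)
n_below = sum(1 for r in residuals if r < 0)

print(f"mean point: ({x_bar}, {y_bar})")
print(f"line: y = {slope:.1f}x + {intercept:.1f}")
print("residuals:", [round(r, 1) for r in residuals])
print(f"above: {n_above}, below: {n_below}")
print("sum of residuals is ~0:", abs(sum(residuals)) < 1e-9)
```

The counts above and below are unequal while the residuals still cancel, which is the distinction the answer is drawing.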

  2. Short answer: the least-squares regression line (the calculated one) always goes through the mean point (x̄, ȳ) and measures “error” vertically. If you’re drawing a by-eye best-fit line for an exam, you don’t have to nail it exactly, but aiming to pass near (x̄, ȳ) is a great anchor, and you should think in terms of vertical residuals, not perpendicular ones. The “errors balance” idea for least squares means the sum of the vertical residuals is zero, not that you must have the same number of points above and below. You can absolutely have more points above than below as long as the positives and negatives cancel overall. Here’s a quick illustration of that: residuals +4, +1, −5 sum to 0 even though two are above and one is below.

    With your numbers, you’re already doing the right thing. You had x̄ = 4 and ȳ ≈ 61.4, and a by-eye slope of about 6.33. If you want the line to pass through the mean point (mimicking the regression property), just set the intercept to ȳ − slope × x̄ ≈ 61.4 − 6.33×4 ≈ 36.1, which gives y ≈ 6.33x + 36.1. That line predicts 67.7 at x = 5, so the residual for the point (5, 68) is 68 − 67.7 = +0.3, a vertical difference. Perpendicular distances would produce a different “best” line (that’s orthogonal regression), which isn’t what standard school/exam regression uses. For a tidy mental model in exam sketches: pick a slope that makes the vertical deviations look roughly balanced in size above and below, and let the line go near (x̄, ȳ). If you want to compute its equation from your sketch, read off two convenient points on your drawn line (not necessarily data points) to get the slope, then adjust the intercept so it goes through (x̄, ȳ).
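
The "pick a slope, then set the intercept from the mean point" step can be sketched as follows. The helper names `intercept_through_mean` and `residual` are hypothetical, chosen only for this illustration; the numbers are the ones quoted in the question.

```python
def intercept_through_mean(slope, x_bar, y_bar):
    """Intercept of the line with the given slope that passes through (x_bar, y_bar)."""
    return y_bar - slope * x_bar


def residual(x, y, slope, intercept):
    """Vertical residual: actual y minus the y the line predicts at x."""
    return y - (slope * x + intercept)


# Values from the question: by-eye slope and the mean point.
slope = 6.33
x_bar, y_bar = 4, 61.4

intercept = intercept_through_mean(slope, x_bar, y_bar)
print("intercept:", round(intercept, 1))

# Residual for the data point (5, 68); positive means the point is above the line.
print("residual at (5, 68):", round(residual(5, 68, slope, intercept), 1))

# By construction, the mean point itself sits on the line.
print("residual at mean point:", round(residual(x_bar, y_bar, slope, intercept), 6))
```

Whatever slope is chosen by eye, fixing the intercept this way guarantees the line passes through (x̄, ȳ); it does not by itself guarantee the slope is the least-squares one.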

    If you’d like a proof-y peek at why least squares passes through (x̄, ȳ) and has residuals that sum to zero, this Khan Academy explainer is nice: https://www.khanacademy.org/math/ap-statistics/bivariate-data-ap/least-squares-regression/v/why-the-least-squares-regression-line-always-passes-through-the-mean-of-x-and-y
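
For readers who prefer it written out, here is a short sketch of the standard argument, using m for the slope and b for the intercept. The notation S and e_i is introduced here for illustration and is not taken from the linked explainer.

```latex
\begin{align*}
S(m, b) &= \sum_{i=1}^{n} \left( y_i - m x_i - b \right)^2
  && \text{(sum of squared vertical residuals)} \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} \left( y_i - m x_i - b \right) = 0
  && \text{(condition at the minimum)} \\
\sum_{i=1}^{n} e_i &= 0, \qquad e_i = y_i - m x_i - b
  && \text{(residuals sum to zero)} \\
\bar{y} - m \bar{x} - b &= 0
  \;\Longrightarrow\; \bar{y} = m \bar{x} + b
  && \text{(divide by } n \text{: the line passes through } (\bar{x}, \bar{y}) \text{)}
\end{align*}
```

Both properties come from the same condition on the intercept, which is why they only hold when the model includes an intercept.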

  3. Short answer: the exact least-squares regression line (with an intercept) always goes through the mean point (x̄, ȳ). If you’re drawing a by-eye “best-fit” line for an exam, you don’t have to force it exactly through that dot, but using (x̄, ȳ) as an anchor is a smart trick. Eyeball a sensible slope from the overall trend, then set the intercept to b = ȳ − m x̄ so your line goes through the mean. That’s exactly what you did, and it’s a solid method.

    On “balancing errors”: for least squares, it’s the sum of vertical residuals that equals zero, not “equal numbers of points above and below.” Counting points is a dodgy heuristic: one big outlier can outweigh five tiny residuals. Think in terms of vertical deviations (because you’re predicting y from x), not perpendicular distances. Perpendicular distances are for a different beast (orthogonal/Deming regression), not what school exams usually want. So the right mental model: draw a straight line through the middle of the cloud, roughly around (x̄, ȳ), with the total vertical “ups” and “downs” looking about balanced, and read the slope using two well-separated points on your drawn line.
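
To see why vertical and perpendicular measurements give different numbers, here is a minimal sketch comparing the two for the point (5, 68) and the asker's line y ≈ 6.33x + 36.1. The function names are hypothetical, and the perpendicular distance treats one unit of x and one unit of y as the same length, which is an assumption that rarely makes sense for hours versus marks.

```python
import math


def vertical_residual(x, y, m, b):
    """Signed vertical gap: actual y minus the line's y at the same x."""
    return y - (m * x + b)


def perpendicular_distance(x, y, m, b):
    """Shortest (unsigned) distance from the point to the line y = m*x + b."""
    return abs(y - (m * x + b)) / math.sqrt(1 + m * m)


m, b = 6.33, 36.1  # the by-eye line from the question
x, y = 5, 68       # the data point from the question

v = vertical_residual(x, y, m, b)
p = perpendicular_distance(x, y, m, b)

print("vertical residual:", round(v, 2))
print("perpendicular distance:", round(p, 3))
print("perpendicular is smaller:", p < abs(v))
```

For any line that is not horizontal, the perpendicular distance is smaller than the vertical one by a factor of √(1 + m²), so residuals measured the two ways will not match.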

    Hope this helps!
