
  • Why does R-Squared increase when I add a constraint?

    I am running simple regressions, (regress ylist xlist) and (regress ylist xlist, noconstant). In some cases, adding the constraint (here, "noconstant") increases my R-squared value for the constrained regression. This struck me as unexpected, since the data are unchanged between the two operations. Can anyone explain why this might be occurring, or whether I might have done something wrong?

  • #2
    This is a statistical FAQ. I am confident that it has been asked here on Statalist but in any event there are discussions in many places. See e.g. https://stats.stackexchange.com/ques...2-in-linear-mo

    The data may be the same, but you've changed the game and the rules are different. When you force the regression through the origin, R-sq is calculated against a benchmark of 0 rather than the mean of the outcome. You're saying

    (1) How much better is my regression at predicting, compared with using a prediction of 0 for the outcome?

    and no longer saying

    (2) How much better is my regression at predicting, compared with using a prediction of the mean of the outcome?

    Most often (1) gives a higher R-sq but is nevertheless not what researchers want to know. Rarely is zero a default prediction.
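
    In symbols, the two benchmarks correspond to two different denominators. The notation here is assumed for illustration: y_i are the observed outcomes, \hat{y}_i the fitted values from whichever model was estimated, \bar{y} the sample mean, and n the number of observations.

```latex
\begin{align*}
R^2_{\text{with constant}} &= 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \\
R^2_{\text{noconstant}}    &= 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i y_i^2} \\
\sum_i y_i^2               &= \sum_i (y_i - \bar{y})^2 + n\,\bar{y}^2
\end{align*}
```

    The residual sum of squares cannot fall when the intercept is dropped, but the denominator grows by n\bar{y}^2, which is large whenever the mean of the outcome is far from zero. The ratio can therefore shrink and R-sq can rise even though the fit is worse.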

    If you want the definition of R-sq that is the square of the correlation between observed and predicted, you can calculate that directly as just that, the square of that correlation. More at http://www.stata.com/support/faqs/statistics/r-squared/
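
    To see the effect numerically, here is a minimal sketch in plain Python rather than Stata. The five data points and all function names are made up for illustration; it handles only a single predictor and is not meant to reproduce any particular Stata output.

```python
from statistics import mean

# Hypothetical toy data: the outcome has a mean far from zero
# and only a weak linear relationship with x.
x = [1, 2, 3, 4, 5]
y = [10, 12, 9, 11, 13]


def fit_with_constant(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def fit_through_origin(x, y):
    """Least-squares slope for y = b*x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)


def rss(y, yhat):
    """Residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))


def r2_centered(y, yhat):
    """Benchmark (2): compare with predicting the mean of y."""
    ybar = mean(y)
    return 1 - rss(y, yhat) / sum((yi - ybar) ** 2 for yi in y)


def r2_uncentered(y, yhat):
    """Benchmark (1): compare with predicting 0 for every observation."""
    return 1 - rss(y, yhat) / sum(yi ** 2 for yi in y)


def corr_squared(u, v):
    """Square of the Pearson correlation between u and v."""
    ubar, vbar = mean(u), mean(v)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    suu = sum((ui - ubar) ** 2 for ui in u)
    svv = sum((vi - vbar) ** 2 for vi in v)
    return suv ** 2 / (suu * svv)


a, b = fit_with_constant(x, y)
yhat_const = [a + b * xi for xi in x]

b0 = fit_through_origin(x, y)
yhat_origin = [b0 * xi for xi in x]

r2_const = r2_centered(y, yhat_const)             # model with intercept
r2_origin = r2_uncentered(y, yhat_origin)         # model through the origin
corr2_origin = corr_squared(y, yhat_origin)       # squared corr(observed, predicted)

print(round(r2_const, 3))      # 0.25
print(round(r2_origin, 3))     # 0.854
print(round(corr2_origin, 3))  # 0.25
```

    With these made-up numbers the through-the-origin fit has a much larger residual sum of squares, yet its reported R-sq jumps from 0.25 to about 0.85 because the benchmark changed. The squared correlation between observed and predicted stays at 0.25 for both models.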

    I have seen cases where on physical (economic, biological, ...) grounds forcing a line through the origin makes much sense as an alternative model but doing that for a hyperplane is usually a distortion too far. If you're doing that on other grounds, e.g. that a negative intercept is hard to think about, it's likely that you need a different functional form. (Doing it because you like the higher R-sq is not defensible, because that's just an artefact.)

    Note: https://www.youtube.com/watch?v=tlHUwIl_w1A doesn't help (and the author can't spell Stata correctly in any case).
