  • Orthogonalizing Two Variables in a Linear Regression

    Hi Everyone,

    I estimate a linear model with OLS. I have two independent variables, x1 and x2. When I run regress y x1, I find that x1 is significant. When I run regress y x1 x2, x1 loses statistical significance, as I expect, since I argue that x2 matters for the changes in y, not x1. Although the correlation between x1 and x2 is -0.25 (so there is no multicollinearity problem), it was suggested that I orthogonalize the two variables to make sure that x2, not x1, significantly explains y. When I orthogonalize x2 and x1 (orthog x2 x1, gen(newx2 newx1)), the t-values remain almost the same. When I orthogonalize x1 and x2 (orthog x1 x2, gen(newx1_alt newx2_alt)), the coefficient on x1 becomes significant again. Can anyone help me interpret these results? Should I use orthog here, and if so, which order should I rely on? I read the help file but couldn't understand which one fits my case better.
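
    For reference, the commands described above, written out as a block:

        * baseline regressions
        regress y x1
        regress y x1 x2

        * orthogonalize in both orders (with orthog, the order matters)
        orthog x2 x1, gen(newx2 newx1)
        regress y newx2 newx1           // regression on the orthogonalized pair

        orthog x1 x2, gen(newx1_alt newx2_alt)
        regress y newx1_alt newx2_alt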

    Thanks in advance.

    Best,
    Ulas

  • #2
    Ulas, I don't see why the so-called "orthogonalization" is valid for your case. Regressing y on x1 and x2, by definition, reports the partial effects of x1 and x2, holding the other variable constant. If you find x1's coefficient is insignificant in the multiple regression, that is already the evidence of a weak partial effect of x1 on y. If you orthogonalize x1 and x2, they become two different variables, and I don't see how we could draw conclusions about the partial effects of x1 and x2 from results based on a different pair of variables, not to mention that the ordering matters when generating the two new variables. If you have a set of similar variables measuring the same concept, I would recommend extracting the key information from the original list of variables with factor analysis -- at least all original variables are treated "equally" and the results are not subject to ordering.
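
    As an illustration, a minimal sketch of that factor-analysis route in Stata (x3 is a hypothetical third item measuring the same concept; the variable names are only illustrative):

        * extract a single common factor from the related items (x3 is hypothetical)
        factor x1 x2 x3, pcf factors(1)
        predict fscore      // factor score (regression scoring, the default)
        regress y fscore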

    • #3
      Thank you very much, Fei Wang,

      I also think similarly, but then I cannot understand when we should use the orthog command. Should I use it when the correlation between x1 and x2 is too high?

      • #4
        Ulas, frankly, I don't even understand why this method was invented to solve the "problem" of multicollinearity. If x1 and x2 are alternative measures of the same thing, I would control for x1 OR x2 to check the robustness of the results. If x1 and x2 come from a set of subjective questions about the same thing, as in many questionnaires in psychology and management, I would run a factor analysis. If x1 and x2 are both key regressors and happen to be highly correlated, it's harmless to control for both -- if both partial effects are statistically insignificant, then that is the result, nothing wrong (besides, we may further test the joint significance of the two variables, and the results may differ).
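
        For example, a minimal sketch of the joint test mentioned above, using Stata's test command after regress:

            * H0: the coefficients on x1 and x2 are jointly zero
            regress y x1 x2
            test x1 x2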

        • #5
          One potentially useful application of orthog is when you want to allow for non-linearity by adding transformations of the same variable; for example, splines, and especially cubic splines, can be highly correlated. Using orthogonal versions of these variables can help numerical stability, and since you should not interpret the individual coefficients in isolation anyhow, you don't lose anything. In that sense it is very similar in spirit to orthpoly, which is documented in the same help file as orthog.
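
          As an illustration, a minimal sketch under the assumption that x enters the model through a restricted cubic spline (the stub names and the number of knots are arbitrary choices here):

              * cubic spline basis for x; the spline terms can be highly correlated
              mkspline xsp = x, cubic nknots(4)

              * orthogonalized versions of the spline terms, for numerical stability
              orthog xsp1 xsp2 xsp3, gen(oxsp1 oxsp2 oxsp3)
              regress y oxsp1 oxsp2 oxsp3

              * or orthogonal polynomials in x directly, via orthpoly
              orthpoly x, gen(xp1 xp2 xp3) degree(3)
              regress y xp1 xp2 xp3
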
          Last edited by Maarten Buis; 08 Nov 2021, 01:51.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

          • #6
            Thanks for both answers; they helped me a lot.
