  • Predicting y after having orthogonalized x's

    Dear all,

    I am using Stata 15. I orthogonalized two highly correlated variables using the following command:
    Code:
    orthog x1 x2, generate(orthx1 orthx2)
    Then I ran a multiple regression, adding other variables to the ones just orthogonalized:
    Code:
    regress y orthx1 orthx2 x3 x4
    Now I am trying to predict y using some predefined values of the x's. I have values for x1 and x2 that I would like to use to predict y, but they are not in the orthogonalized units of measurement. How can I convert the values of x1 and x2 to their orthogonalized versions so that I can use the coefficients from the regression output to predict y? I tried to Google this but was not able to find anything.

    Thank you very much for your help and time.

  • #2
    The simplest way I can think of to do this is as follows. Start from the state produced by the code you show in #1.

    1. Append the new values of x1 and x2 to your data set.

    2. Drop the orthogonalized variables and rerun your -orthog- command, specifying the same names for the orthogonalized variables that you used originally. The orthogonalized values get recalculated, and this time you get them for the new values of x1 and x2 as well.

    3. Run -predict yhat, xb-. (A sketch of these steps follows.)
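
    A minimal sketch of the three steps, assuming (purely for illustration) that the new x1/x2 values sit in a file called newvals.dta:
    Code:
    * sketch only: newvals.dta is a hypothetical file holding the new x1/x2 rows
    append using newvals                     // 1. add the new observations
    drop orthx1 orthx2                       // 2. recreate the orthogonalized variables
    orthog x1 x2, generate(orthx1 orthx2)
    predict yhat, xb                         // 3. predict from the stored -regress- results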


    • #3
      Thank you Clyde for your answer.

      If I understand correctly, you are saying to add new observations to x1 and x2 containing the values I want to use to predict y, and then run the -orthog- command on x1 and x2 so that the orthogonalized values of interest appear in orthx1 and orthx2.

      If that is right, I worry that adding the observations of interest could itself change the orthogonalization of x1 and x2, especially if I add many observations at a time and then orthogonalize them all together.


      • #4
        Ah, yes, you are right. When you add the new values of x1 and x2, you change the way the variables get orthogonalized. I should have realized that.

        OK, so I think you have to bite the bullet and do some matrix algebra. When you run the -orthog- command, specify the matrix() option so you can save the transformation matrix. Actually, what gets saved is the inverse-transformation matrix, the R in the decomposition X = QR. But as long as the original variables are not linearly dependent, it will be invertible.

        So something along the lines of
        Code:
        orthog x1 x2, gen(ortho_x1 ortho_x2) matrix(R)
        regress y ortho_x1 ortho_x2 x3 x4 // etc.
        Now append the new values of x1 and x2 to your data set as new observations. Make a matrix out of the additional values of x1 and x2, and multiply it on the right by R^-1 to get a new matrix holding the values of ortho_x1 and ortho_x2 that correspond to the additional x's. Then save those matrix values back into ortho_x1 and ortho_x2 and run -predict-. You may have to do a little bit of juggling to accomplish these steps (see the sketch below), but they should do the trick.
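
        A sketch of that juggling, assuming the new x1/x2 rows have already been appended at the end of the data set and are the only observations where ortho_x1 is missing, and that (per -orthog-'s X = QR setup) the constant corresponds to the last column of R:
        Code:
        * sketch only: pick up the appended rows, add the constant column,
        * and map them into the orthogonalized units via R^-1
        mkmat x1 x2 if missing(ortho_x1), matrix(Xnew)
        matrix Xnew = Xnew, J(rowsof(Xnew), 1, 1)   // constant column, assumed last
        matrix Qnew = Xnew * inv(R)                 // Q = X * R^-1
        local first = _N - rowsof(Qnew) + 1         // first appended observation
        forvalues i = 1/`=rowsof(Qnew)' {
            local obs = `first' + `i' - 1
            quietly replace ortho_x1 = Qnew[`i', 1] in `obs'
            quietly replace ortho_x2 = Qnew[`i', 2] in `obs'
        }
        predict yhat, xb                            // uses the stored -regress- results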


        • #5
          Out of curiosity, why do you need to orthogonalize x1 and x2? That shouldn't matter in this case -- it merely changes the coefficients on the variables you're orthogonalizing, without affecting R-squared. Since you are basically just rescaling those coefficients and have no interest in the rescaled units, you might as well just regress y on x1-x4. Am I missing something?
          Last edited by Michael Droste; 18 Jan 2018, 16:14.


          • #6
            I was orthogonalizing x1 and x2 because the two variables were kind of correlated and I wanted to isolate the effect of one variable from the effect of the other.


            • #7
              But orthogonalizing x1 and x2 does not isolate the effect of one variable from the effect of the other. It creates new variables whose effects are isolated from each other, but they are not x1 and x2; they are different variables. I suppose you get part of what you want, because the first orthogonalized variable is just a standardized version of x1 (and perhaps that is all you need). But the second orthogonalized variable does not represent either x1 or x2 alone in any simple way, and the effect of x2 cannot be discerned from a regression using it unless you do a whole bunch of matrix algebra. A quick demonstration follows.
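
              To see this concretely, here is a quick check using the auto data (purely illustrative):
              Code:
              * the first orthogonalized variable is a linear rescaling of the
              * first original variable, so the two correlate perfectly; the
              * second is the second variable purged of the first, so it does not
              sysuse auto, clear
              orthog weight length, generate(ow ol)
              correlate weight ow   // correlation = 1: ow is just rescaled weight
              correlate length ol   // below 1: ol is not length in any simple sense
              correlate ow ol       // 0 by construction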
