  • Is the Durbin–Wu–Hausman test valid when using a generated regressor without any declaration of IVs?

    Dear All,

    I have exactly the same question as the one addressed in an old Stata FAQ on the Durbin–Wu–Hausman test.
    https://www.stata.com/support/faqs/s...-hausman-test/

    I have copied the FAQ text below.
    ________________________________________________________________________________
    Before estimating the following simultaneous equations,
    z = a0 + a1*x1 + a2*x2 + epsilon1 (1)
    y = b0 + b1*z + b2*x3 + epsilon2 (2)
    one should decide whether it is necessary to use an instrumental variable, i.e., whether a set of estimates obtained by least squares is consistent or not.

    Davidson and MacKinnon (1993) suggest an augmented regression test (DWH test), which can easily be formed by including the residuals of each endogenous right-hand-side variable, as a function of all exogenous variables, in a regression of the original model. Back to our example, we would first perform a regression
    z = c0 + c1*x1 + c2*x2 + c3*x3 + epsilon3 (3)
    get residuals z_res, then perform an augmented regression:
    y = d0 + d1*z + d2*x3 + d3*z_res + epsilon4 (4)
    If d3 is significantly different from zero, then OLS is not consistent.
    ________________________________________________________________________________
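
    For illustration, the FAQ's two-step procedure can be sketched on Stata's shipped auto data. The variable roles are assumptions made only for this example and are not part of the FAQ: price plays y, mpg plays z, headroom plays x3, and weight and length stand in for the excluded exogenous variables x1 and x2. No claim is made that they are valid instruments.

    ```stata
    sysuse auto, clear

    * Step 1, equation (3): regress the suspect regressor on ALL exogenous variables
    * (weight and length are assumed excluded instruments, for illustration only)
    regress mpg weight length headroom
    predict double z_res, residuals

    * Step 2, equation (4): add the first-stage residuals to the structural equation
    regress price mpg headroom z_res

    * The regression-based DWH test is the test that the coefficient on z_res is zero
    test z_res
    ```

    A small p-value from the last command would be read as evidence against the consistency of OLS for equation (2).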

    My question is this: the usual Durbin–Wu–Hausman test requires declaring instruments for z, which here must be x1 and x2. In my case, however, z is a generated regressor, and x1 and x2 stand for a long list of variables, including many dummies, as in equation (1).

    When I test the endogeneity of z in equation (2), do I need to show that x1 and x2 are all uncorrelated with epsilon2 (the defining property of an IV), or can I simply follow the procedure in the FAQ?



    best,
    Zhaohui

  • #2
    I think you are asking a relatively advanced question, and I have not come across an answer in the literature.

    My intuition is that if the conditions for estimating an IV regression consistently with a generated regressor are satisfied, then implementing the Durbin-Wu-Hausman residual based test would be correct too.

    These conditions are that your instruments X1 and X2 (and X3) have to be uncorrelated with the error epsilon2.

    • #3
      The DWH test does not require you to specify an IV for the endogenous regressor. It imposes no restrictions on the reduced form of z. It’s based on the control function way of computing 2SLS. Under the null, the coefficient on z_res is zero and you don’t need to adjust for the two-step estimation.

      JW
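
      A minimal sketch of the control function point, on Stata's auto data. The variable roles are assumptions for illustration only: price is the outcome, mpg the suspect regressor, headroom the included exogenous variable, and weight and length the excluded exogenous variables. With the first-stage residual added as a regressor, the OLS coefficient on mpg should coincide with the 2SLS estimate up to rounding.

      ```stata
      sysuse auto, clear

      * Control function: first-stage residual enters the structural equation
      quietly regress mpg weight length headroom
      predict double v_hat, residuals
      quietly regress price mpg headroom v_hat
      scalar b_cf = _b[mpg]

      * Built-in 2SLS with the same (assumed) instruments
      quietly ivregress 2sls price headroom (mpg = weight length)
      scalar b_iv = _b[mpg]

      * Show the two point estimates side by side
      display "CF: " b_cf "   2SLS: " b_iv
      ```

      Only the point estimates are compared here; the standard errors from the two routes generally differ.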

      • #4
        Originally posted by Jeff Wooldridge
        The DWH test does not require you to specify an IV for the endogenous regressor. It imposes no restrictions on the reduced form of z. It’s based on the control function way of computing 2SLS. Under the null, the coefficient on z_res is zero and you don’t need to adjust for the two-step estimation.

        JW
        Dear Prof. Wooldridge:

        Thanks. I really appreciate your help. ^_^

        best,
        Zhaohui

        • #5
          Originally posted by Joro Kolev
          I think you are asking a relatively advanced question, and I have not come across an answer in the literature.

          My intuition is that if the conditions for estimating an IV regression consistently with a generated regressor are satisfied, then implementing the Durbin-Wu-Hausman residual based test would be correct too.

          These conditions are that your instruments X1 and X2 (and X3) have to be uncorrelated with the error epsilon2.
          Dear Joro,
          Thanks for your reply. JW has just posted a good answer to this question (#3). Hopefully it helps.

          best,
          Zhaohui

          • #6
            Dear Professor Jeff Wooldridge, of course we need to specify IVs for the (potentially) endogenous regressor. If we do not have excluded instruments in the first stage, the first-stage residuals included in the structural equation will be perfectly collinear with the other included regressors. Here is an example: let's say Price is the dependent variable, MPG is the endogenous regressor, and Headroom is the included exogenous regressor. So we are entertaining the system
            Price = b0 + b1 MPG + b2 Headroom + e
            MPG = g0 + g1 Headroom + v
            e and v correlated.

            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . qui reg mpg head
            
            . predict double mpgre, resid
            
            
            . reg price headroom mpg mpgre
            note: headroom omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(2, 71)        =     10.44
                   Model |   144280501         2  72140250.4   Prob > F        =    0.0001
                Residual |   490784895        71  6912463.32   R-squared       =    0.2272
            -------------+----------------------------------   Adj R-squared   =    0.2054
                   Total |   635065396        73  8699525.97   Root MSE        =    2629.2
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                headroom |          0  (omitted)
                     mpg |  -141.0716   128.5347    -1.10   0.276    -397.3624    115.2192
                   mpgre |   -118.034     141.19    -0.84   0.406    -399.5589    163.4908
                   _cons |   9169.701    2754.45     3.33   0.001     3677.484    14661.92
            ------------------------------------------------------------------------------
            
            .
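            So the regression dropped Headroom: without an excluded instrument, the MPG residual is an exact linear combination of MPG, Headroom, and the constant, so these regressors are perfectly collinear.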
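
            For comparison, a hedged sketch of the same setup with one excluded instrument added. Treating weight as an excluded instrument for mpg is purely an assumption for illustration. With it in the first stage, the residual is no longer an exact linear combination of the other regressors, and the built-in postestimation command estat endogenous reports Durbin and Wu-Hausman statistics after 2SLS.

            ```stata
            sysuse auto, clear

            * First stage now contains an excluded instrument (weight, assumed exogenous)
            quietly regress mpg headroom weight
            predict double mpgre, residuals

            * Augmented regression: no regressor should be dropped for collinearity
            regress price headroom mpg mpgre

            * Built-in alternative: 2SLS followed by the endogeneity tests
            ivregress 2sls price headroom (mpg = weight)
            estat endogenous
            ```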

            I believe that you picked up on the unfortunate wording used by the original poster and wanted to make clear that the test regression of the Durbin-Wu-Hausman test is an OLS regression, not an IV regression. I guess this is what you meant when you said "The DWH test does not require you to specify an IV for the endogenous regressor".

            Also, I do not think you gave your opinion on what (to me at least) this question was all about: if z in the original poster's explanation, or MPG in my demonstration, is a generated regressor, does anything special happen? Is the DWH test still valid, as I conjectured, assuming that the instruments are valid? Is the standard error from the second stage still consistent given that z or MPG is a generated regressor?


            Originally posted by Jeff Wooldridge
            The DWH test does not require you to specify an IV for the endogenous regressor. It imposes no restrictions on the reduced form of z. It’s based on the control function way of computing 2SLS. Under the null, the coefficient on z_res is zero and you don’t need to adjust for the two-step estimation.

            JW

            • #7
              Joro: Because Zhaohui seemed fully aware that an exclusion restriction is needed to implement the test, I took the comment about "declaration of an iv" to mean that one has to say that x1 is the IV for z or x2 is the IV for z. Hence my comment about how the first stage of the CF approach does not impose any restrictions on the reduced form for z. 2SLS, under certain assumptions, picks out the optimal linear combination of x1, x2, and x3 -- ignoring the structure of the model. I did not mean to imply one doesn't need an IV; sorry that was unclear. Anyone who tries the CF approach without an IV quickly learns what you did.

              About the generated regressor issue: I have a long discussion of this in Chapter 6 of my MIT Press book. Under the null, no adjustment is needed. If the population coefficient on z_resid is not zero then an adjustment is needed, and so one might as well get the proper standard errors using a built-in 2SLS package. But the CF approach is very convenient for obtaining a test.
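
              The "no adjustment under the null" point can be outlined heuristically. The notation is assumed here and is not quoted from the book: w collects all exogenous variables, c is the reduced-form coefficient vector, v is the reduced-form error, and d_3 is the population coefficient on v, matching equation (4) of the FAQ.

              ```latex
              \begin{align*}
              z &= w c + v, \qquad \hat v = z - w\hat c = v - w(\hat c - c), \\
              y &= d_0 + d_1 z + d_2 x_3 + d_3 v + e \\
                &= d_0 + d_1 z + d_2 x_3 + d_3 \hat v + \bigl[\, e + d_3\, w(\hat c - c) \,\bigr].
              \end{align*}
              ```

              When d_3 = 0, the term involving \hat c - c drops out of the error, so first-stage sampling error does not enter the limiting distribution of the t statistic on \hat v. When d_3 is nonzero it does enter, which is why the OLS standard errors from the augmented regression then need adjusting.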

              • #8
                Understood, so you confirm that z or MPG being a generated regressor does not change anything in the distribution theory under the null that the slope on the included residual is 0.

                Can I please ask you one more thing, in case you know the answer or can point me to relevant literature:

                Under standard assumptions, say with the errors e and v in my #6 being bivariate normal, is the t-statistic on the included residual, which we use for the Durbin-Wu-Hausman test, also exactly t-distributed in finite samples?

                In other words, if I use the test regression Y = a + b*X + c*W + error, where X is a generated regressor, is the t-statistic for H0: b = 0 exactly t-distributed in finite samples?
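
                One way to probe this question numerically is a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not a result from the thread: the program name dwhsim, the sample size of 30, the coefficient values, and the single excluded instrument x1 are all hypothetical choices, and the errors are drawn as independent normals so that the null of exogeneity holds.

                ```stata
                clear all
                set seed 12345

                * Hypothetical data-generating process under the null (e independent of v)
                program define dwhsim, rclass
                    drop _all
                    set obs 30
                    generate double x1 = rnormal()
                    generate double x3 = rnormal()
                    generate double v  = rnormal()
                    generate double e  = rnormal()
                    generate double z  = 1 + x1 + x3 + v
                    generate double y  = 1 + z + x3 + e
                    * First stage and augmented regression, as in the FAQ
                    regress z x1 x3
                    predict double v_hat, residuals
                    regress y z x3 v_hat
                    return scalar t = _b[v_hat] / _se[v_hat]
                end

                * Collect the t statistic on the residual over 2000 replications
                simulate t = r(t), reps(2000) nodots: dwhsim

                * Rejection rate at the 5% level using t critical values with 30 - 4 = 26 df
                generate byte reject = abs(t) > invttail(26, 0.025)
                summarize reject
                ```

                A rejection rate close to 0.05 would be consistent with the t reference distribution in this particular design; it would not by itself settle the general question.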

                Originally posted by Jeff Wooldridge
                Joro: Because Zhaohui seemed fully aware that an exclusion restriction is needed to implement the test, I took the comment about "declaration of an iv" to mean that one has to say that x1 is the IV for z or x2 is the IV for z. Hence my comment about how the first stage of the CF approach does not impose any restrictions on the reduced form for z. 2SLS, under certain assumptions, picks out the optimal linear combination of x1, x2, and x3 -- ignoring the structure of the model. I did not mean to imply one doesn't need an IV; sorry that was unclear. Anyone who tries the CF approach without an IV quickly learns what you did.

                About the generated regressor issue: I have a long discussion of this in Chapter 6 of my MIT Press book. Under the null, no adjustment is needed. If the population coefficient on z_resid is not zero then an adjustment is needed, and so one might as well get the proper standard errors using a built-in 2SLS package. But the CF approach is very convenient for obtaining a test.
