Is Durbin–Wu–Hausman test a valid test when using generated regressor without any declaration of IVs?

Zhaohui Li

Join Date: Oct 2016

Posts: 15
#1

15 Aug 2020, 03:15

Dear All,

I have a question about exactly the setup described in an old Stata FAQ on the Durbin–Wu–Hausman (DWH) test:
https://www.stata.com/support/faqs/s...-hausman-test/

I have copied the relevant part of the FAQ here.
________________________________________________________________________________
Before estimating the following simultaneous equations,
z = a0 + a1*x1 + a2*x2 + epsilon1    (1)
y = b0 + b1*z + b2*x3 + epsilon2     (2)
one should decide whether it is necessary to use an instrumental variable, i.e., whether a set of estimates obtained by least squares is consistent or not.

Davidson and MacKinnon (1993) suggest an augmented regression test (DWH test), which can easily be formed by including the residuals of each endogenous right-hand-side variable, as a function of all exogenous variables, in a regression of the original model. Back to our example, we would first perform a regression
z = c0 + c1*x1 + c2*x2 + c3*x3 + epsilon3 (3)
get residuals z_res, then perform an augmented regression:
y = d0 + d1*z + d2*x3 + d3*z_res + epsilon4 (4)
If d3 is significantly different from zero, then OLS is not consistent.
________________________________________________________________________________
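The two steps in the FAQ, equations (3) and (4), can be sketched on simulated data. This is a minimal illustration in Python rather than Stata, using only the standard library. The helpers `solve` and `ols`, the sample size, the seed, and all parameter values (for example the 0.8 that makes z endogenous) are assumptions made for the illustration and come from neither the FAQ nor my data.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]          # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]                # partial pivoting
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def ols(y, X):
    """OLS of y on the rows of X; returns coefficients, residuals, std. errors."""
    n, k = len(y), len(X[0])
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(c * x for c, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)           # residual variance
    se = []
    for a in range(k):                                 # diagonal of s2 * (X'X)^-1
        unit = [1.0 if j == a else 0.0 for j in range(k)]
        se.append(math.sqrt(s2 * solve(XtX, unit)[a]))
    return beta, resid, se

# Simulated data (all values below are illustrative assumptions).
random.seed(12345)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
eps1 = [random.gauss(0, 1) for _ in range(n)]
# eps2 shares a component with eps1, so z is endogenous in equation (2).
eps2 = [0.8 * e + random.gauss(0, 1) for e in eps1]
z = [1 + a + b + e for a, b, e in zip(x1, x2, eps1)]          # equation (1)
y = [2 + 1.5 * zi + c + e for zi, c, e in zip(z, x3, eps2)]   # equation (2)

# Equation (3): regress z on all exogenous variables and keep the residuals.
_, z_res, _ = ols(z, [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)])

# Equation (4): add z_res to the structural equation and look at d3.
d, _, se = ols(y, [[1.0, zi, c, r] for zi, c, r in zip(z, x3, z_res)])
t_d3 = d[3] / se[3]
print(f"d1 = {d[1]:.3f}, d3 = {d[3]:.3f}, t(d3) = {t_d3:.2f}")
```

Because the simulated eps2 is built to be correlated with eps1, the t statistic on z_res should be far from zero here. Setting the 0.8 to 0 gives the case in which OLS is consistent.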

My question is this: the usual Durbin–Wu–Hausman test needs IVs to be declared for z. In the FAQ example, these must be x1 and x2. In my case, however, z is a generated regressor, and x1 and x2 stand for a long list of variables, including many dummies, that enter as in equation (1).

When I test the endogeneity of z in equation (2), do I need to show that x1 and x2 are all uncorrelated with epsilon2 (which is the definition of an IV), or can I simply do what the FAQ suggests?

best,
Zhaohui
Tags: Durbin-Wu-Hausman, endogeneity, IV, ivreg
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

15 Aug 2020, 04:27

I think you are asking a relatively advanced question, and I have not come across an answer in the literature.

My intuition is that if the conditions for estimating an IV regression consistently with a generated regressor are satisfied, then implementing the Durbin-Wu-Hausman residual based test would be correct too.

These conditions are that your instruments X1 and X2 (and X3) have to be uncorrelated with the error epsilon2.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#3

17 Aug 2020, 21:15

The DWH test does not require you to specify an IV for the endogenous regressor. It imposes no restrictions on the reduced form of z. It’s based on the control function way of computing 2SLS. Under the null, the coefficient on z_res is zero and you don’t need to adjust for the two-step estimation.

JW
Zhaohui Li

Join Date: Oct 2016

Posts: 15
#4

18 Aug 2020, 18:15

Originally posted by Jeff Wooldridge:

The DWH test does not require you to specify an IV for the endogenous regressor. It imposes no restrictions on the reduced form of z. It’s based on the control function way of computing 2SLS. Under the null, the coefficient on z_res is zero and you don’t need to adjust for the two-step estimation.

JW

Dear Prof. Wooldridge:

Thanks. I really appreciate your help. ^_^

best,
Zhaohui
Zhaohui Li

Join Date: Oct 2016

Posts: 15
#5

18 Aug 2020, 19:16

Originally posted by Joro Kolev:

I think you are asking a relatively advanced question, and I have not come across an answer in the literature.

My intuition is that if the conditions for estimating an IV regression consistently with a generated regressor are satisfied, then implementing the Durbin-Wu-Hausman residual based test would be correct too.

These conditions are that your instruments X1 and X2 (and X3) have to be uncorrelated with the error epsilon2.

Dear Joro,
Thanks for your reply. JW has just given a good answer to this question in #3. Hopefully it helps.

best,
Zhaohui
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#6

19 Aug 2020, 01:19

Dear Professor Jeff Wooldridge, of course we need to specify IVs for the (potentially) endogenous regressor. If we do not have excluded instruments in the first stage, the residuals from the first stage, once included in the structural equation, will be perfectly collinear with the other included regressors. Here is an example: let's say Price is the dependent variable, MPG is the endogenous regressor, and Headroom is the included exogenous regressor. So we are entertaining the system
Price = b0 + b1 MPG + b2 Headroom + e
MPG = g0 + g1 Headroom + v
e and v correlated.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. qui reg mpg head

. predict double mpgre, resid

. reg price headroom mpg mpgre
note: headroom omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.44
       Model |   144280501         2  72140250.4   Prob > F        =    0.0001
    Residual |   490784895        71  6912463.32   R-squared       =    0.2272
-------------+----------------------------------   Adj R-squared   =    0.2054
       Total |   635065396        73  8699525.97   Root MSE        =    2629.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |          0  (omitted)
         mpg |  -141.0716   128.5347    -1.10   0.276    -397.3624    115.2192
       mpgre |   -118.034     141.19    -0.84   0.406    -399.5589    163.4908
       _cons |   9169.701    2754.45     3.33   0.001     3677.484    14661.92
------------------------------------------------------------------------------

So the regression dropped Headroom, because Headroom, MPG, the residual of MPG, and the constant are perfectly collinear.
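The linear dependence can be written out in one line. This is a sketch in notation assumed here: \hat g_0 and \hat g_1 are the OLS estimates from the first-stage regression of MPG on Headroom alone, and \hat v is the residual stored as mpgre.

```latex
\hat v \;=\; \mathrm{MPG} - \hat g_0 - \hat g_1\,\mathrm{Headroom}
\quad\Longrightarrow\quad
\hat v \;-\; \mathrm{MPG} \;+\; \hat g_1\,\mathrm{Headroom} \;+\; \hat g_0 \cdot 1 \;=\; 0 .
```

So mpgre is an exact linear combination of mpg, headroom, and the constant, and one of them has to be dropped. With an excluded instrument in the first stage, the residual would also depend on that instrument and this exact dependence would disappear.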

I believe that you picked up on the unfortunate semantics used by the Original Poster, and wanted to make it clear that the test regression of the Durbin-Wu-Hausman test is an OLS regression, and not an IV regression. I guess this is what you meant when you said "The DWH test does not require you to specify an IV for the endogenous regressor".

Also, I do not think that you expressed your opinion on what (to me at least) this question was all about: if z in the Original Poster's explanation, or MPG in my demonstration, is a generated regressor, does anything special happen? Is the DWH test still valid, as I conjectured, assuming that the instruments are valid? Is the standard error from the second stage still consistent, given that z or MPG is a generated regressor?

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#7

19 Aug 2020, 05:15

Joro: Because Zhaohui seemed fully aware that an exclusion restriction is needed to implement the test, I took the comment about "declaration of an iv" to mean that one has to say that x1 is the IV for z or x2 is the IV for z. Hence my comment about how the first stage of the CF approach does not impose any restrictions on the reduced form for z. 2SLS, under certain assumptions, picks out the optimal linear combination of x1, x2, and x3 -- ignoring the structure of the model. I did not mean to imply one doesn't need an IV; sorry that was unclear. Anyone who tries the CF approach without an IV quickly learns what you did.

About the generated regressor issue: I have a long discussion of this in Chapter 6 of my MIT Press book. Under the null, no adjustment is needed. If the population coefficient on z_res is not zero then an adjustment is needed, and so one might as well get the proper standard errors using a built-in 2SLS package. But the CF approach is very convenient for obtaining a test.
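A sketch of why the null matters, in notation assumed here rather than taken from the book: let w = (1, x1, x2, x3) be the row vector of exogenous variables, c the coefficient vector of the reduced form (3), \hat c its OLS estimate, and d_3 the population coefficient on the reduced-form error in the linear projection of epsilon2 on epsilon3.

```latex
\begin{aligned}
\varepsilon_2 &= d_3\,\varepsilon_3 + e,
  \qquad \operatorname{E}[w' e] = 0,\quad \operatorname{E}[\varepsilon_3 e] = 0,\\
\hat z_{\mathrm{res}} &= z - w\hat c \;=\; \varepsilon_3 - w(\hat c - c),\\
y &= b_0 + b_1 z + b_2 x_3 + d_3\,\hat z_{\mathrm{res}}
     + \bigl[\, e + d_3\, w(\hat c - c) \,\bigr].
\end{aligned}
```

The sampling error in \hat c enters the second-step error only through the term multiplied by d_3. Under H0: d_3 = 0 that term drops out, so the usual OLS t statistic on z_res needs no correction for the first step. When d_3 is not zero the term remains, which is why the second-step standard errors then need adjusting.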
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

20 Aug 2020, 06:40

Understood, so you confirm that z or MPG being a generated regressor does not change anything in the distribution theory under the null that the slope on the included residual is 0.

Can I please ask you one more thing, in case you know the answer or can point me to relevant literature:

Under standard assumptions, say with the errors e and v in the two equations of my #6 being bivariate normal, is the t-statistic on the included residual that we use for the Durbin-Wu-Hausman test also exactly t-distributed in finite samples?

In other words, if I am using a test regression Y = a + b*X + c*W + error, and X is a generated regressor, is the t-statistic for testing H0: b = 0 exactly t-distributed in finite samples?
