Dear Stata users,
Does anyone know whether an endogenous variable in an OLS regression biases the coefficients of the exogenous variables if there is no multicollinearity? I know that in the general case a single endogenous variable can bias all coefficients, but I believe the bias in the coefficients of the exogenous variables is due to multicollinearity (correlation between the endogenous and exogenous regressors).

Let y = a + bx + cz + u be the OLS regression model, with x an endogenous explanatory variable and z exogenous. Thus E(u x) != 0 and E(u z) = 0 by assumption. I believe I've been able to show with some messy algebra that the OLS estimate of c is unbiased, even though the estimate of b is biased, if the two assumptions above hold plus two additional assumptions: x and z are uncorrelated in the sample (sum(x z) = 0), and the mean of x is 0 in the sample.

My question is: is this correct? If it is correct, is there a reference, such as a textbook or peer-reviewed journal article, that I can cite as authority for this point? (I couldn't find this point when browsing through Wooldridge's advanced textbook on cross-section and panel data econometrics.) If it is not correct, I would love to know why and, if possible, have a reference that shows that.

As I think about this some more, it seems that with the assumption that z is exogenous and uncorrelated with x, I could run the regression without including x and still get the correct answer for c, since there is no omitted variable bias from leaving out a variable that is uncorrelated with the included explanatory variables. That strengthens my conviction that my argument is correct, but I still hope to find a reference for this.
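A rough Monte Carlo check of the claim could look like the sketch below. It is only an illustration under stated assumptions, not a proof: x and z are drawn independently, so they are uncorrelated in the population but only approximately so in any one sample (a weaker condition than sum(x z) = 0 exactly), and the parameter values (a = 0.5, b = 1, c = 2, and a dependence of 0.8 between u and x) and the function names are made up for the example. Python's standard library is used rather than Stata.

```python
import random


def slope(y, z):
    """Slope from OLS of y on a constant and z only (x omitted)."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    szz = sum((zi - mz) ** 2 for zi in z)
    return szy / szz


def ols_two_regressors(y, x, z):
    """OLS of y on a constant, x and z; returns (b_hat, c_hat)."""
    n = len(y)
    my, mx, mz = sum(y) / n, sum(x) / n, sum(z) / n
    # Demeaning every variable absorbs the intercept.
    dx = [xi - mx for xi in x]
    dz = [zi - mz for zi in z]
    dy = [yi - my for yi in y]
    sxx = sum(v * v for v in dx)
    szz = sum(v * v for v in dz)
    sxz = sum(p * q for p, q in zip(dx, dz))
    sxy = sum(p * q for p, q in zip(dx, dy))
    szy = sum(p * q for p, q in zip(dz, dy))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * szz - sxz * sxz
    b_hat = (szz * sxy - sxz * szy) / det
    c_hat = (sxx * szy - sxz * sxy) / det
    return b_hat, c_hat


def simulate(reps=500, n=200, seed=12345, a=0.5, b=1.0, c=2.0, rho=0.8):
    """Average b_hat, c_hat (full model) and c_hat (x omitted) over reps."""
    rng = random.Random(seed)
    sum_b = sum_c = sum_c_short = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # endogenous regressor
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # exogenous, independent of x
        # u depends on x (so E(u x) != 0) but not on z (so E(u z) = 0).
        u = [rho * xi + rng.gauss(0.0, 1.0) for xi in x]
        y = [a + b * xi + c * zi + ui for xi, zi, ui in zip(x, z, u)]
        b_hat, c_hat = ols_two_regressors(y, x, z)
        sum_b += b_hat
        sum_c += c_hat
        sum_c_short += slope(y, z)
    return sum_b / reps, sum_c / reps, sum_c_short / reps


if __name__ == "__main__":
    mean_b, mean_c, mean_c_short = simulate()
    print("mean b_hat (true b = 1):", mean_b)
    print("mean c_hat, full model (true c = 2):", mean_c)
    print("mean c_hat, x omitted (true c = 2):", mean_c_short)
```

In this setup the average estimate of b should settle near b + 0.8 rather than b, while both estimates of c should stay close to the true value, which is the pattern the argument above predicts.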
John Pender