Dear Stata users,
I am currently using a random-effects probit model, via the xtprobit command in Stata, to analyse export survival. I have lagged one of my key explanatory variables to address simultaneity. I am, however, unsure how best to deal with endogeneity, specifically:
- I initially followed Jeff Wooldridge's approach from the Stata forum (Engogeneity test after xtlogit - Statalist), testing for endogeneity by including lead values alongside time averages in my regression. The leads were not significant, so I cannot reject the null hypothesis of strict exogeneity with respect to the idiosyncratic errors.
- If the mean variables are significant in this test, does this already indicate that there is some form of unobserved heterogeneity?
- I then applied the Mundlak approach by including time averages of covariates to check for correlation with u(i). The results indicated that some covariates may be correlated with u(i), leading me to the following questions:
- Since xtprobit does not allow a fixed-effects specification in Stata, many studies on export survival introduce fixed-effect dummy variables (e.g., for duration, sector, destination, and year) rather than time averages. Would this be a valid approach to mitigate this form of endogeneity (unobserved heterogeneity), or should I include these dummies and re-run the Mundlak approach with the time averages as well?
- Should I simply keep the time averages in my final regression to account for this form of endogeneity? If so, should I interpret the coefficients on the time averages in my thesis, given that this is only a test for endogeneity, or can I simply state that I included time averages to account for unobserved heterogeneity? If I should interpret them, I have noticed that the signs of the time-average variables often differ from those of the original covariates (e.g., the mean of GDP per capita is negative, while the original variable is positive). How should this be interpreted?
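To make the two steps above concrete, here is a minimal sketch of the kind of specification I mean. It is illustrative only: it uses Stata's example panel loaded with `webuse union` rather than my export data, and the covariates (`grade`, `not_smsa`, `south`), the `m_` prefix for the time averages, and the choice of `south` as the variable whose lead is tested are all placeholders.

```stata
* Illustrative sketch on Stata's example panel, NOT the export-survival data.
* All variable choices below are placeholders.
webuse union, clear
xtset idcode year

* Mundlak device: panel-level time averages of the time-varying covariates
foreach v of varlist grade not_smsa south {
    bysort idcode: egen double m_`v' = mean(`v')
}

* Random-effects probit with time averages and year dummies
xtprobit union grade not_smsa south m_grade m_not_smsa m_south i.year, re

* Joint Wald test of the time averages: rejection points to correlation
* between the covariates and the unit effect u(i)
test m_grade m_not_smsa m_south

* Strict exogeneity check: add the one-period lead of a covariate
* and test whether its coefficient is zero
xtprobit union grade not_smsa south F.south m_grade m_not_smsa m_south i.year, re
test F.south
```

Because the years in this example panel are not consecutive, `F.south` is missing for many observations, so the second model is estimated on a smaller sample than the first.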