  • Testing for endogeneity and unobserved heterogeneity after xtprobit

    Dear Stata users

    I am currently using a random effects probit model via the xtprobit command in Stata to analyse export survival. I have lagged one of my key explanatory variables to address simultaneity issues. I am however unsure about how best to deal with endogeneity, specifically:
    • I initially followed Jeff Wooldridge's approach from the Stata forum (Engogeneity test after xtlogit - Statalist), testing for strict exogeneity by including lead values alongside time averages in my regression. The results suggested that I cannot reject the null hypothesis of strict exogeneity with respect to the idiosyncratic errors.
      • If the mean variables are significant in this test, does this already indicate that there is some form of unobserved heterogeneity?
    • I then applied the Mundlak approach by including time averages of covariates to check for correlation with u(i). The results indicated that some covariates may be correlated with u(i), leading me to the following questions:
      • Since xtprobit does not allow a fixed-effects specification in Stata, many studies on export survival introduce fixed-effect dummy variables (e.g., for duration, sector, destination, and year) instead of time averages. Would this be a valid approach to mitigate this form of endogeneity (unobserved heterogeneity), or should I include these dummies and re-run the Mundlak approach with the time averages added?
      • Should I simply keep the time averages in my final regression to account for this form of endogeneity? If so, should I interpret the mean variables in my thesis, given that this is only a test for endogeneity, or can I simply state that I included time averages to account for unobserved heterogeneity? If I should interpret them, I have noticed that the signs of the time-average variables often differ from those of the original covariates (e.g., the mean of GDP per capita is negative, while the original variable is positive). How should this be interpreted?
    Any advice or guidance would be much appreciated!
    Last edited by Mariska Aucamp; 21 Apr 2025, 15:23.
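
    The two checks described in this post could be set up roughly as follows. This is only a minimal sketch: the variable names (firm_id, year, survive, gdppc, tariff) are hypothetical placeholders rather than the poster's actual variables, and the time averages are taken over all available observations, which may need adjusting in an unbalanced panel so that they match the estimation sample.

    ```stata
    * Hypothetical names: firm_id (panel unit), year (time), survive (0/1 outcome),
    * gdppc and tariff (time-varying covariates)
    xtset firm_id year

    * Time averages of the time-varying covariates (Mundlak device)
    foreach v of varlist gdppc tariff {
        bysort firm_id: egen mean_`v' = mean(`v')
    }

    * Strict exogeneity check: add one-period leads alongside the time averages,
    * then test whether the leads are jointly zero
    xtprobit survive gdppc tariff F.gdppc F.tariff mean_gdppc mean_tariff i.year, re
    test F.gdppc F.tariff

    * Correlated random effects check: joint test on the time averages
    xtprobit survive gdppc tariff mean_gdppc mean_tariff i.year, re
    test mean_gdppc mean_tariff
    ```

    A joint test is used in both steps because the question is whether the leads (or the time averages) matter as a group, not whether any single coefficient is individually significant.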

  • #2
    Depending on the size of your dataset (the number of cross-sectional units and the number of time periods), including dummies in your probit model could cause the incidental parameters problem (Lancaster, 2000), because probit, unlike logit, does not have a sufficient statistic for the unit fixed effects.

    You may want to include these fixed-effect dummies in linear probability models or logit models as well, and see how your results change. The Mundlak approach does allow you to soak up, at least partially, some of the correlation between the regressors and the error term.
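
    One way to run the comparison suggested here might look like the sketch below. The variable names (firm_id, year, survive, gdppc, tariff) are hypothetical, and the specifications are illustrative only, not a recommendation for any particular dataset.

    ```stata
    * Hypothetical names: firm_id (panel unit), year (time), survive (0/1 outcome),
    * gdppc and tariff (time-varying covariates)
    xtset firm_id year

    * Linear probability model with unit fixed effects
    xtreg survive gdppc tariff i.year, fe vce(cluster firm_id)

    * Conditional (fixed-effects) logit, which conditions the unit effects out
    xtlogit survive gdppc tariff i.year, fe

    * Random-effects probit with Mundlak time averages
    foreach v of varlist gdppc tariff {
        bysort firm_id: egen mean_`v' = mean(`v')
    }
    xtprobit survive gdppc tariff mean_gdppc mean_tariff i.year, re

    * Average marginal effects from the probit, for comparison with the LPM slopes
    margins, dydx(gdppc tariff)
    ```

    Raw coefficients are not on the same scale across these three models, so comparing signs and the average marginal effects is likely to be more informative than comparing coefficient magnitudes directly.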
