Fei Men wrote me privately:
I have had a very painstaking dilemma when researching on the effect of divorce (dichotomous variable) on mothers' food security (dichotomous). [...]
Long story short, the heteroscedastic probit (Stata -hetprob-) model got me a small and non-significant effect of divorce with significant lnsigma2 for the divorce dummy while the homoscedastic probit, propensity score matching (PSM), and instrumental variable (IV) model have all got me a fairly large and highly significant divorce coefficient. Baseline characteristics such as income and homeownership are controlled in all models. Results are consistent across a variety of specifications in both heteroscedastic and homoscedastic models.
Given the contrasting results across different residual variance assumptions, I was wondering which story I should put more faith in, especially when PSM and IV approaches got me significant divorce effect. Is there a way to correct for heteroscedasticity in PSM and IV models?
I am very reluctant to trust hetprob, as its results are very sensitive to the correct specification of both the heteroscedastic part of the model and the main part of the model. There is no way in which we can directly see the error term; instead, the heteroscedasticity manifests itself in making linear effects (slightly) non-linear. This is what is used to identify the heteroscedasticity in hetprob. However, if the effect wasn't linear to begin with, then hetprobit will incorrectly assume that the deviation from linearity is due to heteroscedasticity and "adjust" all effects with that incorrect estimate of the residual error term. Similarly, an incorrect specification of the heteroscedastic part will result in noticeably biased results. So you need some very extensive model checking before you can believe the results from hetprobit. Below is a simulation that illustrates this point:
Code:
. clear all
. set seed 123456
.
. program define sim, rclass
      drop _all
      set obs 1000
      gen x = rnormal()

      // hetprobit is correctly specified
      gen ystar1 = 1 + x + rnormal(0,exp(.5*x))
      gen byte y1 = ystar1 > 0
      hetprob y1 x, het(x)
      return scalar b1 = _b[x]

      // heteroscedasticity is incorrectly specified
      gen ystar2 = 1 + x + rnormal(0, exp(.5*x + .25*x^2))
      gen byte y2 = ystar2 > 0
      hetprob y2 x, het(x)
      return scalar b2 = _b[x]

      // no heteroscedasticity, but incorrectly specified x
      gen ystar3 = 1 + x + .5*x^2 + rnormal()
      gen byte y3 = ystar3 > 0
      hetprobit y3 x, het(x)
      return scalar b3 = _b[x]

      probit y3 c.x##c.x
      return scalar b4 = _b[x]
  end

. simulate b1=r(b1) b2=r(b2) b3=r(b3) b4=r(b4), reps(2000) nodots : sim

      command:  sim
           b1:  r(b1)
           b2:  r(b2)
           b3:  r(b3)
           b4:  r(b4)

. sum

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          b1 |      2,000    1.005738    .0590741   .8329385   1.270037
          b2 |      2,000    .6337518    .0690086   .4348118   .8676383
          b3 |      2,000    .0000536    .0019632  -3.15e-06   .0851099
          b4 |      2,000    1.020313    .1348678   .6841113   1.535614

. simsum b*, true(1) mcse bias dropbig
Warning: found 2 observations with standardised b3 > 10

       +----------+
       |       b3 |
       |----------|
   41. | .0215943 |
 1086. | .0851099 |
       +----------+
--> b3 have been changed to missing values for these observations

Starting to process results ...

+------------------------------------------------------------------------------------------------------------------+
| Performance measure           r(b1)     (MCse)      r(b2)     (MCse)      r(b3)     (MCse)      r(b4)     (MCse) |
|------------------------------------------------------------------------------------------------------------------|
| Bias in point estimate     .0057378   .0013209  -.3662483   .0015431  -.9999997   4.35e-08   .0203125   .0030157 |
+------------------------------------------------------------------------------------------------------------------+
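To make the identification argument concrete, here is a short sketch of the model that hetprob fits, written out for the single-covariate case used in the simulation. The symbols \beta_0, \beta_1 and \gamma are my own notation for illustration and do not appear in the log; the functional form is the usual heteroscedastic probit with a multiplicative standard deviation.

```latex
% Heteroscedastic probit with one covariate x, as in: hetprob y x, het(x)
% Notation (\beta_0, \beta_1, \gamma) is assumed for illustration.
\begin{align*}
y_i^* &= \beta_0 + \beta_1 x_i + \varepsilon_i,
  \qquad \varepsilon_i \sim N\!\left(0,\ \sigma_i^2\right),
  \qquad \sigma_i = \exp(\gamma x_i), \\
\Pr(y_i = 1 \mid x_i) &= \Pr(y_i^* > 0 \mid x_i)
  = \Phi\!\left(\frac{\beta_0 + \beta_1 x_i}{\exp(\gamma x_i)}\right).
\end{align*}
% With \gamma = 0 the index inside \Phi is linear in x_i (ordinary probit).
% With \gamma \neq 0 the index is nonlinear in x_i, and that curvature is
% the only information in the data about \gamma.
```

In the third scenario of the simulation the true index is 1 + x + .5*x^2 with constant variance, so the only way hetprobit y3 x, het(x) can accommodate the curvature is through \gamma. That is consistent with the coefficient on x collapsing towards zero (b3) while probit y3 c.x##c.x recovers it (b4).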