Dear Statalist users,
On the advice of my supervisor I have looked into the Heckman method for analyzing my data. I am estimating which determinants influence the decision (of donor countries) to give aid. The panel dataset contains information on bilateral aid between 20 donor and 189 recipient countries for the period 1970-2015. Approximately 55% of the values are zeros; these are 'true zeros' (not missing values). For y > 0 the values are continuous. Having read quite a few forum questions and some of the literature on the Heckman / two-part / Tobit topic, I still have some open questions.
The regression I roughly estimate is:
Code:
LNODA_donorx_lead = LNRGDPPC LNRGDPPC^2 LNPOP LNPOP^2 LNCOLONY_donorx FRIEND_donorx Distance_donorx DUMLanguage_donorx FREEPOL i.country, vce(robust)
(RGDPPC is real GDP per capita, POP is population and FREEPOL is a democracy index)
My line of reasoning for analyzing this data is as follows:
1. A donor country's decision to give aid may involve two stages: a selection stage in which the country decides whether to give aid (yes/no), and a response stage in which it decides how much aid to give (conditional on a 'yes' in the selection stage).
2. A Tobit estimation does not allow for different mechanisms to influence the two stages.
3. A Heckman model is suited for selection problems, not corner solution problems.
4. The Exponential type II Tobit model of Wooldridge* seems appropriate; this model is estimated by a Heckman estimation with the dependent variable in logs.
5. When I estimate this model, the results show either an insignificant lambda or a rho of 1 or -1.
Code:
heckman LNODAUSA_lead LNRGDPPC16 LNRGDPPC16SQ LNPOP LNPOPSQ FREEPOL MILFRUSA DUMISR DUMEGY YRSWARnew if YEAR > 1965, twostep select(DUMODAUSA_lead = LNRGDPPC16 LNRGDPPC16SQ LNPOP LNPOPSQ FREEPOL LNCOLS LNCOLSUSA MILFRUSA DUMISR DUMEGY YRSWARnew DistUSA LangUSA)
6. According to the manual entry for the heckman command and the FAQ on the 'rho in the Heckman estimator', a rho of 1 is problematic (the assumptions are probably violated) (https://www.stata.com/support/faqs/s...man-estimator/).
A. Do I conclude correctly that the Heckman model is thus not suitable for my data?
7. Afterwards I estimated the two-part model with the twopm command (ssc install twopm).
Code:
twopm LNODAUSA_lead LNRGDPPC16 LNRGDPPC16SQ LNPOP LNPOPSQ FREEPOL LNCOLSUSA LNCOLS MILFRUSA DUMISR DUMEGY i.COUNTRYNR if YEAR > 1965, firstpart(logit) secondpart(glm) vce(robust)
8. When I add i.panel (the recipient-country dummies, i.COUNTRYNR in the command above), the model does not converge (e.g. the iteration log reports 'not concave').
9. To investigate this problem I estimated the probit/logit part separately. It appears that 2/3 of the country dummies 'predict success perfectly'.
10. I learned that this is what prevents convergence: under maximum likelihood, the coefficient estimates for the perfectly predicting variables become infinitely large (https://www.statalist.org/forums/for...cess-perfectly).
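The pattern in points 9 and 10 can be checked directly: a country dummy predicts the outcome perfectly when the aid dummy never varies within that country. Below is a minimal sketch, assuming the variable names from the commands above (DUMODAUSA_lead, COUNTRYNR, YEAR). The names aid_min, aid_max, no_variation and tag_country are hypothetical helpers introduced only for this illustration.
Code:
* For each recipient country, record the lowest and highest value of the aid dummy
bysort COUNTRYNR: egen byte aid_min = min(DUMODAUSA_lead) if YEAR > 1965
bysort COUNTRYNR: egen byte aid_max = max(DUMODAUSA_lead) if YEAR > 1965
* A country whose dummy is always 0 or always 1 has no within-country variation
generate byte no_variation = (aid_min == aid_max) if !missing(aid_min)
* Tag one observation per country so that countries, not observations, are counted
egen byte tag_country = tag(COUNTRYNR) if YEAR > 1965 & !missing(aid_min)
tabulate no_variation if tag_country == 1
The countries with no_variation equal to 1 should be the ones whose dummies are reported as predicting success (or failure) perfectly in the separate probit/logit estimation.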
11. This problem does not arise when estimating the two part model without the i.country variable.
12. From my statistics class I know, however, that I should add country fixed effects when I have panel data.
13. From what I have read about two-part models, I suspect there is no solution to the non-convergence problem, as these models are always estimated by maximum likelihood.
B. Are there any solutions for the non-convergence issue in the two-part model?
C. If there is no solution for the convergence issue, I suspect that a one-stage (Tobit) estimation with fixed effects is better than a two-part estimation without fixed effects. Would you agree?
D. Do you have any other recommendations?
Thank you in advance for any advice!
*Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts: MIT Press.
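For reference, the exponential type II Tobit model mentioned in point 4 can be written out as below. This is only a sketch of how I understand the model in Wooldridge (2010). The notation is my own assumption, not a quotation: s is the selection indicator, z the selection-stage regressors, x the response-stage regressors, and u, v the two error terms.

\[
s = 1[\, z\gamma + v > 0 \,], \qquad y = s \cdot \exp(x\beta + u),
\]
\[
(u, v) \text{ bivariate normal and independent of } (x, z), \quad \operatorname{Var}(u) = \sigma^2, \quad \operatorname{Var}(v) = 1, \quad \operatorname{Corr}(u, v) = \rho,
\]
\[
E[\log y \mid x, z, s = 1] = x\beta + \rho\sigma\,\lambda(z\gamma), \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\]

In this notation, the 'lambda' coefficient reported by heckman with the twostep option is, as far as I understand, an estimate of \(\rho\sigma\). Both symptoms in point 5 therefore concern the estimated correlation between the two stages.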