
  • Issues on estimating Heckman (rho = 1) and two part model (non-converging)

    Dear Statalist users,

    On the advice of my supervisor I have looked into the Heckman method for analyzing my data. I'm estimating which determinants influence the decision (of donor countries) to give aid. The panel dataset contains information on bilateral aid between 20 donor and 189 recipient countries for the period 1970-2015. Approximately 55% of the values are zeros; these are 'true zeros' (not missing values), and for y>0 the values are continuous. Having read quite a few forum questions and some of the literature on the Heckman/two part/Tobit topic, I still have some open questions.

    The regression I roughly estimate is:
    Code:
    LNODA_donorx_lead = LNRGDPPC LNRGDPPC^2 LNPOP LNPOP^2 LNCOLONY_donorx FRIEND_donorx Distance_donorx DUMLanguage_donorx FREEPOL i.country, vce(robust)
    (RGDPPC is real GDP per capita, POP is population and FREEPOL is a democracy index)

    My line of reasoning for analyzing this data is as follows:
    1. A donor country's decision to give aid may involve two stages: a selection stage in which the country decides whether to give aid (yes/no), and a response stage in which it decides how much aid to give (conditional on a 'yes' in the selection stage).
    2. A Tobit estimation does not allow for different mechanisms to influence the two stages.
    3. A Heckman model is suited for selection problems, not corner solution problems.
    4. The Exponential type II Tobit model of Wooldridge* seems appropriate; this model is estimated as a Heckman model with the dependent variable in logs.
    5. When I estimate this model the results show either an insignificant lambda or a rho of 1 or -1.
    Code:
    heckman LNODAUSA_lead LNRGDPPC16 LNRGDPPC16SQ LNPOP LNPOPSQ FREEPOL MILFRUSA DUMISR DUMEGY YRSWARnew if YEAR > 1965, twostep select(DUMODAUSA_lead = LNRGDPPC16 LNRGDPPC16SQ LNPOP LNPOPSQ FREEPOL LNCOLS LNCOLSUSA MILFRUSA DUMISR DUMEGY YRSWARnew DistUSA LangUSA)
    6. Following the manual on the Heckman command and the FAQ on the 'rho in the Heckman estimator' a rho of 1 is problematic (the assumptions are probably violated) (https://www.stata.com/support/faqs/s...man-estimator/).

    A. Do I conclude correctly that the Heckman model is thus not suitable for my data?

    7. Afterwards I estimated the two part model with the twopm command (ssc install twopm).
    Code:
    twopm LNODAUSA_lead LNRGDPPC16 LNRGDPPC16SQ LNPOP LNPOPSQ FREEPOL LNCOLSUSA LNCOLS MILFRUSA DUMISR DUMEGY i.COUNTRYNR if YEAR > 1965, firstpart (logit) secondpart (glm) vce(robust)
    8. When adding i.panel (for recipient countries) the model cannot converge (e.g. iteration not concave).
    9. To assess this problem I estimated the probit/logit estimation separately. It appears that 2/3 of the country dummies 'predict success perfectly'.
    10. I learned that perfect prediction is what prevents convergence: the maximum likelihood estimate of the coefficient on a perfectly predicting variable becomes infinitely large (https://www.statalist.org/forums/for...cess-perfectly).
    11. This problem does not arise when estimating the two part model without the i.country variable.
    12. From my statistics class I know, however, that I should add country fixed effects when I have panel data.
    13. From what I know about two part models, I suspect there is no way around the non-convergence problem, as these models are always estimated by maximum likelihood.

    B. Are there any solutions for the non-convergence issue in the two part model?
    C. If there is no solution for the convergence issue I suspect that a one-stage (Tobit) estimation with fixed effects is better than a two part estimation without fixed effects, would you agree?
    D. Do you have any other recommendations?

    Thank you in advance for any advice!



    *Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts: MIT Press.




  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Obviously, what you have in the first code block is not what you ran. Knowing exactly what you're doing is essential to us helping you.

    You also need to shorten your posting. Few of us are going to work through all this stuff to try to figure out what you're trying to do.

    When you use a less commonly used user-written command like twopm, you have several ways to proceed. First, simplify: get a model that works, then build up. That may help you understand what is causing the problems. Second, try to understand the ado file. Third, ask the authors.
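
    As a rough illustration of the "simplify, then build up" route, the two parts of a two-part model can be fit separately with official commands before any dummies are added. This is only a sketch on simulated data: the variable names (x, anyaid, aid) and the gamma/log choice for the second part are assumptions for illustration, not the specification from the original post.

```stata
* Minimal sketch on simulated data; all variable names are hypothetical.
clear
set seed 12345
set obs 2000
generate x = rnormal()

* Part 1 outcome: whether any aid is given (binary)
generate byte anyaid = runiform() < invlogit(0.2 + 0.8*x)

* Part 2 outcome: a positive amount when aid is given, zero otherwise
generate aid = cond(anyaid, exp(1 + 0.5*x + rnormal(0, 0.5)), 0)

* Step 1: the simplest participation equation
logit anyaid x, vce(robust)

* Step 2: the amount equation on the positive observations only
* (gamma family with log link is an assumed choice for this sketch)
glm aid x if aid > 0, family(gamma) link(log) vce(robust)

* Step 3: once both parts converge, add regressors or dummies one
* block at a time and note which addition first triggers a warning
```

    If the first part already fails once the country dummies go in, the problem lies in the binary model and not in the two-part command itself.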

    If country predicts the outcome perfectly for 2/3 of your data, then you have a real problem. However, if the dummies are for recipient countries, that makes sense. There are zero-inflated models, but there has been substantial discussion on this list about their usefulness. There is also no consistent fixed-effects tobit estimator. Folks often use dummies for panels.
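
    To see how widespread the perfect prediction is, one option is to flag the countries whose binary outcome never varies, since those are the ones whose dummies have no finite maximum likelihood estimate. The sketch below uses a simulated panel; country, year, gave and the helper variables are hypothetical names, and the data are built so that 20 of the 30 countries have a constant outcome.

```stata
* Minimal sketch on a simulated panel; all variable names are hypothetical.
clear
set seed 2024
set obs 30
generate country = _n

* Countries 1-10 never give, 11-20 always give, 21-30 give half the time
generate double p = cond(country <= 10, 0, cond(country <= 20, 1, 0.5))
expand 20
bysort country: generate year = _n
generate byte gave = runiform() < p

* Flag countries whose outcome is constant over all years
bysort country: egen byte minout = min(gave)
bysort country: egen byte maxout = max(gave)
generate byte novar = (minout == maxout)

* Count each country once when tabulating
egen byte tag = tag(country)
tabulate novar if tag
```

    Countries with novar equal to 1 contribute nothing to the estimation of the other coefficients in a logit or probit with country dummies.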
