
  • perfect prediction in -heckman, select()-

    Do heckman / heckprob attempt to identify perfect prediction in the selection equation? From the output, it does not seem like they do; logit or probit output would have said something like "blah predicts success perfectly; it is dropped and this many observations not used". However heckman does not say that.

    I think I am running into exactly this issue: the Heckman model either fails to converge (without the -difficult- option) or produces coefficients like 5 with a standard error of zero for a dummy variable in the selection equation. Since normal(-5) is of the same order of magnitude as c(epsfloat), I suspect the maximizer simply pushes that parameter to a value large enough that the likelihood stops changing, rather than attempting to remove the variable the way logit or probit do.
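
    The magnitudes can be checked directly. This is a minimal sketch that uses only the built-in normal() function and the c(epsfloat) system value mentioned above:

    ```stata
    * Lower-tail standard normal probability at -5
    display normal(-5)
    * Single-precision machine epsilon
    display c(epsfloat)
    * Ratio of the two quantities
    display normal(-5)/c(epsfloat)
    ```

    Both values are on the order of 1e-7, so once the coefficient on a perfectly predicting dummy reaches roughly 5, further increases barely change the predicted probabilities.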
    -- Stas Kolenikov || http://stas.kolenikov.name
    -- Principal Survey Scientist, Abt SRBI
    -- Opinions stated in this post are mine only


  • #2
    While the discussion in the Technical Note section of the heckman documentation in the Stata Reference Manual does not directly address your problem, the following quote is suggestive.

    The Heckman selection model can be unstable when the model is not properly specified or if a specific dataset simply does not support the model’s assumptions.
    The discussion continues with a simulation example in which, even though the data are generated according to a heckman process, the heckman estimation fails to converge properly.



    • #3
      If I may hijack this thread, I have a very similar question to that posed by Stas above.

      I am using heckprobit. In the first stage (selection equation), I have a categorical variable which perfectly predicts the outcome. This variable is retained in the model, and I obtain coefficients and standard errors. However, when I estimate the selection equation with probit, the categorical variable is (as expected) omitted due to perfect prediction. Why do heckman and heckprobit retain perfect predictors in the first stage?

      Interestingly, this thread over at the old Stata forum asks the same question. Indeed, STB-43 here also states:

      (STB-43) heckman
      heckman now has the capability to estimate models with variables that perfectly predict selection. Previously heckman would simply drop such variables from the selection equation, which is inappropriate in most cases.
      Why is dropping variables which perfectly predict selection inappropriate? What is going on behind the scenes?

      Example:

      In my own research, I am examining the success of pharmaceutical drugs. My selection equation regresses IntoDevelopment on DevelopmentStatus. By definition, I have coded all drugs that have progressed past the Discovery stage as 1, since they have entered clinical development. Drugs in the pre-clinical Discovery phase did not necessarily enter development, so only that category shows variation.

      Code:
      DevelopmentStatu | IntoDevelopment    
                     s |         0          1 |     Total
      -----------------+----------------------+----------
              Clinical |         0         63 |        63
             Discovery |     3,820      3,078 |     6,898
      Phase 1 Clinical |         0        535 |       535
      Phase 2 Clinical |         0        854 |       854
      Phase 3 Clinical |         0        387 |       387
      Pre-registration |         0        113 |       113
            Registered |         0         95 |        95
      -----------------+----------------------+----------
                 Total |     3,820      5,125 |     8,945
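
      The levels with no variation in the selection outcome can also be flagged programmatically rather than read off the table. This is only a sketch: IntoDevelopment and DevStatus are assumed variable names based on the table and commands in this post, and sel_min, sel_max and no_variation are hypothetical helper variables.

      ```stata
      * Sketch only: variable names are assumptions, helper names are hypothetical
      * Smallest and largest value of the selection outcome within each level
      bysort DevStatus: egen sel_min = min(IntoDevelopment)
      bysort DevStatus: egen sel_max = max(IntoDevelopment)
      * A level predicts the outcome perfectly when the outcome never varies within it
      generate byte no_variation = (sel_min == sel_max) if !missing(sel_min)
      tabulate DevStatus no_variation
      ```

      Given the counts in the table, every level except Discovery should be flagged.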
      With heckprobit:
      Code:
      . heckprobit ... , select(drugIndicationIntoDevelopment = ... ib2.DevStatus ib2016.Year)
      
      **Iterations Output Omitted**
      
      Probit model with sample selection              Number of obs     =      7,555
                                                      Censored obs      =      3,779
                                                      Uncensored obs    =      3,776
      
                                                      Wald chi2(52)     =     423.84
      Log pseudolikelihood = -3955.044                Prob > chi2       =     0.0000
      
                                                     (Std. Err. adjusted for 5,003 clusters in Drug)
      --------------------------------------------------------------------------------------------------
                                       |               Robust
                                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------------------------+----------------------------------------------------------------
      **Outcome Equation Omitted**
      ---------------------------------+----------------------------------------------------------------
      **Selection Equation:**
      **Extra controls omitted**
      
      drugIndicationIntoDevelopment    |
      
                             DevStatus |
                             Clinical  |   7.636424   .2723167    28.04   0.000     7.102693    8.170155
                     Phase 1 Clinical  |    9.34752   .5423626    17.23   0.000     8.284508    10.41053
                     Phase 2 Clinical  |   7.617611   .2027719    37.57   0.000     7.220185    8.015037
                     Phase 3 Clinical  |   7.460695   .1892162    39.43   0.000     7.089838    7.831552
                     Pre-registration  |   7.800735   .2145818    36.35   0.000     7.380162    8.221307
                           Registered  |   7.485286   .1824119    41.04   0.000     7.127765    7.842806
                                       |
                                  Year |
                                 2003  |   2.953355   .3820132     7.73   0.000     2.204623    3.702087
                                 2004  |   3.044034   .3821105     7.97   0.000     2.295111    3.792956
                                 2005  |   3.244613   .3805017     8.53   0.000     2.498844    3.990383
                                 2006  |   3.403035   .3775828     9.01   0.000     2.662987    4.143084
                                 2007  |   3.278851   .3780417     8.67   0.000     2.537903      4.0198
                                 2008  |   3.022322   .3774185     8.01   0.000     2.282596    3.762049
                                 2009  |   3.185404   .3832624     8.31   0.000     2.434223    3.936584
                                 2010  |   3.003247   .3760539     7.99   0.000     2.266195    3.740299
                                 2011  |   2.964679   .3764863     7.87   0.000     2.226779    3.702578
                                 2012  |   2.594776   .3786905     6.85   0.000     1.852556    3.336996
                                 2013  |   2.464523   .3795631     6.49   0.000     1.720593    3.208453
                                 2014  |   2.317342   .3823459     6.06   0.000     1.567958    3.066726
                                 2015  |   1.944325   .3852784     5.05   0.000     1.189193    2.699457
                                       |
                                 _cons |  -3.467664   .4065917    -8.53   0.000    -4.264569   -2.670759
      ---------------------------------+----------------------------------------------------------------
                               /athrho |  -.4123434   .1905769    -2.16   0.030    -.7858672   -.0388195
      ---------------------------------+----------------------------------------------------------------
                                   rho |  -.3904606   .1615216                     -.6560615      -.0388
      --------------------------------------------------------------------------------------------------
      Wald test of indep. eqns. (rho = 0): chi2(1) =     4.68   Prob > chi2 = 0.0305
      With probit on the exact same selection equation:
      Code:
      . probit ... i.DevStatus ib2016.Year
      
      note: 1.DevStatus != 0 predicts success perfectly
            1.DevStatus dropped and 44 obs not used
      
      note: 2.DevStatus != 1 predicts success perfectly
            2.DevStatus dropped and 1833 obs not used
      
      note: 3.DevStatus omitted because of collinearity
      note: 4.DevStatus omitted because of collinearity
      note: 5.DevStatus omitted because of collinearity
      note: 6.DevStatus omitted because of collinearity
      note: 7.DevStatus omitted because of collinearity
      Iteration 0:   log pseudolikelihood = -5287.0055  
      Iteration 1:   log pseudolikelihood = -4413.1199  
      Iteration 2:   log pseudolikelihood = -4402.1526  
      Iteration 3:   log pseudolikelihood = -4402.1196  
      Iteration 4:   log pseudolikelihood = -4402.1196  
      
      Probit regression                               Number of obs     =      7,628
                                                      Wald chi2(49)     =     979.95
                                                      Prob > chi2       =     0.0000
      Log pseudolikelihood = -4402.1196               Pseudo R2         =     0.1674
      
                                                     (Std. Err. adjusted for 5,007 clusters in Drug)
      --------------------------------------------------------------------------------------------------
                                       |               Robust
         drugIndicationIntoDevelopment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------------------------+----------------------------------------------------------------
      **Extra controls omitted**
      
                             DevStatus |
                             Clinical  |          0  (empty)
                            Discovery  |          0  (omitted)
                     Phase 1 Clinical  |          0  (empty)
                     Phase 2 Clinical  |          0  (empty)
                     Phase 3 Clinical  |          0  (empty)
                     Pre-registration  |          0  (empty)
                           Registered  |          0  (empty)
                                       |
                                  Year |
                                 2003  |   .5720061   .1156278     4.95   0.000     .3453799    .7986324
                                 2004  |   .6048604   .1187959     5.09   0.000     .3720247    .8376961
                                 2005  |   .8984012   .1137214     7.90   0.000     .6755113    1.121291
                                 2006  |   .9815531   .1058797     9.27   0.000     .7740326    1.189073
                                 2007  |   .9060379   .1045031     8.67   0.000     .7012156     1.11086
                                 2008  |   .6564245   .1040019     6.31   0.000     .4525845    .8602645
                                 2009  |   .8369335   .1161109     7.21   0.000     .6093603    1.064507
                                 2010  |   .6413842   .0970183     6.61   0.000     .4512319    .8315365
                                 2011  |   .6870169   .0992929     6.92   0.000     .4924064    .8816275
                                 2012  |   .4555852   .0986859     4.62   0.000     .2621645     .649006
                                 2013  |   .5481051   .0980139     5.59   0.000     .3560015    .7402087
                                 2014  |   .6183751   .0982493     6.29   0.000       .42581    .8109401
                                 2015  |   .5577375   .0983056     5.67   0.000      .365062    .7504129
                                       |
                                 _cons |  -1.101107   .1592578    -6.91   0.000    -1.413246   -.7889672
      --------------------------------------------------------------------------------------------------
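
      To see which observations probit discards, something along the following lines may help. It is a sketch under stated assumptions: the specification is deliberately reduced to the single factor variable, IntoDevelopment and DevStatus are assumed variable names, and used_by_probit is a hypothetical helper variable, so the counts will not match the full model above.

      ```stata
      * Sketch only: reduced specification with assumed variable names
      probit IntoDevelopment i.DevStatus
      * e(sample) marks the observations actually used in estimation
      generate byte used_by_probit = e(sample)
      * Cross-tabulate levels against inclusion in the estimation sample
      tabulate DevStatus used_by_probit
      ```

      Levels whose observations all fall in the used_by_probit == 0 column are the ones probit dropped for perfect prediction, whereas heckprobit keeps them in the selection equation.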
