Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • testing for convergence with heckman twostep

    How can I systematically identify when a heckman selection model with the -twostep- option fails to converge?

    The maximum likelihood estimator returns e(converged).The twostep estimator does not return this scalar, but it does display in red text "convergence not achieved" (and the return code is 0). But I don't want to rely on "seeing" this message.

    I thought about testing for perfect prediction in the selection equation, but perfect prediction does not always mean that the twostep estimation will not converge (some discussion here: https://www.statalist.org/forums/for...=1666881628404 ).

    I'm running Stata 17.

    If it's helpful for this question, I created an example of a made-up model that does not converge:

    Code:
    * create data set:
    set seed 754         
    clear
    set obs 50
    foreach k in Y D x1 x2 x3 {
        g `k' = runiform(-2,2)
    }
    replace D = (D>0)                            /* create a Dummy                                         */
    g my_category= int(x3) + 100                /* create a categorical variable                         */
    replace D = 1 if my_category==100            /* ensure that it's a perfect predictor for some obs     */
    ta my_category D, m
    su *
    
    * run heckman
            heckman Y x1        , sel(    D    =   x1 x2   i.my_category ) twostep
    ereturn list
    cap     heckman Y x1        , sel(    D    =   x1 x2   i.my_category )
    di _rc
    ereturn list


    ouput:


    . * create data set:
    . set seed 754 /* 141 226 482 515 754 also do not converge */

    . clear

    . set obs 50
    Number of observations (_N) was 0, now 50.

    . foreach k in Y D x1 x2 x3 {
    2. g `k' = runiform(-2,2)
    3. }

    . replace D = (D>0) /* create a Dummy */
    (50 real changes made)

    . g my_category= int(x3) + 100 /* create a categorical variable */

    . replace D = 1 if my_category==100 /* ensure that it's a perfect predictor for some obs */
    (16 real changes made)

    . ta my_category D, m

    my_categor | D
    y | 0 1 | Total
    -----------+----------------------+----------
    99 | 6 4 | 10
    100 | 0 30 | 30
    101 | 6 4 | 10
    -----------+----------------------+----------
    Total | 12 38 | 50

    . su *

    Variable | Obs Mean Std. dev. Min Max
    -------------+---------------------------------------------------------
    Y | 50 .124543 1.208587 -1.871603 1.981572
    D | 50 .76 .4314191 0 1
    x1 | 50 -.0000641 1.054921 -1.984768 1.935179
    x2 | 50 .250539 1.137619 -1.850836 1.992067
    x3 | 50 -.055049 1.022086 -1.846501 1.761011
    -------------+---------------------------------------------------------
    my_category | 50 100 .6388766 99 101

    .
    . * run heckman
    . heckman Y x1 , sel( D = x1 x2 i.my_category ) twostep
    convergence not achieved
    note: two-step estimate of rho = -15596.883 is being truncated to -1

    Heckman selection model -- two-step estimates Number of obs = 50
    (regression model with sample selection) Selected = 38
    Nonselected = 12

    Wald chi2(1) = 0.00
    Prob > chi2 = 1.0000

    ------------------------------------------------------------------------------
    | Coefficient Std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    Y |
    x1 | -.319197 1.57e+07 -0.00 1.000 -3.08e+07 3.08e+07
    _cons | .2806868 1.81e+07 0.00 1.000 -3.55e+07 3.55e+07
    -------------+----------------------------------------------------------------
    D |
    x1 | -23.11661 . . . . .
    x2 | -3.023762 3463.55 -0.00 0.999 -6791.456 6785.409
    |
    my_category |
    100 | 49.8795 . . . . .
    101 | -17.50229 3304.359 -0.01 0.996 -6493.927 6458.922
    |
    _cons | .0054919 . . . . .
    -------------+----------------------------------------------------------------
    /mills |
    lambda | -1.08e+08 5.99e+15 -0.00 1.000 -1.17e+16 1.17e+16
    -------------+----------------------------------------------------------------
    rho | -1.00000
    sigma | 1.079e+08
    ------------------------------------------------------------------------------

    . ereturn list

    scalars:
    e(N) = 50
    e(chi2) = 4.12607253951e-16
    e(df_m) = 1
    e(p) = .9999999837927813
    e(N_nonselected) = 12
    e(N_selected) = 38
    e(rho) = -1
    e(rho1) = -15596.88296339077
    e(sigma) = 107941312.1080871
    e(lambda) = -107941312.1080871
    e(selambda) = 5986423671759735
    e(rank) = 5

    macros:
    e(cmdline) : "heckman Y x1 , sel( D = x1 x2 i.my_category ) twostep"
    e(cmd) : "heckman"
    e(vce) : "conventional"
    e(method) : "two-step"
    e(rhometh) : "rhosigma"
    e(depvar) : "Y D"
    e(marginsnotok) : "SCores SCORESEL stdp stdf stdpsel YCond YExpected"
    e(marginsok) : "XB XBSel Mills NShazard PSel E(passthru) Pr(passthru) YStar(passthru) default"
    e(predict) : "heckma_p"
    e(title) : "Heckman selection model -- two-step estimates"
    e(title2) : "(regression model with sample selection)"
    e(chi2type) : "Wald"
    e(properties) : "b V"

    matrices:
    e(b) : 1 x 9
    e(V) : 9 x 9

    functions:
    e(sample)

    . cap heckman Y x1 , sel( D = x1 x2 i.my_category )

    . di _rc
    430

    . ereturn list

    scalars:
    e(rank) = 5
    e(N) = 50
    e(ic) = 300
    e(k) = 10
    e(k_eq) = 4
    e(k_dv) = 2
    e(converged) = 0
    e(rc) = 430
    e(ll) = -60.7415939062668
    e(k_eq_model) = 1
    e(df_m) = 1
    e(chi2) = 3.013984310887087
    e(p) = .0825491532826645
    e(rho) = -.9999999292307832
    e(sigma) = 1.196648178903739
    e(lambda) = -1.196648094217885
    e(selambda) = .1372649503599038
    e(k_aux) = 2
    e(N_selected) = 38
    e(N_nonselected) = 12
    e(p_c) = .9999139243578772
    e(chi2_c) = 1.16380554252e-08

    macros:
    e(marginsok) : "default XB PSel YCond YExpected Mills NShazard"
    e(marginsnotok) : "SCores SCORESEL stdp stdf stdpsel"
    e(cmd) : "heckman"
    e(predict) : "heckma_p"
    e(method) : "ml"
    e(title2) : "(regression model with sample selection)"
    e(chi2_ct) : "LR"
    e(chi2type) : "Wald"
    e(opt) : "moptimize"
    e(vce) : "oim"
    e(title) : "Heckman selection model"
    e(user) : "heck_d2"
    e(ml_method) : "e2"
    e(technique) : "nr"
    e(which) : "max"
    e(depvar) : "Y D"
    e(properties) : "b V"

    matrices:
    e(b) : 1 x 10
    e(V) : 10 x 10
    e(Cns) : 1 x 11
    e(ilog) : 1 x 20
    e(gradient) : 1 x 10

    functions:
    e(sample)

    .
    end of do-file



  • #2
    I was curious if the returned values would give some clue, so I looped the heckman model (using twostep option) over seeds 1-1000 and saved the returned scalars.I put vertical lines at the seeds that did not converge. In a pinch, it seems like i might be able to use e(rank) to detect when convergence doesn't occur (???) but that would be a bit clumsy.

    (the last red line is at 754, which is the seed used in the code above)

    Click image for larger version

Name:	returned_values_plots.png
Views:	1
Size:	340.3 KB
ID:	1687039

    Comment


    • #3
      bump

      Comment


      • #4
        Is there anything that I can add to make this clearer?

        Comment


        • #5
          Hi Brian
          Since the problem has to do with the first step (the probit), did you consider breaking the problem into two?
          one that focuses on the probit only, and detects if there are any problems there.
          If that passes, you can then estimate the heckman model.
          HTH
          F

          Comment


          • #6
            Thanks Fernando ! I had been hoping not to split it, but maybe it's unavoidable.

            I'm surprised that the convergence status is not included in the returned in e(). Does anyone know why?

            Comment


            • #7
              I would submit that question directly to technical services.
              If its something "easy" to address, I imagine they can patch it before next update.
              F

              Comment


              • #8
                update: Stata says that they will add this (a convergence scalar) to the -ereturn list- for the two-step estimator.

                Comment


                • #9
                  update: Stata informed me that they've added a convergence scalar to the ereturn list:e(fconverge)

                  Comment

                  Working...
                  X