testing for convergence with heckman twostep

Brian Holtemeyer

Join Date: Jun 2018

Posts: 36
#1

testing for convergence with heckman twostep

27 Oct 2022, 09:32

How can I systematically identify when a heckman selection model with the -twostep- option fails to converge?

The maximum likelihood estimator returns e(converged).The twostep estimator does not return this scalar, but it does display in red text "convergence not achieved" (and the return code is 0). But I don't want to rely on "seeing" this message.

I thought about testing for perfect prediction in the selection equation, but perfect prediction does not always mean that the twostep estimation will not converge (some discussion here: https://www.statalist.org/forums/for...=1666881628404 ).

I'm running Stata 17.

If it's helpful for this question, I created an example of a made-up model that does not converge:

Code:

* create data set: set seed 754 clear set obs 50 foreach k in Y D x1 x2 x3 { g `k' = runiform(-2,2) } replace D = (D>0) /* create a Dummy */ g my_category= int(x3) + 100 /* create a categorical variable */ replace D = 1 if my_category==100 /* ensure that it's a perfect predictor for some obs */ ta my_category D, m su * * run heckman heckman Y x1 , sel( D = x1 x2 i.my_category ) twostep ereturn list cap heckman Y x1 , sel( D = x1 x2 i.my_category ) di _rc ereturn list

ouput:

. * create data set:
. set seed 754 /* 141 226 482 515 754 also do not converge */

. clear

. set obs 50
Number of observations (_N) was 0, now 50.

. foreach k in Y D x1 x2 x3 {
2. g `k' = runiform(-2,2)
3. }

. replace D = (D>0) /* create a Dummy */
(50 real changes made)

. g my_category= int(x3) + 100 /* create a categorical variable */

. replace D = 1 if my_category==100 /* ensure that it's a perfect predictor for some obs */
(16 real changes made)

. ta my_category D, m

my_categor | D
y | 0 1 | Total
-----------+----------------------+----------
99 | 6 4 | 10
100 | 0 30 | 30
101 | 6 4 | 10
-----------+----------------------+----------
Total | 12 38 | 50

. su *

Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
Y | 50 .124543 1.208587 -1.871603 1.981572
D | 50 .76 .4314191 0 1
x1 | 50 -.0000641 1.054921 -1.984768 1.935179
x2 | 50 .250539 1.137619 -1.850836 1.992067
x3 | 50 -.055049 1.022086 -1.846501 1.761011
-------------+---------------------------------------------------------
my_category | 50 100 .6388766 99 101

.
. * run heckman
. heckman Y x1 , sel( D = x1 x2 i.my_category ) twostep
convergence not achieved
note: two-step estimate of rho = -15596.883 is being truncated to -1

Heckman selection model -- two-step estimates Number of obs = 50
(regression model with sample selection) Selected = 38
Nonselected = 12

Wald chi2(1) = 0.00
Prob > chi2 = 1.0000

------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Y |
x1 | -.319197 1.57e+07 -0.00 1.000 -3.08e+07 3.08e+07
_cons | .2806868 1.81e+07 0.00 1.000 -3.55e+07 3.55e+07
-------------+----------------------------------------------------------------
D |
x1 | -23.11661 . . . . .
x2 | -3.023762 3463.55 -0.00 0.999 -6791.456 6785.409
|
my_category |
100 | 49.8795 . . . . .
101 | -17.50229 3304.359 -0.01 0.996 -6493.927 6458.922
|
_cons | .0054919 . . . . .
-------------+----------------------------------------------------------------
/mills |
lambda | -1.08e+08 5.99e+15 -0.00 1.000 -1.17e+16 1.17e+16
-------------+----------------------------------------------------------------
rho | -1.00000
sigma | 1.079e+08
------------------------------------------------------------------------------

. ereturn list

scalars:
e(N) = 50
e(chi2) = 4.12607253951e-16
e(df_m) = 1
e(p) = .9999999837927813
e(N_nonselected) = 12
e(N_selected) = 38
e(rho) = -1
e(rho1) = -15596.88296339077
e(sigma) = 107941312.1080871
e(lambda) = -107941312.1080871
e(selambda) = 5986423671759735
e(rank) = 5

macros:
e(cmdline) : "heckman Y x1 , sel( D = x1 x2 i.my_category ) twostep"
e(cmd) : "heckman"
e(vce) : "conventional"
e(method) : "two-step"
e(rhometh) : "rhosigma"
e(depvar) : "Y D"
e(marginsnotok) : "SCores SCORESEL stdp stdf stdpsel YCond YExpected"
e(marginsok) : "XB XBSel Mills NShazard PSel E(passthru) Pr(passthru) YStar(passthru) default"
e(predict) : "heckma_p"
e(title) : "Heckman selection model -- two-step estimates"
e(title2) : "(regression model with sample selection)"
e(chi2type) : "Wald"
e(properties) : "b V"

matrices:
e(b) : 1 x 9
e(V) : 9 x 9

functions:
e(sample)

. cap heckman Y x1 , sel( D = x1 x2 i.my_category )

. di _rc
430

. ereturn list

scalars:
e(rank) = 5
e(N) = 50
e(ic) = 300
e(k) = 10
e(k_eq) = 4
e(k_dv) = 2
e(converged) = 0
e(rc) = 430
e(ll) = -60.7415939062668
e(k_eq_model) = 1
e(df_m) = 1
e(chi2) = 3.013984310887087
e(p) = .0825491532826645
e(rho) = -.9999999292307832
e(sigma) = 1.196648178903739
e(lambda) = -1.196648094217885
e(selambda) = .1372649503599038
e(k_aux) = 2
e(N_selected) = 38
e(N_nonselected) = 12
e(p_c) = .9999139243578772
e(chi2_c) = 1.16380554252e-08

macros:
e(marginsok) : "default XB PSel YCond YExpected Mills NShazard"
e(marginsnotok) : "SCores SCORESEL stdp stdf stdpsel"
e(cmd) : "heckman"
e(predict) : "heckma_p"
e(method) : "ml"
e(title2) : "(regression model with sample selection)"
e(chi2_ct) : "LR"
e(chi2type) : "Wald"
e(opt) : "moptimize"
e(vce) : "oim"
e(title) : "Heckman selection model"
e(user) : "heck_d2"
e(ml_method) : "e2"
e(technique) : "nr"
e(which) : "max"
e(depvar) : "Y D"
e(properties) : "b V"

matrices:
e(b) : 1 x 10
e(V) : 10 x 10
e(Cns) : 1 x 11
e(ilog) : 1 x 20
e(gradient) : 1 x 10

functions:
e(sample)

.
end of do-file
Tags: None
Brian Holtemeyer

Join Date: Jun 2018

Posts: 36
#2

27 Oct 2022, 10:09

I was curious if the returned values would give some clue, so I looped the heckman model (using twostep option) over seeds 1-1000 and saved the returned scalars.I put vertical lines at the seeds that did not converge. In a pinch, it seems like i might be able to use e(rank) to detect when convergence doesn't occur (???) but that would be a bit clumsy.

(the last red line is at 754, which is the seed used in the code above)
Comment
Brian Holtemeyer

Join Date: Jun 2018

Posts: 36
#3

01 Nov 2022, 14:15

bump
Comment
Brian Holtemeyer

Join Date: Jun 2018

Posts: 36
#4

02 Nov 2022, 13:01

Is there anything that I can add to make this clearer?
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2462
#5

03 Nov 2022, 11:44

Hi Brian
Since the problem has to do with the first step (the probit), did you consider breaking the problem into two?
one that focuses on the probit only, and detects if there are any problems there.
If that passes, you can then estimate the heckman model.
HTH
F
Comment
Brian Holtemeyer

Join Date: Jun 2018

Posts: 36
#6

07 Nov 2022, 09:56

Thanks Fernando ! I had been hoping not to split it, but maybe it's unavoidable.

I'm surprised that the convergence status is not included in the returned in e(). Does anyone know why?
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2462
#7

07 Nov 2022, 10:13

I would submit that question directly to technical services.
If its something "easy" to address, I imagine they can patch it before next update.
F
Comment
Brian Holtemeyer

Join Date: Jun 2018

Posts: 36
#8

09 Nov 2022, 12:00

update: Stata says that they will add this (a convergence scalar) to the -ereturn list- for the two-step estimator.
1 like
Comment
Brian Holtemeyer

Join Date: Jun 2018

Posts: 36
#9

20 Jan 2023, 12:25

update: Stata informed me that they've added a convergence scalar to the ereturn list:e(fconverge)
Comment

Announcement

testing for convergence with heckman twostep

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment