  • Difference between Heckman ML vs Heckman twostep

    Dear Stata users,

    I found a thread that was posted long ago and received no answer. https://www.stata.com/statalist/arch.../msg01403.html

    I have the same questions as Clifton Chow. Please allow me not to paraphrase but to copy them word for word.

    I am modeling lnWage using Heckman for selection on a dataset of individuals with disabilities, and I have some basic questions about the difference between Stata's Heckman full maximum likelihood and Heckman two-step estimates.

    1. I was under the impression that Heckman two-step is also estimated using maximum likelihood, but I see now that it is not, as the coefficients, standard errors, lambda, and even the sample in the analysis (both full and censored) are different. How is Heckman two-step estimated?

    2. Both Heckman full ML and Heckman two-step include the estimate for lambda. Is the inverse Mills ratio (IMR) part of the main equation in Heckman ML as well?

    3. In the literature, Heckman two-step is considered more robust and preferred. Why is this so, and if that is the case, why bother with Heckman ML at all? What is the essential advantage of Heckman ML vs. Heckman two-step?

    I wonder if someone can answer these old questions. Thank you very much.

  • #2
    In the linear case, the first stage of the two-step estimator is a probit model where the selection indicator is the outcome. The second stage is an augmented regression on the selected sample, incorporating the inverse Mills ratio computed from the first step. The maximum likelihood (ML) estimator, on the other hand, jointly estimates the selection and outcome equations and is more efficient. However, ML relies on the assumption of joint normality of the error terms. If this assumption is violated, ML may be biased or inconsistent, whereas the two-step estimator is more robust to misspecification since it requires fewer distributional assumptions beyond normality in the selection equation.

    Therefore, if the joint normality assumption holds, ML estimation should be preferred due to efficiency.
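
    For concreteness, here is a minimal sketch of how each estimator is requested in Stata, using the womenwk example data that also appears in #3 below (heckman defaults to full ML; the twostep option switches to the two-step estimator):

    Code:
    webuse womenwk, clear
    
    * Full maximum likelihood (the default): the selection and outcome
    * equations are estimated jointly under joint normality
    heckman wage educ age, select(married children educ age)
    
    * Two-step: probit for selection, then OLS on the selected sample,
    * augmented with the inverse Mills ratio
    heckman wage educ age, select(married children educ age) twostep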



    • #3
      Here is an illustration of the two-step procedure "by hand".

      Code:
      webuse womenwk, clear
      
      *HECKMAN TWO-STEP
      heckman wage educ age, select(married children educ age) twostep vce(bootstrap, seed(02102025))
      
      
      *TWO-STEP "BY HAND"
      * flag the heckman estimation sample and generate the selection indicator
      gen sample= e(sample)
      gen dvprobit= !missing(wage)
      
      cap prog drop mybootstrap_prog
      prog mybootstrap_prog, eclass
      * step 1: probit for selection, then its linear prediction (xb)
      qui probit dvprobit married children educ age if sample
      qui predict wagehat if sample, xb
      * inverse Mills ratio: normalden(xb)/normal(xb)
      qui gen double IML=normalden(wagehat)/normal(wagehat)
      * step 2: outcome regression augmented with the inverse Mills ratio
      regress wage education age IML if sample
      cap drop wagehat IML
      end
      
      bootstrap _b , reps(50) nowarn nodots nodrop seed(02102025): mybootstrap_prog
      Result:

      Code:
      . *HECKMAN TWO-STEP
      
      .
      . heckman wage educ age, select(married children educ age) twostep vce(bootstrap, seed(02102025))
      (running heckman on estimation sample)
      
      Bootstrap replications (50): .........10.........20.........30.........40.........50 done
      
      Heckman selection model -- two-step estimates   Number of obs     =      2,000
      (regression model with sample selection)              Selected    =      1,343
                                                            Nonselected =        657
      
                                                      Wald chi2(2)      =     417.97
                                                      Prob > chi2       =     0.0000
      
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
              wage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      wage         |
         education |   .9825259    .059042    16.64   0.000     .8668057    1.098246
               age |   .2118695   .0210113    10.08   0.000     .1706882    .2530508
             _cons |   .7340391   1.214933     0.60   0.546    -1.647187    3.115265
      -------------+----------------------------------------------------------------
      select       |
           married |   .4308575    .066576     6.47   0.000     .3003709    .5613441
          children |   .4473249   .0301166    14.85   0.000     .3882975    .5063523
         education |   .0583645    .012429     4.70   0.000     .0340042    .0827248
               age |   .0347211   .0037843     9.17   0.000      .027304    .0421383
             _cons |  -2.467365   .2039289   -12.10   0.000    -2.867058   -2.067672
      -------------+----------------------------------------------------------------
      /mills       |
            lambda |   4.001615   .5293582     7.56   0.000     2.964092    5.039138
      -------------+----------------------------------------------------------------
               rho |    0.67284
             sigma |  5.9473529
      ------------------------------------------------------------------------------
      
      
      . *TWO-STEP "BY HAND"
      
      . 
      . gen sample= e(sample)
      
      . 
      . gen dvprobit= !missing(wage)
      
      . 
      . 
      . 
      . cap prog drop mybootstrap_prog
      
      . 
      . prog mybootstrap_prog, eclass
        1. 
      . qui probit dvprobit married children educ age if sample
        2. 
      . qui predict wagehat if sample, xb
        3. 
      . qui gen double IML=normalden(wagehat)/normal(wagehat)
        4. 
      . regress wage education age IML if sample
        5. 
      . cap drop wagehat IML
        6. 
      . end
      
      
      . bootstrap _b , reps(50) nowarn nodots nodrop seed(02102025): mybootstrap_prog
      
      Linear regression                                       Number of obs =  2,000
                                                              Replications  =     50
                                                              Wald chi2(3)  = 449.72
                                                              Prob > chi2   = 0.0000
                                                              R-squared     = 0.2793
                                                              Adj R-squared = 0.2777
                                                              Root MSE      = 5.3587
      
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
              wage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
         education |   .9825259    .059042    16.64   0.000     .8668057    1.098246
               age |   .2118695   .0210113    10.08   0.000     .1706882    .2530508
               IML |   4.001615   .5293582     7.56   0.000     2.964092    5.039138
             _cons |   .7340391   1.214933     0.60   0.546    -1.647187    3.115265
      ------------------------------------------------------------------------------



      • #4
        Dear Andrew Musau, thank you so much for your detailed explanations and illustrations. It would be helpful if Stata added more detail to the help file of the heckman command. At present, the Description of the heckman command consists of only one sentence:
        heckman fits regression models with selection by using either Heckman's two-step consistent estimator or full maximum likelihood.
        And a further question:
        However, ML relies on the assumption of joint normality of the error terms. If this assumption is violated......
        Is there any procedure to test whether the model satisfies the assumption of joint normality of the error terms? Or, if the results obtained from full ML are quite different from those of twostep, can we conclude directly that the assumption is violated?



        • #5
          I would just compare the two-step and ML estimates. If the ML estimator produces significantly different results from the two-step method, that would warrant further investigation.
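
          A minimal sketch of such a comparison, again using the womenwk example data: fit both estimators, store them, and use estimates table to place the coefficients and standard errors side by side:

          Code:
          webuse womenwk, clear
          
          * fit both estimators and store the results
          qui heckman wage educ age, select(married children educ age)
          estimates store ml
          
          qui heckman wage educ age, select(married children educ age) twostep
          estimates store twostep
          
          * coefficients and standard errors side by side
          estimates table ml twostep, b(%9.4f) se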



          • #6
            Thank you very much Andrew Musau. That's great!



            • #7
              Originally posted by Andrew Musau View Post
              I would just compare the two-step and ML estimates. If the ML estimator produces significantly different results from the two-step method, that would warrant further investigation.
              Hi Andrew Musau, could you elaborate on how you would further investigate? How would you treat the results if they are different? When I use FIML, the treatment effect is huge. However, when I use the twostep option, the treatment effect is much smaller and more reasonable. Do you default to preferring the two-step approach when the results are different?

