  • Difference between Heckman ML vs Heckman twostep

    Dear Stata users,

    I found a thread that was posted long ago and received no answer. https://www.stata.com/statalist/arch.../msg01403.html

    I have the same questions as Clifton Chow. Please allow me not to paraphrase but to copy them word for word.

    I am modeling lnWage using Heckman for selection on a dataset of individuals with disabilities, and I have some basic questions about the difference between Stata's Heckman full maximum likelihood and Heckman two-step estimates.

    1. I was under the impression that Heckman two-step is also estimated using maximum likelihood, but I see now that it is not, as the coefficients, standard errors, lambda, and even the sample in the analysis (both full and censored) are different. How is Heckman two-step estimated?

    2. Both Heckman full ML and Heckman two-step include the estimate for lambda. Is the inverse Mills ratio (IMR) part of the main equation in Heckman ML as well?

    3. In the literature, Heckman two-step is considered more robust and preferred. Why is this so, and if that is the case, why bother with Heckman ML at all? What is the essential advantage of Heckman ML vs. Heckman two-step?

    I wonder if someone can answer these old questions. Thank you very much.

  • #2
    In the linear case, the first stage of the two-step estimator is a probit model where the selection indicator is the outcome. The second stage is an augmented regression on the selected sample, incorporating the inverse Mills ratio computed from the first step. The maximum likelihood (ML) estimator, on the other hand, jointly estimates the selection and outcome equations and is more efficient. However, ML relies on the assumption of joint normality of the error terms. If this assumption is violated, ML may be biased or inconsistent, whereas the two-step estimator is more robust to misspecification since it requires fewer distributional assumptions beyond normality in the selection equation.

    Therefore, if the joint normality assumption holds, ML estimation should be preferred due to efficiency.
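
    For concreteness, here is a minimal sketch of how each estimator is requested in Stata, using the womenwk example data that also appears in #3 below (heckman defaults to full ML; the twostep option switches to the two-step estimator):

    Code:
    webuse womenwk, clear
    
    * Full maximum likelihood (the default): the selection and outcome
    * equations are estimated jointly under joint normality
    heckman wage educ age, select(married children educ age)
    
    * Two-step: probit for selection, then OLS on the selected sample,
    * augmented with the inverse Mills ratio
    heckman wage educ age, select(married children educ age) twostep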



    • #3
      Here is an illustration of the two-step procedure "by hand".

      Code:
      webuse womenwk, clear
      
      *HECKMAN TWO-STEP
      heckman wage educ age, select(married children educ age) twostep vce(bootstrap, seed(02102025))
      
      
      *TWO-STEP "BY HAND"
      * flag the heckman estimation sample and generate the selection indicator
      gen sample= e(sample)
      gen dvprobit= !missing(wage)
      
      cap prog drop mybootstrap_prog
      prog mybootstrap_prog, eclass
      * step 1: probit for selection, then its linear prediction (xb)
      qui probit dvprobit married children educ age if sample
      qui predict wagehat if sample, xb
      * inverse Mills ratio: normalden(xb)/normal(xb)
      qui gen double IML=normalden(wagehat)/normal(wagehat)
      * step 2: outcome regression augmented with the inverse Mills ratio
      regress wage education age IML if sample
      cap drop wagehat IML
      end
      
      bootstrap _b , reps(50) nowarn nodots nodrop seed(02102025): mybootstrap_prog
      Result:

      Code:
      . *HECKMAN TWO-STEP
      
      .
      . heckman wage educ age, select(married children educ age) twostep vce(bootstrap, seed(02102025))
      (running heckman on estimation sample)
      
      Bootstrap replications (50): .........10.........20.........30.........40.........50 done
      
      Heckman selection model -- two-step estimates   Number of obs     =      2,000
      (regression model with sample selection)              Selected    =      1,343
                                                            Nonselected =        657
      
                                                      Wald chi2(2)      =     417.97
                                                      Prob > chi2       =     0.0000
      
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
              wage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      wage         |
         education |   .9825259    .059042    16.64   0.000     .8668057    1.098246
               age |   .2118695   .0210113    10.08   0.000     .1706882    .2530508
             _cons |   .7340391   1.214933     0.60   0.546    -1.647187    3.115265
      -------------+----------------------------------------------------------------
      select       |
           married |   .4308575    .066576     6.47   0.000     .3003709    .5613441
          children |   .4473249   .0301166    14.85   0.000     .3882975    .5063523
         education |   .0583645    .012429     4.70   0.000     .0340042    .0827248
               age |   .0347211   .0037843     9.17   0.000      .027304    .0421383
             _cons |  -2.467365   .2039289   -12.10   0.000    -2.867058   -2.067672
      -------------+----------------------------------------------------------------
      /mills       |
            lambda |   4.001615   .5293582     7.56   0.000     2.964092    5.039138
      -------------+----------------------------------------------------------------
               rho |    0.67284
             sigma |  5.9473529
      ------------------------------------------------------------------------------
      
      
      . *TWO-STEP "BY HAND"
      
      . 
      . gen sample= e(sample)
      
      . 
      . gen dvprobit= !missing(wage)
      
      . 
      . 
      . 
      . cap prog drop mybootstrap_prog
      
      . 
      . prog mybootstrap_prog, eclass
        1. 
      . qui probit dvprobit married children educ age if sample
        2. 
      . qui predict wagehat if sample, xb
        3. 
      . qui gen double IML=normalden(wagehat)/normal(wagehat)
        4. 
      . regress wage education age IML if sample
        5. 
      . cap drop wagehat IML
        6. 
      . end
      
      
      . bootstrap _b , reps(50) nowarn nodots nodrop seed(02102025): mybootstrap_prog
      
      Linear regression                                       Number of obs =  2,000
                                                              Replications  =     50
                                                              Wald chi2(3)  = 449.72
                                                              Prob > chi2   = 0.0000
                                                              R-squared     = 0.2793
                                                              Adj R-squared = 0.2777
                                                              Root MSE      = 5.3587
      
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
              wage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
         education |   .9825259    .059042    16.64   0.000     .8668057    1.098246
               age |   .2118695   .0210113    10.08   0.000     .1706882    .2530508
               IML |   4.001615   .5293582     7.56   0.000     2.964092    5.039138
             _cons |   .7340391   1.214933     0.60   0.546    -1.647187    3.115265
      ------------------------------------------------------------------------------



      • #4
        Dear Andrew Musau, thank you so much for your detailed explanations and illustrations. It would be helpful if Stata added more detail to the help file of the heckman command. At present, the Description of the heckman command consists of only one sentence:
        heckman fits regression models with selection by using either Heckman's two-step consistent estimator or full maximum likelihood.
        And a further question:
        However, ML relies on the assumption of joint normality of the error terms. If this assumption is violated......
        Is there any procedure to test whether the model satisfies the assumption of joint normality of the error terms? Or, if the results obtained from full ML are quite different from those of twostep, can we conclude directly that the assumption is violated?



        • #5
          I would just compare the two-step and ML estimates. If the ML estimator produces significantly different results from the two-step method, that would warrant further investigation.
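
          A minimal sketch of such a comparison, again using the womenwk example data: fit both estimators, store them, and use estimates table to place the coefficients and standard errors side by side:

          Code:
          webuse womenwk, clear
          
          * fit both estimators and store the results
          qui heckman wage educ age, select(married children educ age)
          estimates store ml
          
          qui heckman wage educ age, select(married children educ age) twostep
          estimates store twostep
          
          * coefficients and standard errors side by side
          estimates table ml twostep, b(%9.4f) se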



          • #6
            Thank you very much Andrew Musau. That's great!



            • #7
              Originally posted by Andrew Musau View Post
              I would just compare the two-step and ML estimates. If the ML estimator produces significantly different results from the two-step method, that would warrant further investigation.
              Hi Andrew Musau, could you elaborate on how you would further investigate? How would you treat the results if they are different? When I use FIML, the treatment effect is huge. However, when I use the twostep option, the treatment effect is much smaller and more reasonable. Do you default to preferring the two-step approach when the results are different?

