Heckman Two-Step Model & different samples ?

Lamia Ben

Join Date: Oct 2018

Posts: 26
#1

08 Oct 2018, 10:42

Hello All !

I want to model the determinants of the occupation status of individuals in the labor market.
To avoid selection bias, I opted for the Heckman two-step model: the first equation estimates participation in the labor force (yes or no), and the second is the performance model (the dependent variable is categorical, with 5 groups: protected wage earner, unprotected wage earner, self-employed, non-paid, unemployed).
I wish to run the two-step model to get the inverse Mills ratio and include it in the performance model, and this is where it gets tricky: the two models should be run on two different samples. The first estimation is run on a sample of 158,974 individuals, both active and inactive in the labor market, whereas the second is run only on those participating in the labor force (79,902).

Is there any command to specify the samples on which every equation is run in order to obtain an accurate inverse Mills' ratio?

Thanks in advance for any kind of help
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

09 Oct 2018, 10:48

heckman is the natural way to do this. I think the eregress models in Stata 15 will also do selection. If you look carefully on Statalist, you'll find that someone has explained how to do this manually in recent weeks.
Comment
Lamia Ben

Join Date: Oct 2018

Posts: 26
#3

12 Oct 2018, 14:33

Thank you, Mr Bromiley. Unfortunately, I have Stata 12. But I have a quick question for you.
Here's the command I'm using: heckman "model1", twostep select("model2") rhotrunc first nshazard(mills)
Model1 should be run only for individuals who are participating in the labor market, while model2 should be run for all individuals.
Are there any commands that allow me to specify the sample for each equation, since the data I'm working with includes both groups?
I tried: heckman "model1" if active == 1, twostep select("model2") rhotrunc first nshazard(mills)
But it seems the command isn't correct, since I get the following message:
Dependent variable never censored because of selection:
model would simplify to OLS regression
r(498);

Regards,
Lamia
Comment

FernandoRios

Join Date: Apr 2014
Posts: 2496

12 Oct 2018, 19:34

Hi Lamia,
I do not think there is such a command in Stata. The standard heckman command assumes the selection and outcome models are estimated using the same data.
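A minimal sketch of that standard single-dataset setup, assuming the womenwk example data used later in this post and a hypothetical `active` dummy standing in for Lamia's participation indicator. It illustrates that heckman is run without an `if` qualifier: the selection equation uses every row, and the outcome equation uses only rows where the selection dummy is 1. Restricting with `if active == 1` removes all non-selected rows, which is what appears to trigger the r(498) message quoted above.

```stata
* Sketch only: -active- is a hypothetical stand-in for the participation dummy
webuse womenwk, clear
gen byte active = !missing(wage)

* With -if active == 1- no non-participant is left in the sample,
* so heckman is expected to stop with the r(498) message from the post
capture noisily heckman wage educ age if active == 1, ///
    select(active = married children educ age) twostep
display "return code: " _rc

* Without -if-: the probit uses all rows, the outcome equation only active == 1
heckman wage educ age, select(active = married children educ age) ///
    twostep first nshazard(mills)
```

If the outcome is already missing for non-participants, select() can also be given without a left-hand-side variable, in which case heckman treats a missing outcome as not selected.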
For some exercises I have prepared for classes in the past, however, I came up with an alternative way to do Heckman that could potentially be applied to a case like yours.

Below you can see an example of how I would apply what we can call a two-step Heckman.
I use an example dataset from Stata that is used to show how heckman works.
First, I estimate the model using heckman with the twostep option.
Second, I estimate the Heckman model using what I call two-step maximum likelihood.
Third, I simulate a situation where I have two datasets (here I just duplicate the dataset), using one to estimate the probit and the other to estimate the OLS component, all by ML.
I provide the code for all the examples below, with a table that shows how the results compare.

It should be fairly easy to adapt this to whatever you are trying to do.
HTH
Fernando
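As a reading aid, the objective that the two ML evaluators in the code below appear to maximize can be written out as follows. The notation is introduced here and is not from the post: $d_i$ is the selection dummy (dwage), $z_i\gamma$ the selection index (xb2), $x_i\beta$ the outcome index (xb1), $g$ the coefficient on the inverse Mills ratio $\lambda_i$, $\sigma = \exp(\mathrm{lns1})$, $S_2$ the sample used for the probit, and $S_1$ the sample used for the outcome equation.

```latex
\lambda_i = \frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}

\ln L = \sum_{i \in S_2} \Big[ d_i \ln \Phi(z_i\gamma) + (1 - d_i) \ln \Phi(-z_i\gamma) \Big]
      + \sum_{i \in S_1} \ln \left[ \frac{1}{\sigma}\,
        \phi\!\left( \frac{y_i - x_i\beta - g\,\lambda_i}{\sigma} \right) \right]
```

In the one-sample version, $S_2$ is the whole dataset and $S_1$ is the subset with $d_i = 1$; in the two-sample version they are the rows with sam==2 and sam==1. Note that this stacks a probit and a normal regression that includes $\lambda_i$; it is not the full-information Heckman likelihood, which models the correlation between the two error terms directly.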

Code:

webuse womenwk, clear
* selection indicator: 1 if the wage is observed
gen dwage=wage!=.

* (m0) built-in Heckman two-step
heckman wage educ age, select(dwage=married children educ age) twostep
est sto m0

* (m1) probit and OLS with the Mills ratio, estimated jointly by ML on one sample
capture program drop myheckman
program myheckman
    args lnf xb1 lns1 g1 xb2
    qui {
        ** the probit
        replace `lnf'=log(normal(`xb2'))  if $ML_y2==1
        replace `lnf'=log(normal(-`xb2')) if $ML_y2==0

        ** the OLS, with the inverse Mills ratio as an extra regressor
        tempvar mill
        gen double `mill'=normalden(`xb2')/normal(`xb2')
        replace `lnf'=`lnf'+log(normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))) if $ML_y2==1
    }
end
ml model lf myheckman (wage:wage= educ age) (lns1:) (g:) (dwage :dwage=married children educ age), robust miss maximize technique(nr bhhh)
ml display
est sto m1

* (m2) same idea, but the probit and the OLS part use different samples
capture program drop my2heckman
program my2heckman
    args lnf xb1 lns1 g1 xb2
    qui {
        replace `lnf'=0
        ** the probit, on sample 2 only
        replace `lnf'=log(normal(`xb2'))*($ML_y2==1)+log(normal(-`xb2'))*($ML_y2==0) if sam==2
        ** the OLS, on sample 1 only
        tempvar mill
        gen double `mill'=normalden(`xb2')/normal(`xb2')
        replace `lnf'=log(normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))) if sam==1
    }
end

* simulate two datasets by duplicating every observation
gen id=_n
expand 2
bysort id:gen sam=_n
* sample 1 (outcome equation) keeps only observations with an observed wage
drop if wage==. & sam==1
ml model lf my2heckman (wage:wage= educ age) (lns1:) (g:) (dwage :dwage=married children educ age), robust miss maximize technique(nr bhhh)
ml display
est sto m2

* compare coefficients and standard errors across the three approaches
est tab m0 m1 m2 , se

-----------------------------------------------------
    Variable |     m0           m1           m2      
-------------+---------------------------------------
wage         |
   education |  .98252587    .98320249    .98320254  
             |  .05388212    .05440715    .05414974  
         age |  .21186952    .21423626    .21423625  
             |  .02205106    .02293244    .02294102  
       _cons |  .73403914    .61749839    .61749742  
             |  1.2483309     1.287053    1.2842208  
-------------+---------------------------------------
dwage        |
     married |  .43085749    .42322921    .42322923  
             |  .07420801    .06922548    .06936448  
    children |  .44732491    .44253532     .4425353  
             |   .0287417      .027939    .02795703  
   education |  .05836454    .05816828    .05816828  
             |  .01097418    .01104809    .01106901  
         age |  .03472113    .03573271    .03573271  
             |  .00422933    .00424514    .00425717  
       _cons |  -2.467365   -2.4889185   -2.4889185  
             |  .19256348    .19043543     .1912609  
-------------+---------------------------------------
/mills       |
      lambda |  4.0016155                            
             |  .60653883                            
-------------+---------------------------------------
lns1         |
       _cons |               1.6771628    1.6771628  
             |               .01976352    .01976338  
-------------+---------------------------------------
g            |
       _cons |               4.0444969     4.044499  
             |                .6167703     .6148037  
-----------------------------------------------------
                                         legend: b/se
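
For comparison, the textbook two-step can also be done by hand with commands that exist in Stata 12. The following is a minimal sketch on the same womenwk data, not the method used for the table above: a probit on the full sample, the inverse Mills ratio computed from its linear index, then an OLS restricted to the selected observations. The variable names `zg` and `mills` are introduced here for illustration.

```stata
* Manual two-step sketch; -zg- and -mills- are illustrative names
webuse womenwk, clear
gen byte dwage = !missing(wage)

* Step 1: probit on everyone, selected or not
probit dwage married children educ age
predict double zg, xb
gen double mills = normalden(zg) / normal(zg)

* Step 2: outcome equation on the selected sample, adding the Mills ratio
regress wage educ age mills if dwage == 1
```

The point estimates should coincide with those of heckman with the twostep option, but the standard errors reported by regress do not account for the fact that `mills` is itself estimated, so they should not be used for inference as they stand.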
