  • Heckman Two-Step Model & different samples ?

    Hello All !

    I want to model the determinants of the occupation status of individuals in the labor market.
    To avoid selection bias, I opted for the Heckman two-step model: the first equation estimates participation in the labor force (yes or no), and the second is the performance model, whose dependent variable is categorical with 5 groups (protected wage earner, unprotected wage earner, self-employed, non-paid, unemployed).
    I want to run the two-step model to obtain the inverse Mills ratio and include it in the performance model. This is where it gets tricky: the two models should be run on two different samples. The first estimation uses a sample of 158,974 individuals, both active and inactive in the labor market, whereas the second uses only those participating in the labor force (79,902).

    Is there any command to specify the samples on which every equation is run in order to obtain an accurate inverse Mills' ratio?

    Thanks in advance for any kind of help

  • #2
    heckman is the natural way to do this. I think the eregress models in Stata 15 will also do selection. If you look carefully on Statalist, you'll find that someone has explained how to do this manually in recent weeks.


    • #3
      Thank you, Mr Bromiley. Unfortunately I have Stata 12, but I have a quick question for you.
      Here's the command I'm using: heckman "model1", twostep select("model2") rhotrunc first nshazard(mills)
      model1 should be run only for individuals who are participating in the labor market, while model2 should be run for all individuals.
      Is there a command or option that lets me specify the sample for each equation, since the data I'm working on include both groups?
      I tried: heckman "model1" if active == 1, twostep select("model2") rhotrunc first nshazard(mills)
      But the command doesn't seem to be correct, since I get the following message:
      Dependent variable never censored because of selection:
      model would simplify to OLS regression
      r(498);

      Regards,
      Lamia


      • #4
        Hi Lamia,
        I do not think there is such a command in Stata. The standard heckman command assumes the selection and outcome models are estimated using the same data.
        For some exercises I prepared for classes in the past, however, I came up with an alternative way to do Heckman that could potentially be applied to a case like yours.

        Below is an example of how I would apply what we can call a two-step Heckman.
        I use an example dataset from Stata that is used to show how heckman works.
        First, I estimate the model using heckman with the twostep option.
        Second, I estimate the Heckman model using what I call two-step maximum likelihood.
        Third, I simulate a situation where I have two datasets (here I just duplicate the dataset), using one to estimate the probit and the other to estimate the OLS component, all by ML.
        Below I provide the code for all the examples, with a table that shows how the results compare.
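
        As a reading aid, the per-observation objective that the myheckman program below maximizes can be written as follows. This is only a transcription of the code: d_i stands for dwage, z_i for the selection covariates, x_i for the wage covariates, and g for the coefficient on the inverse Mills ratio. These symbols are notation introduced here, not Stata names.

        ```latex
        \ell_i = d_i \log \Phi(z_i\gamma) + (1 - d_i)\log \Phi(-z_i\gamma)
               + d_i \log\left[ \frac{1}{\sigma}\,
                 \phi\!\left( \frac{y_i - x_i\beta - g\,\lambda(z_i\gamma)}{\sigma} \right) \right],
        \qquad
        \lambda(a) = \frac{\phi(a)}{\Phi(a)}, \quad \sigma = \exp(\mathtt{lns1})
        ```

        In the two-sample version (my2heckman), the two probit terms are summed over the copy of the data used for selection (sam==2) and the outcome term over the participants-only copy (sam==1).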

        It should be fairly easy to adapt this to whatever you are trying to do.
        HTH
        Fernando
        Code:
        * Example data shipped with Stata; wage is missing for women who do not work
        webuse womenwk, clear
        gen dwage = wage != .

        * m0: built-in Heckman two-step estimator
        heckman wage educ age, select(dwage = married children educ age) twostep
        est sto m0

        * m1: probit and outcome equation estimated jointly by ML on a single sample
        capture program drop myheckman
        program myheckman
            args lnf xb1 lns1 g1 xb2
            qui {
                * the probit
                replace `lnf' = log(normal(`xb2'))  if $ML_y2 == 1
                replace `lnf' = log(normal(-`xb2')) if $ML_y2 == 0

                * the OLS, with the inverse Mills ratio as an extra regressor
                tempvar mill
                gen double `mill' = normalden(`xb2')/normal(`xb2')
                replace `lnf' = `lnf' + log(normalden($ML_y1, `xb1' + `g1'*`mill', exp(`lns1'))) if $ML_y2 == 1
            }
        end
        ml model lf myheckman (wage: wage = educ age) (lns1:) (g:) (dwage: dwage = married children educ age), robust miss maximize technique(nr bhhh)
        ml display
        est sto m1

        * m2: same likelihood, but the probit uses one sample (sam==2)
        * and the outcome equation uses another (sam==1)
        capture program drop my2heckman
        program my2heckman
            args lnf xb1 lns1 g1 xb2
            qui {
                * the probit, on the full sample only
                replace `lnf' = 0
                replace `lnf' = log(normal(`xb2'))*($ML_y2 == 1) + log(normal(-`xb2'))*($ML_y2 == 0) if sam == 2
                * the OLS, on the participants-only sample
                tempvar mill
                gen double `mill' = normalden(`xb2')/normal(`xb2')
                replace `lnf' = log(normalden($ML_y1, `xb1' + `g1'*`mill', exp(`lns1'))) if sam == 1
            }
        end

        * Duplicate the data to mimic two datasets: sam==1 keeps participants only, sam==2 keeps everyone
        gen id = _n
        expand 2
        bysort id: gen sam = _n
        drop if wage == . & sam == 1
        ml model lf my2heckman (wage: wage = educ age) (lns1:) (g:) (dwage: dwage = married children educ age), robust miss maximize technique(nr bhhh)
        ml display
        est sto m2

        * Compare the three sets of estimates
        est tab m0 m1 m2, se
        
        -----------------------------------------------------
            Variable |     m0           m1           m2      
        -------------+---------------------------------------
        wage         |
           education |  .98252587    .98320249    .98320254  
                     |  .05388212    .05440715    .05414974  
                 age |  .21186952    .21423626    .21423625  
                     |  .02205106    .02293244    .02294102  
               _cons |  .73403914    .61749839    .61749742  
                     |  1.2483309     1.287053    1.2842208  
        -------------+---------------------------------------
        dwage        |
             married |  .43085749    .42322921    .42322923  
                     |  .07420801    .06922548    .06936448  
            children |  .44732491    .44253532     .4425353  
                     |   .0287417      .027939    .02795703  
           education |  .05836454    .05816828    .05816828  
                     |  .01097418    .01104809    .01106901  
                 age |  .03472113    .03573271    .03573271  
                     |  .00422933    .00424514    .00425717  
               _cons |  -2.467365   -2.4889185   -2.4889185  
                     |  .19256348    .19043543     .1912609  
        -------------+---------------------------------------
        /mills       |
              lambda |  4.0016155                            
                     |  .60653883                            
        -------------+---------------------------------------
        lns1         |
               _cons |               1.6771628    1.6771628  
                     |               .01976352    .01976338  
        -------------+---------------------------------------
        g            |
               _cons |               4.0444969     4.044499  
                     |                .6167703     .6148037  
        -----------------------------------------------------
                                                 legend: b/se
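
        For comparison, the two-step estimator can also be done by hand, which makes the sample used in each step explicit. This is a minimal sketch, assuming the same womenwk data and dwage indicator as in the code above; the names xb_sel and mills are placeholders chosen here, not anything heckman creates. The second-step standard errors reported by regress are not adjusted for the estimated Mills ratio, so only the point estimates should be compared with m0.

        ```stata
        * Manual two-step sketch: each step states its own estimation sample
        webuse womenwk, clear
        gen byte dwage = !missing(wage)

        * Step 1: probit for participation, run on the full sample
        probit dwage married children educ age

        * Inverse Mills ratio computed from the probit linear index
        predict double xb_sel, xb
        gen double mills = normalden(xb_sel) / normal(xb_sel)

        * Step 2: outcome equation on participants only, adding the Mills ratio
        regress wage educ age mills if dwage == 1
        ```

        With two separate datasets, the same idea would mean running step 1 on the full sample, computing the Mills ratio for the participants, and running step 2 on the participants only.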
