
  • Heckman model with more observations than reg

    Dear Statalist,
    I am running Heckman models and comparing the results with reg (OLS). For example:
    heckman lghwage wexpfull, select(age)
    reg lghwage wexpfull
    However, Stata reports many more observations for the heckman model than for the OLS regression. Is this normal, or does it indicate a problem in my data?
    Thank you!

  • #2
    Marta:
    it is always difficult (for me, at least) to give a definitive reply without seeing what Stata gave you back.
    That said, what you are experiencing may be due to missing values, as the following toy example shows:
    Code:
    . use http://www.stata-press.com/data/r15/womenwk
    . reg wage i.educ
    
          Source |       SS           df       MS      Number of obs   =     1,343
    -------------+----------------------------------   F(3, 1339)      =    126.78
           Model |  11803.0955         3  3934.36516   Prob > F        =    0.0000
        Residual |  41551.7992     1,339  31.0319635   R-squared       =    0.2212
    -------------+----------------------------------   Adj R-squared   =    0.2195
           Total |  53354.8946     1,342  39.7577456   Root MSE        =    5.5706
    
    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       education |
             12  |   1.853433   .4027279     4.60   0.000     1.063387    2.643479
             16  |   5.422342   .3910016    13.87   0.000       4.6553    6.189385
             20  |   10.36177   .6331415    16.37   0.000     9.119714    11.60383
                 |
           _cons |   20.44185   .2956586    69.14   0.000     19.86184    21.02185
    ------------------------------------------------------------------------------
    
    . heckman wage i.educ,select(age)
    
    Iteration 0:   log likelihood = -5400.8769  
    Iteration 1:   log likelihood = -5383.0138  
    Iteration 2:   log likelihood = -5382.4071  
    Iteration 3:   log likelihood = -5382.4068  
    
    Heckman selection model                         Number of obs     =      2,000
    (regression model with sample selection)              Selected    =      1,343
                                                          Nonselected =        657
    
                                                    Wald chi2(3)      =     344.40
    Log likelihood = -5382.407                      Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    wage         |
       education |
             12  |   1.357748   .3987013     3.41   0.001     .5763079    2.139188
             16  |   4.920785   .3878262    12.69   0.000      4.16066     5.68091
             20  |   9.764491   .6296831    15.51   0.000     8.530334    10.99865
                 |
           _cons |   22.92084   .3929268    58.33   0.000     22.15072    23.69096
    -------------+----------------------------------------------------------------
    select       |
             age |   .0468164   .0034605    13.53   0.000     .0400339    .0535989
           _cons |  -1.227563   .1249251    -9.83   0.000    -1.472412   -.9827143
    -------------+----------------------------------------------------------------
         /athrho |   -.811626   .0976252    -8.31   0.000    -1.002968   -.6202842
        /lnsigma |   1.832919   .0310983    58.94   0.000     1.771968    1.893871
    -------------+----------------------------------------------------------------
             rho |  -.6704863   .0537376                     -.7628377   -.5513259
           sigma |   6.252111   .1944302                      5.882416     6.64504
          lambda |  -4.191954   .4454765                     -5.065072   -3.318837
    ------------------------------------------------------------------------------
    LR test of indep. eqns. (rho = 0):   chi2(1) =    47.86   Prob > chi2 = 0.0000
    
    . sum wage education age
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
            wage |      1,343    23.69217    6.305374    5.88497   45.80979
       education |      2,000      13.084    3.045912         10         20
             age |      2,000      36.208     8.28656         20         59
    
    .
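    A minimal sketch of how to check which observations each command actually uses, assuming the same womenwk dataset: after an estimation command, e(sample) equals 1 for the observations in its estimation sample. The indicator names in_reg and in_heck are hypothetical, chosen only for this example.

    ```stata
    * Sketch only: assumes the womenwk example dataset used above.
    * in_reg and in_heck are hypothetical indicator names.
    use http://www.stata-press.com/data/r15/womenwk, clear

    * regress drops every observation with a missing wage
    quietly regress wage i.education
    generate byte in_reg = e(sample)

    * heckman keeps observations with a missing wage as nonselected;
    * here age and education are observed for all 2,000 of them
    quietly heckman wage i.education, select(age)
    generate byte in_heck = e(sample)

    * cross-tabulate the two estimation samples
    tabulate in_reg in_heck
    ```

    Given the output above, the cross-tabulation should show 1,343 observations used by both commands and 657 used only by heckman, i.e. the rows with a missing wage.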
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      I apologize for not immediately showing what Stata gave me back; I have attached it.
      Your example still helped me. So Stata considers only the selected (uncensored) observations in its calculations? And are the nonselected (censored) observations those with missing values in the dependent and independent variables?
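      On the toy dataset from #2, the second question can be checked directly. A minimal sketch, relying on heckman's documented default: when select() contains only a varlist, observations with a nonmissing dependent variable are treated as selected and the rest as nonselected. The indicator name wage_observed is hypothetical.

      ```stata
      * Sketch only: assumes the womenwk example dataset from #2.
      * wage_observed is a hypothetical indicator name.
      use http://www.stata-press.com/data/r15/womenwk, clear

      * selected = dependent variable (wage) observed; nonselected = wage missing
      generate byte wage_observed = !missing(wage)
      tabulate wage_observed

      * rows with a missing wage whose covariates are nevertheless observed
      count if missing(wage) & !missing(age, education)
      ```

      On womenwk the tabulation should give 657 and 1,343, matching the Nonselected and Selected counts in the heckman header, and the final count should also return 657: the nonselected rows lack wage, not the covariates.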


      • #4
        Marta:
        you may be interested in https://stats.stackexchange.com/ques...he-second-step

        As an aside, which version of Stata are you using?
        Last edited by Carlo Lazzaro; 01 Oct 2019, 06:34.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you, Carlo Lazzaro, for all your help. Stata reports estimates for the uncensored observations. I am using Stata 14.



          • #6
            Marta:
            now things are clearer to me (by the way, that's why posters are asked to specify which version of Stata they are using, unless it is the latest release).

            When I run the same code in my old Stata 14, I get censored/uncensored...
            Code:
            . use http://www.stata-press.com/data/r15/womenwk
            
            . heckman wage i.educ,select(age)
            
            Iteration 0:   log likelihood = -5400.8769 
            Iteration 1:   log likelihood = -5383.0138 
            Iteration 2:   log likelihood = -5382.4071 
            Iteration 3:   log likelihood = -5382.4068 
            
            Heckman selection model                         Number of obs     =      2,000
            (regression model with sample selection)        Censored obs      =        657
                                                            Uncensored obs    =      1,343
            
                                                            Wald chi2(3)      =     344.40
            Log likelihood = -5382.407                      Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                    wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            wage         |
               education |
                     12  |   1.357748   .3987013     3.41   0.001     .5763079    2.139188
                     16  |   4.920785   .3878262    12.69   0.000      4.16066     5.68091
                     20  |   9.764491   .6296831    15.51   0.000     8.530334    10.99865
                         |
                   _cons |   22.92084   .3929268    58.33   0.000     22.15072    23.69096
            -------------+----------------------------------------------------------------
            select       |
                     age |   .0468164   .0034605    13.53   0.000     .0400339    .0535989
                   _cons |  -1.227563   .1249251    -9.83   0.000    -1.472412   -.9827143
            -------------+----------------------------------------------------------------
                 /athrho |   -.811626   .0976252    -8.31   0.000    -1.002968   -.6202842
                /lnsigma |   1.832919   .0310983    58.94   0.000     1.771968    1.893871
            -------------+----------------------------------------------------------------
                     rho |  -.6704863   .0537376                     -.7628377   -.5513259
                   sigma |   6.252111   .1944302                      5.882416     6.64504
                  lambda |  -4.191954   .4454765                     -5.065072   -3.318837
            ------------------------------------------------------------------------------
            LR test of indep. eqns. (rho = 0):   chi2(1) =    47.86   Prob > chi2 = 0.0000
            
            .
            ...instead of selected/nonselected (as reported in Stata 15):
            Code:
            . use http://www.stata-press.com/data/r15/womenwk
            
            . heckman wage i.educ,select(age)
            
            Iteration 0:   log likelihood = -5400.8769 
            Iteration 1:   log likelihood = -5383.0138 
            Iteration 2:   log likelihood = -5382.4071 
            Iteration 3:   log likelihood = -5382.4068 
            
            Heckman selection model                         Number of obs     =      2,000
            (regression model with sample selection)              Selected    =      1,343
                                                                  Nonselected =        657
            
                                                            Wald chi2(3)      =     344.40
            Log likelihood = -5382.407                      Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                    wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            wage         |
               education |
                     12  |   1.357748   .3987013     3.41   0.001     .5763079    2.139188
                     16  |   4.920785   .3878262    12.69   0.000      4.16066     5.68091
                     20  |   9.764491   .6296831    15.51   0.000     8.530334    10.99865
                         |
                   _cons |   22.92084   .3929268    58.33   0.000     22.15072    23.69096
            -------------+----------------------------------------------------------------
            select       |
                     age |   .0468164   .0034605    13.53   0.000     .0400339    .0535989
                   _cons |  -1.227563   .1249251    -9.83   0.000    -1.472412   -.9827143
            -------------+----------------------------------------------------------------
                 /athrho |   -.811626   .0976252    -8.31   0.000    -1.002968   -.6202842
                /lnsigma |   1.832919   .0310983    58.94   0.000     1.771968    1.893871
            -------------+----------------------------------------------------------------
                     rho |  -.6704863   .0537376                     -.7628377   -.5513259
                   sigma |   6.252111   .1944302                      5.882416     6.64504
                  lambda |  -4.191954   .4454765                     -5.065072   -3.318837
            ------------------------------------------------------------------------------
            LR test of indep. eqns. (rho = 0):   chi2(1) =    47.86   Prob > chi2 = 0.0000
            
            .
            Kind regards,
            Carlo
            (Stata 19.0)
