
  • Problem with Heckman model: Dependent variable never censored because of selection

    Dear all, I have run into a strange (to me) problem when trying to apply the Heckman model to my data. I have a dependent variable (job satisfaction, job_sat) that is observed only for those who are employed (occupato = 1); otherwise it is missing. Since I want to compare women's and men's job satisfaction, I would like to correct for their different likelihood of being in employment, using marital status (married vs. not married) as the instrumental variable.

    When I run
    Code:
    heckman job_sat i.job_1dgt, select(occupato=i.married   i.job_1dgt   )
    Stata gives me the message "Dependent variable never censored because of selection". Why? Did I make a mistake in preparing the dataset?

    Here is my data:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(job_1dgt job_sat) float(occupato married)
    .  . 0 0
    .  . 0 0
    3  2 1 0
    2 10 1 1
    6  9 1 0
    8  6 1 1
    5 10 1 0
    3  8 1 0
    5  6 1 0
    5 10 1 0
    7  9 1 1
    4  9 1 0
    .  . 0 0
    5 10 1 0
    2 10 1 0
    6  6 1 0
    8  7 1 0
    .  . 0 0
    .  . 0 1
    4  8 1 0
    end
    label values job_1dgt prof1
    label def prof1 2 "professioni intellettuali, scientifiche e di elevata special", modify
    label def prof1 3 "professioni tecniche", modify
    label def prof1 4 "professioni esecutive nel lavoro d'ufficio", modify
    label def prof1 5 "professioni qualificate nelle attività commerciali e nei se", modify
    label def prof1 6 "artigiani, operai specializzati e agricoltori", modify
    label def prof1 7 "conduttori di impianti, operai di macchinari fissi e mobili", modify
    label def prof1 8 "professioni non qualificate", modify
    label values job_sat c73
    label def c73 10 "completamente soddisfatto", modify

    I am really stuck! Thanks a lot in advance for your help.

    Best, G.P.

  • #2
    You need variation in your independent variables, otherwise how are you modeling the censoring process?



    • #3
      Dear Andrew, thanks for your answer. I do not understand what you mean by needing variation: my dependent variable has variation, but of course it is observed only for employed people and not for people outside employment. Could you be clearer, please?



      • #4
        As I said, the issue is with your independent variables. In selection models, your outcome is 0/continuous or missing/continuous. Therefore, you first model the censoring process (what determines selection to the 0 category) and then use the information that you obtain to model the continuous process. There is no variation in your independent variables to accomplish the former.



        • #5
          Sorry, but I do not see the point. Do you mean the independent variables in the selection equation? I have variation in them, don't I? So what should I fix?

          Sorry, but I am quite new to these models and I really do not see what I am missing.

          Thanks, Giorgio



          • #6
            In words, job_1dgt is always missing whenever the selection indicator is zero. As a result of listwise deletion, the remaining sample consists only of the observations for which job_sat is continuous, and you cannot model the censoring process.
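
A quick way to see this in the data is to check whether any non-employed observation has a value for job_1dgt. The sketch below uses a five-row subset of the -dataex- example from post #1; the variable miss_job is a hypothetical helper introduced only for this check.

```stata
* Five rows taken from the -dataex- example in post #1
clear
input double(job_1dgt job_sat) float(occupato married)
. .  0 0
3 2  1 0
2 10 1 1
. .  0 1
6 9  1 0
end

* Non-employed observations that still have a value for job_1dgt
count if occupato == 0 & !missing(job_1dgt)

* Cross-tabulate the selection indicator against missingness of the covariate
* (miss_job is an illustrative helper, not a variable from the original data)
generate byte miss_job = missing(job_1dgt)
tabulate occupato miss_job
```

If the count is zero, every observation with occupato == 0 is dropped from the estimation sample once job_1dgt enters the model, and heckman has no censored observations left.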



            • #7
              Dear Andrew, thank you for the explanation, now it is totally clear.

              But at this point I have one question: is there any way of studying the effect of some work-related variables (like the job characteristics in my earlier example) on job satisfaction while at the same time controlling for selection into employment? In other words, can I include some covariates in the second equation which are systematically missing for individuals with value 0 in the selection equation?


              Thanks a lot



              • #8
                Originally posted by Giorgio Piccitto
                In other words, can I include some covariates in the second equation which are systematically missing for individuals with value 0 in the selection equation?
                No, not if they are missing exactly when the outcome is missing, because listwise deletion will then eliminate every observation in the non-selected sample. The maximum likelihood estimator does the estimation in a single step, but you can do it in two stages. The first is a probit where the selection indicator is the outcome; if your independent variables are available only for the positive category, you cannot estimate this model. The second stage is an augmented regression on the continuous (selected) sample, including the inverse Mills ratio from stage 1.
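
For reference, the two stages can be written out as follows. The notation is an assumption of this sketch, not taken from the thread: $s$ is the selection indicator, $z$ the selection covariates, $x$ the outcome covariates, and $(u, v)$ are assumed bivariate normal with $\mathrm{Var}(v) = 1$, $\mathrm{sd}(u) = \sigma$ and correlation $\rho$.

```latex
\begin{aligned}
s &= \mathbf{1}[\, z\gamma + v > 0 \,] \\
y &= x\beta + u \quad \text{(observed only when } s = 1\text{)} \\
\mathrm{E}[\, y \mid x, z, s = 1 \,] &= x\beta + \rho\sigma\,\lambda(z\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

The first line is the probit; $\lambda(z\hat\gamma)$ is the inverse Mills ratio added as a regressor in the second stage, and its coefficient estimates $\rho\sigma$. Because the probit needs $z$ for both $s = 0$ and $s = 1$, a covariate that exists only for the selected sample cannot appear in $z$.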
                Last edited by Andrew Musau; 08 Jul 2022, 12:53.



                • #9
                  There is a way to solve this, but you require an instrumental variable for job_1dgt, and if you're putting in separate dummies for each outcome (except the base) it gets even harder. I discuss this in my 2010 MIT Press book. See Procedure 19.2. The idea is that y2, the missing explanatory variable, cannot be included in the probit, and so you have to instrument for it.
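
A minimal sketch in the spirit of that suggestion, on simulated data: a probit on variables observed for everyone, then 2SLS on the selected sample with the inverse Mills ratio as an extra regressor. All variable names (z1, z2, z3, y1, y2, s) and the data-generating process are hypothetical, y2 is treated as a single continuous variable rather than a set of dummies, and this is not claimed to reproduce the book's procedure exactly.

```stata
* Simulated data; every name and coefficient here is an illustrative assumption
clear
set seed 12345
set obs 5000
generate z1 = rnormal()        // exogenous, appears in the outcome equation
generate z2 = rnormal()        // excluded from the outcome equation
generate z3 = rnormal()        // excluded from the outcome equation
generate u  = rnormal()        // shared shock: creates selection and endogeneity
generate s  = (0.5 + z1 + z2 + z3 + u + rnormal() > 0)
generate y2 = z1 + z2 + u + rnormal()
generate y1 = 1 + 0.5*z1 + 2*y2 + u + rnormal()
replace y1 = . if s == 0       // outcome observed only when selected
replace y2 = . if s == 0       // regressor also missing when not selected

* Step 1: probit using only variables observed for all observations
probit s z1 z2 z3
predict double zg, xb
generate double lambda = normalden(zg)/normal(zg)

* Step 2: 2SLS on the selected sample; lambda enters as an exogenous regressor
ivregress 2sls y1 z1 lambda (y2 = z2 z3) if s == 1
```

The standard errors reported by ivregress here ignore that lambda is estimated, so they would need a correction such as bootstrapping both steps together.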



                  • #10
                    Dear Andrew and Jeff,

                    thanks a lot for your answer.

                    So Andrew, if I understood correctly, you suggest doing the procedure 'manually' with the two-step method: estimate a first-stage selection equation with employment status (occupato) as the outcome, including only variables that are observed for both values of employment (0 and 1, so basically variables that do not refer to a job characteristic) plus the variable that I choose to meet the exclusion restriction; then save the inverse Mills ratio; and in the final equation add all the variables I had in the selection equation (except for the exclusion restriction variable) plus other variables referring to job characteristics. Did I get your point correctly?

                    Jeff, thanks a lot for your tip, I will read the procedure you suggest.

                    Many thanks to both of you.

                    Best, Giorgio



                    • #11
                      Originally posted by Giorgio Piccitto
                      So Andrew, if I understood correctly, you suggest doing the procedure 'manually' with the two-step method: estimate a first-stage selection equation with employment status (occupato) as the outcome, including only variables that are observed for both values of employment (0 and 1, so basically variables that do not refer to a job characteristic) plus the variable that I choose to meet the exclusion restriction; then save the inverse Mills ratio; and in the final equation add all the variables I had in the selection equation (except for the exclusion restriction variable) plus other variables referring to job characteristics. Did I get your point correctly?
                      Yes, but you do not need to do this manually as you can ask for the two-step consistent estimator in place of MLE. In case you have to do it manually, you need to correct the standard errors in the second stage, e.g., by bootstrapping.

                      Code:
                      webuse womenwk
                      * Selection indicator: 1 if wage is observed
                      g employed= !missing(wage)
                      heckman wage educ age, select(employed=married children educ age) twostep
                      
                      *BY HAND
                      * Stage 1: probit for selection, then the inverse Mills ratio from the linear index
                      probit employed married children educ age
                      predict employedhat, xb
                      gen invMillsratio = normalden(employedhat)/normal(employedhat)
                      
                      * Wrapper that reruns both stages, so that bootstrap resamples the whole procedure
                      capture program drop se_correction
                      program define se_correction, eclass
                      tempname holding
                      cap drop employedhat invMillsratio
                      probit employed married children educ age
                      predict employedhat, xb
                      gen invMillsratio = normalden(employedhat)/normal(employedhat)
                      * Stage 2: outcome regression augmented with the inverse Mills ratio
                      regress wage educ age invMillsratio
                      matrix `holding'=e(b)
                      ereturn post `holding'
                      ereturn local cmd="bootstrap"
                      end
                      
                      bootstrap _b, reps(1000) nowarn nodots: se_correction
                      Results:

                      Code:
                      . heckman wage educ age, select(employed=married children educ age) twostep
                      
                      Heckman selection model -- two-step estimates   Number of obs     =      2,000
                      (regression model with sample selection)              Selected    =      1,343
                                                                            Nonselected =        657
                      
                                                                      Wald chi2(2)      =     442.54
                                                                      Prob > chi2       =     0.0000
                      
                      ------------------------------------------------------------------------------
                                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      wage         |
                         education |   .9825259   .0538821    18.23   0.000     .8769189    1.088133
                               age |   .2118695   .0220511     9.61   0.000     .1686502    .2550888
                             _cons |   .7340391   1.248331     0.59   0.557    -1.712645    3.180723
                      -------------+----------------------------------------------------------------
                      employed     |
                           married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
                          children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
                         education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
                               age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
                             _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
                      -------------+----------------------------------------------------------------
                      /mills       |
                            lambda |   4.001615   .6065388     6.60   0.000     2.812821     5.19041
                      -------------+----------------------------------------------------------------
                               rho |    0.67284
                             sigma |  5.9473529
                      ------------------------------------------------------------------------------
                      
                      . 
                      . 
                      . 
                      . *BY HAND
                      
                      . 
                      . probit employed married children educ age
                      
                      Iteration 0:   log likelihood = -1266.2225  
                      Iteration 1:   log likelihood = -1031.4962  
                      Iteration 2:   log likelihood = -1027.0625  
                      Iteration 3:   log likelihood = -1027.0616  
                      Iteration 4:   log likelihood = -1027.0616  
                      
                      Probit regression                               Number of obs     =      2,000
                                                                      LR chi2(4)        =     478.32
                                                                      Prob > chi2       =     0.0000
                      Log likelihood = -1027.0616                     Pseudo R2         =     0.1889
                      
                      ------------------------------------------------------------------------------
                          employed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                           married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
                          children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
                         education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
                               age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
                             _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
                      ------------------------------------------------------------------------------
                      
                      . 
                      . bootstrap _b, reps(1000) nowarn nodots: se_correction
                      
                      Bootstrap results                               Number of obs     =      2,000
                                                                      Replications      =      1,000
                      
                      -------------------------------------------------------------------------------
                                    |   Observed   Bootstrap                         Normal-based
                                    |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      --------------+----------------------------------------------------------------
                          education |   .9825259   .0525306    18.70   0.000     .8795678    1.085484
                                age |   .2118695   .0226484     9.35   0.000     .1674794    .2562596
                      invMillsratio |   4.001616   .6006486     6.66   0.000     2.824366    5.178865
                              _cons |   .7340391   1.229088     0.60   0.550    -1.674929    3.143008
                      -------------------------------------------------------------------------------
                      
                      .



                      • #12
                        Dear Andrew, I was trying to follow your procedure to estimate the Heckman model, but there is something I do not get: after running the two-step Heckman, I tried margins VAR, predict(ycond), but Stata responds that the predict option yexpected is not appropriate with margins.

                        Is this related to the twostep specification? How can I ask for margins in the proper way?

                        Thanks a lot, Giorgio



                        • #13
                          See

                          Code:
                          help heckman postestimation
                          You do not need margins; use the predict command instead.

                          Code:
                          predict ycond, ycond
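
As a usage sketch, the predicted conditional outcome can then be summarized by group instead of going through margins. This reuses the womenwk example from post #11; the variable name wage_cond and the grouping by married are illustrative choices only.

```stata
webuse womenwk, clear
generate employed = !missing(wage)
heckman wage educ age, select(employed = married children educ age) twostep

* E(wage | wage observed) for every observation
predict wage_cond, ycond

* Average conditional prediction by marital status
tabstat wage_cond, by(married) statistics(mean n)
```

Unlike margins, this gives point averages only, without standard errors.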

