
  • Asking: STATA commands for semiparametric/nonparametric Heckman Selection Model

    Dear STATA users:

    I am dealing with a missing-not-at-random problem. I am using longitudinal data on ageing. In my data, the majority of the missing values are due to the death of respondents (about 50% of them died in the follow-up waves). I wanted to use the Heckman Selection Model (HSM) to address this selection issue, but my DV does not follow a normal distribution (please see the figure of the distribution). Therefore, I am looking for a semi-parametric version of the HSM but haven't found one yet. If anyone knows the STATA code, could you kindly share it with me? I need it urgently because it is for my dissertation and the work has been stuck for a couple of months. Thanks a lot!
    Attached Files
    Last edited by Yang Yi; 22 Nov 2014, 19:46.

  • #2
    I think what you're looking for is a Stata implementation of

    http://www.sciencedirect.com/science...0440769390111H

    but I couldn't find one.
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com



    • #3
      Hi, Jorge, thank you very much for your reply!

      Yes, there are a couple of articles about the semi-parametric and non-parametric versions but I can't find the STATA commands for that...



      • #4
        I think you are going to have to program this yourself. The following is untested preliminary code for the Ahn and Powell (1993) estimator, to get you going:

        Code:
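        * Untested code
        * y outcome, x regressor
        * s selection
        * v instrument
        * Obtain conditional expectation of selection dummy
        * (with at(), generate() takes only the name for the smoothed values)
        lpoly s v, degree(0) at(v) generate(shat) bwidth(0.1) nograph

        * Build all possible pairwise combinations
        gen id = _n

        tempfile a
        save `a'
        keep id shat s y x
        ren shat shato
        ren id ido
        ren s so
        ren y yo
        ren x xo
        cross using `a'

        * Keep each pair of distinct, selected observations once
        keep if id < ido & s == 1 & so == 1

        * Weights (normal kernel, same bandwidth 0.1 as above)
        gen weight = 1/0.1 * normalden((shat - shato)/0.1)

        * Estimator: weighted regression of pairwise differences, no constant
        * (the reported standard errors ignore the dependence across pairs)
        gen dy = y - yo
        gen dx = x - xo
        reg dy dx [aw=weight], noconstant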
        Jorge Eduardo Pérez Pérez
        www.jorgeperezperez.com
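
        For reference, the pairwise-difference estimator that the code above targets can be written as below. This is a sketch under simplifying assumptions: the regressors are treated as exogenous (so x acts as its own instrument) and the selection index is scalar. The symbols g-hat, h, K and omega are introduced here for illustration and do not appear in the thread.

        ```latex
        \hat{\beta} =
          \Bigl[\sum_{i<j} \hat{\omega}_{ij}\,(x_i - x_j)(x_i - x_j)'\Bigr]^{-1}
          \sum_{i<j} \hat{\omega}_{ij}\,(x_i - x_j)(y_i - y_j),
        \qquad
        \hat{\omega}_{ij} = \frac{1}{h}\,
          K\!\Bigl(\frac{\hat{g}_i - \hat{g}_j}{h}\Bigr)\, s_i s_j
        ```

        Here g-hat is the first-step nonparametric estimate of E[s | v] (shat in the code), K is a kernel, and h is a bandwidth. Pairs with similar g-hat get the most weight, so differencing within such pairs approximately removes the unknown selection-correction term. The intercept is differenced out as well, so it is not identified by this step.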



        • #5
          Dear Jorge,

          Thank you very much for your help! In my study, when I use 'heckprob' with only the key variables (I mean the variables I am interested in), heckprob works well. But when I add control variables, rho becomes non-significant. So I ran 'probit' on the same model. The results of the two methods look similar. I then ran a Hausman test (please see the attachment) to see whether there is a difference between the two methods (actually, to see whether the selection issue is still present in the model even though rho is not significant in the HSM). The Hausman test is significant. Does this mean my data do not meet the restrictions of the HSM and I have to use a semi-parametric HSM? Thank you!

          Attached Files



          • #6
            Please note the strong preference on Statalist for registration with full names (first and last) (FAQ 6). You can re-register via the CONTACT US button at the bottom right of the page.
            Steve Samuels
            Statistical Consulting

            Stata 14.2



            • #7
              If you're using heckprob because your outcome variable is discrete then the Ahn and Powell article will not help. That assumes an essentially continuous response variable.

              I'm puzzled by your comments about the heckprob estimates. If when you add a full set of controls you get a small, insignificant estimate of rho, then you are done: you can report the results both ways, but it appears you don't have much of a problem. Why would you do a Hausman test in this context? The test of rho = 0 is the likelihood ratio test.

              As a final comment, applications of selection methods to account for people dying have always puzzled me. I'm not saying it's wrong, but what is the population you're interested in? I see people make sample selection adjustments for health care use to account for people who die. Do we really want to ask the counterfactual, "How much health care would that person have used if alive?" I'm skeptical. JW
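
              The "report the results both ways" suggestion above can be sketched as follows. The variable names (y, s, x1, x2, z) are hypothetical placeholders, not taken from the thread; z stands for a variable that affects selection but is excluded from the outcome equation. This is a minimal, untested sketch rather than a definitive recipe.

              ```stata
              * Hypothetical names: y binary outcome, s = 1 if y is observed,
              * x1 x2 controls, z excluded instrument for selection

              * Probit with sample selection
              heckprob y x1 x2, select(s = x1 x2 z)
              estimates store sel

              * Plain probit on the selected sample
              probit y x1 x2 if s == 1
              estimates store plain

              * Coefficients and standard errors side by side
              estimates table sel plain, b(%9.3f) se(%9.3f)
              ```

              When heckprob is fit without robust or clustered standard errors, its output ends with a likelihood-ratio test of independent equations (rho = 0), which is the test referred to above, so no separate Hausman test is needed for that question.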



              • #8
                Dear Prof. Wooldridge,

                Thank you very much for your comments! Yes, I agree with you that it may not be appropriate to treat missingness due to death as a sample selection issue, because people DON'T choose to die. I have had this concern since the very beginning of my study. I tried Heckman models because missingness due to death is missing not at random, and the Heckman model addresses missing not at random. Meanwhile, in this study, on the one hand, cognitive impairment is highly associated with mortality. On the other hand, the independent variables, such as poverty, are associated with mortality, too. Old people living in poverty are more likely to have cognitive impairment and more likely to die. Therefore, I can't exclude the people who died. That is my main concern, and I am still looking for an appropriate method to address this missing-due-to-death issue.

                These days, I am reading your book and noticed that the Tobit models (type II and type III) are similar to Heckman models but do not require normality. Can I say death is a truncation? Do you think a Tobit model could be a more appropriate method to address missingness due to death in a longitudinal dataset? Or could you recommend any methods to me? I am a new student in this field and willing to learn more. Thank you very much!
                Last edited by Yang Yi; 07 Jul 2015, 22:19.



                • #9
                  Dear Prof. Wooldridge,

                  Regarding your comment, "If when you add a full set of controls you get a small, insignificant estimate of rho, then you are done: you can report the results both ways": yes, when I use heckprob, rho becomes insignificant only when I add the control variables to the model. Rho is significant when I have only the key variables in the model.

                  What do you mean by 'you can report the results both ways'? Sorry, I don't get it. Could you explain more about it? Thank you!
