
  • Heckman selection model with random effects


    Hello Everyone,
    I have a cross-sectional dataset and would like to estimate a Heckman sample selection model (heckprob: Probit model with sample selection) by using Stata 11. My data is hierarchical (households within villages and within regions). Therefore, I would like to include those variables, as random effects, for both the selection (probit) model and the second stage (probit) model.
    Apparently, this can be done using the gllamm command, though I have not yet figured out how to define the selection model.
    My question is whether the following simple alternative could be valid:
    1. Estimate the random effects (villages, regions) probit part of the model using xtprobit.
    2. Calculate the inverse Mills ratio from the results, which equals
    Invmills = normalden(linear_pred)/normal(linear_pred)
    3. Include the inverse Mills ratio as an additional explanatory variable in the second-stage regression to control for selectivity bias, using either xtprobit or gllamm and again including the random effects (villages, regions).
    In a 2005 Statalist post I saw that this approach might not be adequate for panel data (measurements over time) with fixed effects. As my data are different, I wonder:
    - Is this approach correct?
    - Is there a better approach I could use?
    Many thanks in advance for your help!
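
    For reference, steps 1 to 3 can be written out roughly as follows. This is only a sketch of the proposal as stated, not a claim that it is valid. The symbols s_i, z_i, x_i, gamma, beta, delta, u_i and epsilon_i are notation assumed here and do not appear in the original post, and the village and region random effects are left out to keep the display short. phi and Phi are the standard normal density and distribution function, which is what Stata's normalden() and normal() compute.

    ```latex
    \begin{align*}
    s_i &= \mathbf{1}\{\, z_i'\gamma + u_i > 0 \,\}
        && \text{(step 1: selection probit)} \\
    \hat\lambda_i &= \frac{\phi(z_i'\hat\gamma)}{\Phi(z_i'\hat\gamma)}
        && \text{(step 2: inverse Mills ratio)} \\
    y_i &= \mathbf{1}\{\, x_i'\beta + \delta\,\hat\lambda_i + \varepsilon_i > 0 \,\}
        && \text{(step 3: second-stage probit, observed only if } s_i = 1\text{)}
    \end{align*}
    ```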

  • #2
    "My data is hierarchical (households within villages and within regions). Therefore, I would like to include those variables, as random effects, ..."
    To me, the "therefore" does not logically follow. Just as one does not necessarily model panel data using random effects models (there are also, e.g., "fixed effects" models), one does not have to model hierarchical data using random effects models (a.k.a. mixed models). There are different ways of handling the within-cluster correlation arising from the nested nature of the data. I suspect that you could apply the Heckman model relatively straightforwardly if you did not insist on using a random effects model in your main outcome equation. [If you do persist with random effects models and you have Stata 13, as we assume you have (see the FAQ), then look at the mixed suite of models, including meprobit, rather than xtprobit. Amongst other things, this would allow you to use cluster-robust SEs, where the cluster can be defined at, e.g., region level.]
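
    As a rough illustration of the "no random effects" route, the sketch below fits heckprob with standard errors clustered by region. Everything in it is an assumption made for illustration: the data are simulated, and the variable names (y1 for selection, y2 for the outcome, x1, x8, region) and all coefficients are hypothetical rather than taken from the poster's dataset. It is meant to be run as a do-file.

    ```stata
    * Simulated stand-in data; every name and number below is hypothetical
    clear
    set seed 12345
    set obs 2000
    gen region = ceil(_n/200)              // 10 regions of 200 households each
    gen x1 = rnormal()
    gen x8 = rnormal()                     // appears only in the selection equation
    gen u  = rnormal()                     // selection-equation error
    gen e  = 0.5*u + sqrt(0.75)*rnormal()  // outcome error, corr(u, e) = 0.5
    gen y1 = (0.5*x1 + x8 + u > 0)         // 1 = household is selected
    gen y2 = (0.8*x1 + e > 0) if y1 == 1   // outcome observed only when selected

    * Probit with sample selection, standard errors clustered by region
    heckprob y2 x1, select(y1 = x1 x8) vce(cluster region)
    ```

    Because villages are nested within regions, clustering at the region level also allows for correlation among households in the same village. Whether a small number of regions is enough for cluster-robust standard errors to be reliable is a separate question that this sketch does not address.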



    • #3
      Thank you very much for your suggestions.
      Unfortunately, as I stated in my previous post, I am using Stata 11 (fully updated), so I cannot use meprobit. I looked at the mixed models available in Stata 11 (multilevel mixed-effects logistic regression) and tried the xtmelogit command (renamed meqrlogit in Stata 13):

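      * Selection equation. Nested levels are listed from highest to lowest (villages within regions).
      xtmelogit y1 x1 x2 x3 x4 x5 x6 x7 x8 || region: || village:
      * Linear predictor from the fixed part only. Note that this index is on the logit scale, whereas the inverse Mills ratio is defined for a probit index.
      predict linear_pred, xb
      gen Invmills = normalden(linear_pred)/normal(linear_pred)
      * Second stage with the same nesting; observations with missing y2 are dropped automatically.
      xtmelogit y2 Invmills x1 x2 x3 x4 x5 x6 x7 x9 x10 x11 x12 || region: || village: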

      Would this approach be correct?

      Thanks again,



      • #4
        You did mention having Stata 11, so I agree that my remarks about meprobit were probably not helpful. Apologies. But you are persisting with random effects models and, as I also said, I don't think one should necessarily go there (why impose such structure on the model?). gllamm, available from SSC, would allow you to cluster standard errors, I think.

        In any case, I am not aware of any paper that has developed a version of the "Heckman two-step" method appropriate for a multilevel context -- I would think there are potentially tricky issues concerning how to treat the correlation between the errors in the selection equation and the errors in the main equation. [Such a paper may exist; I don't know of it.] So, you won't get confirmation from me regarding your proposed approach.

