
  • Logit fits better than probit, but Heckman needed because of selection bias

    Dear Stata connoisseurs!

    I'm working on my first empirical project, analyzing whether a person decides to have a certain vaccination or not.

    Logit fits my data slightly better than probit (Log likelihood -38.01 vs. -38.85 and Pseudo R2 0.308 vs. 0.291).
    But selection bias is, or might be, an issue, so I want to use a Heckman selection model. To my knowledge, a corresponding command exists only for probit.

    Would you recommend using the better-fitting logit for the ‘normal’ model together with heckprobit for the model with selection, or should I estimate both (‘normal’ model and model with selection) as probit, giving up some fit but staying within the same specification?

    Thanks a lot for your help!
    Antonio

  • #2
    The big thing is getting the selection right, so heckprobit is probably the way to go.



    • #3
      I have never encountered a case where the choice of basic logit or probit made much of a substantive difference, and that seems to be true in this case as well. Therefore the choice is usually based on

      (a) whatever everybody else in your field usually uses, or
      (b) whether you want to use something a little more advanced that requires one or the other.

      You seem to fall under case (b) so I agree with Phil. The difference in fit is extremely trivial so it wouldn't be a factor in my decision.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam
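The point in #3 that logit and probit rarely differ in substance can be illustrated numerically. The sketch below is not from the thread and uses standard-library Python rather than Stata. It compares the logistic CDF with a standard normal CDF after rescaling the argument by 1.702, a conventional approximation constant that is an assumption here and not something the posters mention. The function names and the evaluation grid are likewise illustrative.

```python
import math


def logistic_cdf(x):
    """CDF of the standard logistic distribution (the logit link)."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """CDF of the standard normal distribution (the probit link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Assumed rescaling constant: logit coefficients are roughly 1.7 times
# the probit coefficients, so the probit index is divided by this value.
SCALE = 1.702

# Evaluate both curves on an illustrative grid from -8 to 8 in steps of 0.01.
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(logistic_cdf(x) - normal_cdf(x / SCALE)) for x in grid)

# The largest difference in predicted probability stays under 0.01.
print(f"largest gap between the two curves: {max_gap:.4f}")
```

Because the two link functions imply nearly the same predicted probabilities once the coefficients are rescaled, a small difference in log likelihood such as the one reported in #1 is not surprising.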



      • #4
        Well, the reason Heckman's approach goes with a probit model is the assumption of bivariate normality. More specifically, the error term in the selection equation is assumed to be normal, as is the error term in the population linear model. He assumes a bivariate normal distribution for these two error terms, which makes the conditional mean of the error term in the linear model a linear function of the error term in the selection equation. This follows from properties of the joint normal distribution. The logit model instead assumes that the error term follows a logistic distribution, for which Heckman's method is not valid, as far as I understand it.
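The argument in #4 can be written out as a short derivation. The notation is an assumption introduced here for illustration and does not appear in the thread: $y = x\beta + u$ is the linear outcome model, $s = 1[z\gamma + v > 0]$ is the selection rule, and $(u, v)$ are bivariate normal with $\operatorname{Var}(u) = \sigma_u^2$, $\operatorname{Var}(v) = 1$ and correlation $\rho$.

```latex
% Linearity of the conditional mean under bivariate normality
\mathbb{E}[u \mid v] = \rho\,\sigma_u\, v

% Mean of the outcome in the selected sample (s = 1, i.e. v > -z\gamma)
\mathbb{E}[y \mid x, z, s = 1]
  = x\beta + \rho\,\sigma_u\,\mathbb{E}[v \mid v > -z\gamma]
  = x\beta + \rho\,\sigma_u\,\lambda(z\gamma),
\qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

Here $\phi$ and $\Phi$ are the standard normal density and CDF, and $\lambda$ is the inverse Mills ratio. Both steps rely on normality of $v$, which is why the selection equation is a probit; with a logistic $v$ the correction term would not take this form.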



        • #5
          Thanks a lot, Phil, Richard and Chinmay for your answers and opinions!

          I kind of liked the idea of having a logit model because it's easier to interpret. That, together with the slightly better fit, is why I thought of fitting the basic model as a logit and then using heckprobit in the second step.
          But if this is uncommon or causes confusion, I'll probably fit the 'normal' model with probit as well.

