
  • IV estimation in binary models - endogenous variable binary vs. continuous

    Hello, everyone,

    I have a question regarding IV estimation in binary models with one endogenous variable.

    If the endogenous variable can be coded either as a binary variable or a continuous variable, from an econometrics perspective, assuming the IV is a strong IV, is it better to code the endogenous variable as binary or continuous?
    • If the endogenous variable is coded as binary, the first stage estimation is non-linear
    • If the endogenous variable is coded as continuous, the first stage estimation is linear
    Apart from the fact that Stata's ivprobit only allows a continuous endogenous variable, does a linear first stage have advantages in a binary response model with IV estimation, such as estimates that are less biased or more likely to be consistent?


    Thank you for your help!
    C.

  • #2
    One cannot code a binary variable as continuous: it has two levels, and that is all there is. A continuous variable can be coded as binary, e.g., by discretising it so that values below the median are coded 0 and values above the median are coded 1, but doing so entails a loss of information.
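To make the information loss concrete, here is a minimal sketch of a median split. The scores are made-up illustrative values, not data from this thread, and `median_split` is a hypothetical helper name.

```python
from statistics import median

def median_split(values):
    """Code values strictly above the median as 1 and all others as 0."""
    cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]

# Hypothetical continuous scores for eight respondents (illustrative only).
scores = [0.0, 0.0, 0.32, 0.44, 0.5, 0.61, 0.64, 0.72]
binary = median_split(scores)

print(binary)
print(len(set(scores)), "distinct values before,", len(set(binary)), "after")
```

After the split, respondents with quite different scores (say 0.5 and 0.72) become indistinguishable, which is the loss of information described above.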

    In short, it is better to stay as close as possible to the "true nature" of the variable than to fabricate versions of it that fit a particular econometric estimation technique.

    2SLS is appropriate for all kinds of variables.

    If you want a nonlinear model for a binary dependent variable, use ivprobit for a continuous endogenous regressor and biprobit for a binary endogenous regressor.
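For reference, the two setups can be written out roughly as follows. This is a sketch in notation chosen here (outcome y_1, endogenous regressor y_2, exogenous controls x, instruments z); it is the textbook form of the two models, not output from either command.

```latex
% Continuous endogenous regressor (the ivprobit-type model):
\begin{aligned}
y_{1i} &= \mathbf{1}\left[\beta\, y_{2i} + x_i'\gamma + u_i > 0\right], \\
y_{2i} &= x_i'\pi_1 + z_i'\pi_2 + v_i, \\
(u_i, v_i) &\sim \text{bivariate normal}, \quad \operatorname{Var}(u_i) = 1 .
\end{aligned}

% Binary endogenous regressor (recursive bivariate probit, the biprobit-type model):
\begin{aligned}
y_{1i} &= \mathbf{1}\left[\alpha\, y_{2i} + x_i'\gamma + u_i > 0\right], \\
y_{2i} &= \mathbf{1}\left[x_i'\delta_1 + z_i'\delta_2 + v_i > 0\right], \\
(u_i, v_i) &\sim \text{standard bivariate normal with correlation } \rho .
\end{aligned}
```

In the first model the first stage is a linear equation for y_2; in the second it is itself a probit, which is the linear versus non-linear distinction raised in the original question.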



    • #3
      Thank you for your advice, Joro.

      Let me be a bit more specific about the endogenous variable. I have an endogenous construct (a person's volunteering experience):
      • It can be measured by a binary variable (Volunteering = 1 if the person has ever participated in volunteering activities, = 0 otherwise). This variable will be coded from the survey question "Have you ever participated in any voluntary work in the past two years?"
      • Or it can be measured by a continuous variable (Volunteering Diversity, coded as Blau's Index). This variable will be coded from a different survey question in which a respondent reports the different voluntary organizations he/she has engaged in.
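For readers unfamiliar with it, Blau's Index is 1 minus the sum of squared category shares. Below is a minimal sketch of how such a score could be computed per respondent. The organization categories and the treatment of respondents with no reported organizations (scored 0 here) are assumptions for illustration, not details given in this thread.

```python
from collections import Counter

def blau_index(memberships):
    """Return 1 - sum(p_k^2), where p_k is the share of memberships in category k."""
    if not memberships:
        # Assumption: no reported organizations is scored as zero diversity.
        return 0.0
    counts = Counter(memberships)
    total = len(memberships)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical respondents; the category labels are made up.
print(blau_index([]))                       # no organizations reported
print(blau_index(["health", "health"]))     # two organizations, one category
print(blau_index(["health", "education"]))  # two organizations, two categories
```

Under this coding, anyone reporting zero or one category scores 0, and with K categories the index cannot exceed 1 - 1/K, so the variable is bounded rather than freely continuous.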
      The dependent variable is binary, and the IV is categorical (= 1, 2, 3). I used 2SLS to check the IV's strength, and it is a strong IV. ivprobit with Volunteering Diversity works really well, but biprobit with Volunteering does not produce significant results in either the first or the second stage. 2SLS (ivreg2), run separately with Volunteering and with Volunteering Diversity, produces significant first-stage results in both cases but no significant second-stage results.

      So I am wondering: if we need to run an IV analysis in a binary response model, is it better for the endogenous construct to be measured by a continuous variable?
