  • 2SLS with a binary endogenous variable

    Hello, I'm interested in examining the effect of an endogenous dummy variable, D, on the dependent variable Y; that is, D has a potential self-selection problem.

    If I run:

    Y_i = α + βD_i + X_iλ + e_i

    β is biased and inconsistent, so what I'd like to do is run a two-stage probit model, where in the first stage I use a criterion function:

    D*_i = Z_iσ + e_i   (1)

    where Z is a vector of exogenous variables that includes at least one instrumental variable.

    Then I plug the fitted value of D and the inverse Mills ratio into the structural model:

    Y_i = α + β'Dhat_i + X_iλ + τ INVERSEMILL_i + e_i   (2)

    If τ is significant, there is a self-selection problem, and I report the coefficient β'.
    If τ is not significant, there is no self-selection problem, so using OLS and reporting the coefficient β is fine.


    However, I've been told that this kind of 2SLS with a binary endogenous variable is called a "forbidden regression" and yields biased and inconsistent estimates of β'. Is this model wrong? If so, what is the solution for an endogenous binary variable?

    Thank you very much.



  • #2
    Hi Alex,

    Check out the help for
    treatreg

    which in Stata 15 is called
    etregress

    This pre-programmed command does what you want to do.



    • #3
      Originally posted by Joro Kolev
      Hi Alex,

      Check out the help for
      treatreg

      which in Stata 15 is called
      etregress

      This pre-programmed command does what you want to do.
      Thanks, but my concern is whether this estimator is consistent and unbiased, not how to run it.



      • #4
        Jeff Wooldridge's panel data book discusses this. It also comes up on Statalist from time to time.

        Short version: basic IV is fine (consistent under standard assumptions) using your instruments and other variables as they are, but Jeff outlines a procedure that is more efficient (construct transformed instruments, then do basic IV). If you look around the Statalist archives you should find the discussion.
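To illustrate the point about basic IV, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration and does not come from the thread: the data-generating process, the true effect of 2.0, the single instrument `z`, the absence of covariates X, and the function names. It compares OLS with the simple IV (Wald-type) estimator when the endogenous regressor D is binary.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=50_000, beta=2.0, seed=12345):
    """Hypothetical DGP: D = 1[z + u > 0], Y = 1 + beta*D + e, corr(u, e) = 0.8."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                    # instrument, independent of u and e
        ui = rng.gauss(0, 1)                    # selection error
        ei = 0.8 * ui + 0.6 * rng.gauss(0, 1)   # outcome error, correlated with u
        di = 1.0 if zi + ui > 0 else 0.0        # binary endogenous treatment
        z.append(zi)
        d.append(di)
        y.append(1.0 + beta * di + ei)
    return z, d, y

z, d, y = simulate()

# OLS slope of Y on D: picks up the correlation between D and e.
beta_ols = cov(d, y) / cov(d, d)

# Just-identified IV with one instrument: cov(z, Y) / cov(z, D).
# D is treated like any other endogenous regressor; its binary nature is ignored.
beta_iv = cov(z, y) / cov(z, d)

print("OLS estimate:", round(beta_ols, 3))
print("IV estimate: ", round(beta_iv, 3))
```

Under these assumptions the OLS estimate should sit well above the true value of 2.0, while the IV estimate should be close to it. With covariates, the same logic carries over to standard 2SLS.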



        • #5
          Alex,

          1. If you want to know whether you have an endogeneity problem or can just use OLS, run treatreg, then OLS, and then a Hausman test. This will tell you whether endogeneity is present, and since it is all preprogrammed it is hard to get wrong.

          2. If you want to address endogeneity but for any reason do not want to use treatreg, as Mark said there are two simple strategies that work:
          a) Standard 2SLS (ivregress), disregarding the fact that your endogenous variable is binary.
          b) Run a first-stage probit, but instead of plugging in the predicted values, use the predicted values from the probit as instruments in ivregress.

          3. What you are describing sounds like a control function procedure, which Wooldridge describes somewhere, but you are doing two things differently from Wooldridge's description, and therefore the properties of your procedure are not known.
          a) In a control function approach you do not plug in the predicted values from the first stage. You keep the endogenous D_i as it is, so your eq. (2) should contain D, not Dhat.
          b) Your eq. (2) should not contain the inverse Mills ratio, but rather something called the generalised residual/error, which has two terms, both functions of the inverse Mills ratio.
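Strategy 2(b) above can be sketched as follows in Python. This is an illustration under stated assumptions, not the procedure from any particular textbook or Stata command: the simulated data, the true effect of 2.0, the single instrument `z`, the absence of covariates, and the hand-rolled `fit_probit` helper are all hypothetical. The point it shows is that the fitted probit probability is used only as an instrument, while the actual D stays in the outcome equation.

```python
import math
import random

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def fit_probit(d, z, iters=15):
    """Probit of d on (1, z) by Fisher scoring; returns (intercept, slope)."""
    g0, g1 = 0.0, 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for di, zi in zip(d, z):
            xb = g0 + g1 * zi
            p = min(max(norm_cdf(xb), 1e-10), 1.0 - 1e-10)
            f = norm_pdf(xb)
            w = f / (p * (1.0 - p))
            s0 += w * (di - p)          # score, intercept
            s1 += w * (di - p) * zi     # score, slope
            h00 += w * f                # expected information matrix entries
            h01 += w * f * zi
            h11 += w * f * zi * zi
        det = h00 * h11 - h01 * h01
        g0 += (h11 * s0 - h01 * s1) / det
        g1 += (h00 * s1 - h01 * s0) / det
    return g0, g1

# Hypothetical data: D = 1[z + u > 0], Y = 1 + 2*D + e, corr(u, e) = 0.8.
rng = random.Random(2024)
n, beta = 20_000, 2.0
z, d, y = [], [], []
for _ in range(n):
    zi, ui = rng.gauss(0, 1), rng.gauss(0, 1)
    ei = 0.8 * ui + 0.6 * rng.gauss(0, 1)
    di = 1.0 if zi + ui > 0 else 0.0
    z.append(zi)
    d.append(di)
    y.append(1.0 + beta * di + ei)

# Step 1: first-stage probit, then fitted probabilities.
g0, g1 = fit_probit(d, z)
p_hat = [norm_cdf(g0 + g1 * zi) for zi in z]

# Step 2: IV with p_hat as the instrument for the actual D (D is not replaced).
beta_iv = cov(p_hat, y) / cov(p_hat, d)

print("probit coefficients:", round(g0, 3), round(g1, 3))
print("IV estimate using fitted probability as instrument:", round(beta_iv, 3))
```

Note the contrast with the plug-in approach from the original question: here `p_hat` never enters the outcome equation as a regressor.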




          • #6
            Thank you Joro, I have two questions regarding your generous reply.
            Originally posted by Joro Kolev

            b) Run a first-stage probit, but instead of plugging in the predicted values, use the predicted values from the probit as instruments in ivregress.
            I'm not sure what you mean by using the predicted values from the probit as instruments. Do you mean that in the second stage I keep the endogenous D_i and use Dhat_i as its instrument?


            Originally posted by Joro Kolev

            b) Your eq. (2) should not contain the inverse Mills ratio, but rather something called the generalised residual/error, which has two terms, both functions of the inverse Mills ratio.
            Is the generalised residual/error also called a selectivity variable? And how do I calculate it? Thank you.
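For reference, with a probit first stage the generalised residual is commonly written as D_i·φ(Z_iσ̂)/Φ(Z_iσ̂) − (1 − D_i)·φ(Z_iσ̂)/(1 − Φ(Z_iσ̂)), where φ and Φ are the standard normal density and distribution function. The Python sketch below computes it for one observation; the function names are hypothetical, and the fitted index Z_iσ̂ is assumed to come from a probit you have already estimated.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def generalized_residual(d, index):
    """Probit generalised residual for treatment d (0 or 1) at fitted index Z*sigma_hat.

    d = 1: phi(index) / Phi(index)            (inverse Mills ratio)
    d = 0: -phi(index) / (1 - Phi(index))     (uses 1 - Phi(x) = Phi(-x))
    """
    f = norm_pdf(index)
    if d == 1:
        return f / norm_cdf(index)
    return -f / norm_cdf(-index)

# Example with an assumed fitted index of 0.7 for a treated and an untreated unit.
gr_treated = generalized_residual(1, 0.7)
gr_untreated = generalized_residual(0, 0.7)
print(gr_treated, gr_untreated)
```

In a control function regression this term would take the place of the inverse Mills ratio in eq. (2), alongside the actual D rather than its fitted value. Its probability-weighted average over D = 1 and D = 0 is zero at any index, which is what makes it behave like a residual.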



            • #7
              Thank you all for the replies. I want to be more explicit about my case.

              I want to examine the effect of a state program (dummy variable D) on the unemployment rate (Y). My unit of observation is city i.

              For OLS, I'd run:
              Y_i = α + βD_i + X_iλ + e_i

              However, this program is generally located in cities that tend to have a lower unemployment rate, so D is not random and we have selectivity bias.

              Of course, I can take the traditional IV approach by finding an instrument.

              As an alternative approach, I would also like to:

              (1) estimate a probit model first to predict the likelihood that city i is selected to have the program D:

              D*_i = Z_iσ + e_i   (1)

              (2) run OLS with a selectivity variable included in the second stage:

              Y_i = α + β'D_i + X_iλ + τ Selectivity_i + e_i   (2)


              My questions are:

              (1) Is β' consistent and unbiased?

              (2) Is this approach called the Heckman correction approach or the control function approach?

              (3) I understand that the Heckman approach typically applies when variables are not observable for D=0. In my case, variables for D=0 are observable. Can I still use this approach?

              (4) Lastly, is the selectivity variable different from the inverse Mills ratio, or is it just another name for the generalised residual from a control function approach?

              Thank you.

