  • Instrumental variables with binary endogenous regressor

    Hi Stata listers,

    I am estimating the following model using Instrumental Variables:
    Y = B0 + B1D + B2X + U, where D is an endogenous dummy variable.

    In order to avoid the forbidden regression, I'm following Wooldridge (2002):
    1. Estimate D = A0 + A1Z + A2X + V using a probit model, and calculate the fitted value, Dhat.
    2. Estimate the main equation by IV using Dhat as instrument.
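
    The two steps above can be sketched in Stata roughly as follows. This is only an illustration: Y, D, Z and X are the placeholder names from the model, not real variables, and Dhat is a name chosen here for the fitted probabilities.

    Code:
    * Sketch only: Y, D, Z, X are placeholder names from the model above
    * Step 1: probit for the binary endogenous regressor
    probit D Z X
    * Fitted probabilities P(D = 1 | Z, X); the name Dhat is an assumption
    predict double Dhat, pr
    * Step 2: 2SLS with Dhat as the excluded instrument for D
    ivregress 2sls Y X (D = Dhat)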

    My question is: in step 2, should I also include the instrument Z, or should I use only Dhat as the instrument (with X serving as its own instrument)?

    Many thanks for your help.

    Maria

  • #2
    I gather the biprobit model would fit your needs.
    Best regards,

    Marcos

    • #3
      Hi Maria,
      See pages 142-144 of Mostly Harmless Econometrics by Angrist and Pischke. I believe you need to include only Dhat and X.

      • #4
        What you need is for Dhat to be uncorrelated with the residual in the original model: Y = B0 + B1D + B2X + U.
        A probit model in the first stage will not give you that because it is nonlinear.
        Just run a 2SLS regression using Z as an instrument for D: ivregress 2sls Y (D = Z) X

        • #5
          Dear Maria,

          Just to add to the variety of (late) advice you are getting, I would suggest using both Z and Dhat as instruments. That is what we did in (see the top of page 291):

          Windmeijer, F. and Santos Silva, J.M.C. (1997), Estimation of Count Data Models with Endogenous Regressors; An Application to Demand for Health Care, Journal of Applied Econometrics, 12(3), pp. 281-294.

          Best wishes,

          Joao

          • #6
            @João: Why would using an LPM in the first stage be wrong? And since the Dhat from a probit model is non-linear in X and Z, is there not a risk that it is still linearly correlated with U as defined in the equation in #1 above?

            • #7
              Dear Eric de Souza,

              An LPM in the first stage is not wrong, and that is actually what should be done.

              The question is what instrument to use in the first stage. We can just use Z, but that is unlikely to be a good instrument if Z is continuous and D binary. An alternative is to replace Z with E(D|Z,X), which is likely a better instrument; this can be approximated using a logit or probit. What we did in the 1997 paper was actually to use both Z and an estimate of E(D|Z,X) obtained with a logit.

              So, the command I would suggest is something like the following (which has an LPM in the first stage):

              ivregress 2sls Y (D = Z Dhat) X

              Does it make sense?

              Best wishes,

              Joao

              • #8
                Provided E(U|X,Z) = 0, using either Z, Dhat, or both produces consistent estimators. Joao's proposal has the advantage of being asymptotically no worse than either of the other two approaches if Var(U|X,Z) is constant. It also produces an overidentifying restriction, although I'm not sure how valuable that is as a specification test in this context.

                A potential downside is that using (Z,Dhat) is more likely to lead to a weak instruments problem. If the probit model for D is correct then Z is redundant as an IV. Therefore, the rule for having a strong enough instrument -- roughly, the first-stage F statistic is above 10 -- would be harder to achieve.
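
                As a rough sketch of how the overidentifying restriction mentioned above could be examined, assuming the placeholder variables Y, D, Z and X from #1 and a probit first step (Dhat is an illustrative name):

                Code:
                * Sketch only: placeholder variable names from #1
                probit D Z X
                predict double Dhat, pr
                * Two excluded instruments (Z and Dhat) for one endogenous regressor
                ivregress 2sls Y X (D = Z Dhat)
                * Test of the single overidentifying restriction
                estat overid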

                • #9
                  Dear all,

                  Do we get the proper standard errors when running the following command that you suggested:
                  ivregress 2sls Y (D = Z Dhat) X
                  Thanks a lot,

                  Mat

                  • #10
                    Mat: Yes, the standard errors are valid provided the error satisfies the zero conditional mean restriction in my earlier post (which is essentially needed for consistency, anyway). I show this in Chapter 6 of my MIT Press book. You should make them robust to heteroskedasticity, as usual.
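
                    For illustration only, a heteroskedasticity-robust version of the command from #9 might look like this, assuming Dhat has already been generated from the probit or logit fit:

                    Code:
                    * Sketch only: same placeholder names as in #9
                    ivregress 2sls Y X (D = Z Dhat), vce(robust)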

                    • #11
                      Hi Jeff and Joao:

                      In the above example, which first-stage F-stat should be reported: the one from D = A0 + A1Z + A2X + V, or the one from D = AZ + D-hat?

                      • #12
                        Devon, if you only use Z as the instrument, then the first-stage regression is as below and the F stat is only related to Z.
                        Code:
                        regress D Z X
                        If you use both Z and Dhat as instruments, then the first-stage regression would be as below and the F stat is related to the joint significance of Z and Dhat.

                        Code:
                        regress D Z Dhat X
                        ADD: But you shouldn't "manually" run the first stage. Use -ivregress- and correctly specify the set of instruments there. After -ivregress-, run -estat firststage- (for homoskedasticity) or -weakivtest- (for heteroskedasticity, from SSC) to obtain the correct F stat.
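
                        A minimal sketch of that sequence, using the placeholder names from this thread and assuming Dhat already exists. -weakivtest- is user-written; as far as I know it also needs -avar- from SSC and picks up the VCE of the preceding estimation, so please check its help file.

                        Code:
                        * Sketch only: placeholder variable names from this thread
                        * Homoskedastic case: first-stage statistics from estat firststage
                        ivregress 2sls Y X (D = Z Dhat)
                        estat firststage
                        * Heteroskedastic case: refit with a robust VCE, then weakivtest
                        * One-time install (assumed): ssc install weakivtest, ssc install avar
                        ivregress 2sls Y X (D = Z Dhat), vce(robust)
                        weakivtest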
                        Last edited by Fei Wang; 02 Jul 2022, 20:45.

                        • #13
                          Hi Fei:

                          Thanks for your reply. I am using an interaction with the endogenous variable in my model: y=ax+ bx*y where x is binary and endogenous. I have an instrument for x, z. I am trying to do a first stage of the form x=c.z and obtain xhat. Then I plan on using xhat and xhat*y as instrument. I am using the ivregress 2sls command since I am using the svy-prefix and ivreg2 is not supported with svy. Moreover, I can't run estat first after ivregress 2sls as well since that is also not supported with svy. That's why I was wondering how to get the first stage F "manually."
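
                          One possible way to get a "manual" first-stage F with survey data is sketched below. It assumes the data are already -svyset-, uses the thread's placeholder names D, Z, Dhat and X rather than my actual variables, and leaves open whether the usual weak-instrument thresholds apply to this statistic.

                          Code:
                          * Sketch only: assumes svyset has been run; placeholder names
                          * First-stage regression with the survey design
                          svy: regress D Z Dhat X
                          * Adjusted Wald test of the excluded instruments
                          test Z Dhat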

                          • #14
                            Hi Devon: My former student, Ruonan Xu at Rutgers, has written on exactly this problem. She shows that using the probit fitted values can strengthen the IVs and explores the effective F statistic for determining weak IVs. You can use -weakivtest- after obtaining the probit fitted values. You can add Z as an extra instrument, but it might weaken the group as a whole.

                            https://www.sciencedirect.com/scienc...ZWpyxaZcaFZhvA

                            • #15
                              Hi Jeff: Thanks for the link! This is exactly what I was looking for.
