  • Instrumental Variable Probit using Panel Data

    I am trying to implement an IV regression in which the second stage is a probit. I use the correlated random effects (CRE) approach suggested by @Jeff Wooldridge.
    This is the code I am using:
    Code:
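    * time averages of the covariates and the instrument (CRE terms)
    by pid, sort : egen x1_bar = mean(x1)
    by pid, sort : egen x2_bar = mean(x2)
    by pid, sort : egen iv_bar = mean(iv)
    capture program drop my2sls
    program my2sls
        * first stage: pooled OLS, then fitted values of x1
        reg x1 x2 iv x1_bar x2_bar iv_bar i.time
        predict x1_hat, xb
        * second stage: probit using the fitted values
        probit y x1_hat x2 x1_bar x2_bar i.date, cluster(pid)
        drop x1_hat
    end
    bootstrap, reps(100): my2sls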
    Is everything correct?
    Any other suggestions?

  • #2
    You need to use a control function (CF) approach combined with CRE. Get the residuals from the first-stage regression, not the fitted values. Then add the residuals to the second stage along with x1 and the other variables; do not use x1_hat.

    Is x1 roughly continuous?



    • #3
      Dear Professor Wooldridge,
      Thank you very much for your reply! Yes, it is continuous, but my IV takes a maximum value of 4.

      Code:
      by pid, sort : egen x1_bar = mean(x1)
      by pid, sort : egen x2_bar = mean(x2)
      by pid, sort : egen iv_bar = mean(iv)
      capture program drop my2sls
      program my2sls
          reg x1 x2 iv x1_bar x2_bar iv_bar i.time
          * control function: keep the first-stage residual, not the fitted value
          predict x1_resid, resid
          probit y x1 x1_resid x2 x1_bar x2_bar i.date, cluster(pid)
          drop x1_resid
      end
      bootstrap, reps(100): my2sls
      Is it correct now, or do I also need to include the time average of x1_resid?

      Follow-up question: how can I use the Stock-Yogo critical values to test the strength of the instrument?

      Can I simply run ivreg2 and ignore the rest of its output?




      • #4
        Very good intuition! In Lin and Wooldridge (2019), we show that you should include the time average of the instruments in both stages. You can find the unpublished version on Wei's website. In your setting, this is numerically identical to including the time average of x1_resid, but without the problem of a generated regressor. So use

        Code:
        probit y x1 x1_resid x2 x1_bar x2_bar iv_bar i.date, cluster(pid)
        You can actually use the cluster-robust t statistic on x1_resid to test the null that x1 is exogenous with respect to the idiosyncratic shocks.
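        A self-contained sketch of this specification on simulated data follows. The data-generating process (500 hypothetical units, 5 periods, the coefficient values) is an assumption made only so the code runs; the variable names and the two-step specification follow the code in this thread. The last two commands report the cluster-robust t statistic on x1_resid and the equivalent one-restriction Wald test.

        ```stata
        * Hypothetical simulated panel: 500 units (pid), 5 periods each
        clear
        set seed 12345
        set obs 500
        generate pid = _n
        generate c = rnormal()                      // unobserved heterogeneity
        expand 5
        bysort pid: generate time = _n
        generate date = time                        // the thread uses both i.time and i.date
        generate iv = runiform(0, 4)                // instrument bounded by 4, as in #3
        generate x2 = rnormal() + 0.5*c
        generate v  = rnormal()                     // first-stage error
        generate x1 = 0.5*x2 + 0.8*iv + 0.5*c + v
        generate y  = (0.5*x1 - 0.5*x2 + c + 0.5*v + rnormal() > 0)   // x1 endogenous via v
        by pid, sort: egen x1_bar = mean(x1)
        by pid, sort: egen x2_bar = mean(x2)
        by pid, sort: egen iv_bar = mean(iv)

        * First stage (pooled OLS) and the control-function residual
        regress x1 x2 iv x1_bar x2_bar iv_bar i.time
        predict x1_resid, residuals

        * Second stage: pooled probit with x1, the residual, and iv_bar
        probit y x1 x1_resid x2 x1_bar x2_bar iv_bar i.date, vce(cluster pid)

        * Cluster-robust test of H0: x1 is exogenous (coefficient on x1_resid is zero)
        display "t statistic on x1_resid: " _b[x1_resid]/_se[x1_resid]
        test x1_resid
        ```

        Under the null the coefficient on x1_resid is zero, so the usual cluster-robust standard error from probit is enough for this test; the bootstrap is only needed for inference on the other coefficients when the null is rejected.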

        We don't have good theory for determining strong vs weak IVs in nonlinear contexts. What you propose is what I suggest: act as if the model of interest is also linear and use the diagnostics for linear models. You might look at the user-written command weakivtest in addition to ivreg2.
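        Following that advice, one way to obtain the linear-model diagnostics is sketched below, assuming Stata's built-in ivregress in place of ivreg2 (ivreg2 and weakivtest are user-written and must be installed separately). The simulated data are hypothetical and only make the block runnable. The Stock-Yogo critical values reported by estat firststage assume i.i.d. errors, so the clustered first-stage F from test is shown alongside as a rough check, not as a formal Stock-Yogo comparison.

        ```stata
        * Hypothetical simulated panel: 500 units (pid), 5 periods each
        clear
        set seed 12345
        set obs 500
        generate pid = _n
        generate c = rnormal()
        expand 5
        bysort pid: generate time = _n
        generate iv = runiform(0, 4)
        generate x2 = rnormal() + 0.5*c
        generate v  = rnormal()
        generate x1 = 0.5*x2 + 0.8*iv + 0.5*c + v
        generate y  = (0.5*x1 - 0.5*x2 + c + 0.5*v + rnormal() > 0)
        by pid, sort: egen x1_bar = mean(x1)
        by pid, sort: egen x2_bar = mean(x2)
        by pid, sort: egen iv_bar = mean(iv)

        * Linear (2SLS) version of the model, used only for its first-stage diagnostics
        ivregress 2sls y x2 x1_bar x2_bar iv_bar i.time (x1 = iv)
        estat firststage          // first-stage F and Stock-Yogo critical values (iid errors)

        * Cluster-robust first-stage F on the single excluded instrument
        regress x1 x2 iv x1_bar x2_bar iv_bar i.time, vce(cluster pid)
        test iv
        ```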

        I just noticed something about your bootstrap. Using cluster(pid) does nothing because your bootstrap standard errors will be robust to clustering as well as the two-step estimation. But now I don't see how you have told Stata you want to use the panel bootstrap. I should have some old code that does that.

        JW
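        A minimal sketch of a panel bootstrap follows; it is not the code Jeff refers to, and it uses hypothetical simulated data so that it runs on its own. bootstrap's cluster(pid) option resamples whole units, so each unit's time series and its precomputed time averages stay intact; idcluster(newpid) gives each resampled copy of a unit a fresh identifier, which matters only if the program uses pid internally (for example, with xt commands). Because the whole two-step program is rerun in every replication, the bootstrap standard errors also account for the estimated first-stage residual, so the inner cluster(pid) option is dropped.

        ```stata
        * Hypothetical simulated panel: 500 units (pid), 5 periods each
        clear
        set seed 12345
        set obs 500
        generate pid = _n
        generate c = rnormal()
        expand 5
        bysort pid: generate time = _n
        generate date = time
        generate iv = runiform(0, 4)
        generate x2 = rnormal() + 0.5*c
        generate v  = rnormal()
        generate x1 = 0.5*x2 + 0.8*iv + 0.5*c + v
        generate y  = (0.5*x1 - 0.5*x2 + c + 0.5*v + rnormal() > 0)
        * Time averages are computed once, before bootstrapping
        by pid, sort: egen x1_bar = mean(x1)
        by pid, sort: egen x2_bar = mean(x2)
        by pid, sort: egen iv_bar = mean(iv)

        * Two-step CF-CRE estimator wrapped in a program so bootstrap reruns both steps
        capture program drop my2sls
        program my2sls, eclass
            regress x1 x2 iv x1_bar x2_bar iv_bar i.time
            predict x1_resid, residuals
            probit y x1 x1_resid x2 x1_bar x2_bar iv_bar i.date
            drop x1_resid
        end

        * Panel bootstrap: resample whole units (pid) with replacement
        bootstrap _b, reps(100) seed(2019) cluster(pid) idcluster(newpid): my2sls
        ```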



        • #5
          Professor Wooldridge, thank you very much! This information is gold. Is your old code for this publicly available somewhere, or could I email you for it if it is at your fingertips?
          Best regards,
          Vera



          • #6
            Email is fine. I should be able to find it pretty easily.



            • #7
              Replying to Jeff Wooldridge (#4):

              Dear Professor Wooldridge and dear Mihailovna, thank you for this valuable discussion. I would also like to ask how to tell Stata that we want to use the panel bootstrap. Any suggestions?

              Also, my rough understanding is that the bootstrap gives bootstrapped standard errors (BSE), which approximate the asymptotically correct standard errors (ACSE). Is there a way to compute the ACSE directly?


              Best wishes,

              Ke
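              On the ACSE question, here is a sketch of the analytic alternative, assuming the standard two-step M-estimator (Murphy-Topel-type) correction with clustering by pid. The notation is introduced here for illustration only: w_it collects the first-stage regressors, γ the first-stage coefficients, θ the probit coefficients (including ρ, the coefficient on x1_resid), and s_it the pooled probit score for unit i in period t.

              ```latex
              \begin{aligned}
              &x1_{it} = w_{it}\gamma + v_{it}, \qquad \hat v_{it} = x1_{it} - w_{it}\hat\gamma, \qquad
                s_i(\theta;\gamma) = \sum_{t} s_{it}(\theta;\gamma),\\
              &\hat r_i = \Big(N^{-1}\sum_{i=1}^{N}\sum_{t} w_{it}'w_{it}\Big)^{-1}\sum_{t} w_{it}'\hat v_{it}, \qquad
                \hat F = N^{-1}\sum_{i=1}^{N}\nabla_{\gamma} s_i(\hat\theta;\hat\gamma),\\
              &\hat g_i = s_i(\hat\theta;\hat\gamma) + \hat F\,\hat r_i, \qquad
                \hat A = -\sum_{i=1}^{N}\nabla_{\theta} s_i(\hat\theta;\hat\gamma),\\
              &\widehat{\operatorname{Avar}}(\hat\theta) = \hat A^{-1}\Big(\sum_{i=1}^{N}\hat g_i\,\hat g_i'\Big)\hat A^{-1}.
              \end{aligned}
              ```

              Setting \hat F = 0 gives the usual cluster-robust sandwich that ignores the first stage. Under H0: ρ = 0 the population F is zero, which is why the cluster-robust t statistic on x1_resid needs no adjustment; when ρ ≠ 0, the F r_i term is the part the bootstrap approximates without coding the derivatives by hand.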

