  • Instrumental variables regression with proportional (rate) dependent variable

    I am looking for theoretical and practical advice on how to model a particular type of IV regression. The classical case of 2SLS is estimation where y is continuous. 2SLS is inefficient when y is binary or a count variable. Recent applied work has highlighted the value of two stage residual inclusion when y (or x) is binary. In the design of interest, the dependent variable is continuous but bounded from 0 to 1 (it is a proportion). My understanding is that the most appropriate model for rate variables is a GLM with a link function like logit (I've used this approach in a previous, unrelated project). I have not seen documentation of a 2SGLM, however.

    Does anyone have advice on how to implement an IV design with a rate outcome variable and continuous X and Z variables?


  • #2
    If your dependent variable were a 0/1 dichotomy, would ivprobit do what you want? If so, Wooldridge has noted that ivprobit could be easily modified to work with fractional response variables (see especially slide 17):

    http://www.stata.com/meeting/chicago...wooldridge.pdf

    Some of the things that Wooldridge was calling for in that presentation were implemented in Stata 14 with the -fracreg- command. But, as far as I know, there is no official Stata command for a fractional ivprobit. If you are interested, I do have a fracivp command where I hacked ivprobit to work with fractional variables, but use it at your own risk.
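
    For orientation, here is a minimal sketch of what -fracreg- covers when every regressor is treated as exogenous. The names y, x1 and x2 are placeholders, not variables from this thread, and the sketch does not address the endogenous regressor the question is about.

    ```stata
    * Placeholder names: y is a proportion in [0,1]; x1 and x2 are assumed exogenous.
    fracreg probit y x1 x2
    * Average marginal effects on the conditional mean of y
    margins, dydx(x1 x2)
    * The same quasi-likelihood fit written as a glm
    glm y x1 x2, family(binomial) link(probit) vce(robust)
    ```

    The two fits should give the same point estimates; the glm form is the one that extends naturally to adding a first-stage residual as an extra regressor.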
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thanks Richard. This sounds very promising. The description in the slides you posted seems to indicate something along the lines of a residual inclusion model for the second stage. Would you mind sharing your code? I'd really appreciate it.

      To complicate the model slightly, my design includes two endogenous regressors as well as unit and time fixed effects.
      Last edited by Austin L Wright; 25 Jun 2015, 14:14.



      • #4
        Dear Richard,

        Other than using the residuals from a first stage as an instrument, is there any other adjustment that the ivprobit command would need? Do you know of any papers that discuss the econometrics of, or justification for, using a probit model for a fractional dependent variable with these residuals?

        thank you!
        D



        • #5
          I discuss this method in Section 18.6 of my 2010 MIT Press book "Econometric Analysis of Cross Section and Panel Data." It is also a special case of Papke and Wooldridge (2008, Journal of Econometrics). A two-step method is very easy provided your endogenous explanatory variable is (roughly) continuous. You can use glm or fracprob.

          Code:
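          * first stage: regress the endogenous variable w on all exogenous variables and instruments
          reg w x1 ... xk z1 ... zm
          predict vhat, resid
          * second stage: fractional probit with the first-stage residual vhat as an added regressor
          glm y w vhat x1 ... xk, fam(bin) link(probit) vce(robust)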
          The standard errors from the glm are not correct if you decide to leave vhat in the equation. You should bootstrap the two estimation steps to get proper standard errors. As Richard pointed out several years ago, you can use ivprobit in a single step if you override the check that y is binary. It's the same problem then. I show that this is consistent in my 2014 Journal of Econometrics paper on quasi-MLE with endogenous explanatory variables.
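
          One way the bootstrap of both steps could be set up is sketched below. This is only an illustration under assumptions: the program name cf_fracprobit and the variable names y, w, x1, x2, z1, z2 are hypothetical placeholders, and the number of replications and the seed are arbitrary.

          ```stata
          * Sketch only. y, w, x1, x2, z1, z2 are placeholder names (not from the thread).
          capture program drop cf_fracprobit
          program define cf_fracprobit, eclass
              * Step 1: first stage for the endogenous regressor w
              regress w x1 x2 z1 z2
              capture drop vhat
              predict double vhat, residuals
              * Step 2: fractional probit with the first-stage residual added
              glm y w vhat x1 x2, family(binomial) link(probit)
          end

          * Resample the data and rerun both steps in every replication
          bootstrap _b, reps(500) seed(12345): cf_fracprobit
          ```

          Wrapping both steps in one program means the first-stage residual is re-estimated on each bootstrap sample, which is the point of bootstrapping the two steps together. With panel or clustered data, resampling whole units through bootstrap's cluster() option would probably be the natural choice.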



          • #6
            Thank you very much. One thing that I am not clear about is how to include a second endogenous variable in the control function. If x and x^2 are endogenous, I want to have two instruments for my two endogenous variables: vhat and vhat^2. But would including these together in the glm equation be problematic because they are correlated?

            thank you very much!

            D



            • #7
              I think you did not understand the procedure at all.

              Residual inclusion (the control function approach) is conceptually different from instrumental variables estimation, and it does not work the same way.

              Professor Wooldridge literally told you how you should do it in #5 above.

              Here is one more thread that you might find useful: https://www.statalist.org/forums/for...ch-with-probit
              and one more thread that you might find useful: https://www.statalist.org/forums/for...quadratic-term

              The second thread above explicitly speaks of dealing with a quadratic term.

              Originally posted by Donovan Pollack:
              Thank you very much. One thing that I am not clear about is how to include a second endogenous variable in the control function. If x and x^2 are endogenous, I want to have two instruments for my two endogenous variables: vhat and vhat^2. But would including these together in the glm equation be problematic because they are correlated?

              thank you very much!

              D



              • #8
                Dear Joro,

                I understand they are different, but Prof. Wooldridge explained that the ivprobit command, which takes instruments, can do this in one step. If I have x and x^2, it is not clear how to handle this with the ivprobit command. Does this make sense?

                thank you,
                D



                • #9
                  Just to clarify/contrast these procedures:

                  (1) glm
                  glm y x x^2 vhat $controls, fam(bin) link(logit) vce(robust)
                  then bootstrap

                  vs
                  (2) ivprobit
                  ivprobit y $controls, (x x^2 = iv) but this is not identified
                  ivprobit y $controls, (x x^2 = iv iv2)

                  is the glm option correct with two endogenous regressors? Or is this better with ivprobit?

                  thanks
                  best,
                  D



                  • #10
                    I have seen some people on here suggest, incorrectly I believe, that the ivprobit command should be: ivprobit y $controls, (x x^2 = vhat vhat2). I thought this was not how the twostep function works; we should use the instruments, not the residuals themselves, in this command, right?



                    • #11
                      Donovan: You can trick Stata into doing this by specifying x as endogenous and x^2 as an exogenous variable. Once the control function has been included for x all functions of x are exogenous.

                      While you don't need to include vhat^2 in the two-step control function procedure, you can use it to make the functional form more flexible. I recommend this sort of flexibility in my 2015 Journal of Human Resources paper.



                      • #12
                        The following all make sense. You have to bootstrap the two-step procedures, so vce(robust) is only useful for testing the coefficient on vhat, or on both vhat and vhat^2. For ivprobit, you have to override the data check or it will treat y as binary.

                        Code:
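                        * two-step: control function enters linearly
                        glm y x c.x#c.x vhat $controls, fam(bin) link(logit) vce(robust)
                        * two-step: control function enters with a quadratic term as well
                        glm y x c.x#c.x vhat c.vhat#c.vhat $controls, fam(bin) link(logit) vce(robust)
                        * one-step: x is endogenous, x^2 is listed as exogenous
                        ivprobit y $controls c.x#c.x (x = iv)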



                          • #14
                            Dear Jeff Wooldridge, I just tried your suggestion in #12 and obtained very different results.

                            The key difference between the three models that you suggest is that with the ivprobit command
                            Code:
                            ivprobit y $controls c.x#c.x (x = iv)
                            the term c.x#c.x is included in the first stage.

                            Would this actually be the right thing to do?
                            Last edited by Lukas Lang; 08 Jun 2022, 17:51. Reason: correcting typos
                            ------
                            I use Stata 17



                            • #15
                              Hello Jeff Wooldridge ,

                              I need your help with the following.

                              I am running an ivprobit command for a continuous endogenous variable (D) that has a curvilinear effect. I instrumented both the linear and quadratic terms of D with two exogenous variables, BSize (continuous) and FC (binary).

                              gen D2= D^2

                              ivprobit SOF ControlVars (D D2= BSize FC) i.Year i.industry, vce(cluster Tickernum)

                              A summary of the long output is presented below:

                                                  1st Stage     1st Stage     2nd Stage
                                                  D             D2            SOF
                              BSize               0.278*        3.262
                                                  (0.161)       (8.110)
                              FC                  6.750***      352.924***
                                                  (1.463)       (88.784)
                              D                                               0.186**
                                                                              (0.075)
                              D2                                              -0.004**
                                                                              (0.002)
                              Constant            6.109**       53.321        -6.076***
                                                  (2.542)       (129.891)     (0.933)
                              Year controls       Yes           Yes           Yes
                              Industry controls   Yes           Yes           Yes
                              Control Vars        Yes           Yes           Yes
                              # Observations      2124          2124          2124

                              /athrho2_1: corr(e.D2,e.D)      -0.023     (0.370)
                              /athrho3_1: corr(e.D,e.SOF)     0.185      (0.394)
                              /athrho3_2: corr(e.D2,e.SOF)    1.740***   (0.034)
                              /lnsigma2: SD(D)                2.159***   (0.026)
                              /lnsigma3: SD(D2)               6.129***   (0.041)
                              Wald χ2                         269.48***
                              Wald test of exogeneity χ2(1): 6.88; Prob > χ2 = 0.0320

                              I estimated the marginal effects using this command:

                              margins, dydx(D D2) atmeans

                              The reported marginal effects for D and D2 are 0.186** and -0.004**, respectively.

                              But when I tried to verify the inverted U-shape using utest, Stata reported that D was not found, with error r(111).

                              My question is: can I use utest after the ivprobit command? If not, how can I verify the inverted U-shape here?

                              Kind Regards

                              Lyle

