  • Non-linear IV: Dependent count variable and binary endogenous variable

    Hello,

    In my research project, the dependent variable is a (highly dispersed) count variable, there are many covariates, and the one key endogenous variable is binary. I am hoping that someone with experience with this type of model can help me out.

    I have researched potential solutions, in particular Wooldridge 2014 ("Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables") and Wooldridge 2015 ("Control Function Methods in Applied Econometrics"). I also read this very helpful thread:
    https://www.statalist.org/forums/for...ative-binomial

    Overall, there appears to be no "silver bullet" solution. All models are incorrect in the end, but I am trying to do the best I can and find the ones that appear most sensible.

    What I have done so far is:

    - Winsorize all count variables (to reduce dispersion) and simply run 2SLS.

    - Run the user-written Stata command "ivpois", which assumes an exponential conditional mean. However, since my standard errors are clustered, I have to bootstrap, which takes an awfully long time. The fact that the endogenous variable is binary is not an issue here, correct?

    - Control function approach: estimate the first stage by OLS and include its residuals as an additional regressor in a second stage that is either Poisson or negative binomial. Again, I bootstrap clustered standard errors for inference. If I would like to present results from the negative binomial model, is this the best that I can do? Since the endogenous variable is binary, this approach might be, strictly speaking, wrong (but then again, all models have issues).

    - I have started to think about making restrictive assumptions on the structural errors in the outcome equation and in the equation for the endogenous variable, so as to arrive at a log-likelihood function that I can maximize. However, the Poisson assumption on the count variable (maybe I should use a different assumption here?) makes it difficult to arrive at an analytic expression for the likelihood. Do you have recommendations on where to look? In the worst case, I may have to simulate or numerically integrate the probabilities. Which packages would you recommend for that?
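
    For concreteness, the control function step described above can be sketched roughly as follows. All names are placeholders that do not appear in this thread: y is the count outcome, d the binary endogenous variable, x1 and x2 covariates, z1 the excluded instrument, id the cluster identifier, and cf_pois a hypothetical wrapper program. Both stages sit inside one program so that each bootstrap replication re-estimates the first stage:
    Code:
    capture program drop cf_pois
    program define cf_pois, eclass
        * first stage by OLS (a linear probability model, since d is binary)
        capture drop vhat
        regress d z1 x1 x2
        predict double vhat, residuals
        * second stage: Poisson with the first-stage residual as a regressor
        poisson y d x1 x2 vhat
    end

    * cluster bootstrap: resample whole clusters, rerun both stages each time
    bootstrap _b, reps(200) cluster(id) seed(12345): cf_pois
    Replacing poisson with nbreg in the second stage would give the negative binomial variant. The number of replications and the seed are arbitrary choices for illustration.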

    Thanks for your help!


  • #2
    Dear Rainer Widmann,

    The case you are considering is exactly the case we studied in

    Windmeijer, F.A.G. and Santos Silva, J.M.C. (1997), Estimation of Count Data Models with Endogenous Regressors; An Application to Demand for Health Care, Journal of Applied Econometrics, 12(3), pp. 281-294.

    The estimators discussed in the paper are available in Stata: type "help ivpoisson".
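
    As a rough illustration of the syntax, assuming placeholder names that are not from this thread (y for the count outcome, d for the endogenous regressor, x1 and x2 for exogenous covariates, z1 for the excluded instrument, id for the cluster identifier), a call might look like this:
    Code:
    * GMM with multiplicative errors and cluster-robust standard errors
    ivpoisson gmm y x1 x2 (d = z1), multiplicative vce(cluster id)

    * same model with additive errors, which is the default error form
    ivpoisson gmm y x1 x2 (d = z1), vce(cluster id)
    Because vce(cluster id) is available directly, this route should not require bootstrapping to obtain clustered standard errors.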

    Best wishes,

    Joao



    • #3
      Many thanks!

      As far as I understand, your suggested solution is discussed on pages 286 and 290: you suggest using the predicted "first-stage" propensities as instruments in ivpoisson gmm. What confuses me, though, is that the help file for "ivpoisson" states that it is for a "Poisson model with continuous endogenous covariates".

      Is this limitation simply not applicable in this particular case?
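
      To spell out the idea of using predicted propensities as instruments, here is a minimal sketch. The variable names (y, d, x1, x2, z1, id) and the use of a probit for the first stage are assumptions made for illustration; they are not taken from the paper or from this thread:
      Code:
      * binary first stage: propensity of d given the instrument and covariates
      probit d z1 x1 x2
      predict double phat, pr

      * use the fitted propensity as the excluded instrument for d
      ivpoisson gmm y x1 x2 (d = phat), multiplicative vce(cluster id)
      The choice between multiplicative and additive errors is left open here; the multiplicative option is shown only as one possibility.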



      • #4
        You could also have a look at Stata's etpoisson command for exponential mean models with binary endogenous covariates:
        Code:
        help etpoisson
        Best,
        Joerg



        • #5
          Dear Rainer Widmann,

          I am not sure why the help file says that; I believe you can safely ignore that. As Joerg Luedicke (StataCorp) noted, etpoisson is an alternative, but it relies on very strong distributional assumptions.
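
          For comparison, an etpoisson call might look like the sketch below. The variable names (y, d, x1, x2, z1, id) are placeholders that do not appear in this thread. As far as I understand the Stata documentation, the command models the binary treatment with a probit-type equation and assumes joint normality of the unobservables in the two equations, which is the kind of strong distributional assumption referred to above:
          Code:
          * outcome equation: y on x1 x2 plus the binary treatment d
          * treatment equation: d on the instrument z1 and the covariates
          etpoisson y x1 x2, treat(d = z1 x1 x2) vce(cluster id)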

          Best wishes,

          Joao



          • #6
            Dear Rainer Widmann,

            I have the same question. Have you decided to use ivpoisson for your case? Or did you find a better way to do it?



            • #7
              Victoria: Despite its name, ivpoisson with the gmm option imposes the fewest assumptions. It can be applied to any nonnegative outcome and any kind of endogenous variable. You simply need enough good instruments.

              I keep hoping Stata will change the name to ivexponential.



              • #8
                Dear Dr. Wooldridge:

                (It's actually Zhenzhen here. I am glad to hear from you via different channels.)

                Thank you for your response. I have three further questions:

                1) How can I apply ivpoisson to panel data? Should I bootstrap the standard errors, or would simply using vce(cluster id) do the trick?

                2) Does it work well for an unbalanced panel?

                3) I know we should probably always include time dummies, right? However, every time I do, the regression takes much longer to run, and in many cases the estimation does not converge. Is that because the software takes those time variables into consideration when finding the best instruments?

                I read your comment about a better name for this command and I totally agree with you. The name ivpoisson is a bit confusing.
