Non-linear IV: Dependent count variable and binary endogenous variable

Rainer Widmann

Join Date: Feb 2020

Posts: 2
#1

17 Feb 2020, 07:33

Hello,

In my research project, the dependent variable is a (highly dispersed) count variable, I have many covariates, and there is one key endogenous variable that is binary. I was hoping that someone with experience with this type of model could help me out.

I have researched potential solutions, in particular Wooldridge 2014 ("Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables") and Wooldridge 2015 ("Control Function Methods in Applied Econometrics"). I also read the very helpful thread
https://www.statalist.org/forums/for...ative-binomial

Overall, there appears to be no "silver bullet" solution. At the end of the day, all models are incorrect, but I am trying to do the best that I can and find the ones that appear more sensible.

What I have done so far is:

- Winsorize all count variables (to reduce dispersion) and simply run 2SLS.

- Run the user-written Stata command "ivpois", which assumes an exponential conditional mean. However, since my standard errors are clustered, I have to bootstrap, which is taking an awfully long time. The fact that the endogenous variable is binary is no issue here, correct?

- Control function approach: estimate the first stage by OLS and include its residuals in a second stage that is either Poisson or negative binomial. Again, I use bootstrapped clustered standard errors for inference. If I would like to present results from the negative binomial model, is this the best that I can do? Since I have a binary endogenous variable, this approach might be, strictly speaking, wrong (but then again, all models have issues).

- I have started to think about making restrictive assumptions on the structural errors of the outcome equation and of the equation for the endogenous variable, to arrive at a log-likelihood function that I can maximize. However, the Poisson assumption on the count variable (maybe I should use a different assumption here?) makes it difficult to arrive at an analytic expression for the likelihood. Do you have recommendations on where to look? In the worst case, I may have to simulate or numerically integrate the probabilities. What packages would you recommend for that?

Thanks for your help!
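
The control function steps in the third bullet above can be sketched roughly as follows. This is only an illustration under assumptions: the variable names (y, d, x1, x2, z1, id) and the program name cf_pois are hypothetical and do not come from the post. Because the first-stage residual is itself estimated, both stages are re-run inside each bootstrap replication, with whole clusters resampled.

```stata
* Hypothetical names: y = count outcome, d = binary endogenous regressor,
* x1 x2 = exogenous covariates, z1 = excluded instrument, id = cluster variable.
capture program drop cf_pois
program define cf_pois, rclass
    capture drop vhat                  // remove residual from a previous replication
    regress d x1 x2 z1                 // first stage by OLS (linear probability model)
    predict double vhat, residuals     // first-stage residual = control function
    poisson y d x1 x2 vhat             // second stage with the residual added
    return scalar b_d    = _b[d]       // coefficient on the endogenous regressor
    return scalar b_vhat = _b[vhat]    // coefficient on the control function
end

* Resample clusters and repeat both stages in every replication
bootstrap b_d = r(b_d) b_vhat = r(b_vhat), reps(200) cluster(id) seed(12345): cf_pois
```

The bootstrap interval for b_vhat gives a rough check on whether d is endogenous at all. As the post itself notes, an OLS first stage for a binary d is at best an approximation.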
Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#2

17 Feb 2020, 08:42

Dear Rainer Widmann,

The case you are considering is exactly the case we studied in

Windmeijer, F.A.G. and Santos Silva, J.M.C. (1997), Estimation of Count Data Models with Endogenous regressors; An Application to Demand for Health Care, Journal of Applied Econometrics, 12(3), pp. 281-294.

The estimators discussed in the paper are available in Stata: type "help ivpoisson".

Best wishes,

Joao
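
As a minimal sketch of what a call to the GMM estimator might look like with a binary endogenous regressor (the variable names y, d, x1, x2, z1, z2 and id are hypothetical, and the choice of the multiplicative error form is an assumption made for illustration):

```stata
* Hypothetical names: y = count outcome, d = binary endogenous regressor,
* x1 x2 = exogenous covariates, z1 z2 = excluded instruments, id = cluster variable.
* multiplicative: error enters as y = exp(x'b)*e; omit the option for the additive form.
ivpoisson gmm y x1 x2 (d = z1 z2), multiplicative vce(cluster id)

* With more instruments than endogenous regressors, test the overidentifying restrictions
estat overid
```

Since vce(cluster id) is accepted directly, this route does not require bootstrapping to obtain clustered standard errors.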
Rainer Widmann

Join Date: Feb 2020

Posts: 2
#3

17 Feb 2020, 09:23

Many thanks!

As far as I understand, your suggested solution is discussed on page 286 and then on page 290. You suggest using the predicted "first-stage" propensities as instruments in ivpoisson gmm. What confuses me, though, is that the help file for "ivpoisson" states that it is for a "Poisson model with continuous endogenous covariates".

Is this limitation simply not applicable in this particular case?
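
The procedure described in this post, using fitted first-stage propensities as the instrument, might be sketched as below. The variable names are hypothetical, and the probit first stage is an assumption; the post does not say which binary model is used.

```stata
* Hypothetical names: y = count outcome, d = binary endogenous regressor,
* x1 x2 = exogenous covariates, z1 z2 = excluded instruments, id = cluster variable.
probit d x1 x2 z1 z2                   // binary first stage (probit is an assumption)
predict double phat, pr                // fitted propensity Pr(d = 1 | x, z)

* Use the fitted propensity as the single excluded instrument for d
ivpoisson gmm y x1 x2 (d = phat), multiplicative vce(cluster id)
```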
Joerg Luedicke (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 117
#4

17 Feb 2020, 12:47

You could also have a look at Stata's etpoisson command for exponential mean models with binary endogenous covariates:

Code:

help etpoisson

Best,
Joerg
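
For orientation, a call to etpoisson might look like the sketch below. The variable names are hypothetical. Note that the binary endogenous variable appears only inside treat(), not in the main list of covariates.

```stata
* Hypothetical names: y = count outcome, d = binary endogenous treatment,
* x1 x2 = exogenous covariates, z1 z2 = excluded instruments, id = cluster variable.
etpoisson y x1 x2, treat(d = x1 x2 z1 z2) vce(cluster id)
```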
Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#5

17 Feb 2020, 13:10

Dear Rainer Widmann,

I am not sure why the help file says that; I believe you can safely ignore that. As Joerg Luedicke (StataCorp) noted, etpoisson is an alternative, but it relies on very strong distributional assumptions.

Best wishes,

Joao
Victoria Yan

Join Date: May 2020

Posts: 3
#6

28 May 2020, 12:55

Dear Rainer Widmann,

I have the same question. Have you decided to use ivpoisson for your case? Or did you find a better way to do it?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#7

28 May 2020, 22:41

Victoria: Despite its name, ivpoisson with the gmm option imposes the fewest assumptions. It can be applied to any nonnegative outcome and any kind of endogenous variable. You simply need enough good instruments.

I keep hoping Stata will change the name to ivexponential.
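
To see why no Poisson assumption is involved, it may help to write out the moment conditions that the GMM estimator is built on. The notation here is an assumption for illustration: y_i is the outcome, x_i collects all regressors (including the endogenous one), z_i collects the exogenous regressors and the excluded instruments, and beta is the coefficient vector.

```latex
% Additive errors: y_i = \exp(x_i'\beta) + \varepsilon_i
\mathrm{E}\!\left[ z_i \left( y_i - \exp(x_i'\beta) \right) \right] = 0

% Multiplicative errors: y_i = \exp(x_i'\beta)\,\varepsilon_i
\mathrm{E}\!\left[ z_i \left( \frac{y_i}{\exp(x_i'\beta)} - 1 \right) \right] = 0
```

Neither condition refers to the distribution of y_i or to the type of the endogenous regressor. They only involve the exponential mean and the validity of the instruments.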
Victoria Yan

Join Date: May 2020

Posts: 3
#8

29 Jun 2020, 11:55

Dear Dr. Wooldridge:

(It's actually Zhenzhen here. I am glad to hear from you via different channels.)

Thank you for your response. I have three further questions:

1) How can I apply ivpoisson to panel data? Should I bootstrap the standard errors, or would simply using vce(cluster id) do the trick?

2) Does it work well with an unbalanced panel?

3) I know we should probably always add time variables, right? However, every time I do, the regression takes much longer to run and often fails to converge. Is that because the software takes those time variables into consideration when finding the best instruments?

I read your comments on the better name of this command and I totally agree with you. ivpoisson is a bit confusing.