Endogeneity issue-Negative Binomial

NOURHENE BEN YOUSSEF

Join Date: Aug 2015

Posts: 13
#1


31 Aug 2015, 23:42

I have a dataset with a count as the dependent variable. This variable suffers from overdispersion, which is why I need to use negative binomial regression. Since I have an endogenous variable, I need to estimate its coefficient using an instrumental variable. Is there a way/program/code that estimates a negative binomial model with an instrument? I am using Stata version 12. I have found that IVPOIS handles estimation with instruments for the Poisson model. Could you help me modify the original IVPOIS code for negative binomial models? Thanks.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2152
#2

01 Sep 2015, 05:39

It's tricky to incorporate endogenous variables if you insist on using the negative binomial distribution. If the endogenous explanatory variable (EEV) is continuous, then you can use a control function (CF) approach, but the assumptions are somewhat restrictive.

I would strongly recommend trying IVPOIS, too. Regrettably, this command name is a misnomer. It should be called something like IVEXPON, as it works for any exponential model with a multiplicative error. (Note to the Stata folks: I would think seriously about changing the name of the command, or at least having an alternative name that more accurately describes its scope.) It does not care whether the EEV is continuous or discrete, so it produces consistent estimators under much weaker assumptions. At a minimum, compare it with any other solution you use.
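To make "exponential model with a multiplicative error" concrete, here is a sketch (in our own notation, not taken from the IVPOIS documentation) of the kind of moment condition such an estimator solves. Here x = (z1, y2) are the regressors in the y1 equation and z = (z1, z2) are all exogenous variables, including the instrument z2:

```latex
\begin{aligned}
y_1 &= \exp(\mathbf{x}\boldsymbol{\beta})\, r, \qquad \mathrm{E}(r \mid \mathbf{z}) = 1 \\
&\Longrightarrow\; \mathrm{E}\!\left[\mathbf{z}'\left(y_1 \exp(-\mathbf{x}\boldsymbol{\beta}) - 1\right)\right] = \mathbf{0}
\end{aligned}
```

Nothing in this condition restricts the distribution of y1 (Poisson, NegBin, or otherwise) or requires y2 to be continuous, which is the sense in which the assumptions are weaker.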

What kind of EEV do you have?

JW

Last edited by Jeff Wooldridge; 01 Sep 2015, 05:45.
NOURHENE BEN YOUSSEF

Join Date: Aug 2015

Posts: 13
#3

01 Sep 2015, 11:27

Thank you for your prompt reply. That helps a lot.

The dependent variable is a number of days, which is a count variable. The endogenous explanatory variable (EEV) and the instrumental variable are both continuous. As you suggested, IVPOIS can be used as a sensitivity test, but what do you mean by "much weaker assumptions"? I will also try the control function approach. Do you know the command/code in Stata to implement the CF method?

By the way, do you know how to test the strength of the instrumental variable?

Thanks.

NBY
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2152
#4

02 Sep 2015, 09:14

1. If y2 is your EEV, you have to essentially assume it has a linear reduced form with an error independent of the exogenous variables (rather than just uncorrelated, or even mean independent). If you write

y2 = z*d2 + v2

so that v2 is the reduced form error, then v2 is independent of z. That is a pretty strong assumption, even when y2 is continuous.

2. Then, you have to assume that

y1 given z1, y2, and v2

has a negative binomial distribution with exponential mean, which is also strong.

IVPOIS requires neither of these assumptions.
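Points 1 and 2 can be written compactly as follows. This is a sketch in the notation of this post, with z = (z1, z2); the labels d1, alpha1, and rho1 for the second-stage parameters are ours:

```latex
\begin{aligned}
&y_2 = \mathbf{z}\mathbf{d}_2 + v_2, \qquad v_2 \text{ independent of } \mathbf{z} \\
&y_1 \mid \mathbf{z}_1, y_2, v_2 \sim \text{NegBin}, \qquad
\mathrm{E}(y_1 \mid \mathbf{z}_1, y_2, v_2) = \exp(\mathbf{z}_1\mathbf{d}_1 + \alpha_1 y_2 + \rho_1 v_2)
\end{aligned}
```

Replacing v2 with the OLS residual v2^ gives the second stage, and a test of H0: rho1 = 0 is a test of whether y2 is exogenous.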

Let me be clear: I think the CF approach is a good way to go. Just use OLS in the first stage, get the residuals v2^, and insert them into the NegBin second stage. You will want to bootstrap the standard errors if the coefficient on v2^ is significant (which is evidence of endogeneity).

Code:

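regress y2 z1 z2 ... zK ... zM     // first stage: all exogenous variables, incl. the instruments beyond zK
predict v2hat, residuals           // first-stage residuals (the control function)
nbreg y1 z1 z2 ... zK y2 v2hat     // second stage: NegBin for y1 with v2hat added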
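Because v2hat is itself estimated, the usual nbreg standard errors are too small when there is endogeneity. A minimal sketch of bootstrapping both steps together, assuming one exogenous regressor z1, one instrument z2, and simulated data. The variable names, coefficient values, program name cf_nbreg, and reps(200) are illustrative choices, not from the post:

```stata
* Simulated data (illustrative): z1 exogenous, z2 instrument, y2 continuous EEV
clear
set seed 20150902
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()
generate v2 = rnormal()                  // reduced-form error
generate y2 = 0.5*z1 + z2 + v2
* y1 depends on v2 through the unobservable, so y2 is endogenous
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.4*y2 + 0.5*v2 + rnormal(0, 0.5)))

* Both CF steps live in one program so that each bootstrap draw redoes the first stage
capture program drop cf_nbreg
program define cf_nbreg, eclass
    capture drop v2hat
    regress y2 z1 z2                     // first stage (linear reduced form)
    predict double v2hat, residuals      // control function
    nbreg y1 z1 y2 v2hat                 // second stage: NegBin with v2hat added
end

* Bootstrap standard errors that account for v2hat being estimated
bootstrap _b, reps(200) seed(12345): cf_nbreg
```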

Incidentally, I would even prefer

Code:

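regress y2 z1 z2 ... zK ... zM                   // first stage: all exogenous variables, incl. the instruments beyond zK
predict v2hat, residuals                         // first-stage residuals (the control function)
poisson y1 z1 z2 ... zK y2 v2hat, vce(robust)    // second stage: Poisson QMLE with robust SEs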

because then assumption (2) is not needed. And, no, the Poisson assumption is not needed either. That's why I prefer Poisson regression to NegBin unless you want to actually estimate probabilities.
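A sketch of the Poisson variant on simulated data (variable names and coefficient values are illustrative). Under the null that y2 is exogenous, the robust t statistic on v2hat is a valid test without bootstrapping; if it is significant, bootstrap both steps as in the NegBin case. In Stata 13 and later, the official ivpoisson cfunction command implements a control function estimator of this kind with standard errors that account for the first stage; it is commented out here because the original poster uses Stata 12.

```stata
* Simulated data (illustrative): z1 exogenous, z2 instrument, y2 continuous EEV
clear
set seed 20150903
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()
generate v2 = rnormal()
generate y2 = 0.5*z1 + z2 + v2
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.4*y2 + 0.5*v2 + rnormal(0, 0.5)))

* Step 1: linear first stage and its residuals
regress y2 z1 z2
predict double v2hat, residuals

* Step 2: Poisson QMLE with robust SEs; no Poisson or NegBin distribution is assumed
poisson y1 z1 y2 v2hat, vce(robust)

* Robust test of H0: y2 is exogenous (coefficient on v2hat equals zero)
test v2hat

* Stata 13 or later only: built-in control function estimator
* ivpoisson cfunction y1 z1 (y2 = z2)
```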
Johann Salgado

Join Date: Oct 2015

Posts: 2
#5

05 Oct 2015, 20:45

Hello! I have two questions. I am trying to develop my thesis using an IVPOIS model, but I have two problems: 1. There are many observations with zero in the data. 2. The dependent variable suffers from overdispersion. I am not sure whether this methodology is correct for my work.

The second question is how to show that the instrument is relevant for the endogenous independent variable, something like the first stage of a two-stage model (ivreg), because the IVPOIS command does not allow the first option, since it does not work that way.

Thank you !
NOURHENE BEN YOUSSEF

Join Date: Aug 2015

Posts: 13
#6

06 Oct 2015, 10:42

1) To test your model, you need to use negative binomial regression, because your dependent variable (DISCLAG) is a count variable that is overdispersed and does not have an excessive number of zeros.
Moreover, with negative binomial regression you might find scaled deviance and Pearson chi-square values close to 1, which indicates adequate model fit, compared with the Poisson model, whose scaled deviance and Pearson chi-square values are high (perhaps around 100). Therefore, you can use negative binomial regression.
See the paper: Schmidt and Wilkins (2013), "Bringing Darkness to Light: The Influence of Auditor Quality and Audit Committee Expertise on the Timeliness of Financial Statement Restatement Disclosures," Auditing: A Journal of Practice & Theory, Vol. 32, No. 1.

2) To control for endogeneity, you need to run two-stage least squares:
EEV = f(independent variables, control variables, instrumental variable)
Dependent variable = f(estimated EEV, independent variables other than the EEV, control variables)
Johann Salgado

Join Date: Oct 2015

Posts: 2
#7

06 Oct 2015, 17:25

My last question is about interpreting the results of the ivpois command. Are the coefficients interpreted as rate ratios (e^Beta), or is the interpretation different because of the multiplicative errors?
Thanks, and sorry for my bad English.
Arian G.

Join Date: Oct 2015

Posts: 8
#8

06 Oct 2015, 18:59

Originally posted by Jeff Wooldridge

Let me be clear: I think the CF approach is a good way to go. Just use OLS on the first stage, get v2^, and insert into the NegBin the second stage. You will want to bootstrap the standard errors if the coefficient on v2^ is significant (so evidence of endogeneity).

Thanks a lot for your input.

Would this also work for a GEE negative binomial model, which is a negative binomial model with a PA option?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2152
#9

06 Oct 2015, 19:14

Yes, Arian, it would. You just have to use the panel bootstrap, because I assume you have panel data, right? So bootstrap both steps, but resample entire cross-sectional units (all time periods for each unit drawn).
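A sketch of the panel (cluster) bootstrap for the CF approach with a population-averaged NegBin second stage (xtnbreg, pa) on a simulated balanced panel. The cluster() option resamples whole units, and idcluster() gives each drawn copy of a unit its own identifier so the resampled data can be xtset. Variable names, the program name cf_xtnb, and reps(200) are illustrative assumptions:

```stata
* Simulated balanced panel (illustrative): 300 units observed for 5 years
clear
set seed 20151006
set obs 300
generate id = _n
generate c = rnormal(0, 0.3)             // unit-level heterogeneity
expand 5
bysort id: generate year = _n
generate z1 = rnormal()
generate z2 = rnormal()                  // instrument
generate v2 = rnormal()
generate y2 = 0.5*z1 + z2 + v2           // continuous EEV
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.4*y2 + 0.5*v2 + c))

capture program drop cf_xtnb
program define cf_xtnb, eclass
    capture drop v2hat
    regress y2 z1 z2                     // first stage (pooled OLS)
    predict double v2hat, residuals
    xtset newid year                     // newid is unique for every resampled unit
    xtnbreg y1 z1 y2 v2hat, pa           // GEE (population-averaged) NegBin second stage
end

* Resample whole units: cluster(id) draws units, idcluster(newid) relabels duplicates
generate newid = id
xtset newid year
bootstrap _b, reps(200) seed(12345) cluster(id) idcluster(newid): cf_xtnb
```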
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2152
#10

06 Oct 2015, 19:18

Originally posted by NOURHENE BEN YOUSSEF

1) To test your model, you need to use negative binomial regression, because your dependent variable (DISCLAG) is a count variable that is overdispersed and does not have an excessive number of zeros.
Moreover, with negative binomial regression you might find scaled deviance and Pearson chi-square values close to 1, which indicates adequate model fit, compared with the Poisson model, whose scaled deviance and Pearson chi-square values are high (perhaps around 100). Therefore, you can use negative binomial regression.
See the paper: Schmidt and Wilkins (2013), "Bringing Darkness to Light: The Influence of Auditor Quality and Audit Committee Expertise on the Timeliness of Financial Statement Restatement Disclosures," Auditing: A Journal of Practice & Theory, Vol. 32, No. 1.

2) To control for endogeneity, you need to run two-stage least squares:
EEV = f(independent variables, control variables, instrumental variable)
Dependent variable = f(estimated EEV, independent variables other than the EEV, control variables)

I disagree with most of this -- which is why I suggested alternatives. First, the Poisson estimator is fully robust to any kind of over- or underdispersion. It's not a matter of comparing model fit: as long as the conditional mean is correctly specified, the Poisson quasi-MLE is consistent. The Negative Binomial is not. I know you might have read something else, or learned something else, but it's wrong.
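A small simulation illustrating the robustness point. The count below is overdispersed because of a lognormal multiplicative heterogeneity term, so it is neither Poisson nor NegBin, yet its conditional mean is exponential; the Poisson QMLE with robust standard errors should therefore recover the slope of 0.5. The design and seed are illustrative assumptions:

```stata
* Simulated data: exponential conditional mean with lognormal heterogeneity (mean 1)
clear
set seed 20151006
set obs 10000
generate x = rnormal()
generate g = exp(rnormal(-0.32, 0.8))        // E(g) = exp(-0.32 + 0.8^2/2) = 1
generate y = rpoisson(g*exp(0.5 + 0.5*x))    // E(y | x) = exp(0.5 + 0.5*x)

* Overdispersion: the sample variance of y is well above its mean
summarize y

* Poisson QMLE: consistent for the mean parameters, whatever the dispersion
poisson y x, vce(robust)
```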

For (2), you are committing what is called the "forbidden regression." You cannot, in most cases, simply insert fitted values for the EEV into a nonlinear function. This is exactly why I suggested either the control function approach -- adding residuals -- or IVPOIS. As I emphasized above, IVPOIS has nothing to do with the Poisson distribution. It is a method of moments procedure for an exponential mean function. It is the most robust of ALL procedures.
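A sketch of the method of moments estimator on simulated data. For Stata 12, the user-written ivpois command is installed from SSC; the syntax in the comment (regressors after the depvar, endog() for the EEV, exog() for the excluded instruments) is our reading of its help file and should be checked with help ivpois. The command actually run is the official ivpoisson gmm with the multiplicative option, which requires Stata 13 or later. Variable names and coefficient values are illustrative:

```stata
* Simulated data (illustrative): z1 exogenous, z2 instrument, y2 continuous EEV
clear
set seed 20151007
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()
generate v2 = rnormal()
generate y2 = 0.5*z1 + z2 + v2
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.4*y2 + 0.5*v2 + rnormal(0, 0.5)))

* Stata 12 (user-written, SSC); check -help ivpois- for the exact syntax
* ssc install ivpois
* ivpois y1 z1, endog(y2) exog(z2)

* Stata 13 or later: GMM for an exponential mean with a multiplicative error
ivpoisson gmm y1 z1 (y2 = z2), multiplicative
```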
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2152
#11

06 Oct 2015, 19:20

Originally posted by Johann Salgado

Hello! I have two questions. I am trying to develop my thesis using an IVPOIS model, but I have two problems: 1. There are many observations with zero in the data. 2. The dependent variable suffers from overdispersion. I am not sure whether this methodology is correct for my work.

The second question is how to show that the instrument is relevant for the endogenous independent variable, something like the first stage of a two-stage model (ivreg), because the IVPOIS command does not allow the first option, since it does not work that way.

Thank you !

Use IVPOIS or the control function method I described earlier. In fact, I prefer the control function method with the Poisson QMLE in the second stage. You should estimate a linear first stage to check for instrument relevance. This is not perfect, because IVPOIS uses nonlinear moment conditions, but it should give you a good idea of whether your IV is relevant. JW
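A sketch of the linear first-stage check on simulated data: regress the EEV on all exogenous variables, including the instrument, and test the instrument's coefficient with a heteroskedasticity-robust Wald test. Variable names and values are illustrative; with several instruments, list them all after test.

```stata
* Simulated data (illustrative)
clear
set seed 20151006
set obs 2000
generate z1 = rnormal()                  // exogenous variable in the y1 equation
generate z2 = rnormal()                  // instrument
generate y2 = 0.5*z1 + z2 + rnormal()    // continuous EEV

* Linear first stage with robust standard errors
regress y2 z1 z2, vce(robust)

* Robust Wald (F) test that the excluded instrument has no effect on y2
test z2
display "First-stage robust F statistic: " r(F)
```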
Arian G.

Join Date: Oct 2015

Posts: 8
#12

06 Oct 2015, 19:28

Originally posted by Jeff Wooldridge

Yes, Arian, it would. You just have to use the panel bootstrap because I assume you have panel data, right? So bootstrap both steps but draw the entire cross section observation.

Thank you for the quick response. How would this extend to multiple endogenous variables (say, two endogenous variables, or the linear and squared terms of the same variable)?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2152
#13

07 Oct 2015, 04:43

If you use the control function approach then you need to estimate a reduced form for each unique EEV. Putting in nonlinear functions of those EEVs requires no change: you just add the first-stage residuals to the second stage GEE. Now, you might want to include more functions of the residuals, such as squares and cross products. This makes the CF approach more flexible.
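A sketch with two EEVs (y2 and y3) and a squared term in y2, using a Poisson QMLE second stage on simulated cross-sectional data. Each unique EEV gets its own reduced form, and the squares and cross product of the residuals are the optional extra terms. In the GEE case, the second-stage command would be replaced by the GEE and the bootstrap by the cluster bootstrap. Names, coefficients, the program name cf_multi, and reps(200) are illustrative assumptions:

```stata
* Simulated data (illustrative): z1 exogenous, z2 and z3 instruments, y2 and y3 EEVs
clear
set seed 20151007
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()
generate z3 = rnormal()
generate v2 = rnormal()
generate v3 = rnormal()
generate y2 = 0.5*z1 + z2 + 0.3*z3 + v2
generate y3 = 0.2*z1 + 0.3*z2 + z3 + v3
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.3*y2 - 0.05*y2^2 + 0.2*y3 + 0.4*v2 + 0.3*v3))

capture program drop cf_multi
program define cf_multi, eclass
    capture drop v2hat v3hat v2sq v3sq v23
    * One linear reduced form per unique EEV (y2^2 is a function of y2, so no extra one)
    regress y2 z1 z2 z3
    predict double v2hat, residuals
    regress y3 z1 z2 z3
    predict double v3hat, residuals
    * Optional extra control-function terms: squares and cross product
    generate double v2sq = v2hat^2
    generate double v3sq = v3hat^2
    generate double v23  = v2hat*v3hat
    * Second stage: exponential mean with y2, y2 squared, y3, and the CF terms
    poisson y1 z1 c.y2##c.y2 y3 v2hat v3hat v2sq v3sq v23, vce(robust)
end

* Bootstrap both steps together
bootstrap _b, reps(200) seed(12345): cf_multi
```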
Arian G.

Join Date: Oct 2015

Posts: 8
#14

07 Oct 2015, 11:03

Originally posted by Jeff Wooldridge

If you use the control function approach then you need to estimate a reduced form for each unique EEV. Putting in nonlinear functions of those EEVs requires no change: you just add the first-stage residuals to the second stage GEE. Now, you might want to include more functions of the residuals, such as squares and cross products. This makes the CF approach more flexible.

Thank you very much for your help!
NOURHENE BEN YOUSSEF

Join Date: Aug 2015

Posts: 13
#15

07 Oct 2015, 15:47

Thank you Dr. Wooldridge!