Endogeneity issue-Negative Binomial

NOURHENE BEN YOUSSEF

Join Date: Aug 2015

Posts: 13
#16

09 Oct 2015, 17:08

Hi Dr. Wooldridge,

Thank you for your help. As you suggested, I tried both approaches: CF and IVPOIS. The results using the CF approach are very strong. However, I get the following error message when I run ivpoisson:

Step 1
Iteration 0: GMM criterion Q(b) = 2.14e+165 (not concave)
missing values encountered in analytic gradient

I would appreciate any help.

Thank you.

Nourhene
Comment
Bjorn Arnarson

Join Date: Oct 2015

Posts: 8
#17

25 Oct 2015, 05:51

Originally posted by Jeff Wooldridge

Code:

reg y2 z1 z2 ... zK ... zM
predict v2hat, resid
nbreg y2 z1 z2 ... zK y2 v2hat

Incidentally, I would even prefer

Code:

reg y2 z1 z2 ... zK ... zM
predict v2hat, resid
poisson y2 z1 z2 ... zK y2 v2hat, robust

Is the inclusion of the bolded y2's a typo? I assume it should be y1. A more general question: why should y2 itself be included (on the RHS) in the second stage, rather than the instrumented y2?
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2169
#18

25 Oct 2015, 20:31

Originally posted by Bjorn Arnarson

Is the inclusion of the bolded y2's a typo? I assume it should be y1. A more general question: why should y2 itself be included (on the RHS) in the second stage, rather than the instrumented y2?

You are correct about the first instance of y2 in the "poisson" command. It should be y1. As to the second question, the control function works by including v2hat to render y2 exogenous. If by the "instrumented values" you mean the fitted values from the first stage then that is generally incorrect. You keep y2 in its original form and add v2hat. Then a robust t statistic on v2hat tests the null hypothesis that y2 is exogenous.

JW
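
A minimal sketch of the corrected two-step control function procedure described in this reply, run on simulated data so that it is self-contained. The variable names (y1, y2, z1, z2), the sample size, and the data-generating process are assumptions made for illustration; they do not come from the original posts.

```stata
* Simulated data; all names and parameter values are hypothetical
clear
set seed 12345
set obs 2000
generate z1 = rnormal()                 // included exogenous variable
generate z2 = rnormal()                 // excluded instrument
generate v2 = rnormal()                 // reduced-form error, independent of z
generate y2 = 0.5*z1 + 0.8*z2 + v2      // endogenous explanatory variable (EEV)
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.5*y2 + 0.4*v2))

* First stage: OLS reduced form for y2 on all exogenous variables
regress y2 z1 z2
predict double v2hat, residuals

* Second stage: y1 is the dependent variable; y2 enters in its original
* form together with the first-stage residual
poisson y1 z1 y2 v2hat, vce(robust)

* Robust test of the null hypothesis that y2 is exogenous
test v2hat
```

Note that the fitted values from the first stage are never used here: y2 stays on the right-hand side as observed, and v2hat is the only generated regressor.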
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2169
#19

25 Oct 2015, 20:32

Originally posted by NOURHENE BEN YOUSSEF

Hi Dr. Wooldridge,

Thank you for your help. As you suggested, I tried both approaches: CF and IVPOIS. The results using the CF approach are very strong. However, I get the following error message when I run ivpoisson:

Step 1
Iteration 0: GMM criterion Q(b) = 2.14e+165 (not concave)
missing values encountered in analytic gradient

I would appreciate any help.

Thank you.

Nourhene

Solving computational problems is not my strong suit. Two reasons I like the CF approach are that we are forced to estimate a reduced form -- thereby checking for weak instruments -- and that computation is almost never a problem. IVPOIS is a GMM procedure and can run into problems.
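
The thread does not diagnose the ivpoisson error. The sketch below only illustrates the two checks mentioned here, on simulated data: estimating the reduced form to gauge instrument strength, and then running the GMM estimator. Rescaling the EEV before GMM is an assumption on my part about one possible cause of a huge initial criterion value (a poorly scaled regressor inside an exponential mean), not a fix confirmed in this thread. All variable names are hypothetical.

```stata
* Simulated data; all names and parameter values are hypothetical
clear
set seed 2015
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()                        // excluded instrument
generate v2 = rnormal()
generate y2 = 1000*(0.5*z1 + 0.8*z2 + v2)      // deliberately large scale
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.0005*y2 + 0.4*v2))

* Reduced form: test the excluded instrument to check for weak instruments
regress y2 z1 z2, vce(robust)
test z2

* Standardize the EEV so the exponential mean is well behaved
* at the optimizer's starting values (assumed remedy, see lead-in)
summarize y2
generate double y2_std = (y2 - r(mean)) / r(sd)

* GMM estimator for an exponential mean with multiplicative error
ivpoisson gmm y1 z1 (y2_std = z2), multiplicative
```

The coefficient on y2_std is per standard deviation of y2, so it has to be divided by that standard deviation to compare it with a CF estimate on the original scale.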
Comment
Xu Xu

Join Date: Nov 2015

Posts: 2
#20

08 Nov 2015, 09:14

Originally posted by Jeff Wooldridge

Code:

reg y2 z1 z2 ... zK ... zM
predict v2hat, resid
nbreg y1 z1 z2 ... zK y2 v2hat

Hi Dr. Wooldridge,

Thank you and everyone who has contributed to this post. I find it very helpful.

I want to run some postestimation tests with the CF approach. nbreg does not support the "overid" test. Is there any other way to run overidentification tests in Stata after the second-stage regression?

Thank you!
Comment
Ashley Southcote

Join Date: Jan 2016

Posts: 1
#21

19 Jan 2016, 16:31

Originally posted by Jeff Wooldridge

1. If y2 is your EEV, you have to essentially assume it has a linear reduced form with an error independent of the exogenous variables (rather than just uncorrelated, or even mean independent). If you write

y2 = z*d2 + v2

so that v2 is the reduced form error, then v2 is independent of z. That is a pretty strong assumption, even when y2 is continuous.

2. Then, you have to assume that

y1 given z1, y2, and v2

has a negative binomial distribution with exponential mean, which is also strong.

IVPOIS requires neither of these assumptions.

Let me be clear: I think the CF approach is a good way to go. Just use OLS on the first stage, get v2^, and insert it into the NegBin in the second stage. You will want to bootstrap the standard errors if the coefficient on v2^ is significant (so evidence of endogeneity).

Code:

reg y2 z1 z2 ... zK ... zM
predict v2hat, resid
nbreg y1 z1 z2 ... zK y2 v2hat

Incidentally, I would even prefer

Code:

reg y2 z1 z2 ... zK ... zM
predict v2hat, resid
poisson y1 z1 z2 ... zK y2 v2hat, robust

because then assumption (2) is not needed. And, no, the Poisson assumption is not needed either. That's why I prefer Poisson regression to NegBin unless you want to actually estimate probabilities.

Dear Prof. Wooldridge,

thank you for your explanation. I would like to know whether the approach above can be applied to address measurement error in an explanatory variable when using nonlinear models (in my specific case, negative binomial). If not, would you be so kind as to point me to some references that I can consult?

Thank you in advance for your reply.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2169
#22

28 Jan 2016, 18:02

Originally posted by Ashley Southcote

Dear Prof. Wooldridge,

thank you for your explanation. I would like to know whether the approach above can be applied to address measurement error in an explanatory variable when using nonlinear models (in my specific case, negative binomial). If not, would you be so kind as to point me to some references that I can consult?

Thank you in advance for your reply.

Ashley: Yes, it will work for measurement error. It really is only justified if y2 is essentially continuous. Presumably you have an IV for it. When you use the CF approach with an exponential mean function, it doesn't matter what is the cause of the endogeneity.

JW
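
The bootstrap advice quoted earlier in the thread (bootstrap the standard errors when the coefficient on the first-stage residual is significant) can be sketched as follows. The point of the wrapper program is that both stages are re-estimated in every replication, so the standard errors reflect the sampling error in v2hat. The program name cf_poisson, the variable names, and the simulated data are all assumptions made for illustration.

```stata
* Simulated data; all names and parameter values are hypothetical
clear
set seed 2016
set obs 1000
generate z1 = rnormal()
generate z2 = rnormal()                 // instrument for y2
generate v2 = rnormal()
generate y2 = 0.5*z1 + 0.8*z2 + v2      // continuous EEV
generate y1 = rpoisson(exp(0.2 + 0.3*z1 + 0.5*y2 + 0.4*v2))

* Wrapper that reruns BOTH stages on whatever sample is in memory
capture program drop cf_poisson
program define cf_poisson, eclass
    capture drop v2hat
    regress y2 z1 z2
    predict double v2hat, residuals
    poisson y1 z1 y2 v2hat              // leaves its e(b) behind for bootstrap
end

* Resample observations and collect the second-stage coefficients
bootstrap _b, reps(200) seed(2016): cf_poisson
```

The number of replications is arbitrary here; more replications may be preferable in applied work.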
Comment
Arun Swami

Join Date: Mar 2016

Posts: 2
#23

02 Mar 2016, 21:48

Dear Prof. Wooldridge.

Thank you for all your clarifications on the control function approach. It really helps to estimate correct models.
May I kindly seek your guidance on my current model setup? The challenge I have is that the endogenous regressor, which is the dependent variable in the first step, is a count variable, so I am estimating a negative binomial model in the first step. However, in the second step my dependent variable is continuous. Is the following the correct way to implement the CF approach? I would also appreciate references to your work on this type of model, where the first stage is a count or otherwise non-continuous variable.

glm np12 eo amc eoamc eosq , family(nb) vce(robust)
predict res1, anscombe
glm pg1013 smc npsmc eo amc eoamc eosq np12 res1 , vce(robust)

Many thanks in advance
Arun

Last edited by Arun Swami; 02 Mar 2016, 22:18.
Comment
Arun Swami

Join Date: Mar 2016

Posts: 2
#24

23 Mar 2016, 07:24

May I kindly request someone's guidance on my above post please?
Comment
David Gomez

Join Date: Dec 2016

Posts: 1
#25

09 Dec 2016, 10:38

Originally posted by Jeff Wooldridge

It's tricky to incorporate endogenous variables if you insist on using the negative binomial distribution. If the endogenous explanatory variable is continuous then you can use a control function approach, but the assumptions are somewhat restrictive.

I would strongly recommend trying IVPOIS, too. Regrettably, this command name is a misnomer. It should be called something like IVEXPON, as it works for any exponential model with multiplicative error. (Note to the Stata folks: I would think seriously about changing the name of the command or at least having an alternative name that more accurately describes its scope.) It does not care whether the EEV is continuous or discrete, so it produces consistent estimators under much weaker assumptions. At a minimum, compare it with any other solution you use.

What kind of EEV do you have?

JW

Hi Professor Wooldridge,

I am trying to figure out how to do the empirical calculation for Terza's (1998) correction term for the second-stage. I know in the paper he mentions that the correction term is similar in nature to the inverse mills ratio. My understanding is that you need to calculate an IMR when y2=0 and y1=1. One way I am thinking about it is the following:

IMR = cond(y2 == 1,exp(-.5*phat^2)/(sqrt(2*_pi)*normprob(phat)), 1-(exp(-.5*phat^2))/(1-(sqrt(2*_pi)*normprob(phat))))

But honestly, I am not sure about it. I've been going through the literature but I haven't found a clear indication of how people empirically calculated it.

Thanks for your help!
Comment
Guest
#26

08 Dec 2017, 05:01

Originally posted by Jeff Wooldridge

1. If y2 is your EEV, you have to essentially assume it has a linear reduced form with an error independent of the exogenous variables (rather than just uncorrelated, or even mean independent). If you write

y2 = z*d2 + v2

so that v2 is the reduced form error, then v2 is independent of z. That is a pretty strong assumption, even when y2 is continuous.

2. Then, you have to assume that

y1 given z1, y2, and v2

has a negative binomial distribution with exponential mean, which is also strong.

IVPOIS requires neither of these assumptions.

Let me be clear: I think the CF approach is a good way to go. Just use OLS on the first stage, get v2^, and insert it into the NegBin in the second stage. You will want to bootstrap the standard errors if the coefficient on v2^ is significant (so evidence of endogeneity).

Code:

reg y2 z1 z2 ... zK ... zM
predict v2hat, resid
nbreg y1 z1 z2 ... zK y2 v2hat

Incidentally, I would even prefer

Code:

reg y2 z1 z2 ... zK ... zM
predict v2hat, resid
poisson y1 z1 z2 ... zK y2 v2hat, robust

because then assumption (2) is not needed. And, no, the Poisson assumption is not needed either. That's why I prefer Poisson regression to NegBin unless you want to actually estimate probabilities.

Hi Professor Wooldridge,

My EEV is a dummy and my data are cross-sectional. Will it work?
Comment
Santosh Pathak

Join Date: Jul 2019

Posts: 11
#27

17 Oct 2019, 14:27

Originally posted by Jeff Wooldridge

If you use the control function approach then you need to estimate a reduced form for each unique EEV. Putting in nonlinear functions of those EEVs requires no change: you just add the first-stage residuals to the second stage GEE. Now, you might want to include more functions of the residuals, such as squares and cross products. This makes the CF approach more flexible.

Hi Dr. Wooldridge,

I am using ivpoisson to account for endogeneity in my count data model with overdispersion. Is it okay to use ivpoisson, or should I shift to the CF approach?

Thanks in advance.

Regards,
Santosh
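
The more flexible control function idea quoted at the top of this post (one reduced form per EEV, with squares and cross products of the residuals added to the second stage) can be sketched as below. The quoted reply refers to a GEE second stage; this sketch swaps in a cross-sectional Poisson regression for simplicity. The variable names and the simulated data are assumptions made for illustration.

```stata
* Simulated data with two EEVs; all names and values are hypothetical
clear
set seed 2019
set obs 2000
generate z1 = rnormal()
generate z2 = rnormal()                 // instrument, mainly for y2
generate z3 = rnormal()                 // instrument, mainly for y3
generate v2 = rnormal()
generate v3 = rnormal()
generate y2 = 0.5*z1 + 0.8*z2 + v2
generate y3 = -0.3*z1 + 0.7*z3 + v3
generate y1 = rpoisson(exp(0.1 + 0.2*z1 + 0.3*y2 + 0.2*y3 + 0.3*v2 + 0.2*v3))

* One reduced form per EEV, each on the full set of exogenous variables
regress y2 z1 z2 z3
predict double v2hat, residuals
regress y3 z1 z2 z3
predict double v3hat, residuals

* Extra functions of the residuals make the control function more flexible
generate double v2hat_sq = v2hat^2
generate double v3hat_sq = v3hat^2
generate double v2v3     = v2hat*v3hat

* Second stage with the EEVs in original form plus all residual terms
poisson y1 z1 y2 y3 v2hat v3hat v2hat_sq v3hat_sq v2v3, vce(robust)

* Joint robust test of exogeneity of y2 and y3
test v2hat v3hat v2hat_sq v3hat_sq v2v3
```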
Comment
Tom Kisters

Join Date: May 2020

Posts: 48
#28

15 Apr 2021, 09:52

Jeff Wooldridge

Dear Professor Wooldridge (or of course anyone else),

I have a dependent variable with a proportion/percentage interpretation (for which fractional regression would be most suitable). As a result, I wanted to use a control function approach to deal with my EEV.
The problem is that my EEV is ordinal, and oprobit/ologit does not produce any residuals.

You mention that ivpois requires very few assumptions. Is the ivpois command in any way suitable for such a regression?

If not, is there any other strategy I can consider?

I have been trying to go at this from every angle. For a more detailed description please see the link below:

Background: https://stats.stackexchange.com/ques...n-2sri-with-an

EDIT: I have decided to turn this into a new post: https://www.statalist.org/forums/for...inomial-family.

Last edited by Tom Kisters; 15 Apr 2021, 09:57.
Comment
Rocio Aguilar

Join Date: Jan 2016

Posts: 53
#29

07 Dec 2022, 06:54

Good evening,
I have a similar problem but my dependent variable is not a count:

HTML Code:

     | DEDICA~L |
     |----------|
  1. |    49.08 |
  2. |     8.56 |
  3. |    23.87 |
  4. |    66.51 |
  5. |      .03 |
     |----------|
  6. |        0 |
  7. |        0 |
  8. |      100 |
  9. |    99.97 |
 10. |        0 |
     +----------+

The distribution of my dependent variable is shown in the graph.
To top it all off, I have two endogenous variables.
Any ideas, please?
Thanks
Attached Files

Last edited by Rocio Aguilar; 07 Dec 2022, 06:59.
Comment
