How to handle endogenous count variable in impact study

Abdallah Alidu

Join Date: May 2016

Posts: 52
#1

26 Jan 2017, 08:44

Hi,
First of all, let me apologize for the "bump posting" I did earlier. I am new to the program and ask for your forgiveness.
I am conducting a study entitled "The impact of agricultural technology adoption on farm output in northern Ghana" using cross-sectional data. My main objective is to determine the impact of agricultural technology adoption on output, but my endogenous variable is a count (i.e., the number of agricultural technologies adopted). My main challenge is how to correct for the endogeneity. Can I use Poisson or any other count model in the first stage to predict the probabilities of adoption for the second stage?
Any help would be appreciated.
Thank you.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

27 Jan 2017, 10:12

It is best to use an estimator that is built directly for your problem. I think you can do this directly in GSEM or the user-written CMP (I am not sure how CMP handles count variables).

You can probably use a count model for the first stage (being sure to include some exogenous variables that do not appear in the second stage) and then a 2SLS estimator or some other estimator that takes care of endogeneity at the second stage. However, I'd think this gives you correct estimates of the standard errors.
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#3

27 Jan 2017, 22:21

GSEM is the way to go. It seems you have hierarchical data, which GSEM will be able to fit. Just thinking: is there any particular reason to treat '# of technologies adopted' as endogenous? Can't it be just another independent variable? In that case you can skip GSEM and fit ordinary single-level or multilevel regression models.

Roman
Abdallah Alidu

Join Date: May 2016

Posts: 52
#4

28 Jan 2017, 02:55

Many thanks to Phil and Roman!!
Dear Roman, the outcome model contains farm size as an independent variable, and I suspect it might be correlated with the number of agricultural technologies adopted, especially when the two are both included in the outcome model as independent variables. Another variable that might be correlated with both adoption and output is labor. That is why I am treating the adoption variable as endogenous. Any more help would be appreciated.
Thanks!
Abdallah.
Roman Mostazir

Join Date: Apr 2014

Posts: 874
#5

28 Jan 2017, 09:27

Yes, that makes sense. As suggested above, try GSEM if you have multilevel data. Stata has rich documentation on sem and gsem; type help sem. Alan Acock also has a great book on SEM, "Discovering Structural Equation Modeling Using Stata".

Roman
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2156
#6

28 Jan 2017, 20:24

Abdallah: If y1 is the left hand side variable and y2 is the right hand side endogenous explanatory variable, is it y2 that's endogenous? If so, what is the nature of y1? Is it continuous?
Abdallah Alidu

Join Date: May 2016

Posts: 52
#7

29 Jan 2017, 04:02

Thanks Roman!
Dear Jeff,
Y1 is the left-hand-side variable and is continuous; Y2 is the right-hand-side endogenous explanatory variable and is a count.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2156
#8

29 Jan 2017, 10:04

Then you don't need to do anything fancy. In fact, the fancy things are much less robust because they'll depend on extra assumptions. Since you are using a linear model for Y1 (I presume), you don't need to do anything other than 2SLS. Read a careful treatment of 2SLS and you will notice that nothing is assumed about the nature of Y2. It can be continuous, discrete, or mixed. The first stage, or reduced form, of Y2 is just a linear projection. So Y2 can be continuous, binary, or a count.
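
To make the linear-projection point concrete, the setup can be written out as follows. This is only a sketch: the coefficient symbols and the error V2 are notation assumed here for illustration, while Y1, Y2, U1, the included exogenous variables z1, ..., zK and the excluded instruments zK+1, ..., zL follow the naming used in this post.

```latex
\begin{align*}
Y_1 &= \delta_0 + \alpha\, Y_2 + \delta_1 z_1 + \dots + \delta_K z_K + U_1,
  & E(U_1) &= 0,\quad E(z_j U_1) = 0,\quad j = 1,\dots,L, \\
Y_2 &= \pi_0 + \pi_1 z_1 + \dots + \pi_L z_L + V_2,
  & E(V_2) &= 0,\quad E(z_j V_2) = 0,\quad j = 1,\dots,L.
\end{align*}
% The second line is a linear projection: the conditions on V_2 hold by
% construction, whatever the distribution of Y_2 (continuous, binary, count).
% Identification of \alpha only needs the rank condition
% \pi_j \neq 0 \text{ for at least one } j \in \{K+1,\dots,L\}.
```

Nothing in the second equation says that E(Y2 | z) is linear or that Y2 has any particular distribution, which is why a count Y2 poses no problem for 2SLS.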

For efficiency reasons, you may want to generate an IV for Y2 that accounts for its count nature. I would use Poisson regression and obtain the fitted values. It's robust, simple, and, if the mean of Y2 is exponential, and the variance of U1 is constant, the efficient IV. And, you can ignore the first-stage estimation, as with usual 2SLS. The following commands first do 2SLS and then do IV using the fitted values.

Code:

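* Standard 2SLS: zKp1 ... zL are the excluded instruments
ivregress 2sls y1 (y2 = zKp1 ... zL) z1 z2 ... zK, robust

* Poisson first stage on all exogenous variables, then the fitted values
poisson y2 z1 z2 ... zK zKp1 ... zL
predict y2hat

* IV using the Poisson fitted values as the instrument for y2
ivregress 2sls y1 (y2 = y2hat) z1 z2 ... zK, robust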

Again, let me emphasize that these procedures do not use extra assumptions: Y2 need not be Poisson distributed, nor even have an exponential mean. You are just generating instruments.

The joint MLE using cmp is overkill and not robust.
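
For anyone who wants to see the two procedures side by side, here is a minimal simulated sketch. Everything in it is an assumption made for illustration: the sample size, the seed, the coefficient values, and the variable names z1 (included exogenous variable), z2 (excluded instrument), and v (the unobservable that makes y2 endogenous). It is not taken from any real data set.

```stata
* Simulated data: y2 is a count that shares the unobservable v with y1
clear
set seed 12345
set obs 5000
generate z1 = rnormal()              // included exogenous variable
generate z2 = rnormal()              // excluded instrument
generate v  = rnormal()              // source of endogeneity
generate y2 = rpoisson(exp(0.2 + 0.3*z1 + 0.5*z2 + 0.5*v))
generate y1 = 1 + 0.5*y2 + 0.4*z1 + v + rnormal()

* OLS ignores the endogeneity of y2, so its coefficient on y2 is biased
regress y1 y2 z1, robust

* Standard 2SLS with z2 as the excluded instrument
ivregress 2sls y1 (y2 = z2) z1, robust
estat firststage                     // check instrument strength

* Poisson first stage, then the fitted mean as the single instrument
poisson y2 z1 z2, vce(robust)
predict y2hat, n
ivregress 2sls y1 (y2 = y2hat) z1, robust
```

In this design the true coefficient on y2 is 0.5. Both ivregress runs should land near it, while the regress estimate should be pushed away from it by the shared v. Because the conditional mean of y2 given z1 and z2 is exponential here, the run using y2hat would be expected to give the tighter standard error.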
Joao Santos Silva

Join Date: Apr 2014

Posts: 3010
#9

29 Jan 2017, 10:46

Nice to see threads converging; see here.

Joao
Yazeed Abdul

Join Date: Jan 2018

Posts: 3
#10

11 Jan 2018, 13:25

Hello dear members,
I am looking at the impacts of network size (x1) and support (x2) on technology adoption (y1), and consequently the impacts of x1, x2 and y1 on net revenue (y2), using data from a cross-sectional survey. So I have the following equations:
Y2=x1+x2+z1+e2 and
Y1=x1+x2+z+e1, but y1 is a discrete variable with 3 categories. I want to estimate y2 and y1 using multinomial endogenous switching regression to account for self-selection in the choice among the 3 categories of the technology. However, my concern is how to exogenize the two network variables x1 (which is a count variable) and x2 (which is a continuous variable) in this system. I have potential instruments for both x1 and x2. Any suggestions?

If I think of 2SLS with x1 and x2 at the first stage and y1 at the second stage, the multinomial nature of y1 seems to make it a problem. Thank you.

Last edited by Yazeed Abdul; 11 Jan 2018, 13:43.
