  • Poisson-CRE and overidentification test

    Hi all,

    I am trying to perform an overidentification test in the following theoretical scenario. Say that I have a balanced panel with N = 208 and T = 10. Let z = (z_1, z_2) denote the (NT, 2) instrument matrix, x the (NT, 1) endogenous variable being instrumented, and w the (NT, d_W) matrix of other covariates. Following the procedure of Wooldridge et al., I would like to compute:

    Code:
    xtreg x z w i.Year, fe                  // first stage: endogenous x on instruments z and controls w
    predict double residuals, e
    xtpoisson y x residuals w i.Year, fe    // second stage: FE Poisson with the control function
    My aim is to test the overidentifying restrictions in this scenario. One idea I had is to compute the Hansen test manually:

    1) estimate the entire procedure with all instruments;
    2) save the residuals;
    3) regress the residuals on the instruments alone (with xtreg, fe?);
    4) N*R^2 would then be distributed chi-squared, but how do I save N*R^2 after xtreg, fe? And, if that can be done, how do I compute the p-value? (See my rough attempt below.)
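
    Here is the rough attempt I mean for steps 3) and 4) (u2 is just a placeholder name for the saved residuals, and I am not sure the within R-squared is the right one to use):

    Code:
    * rough attempt at steps 3) and 4); u2 is a placeholder for the saved residuals
    xtreg u2 z_1 z_2 i.Year, fe
    scalar nr2  = e(N)*e(r2_w)          // N*R^2 based on the within R-squared
    scalar pval = chi2tail(1, nr2)      // 1 overidentifying restriction in this example
    di "N*R^2 = " nr2 "   p-value = " pval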

    As you can see, the procedure is missing some steps. Could you please either advise me on steps 3) and 4), or suggest a more elegant way (perhaps a command or standard procedure) to perform an overidentification test for the Poisson-CRE procedure?

    Thank you,

    Federico

  • #2
    Federico: A simple test is available if you bootstrap the two steps. Include in the second step a number of instruments equal to the number of overidentifying restrictions, along with the control function. Do a joint Wald test on those IVs. Of course, if you put in all IVs you'll get perfect collinearity.
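
    With one overidentifying restriction, as in the example in #1, a minimal sketch of this idea (using the variable names from #1) would be:

    Code:
    * sketch: keep the one extra instrument, z_1, in the second step
    xtreg x z_1 z_2 w i.Year, fe
    predict double vhat, e
    xtpoisson y x z_1 vhat w i.Year, fe
    test z_1                     // Wald test of the overidentifying restriction
    * valid inference requires bootstrapping the two steps, as noted above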



    • #3
      Jeff Wooldridge Thank you very much for the response and for the hint!
      So, keeping the example in #1 (z_1 and z_2, hence one overidentifying restriction), something like this:

      Code:
      capture program drop cre_cf
      program cre_cf, rclass
          xtreg x z_1 z_2 w i.Year, fe                  // first stage with both instruments
          predict double residuals, e
          xtpoisson y x z_1 residuals w i.Year, fe      // second stage: z_1 plus the control function
          return scalar b_x = _b[x]
          return scalar b_residuals = _b[residuals]
          return scalar se_x = _se[x]
          return scalar se_residuals = _se[residuals]
          drop residuals                                // clean up so the program can be rerun
      end

      gen newid = id
      xtset newid Year
      bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
          reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
          cluster(id) idcluster(newid) nodrop: cre_cf
      i.e., including z_1 in the second step, right?

      Thank you again a lot.
      Last edited by Federico Nutarelli; 02 Nov 2020, 03:31. Reason: Wrong indentation



      • #4
        Yes, you include z_1 in the second step. But it is the t statistic on z_1 which is the test statistic in this case, so you want to bootstrap it or the standard error. And you should find that the t statistic is invariant to whether you use z_1 or z_2 in obtaining the t statistic.
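
        In terms of the program in #3, that means returning the statistics on z_1, for example (a sketch):

        Code:
        * sketch: inside cre_cf, after the xtpoisson line, return the statistics on z_1
        return scalar b_z1  = _b[z_1]
        return scalar se_z1 = _se[z_1]
        return scalar t_z1  = _b[z_1]/_se[z_1]    // the t statistic to bootstrap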



        • #5
          Jeff Wooldridge thank you very much!



          • #6
            Jeff Wooldridge sorry to come back to this, but just one clarification: so my test statistic is the t-stat of z_1.

            What I did not understand is whether I have to run two bootstraps with both z_1 and z_2 in the first stage, first with only z_1 in the second stage and then with only z_2, like this:

            1)

            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_1 z_2 w i.Year, fe                  // first stage with both instruments
                predict double residuals, e
                xtpoisson y x z_1 residuals w i.Year, fe      // second stage with z_1
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf
            2)

            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_1 z_2 w i.Year, fe                  // first stage with both instruments
                predict double residuals, e
                xtpoisson y x z_2 residuals w i.Year, fe      // second stage with z_2
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf
            or instead two bootstraps where the first stage has first only z_1 and then only z_2, like this:

            a)

            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_1 w i.Year, fe                      // first stage with z_1 only
                predict double residuals, e
                xtpoisson y x z_1 residuals w i.Year, fe      // second stage with z_1
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf

            b)
            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_2 w i.Year, fe                      // first stage with z_2 only
                predict double residuals, e
                xtpoisson y x z_1 residuals w i.Year, fe      // second stage with z_1
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf
            and then compare the t-stats of z_1 from a) and b). Or neither of the two...

            Thank you again!
            Last edited by Federico Nutarelli; 02 Nov 2020, 13:19.



            • #7
              The first stage has to have both z_1 and z_2, otherwise there is no overidentification. Did you try it? You'll get perfect collinearity. And besides, you're not returning the proper statistics, which are _b[z_1] and _se[z_1], and so on. For the overid test, the coefficients on x and residuals are irrelevant.
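
              Concretely, with the program returning r(b_z1) and r(se_z1) as sketched after #4, the bootstrap call would become something like:

              Code:
              * sketch: bootstrap the statistics on z_1 (assumes cre_cf returns r(b_z1) and r(se_z1))
              bootstrap b_z1=r(b_z1) se_z1=r(se_z1), reps(40000) ///
                  saving(datiboot_40000, replace double) seed(123456789) ///
                  cluster(id) idcluster(newid) nodrop: cre_cf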



              • #8
                Jeff Wooldridge

                Thank you again for the reply.

                The first stage has to have both z_1 and z_2, otherwise there is no overidentification. Did you try it? You'll get perfect collinearity
                Yes, indeed. I only tried the first approach (1 and 2), for exactly the reason you state. Of course, I did not put both z_1 and z_2 in the second stage.

                And besides, you're not returning the proper statistics, which are _b[z_1] and _se[_z1], and so on
                That was a typo of mine. In the actual code I bootstrapped _b[z_1] and _se[z_1] and then recovered the t-stat as _b[z_1]/_se[z_1].

                Now the results. I obtained .0187089 as the mean of the bootstrapped standard errors of z_2 and .0190751 as the mean for z_1, so roughly the same value. However, the t-stats differ a lot (0.533 for z_1 and -1.02 for z_2). This is driven by the different coefficient values (the mean of _b[z_1] is .003111 and the mean of _b[z_2] is -.013205), while the standard errors are about the same. As far as I understand it, the test is similar to a Sargan/Hansen test: if the overidentifying restriction is correlated with the residuals, we should see strong collinearity between the residuals and the extra instrument, thus invalidating the significance of the z. If that interpretation is correct, then having obtained the same standard errors I should be fine, right? I mean, the differences in the t-statistics come from the coefficient values, but in both cases the z's are not significant even by the t-statistics.

                Thank you again



                • #9
                  While it won’t change the conclusion, this is a case where you should bootstrap the p-value for the t statistic. That is what is invariant to using z_1 versus z_2 when you use the absolute value. You should see that in your original data: the absolute values of the t stats are the same. If you bootstrap the standard errors instead and then form the t stats with the original estimates then you won’t get equivalence, as you found.
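
                  In Stata terms, one way to sketch this (using the names from the earlier posts; the saving() file name is just a placeholder) is to bootstrap the t statistic itself rather than the coefficient and standard error separately, and then read the p-value off its bootstrap distribution:

                  Code:
                  * sketch: bootstrap the t statistic on z_1 directly (assumes cre_cf returns r(b_z1) and r(se_z1))
                  bootstrap t_z1=(r(b_z1)/r(se_z1)), reps(40000) seed(123456789) ///
                      saving(datiboot_t, replace double) cluster(id) idcluster(newid) nodrop: cre_cf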



                  • #10
                    Jeff Wooldridge
                    While it won’t change the conclusion, this is a case where you should bootstrap the p-value for the t statistic. That is what is invariant to using z_1 versus z_2 when you use the absolute value. You should see that in your original data: the absolute values of the t stats are the same.
                    Ok. So basically I should do something like:

                    Code:
                    di 2*ttail(e(df_r), abs(`t'))
                    on the .dta file saved by the bootstrap to obtain the bootstrapped p-value for the t-statistic, right?
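
                    Or maybe something like this on the saved replications (just my guess at the bootstrap-t recipe; the _bs_* positions and the file name are placeholders):

                    Code:
                    * my guess: bootstrap-t p-value from the saved replications (placeholder names)
                    cre_cf                                    // one run on the original sample
                    scalar b_hat = r(b_z1)                    // original-sample _b[z_1]
                    scalar t_hat = r(b_z1)/r(se_z1)           // original-sample t statistic
                    use datiboot_40000, clear                 // replications saved by bootstrap
                    gen double tstar = (_bs_1 - b_hat)/_bs_3  // assumes _bs_1 = r(b_z1) and _bs_3 = r(se_z1)
                    count if abs(tstar) >= abs(t_hat)
                    di "bootstrap p-value = " r(N)/_N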

                    Can you please provide a toy example of that? Maybe I can adapt a toy example that you shared in the past:
                    Code:
                    use airfare, clear
                    
                    egen double concenbar = mean(concen), by(id)
                    egen double lfarebar = mean(lfare), by(id)
                    
                    
                    capture program drop cre_cf
                    
                    program cre_cf, rclass
                    
                    xtreg lfare concen y98 y99 y00, fe
                    predict double u2ddh, e
                    reg lfare concen concenbar ldist ldistsq y98 y99 y00
                    predict double v2h, resid
                    
                    xtpoisson passen lfare u2ddh y98 y99 y00, fe
                    return scalar b_lfare_1 = _b[lfare]
                    glm passen lfare v2h concenbar lfarebar y98 y99 y00, fam(poiss)
                    return scalar b_lfare_2 = _b[lfare]
                    glm passen lfare v2h concenbar y98 y99 y00, fam(poiss)
                    return scalar b_lfare_3 = _b[lfare]
                    
                    drop u2ddh v2h
                    
                    end
                    
                    gen newid = id
                    xtset newid year
                    bootstrap r(b_lfare_1) r(b_lfare_2) r(b_lfare_3), ///
                        reps(500) saving(datiboot_wooldridge, replace double) seed(123) cluster(id) idcluster(newid): cre_cf
                    Further, if I have clustered data, should I correct for heteroskedasticity by adding the vce(robust) option to xtpoisson? (I ask since I don't see that option in the toy example.)

                    Thank you again!
                    Last edited by Federico Nutarelli; 03 Nov 2020, 11:41.

