  • Poisson-CRE and overidentification test

    Hi all,

    I am trying to perform an overidentification test in the following theoretical scenario. Say that I have a balanced panel with N = 208 and T = 10. Let z = (z_1, z_2) denote the (NT, 2) instrument matrix, x the (NT, 1) endogenous variable being instrumented, and w the (NT, d_W) matrix of other covariates. Following the procedure of Wooldridge et al., I would like to compute:

    Code:
    xtreg x z w i.Year, fe                  // first stage: endogenous x on instruments z and controls w
    predict double residuals, e
    xtpoisson y x residuals w i.Year, fe    // second stage: FE Poisson with the control function
    My aim is to test the overidentifying restrictions in this scenario. One idea I had is to compute the Hansen test manually:

    1) estimate the entire procedure with all instruments;
    2) save the residuals;
    3) regress the residuals on the instruments alone (with xtreg, fe?);
    4) N*R^2 would then be distributed chi-squared, but how do I save N*R^2 after xtreg, fe? And, if that can be done, how do I compute the p-value? (See my rough attempt below.)
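
    Here is the rough attempt I mean for steps 3) and 4) (u2 is just a placeholder name for the saved residuals, and I am not sure the within R-squared is the right one to use):

    Code:
    * rough attempt at steps 3) and 4); u2 is a placeholder for the saved residuals
    xtreg u2 z_1 z_2 i.Year, fe
    scalar nr2  = e(N)*e(r2_w)          // N*R^2 based on the within R-squared
    scalar pval = chi2tail(1, nr2)      // 1 overidentifying restriction in this example
    di "N*R^2 = " nr2 "   p-value = " pval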

    As you can see, the procedure is missing some steps. Could you please either advise me on steps 3) and 4), or suggest a more elegant way (perhaps a command or standard procedure) to perform an overidentification test for the Poisson-CRE procedure?

    Thank you,

    Federico

  • #2
    Federico: A simple test is available if you bootstrap the two steps. Include in the second step a number of instruments equal to the number of overidentifying restrictions, along with the control function. Do a joint Wald test on those IVs. Of course, if you put in all IVs you'll get perfect collinearity.
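
    With one overidentifying restriction, as in the example in #1, a minimal sketch of this idea (using the variable names from #1) would be:

    Code:
    * sketch: keep the one extra instrument, z_1, in the second step
    xtreg x z_1 z_2 w i.Year, fe
    predict double vhat, e
    xtpoisson y x z_1 vhat w i.Year, fe
    test z_1                     // Wald test of the overidentifying restriction
    * valid inference requires bootstrapping the two steps, as noted above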



    • #3
      Jeff Wooldridge Thank you very much for the response and for the hint!
      So, keeping the example in #1 (z_1 and z_2, hence one overidentifying restriction), something like this:

      Code:
      capture program drop cre_cf
      program cre_cf, rclass
          xtreg x z_1 z_2 w i.Year, fe                  // first stage with both instruments
          predict double residuals, e
          xtpoisson y x z_1 residuals w i.Year, fe      // second stage: z_1 plus the control function
          return scalar b_x = _b[x]
          return scalar b_residuals = _b[residuals]
          return scalar se_x = _se[x]
          return scalar se_residuals = _se[residuals]
          drop residuals                                // clean up so the program can be rerun
      end

      gen newid = id
      xtset newid Year
      bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
          reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
          cluster(id) idcluster(newid) nodrop: cre_cf
      i.e., including z_1 in the second step, right?

      Thank you again a lot.
      Last edited by Federico Nutarelli; 02 Nov 2020, 03:31. Reason: Wrong indentation



      • #4
        Yes, you include z_1 in the second step. But it is the t statistic on z_1 which is the test statistic in this case, so you want to bootstrap it or the standard error. And you should find that the t statistic is invariant to whether you use z_1 or z_2 in obtaining the t statistic.
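
        In terms of the program in #3, that means returning the statistics on z_1, for example (a sketch):

        Code:
        * sketch: inside cre_cf, after the xtpoisson line, return the statistics on z_1
        return scalar b_z1  = _b[z_1]
        return scalar se_z1 = _se[z_1]
        return scalar t_z1  = _b[z_1]/_se[z_1]    // the t statistic to bootstrap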



        • #5
          Jeff Wooldridge thank you very much!



          • #6
            Jeff Wooldridge sorry to come back to this, but just one clarification: so my test statistic is the t-stat of z_1.

            What I did not understand is whether I have to run two bootstraps with both z_1 and z_2 in the first stage, first with only z_1 in the second stage and then with only z_2, like this:

            1)

            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_1 z_2 w i.Year, fe                  // first stage with both instruments
                predict double residuals, e
                xtpoisson y x z_1 residuals w i.Year, fe      // second stage with z_1
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf
            2)

            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_1 z_2 w i.Year, fe                  // first stage with both instruments
                predict double residuals, e
                xtpoisson y x z_2 residuals w i.Year, fe      // second stage with z_2
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf
            or instead two bootstraps where the first stage has first only z_1 and then only z_2, like this:

            a)

            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_1 w i.Year, fe                      // first stage with z_1 only
                predict double residuals, e
                xtpoisson y x z_1 residuals w i.Year, fe      // second stage with z_1
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf

            b)
            Code:
            capture program drop cre_cf
            program cre_cf, rclass
                xtreg x z_2 w i.Year, fe                      // first stage with z_2 only
                predict double residuals, e
                xtpoisson y x z_1 residuals w i.Year, fe      // second stage with z_1
                return scalar b_x = _b[x]
                return scalar b_residuals = _b[residuals]
                return scalar se_x = _se[x]
                return scalar se_residuals = _se[residuals]
                drop residuals
            end

            gen newid = id
            xtset newid Year
            bootstrap r(b_x) r(b_residuals) r(se_x) r(se_residuals), ///
                reps(40000) saving(datiboot_40000, replace double) seed(123456789) ///
                cluster(id) idcluster(newid) nodrop: cre_cf
            and then compare the t-stats of z_1 from a) and b). Or neither of the two...

            Thank you again!
            Last edited by Federico Nutarelli; 02 Nov 2020, 13:19.



            • #7
              The first stage has to have both z_1 and z_2, otherwise there is no overidentification. Did you try it? You'll get perfect collinearity. And besides, you're not returning the proper statistics, which are _b[z_1] and _se[z_1], and so on. For the overid test, the coefficients on x and residuals are irrelevant.
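
              Concretely, with the program returning r(b_z1) and r(se_z1) as sketched after #4, the bootstrap call would become something like:

              Code:
              * sketch: bootstrap the statistics on z_1 (assumes cre_cf returns r(b_z1) and r(se_z1))
              bootstrap b_z1=r(b_z1) se_z1=r(se_z1), reps(40000) ///
                  saving(datiboot_40000, replace double) seed(123456789) ///
                  cluster(id) idcluster(newid) nodrop: cre_cf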



              • #8
                Jeff Wooldridge

                Thank you again for the reply.

                The first stage has to have both z_1 and z_2, otherwise there is no overidentification. Did you try it? You'll get perfect collinearity
                Yes, indeed. I only tried the first approach (1 and 2), for exactly the reason you state. Of course, I did not put both z_1 and z_2 in the second stage.

                And besides, you're not returning the proper statistics, which are _b[z_1] and _se[_z1], and so on
                That was a typo of mine. In the actual code I bootstrapped _b[z_1] and _se[z_1] and then recovered the t-stat as _b[z_1]/_se[z_1].

                Now the results. I obtained .0187089 as the mean of the bootstrapped standard errors of z_2 and .0190751 as the mean for z_1, so roughly the same value. However, the t-stats differ a lot (0.533 for z_1 and -1.02 for z_2). This is driven by the different coefficient values (the mean of _b[z_1] is .003111 and the mean of _b[z_2] is -.013205), while the standard errors are about the same. As far as I understand it, the test is similar to a Sargan/Hansen test: if the overidentifying restriction is correlated with the residuals, we should see strong collinearity between the residuals and the extra instrument, thus invalidating the significance of the z. If that interpretation is correct, then having obtained the same standard errors I should be fine, right? I mean, the differences in the t-statistics come from the coefficient values, but in both cases the z's are not significant even by the t-statistics.

                Thank you again



                • #9
                  While it won’t change the conclusion, this is a case where you should bootstrap the p-value for the t statistic. That is what is invariant to using z_1 versus z_2 when you use the absolute value. You should see that in your original data: the absolute values of the t stats are the same. If you bootstrap the standard errors instead and then form the t stats with the original estimates then you won’t get equivalence, as you found.
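
                  In Stata terms, one way to sketch this (using the names from the earlier posts; the saving() file name is just a placeholder) is to bootstrap the t statistic itself rather than the coefficient and standard error separately, and then read the p-value off its bootstrap distribution:

                  Code:
                  * sketch: bootstrap the t statistic on z_1 directly (assumes cre_cf returns r(b_z1) and r(se_z1))
                  bootstrap t_z1=(r(b_z1)/r(se_z1)), reps(40000) seed(123456789) ///
                      saving(datiboot_t, replace double) cluster(id) idcluster(newid) nodrop: cre_cf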



                  • #10
                    Jeff Wooldridge
                    While it won’t change the conclusion, this is a case where you should bootstrap the p-value for the t statistic. That is what is invariant to using z_1 versus z_2 when you use the absolute value. You should see that in your original data: the absolute values of the t stats are the same.
                    Ok. So basically I should do something like:

                    Code:
                    di 2*ttail(e(df_r), abs(`t'))
                    on the .dta file saved by the bootstrap to obtain the bootstrapped p-value for the t-statistic, right?
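
                    Or maybe something like this on the saved replications (just my guess at the bootstrap-t recipe; the _bs_* positions and the file name are placeholders):

                    Code:
                    * my guess: bootstrap-t p-value from the saved replications (placeholder names)
                    cre_cf                                    // one run on the original sample
                    scalar b_hat = r(b_z1)                    // original-sample _b[z_1]
                    scalar t_hat = r(b_z1)/r(se_z1)           // original-sample t statistic
                    use datiboot_40000, clear                 // replications saved by bootstrap
                    gen double tstar = (_bs_1 - b_hat)/_bs_3  // assumes _bs_1 = r(b_z1) and _bs_3 = r(se_z1)
                    count if abs(tstar) >= abs(t_hat)
                    di "bootstrap p-value = " r(N)/_N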

                    Can you please provide a toy example of that? Maybe I can adapt a toy example that you shared in the past:
                    Code:
                    use airfare, clear
                    
                    egen double concenbar = mean(concen), by(id)
                    egen double lfarebar = mean(lfare), by(id)
                    
                    
                    capture program drop cre_cf
                    
                    program cre_cf, rclass
                    
                    xtreg lfare concen y98 y99 y00, fe
                    predict double u2ddh, e
                    reg lfare concen concenbar ldist ldistsq y98 y99 y00
                    predict double v2h, resid
                    
                    xtpoisson passen lfare u2ddh y98 y99 y00, fe
                    return scalar b_lfare_1 = _b[lfare]
                    glm passen lfare v2h concenbar lfarebar y98 y99 y00, fam(poiss)
                    return scalar b_lfare_2 = _b[lfare]
                    glm passen lfare v2h concenbar y98 y99 y00, fam(poiss)
                    return scalar b_lfare_3 = _b[lfare]
                    
                    drop u2ddh v2h
                    
                    end
                    
                    gen newid = id
                    xtset newid year
                    bootstrap r(b_lfare_1) r(b_lfare_2) r(b_lfare_3), ///
                        reps(500) saving(datiboot_wooldridge, replace double) seed(123) cluster(id) idcluster(newid): cre_cf
                    Further, if I have clustered data, should I correct for heteroskedasticity by adding the vce(robust) option to xtpoisson? (I ask since I don't see that option in the toy example.)

                    Thank you again!
                    Last edited by Federico Nutarelli; 03 Nov 2020, 11:41.

