  • Panel data: xtoverid rejects RE for model with only time controls; odd result or information?

    Dear Statalisters,

    I am using interrupted time series methods on household panel data. I have monthly data on about 2000 households over a three-year period (18 months before the event, 18 months after). The panel is unbalanced, with a mean follow-up of 29 months. I construct my counterfactual (extrapolating the pre-event trend into the post period) using a linear FE model that controls for the time period (before/after the event), a linear time trend (interacted with the time period to allow for a change in the trend), month dummies, and a set of additional covariates.
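
    Written out, the specification described above would look roughly like this. The symbols are shorthand introduced here for illustration, not taken from any output: $\alpha_i$ is the household effect, $m_t$ the vector of month dummies, and $x_{it}$ the additional covariates.

    ```latex
    y_{it} = \alpha_i + \beta_1\,\mathrm{post}_t + \beta_2\,t
             + \beta_3\,(\mathrm{post}_t \times t) + m_t'\delta + x_{it}'\gamma + \varepsilon_{it}

    % Counterfactual in the post period: the same equation with post_t set to 0,
    % so the fitted-minus-counterfactual gap at month t is
    \hat{y}_{it} - \hat{y}^{\,\mathrm{cf}}_{it} = \hat\beta_1 + \hat\beta_3\,t
    \qquad \text{for } \mathrm{post}_t = 1
    ```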

    I did a model building exercise to better understand the difference between the purely descriptive pre vs. post mean difference and the adjusted mean difference (between fitted and counterfactual values in the post period).

    Here is the issue. When I fit a random effects model with xtreg, re vce(cluster panelvar) controlling only for the time period and a linear time trend, Hansen’s J test of the overidentifying restrictions (no correlation between the panel effects and the regressors), computed with xtoverid, fails to reject (Chi2(2) = 0.15, p = 0.9276), as expected. However, once I add the interaction between the time period and the trend, the test strongly rejects (Chi2(3) = 14.543, p = 0.0023). The coefficient estimates differ slightly from the FE estimates, but xtreg, fe vce(robust) yields corr(u_i, Xb) = 0.0053, essentially the same as when the interaction is excluded (0.0054). Also, the estimation sample is the same throughout.

    How can the test reject when (1) the additional variable (the interaction) is conceptually unrelated to the panel effects, and (2) corr(u_i, Xb) is basically the same across the two FE models (with and without the interaction)?

    Since my panel is unbalanced, I wondered whether this could be due to attrition and replenishment of the sample. When I restrict the estimation sample to households with complete follow-up (about 60% of households), Hansen’s J test rejects with the period indicator as the only regressor (Chi2(1) = 109.351), even though xtreg, fe vce(robust) yields virtually the same coefficient estimate and corr(u_i, Xb) = 0 (as expected).

    Am I missing something obvious? Or is this telling me something about the data?

    I am doing this exercise because household composition has a sizable impact on the fitted vs. counterfactual mean difference when it is included in the model alongside the post * t interaction, but not when the interaction is omitted, and I am not sure why.

    Any idea on why I am getting these results or on what I might do next to find out would be greatly appreciated.

    Maxime
    Code:
    . xtreg y post t, re vce(cluster id)
     
    Random-effects GLS regression                   Number of obs     =     69,494
    Group variable: id                              Number of groups  =      2,383
     
    R-sq:                                           Obs per group:
         within  = 0.0078                                         min =          1
         between = 0.0006                                         avg =       29.2
         overall = 0.0046                                         max =         36
     
                                                    Wald chi2(2)      =     174.10
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
     
                                     (Std. Err. adjusted for 2,383 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            post |  -32.61146   3.672993    -8.88   0.000    -39.81039   -25.41252
               t |  -.4647523   .2322854    -2.00   0.045    -.9200233   -.0094813
           _cons |   480.6522   6.420681    74.86   0.000     468.0679    493.2365
    -------------+----------------------------------------------------------------
         sigma_u |  231.44268
         sigma_e |  228.77659
             rho |  .50579289   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
     
    . xtoverid
     
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(id)
    Sargan-Hansen statistic   0.150  Chi-sq(2)    P-value = 0.9276
     
    . xtreg y post t, fe vce(cluster id)
     
    Fixed-effects (within) regression               Number of obs     =     69,494
    Group variable: id                              Number of groups  =      2,383
     
    R-sq:                                           Obs per group:
         within  = 0.0078                                         min =          1
         between = 0.0006                                         avg =       29.2
         overall = 0.0046                                         max =         36
     
                                                    F(2,2382)         =      86.54
    corr(u_i, Xb)  = 0.0054                         Prob > F          =     0.0000
     
                                     (Std. Err. adjusted for 2,383 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            post |  -32.56479   3.672894    -8.87   0.000    -39.76719   -25.36239
               t |  -.4696601   .2338195    -2.01   0.045    -.9281709   -.0111493
           _cons |    489.016   3.468408   140.99   0.000     482.2146    495.8174
    -------------+----------------------------------------------------------------
         sigma_u |  239.63657
         sigma_e |  228.77659
             rho |  .52317216   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
     
    . xtreg y post t post_t, re vce(cluster id)
     
    Random-effects GLS regression                   Number of obs     =     69,494
    Group variable: id                              Number of groups  =      2,383
     
    R-sq:                                           Obs per group:
         within  = 0.0078                                         min =          1
         between = 0.0004                                         avg =       29.2
         overall = 0.0046                                         max =         36
     
                                                    Wald chi2(3)      =     174.64
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
     
                                     (Std. Err. adjusted for 2,383 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            post |  -44.64032   8.956723    -4.98   0.000    -62.19518   -27.08547
               t |  -.7876857   .3423413    -2.30   0.021    -1.458662    -.116709
          post_t |   .6493272   .4381844     1.48   0.138    -.2094984    1.508153
           _cons |   483.5995   6.881025    70.28   0.000     470.1129     497.086
    -------------+----------------------------------------------------------------
         sigma_u |  230.67539
         sigma_e |  228.77046
             rho |  .50414608   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
     
    . xtoverid
     
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(id)
    Sargan-Hansen statistic  14.543  Chi-sq(3)    P-value = 0.0023
     
    . xtreg y post t post_t, fe vce(cluster id)
     
    Fixed-effects (within) regression               Number of obs     =     69,494
    Group variable: id                              Number of groups  =      2,383
     
    R-sq:                                           Obs per group:
         within  = 0.0078                                         min =          1
         between = 0.0003                                         avg =       29.2
         overall = 0.0046                                         max =         36
     
                                                    F(3,2382)         =      57.80
    corr(u_i, Xb)  = 0.0053                         Prob > F          =     0.0000
     
                                     (Std. Err. adjusted for 2,383 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            post |   -46.1116   8.987428    -5.13   0.000    -63.73559   -28.48761
               t |   -.832746   .3437206    -2.42   0.015    -1.506768   -.1587235
          post_t |   .7309716   .4394493     1.66   0.096     -.130771    1.592714
           _cons |   492.4736   4.386992   112.26   0.000     483.8709    501.0763
    -------------+----------------------------------------------------------------
         sigma_u |  239.70379
         sigma_e |  228.77046
             rho |  .52332547   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
     
    . xtreg y post if T_i == 36, re vce(cluster id)
     
    Random-effects GLS regression                   Number of obs     =     49,572
    Group variable: id                              Number of groups  =      1,377
     
    R-sq:                                           Obs per group:
         within  = 0.0000                                         min =         36
         between = 0.0000                                         avg =       36.0
         overall = 0.0041                                         max =         36
     
                                                    Wald chi2(1)      =     109.35
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
     
                                     (Std. Err. adjusted for 1,377 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            post |  -40.97813   3.918698   -10.46   0.000    -48.65864   -33.29763
           _cons |   498.4042   6.921887    72.00   0.000     484.8376    511.9709
    -------------+----------------------------------------------------------------
         sigma_u |  225.27162
         sigma_e |  223.84015
             rho |  .50318729   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
     
    . xtoverid
     
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(id)
    Sargan-Hansen statistic 109.351  Chi-sq(1)    P-value = 0.0000

  • #2
    I was able to reproduce the problem in a similar data set that is balanced. I included only year dummies and the outcome is a strong rejection using xtoverid. This is particularly troubling because we know RE and FE are numerically identical in this case; there are no restrictions to test. Here's what happens:

    Code:
    . xtreg lwage d81-d87, re vce(cluster nr)
    
    Random-effects GLS regression                   Number of obs     =      4,360
    Group variable: nr                              Number of groups  =        545
    
    R-sq:                                           Obs per group:
         within  = 0.0000                                         min =          8
         between = 0.0000                                         avg =        8.0
         overall = 0.0752                                         max =          8
    
                                                    Wald chi2(7)      =     414.10
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                       (Std. Err. adjusted for 545 clusters in nr)
    ------------------------------------------------------------------------------
                 |               Robust
           lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             d81 |   .1193902   .0244002     4.89   0.000     .0715667    .1672137
             d82 |   .1781901   .0241903     7.37   0.000      .130778    .2256022
             d83 |   .2257865   .0243712     9.26   0.000     .1780198    .2735531
             d84 |   .2968181   .0271391    10.94   0.000     .2436264    .3500098
             d85 |   .3459333    .026309    13.15   0.000     .2943686    .3974981
             d86 |   .4062418    .027297    14.88   0.000     .3527407    .4597429
             d87 |   .4730023   .0259871    18.20   0.000     .4220686    .5239361
           _cons |   1.393477   .0238999    58.30   0.000     1.346634     1.44032
    -------------+----------------------------------------------------------------
         sigma_u |  .37007665
         sigma_e |  .35469771
             rho |  .52120938   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . xtoverid
    
    Test of overidentifying restrictions: fixed vs random effects
    Cross-section time-series model: xtreg re  robust cluster(nr)
    Sargan-Hansen statistic 414.096  Chi-sq(7)    P-value = 0.0000
    
    . xtreg lwage d81-d87, fe vce(cluster nr)
    
    Fixed-effects (within) regression               Number of obs     =      4,360
    Group variable: nr                              Number of groups  =        545
    
    R-sq:                                           Obs per group:
         within  = 0.1625                                         min =          8
         between =      .                                         avg =        8.0
         overall = 0.0752                                         max =          8
    
                                                    F(7,544)          =      59.16
    corr(u_i, Xb)  = -0.0000                        Prob > F          =     0.0000
    
                                       (Std. Err. adjusted for 545 clusters in nr)
    ------------------------------------------------------------------------------
                 |               Robust
           lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             d81 |   .1193902   .0244002     4.89   0.000       .07146    .1673204
             d82 |   .1781901   .0241903     7.37   0.000     .1306722     .225708
             d83 |   .2257865   .0243712     9.26   0.000     .1779133    .2736596
             d84 |   .2968181   .0271391    10.94   0.000     .2435078    .3501284
             d85 |   .3459333    .026309    13.15   0.000     .2942536    .3976131
             d86 |   .4062418    .027297    14.88   0.000     .3526214    .4598622
             d87 |   .4730023   .0259871    18.20   0.000      .421955    .5240496
           _cons |   1.393477   .0193815    71.90   0.000     1.355405    1.431549
    -------------+----------------------------------------------------------------
         sigma_u |  .39074676
         sigma_e |  .35469771
             rho |  .54824631   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    The same happens without the vce(cluster nr) option. So it's a bug in the program. It doesn't know how to handle the degenerated case. Fortunately, it seems to work when you actually need to use it. When you balance the panel, you should find that RE and FE are identical; in the unbalanced case they are not.
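
    A short sketch of why RE and FE coincide in the balanced case may help. It uses the textbook matrix-weighted form of the RE (GLS) slope estimator for a balanced panel with T periods; the symbols W, B and psi are notation introduced here, not anything reported by xtreg.

    ```latex
    \hat\beta_{RE} = \left(W_{xx} + \psi B_{xx}\right)^{-1}\left(W_{xy} + \psi B_{xy}\right),
    \qquad \psi = \frac{\sigma_e^2}{\sigma_e^2 + T\sigma_u^2}

    W_{xx} = \sum_i \sum_t (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)', \qquad
    B_{xx} = T \sum_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'

    % W_{xy} and B_{xy} are defined in the same way, with y replacing the second x.
    % If the regressors vary only over time, x_{it} = x_t, and the panel is balanced,
    % then \bar{x}_i = \bar{x} for every i, so B_{xx} = 0 and B_{xy} = 0:
    \hat\beta_{RE} = W_{xx}^{-1} W_{xy} = \hat\beta_{FE}
    ```

    In an unbalanced panel the group means of x_t depend on which periods each unit is observed in, so the between terms no longer vanish and the two estimators can differ.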

    Do I understand correctly that post is just a dummy that varies only across t, and not i? That's what I'm assuming.

    JW

    • #3
      Originally posted by Jeff Wooldridge
      I was able to reproduce the problem in a similar data set that is balanced. I included only year dummies and the outcome is a strong rejection using xtoverid. This is particularly troubling because we know RE and FE are numerically identical in this case; there are no restrictions to test. Here's what happens:
      JW
      It appears that xtoverid is testing the joint significance of the year dummies in your example. If you implement the test by means of an artificial regression, the mean-deviated year dummies drop out, which confirms your point that there are no restrictions to test. Only in the case where sigma_u=0 (e.g., with inclusion of firm dummies in the random effects regression) does xtoverid report that the RE estimates are degenerate.

      Code:
      webuse grunfeld, clear
      tab year, gen(d)
      xtreg invest d2-d20, re vce(cluster company)
      *GENERATE MEAN DEVIATED REGRESSORS
      foreach var of varlist d2-d20{
          bys company: egen m`var'=mean(`var')
          gen md`var'=`var' -m`var'
      }
      * REGRESSION WITH MEAN DEVIATED REGRESSORS
      xtreg invest d2-d20 md2-md20, re cluster(company)
      *TEST OF OVERIDENTIFYING RESTRICTIONS
      testparm md2-md20
      Res.:

      Code:
      . *TEST OF OVERIDENTIFYING RESTRICTIONS
      
      .
      . testparm md2-md20
      no such variables;
      the specified varlist does not identify any testable coefficients
      r(111);
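
      In equation form, the artificial regression run above can be sketched as follows. The notation is introduced here for illustration: $\bar{x}_i$ is the within-company mean of the regressors and $\gamma$ collects the coefficients on the md* variables.

      ```latex
      y_{it} = \alpha + x_{it}'\beta + (x_{it} - \bar{x}_i)'\gamma + u_i + \varepsilon_{it},
      \qquad H_0:\ \gamma = 0

      % Balanced panel with time-only regressors: x_{it} = x_t and \bar{x}_i = \bar{x} for all i, so
      x_{it} - \bar{x}_i = x_t - \bar{x}
      % which is an exact linear combination of x_t and the constant.
      % Every added term is therefore dropped as collinear and \gamma is not identified.
      ```

      That is consistent with testparm returning r(111): no coefficient on a mean-deviated dummy survives to be tested.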

      • #4
        Thanks Andrew. That's pretty weird. I should've noticed that the Wald test and the overid test were almost identical.

        • #5
          Thank you both for your quick and helpful responses. You are correct, Jeff. I should have explained the variable names. post is a dummy that is equal to 0 in the first half of the sample period (before the intervention) and 1 in the second half (after); t is a linear time trend with monthly intervals; and post_t is their interaction. RE and FE are indeed identical in the balanced case.

          Good catch, Andrew! I did not see how close the Wald test and xtoverid were either. Given that the Wald test and xtoverid are not identical in the following, should I interpret the xtoverid result as legitimate?

          Originally posted by Maxime Bercholz
          Code:
          . xtreg y post t post_t, re vce(cluster id)
           
          Random-effects GLS regression                   Number of obs     =     69,494
          Group variable: id                              Number of groups  =      2,383
           
          R-sq:                                           Obs per group:
               within  = 0.0078                                         min =          1
               between = 0.0004                                         avg =       29.2
               overall = 0.0046                                         max =         36
           
                                                          Wald chi2(3)      =     174.64
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
           
                                           (Std. Err. adjusted for 2,383 clusters in id)
          ------------------------------------------------------------------------------
                       |               Robust
                     y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  post |  -44.64032   8.956723    -4.98   0.000    -62.19518   -27.08547
                     t |  -.7876857   .3423413    -2.30   0.021    -1.458662    -.116709
                post_t |   .6493272   .4381844     1.48   0.138    -.2094984    1.508153
                 _cons |   483.5995   6.881025    70.28   0.000     470.1129     497.086
          -------------+----------------------------------------------------------------
               sigma_u |  230.67539
               sigma_e |  228.77046
                   rho |  .50414608   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
           
          . xtoverid
           
          Test of overidentifying restrictions: fixed vs random effects
          Cross-section time-series model: xtreg re  robust cluster(id)
          Sargan-Hansen statistic  14.543  Chi-sq(3)    P-value = 0.0023
          Given that in the balanced case xtoverid does not reject (see below), is it fair to assume the xtoverid result above is legitimate and indicates changes in the sample (attrition and replenishment) over time such that the panel effects end up correlated with these time controls?

          Code:
          . xtreg y post t post_t if T_i == 36, re vce(cluster id)
          
          Random-effects GLS regression                   Number of obs     =     49,572
          Group variable: id                              Number of groups  =      1,377
          
          R-sq:                                           Obs per group:
               within  = 0.0000                                         min =         36
               between = 0.0000                                         avg =       36.0
               overall = 0.0042                                         max =         36
          
                                                          Wald chi2(3)      =     145.71
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
                                           (Std. Err. adjusted for 1,377 clusters in id)
          ------------------------------------------------------------------------------
                       |               Robust
                     y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  post |  -40.81003   10.04302    -4.06   0.000    -60.49399   -21.12607
                     t |  -.4471694    .406588    -1.10   0.271    -1.244067    .3497285
                post_t |   .2865799   .5017187     0.57   0.568    -.6967708    1.269931
                 _cons |   502.6523   8.516781    59.02   0.000     485.9597    519.3449
          -------------+----------------------------------------------------------------
               sigma_u |  225.27168
               sigma_e |  223.83782
                   rho |  .50319266   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . xtoverid
          
          Test of overidentifying restrictions: fixed vs random effects
          Cross-section time-series model: xtreg re  robust cluster(id)
          Sargan-Hansen statistic   1.210  Chi-sq(1)    P-value = 0.2714
          One last question: does 'degenerated' here mean all regressors being invariant over the panels (households)?

          Originally posted by Jeff Wooldridge
          It doesn't know how to handle the degenerated case.
          Thank you for your help.

          Maxime

          • #6
            Let me add to my last post. I did not think through all of the implications of your responses. From what I understand, in the balanced case with regressors that vary only over time (not over households), there are no restrictions to test because FE and RE are identical, so whatever xtoverid gives is not meaningful. If this is correct, ignore my reference to the balanced case. In the unbalanced case, however, FE and RE are not identical, so the xtoverid result is meaningful (unless it isn't for other reasons). Did I get that right?

            • #7
              With time dummies included in the RE regression, you probably should not trust the output of xtoverid. It is most likely arbitrarily dropping constraints and testing the significance of the remaining ones. I gave you an illustration of how you can manually implement the test in #3, and you should do this to find out exactly what is going on. The following is a clearer example.

              Code:
              webuse grunfeld, clear
              *UNBALANCE THE PANEL
              drop if inlist(company, 2, 3) & inlist(time, 4, 5)
              xtset company year
              *RE REGRESSION ON THE ORIGINAL REGRESSORS
              xtreg invest mvalue kstock, re vce(cluster company)
              *XTOVERID COMMAND
              xtoverid
              *BY HAND
              *GENERATE MEAN DEVIATED REGRESSORS
              foreach var of varlist invest mvalue kstock{
                  bys company: egen m`var'=mean(`var')
                  gen md`var'=`var' -m`var'
              }
              * REGRESSION WITH MEAN DEVIATED REGRESSORS (WITH PREFIX "md")
              xtreg invest mvalue kstock mdmvalue mdkstock, re cluster(company)
              *TEST OF OVERIDENTIFYING RESTRICTIONS
              testparm mdmvalue mdkstock 
              Res.:

              Code:
              . xtoverid
              
              Test of overidentifying restrictions: fixed vs random effects
              Cross-section time-series model: xtreg re  robust cluster(company)
              Sargan-Hansen statistic   6.456  Chi-sq(2)    P-value = 0.0396
              
              
              
              . *TEST OF OVERIDENTIFYING RESTRICTIONS
              
              . 
              . testparm mdmvalue mdkstock
              
               ( 1)  mdmvalue = 0
               ( 2)  mdkstock = 0
              
                         chi2(  2) =    6.46
                       Prob > chi2 =    0.0396
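
              The same by-hand check could be applied to the model in #1 along the following lines. This is only a sketch: it assumes the data are already xtset on id, that y, post, t and post_t are named as in #1 and have no missing values within the estimation sample, and the m_ and md_ prefixes are made-up names for this illustration.

              ```stata
              * Sketch: auxiliary-regression version of the FE vs RE test for the model in #1
              * Assumes y, post, t, post_t and id exist as in #1; m_* and md_* are illustrative names
              foreach var of varlist post t post_t {
                  * household-specific mean of each regressor
                  bysort id: egen m_`var' = mean(`var')
                  * deviation from the household mean
                  generate md_`var' = `var' - m_`var'
              }
              * RE regression augmented with the mean-deviated regressors
              xtreg y post t post_t md_post md_t md_post_t, re vce(cluster id)
              * joint test of the added terms; any term xtreg omitted for collinearity is not tested
              testparm md_post md_t md_post_t
              ```

              Checking which md_ terms xtreg keeps or omits, and comparing the degrees of freedom of testparm with those reported by xtoverid, should show which restrictions are actually being tested.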

              • #8
                Thanks Andrew, that's really helpful! I better understand what you did in #3. I will try that.
