Clustering standard errors at industry versus company level

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#16

03 Aug 2020, 12:30

As a follow-up -- and something I now teach in every panel data short course -- if you have a true panel, put the unit fixed effects in manually, and then cluster by unit, you get the proper standard errors for the coefficients on time-varying covariates, but the standard errors for the fixed effects are garbage. As Eric said, you are trying to estimate the standard error of a mean without putting any restriction on the correlation within each unit. In time series, it is well known that one has to do some tapering -- a la Newey and West. You can't leave all correlations unrestricted. The standard errors on the fixed effects are not quite zero because of controlling for other covariates, but they are meaningless nonetheless.

This is why using -xtreg- is the best choice unless you know what to expect. Stata won't report standard errors for the fixed effects -- and properly so.
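
A minimal sketch of that comparison, for anyone who wants to see it on a small panel. The dataset (the Grunfeld investment panel loaded with -webuse-, which needs internet access) and the stored-estimate names fe_dummies and fe_xtreg are illustrative assumptions, not part of the original discussion:

```stata
* Sketch only: assumes the Grunfeld panel (10 companies, 20 years) is
* reachable through -webuse-; any true panel would serve the same purpose.
webuse grunfeld, clear
xtset company year

* Unit dummies entered by hand, standard errors clustered by unit.
* Read the standard errors on mvalue and kstock; ignore those on i.company.
regress invest mvalue kstock i.company, vce(cluster company)
estimates store fe_dummies

* Within estimator: same slope estimates, fixed effects absorbed and
* therefore not reported at all.
xtreg invest mvalue kstock, fe vce(cluster company)
estimates store fe_xtreg

* Compare only the coefficients on the time-varying covariates.
estimates table fe_dummies fe_xtreg, keep(mvalue kstock) b(%9.4f) se(%9.4f)
```

The slope estimates should coincide across the two commands. The clustered standard errors on the slopes should be close but need not be identical, because the two commands apply different finite-sample degrees-of-freedom adjustments.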
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#17

03 Aug 2020, 12:57

Professor Wooldridge, this is exactly why this caught me off guard: we do it all the time in panel data, and the consensus seems to be that we should cluster by the fixed effect (Arellano, 1987; Kezdi, 2003). What I forgot is that this approach gives correct inference on the other variables, not on the fixed effects.

I actually cannot recall this fact (that the standard errors on the fixed effects with clustering are identically zero, if you have only the fixed effects in the model and nothing else) being very conspicuously stated in any of your books.

I do recall you criticising the approach of Donald and Lang (2007) on the grounds that, in the standard two-sample problem, doing what they suggest leaves us with two observations from which to estimate two parameters. The same thing is happening here: in the two-sample problem, if we include two fixed effects and cluster by the fixed effects (that is, cluster by sample), we cannot carry out inference.


Joro Kolev

Join Date: Aug 2018
Posts: 3050

#18

03 Aug 2020, 13:04

Here is an illustration. With only the fixed effects in the model, the standard errors on the fixed effects are identically zero (up to machine precision):

Code:

. sysuse auto, clear
(1978 Automobile Data)

. reg price i.rep, cluster(rep)

Linear regression                               Number of obs     =         69
                                                F(0, 4)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0145
                                                Root MSE          =     2980.2

                                  (Std. Err. adjusted for 5 clusters in rep78)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   2.83e-11  5.0e+13   0.000     1403.125    1403.125
          3  |   1864.733   2.83e-11  6.6e+13   0.000     1864.733    1864.733
          4  |       1507   2.83e-11  5.3e+13   0.000         1507        1507
          5  |     1348.5   2.83e-11  4.8e+13   0.000       1348.5      1348.5
             |
       _cons |     4564.5   2.83e-11  1.6e+14   0.000       4564.5      4564.5
------------------------------------------------------------------------------
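
Why these zeros are mechanical can be sketched from the cluster-robust sandwich formula. The notation below is assumed for illustration and does not appear in the thread: $X_g$ and $\hat e_g$ are the regressor rows and OLS residuals for cluster $g$, and $c$ is a scalar finite-sample factor.

```latex
% Cluster-robust variance estimator
\hat V = c\,(X'X)^{-1}\Big(\sum_{g=1}^{G} X_g'\hat e_g\,\hat e_g' X_g\Big)(X'X)^{-1}

% With only a constant and the rep78 indicators, and clustering on rep78,
% every row of X_g is the same vector x_g, so
X_g'\hat e_g = x_g \sum_{i\in g}\hat e_i

% OLS on a full set of group indicators fits each group mean exactly, hence
\sum_{i\in g}\hat e_i = \sum_{i\in g}\,(y_i-\bar y_g) = 0
\;\Longrightarrow\; X_g'\hat e_g = 0 \ \text{ for every } g
\;\Longrightarrow\; \hat V = 0
```

Under this reading, the reported 2.83e-11 is floating-point rounding around an exact zero.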

However, if you have another regressor, the standard errors on the fixed effects are no longer identically zero. So if you are not particularly interested in the fixed effects, as I never have been, you can spend a lifetime running such regressions and not notice that there is something dodgy here:

Code:

. reg mpg price i.rep, cluster(rep)

Linear regression                               Number of obs     =         69
                                                F(0, 4)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.4241
                                                Root MSE          =     4.6251

                                  (Std. Err. adjusted for 5 clusters in rep78)
------------------------------------------------------------------------------
             |               Robust
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |  -.0008829   .0002458    -3.59   0.023    -.0015653   -.0002006
             |
       rep78 |
          2  |  -.6361411   .3448288    -1.84   0.139    -1.593539    .3212572
          3  |   .0797594   .4582726     0.17   0.870    -1.192609    1.352128
          4  |    1.99724   .3703569     5.39   0.006     .9689641    3.025515
          5  |   7.554265   .3314043    22.79   0.000     6.634139    8.474391
             |
       _cons |   25.03013   1.121761    22.31   0.000     21.91562    28.14464
------------------------------------------------------------------------------

Nothing looks suspicious in the above regression, except that the F-statistic for the overall significance of the model is missing and its degrees of freedom look all wrong -- I do not know how Stata comes up with 0 numerator and 4 denominator degrees of freedom here...
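
One way to see where the nonzero standard errors on the rep78 indicators come from in the regression with price is the following sketch. The notation is assumed for illustration: $p_g$ is the vector of price values in cluster $g$, $\hat e_g$ the OLS residuals in that cluster, $c$ a scalar finite-sample factor, and price is ordered first among the regressors.

```latex
% The OLS normal equations for the constant and the indicators still force
% the residuals to sum to zero within each rep78 group, so the cluster
% score is nonzero only in the price position:
X_g'\hat e_g = \big(p_g'\hat e_g,\; 0,\; \dots,\; 0\big)'

% The middle of the sandwich therefore has a single nonzero entry:
\sum_{g=1}^{G} X_g'\hat e_g\,\hat e_g' X_g = m\, e_1 e_1',
\qquad m = \sum_{g=1}^{G} \big(p_g'\hat e_g\big)^2

% With a = (X'X)^{-1} e_1, the first column of (X'X)^{-1},
\hat V = c\, m\, a\, a',
\qquad \operatorname{se}(\hat\beta_j) = \sqrt{c\,m}\;|a_j|
```

So the estimated variance matrix has rank one, and the standard errors on the indicators are inherited entirely from the price coefficient through $a$. That is consistent with the earlier remark that they are not quite zero once other covariates are controlled for, yet carry no information about the sampling variability of the fixed effects themselves.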


lal mohan kumar

Join Date: May 2019

Posts: 265
#19

03 Aug 2020, 22:31

Thanks, Eric. As you pointed out, in my case it will be 30 - 30 - 1 = -1, hence negative degrees of freedom.