A conceptual question about when should I add fixed effect and cluster the fixed effect?

John Williamss

Join Date: Mar 2021

Posts: 42
#1

31 Oct 2022, 02:34

Suppose I add a time-fixed effect to a panel data regression that I want to estimate using OLS. My question is conceptually when should I also cluster by time (in addition to adding fixed effects)?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17851

31 Oct 2022, 03:22

John:
let's assume that you are dealing with a short panel (N>T) and want to go -fe- via OLS (BTW: this approach is outperformed by -xtreg, fe-).
In that case you should cluster on -panelid- (in addition to adding fixed effects), as the epsilon term might be correlated across the observations belonging to the same panel.
Clustering on -timevar- only is not recommended (as you're mainly interested in -panelid-), whereas you might be willing to cluster your standard errors on both the N and T dimensions (let's assume that a given "shock" is expected to hit panels differently across time). You can get this double clustering via the community-contributed module -reghdfe-:

Code:

use https://www.stata-press.com/data/r17/nlswork.dta
xtreg ln_wage age i.year, fe vce(cluster idcode)
. xtreg ln_wage age i.year, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1060                                         min =          1
     Between = 0.0914                                         avg =        6.1
     Overall = 0.0805                                         max =         15

                                                F(15,4709)        =      69.49
corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0125992   .0123091     1.02   0.306    -.0115323    .0367308
             |
        year |
         69  |   .0748621   .0156425     4.79   0.000     .0441955    .1055287
         70  |   .0478697   .0265729     1.80   0.072    -.0042256    .0999649
         71  |   .0865577   .0385328     2.25   0.025     .0110155       .1621
         72  |   .0856757   .0505004     1.70   0.090    -.0133288    .1846802
         73  |   .0880069   .0626993     1.40   0.160    -.0349132    .2109269
         75  |   .0778607   .0865126     0.90   0.368    -.0917446     .247466
         77  |    .108365   .1111117     0.98   0.329    -.1094659    .3261959
         78  |   .1309518   .1237306     1.06   0.290    -.1116181    .3735217
         80  |   .1142649   .1480678     0.77   0.440    -.1760172    .4045471
         82  |   .1090451   .1724619     0.63   0.527    -.2290608    .4471511
         83  |   .1211272   .1846402     0.66   0.512    -.2408539    .4831083
         85  |   .1465637   .2092454     0.70   0.484    -.2636552    .5567825
         87  |   .1382642   .2341219     0.59   0.555    -.3207242    .5972527
         88  |   .1799741   .2500607     0.72   0.472    -.3102618      .67021
             |
       _cons |   1.203731    .235213     5.12   0.000     .7426037    1.664859
-------------+----------------------------------------------------------------
     sigma_u |   .4058746
     sigma_e |  .30300411
         rho |  .64212421   (fraction of variance due to u_i)
------------------------------------------------------------------------------

reghdfe ln_w age i.year, absorb(idcode) vce(cluster idcode year)
. reghdfe ln_w age i.year, absorb(idcode) vce(cluster idcode year)
(dropped 551 singleton observations)
(MWFE estimator converged in 1 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
warning: missing F statistic; dropped variables due to collinearity or too few clusters

HDFE Linear regression                            Number of obs   =     27,959
Absorbing 1 HDFE group                            F(  15,     14) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.6553
                                                  Adj R-squared   =     0.5949
Number of clusters (idcode)  =      4,159         Within R-sq.    =     0.1060
Number of clusters (year)    =         15         Root MSE        =     0.3030

                           (Std. err. adjusted for 15 clusters in idcode year)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0125992   .0109367     1.15   0.269    -.0108576    .0360561
             |
        year |
         69  |   .0748621   .0107168     6.99   0.000     .0518768    .0978474
         70  |   .0478697   .0232197     2.06   0.058    -.0019315    .0976709
         71  |   .0865577   .0354149     2.44   0.028     .0106003    .1625152
         72  |   .0856757   .0475462     1.80   0.093    -.0163007    .1876521
         73  |   .0880069   .0600766     1.46   0.165    -.0408447    .2168584
         75  |   .0778607   .0818643     0.95   0.358    -.0977207    .2534421
         77  |    .108365   .1034378     1.05   0.313    -.1134871    .3302171
         78  |   .1309518   .1155376     1.13   0.276    -.1168516    .3787553
         80  |   .1142649   .1367118     0.84   0.417    -.1789528    .4074827
         82  |   .1090451   .1562581     0.70   0.497    -.2260953    .4441855
         83  |   .1211272   .1662107     0.73   0.478    -.2353592    .4776136
         85  |   .1465637   .1877183     0.78   0.448    -.2560521    .5491794
         87  |   .1382642   .2093267     0.66   0.520    -.3106968    .5872253
         88  |   .1799741   .2237412     0.80   0.435     -.299903    .6598513
             |
       _cons |   1.205651   .2071379     5.82   0.000     .7613846    1.649918
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      idcode |      4159        4159           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

.
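
For readers wondering what the two-way option computes and why the log above prints a warning: the warning names the Cameron, Gelbach & Miller construction, which can be sketched as below. The notation is shorthand introduced here for illustration, not -reghdfe- output.

```latex
\widehat{V}_{\text{two-way}}
  = \widehat{V}_{\text{idcode}}
  + \widehat{V}_{\text{year}}
  - \widehat{V}_{\text{idcode}\,\cap\,\text{year}}
```

Each term is a one-way cluster-robust variance estimate; the last one clusters on the intersection of the two dimensions. Because one term is subtracted, the result is not guaranteed to be positive semi-definite, hence the adjustment mentioned in the warning. Note also that only 15 -year- clusters are available here, which is few for cluster-robust inference.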

Kind regards,
Carlo
(Stata 19.0)

John Williamss

Join Date: Mar 2021

Posts: 42
#3

31 Oct 2022, 11:04

I think there was a misunderstanding about time and firm fixed effects. My context is asset pricing. Each panel unit is a country; the LHS is the return of that country's stock index in a given quarter, and the RHS consists of macro factors such as GDP growth. So my question is whether it is necessary to cluster by time when we have already added a time fixed effect (here N=40 < T=300).

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#4

31 Oct 2022, 11:25

John:
for T>N panel datasets with -panelid- fixed effects, see -xtregar, fe-; note that it does not support clustered standard errors.
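
A minimal sketch of that suggestion, using the Grunfeld data shipped with Stata (10 companies, 20 years, so T>N) rather than your data; the variable names below belong to that dataset and serve only as an illustration:

```stata
* Illustration only: Grunfeld investment data (T > N), not the poster's data
webuse grunfeld, clear
xtset company year

* Fixed-effects model with AR(1) disturbances; there is no vce(cluster) option here
xtregar invest mvalue kstock, fe
```

The estimated autocorrelation coefficient appears as rho_ar in the output table.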

Last edited by Carlo Lazzaro; 31 Oct 2022, 11:27.

Kind regards,
Carlo
(Stata 19.0)