Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • 2 way clustering in fixed effects panel data regression

    Hi,

    I'm trying a fixed effects regression with panel data where I cluster across two dimensions. Following the discussion here, I tried two-way clustering in the two ways and got non-identical results. This has left me confused as to which method is the correct one. For demonstration, I use a sample data given by
    Code:
    use "http://www.stata-press.com/data/r14/nlswork.dta", clear
    Method 1:
    Code:
    use "http://www.stata-press.com/data/r14/nlswork.dta", clear
    reghdfe ln_wage hours age i.race i.year,a(idcode) vce (cluster occ_code year)
    (dropped 548 singleton observations)
    note: 2bn.race is probably collinear with the fixed effects (all partialled-out values are close to zero; tol =
    >  1.0e-09)
    note: 3bn.race is probably collinear with the fixed effects (all partialled-out values are close to zero; tol =
    >  1.0e-09)
    (MWFE estimator converged in 1 iterations)
    Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
    warning: missing F statistic; dropped variables due to collinearity or too few clusters
    note: 2.race omitted because of collinearity
    note: 3.race omitted because of collinearity
    
    HDFE Linear regression                            Number of obs   =     27,775
    Absorbing 1 HDFE group                            F(  16,     12) =          .
    Statistics robust to heteroskedasticity           Prob > F        =          .
                                                      R-squared       =     0.6555
                                                      Adj R-squared   =     0.5948
    Number of clusters (occ_code) =         13        Within R-sq.    =     0.1066
    Number of clusters (year)    =         15         Root MSE        =     0.3031
    
                             (Std. Err. adjusted for 13 clusters in occ_code year)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           hours |   .0007912   .0013155     0.60   0.559     -.002075    .0036573
             age |   .0123773    .008567     1.44   0.174    -.0062886    .0310433
                 |
            race |
          black  |          0   7.23e-10     0.00   1.000    -1.57e-09    1.57e-09
          other  |          0   1.85e-09     0.00   1.000    -4.04e-09    4.04e-09
                 |
            year |
             69  |    .074031   .0289008     2.56   0.025     .0110616    .1370005
             70  |   .0475113   .0335979     1.41   0.183    -.0256922    .1207148
             71  |   .0867244   .0357381     2.43   0.032     .0088578    .1645911
             72  |    .085603   .0452419     1.89   0.083    -.0129706    .1841766
             73  |   .0889179   .0482453     1.84   0.090    -.0161995    .1940353
             75  |   .0790484   .0628044     1.26   0.232    -.0577907    .2158875
             77  |    .110587   .0820362     1.35   0.203    -.0681546    .2893286
             78  |    .133743   .0908718     1.47   0.167    -.0642497    .3317356
             80  |   .1167207    .097275     1.20   0.253    -.0952233    .3286646
             82  |   .1137017   .1244713     0.91   0.379    -.1574979    .3849013
             83  |   .1261455   .1254465     1.01   0.334    -.1471791      .39947
             85  |    .151498   .1362975     1.11   0.288    -.1454688    .4484649
             87  |   .1422716   .1548657     0.92   0.376    -.1951518     .479695
             88  |   .1837026   .1757012     1.05   0.316    -.1991175    .5665227
                 |
           _cons |   1.181475   .1358746     8.70   0.000     .8854294     1.47752
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
          idcode |      4149           0        4149     |
    -----------------------------------------------------+
    
    clear


    Method 2:
    Code:
    use "http://www.stata-press.com/data/r14/nlswork.dta", clear
    egen occ_year=group(occ_code year)
    areg ln_wage hours age i.race i.year,a(idcode) cluster(occ_year)
    note: 2.race omitted because of collinearity
    note: 3.race omitted because of collinearity
    
    Linear regression, absorbing indicators         Number of obs     =     28,323
                                                    F(  16,    175)   =      17.61
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.6648
                                                    Adj R-squared     =     0.5980
                                                    Root MSE          =     0.3031
    
                                 (Std. Err. adjusted for 176 clusters in occ_year)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           hours |   .0007912   .0005501     1.44   0.152    -.0002944    .0018768
             age |   .0123773   .0152226     0.81   0.417    -.0176661    .0424208
                 |
            race |
          black  |          0  (omitted)
          other  |          0  (omitted)
                 |
            year |
             69  |    .074031   .0427223     1.73   0.085    -.0102861    .1583482
             70  |   .0475113     .05194     0.91   0.362    -.0549982    .1500208
             71  |   .0867244   .0607844     1.43   0.155    -.0332405    .2066893
             72  |    .085603   .0725549     1.18   0.240    -.0575924    .2287983
             73  |   .0889179   .0877736     1.01   0.312    -.0843132     .262149
             75  |   .0790484    .113218     0.70   0.486    -.1444001    .3024969
             77  |    .110587   .1410838     0.78   0.434    -.1678578    .3890318
             78  |    .133743   .1569636     0.85   0.395    -.1760424    .4435283
             80  |   .1167207   .1853747     0.63   0.530    -.2491371    .4825784
             82  |   .1137017   .2147943     0.53   0.597     -.310219    .5376224
             83  |   .1261455   .2315117     0.54   0.587    -.3307688    .5830597
             85  |    .151498   .2606477     0.58   0.562    -.3629195    .6659155
             87  |   .1422716   .2896411     0.49   0.624    -.4293677    .7139108
             88  |   .1837026   .2959214     0.62   0.536    -.4003315    .7677367
                 |
           _cons |   1.179466   .2917784     4.04   0.000     .6036086    1.755324
    -------------+----------------------------------------------------------------
          idcode |   absorbed                                    (4697 categories)
    
    
    
    clear
    As can be seen, these two methods give nonidentical results, with the standard error from Method 1 being higher than that from Method 2. I also found the Method 2 standard error to be higher than the robust standard error (result not shown here).

    Can anyone suggest which amongst these two is the correct way to do two-way clustering and why?

    Thanks,

  • #2
    Titir:
    I think you should prefer the community-contributed programme -reghdfe- to accomplish what you have in mind.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Originally posted by Carlo Lazzaro View Post
      Titir:
      I think you should prefer the community-contributed programme -reghdfe- to accomplish what you have in mind.
      Thank you Carlo for your response. Can you please tell me why reghdfe is more suited compared to the other method? I tried looking it up but could not really understand in which ways they are different. That is, if one is clustering the standard error along occ_code and year, what is the other doing?

      Thanks,

      Comment


      • #4
        Titir:
        the main differences between -areg- and the community-contributed programme -reghdfe- are:
        - -areg- is not conceived from a number of groups that increase with sample size;
        - -areg- can't cluster the standard errors on more than one -clusterid- at a time.
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          Titir:
          the main differences between -areg- and the community-contributed programme -reghdfe- are:
          - -areg- is not conceived from a number of groups that increase with sample size;
          - -areg- can't cluster the standard errors on more than one -clusterid- at a time.
          thank you Carlo for your response. Much appreciated.

          Comment

          Working...
          X