
  • Issue with multi‑way clustering code

    I am an undergraduate management student learning how to conduct research by reading academic papers and studying the code they provide. In a recent paper, the authors mention “two‑way clustering at the province and industry levels” and use this code:
    reghdfe Y X $Firm_Controls $City_Controls, absorb(year firm year##industry) cluster(province##industry)
    However, I remember that two‑way clustering should be specified as cluster(province industry), not cluster(province##industry). Did the authors make a mistake, or am I misunderstanding something?
    I am very grateful to anyone who can answer my question.

  • #2
    ## is the Stata notation for a full factorial interaction (the main effects plus their interaction). As reghdfe is rather flexible, it not only absorbs the fixed effects of year and industry, but also all potential interactions between the two. For clustering, the handling is apparently a bit different, as the documentation states:
    Each clustervar permits interactions of the type var1#var2. This is equivalent to using egen group(var1 var2) to create a new variable, but more convenient and faster. For instance, vce(cluster firm#year) will estimate SEs with one-way clustering i.e. where all observations of a given firm and year are clustered together.
    https://scorreia.com/help/reghdfe.html
    Whether or not this kind of clustering makes sense is a different question, I am not an expert on this.
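
The equivalence described in the quoted documentation can be checked directly. The following is a minimal sketch, assuming reghdfe is installed and using the nlswork data with a simulated industry variable (the same setup that appears later in this thread); the names id_ind and se_hash are made up for illustration. If the documentation holds, the two standard errors should agree.

```stata
webuse nlswork, clear
set seed 07172025
* Simulated industry identifier (an assumption for illustration only)
gen industry = runiformint(1, 40)

* Build the combined group identifier by hand, as the help file describes
egen long id_ind = group(idcode industry)

* One-way clustering via the # shortcut
reghdfe ln_wage tenure hours, absorb(idcode) vce(cluster idcode#industry)
scalar se_hash = _se[tenure]

* One-way clustering on the hand-built group variable
reghdfe ln_wage tenure hours, absorb(idcode) vce(cluster id_ind)

display "SE(tenure), cluster idcode#industry: " se_hash
display "SE(tenure), cluster id_ind:          " _se[tenure]
```
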
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      You are correct in your statement:

      Originally posted by Fu Wenqiang
      However, I remember that two‑way clustering should be specified as cluster(province industry)
      In the authors' code, province##industry expands to province industry province#industry. So, depending on whether provinces are nested within industries or vice versa, the authors are effectively using either one-way clustering at the province-industry level (if fully nested), or three-way clustering at the province, industry, and province-industry levels (if partially nested).
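
For reference, the distinction between two-way clustering and one-way clustering on the interaction can be written out using the standard Cameron, Gelbach and Miller decomposition. This is a sketch of the general formula only: the symbols G (province) and H (industry) are introduced here for illustration, and reghdfe's finite-sample adjustments are not shown.

```latex
% Two-way cluster-robust variance, clustering on G (province) and H (industry):
\widehat{V}_{\text{two-way}}
  = \widehat{V}_{G} + \widehat{V}_{H} - \widehat{V}_{G \cap H}
% One-way clustering on the interaction uses only the intersection term:
\widehat{V}_{\text{one-way}} = \widehat{V}_{G \cap H}
```

Here each term on the right is an ordinary one-way cluster-robust variance estimate, so clustering on province#industry alone corresponds to the last term only and allows no correlation across industries within a province, or across provinces within an industry.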



      • #4
        Thanks to your explanation, I believe I now understand. Thank you very much for your response.



        • #5
          Thank you very much for your response.



          • #6
            Actually, if the authors used reghdfe from https://github.com/sergiocorreia/reghdfe, the three-way expansion does not occur: province##industry is equivalent to province#industry. I am therefore confident that their code implements one-way clustering at the province-industry level, as the example below demonstrates. Their description of it as two-way clustering is thus misleading.

            Code:
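            webuse nlswork, clear
            set seed 07172025
            * Simulated industry identifier with 40 categories
            gen industry=runiformint(1, 40)
            * COMPARE: (1) ## syntax, (2) explicit three-way clustering, (3) # syntax
            * (id is an abbreviation of the variable idcode)
            reghdfe ln_wage tenure hours, absorb(id) cluster(id##industry)
            reghdfe ln_wage tenure hours, absorb(id) cluster(id industry id#industry)
            reghdfe ln_wage tenure hours, absorb(id) cluster(id#industry)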
            Res.:

            Code:
            . reghdfe ln_wage tenure hours, absorb(id) cluster(id##industry)
            (dropped 556 singleton observations)
            (MWFE estimator converged in 1 iterations)
            
            HDFE Linear regression                            Number of obs   =     27,480
            Absorbing 1 HDFE group                            F(   2,  23336) =    1189.07
            Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                              R-squared       =     0.6540
                                                              Adj R-squared   =     0.5926
                                                              Within R-sq.    =     0.0975
            Number of clusters (idcode##industry) =     25,122Root MSE        =     0.3037
            
                              (Std. err. adjusted for 25,122 clusters in idcode##industry)
            ------------------------------------------------------------------------------
                         |               Robust
                 ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                  tenure |   .0342778   .0007033    48.74   0.000     .0328993    .0356563
                   hours |  -.0003775   .0003964    -0.95   0.341    -.0011544    .0003994
                   _cons |   1.585676   .0149167   106.30   0.000     1.556439    1.614914
            ------------------------------------------------------------------------------
            
            Absorbed degrees of freedom:
            -----------------------------------------------------+
             Absorbed FE | Categories  - Redundant  = Num. Coefs |
            -------------+---------------------------------------|
                  idcode |      4142           0        4142     |
            -----------------------------------------------------+
            
            .
            . reghdfe ln_wage tenure hours, absorb(id) cluster(id industry id#industry)
            (dropped 556 singleton observations)
            (MWFE estimator converged in 1 iterations)
            Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
            
            HDFE Linear regression                            Number of obs   =     27,480
            Absorbing 1 HDFE group                            F(   2,     39) =     661.47
            Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                              R-squared       =     0.6540
            Number of clusters (idcode)  =      4,142         Adj R-squared   =     0.5926
            Number of clusters (industry) =         40        Within R-sq.    =     0.0975
            Number of clusters (idcode#industry) =     25,122 Root MSE        =     0.3037
            
                   (Std. err. adjusted for 40 clusters in idcode industry idcode#industry)
            ------------------------------------------------------------------------------
                         |               Robust
                 ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                  tenure |   .0342778   .0009468    36.20   0.000     .0323626     .036193
                   hours |  -.0003775   .0005256    -0.72   0.477    -.0014405    .0006855
                   _cons |   1.585676    .019421    81.65   0.000     1.546394    1.624959
            ------------------------------------------------------------------------------
            
            Absorbed degrees of freedom:
            -----------------------------------------------------+
             Absorbed FE | Categories  - Redundant  = Num. Coefs |
            -------------+---------------------------------------|
                  idcode |      4142        4142           0    *|
            -----------------------------------------------------+
            * = FE nested within cluster; treated as redundant for DoF computation
            
            .
            . reghdfe ln_wage tenure hours, absorb(id) cluster(id#industry)
            (dropped 556 singleton observations)
            (MWFE estimator converged in 1 iterations)
            
            HDFE Linear regression                            Number of obs   =     27,480
            Absorbing 1 HDFE group                            F(   2,  23336) =    1189.07
            Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                              R-squared       =     0.6540
                                                              Adj R-squared   =     0.5926
                                                              Within R-sq.    =     0.0975
            Number of clusters (idcode#industry) =     25,122 Root MSE        =     0.3037
            
                               (Std. err. adjusted for 25,122 clusters in idcode#industry)
            ------------------------------------------------------------------------------
                         |               Robust
                 ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                  tenure |   .0342778   .0007033    48.74   0.000     .0328993    .0356563
                   hours |  -.0003775   .0003964    -0.95   0.341    -.0011544    .0003994
                   _cons |   1.585676   .0149167   106.30   0.000     1.556439    1.614914
            ------------------------------------------------------------------------------
            
            Absorbed degrees of freedom:
            -----------------------------------------------------+
             Absorbed FE | Categories  - Redundant  = Num. Coefs |
            -------------+---------------------------------------|
                  idcode |      4142           0        4142     |
            -----------------------------------------------------+
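
For contrast with the three runs above, here is a sketch of what two-way clustering proper looks like in the same simulated setup, with the two cluster variables listed separately as the original poster suggested. It assumes reghdfe is installed. The output header should then report a cluster count for each of the two dimensions rather than a single count for the interaction.

```stata
webuse nlswork, clear
set seed 07172025
* Simulated industry identifier (an assumption for illustration only)
gen industry = runiformint(1, 40)

* Two-way clustering: separate the cluster variables with a space
reghdfe ln_wage tenure hours, absorb(idcode) cluster(idcode industry)

* List the stored results to see how the clustering was recorded
ereturn list
```

Applied to the paper's specification, the analogous option would be cluster(province industry).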



            • #7
              Indeed, your evidence is persuasive. Based on the paper, I have confirmed that the provinces and industries in the authors' data are not fully nested; their code therefore does not actually perform the “two‑way clustering at the province and industry levels” that they describe.
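
Whether one identifier is nested within another can also be checked from the data. The sketch below is one possible approach, again using the nlswork data with a simulated industry variable as a stand-in; the variable names multi and one_per_id are made up for illustration, and with the paper's data one would substitute province and industry.

```stata
webuse nlswork, clear
set seed 07172025
* Simulated industry identifier (an assumption for illustration only)
gen industry = runiformint(1, 40)

* Flag ids that appear in more than one industry
bysort idcode (industry): gen byte multi = industry[1] != industry[_N]

* Tag one observation per id so that each id is counted once
egen byte one_per_id = tag(idcode)

* A count of zero would mean idcode is fully nested within industry
count if multi & one_per_id
display "ids spanning more than one industry: " r(N)
```
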
