
  • Two-way Fixed Effects Model with units of observation not nested within clusters.

    Hi, my name is David and this is my first time posting to this forum.

    I'm actually coming from R, so I hope you are not too harsh on the code below for not being very elegant.

    Say I have a panel of four banks, which are overseen by five administrative authorities ("Admin_A", ..., "Admin_E").
    As the data show, in 2019 Banks C and D are overseen by authorities C and D ("Admin_C", "Admin_D"), which merge in 2020 to form authority E ("Admin_E"). Therefore, the units of observation (the banks) are not "nested within clusters", because they appear in multiple clusters. This reflects the real world and is factually correct, i.e., there is no error in the data.

    I am wondering about the correct econometric approach in such a situation (assuming that clustered standard errors are the way to go).
    • Do I use the "nonest" option and force Stata to compute clustered standard errors anyway?
    • Do I "fix" the clusters as they are in the first period (2019) and assume that Banks C and D continue to be overseen by Admin C and D? This does not reflect the real data (it assumes that Admin E does not exist and therefore ignores any effects the merger has on the supervised banks).
    • Are there any other appropriate solutions that I have not yet come up with? (The literature is not very clear on this issue.)
    Any hint is highly appreciated. (Side note: unlike Stata, R does not complain that "panels are not nested within clusters". I can provide R code if desired.)

    Below is a minimal working example.

    Best wishes,
    David.

    Code:
    version 17 // Should work on older versions as well.
    clear all
    /*     Example Dataset 
        -> The actual numbers do not matter. */ 
    
    /* Generate Data */ 
    
    input str20 firm year y cvar1 cvar2 str20 clust 
    "Bank_A" 2019 0.090 0.324 0.234 "Admin_A"
    "Bank_A" 2020 0.808 0.234 0.182 "Admin_A"
    "Bank_A" 2021 1.592 8.289 1.582 "Admin_A"
    "Bank_B" 2019 8.294 5.283 1.534 "Admin_B"
    "Bank_B" 2020 7.284 4.272 1.643 "Admin_B"
    "Bank_B" 2021 5.298 2.524 -5.25 "Admin_B"
    "Bank_C" 2019 8.252 2.553 1.53  "Admin_C"
    "Bank_C" 2020 6.153 6.535 8.535 "Admin_E" 
    "Bank_C" 2021 5.255 2.645 1.564 "Admin_E"
    "Bank_D" 2019 4.253 5.256 2.654 "Admin_D" 
    "Bank_D" 2020 5.256 0.532 5.285 "Admin_E" 
    "Bank_D" 2021 6.594 5.352 1.564 "Admin_E"
    end 
    
    
    encode firm, gen(firm_enc)
    
    summarize 
    // Set Panel IDs
    xtset firm_enc year
    // Run TWFE regression and force clustered standard errors.
    xtreg y cvar1 cvar2 i.year , fe vce(cl clust) nonest
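
The non-nesting described above, and the "fixed cluster" idea from the second bullet, can be sketched as follows. This is only an illustration on the toy data: the variable names `n_auth`, `clust_first` and `clust_last` are hypothetical, and the final-authority mapping assumes that authorities only ever merge and never split. Whether either mapping is econometrically appropriate is exactly the open question of this thread.

```stata
clear all
input str20 firm year y cvar1 cvar2 str20 clust
"Bank_A" 2019 0.090 0.324 0.234 "Admin_A"
"Bank_A" 2020 0.808 0.234 0.182 "Admin_A"
"Bank_A" 2021 1.592 8.289 1.582 "Admin_A"
"Bank_B" 2019 8.294 5.283 1.534 "Admin_B"
"Bank_B" 2020 7.284 4.272 1.643 "Admin_B"
"Bank_B" 2021 5.298 2.524 -5.25 "Admin_B"
"Bank_C" 2019 8.252 2.553 1.53  "Admin_C"
"Bank_C" 2020 6.153 6.535 8.535 "Admin_E"
"Bank_C" 2021 5.255 2.645 1.564 "Admin_E"
"Bank_D" 2019 4.253 5.256 2.654 "Admin_D"
"Bank_D" 2020 5.256 0.532 5.285 "Admin_E"
"Bank_D" 2021 6.594 5.352 1.564 "Admin_E"
end

* Diagnostic: how many distinct authorities does each bank appear under?
egen byte pair_tag = tag(firm clust)        // 1 on one obs per bank-authority pair
bysort firm: egen n_auth = total(pair_tag)  // > 1 means the bank is not nested
tabulate firm n_auth

* Option from the question: freeze each bank's authority at its first year.
bysort firm (year): gen str20 clust_first = clust[1]

* Assumed alternative: use the last observed (post-merger) authority instead,
* so that the pre-merger authorities are pooled into their successor.
bysort firm (year): gen str20 clust_last = clust[_N]

encode firm, gen(firm_enc)
encode clust_last, gen(clust_last_enc)
xtset firm_enc year

* Banks are now nested within clusters, so nonest is not needed.
* (With three clusters the toy estimates are of course not meaningful.)
xtreg y cvar1 cvar2 i.year, fe vce(cluster clust_last_enc)
```

Replacing `clust_last` with `clust_first` in the `encode` step gives the first-period variant; both yield time-invariant cluster identifiers, which is what -xtreg, fe- checks for.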

  • #2
    David:
    welcome to this forum.
    Have you considered the community-contributed module -reghdfe-?
    Code:
    . egen panelid=group( firm)
    
    . egen admin=group( cl)
    
    . xtset panelid year
    
    . reghdfe y cvar1 cvar2 , abs(panelid year) vce(cl panelid admin )
    (MWFE estimator converged in 2 iterations)
    Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
    
    HDFE Linear regression                            Number of obs   =         12
    Absorbing 2 HDFE groups                           F(   2,      3) =       0.61
    Statistics robust to heteroskedasticity           Prob > F        =     0.5992
                                                      R-squared       =     0.8701
                                                      Adj R-squared   =     0.5237
    Number of clusters (panelid) =          4         Within R-sq.    =     0.1335
    Number of clusters (admin)   =          5         Root MSE        =     1.9131
    
                              (Std. err. adjusted for 4 clusters in panelid admin)
    ------------------------------------------------------------------------------
                 |               Robust
               y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           cvar1 |   .1455099   .2979472     0.49   0.659    -.8026911    1.093711
           cvar2 |   .0248832   .4843503     0.05   0.962    -1.516535    1.566302
           _cons |   4.352654   .4500577     9.67   0.002     2.920369    5.784938
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
         panelid |         4           4           0    *|
            year |         3           0           3     |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo Lazzaro,

      thank you very much for your quick reply. Your solution to the problem would be what is known in the empirical literature as "multiway clustering", correct? That is, to cluster standard errors by firm and by cluster (the firm level and the administrative level). Can a single firm be a cluster, and does that make sense?

      Would you consider this approach a "technical solution" to the problem, or a "valid solution" if one tries to capture the effects (in terms of standard errors) caused by the cluster mergers?

      I replicated your Stata results in R, as I was not aware of the SSC "package" reghdfe (sorry if the terminology is incorrect). R calculates some of the statistics differently from Stata and I did not apply a "small sample correction", but the coefficients check out.

      Best wishes,
      David.
      [Attached image: stataforum_equivalent_table.JPG]



      • #4
        David:
        1) "multiway clustering" is correct;
        2) reading your original post once more, I notice that, while a panel (-firm-, in your case) can be considered a cluster, you need at least 30 clusters for cluster-robust standard errors to work out fine (https://cameron.econ.ucdavis.edu/res...February.pdf);
        3) conversely, if the number of your panels is <30, you should stick with the default standard errors (unless you have detected heteroskedasticity and this nuisance cannot be fixed by a transformation of the regressand, say, logging it).
        Kind regards,
        Carlo
        (Stata 19.0)
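
A quick way to see how many clusters each candidate dimension offers, before deciding whether cluster-robust standard errors are sensible, is to count the distinct values. The sketch below uses the toy data from post #1; the scalar names `n_firms` and `n_clust` and the tag variables are hypothetical.

```stata
clear all
input str20 firm year y cvar1 cvar2 str20 clust
"Bank_A" 2019 0.090 0.324 0.234 "Admin_A"
"Bank_A" 2020 0.808 0.234 0.182 "Admin_A"
"Bank_A" 2021 1.592 8.289 1.582 "Admin_A"
"Bank_B" 2019 8.294 5.283 1.534 "Admin_B"
"Bank_B" 2020 7.284 4.272 1.643 "Admin_B"
"Bank_B" 2021 5.298 2.524 -5.25 "Admin_B"
"Bank_C" 2019 8.252 2.553 1.53  "Admin_C"
"Bank_C" 2020 6.153 6.535 8.535 "Admin_E"
"Bank_C" 2021 5.255 2.645 1.564 "Admin_E"
"Bank_D" 2019 4.253 5.256 2.654 "Admin_D"
"Bank_D" 2020 5.256 0.532 5.285 "Admin_E"
"Bank_D" 2021 6.594 5.352 1.564 "Admin_E"
end

* Tag one observation per distinct bank and per distinct authority.
egen byte firm_tag  = tag(firm)
egen byte clust_tag = tag(clust)

* Count the tagged observations, i.e. the number of distinct clusters.
count if firm_tag
scalar n_firms = r(N)
count if clust_tag
scalar n_clust = r(N)

display "banks: " n_firms "   authorities: " n_clust
```

On the toy data both counts are far below the threshold mentioned in point 2).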



        • #5
          Dear Carlo,

          thank you for your answer. In the dataset I work with, I observe roughly 7000 distinct "firms" over 20 years. The number of administrative bodies is roughly 210 and decreasing to roughly 190 (i.e. some of them merge), thereby causing the problem of firms appearing in multiple clusters (The minimal working example above just tries to illustrate the problem).

          Given the information above (N = 7000, T = 20, clusters 210 -> 190 over time), would you recommend or prefer multiway clustering, from an econometric perspective, over, say, fixing the clusters in the first year (and acting as if the cluster mergers did not happen)?

          Once again, your help and expertise are highly appreciated.

          David.



          • #6
            David:
            1) given your clarification, cluster-robust standard errors are mandatory;
            2) I would probably keep things simpler and cluster on the -panelid- only;
            3) I would add, as usual, -i.year- among the predictors, regardless of the mergers of administrative authorities.
            Kind regards,
            Carlo
            (Stata 19.0)
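
Points 2) and 3) can be sketched with official Stata commands on the toy data from post #1. This is only an illustration of the syntax (four clusters are far too few for reliable inference); `firm_enc` plays the role of -panelid-.

```stata
clear all
input str20 firm year y cvar1 cvar2 str20 clust
"Bank_A" 2019 0.090 0.324 0.234 "Admin_A"
"Bank_A" 2020 0.808 0.234 0.182 "Admin_A"
"Bank_A" 2021 1.592 8.289 1.582 "Admin_A"
"Bank_B" 2019 8.294 5.283 1.534 "Admin_B"
"Bank_B" 2020 7.284 4.272 1.643 "Admin_B"
"Bank_B" 2021 5.298 2.524 -5.25 "Admin_B"
"Bank_C" 2019 8.252 2.553 1.53  "Admin_C"
"Bank_C" 2020 6.153 6.535 8.535 "Admin_E"
"Bank_C" 2021 5.255 2.645 1.564 "Admin_E"
"Bank_D" 2019 4.253 5.256 2.654 "Admin_D"
"Bank_D" 2020 5.256 0.532 5.285 "Admin_E"
"Bank_D" 2021 6.594 5.352 1.564 "Admin_E"
end

* Numeric panel identifier, then declare the panel structure.
encode firm, gen(firm_enc)
xtset firm_enc year

* Bank and year fixed effects, standard errors clustered on the bank only.
* A bank is trivially nested within itself, so nonest is not needed.
xtreg y cvar1 cvar2 i.year, fe vce(cluster firm_enc)

* With the community-contributed reghdfe installed, the analogous call
* (following the syntax in post #2) would be:
*   reghdfe y cvar1 cvar2, absorb(firm_enc year) vce(cluster firm_enc)
```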
