  • Difference between FE estimation with or without clustered errors

    Dear Statalist, I am estimating the effect of a sectoral variable (that varies across firms) on a firm-level variable. However, I do not fully understand why the regression with firm and sector-year FEs (first table below) gives such different results from the same regression with clustered errors (second table below). I thought that sector-year FEs would control for the correlation among firms belonging to the same sector, but given how different the results are in terms of statistical significance, I am not sure. Could someone explain why it would be necessary to cluster the errors even though I am controlling for sector-year FEs? Thanks in advance for your help.

    Code:
    reghdfe y L.c.x1 , absorb(firm year#sec, resid) 
    (dropped 282 singleton observations)
    (MWFE estimator converged in 8 iterations)
    
    HDFE Linear regression                            Number of obs   =    105,737
    Absorbing 2 HDFE groups                           F(   1,  94172) =       5.11
                                                      Prob > F        =     0.0237
                                                      R-squared       =     0.4577
                                                      Adj R-squared   =     0.3911
                                                      Within R-sq.    =     0.0001
                                                      Root MSE        =  1.043e+05
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |
             L1. |   1.611094   .7124217     2.26   0.024     .2147551    3.007433
                 |
           _cons |  -68204.06   38008.35    -1.79   0.073      -142700    6291.881
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
            firm |     11146           0       11146     |
        year#sec |       456          38         418     |
    -----------------------------------------------------+
    Code:
    reghdfe y L.c.x1 , absorb(firm year#sec, resid) cluster(sec)
    (dropped 282 singleton observations)
    (MWFE estimator converged in 8 iterations)
    
    HDFE Linear regression                            Number of obs   =    105,737
    Absorbing 2 HDFE groups                           F(   1,     37) =       0.14
    Statistics robust to heteroskedasticity           Prob > F        =     0.7093
                                                      R-squared       =     0.4577
                                                      Adj R-squared   =     0.3908
                                                      Within R-sq.    =     0.0001
    Number of clusters (sec)     =         38         Root MSE        =  1.043e+05
    
                                       (Std. Err. adjusted for 38 clusters in sec)
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |
             L1. |   1.611094   4.288293     0.38   0.709    -7.077813        10.3
                 |
           _cons |  -68204.06   228776.2    -0.30   0.767    -531748.7    395340.5
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
            firm |     11146       11146           0    *|
        year#sec |       456         456           0    *|
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation

  • #2
    The coefficients are unaffected by the method employed to calculate standard errors.

    In the first table, the conventional OLS standard errors are most likely heavily biased downwards (i.e. overly optimistic), because they assume homoscedasticity and independence across observations, which rules out serial correlation.

    In your second table, you are correcting not only for heteroscedasticity and possible dependence between observations within sectors, but also for serial correlation. The less you assume, the less you can deliver, but the more likely you are to be correct. This explains why, by assuming a lot less about the data-generating process, the second table produces estimated standard errors that are closer to the true standard deviation of the estimate, which is quite large relative to the unrealistically low standard errors of your first table.

    That being said, we now run into another problem. The cluster-robust standard errors of the second table are most likely biased downwards as well, because you have a small number of clusters (38, i.e. fewer than 50). If I were you, I would cluster by firm.
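
    As a rough sketch of that suggestion, here is the command from the first post with only the clustering variable changed. I have not run it on your data, and it assumes reghdfe is installed and the panel is already declared with xtset so that the L. operator works:

    ```stata
    * Same specification as in the first post, but clustering by firm
    * rather than by sector. Assumes the data are already xtset
    * (e.g. xtset firm year).
    reghdfe y L.c.x1, absorb(firm year#sec) vce(cluster firm)
    ```

    The coefficient on L.x1 should not change; only the standard errors, and hence the t statistics and confidence intervals, should differ.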

    Abadie, Athey, Imbens and Wooldridge (2017) have written a brilliant paper on this topic, which I suggest you read: https://www.nber.org/papers/w24003

    Hope this helps!



    • #3
      Dear Maxence, thanks for your help and for the suggested paper. I thought that the sector-year FEs would control for the within-sector correlation, mainly because that approach does not assume that the observations are independent, but I will read the suggested paper in depth. I also thought that when several levels are used (here firms and sectors), one should cluster at the highest level, which in this case is the sector, even though I agree that 38 groups may not be very many. Thanks again.



      • #4
        Given that you have multiple levels in your data, you may want to try a mixed-effects model (Stata's mixed command) as a robustness check.
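
        A minimal sketch of what such a check might look like, assuming firms are nested within sectors and the panel can be xtset by firm and year. The variable x1_lag is a hypothetical helper name, and random intercepts for sector and firm are only one possible specification:

        ```stata
        * Hypothetical robustness check: three-level random-intercept model.
        xtset firm year
        generate x1_lag = L.x1                  // lagged regressor (hypothetical name)
        mixed y x1_lag i.year || sec: || firm:  // firms nested within sectors
        ```

        This replaces the firm and sector-year fixed effects with random intercepts and year dummies, so the estimates need not match the reghdfe results.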

        Sector-year FEs allow year shocks to be sector-specific, which is good and will absorb shocks that are common to all firms in a sector in a given year. However, they almost certainly will not remove heteroscedasticity or the remaining intra-group (here, intra-sector) correlation, for example serial correlation within firms; that is what clustering the standard errors addresses.
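
        The point can be illustrated with a small simulation on made-up data. All variable names and parameter values below are assumptions chosen for illustration, not the poster's data. The regressor and the error are both serially correlated within firm, and both regressions include firm and sector-year FEs; only the standard errors differ:

        ```stata
        * Simulated panel: 200 hypothetical firms in 10 sectors, 10 years each.
        clear all
        set seed 12345
        set obs 200
        generate firm = _n
        generate sec  = mod(firm, 10) + 1
        generate a    = rnormal()              // firm-specific effect
        expand 10
        bysort firm: generate year = _n

        * AR(1) regressor and error within firm (rho = 0.8, an assumed value).
        generate x = rnormal()
        generate e = rnormal()
        bysort firm (year): replace x = 0.8*x[_n-1] + rnormal() if _n > 1
        bysort firm (year): replace e = 0.8*e[_n-1] + rnormal() if _n > 1
        generate y = 1 + 0.5*x + a + e

        * Firm FE plus sector-year FEs, conventional versus firm-clustered SEs.
        xtset firm year
        quietly xtreg y x i.year#i.sec, fe
        estimates store conventional
        quietly xtreg y x i.year#i.sec, fe vce(cluster firm)
        estimates store clustered
        estimates table conventional clustered, keep(x) se
        ```

        The two columns should show the same coefficient on x. In a setup like this, one would typically expect the clustered standard error to be the larger of the two, because the sector-year FEs do nothing about the serial correlation within firms.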
