  • reghdfe degrees of freedom

    I am running a regression with fixed effects, time effects, and two-way clustered standard errors (clustering on both person and time). I am using reghdfe to do this, with the noabsorb option because I'd like to estimate all of the time and fixed effects coefficients. However, I am confused by the degrees of freedom reported in the output.
    The table reports the following:


    Code:
    HDFE Linear regression                            Number of obs   =      2,547
    Absorbing 1 HDFE group                            F( 163,     71) =          .
    Statistics robust to heteroskedasticity           Prob > F        =          .
                                                      R-squared       =     0.4645
                                                      Adj R-squared   =     0.4279
    Number of clusters (person)  =         91         Within R-sq.    =     0.4645
    Number of clusters (yq)      =         72         Root MSE        =     0.2933
    
                                 (Std. Err. adjusted for 72 clusters in person yq)
    So are the degrees of freedom used to compute the t-statistics based on the yq number of clusters? And why would that be the case when I cluster on both person and yq? Should the degrees of freedom instead correspond to 91 + 72? Any clarification on this would be great. Thank you!

  • #2
    So are the degrees of freedom used to compute the t-statistics based on the yq number of clusters?
    No, reghdfe allows multi-way clustering, so it is person-yq clusters. The command uses tuples from SSC to create multi-way clusters, but the algorithm for a pair of cluster variables is such that the number of clusters will be the minimum of the number of variables in the two cluster variables. Anyway, to check that person-yq clusters are different from yq clusters, compare

    Code:
    reghdfe..., cluster(person yq) noa
    with

    Code:
    reghdfe..., cluster(yq) noa
    and note the difference.
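
As a rough sketch of that comparison, the lines below run both specifications on Stata's Grunfeld example data (chosen here purely for illustration; substitute your own variables) and print the residual degrees of freedom and one standard error after each. The scalar names are made up for this example, and reghdfe is assumed to be installed from SSC.

```stata
* Illustrative sketch only: dataset, variables, and scalar names are assumptions.
* Requires reghdfe (ssc install reghdfe).
webuse grunfeld, clear

* Two-way clustering on company and year
quietly reghdfe invest mvalue kstock, cluster(company year) noabsorb
scalar df_twoway = e(df_r)
scalar se_twoway = _se[mvalue]

* One-way clustering on year only
quietly reghdfe invest mvalue kstock, cluster(year) noabsorb
scalar df_oneway = e(df_r)
scalar se_oneway = _se[mvalue]

* Print both sets of results side by side for comparison
display "two-way: df_r = " scalar(df_twoway) "   se(mvalue) = " scalar(se_twoway)
display "one-way: df_r = " scalar(df_oneway) "   se(mvalue) = " scalar(se_oneway)
```

Storing the results in scalars before the second call matters because each estimation command overwrites e().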



    • #3
      Hi Andrew,
      Thanks for the answer. So, just to clarify--when Stata reports significance of a coefficient at the 5% level, for example, it is using a two-tailed test that is based on critical values that are determined using the degrees of freedom associated with the person-yq cluster number above?
      Thanks,
      Vicki



      • #4
        In other words--when I am reporting degrees of freedom in my results table, should I be using "df_r", i.e. the residual degrees of freedom, since those correspond to the aforementioned number of clusters?



        • #5
          So, just to clarify--when Stata reports significance of a coefficient at the 5% level, for example, it is using a two-tailed test that is based on critical values that are determined using the degrees of freedom associated with the person-yq cluster number above?
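          Exactly, that minus 1 (so 71 in your output, matching the denominator degrees of freedom in the F( 163, 71) header). This is a t-test, and you can replicate it using Stata's test command. test reports an F-statistic, but for a single restriction we can recover the t-statistic by noting that \(|t|=\sqrt{F}\).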

          Code:
          . webuse grunfeld
          
          . reghdfe invest mvalue kstock, clust(company year) noa
          (MWFE estimator converged in 1 iterations)
          
          HDFE Linear regression                            Number of obs   =        200
          Absorbing 1 HDFE group                            F(   2,      9) =      57.99
          Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                            R-squared       =     0.8124
                                                            Adj R-squared   =     0.8105
          Number of clusters (company) =         10         Within R-sq.    =     0.8124
          Number of clusters (year)    =         20         Root MSE        =    94.4084
          
                                    (Std. Err. adjusted for 10 clusters in company year)
          ------------------------------------------------------------------------------
                       |               Robust
                invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                mvalue |   .1155622   .0163518     7.07   0.000     .0785719    .1525524
                kstock |   .2306785   .0784739     2.94   0.016     .0531582    .4081988
                 _cons |  -42.71437   19.50561    -2.19   0.056    -86.83912    1.410381
          ------------------------------------------------------------------------------
          
          . test mvalue
          
           ( 1)  mvalue = 0
          
                 F(  1,     9) =   49.95
                      Prob > F =    0.0001
          
          . di sqrt(r(F))
          7.0672637
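
The same numbers can also be rebuilt by hand from the stored results, which makes it explicit that the reference distribution is a t with e(df_r) degrees of freedom. This is only a sketch: the scalar names are invented here, and it assumes e(df_r) holds the denominator degrees of freedom shown in the F header.

```stata
* Illustrative sketch only: scalar names are assumptions.
* Requires reghdfe (ssc install reghdfe).
webuse grunfeld, clear
quietly reghdfe invest mvalue kstock, cluster(company year) noabsorb

* t-statistic and residual degrees of freedom from the stored results
scalar tstat = _b[mvalue] / _se[mvalue]
scalar dfr   = e(df_r)

* Two-tailed p-value and 5% critical value from the t(dfr) distribution
scalar pval = 2 * ttail(scalar(dfr), abs(scalar(tstat)))
scalar crit = invttail(scalar(dfr), 0.025)

* 95% confidence interval built from the same critical value
scalar ci_lo = _b[mvalue] - scalar(crit) * _se[mvalue]
scalar ci_hi = _b[mvalue] + scalar(crit) * _se[mvalue]

display "t = " scalar(tstat) "   df = " scalar(dfr) "   p = " scalar(pval)
display "95% CI = [" scalar(ci_lo) ", " scalar(ci_hi) "]"
```

If the assumption holds, the t-statistic, degrees of freedom, and interval printed here should line up with the mvalue row and the F( 1, 9) line in the output above.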



          • #6
            Read "minimum of the number of variables in the two cluster variables" as "minimum of the number of groups in the two cluster variables" in #2.
