reghdfe degrees of freedom

Victoria Consolvo

Join Date: Mar 2019

Posts: 31
#1

19 Feb 2020, 09:25

I am running a regression with fixed effects and time effects, using two-way clustered standard errors (clustered on both person and time). I am using reghdfe with the noabsorb option because I'd like to estimate all of the time and fixed effects coefficients. However, I am confused by the degrees of freedom reported in the output.
The table reports the following:

Code:

HDFE Linear regression                            Number of obs   =      2,547
Absorbing 1 HDFE group                            F( 163,     71) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.4645
                                                  Adj R-squared   =     0.4279
Number of clusters (person)  =         91         Within R-sq.    =     0.4645
Number of clusters (yq)      =         72         Root MSE        =     0.2933

                          (Std. Err. adjusted for 72 clusters in person yq)

So are the degrees of freedom used to compute the t-statistics based on the number of yq clusters? And why is that the case when I have both person and yq clusters? Should the degrees of freedom correspond to 91 + 72? Any clarification on this would be great. Thank you!
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#2

19 Feb 2020, 10:38

So are the degrees of freedom used to compute the t-statistics based on the yq number of clusters?

No, reghdfe allows multi-way clustering, so it is person-yq clusters. The command uses tuples from SSC to create multi-way clusters, but the algorithm for a pair of cluster variables is such that the number of clusters will be the minimum of the number of variables in the two cluster variables. Anyway, to check that person-yq clusters are different from yq clusters, compare

Code:

reghdfe..., cluster(person yq) noa

with

Code:

reghdfe..., cluster(yq) noa

and note the difference.
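A quick arithmetic check of the rule described above, reading "number of variables" as "number of groups" as corrected in #6: the cluster count used for inference is the smaller of the two group counts, and the residual degrees of freedom are that count minus 1 (the "minus 1" is confirmed later in this thread). It is not a sum such as 91 + 72. The Python sketch below only illustrates that rule as stated in this thread; it is not reghdfe's implementation, and the function name `multiway_cluster_df` is hypothetical. It reproduces the denominator degrees of freedom of both F-statistics posted here.

```python
# Sketch of the rule as described in this thread (an assumption, not
# reghdfe's source code): with multi-way clustering, inference uses the
# smallest number of clusters across the cluster variables, and the
# residual degrees of freedom are that number minus 1.


def multiway_cluster_df(cluster_counts: dict[str, int]) -> tuple[int, int]:
    """Return (clusters used, residual df) given groups per cluster variable."""
    g_min = min(cluster_counts.values())  # smallest group count wins
    return g_min, g_min - 1


# Group counts copied from the two outputs in this thread
print(multiway_cluster_df({"person": 91, "yq": 72}))      # (72, 71), cf. F( 163, 71)
print(multiway_cluster_df({"company": 10, "year": 20}))   # (10, 9),  cf. F( 2, 9)
```

The second value in each pair matches the second argument of the reported F-statistic, which is the residual degrees of freedom (e(df_r)).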
Victoria Consolvo

Join Date: Mar 2019

Posts: 31
#3

19 Feb 2020, 12:45

Hi Andrew,
Thanks for the answer. So, just to clarify--when Stata reports significance of a coefficient at the 5% level, for example, it is using a two-tailed test that is based on critical values that are determined using the degrees of freedom associated with the person-yq cluster number above?
Thanks,
Vicki
Victoria Consolvo

Join Date: Mar 2019

Posts: 31
#4

19 Feb 2020, 12:48

In other words, when I report degrees of freedom in my results table, should I be using "df_r" (the residual degrees of freedom), since that corresponds to the aforementioned number of clusters?

Andrew Musau

Join Date: Oct 2014
Posts: 10195

19 Feb 2020, 14:12

So, just to clarify--when Stata reports significance of a coefficient at the 5% level, for example, it is using a two-tailed test that is based on critical values that are determined using the degrees of freedom associated with the person-yq cluster number above?

Exactly, that minus 1. This is a t-test, and you can replicate it using Stata's test command. test reports an F-statistic, but with a single restriction \(F = t^2\), so we can recover the t-statistic (up to its sign) by simply noting \(|t|=\sqrt{F}\).

Code:

. webuse grunfeld

. reghdfe invest mvalue kstock, clust(company year) noa
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =        200
Absorbing 1 HDFE group                            F(   2,      9) =      57.99
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.8124
                                                  Adj R-squared   =     0.8105
Number of clusters (company) =         10         Within R-sq.    =     0.8124
Number of clusters (year)    =         20         Root MSE        =    94.4084

                          (Std. Err. adjusted for 10 clusters in company year)
------------------------------------------------------------------------------
             |               Robust
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1155622   .0163518     7.07   0.000     .0785719    .1525524
      kstock |   .2306785   .0784739     2.94   0.016     .0531582    .4081988
       _cons |  -42.71437   19.50561    -2.19   0.056    -86.83912    1.410381
------------------------------------------------------------------------------

. test mvalue

 ( 1)  mvalue = 0

       F(  1,     9) =   49.95
            Prob > F =    0.0001

. di sqrt(r(F))
7.0672637
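The same numbers can be checked outside Stata. Below is a minimal Python sketch, standard library only, that takes df = 9 (10 company clusters minus 1) and reproduces the t-statistic for mvalue, the relation \(|t|=\sqrt{F}\), a two-tailed p-value, and the 95% confidence interval from the table. The helpers `t_pdf`, `t_cdf` and `t_quantile` are hypothetical illustrations (Simpson's-rule integration and bisection), not anything Stata or reghdfe uses internally; the coefficient, standard error and F value are copied from the rounded output above, so agreement is only up to that rounding.

```python
import math

# Hypothetical helpers (not part of Stata or reghdfe): Student's t CDF via
# Simpson's rule and its inverse via bisection, standard library only.


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, n: int = 4000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry (n must be even)."""
    a = abs(t)
    h = a / n
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Inverse CDF for 0.5 < p < 1, by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Values copied from the reghdfe and test output above (rounded)
df = 9                        # 10 company clusters - 1
b, se = 0.1155622, 0.0163518  # mvalue coefficient and robust std. err.
F = 49.95                     # F(1, 9) from `test mvalue`

t_stat = b / se                                 # the "t" column, about 7.07
abs_t_from_F = math.sqrt(F)                     # recovers |t|, not its sign
p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))  # two-tailed p, well below 0.001
t_crit = t_quantile(0.975, df)                  # 5% two-tailed critical value
ci = (b - t_crit * se, b + t_crit * se)         # 95% confidence interval

print(f"t = {t_stat:.4f}, sqrt(F) = {abs_t_from_F:.4f}")
print(f"p = {p_two_sided:.6f}, t_crit(df={df}) = {t_crit:.4f}")
print(f"95% CI = [{ci[0]:.7f}, {ci[1]:.7f}]")
```

Using 9 rather than, say, 10 + 20 - 1 degrees of freedom is what makes the critical value (about 2.262) and hence the interval match the table.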


Andrew Musau

Join Date: Oct 2014

Posts: 10195
#6

21 Feb 2020, 04:16

Read "minimum of the number of variables in the two cluster variables" as "minimum of the number of groups in the two cluster variables" in #2.
