DF in clustered regression linear coefficient test

Dimitriy V. Masterov

Join Date: Mar 2014
Posts: 609

17 Jan 2019, 19:25

I am not sure I understand why the denominator degrees of freedom are 36 rather than 390 in the following Wald test after a clustered regression with N = 395, 5 parameters, and 37 clusters. I ran the same analysis in R and got the same F statistic, but R used F(2, 390) rather than F(2, 36). I am not sure which is correct.

Code:

. use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

. regress api00 acs_k3 acs_46 full enroll, cluster(dnum)

Linear regression                               Number of obs     =        395
                                                F(4, 36)          =      31.18
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3849
                                                Root MSE          =      112.2

                                  (Std. Err. adjusted for 37 clusters in dnum)
------------------------------------------------------------------------------
             |               Robust
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      acs_k3 |   6.954381   6.901117     1.01   0.320    -7.041734     20.9505
      acs_46 |   5.966015   2.531075     2.36   0.024     .8327565    11.09927
        full |   4.668221   .7034641     6.64   0.000      3.24153    6.094913
      enroll |  -.1059909   .0429478    -2.47   0.018    -.1930931   -.0188888
       _cons |  -5.200407   121.7856    -0.04   0.966     -252.193    241.7922
------------------------------------------------------------------------------

. test acs_k3=enroll=0

 ( 1)  acs_k3 - enroll = 0
 ( 2)  acs_k3 = 0

       F(  2,    36) =    3.95
            Prob > F =    0.0281

Last edited by Dimitriy V. Masterov; 17 Jan 2019, 19:30.

Tags: cluster, wald

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

17 Jan 2019, 21:49

With the cluster robust vce, which you specified in your command, the denominator df is the number of clusters - 1, which is 36 in your data.
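
A minimal sketch of how one could check this directly, assuming the same dataset and model as in #1. It relies on the stored results e(N_clust) and e(df_r) left behind by regress, and r(F), r(df), r(df_r), and r(p) left behind by test; the recomputed p-value should agree with the Prob > F of 0.0281 shown in #1.

```stata
* Same data and model as in the original post
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 acs_k3 acs_46 full enroll, cluster(dnum)

* Number of clusters and the residual df Stata will use for t and F tests
display "clusters: " e(N_clust) "   residual df: " e(df_r)

* Joint Wald test; test picks up e(df_r) as its denominator df
test acs_k3 = enroll = 0
display "F = " r(F) ", numerator df = " r(df) ", denominator df = " r(df_r)

* Recompute the p-value by hand from the upper tail of F(r(df), r(df_r))
display "recomputed p = " Ftail(r(df), r(df_r), r(F))
```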
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#3

17 Jan 2019, 23:11

I realize that is where the 36 comes from. Any intuition why that is, or which choice is correct?
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

18 Jan 2019, 01:59

There is not much intuition; the degrees of freedom Stata uses are a zoo. As it happens, the degrees of freedom Stata uses with clustering are (G-1), where G is the number of clusters.

This is a problem only if you are numerically comparing different commands within Stata, or the same estimation across software packages (as you did with Stata vs. R). Stata has given me plenty of headaches along these lines...

As far as econometrics is concerned, the "degrees of freedom" are not a meaningful concept in the context of robust and clustered standard errors, because there is no small sample theory behind these estimators. They "work" as your sample size goes to infinity (robust), or as your number of clusters goes to infinity (clustered standard errors).
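
To illustrate the large-sample point, here is a rough sketch using the rounded F = 3.95 with q = 2 restrictions from the test in #1. The asymptotic reference distribution for q*F is chi-squared with q degrees of freedom, and the F-based p-value drifts toward that value as the denominator df grows. The denominator df values in the loop are arbitrary choices of mine, picked only to show the convergence.

```stata
* Rounded Wald statistic and number of restrictions from the test in #1
local F 3.95
local q 2

* Asymptotic p-value: q*F referred to chi2(q)
display "chi2(`q') p-value: " %6.4f chi2tail(`q', `q'*`F')

* F-based p-values for increasing (illustrative) denominator df
foreach d of numlist 36 360 3600 36000 {
    display "F(`q', `d') p-value: " %6.4f Ftail(`q', `d', `F')
}
```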

In my view, the degrees of freedom you say R is reporting (N-K = 390) are patently insane, and very, very misleading.

My personal view is that your degrees of freedom (again, a meaningless concept in this case, but if you insist on it) should be (G-K), the number of clusters minus the number of regressors. What Stata does, (G-1), is not totally crazy either.
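
For a sense of how much the choice matters here, a quick sketch comparing the three candidate denominator df on the rounded F = 3.95 from #1. I am assuming K = 5, i.e. counting the constant along with the four regressors, so G-K = 32; with K = 4 it would be 33. The values in the comments are approximate, from my own arithmetic on the rounded statistic.

```stata
* Rounded F statistic from the test in #1, with 2 restrictions
local F 3.95

* Stata's choice: G - 1 = 37 - 1 (approx. 0.0281, matching the output in #1)
display "G-1 = 36:  p = " %6.4f Ftail(2, 36, `F')

* G - K = 37 - 5, assuming K counts the constant (approx. 0.0293)
display "G-K = 32:  p = " %6.4f Ftail(2, 32, `F')

* What R reported: N - K = 395 - 5 (approx. 0.0200)
display "N-K = 390: p = " %6.4f Ftail(2, 390, `F')
```

With only 37 clusters the cluster-based choices are the more conservative ones, and (G-1) versus (G-K) makes little practical difference in this example.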

Last edited by Joro Kolev; 18 Jan 2019, 02:02.