  • DF in clustered regression linear coefficient test

    I am not sure I understand why the denominator degrees of freedom are 36 rather than 390 in the following Wald test in a clustered regression with N = 395, 5 parameters, and N_clusters = 37. I tried the same analysis in R and got the same F statistic, but R used F(2, 390) rather than F(2, 36). I am not sure which is correct.

    Code:
    . use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
    
    . regress api00 acs_k3 acs_46 full enroll, cluster(dnum)
    
    Linear regression                               Number of obs     =        395
                                                    F(4, 36)          =      31.18
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.3849
                                                    Root MSE          =      112.2
    
                                      (Std. Err. adjusted for 37 clusters in dnum)
    ------------------------------------------------------------------------------
                 |               Robust
           api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          acs_k3 |   6.954381   6.901117     1.01   0.320    -7.041734     20.9505
          acs_46 |   5.966015   2.531075     2.36   0.024     .8327565    11.09927
            full |   4.668221   .7034641     6.64   0.000      3.24153    6.094913
          enroll |  -.1059909   .0429478    -2.47   0.018    -.1930931   -.0188888
           _cons |  -5.200407   121.7856    -0.04   0.966     -252.193    241.7922
    ------------------------------------------------------------------------------
    
    . test acs_k3=enroll=0
    
     ( 1)  acs_k3 - enroll = 0
     ( 2)  acs_k3 = 0
    
           F(  2,    36) =    3.95
                Prob > F =    0.0281
    Last edited by Dimitriy V. Masterov; 17 Jan 2019, 19:30.

  • #2
    With the cluster robust vce, which you specified in your command, the denominator df is the number of clusters - 1, which is 36 in your data.
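
    To see how much the choice matters here, the two tail probabilities can be compared directly. The following is only a rough sketch in Python rather than Stata: the helper name f_tail_df1_2 is made up for illustration, and it relies on the fact that with 2 numerator degrees of freedom the F upper tail has the closed form (1 + 2f/df2)^(-df2/2). The F statistic is the rounded 3.95 from the output above, so the results are approximate.

```python
def f_tail_df1_2(f_stat: float, df2: float) -> float:
    """Upper-tail probability P(F > f_stat) for an F(2, df2) distribution.

    Valid only for 2 numerator degrees of freedom, where the tail
    reduces to (1 + 2*f/df2) ** (-df2/2).
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


F_STAT = 3.95                              # rounded F statistic from `test`
N_OBS, N_PARAMS, N_CLUSTERS = 395, 5, 37   # values reported in the thread

df_cluster = N_CLUSTERS - 1   # 36, what Stata uses with cluster()
df_resid = N_OBS - N_PARAMS   # 390, what R reported

p_cluster = f_tail_df1_2(F_STAT, df_cluster)
p_resid = f_tail_df1_2(F_STAT, df_resid)

print(f"F(2, {df_cluster}): p = {p_cluster:.4f}")
print(f"F(2, {df_resid}): p = {p_resid:.4f}")
```

    With 36 denominator degrees of freedom this reproduces the Prob > F of about 0.028 shown by Stata; with 390 the p-value drops to roughly 0.020, so the same F statistic looks somewhat more significant.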

    • #3
      I realize that is where the 36 comes from. Any intuition why that is, or which choice is correct?

      • #4
        There is not much intuition; the degrees of freedom Stata uses are a zoo. As it happens, the degrees of freedom Stata uses with clustering are (G-1), where G is the number of clusters.

        This is a problem only if you are numerically comparing different commands within Stata, or the same estimation across software packages (as you did, Stata vs. R). Stata has given me plenty of headaches along these lines...

        As far as econometrics is concerned, "degrees of freedom" is not a meaningful concept in the context of robust and clustered standard errors, because there is no small-sample theory behind these estimators. They "work" as your sample size goes to infinity (robust standard errors), or as your number of clusters goes to infinity (clustered standard errors).

        In my view, what you wrote that R is reporting as degrees of freedom is patently insane, and very, very misleading.

        My personal view is that your degrees of freedom (again, a meaningless concept in this case, but if you insist on it) should be (G-K), the number of clusters minus the number of regressors. What Stata does, (G-1), is not totally crazy either.
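
        For the regression in this thread, the three candidates work out as follows. This is just the arithmetic, assuming K = 5 counts the constant along with the four slopes (the "5 parameters" in the original post), and N and G are the 395 observations and 37 clusters reported in the output:

```latex
\begin{aligned}
G - 1 &= 37 - 1 = 36   && \text{(Stata with clustering)} \\
G - K &= 37 - 5 = 32   && \text{(clusters minus regressors)} \\
N - K &= 395 - 5 = 390 && \text{(what R reported)}
\end{aligned}
```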
        Last edited by Joro Kolev; 18 Jan 2019, 02:02.
