  • Testing for joint significance of FEs with clustered se

    Dear Statalisters,

    I have an FE estimation with clustered standard errors, of the type xtreg y x if ..., cluster(panelid) fe, and I need to test whether the fixed effects are jointly significant.
    Because I have clustered standard errors, Stata does not provide the F-statistic directly. My understanding of why is that the respective statistic is no longer F-distributed, since (quoting the Stata FAQs) "When you have clustering, the observations are no longer independent; thus the joint distribution function for the sample is no longer the product of the distribution functions for each observation."

    I found an earlier suggestion by Kit Baum on the Stata list for how one can do this (see here http://www.stata.com/statalist/archi.../msg00544.html and here http://www.stata.com/statalist/archi.../msg00373.html ), but also a discussion of why one should not apply this procedure (http://www.stata.com/statalist/archi.../msg01127.html), which is in line with the quote above and which made me concerned.

    So, in short, is there any way for me to actually do the testing?
    In particular, would a Wald test adjusted for heteroscedasticity (through an adjusted variance-covariance matrix), with a chi-squared distribution with the respective degrees of freedom as the (limiting) distribution, make any sense? (In my understanding, the procedure suggested above is a version of such a test, but I am not fully sure of the exact connection.) Or am I talking complete nonsense here?

    Some information about my data, just in case: I have different samples for different regressions. They are all unbalanced panels, ranging from roughly 60 to 140 panelids (countries) over 120 time periods (quarters), and the average number of observations per panelid varies between 50 and 105 time periods. Clustering is at the panelid level. The variables are already in first differences.

    Finally, I realize that you may be puzzled by why I would need this procedure at all (as opposed to, say, testing between FE and RE), but this is a request from a referee, so my freedom of choice here is rather limited.

    Thank you in advance,
    Regards,
    Elena


  • #2
    The so-called 'robustified' F-test you obtain from running -testparm- after specifying the robust option should be sufficient to tell you whether the fixed effects are significant, and hence to guide the choice between pooled OLS and fixed effects. With such a large number of countries and time periods, I would be skeptical if someone claimed that the fixed effects were not significant.

    I think it should be OK to report these in the paper and add a short footnote describing them as 'robustified' F-tests à la Wooldridge.
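
    To make this concrete, here is a minimal sketch using the placeholder names from your post (y, x, panelid, and the if condition stand in for your actual variables and sample restriction):

    Code:
    reg y x i.panelid if conditions, vce(robust)
    testparm i.panelid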


    Reference:
    Wooldridge, J. M. Introductory Econometrics: A Modern Approach, 5th ed., Chapter 8.
    Last edited by Andrew Musau; 17 Sep 2015, 08:57.



    • #3
      Dear Andrew,

      Many thanks for your answer!

      I tried this option (also after talking to some of my colleagues), and it did not exactly work. Two things happened: first, Stata dropped some of my linear constraints; second, it returned quite large (based on my uneducated feeling) F-statistics.

      I did some more googling, and here is my interpretation of what I found: apparently, the cluster-robust standard error (CRSE) estimator only has as many degrees of freedom as min(number of clusters M-1, number of parameters) (see p. 40 here http://cameron.econ.ucdavis.edu/rese...ober152013.pdf). So it is only able to test M-1 linear restrictions (which is the number of restrictions for joint significance of the FEs) when I have M-1 other parameters.
      Further, even when it is able to test all M-1 linear restrictions (which did not happen in my regressions, because I do not have that many parameters), it does not behave well (see the last paragraph on p. 33 here http://www.stata.com/meeting/13uk/nichols_crse.pdf).

      Just to make sure I did what you advised, here is how I proceeded: I ran clustered OLS with dummies for the FEs instead of xtreg ..., fe, so that I could use testparm afterwards, that is

      xi: reg y x i.panelid if conditions, cluster(panelid)
      xi: testparm i.panelid

      Bottom line: I am still puzzled about how (or whether) I can proceed with this task...



      • #4
        See also the middle of p. 18 in http://cameron.econ.ucdavis.edu/rese...ober152013.pdf, where G stands for the number of clusters and m for the number of regressors.
        [Attached image: 1.jpg]



        • #5
          Hi Elena, sorry for the delay in replying. The heteroskedasticity-robust F-statistic is the one you obtain by running -testparm- after specifying vce(robust) in your regression; this uses the White sandwich estimator. Once you use a clustered sandwich estimator (cluster(clustervar)), you run into the problems that you reference above. So my point was: if your goal is to determine whether the fixed effects are significant in the presence of heteroskedasticity, the heteroskedasticity-robust F-statistic for the joint hypothesis is sufficient. I illustrate its computation below using the Grunfeld data set, along with the homoskedasticity-only F-statistic.


          The homoskedasticity-only F-statistic is based on the residual sums of squares of two regressions:

          F = [(RRSS - URSS) / (N - 1)] / [URSS / (NT - N - K)]

          where RRSS is the residual sum of squares from the restricted regression (no fixed effects), URSS is the residual sum of squares from the unrestricted regression (with fixed effects), N is the number of cross-sectional units, T is the number of time periods, and K is the number of independent variables. In the Grunfeld example below, N = 10, T = 20 and K = 1, so the numerator and denominator degrees of freedom are N - 1 = 9 and NT - N - K = 189.

          Code:
          . webuse grunfeld,clear
          
          . qui reg invest mvalue i.company
          
          . testparm i.company
          
           ( 1)  2.company = 0
           ( 2)  3.company = 0
           ( 3)  4.company = 0
           ( 4)  5.company = 0
           ( 5)  6.company = 0
           ( 6)  7.company = 0
           ( 7)  8.company = 0
           ( 8)  9.company = 0
           ( 9)  10.company = 0
          
                 F(  9,   189) =   15.97
                      Prob > F =    0.0000
          Manually from two regressions:

          Code:
          
          . qui reg invest mvalue
          
          . scalar rss1 = e(rss)
          
          . scalar p1= e(df_m)+1
          
          . qui reg invest mvalue i.company
          
          . scalar rss2= e(rss)
          
          . scalar p2= e(df_m)+1
          
          . scalar N= e(N)
          
          . scalar df_n = p2-p1
          
          . scalar df_d= N-p2
          
          . scalar F = ((rss1-rss2)/df_n)/(rss2/df_d)
           
          . di df_n
          9
          
          . di df_d
          189
          
          . di F
          15.974004
          You need matrix notation to illustrate the computation of the heteroskedasticity-robust F-statistic, but this is already programmed into the Stata command -testparm- (see the sketch after the output below).
          Chapter 18 of Stock and Watson (Introduction to Econometrics) is a good reference for the details of the computation.



          Code:
          . qui reg invest mvalue  i.company, vce(robust)
          
          . testparm i.company
          
           ( 1)  2.company = 0
           ( 2)  3.company = 0
           ( 3)  4.company = 0
           ( 4)  5.company = 0
           ( 5)  6.company = 0
           ( 6)  7.company = 0
           ( 7)  8.company = 0
           ( 8)  9.company = 0
           ( 9)  10.company = 0
          
                 F(  9,   189) =   61.75
                      Prob > F =    0.0000
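
          For completeness, here is a sketch of the underlying matrix computation (my own illustration, not Stata's internal code; it assumes the coefficient ordering produced by the regression above: mvalue first, then the company dummies with level 1 as the base, then _cons). The Wald statistic W divided by the number of restrictions is the robust F that -testparm- reports, and W itself is the quantity with the limiting chi-squared distribution you ask about in #1:

          Code:
          qui reg invest mvalue i.company, vce(robust)
          mata:
          b  = st_matrix("e(b)")'      // coefficient column vector
          V  = st_matrix("e(V)")       // robust (White) VCE
          bs = b[|3 \ 11|]             // 2.company ... 10.company (assumed positions)
          Vs = V[|3,3 \ 11,11|]        // corresponding block of the VCE
          W  = bs' * invsym(Vs) * bs   // Wald statistic for H0: all 9 dummies = 0
          W/9                          // the robust F reported by -testparm-
          chi2tail(9, W)               // asymptotic chi-squared p-value
          end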
          The fact that the F-statistics are large is overwhelming evidence of the existence of fixed effects. In this case, there is no doubt that pooled OLS would yield biased and inconsistent estimates of the regression coefficients.
          Last edited by Andrew Musau; 19 Sep 2015, 19:09.



          • #6
            Hi Andrew,

            Many thanks again!
            So, basically, what you are saying is that if my goal is to check the joint significance of the FEs, I can only do it by assuming general heteroscedasticity rather than clustering. Is my understanding correct?



            • #7
              Elena, as long as m (the number of regressors) is less than G-1, the total number of clusters less one (and in your case you potentially have 100+ clusters), it makes no sense to perform this test of joint significance of the dummies. You have the Cameron reference; you should point this out to the referee and refer him or her to the Cameron paper. My suggestion is to use the clustered estimator to obtain the coefficients, and to perform the test assuming general heteroskedasticity (and, in your reply to the referee, indicate why the clustered test is not feasible).
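
              In practice (a sketch, again with the placeholder names y, x, panelid and the if condition standing in for your actual setup), that would look like:

              Code:
              * coefficients and standard errors to report: clustered at the panel level
              reg y x i.panelid if conditions, cluster(panelid)
              * joint test of the fixed effects: refit with the robust (White) VCE
              qui reg y x i.panelid if conditions, vce(robust)
              testparm i.panelid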

              Since we have a data set, let us see what happens when you have m < G-1 and you go ahead and perform the test.



              Code:
              . reg invest mvalue i.company, cluster(company)
              
              Linear regression                                      Number of obs =     200
                                                                     F(  0,     9) =       .
                                                                     Prob > F      =       .
                                                                     R-squared     =  0.8491
                                                                     Root MSE      =  86.444
              
                                             (Std. Err. adjusted for 10 clusters in company)
              ------------------------------------------------------------------------------
                           |               Robust
                    invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    mvalue |   .1898776   .0385543     4.92   0.001     .1026616    .2770936
                           |
                   company |
                        2  |   250.9496   91.06615     2.76   0.022     44.94365    456.9555
                        3  |  -51.44415   92.24205    -0.56   0.591    -260.1102    157.2219
                        4  |   169.3784   140.3623     1.21   0.258    -148.1432       486.9
                        5  |   232.7314   158.1644     1.47   0.175    -125.0613    590.5242
                        6  |    190.568    150.901     1.26   0.238    -150.7937    531.9296
                        7  |   234.0336   161.3135     1.45   0.181    -130.8829    598.9502
                        8  |   130.3806   141.2221     0.92   0.380    -189.0859    449.8472
                        9  |   193.4162   154.2249     1.25   0.241    -155.4648    542.2973
                       10  |   204.4981   164.3543     1.24   0.245    -167.2971    576.2933
                           |
                     _cons |  -214.8799   167.0886    -1.29   0.231    -592.8605    163.1007
              ------------------------------------------------------------------------------
              
              . testparm i.company
              
               ( 1)  2.company = 0
               ( 2)  3.company = 0
               ( 3)  4.company = 0
               ( 4)  5.company = 0
               ( 5)  6.company = 0
               ( 6)  7.company = 0
               ( 7)  8.company = 0
               ( 8)  9.company = 0
               ( 9)  10.company = 0
                     Constraint 1 dropped
                     Constraint 2 dropped
                     Constraint 3 dropped
                     Constraint 4 dropped
                     Constraint 5 dropped
                     Constraint 6 dropped
                     Constraint 7 dropped
                     Constraint 8 dropped
              
                     F(  1,     9) =    1.55
                           Prob > F =    0.2448


              Here you have m = 1 and G-1 = 9. What happens is that the first 8 constraints of the test are dropped, so what you are actually testing is only that the dummy for the 10th firm equals zero... which is clearly not what you want.
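
              One way to see this directly (a sketch; rank() is a Mata function) is to check the rank of the cluster-robust variance matrix from the regression above. Whenever that rank falls short of the number of restrictions you ask -testparm- to test, constraints get dropped:

              Code:
              qui reg invest mvalue i.company, cluster(company)
              mata: rank(st_matrix("e(V)"))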


              Code:
              . testparm 10.company
              
               ( 1)  10.company = 0
              
                     F(  1,     9) =    1.55
                          Prob > F =    0.2448


              Last edited by Andrew Musau; 21 Sep 2015, 10:28.



              • #8
                Sounds like a good way to proceed, thanks a lot!
