vce(robust) or vce(cluster id) when both autocorrelation and heteroskedasticity are present

Niels Pranger

26 May 2020, 09:21

Dear users,

I have been reading this forum and other sources for some time now, but I can't find out whether vce(robust) or vce(cluster id) is preferred when both autocorrelation and heteroskedasticity are present. My dataset consists of US states observed quarterly from 2000 to 2019 (80 observations per state). The choice matters a lot for my research, since some coefficients are no longer significant when I use vce(cluster id). Thanks a lot in advance.
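For concreteness, here is a minimal sketch of how a state-by-quarter panel like the one described above could be declared in Stata, so that panel commands and time-series operators work. It builds a simulated skeleton only; the 50 states and the variable names state and qdate are assumptions for illustration, not taken from the original data.

```stata
* Simulated skeleton of a state-by-quarter panel (hypothetical: 50 states;
* the variable names state and qdate are illustrative, not from the original data)
clear
set obs 50
generate state = _n
* one row per state and quarter: 80 quarters = 2000q1 to 2019q4
expand 80
bysort state: generate qdate = yq(2000, 1) + _n - 1
format qdate %tq
* declare the panel: state is the panel id, qdate the quarterly time variable
xtset state qdate
* check that every state is observed in all 80 quarters
xtdescribe
```

With the data in this layout, vce(cluster state) groups all 80 quarterly observations of a state into a single cluster.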

Kind regards,
Niels

Last edited by Niels Pranger; 26 May 2020, 09:23.

Carlo Lazzaro

26 May 2020, 09:33

Niels:
welcome to this forum.
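The two options for non-default standard errors give back exactly the same results under -xtreg-, because -xtreg- treats -vce(robust)- as -vce(cluster panelvar)-, i.e. it clusters on the panel identifier.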
That said, the main issue with your research seems to lie in your striving for statistical significance: what's wrong if your coefficients do not reach p<0.05? Aren't they equally informative?
Finally, if your data do suffer from heteroskedasticity and/or autocorrelation, default standard errors are biased, and so is any inference based on them, which is far worse than getting non-significant results.

The following toy example might be helpful. (Warning: I did not test for heteroskedasticity and/or autocorrelation before invoking non-default standard errors, since the only aim of the toy example is to show that both options do the very same job under -xtreg-.)

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
. xtreg ln_wage age, re vce(robust)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                Wald chi2(1)      =    1064.91
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0185667    .000569    32.63   0.000     .0174516    .0196819
       _cons |   1.120439   .0159154    70.40   0.000     1.089245    1.151632
-------------+----------------------------------------------------------------
     sigma_u |  .36972456
     sigma_e |  .30349389
         rho |  .59743613   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg ln_wage age, re vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                Wald chi2(1)      =    1064.91
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0185667    .000569    32.63   0.000     .0174516    .0196819
       _cons |   1.120439   .0159154    70.40   0.000     1.089245    1.151632
-------------+----------------------------------------------------------------
     sigma_u |  .36972456
     sigma_e |  .30349389
         rho |  .59743613   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.
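The toy example deliberately skips the diagnostic step. A minimal sketch of how one might run it on the same nlswork data: a Breusch-Pagan/Cook-Weisberg test for heteroskedasticity after pooled OLS (-estat hettest-), and Wooldridge's test for first-order serial correlation in panel data, done by hand here (the community-contributed -xtserial- command automates the latter). The residual name uhat is an illustrative choice.

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year

* Breusch-Pagan/Cook-Weisberg test for heteroskedasticity after pooled OLS
regress ln_wage age
estat hettest

* Wooldridge's test for serial correlation, step 1:
* regress first differences without a constant, clustering on the panel id
regress D.ln_wage D.age, noconstant vce(cluster idcode)
predict double uhat, residuals

* Step 2: regress the residuals on their own lag. If the level errors are
* serially uncorrelated, the coefficient on L.uhat should equal -0.5
regress uhat L.uhat, noconstant vce(cluster idcode)
test L.uhat = -0.5
```

A small p-value from the final -test- is evidence of serial correlation, which is exactly the situation in which clustering matters.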

Last edited by Carlo Lazzaro; 26 May 2020, 09:40.

Kind regards,
Carlo
(Stata 19.0)


Niels Pranger

26 May 2020, 10:07

Dear Carlo,

Thanks for your quick response, I really appreciate it. You are right about statistical (in)significance; I should have given a bit more background. My supervisor wants me to use both pooled OLS (reg) and xtreg. Whereas for xtreg both non-default standard errors give the same results, this is not the case for the pooled OLS regression, and it matters a great deal: a few important coefficients go from statistically significant to statistically insignificant. I'm not looking for statistically significant results; I'm looking for the right standard errors, and that choice has a great influence on my results and ultimately on my conclusions.

To get to my question: I have tested for and found evidence of both heteroskedasticity and autocorrelation, and I'm not sure whether I should use the vce(robust) option or the vce(cluster id) option. Would vce(robust) suffice to account for both heteroskedasticity and autocorrelation, or is vce(cluster id) needed?

Last edited by Niels Pranger; 26 May 2020, 10:31.
Carlo Lazzaro

26 May 2020, 12:34

Niels:
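under -regress-, the -robust- and -cluster()- options do give back different results, because they are different beasts: -robust- corrects the standard errors for heteroskedasticity only and still treats every observation as independent, whereas -cluster()- also allows for arbitrary correlation (e.g. autocorrelation) among observations within the same cluster.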
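To see why the two options are different beasts under -regress-, it may help to compare the textbook sandwich formulas for pooled OLS. The notation is introduced here for illustration: observations are indexed by state g = 1, ..., G and quarter t, x_gt is the column vector of regressors and \hat u_gt the OLS residual.

```latex
\[
\begin{aligned}
\hat V_{\text{robust}}  &= (X'X)^{-1}\Big(\sum_{g=1}^{G}\sum_{t=1}^{T_g}\hat u_{gt}^{2}\,x_{gt}x_{gt}'\Big)(X'X)^{-1},\\
\hat V_{\text{cluster}} &= (X'X)^{-1}\Big(\sum_{g=1}^{G}\Big(\sum_{t=1}^{T_g}x_{gt}\hat u_{gt}\Big)\Big(\sum_{s=1}^{T_g}x_{gs}\hat u_{gs}\Big)'\Big)(X'X)^{-1}.
\end{aligned}
\]
```

Expanding the middle term of the cluster formula gives the robust terms (t = s) plus the cross-products \hat u_gt \hat u_gs x_gt x_gs' for t ≠ s; these pick up within-state autocorrelation, and -robust- implicitly sets them to zero. Stata also applies finite-sample factors, N/(N-k) for -robust- and G/(G-1)·(N-1)/(N-k) for -cluster()-. Because the cluster estimator relies on the number of clusters G growing large, the number of states, not the number of quarters, is what governs its reliability.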
That said, if you have detected both heteroskedasticity and autocorrelation after -regress-, my advice is to re-run the model with clustered standard errors.
In fact, if you run a pooled OLS on panel data, you should cluster your standard errors anyhow, as repeated observations on the same panel unit (here, the same state) are not independent.
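A minimal sketch of the point above, reusing the nlswork toy data from the first reply rather than the state panel (which is not available here): pooled OLS with -vce(robust)- and with -vce(cluster idcode)- gives identical coefficients but different standard errors, because only the latter lets the repeated observations on each woman be correlated. On a state panel, idcode would be replaced by the state identifier; the stored-estimate names pooled_robust and pooled_cluster are arbitrary.

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Pooled OLS, heteroskedasticity-robust SEs: observations treated as independent
regress ln_wage age, vce(robust)
estimates store pooled_robust

* Pooled OLS, cluster-robust SEs: arbitrary correlation allowed within idcode
regress ln_wage age, vce(cluster idcode)
estimates store pooled_cluster

* Same point estimates, different standard errors
estimates table pooled_robust pooled_cluster, b se
```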
Finally, to get more helpful replies, I would recommend that you be clearer about whether you are using the -re- or the -fe- specification of -xtreg-. Thanks.

Kind regards,
Carlo
Niels Pranger

26 May 2020, 13:09

Carlo,

Thanks. Concerning the random-effects model: I use RE because I have time-invariant dummies, which cannot be estimated with FE. My simplest regression looks like this:

xtreg Log_HPIPO2SA Boom_Hypo Bust_Hypo NonRecourse NonRecourseBoom NonRecourseBurst InBetween InBetweenBoom InBetweenBurst, re vce(cluster state)

I put the output in the attachment, as I couldn't figure out another way to share it. Sorry for that.

All variables are dummies. I am trying to explain house price index growth.
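A minimal sketch of why time-invariant regressors rule out FE, again on the nlswork toy data rather than the state panel: race does not vary within idcode, much like time-invariant state dummies. Under -fe- the within transformation wipes it out, so -xtreg- should report the race indicators as omitted because of collinearity; under -re- they are estimated, at the price of the assumption corr(u_i, X) = 0 shown in the output header.

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year

* FE: race is constant within idcode, so it is removed by the within transformation
xtreg ln_wage age i.race, fe vce(cluster idcode)

* RE: race is identified, under the assumption corr(u_i, X) = 0
xtreg ln_wage age i.race, re vce(cluster idcode)
```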
