Robust / Clustered standard errors

Kevin Musoni

Join Date: Jul 2016

Posts: 30
#1


14 Oct 2021, 23:56

Hello everyone,

I have a question regarding the choice of standard errors in the presence of heteroskedasticity and autocorrelation. I run three different regressions [(1), (2) and (3)], all using the same dependent variable but with different sets of control variables (dummies, interactions, etc.). For all three, I detect heteroskedasticity, so I will use robust standard errors. In addition, regression (2) exhibits autocorrelation, which is not the case for (1) and (3). Thus, clustered standard errors are definitely necessary for (2).

My question is: is it better to use robust standard errors for (1) and (3) and clustered standard errors for (2)? Or is it mathematically valid to use clustered standard errors for all three, even though autocorrelation does not affect every model? I would be grateful for a brief explanation of a sensible answer, so that I can expand my knowledge.

I appreciate your help; thanks in advance.

/Kevin
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

15 Oct 2021, 01:02

Kevin:
it depends on what you're doing.
1) if you have panel data (-xtreg-), both the -robust- and -vce(cluster clusterid)- options take heteroskedasticity and autocorrelation into account;
2) if you are using OLS (-regress-), -robust- takes heteroskedasticity only into account, whereas -vce(cluster clusterid)- takes both heteroskedasticity and within-cluster correlation (autocorrelation included) into account.
That said, if you detect both heteroskedasticity and autocorrelation in epsilon under OLS, a relevant source for (and over and above) Stata users (https://www.stata.com/bookstore/environmental-econometrics-using-stata) suggests -newey- (HAC standard errors) or -vce(cluster clusterid)- to deal with this situation (see pages 26-30).
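
As a minimal sketch of the three options mentioned above (not taken from the book; the datasets, the model, and the lag length are illustrative assumptions), the commands could look like this. The first part uses the -nlswork- data with -idcode- as the cluster variable. The -newey- part assumes the time-series example dataset -idle2- with the variables -usr-, -idle- and -time-, and lag(3) is an arbitrary choice:

```stata
* Illustrative sketch only: datasets, model and lag length are assumptions
webuse nlswork, clear

* (a) heteroskedasticity-robust (Huber/White) standard errors
regress ln_wage c.age##c.age, vce(robust)

* (b) cluster-robust standard errors, clustering on the person identifier
regress ln_wage c.age##c.age, vce(cluster idcode)

* (c) Newey-West (HAC) standard errors on a single time series;
*     -newey- requires the data to be -tsset- first
webuse idle2, clear
tsset time
newey usr idle, lag(3)
```

With lag(0), -newey- reproduces the -regress, vce(robust)- standard errors, so it is the lag length that brings in the autocorrelation correction.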

Kind regards,
Carlo
(Stata 19.0)
Kevin Musoni

Join Date: Jul 2016

Posts: 30
#3

15 Oct 2021, 01:39

Hi Carlo,

thanks, understood.

I use -xtreg, fe-. Can I use -robust- or -vce(cluster clusterid)- interchangeably? Or is there an advantage to using one over the other?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712
#4

15 Oct 2021, 02:04

Kevin:
both options do the very same job, as they invoke cluster-robust standard errors:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage c.age##c.age, fe robust

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------


. xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Differences between the two options arise only if your -clusterid- differs from your -panelid- (which is seldom the case).
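
A quick way to check this on your own data is to compare the stored variance matrices from the two calls. The sketch below does that and then shows a case where the cluster variable differs from the panel variable. The grouping variable -grp- is purely hypothetical (blocks of ten consecutive -idcode- values) and is only there to provide a cluster level above the panel; note that -xtreg, fe- requires panels to be nested within clusters.

```stata
webuse nlswork, clear
xtset idcode

* -robust- and -vce(cluster idcode)- should give the same VCE here
quietly xtreg ln_wage c.age##c.age, fe vce(robust)
matrix V_robust = e(V)
quietly xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
matrix V_cluster = e(V)
display "max. relative difference: " mreldif(V_robust, V_cluster)

* Hypothetical higher-level cluster: blocks of ten consecutive idcode values
generate long grp = floor(idcode/10)
xtreg ln_wage c.age##c.age, fe vce(cluster grp)
```

In the last call the standard errors will generally differ from those above, because the residuals are now allowed to be correlated across all persons within the same -grp-.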

Last edited by Carlo Lazzaro; 15 Oct 2021, 02:15.

Kind regards,
Carlo
(Stata 19.0)


Kevin Musoni

Join Date: Jul 2016
Posts: 30
#5

15 Oct 2021, 02:08

Perfect!

Thank you, as always, sir!


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

15 Oct 2021, 02:14

Kevin:
thanks, but Carlo is enough!

Kind regards,
Carlo
(Stata 19.0)
