'xtreg' or 'areg' fixed effects command with 'robust' option: which standard errors are supposed to be more appropriate?

Antonio Sicari

Join Date: Aug 2015

Posts: 12
#1

10 Sep 2015, 07:46

Hello everybody.

I am currently running a fixed effects regression on an unbalanced short panel of 129 companies over 6 years. I am a bit confused by the different results produced by the two commands 'xtreg' and 'areg'. In detail, these are the commands I used:

Code:

xtreg depvar varlist i.year, fe robust

Code:

areg depvar varlist i.year, absorb(Companyname) robust

The results differ in terms of R-squared and standard errors. Although the coefficients are the same in both cases, the 'xtreg' command generates a smaller R-sq (below the traditional threshold of 10%) and larger standard errors than 'areg'. Which one should therefore be considered more accurate? In addition, is the 'areg' command with the 'robust' option able to control for both heteroskedasticity and autocorrelation as well?

Thank you in advance for your answers.
FernandoRios

Join Date: Apr 2014

Posts: 2467
#2

10 Sep 2015, 08:24

Hi Antonio,
The fact of the matter is that xtreg, fe and areg give identical point estimates because they do the same thing: they control for the individual fixed effects without reporting an estimate for each of them. The difference in the R2 comes from comparing the traditional goodness of fit of the model, which includes the fixed effects (areg), with the goodness of fit of the model after excluding the contribution of the fixed effects (the within R2 reported by xtreg, fe).
Neither will give you a more "accurate" result. But because of the nature of fixed effects in panel data, the standard errors provided by the xtreg, fe command correct for the fact that the number of parameters can increase with the number of observations.
The robust option may correct for heteroskedasticity, while clustering might correct for some correlation, but autocorrelation is a different type of problem for which you have to look into other methods such as xtregar.
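As a rough sketch of that comparison (it uses Stata's nlswork example data rather than your dataset, so the variable names are only illustrative), both commands can be run side by side and their R2 values displayed:

Code:

* Illustration only: nlswork stands in for the poster's company panel
webuse nlswork, clear
xtset idcode year

* Within estimator; the "within" R2 ignores the fixed effects
xtreg ln_wage tenure age i.year, fe
display "within R2 from xtreg, fe: " e(r2_w)

* Same slopes; this R2 also credits the absorbed idcode effects
areg ln_wage tenure age i.year, absorb(idcode)
display "R2 from areg: " e(r2)

The coefficients on tenure and age should agree across the two outputs, while the two R2 figures answer different questions.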
Hope this helps
Fernando
Antonio Sicari

Join Date: Aug 2015

Posts: 12
#3

10 Sep 2015, 11:31

Fernando, thank you very much for your answer.

As you mentioned above, the 'robust' option may control for heteroskedasticity, whereas clustering could correct for serial correlation. However, when using the 'xtreg' or 'areg' commands, the options 'robust' and 'cluster(clusterid)' produce the same standard errors, which is why I thought I could use the first one just for convenience. Is that right?
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#4

10 Sep 2015, 19:56

I don't know if you were aware of a similar previous discussion in this forum. I hope it sheds some more light on your question.

areg and xtreg,fe with cluster option: which one is better? - Statalist

http://www.statalist.org

Dear all, I am doing a FE regression with year and firm fixed effects and tired both: -xtreg, fe vce(cluster ID) -areg, absorb(ID) vce(cluster ID) Both

Alfonso Sanchez-Penalver
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#5

10 Sep 2015, 22:46

Antonio:
as others pointed out, for -xt- commands (but not for -regress-, for instance), -vce(robust)- and -vce(cluster clusterid)- are interchangeable.
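A minimal sketch of how to check this yourself, assuming the nlswork example dataset (not your data) and an arbitrary pair of regressors:

Code:

webuse nlswork, clear
xtset idcode

* -xtreg, fe-: vce(robust) and vce(cluster idcode) should give the same VCE
xtreg ln_wage tenure age, fe vce(robust)
matrix V_rob = e(V)
xtreg ln_wage tenure age, fe vce(cluster idcode)
matrix V_clu = e(V)
display "largest relative difference: " mreldif(V_rob, V_clu)

* -areg-: vce(robust) is heteroskedasticity-robust only, so clustering
* on the panel identifier has to be requested explicitly
areg ln_wage tenure age, absorb(idcode) vce(robust)
areg ln_wage tenure age, absorb(idcode) vce(cluster idcode)

A relative difference of zero (up to rounding) in the first part indicates that the two -xtreg- specifications coincide, whereas the two -areg- outputs will generally report different standard errors.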

Kind regards,
Carlo
(Stata 19.0)

Prathvajeeth Rajmohan

Join Date: Aug 2017

Posts: 70
#6

31 Aug 2017, 08:07

Originally posted by Carlo Lazzaro

Antonio:
as others pointed out, for -xt- commands (but not for -regress-, for instance), -vce(robust)- and -vce(cluster clusterid)- are interchangeable.

Hi Carlo, I wanted to ask:

This is my OLS model:

Code:

regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016  if inlist(year,2015,2016)

      Source |       SS           df       MS      Number of obs   =       586
-------------+----------------------------------   F(10, 575)      =    116.37
       Model |  121.234329        10  12.1234329   Prob > F        =    0.0000
    Residual |  59.9060682       575  .104184467   R-squared       =    0.6693
-------------+----------------------------------   Adj R-squared   =    0.6635
       Total |  181.140398       585  .309641705   Root MSE        =    .32278

----------------------------------------------------------------------------------------
             lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
              lnassets |  -.0252197   .0097835    -2.58   0.010    -.0444355   -.0060039
       FXDerivatives10 |   .0640463   .0310722     2.06   0.040     .0030175    .1250752
       IRDerivatives10 |  -.0676639   .0342443    -1.98   0.049    -.1349231   -.0004047
       bookleverage_w1 |   .1292828   .0686172     1.88   0.060    -.0054881    .2640538
                roa_w1 |   .0808277   .0030718    26.31   0.000     .0747944    .0868611
             cratio_w1 |  -.0515988    .012241    -4.22   0.000    -.0756413   -.0275564
            rnd_rev_w1 |   .0157829   .0027658     5.71   0.000     .0103506    .0212152
cash_to_totalassets_w1 |   .4436779    .177796     2.50   0.013      .094469    .7928868
          div_yield_w1 |   -.057862   .0064511    -8.97   0.000    -.0705326   -.0451914
              year2016 |  -.0090626    .026751    -0.34   0.735    -.0616043     .043479
                 _cons |   .3793741   .0789911     4.80   0.000     .2242279    .5345203
----------------------------------------------------------------------------------------

And this is the same model with industry dummies:

Code:

. regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016 ind2* if inlist(year,2015,2016)
note: ind23 omitted because of collinearity
note: ind240 omitted because of collinearity
note: ind249 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =       586
-------------+----------------------------------   F(58, 527)      =     28.55
       Model |  137.411853        58  2.36916987   Prob > F        =    0.0000
    Residual |  43.7285448       527  .082976366   R-squared       =    0.7586
-------------+----------------------------------   Adj R-squared   =    0.7320
       Total |  181.140398       585  .309641705   Root MSE        =    .28806

----------------------------------------------------------------------------------------
             lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
              lnassets |  -.0177746   .0102313    -1.74   0.083    -.0378737    .0023245
       FXDerivatives10 |   .0067225   .0331022     0.20   0.839     -.058306    .0717509
       IRDerivatives10 |  -.0194092   .0336402    -0.58   0.564    -.0854945    .0466762
       bookleverage_w1 |   .0040885   .0701131     0.06   0.954    -.1336471     .141824
                roa_w1 |     .07362   .0031229    23.57   0.000     .0674851    .0797548
             cratio_w1 |  -.0482924   .0127153    -3.80   0.000    -.0732714   -.0233135
            rnd_rev_w1 |   .0079494   .0029869     2.66   0.008     .0020816    .0138172
cash_to_totalassets_w1 |   .3616529    .175847     2.06   0.040     .0162058       .7071
          div_yield_w1 |  -.0516211   .0063148    -8.17   0.000    -.0640264   -.0392157
              year2016 |  -.0152311    .023984    -0.64   0.526    -.0623472    .0318849
                 
                 _cons |  -.4663429   .2172607    -2.15   0.032    -.8931463   -.0395395
----------------------------------------------------------------------------------------

Sorry for asking all these questions, but I'm new to Stata and to econometrics in general. If I want to use robust standard errors with each model, would it be correct to just add the robust option to each of these commands? I.e.,

for the OLS:

Code:


. regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016  if inlist(year,2015,2016), robust

Linear regression                               Number of obs     =        586
                                                F(10, 575)        =      55.35
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6693
                                                Root MSE          =     .32278

----------------------------------------------------------------------------------------
                       |               Robust
             lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
              lnassets |  -.0252197   .0090013    -2.80   0.005    -.0428991   -.0075403
       FXDerivatives10 |   .0640463   .0312396     2.05   0.041     .0026887     .125404
       IRDerivatives10 |  -.0676639   .0318751    -2.12   0.034    -.1302698    -.005058
       bookleverage_w1 |   .1292828   .0607934     2.13   0.034     .0098787     .248687
                roa_w1 |   .0808277   .0057895    13.96   0.000     .0694566    .0921989
             cratio_w1 |  -.0515988   .0128861    -4.00   0.000    -.0769084   -.0262893
            rnd_rev_w1 |   .0157829   .0035511     4.44   0.000     .0088081    .0227577
cash_to_totalassets_w1 |   .4436779   .1617669     2.74   0.006     .1259519    .7614039
          div_yield_w1 |   -.057862   .0095507    -6.06   0.000    -.0766205   -.0391035
              year2016 |  -.0090626   .0262189    -0.35   0.730    -.0605591    .0424338
                 _cons |   .3793741   .0879458     4.31   0.000       .20664    .5521082
----------------------------------------------------------------------------------------

and for the OLS with industry dummies:

Code:

. regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016 ind2* if inlist(year,2015,2016), robust
note: ind23 omitted because of collinearity
note: ind240 omitted because of collinearity
note: ind249 omitted because of collinearity

Linear regression                               Number of obs     =        586
                                                F(57, 527)        =          .
                                                Prob > F          =          .
                                                R-squared         =     0.7586
                                                Root MSE          =     .28806

----------------------------------------------------------------------------------------
                       |               Robust
             lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
              lnassets |  -.0177746   .0103867    -1.71   0.088     -.038179    .0026299
       FXDerivatives10 |   .0067225   .0319361     0.21   0.833    -.0560153    .0694602
       IRDerivatives10 |  -.0194092   .0305851    -0.63   0.526    -.0794928    .0406744
       bookleverage_w1 |   .0040885   .0638413     0.06   0.949    -.1213263    .1295032
                roa_w1 |     .07362   .0069632    10.57   0.000      .059941    .0872989
             cratio_w1 |  -.0482924   .0140862    -3.43   0.001    -.0759643   -.0206205
            rnd_rev_w1 |   .0079494   .0031605     2.52   0.012     .0017406    .0141581
cash_to_totalassets_w1 |   .3616529   .1894926     1.91   0.057    -.0106007    .7339065
          div_yield_w1 |  -.0516211   .0091091    -5.67   0.000    -.0695157   -.0337265
              year2016 |  -.0152311   .0233949    -0.65   0.515    -.0611899    .0307276
                
                 _cons |  -.4663429   .1755054    -2.66   0.008    -.8111189   -.1215669
----------------------------------------------------------------------------------------

.

Or does it get any more complicated, i.e. do we have to use different commands for each model? Or is it correct to add "robust" to each command as I have done?

Sorry for the odd question, I just got confused when I read about clustering etc.

Thanks


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

31 Aug 2017, 08:14

Prash:
two remarks about your post:
- if you're dealing with panel data, -xtreg- usually outperforms -regress-;
- under -regress-, the -robust- option corrects for heteroskedasticity only (and not for serial correlation). Again, if you have panel data and want to go with -regress- (when the F-test of joint individual effects fails to reach statistical significance), you should -cluster- your standard errors on -panelid-.
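A minimal sketch of the second remark, using the nlswork example data as a stand-in for your panel (idcode plays the role of -panelid-; the regressors are placeholders):

Code:

webuse nlswork, clear

* Heteroskedasticity-robust only: observations treated as independent
regress ln_wage tenure age, vce(robust)

* Cluster-robust: allows arbitrary correlation within each idcode
regress ln_wage tenure age, vce(cluster idcode)

Comparing the two standard error columns shows how much the panel structure matters in a given dataset.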

Kind regards,
Carlo
(Stata 19.0)
Prathvajeeth Rajmohan

Join Date: Aug 2017

Posts: 70
#8

31 Aug 2017, 09:08

Hi Carlo, thank you. I completely understand that FE is better suited to panel data. This is for my project, so I have to use all 3 models and show the transition of my method, i.e. the OLS, the OLS with dummies and then FE (it is only in my final model that I argue that FE is best suited because it accounts for the unobserved heterogeneity). In my project I just want to show that I have used robust standard errors, so I was asking whether the way I've implemented the robust errors for each model is correct purely in terms of correcting for heteroskedasticity.

- So for the OLS and the OLS with industry dummies I have just added "robust" after the original command as shown above. Is that right? I'm just concerned that it is completely different for the industry OLS.

For the firm fixed effects model, I changed

Code:

xtreg lntobinsq lnassets FXDerivatives10 IRDerivatives10 bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016 if inlist(year,2015,2016), fe

to:

Code:

xtreg lntobinsq lnassets FXDerivatives10 IRDerivatives10 bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016 if inlist(year,2015,2016), fe robust

- I presume this is definitely correct?

Thanks so much for all the help Carlo, you must be sick of me by this point, haha.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

31 Aug 2017, 09:34

Prash:
not sick at all, you're welcome.
I fail to get whether what you call the transition from OLS to panel data regression is one of your research goals or simply something you want to pursue.
If the latter is the case, I would focus on -xtreg- only.
About your second question, you're correct:
- since you have a large N, small T panel dataset, the -robust- (or -cluster-) option accommodates both heteroskedasticity and autocorrelation.

As an aside, I am a bit doubtful about your theoretical justification for -xtreg, fe- (didn't you perform a -hausman- test?): you're correct in stating that -fe- removes both observed and unobserved heterogeneity, but that holds for time-invariant predictors only. If a part of the unobserved heterogeneity rests in some time-varying predictor that you did not include among the set of independent variables (and so lurks in the residuals), -fe- cannot do anything in that respect.
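A minimal sketch of the -hausman- comparison I have in mind, run on the nlswork example data with placeholder regressors rather than your model (the names fe_est and re_est are arbitrary):

Code:

webuse nlswork, clear
xtset idcode

* -hausman- needs the default (non-robust) VCE for both fits
xtreg ln_wage tenure age, fe
estimates store fe_est
xtreg ln_wage tenure age, re
estimates store re_est

* Consistent estimator first, efficient estimator second
hausman fe_est re_est

A small p-value speaks against -re- and in favour of -fe-.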

Kind regards,
Carlo
(Stata 19.0)
Prathvajeeth Rajmohan

Join Date: Aug 2017

Posts: 70
#10

02 Sep 2017, 18:05

Originally posted by Carlo Lazzaro

Prash:
two remarks about your post:
- if you're dealing with panel data, -xtreg- usually outperforms -regress-;
- under -regress-, the -robust- option corrects for heteroskedasticity only (and not for serial correlation). Again, if you have panel data and want to go with -regress- (when the F-test of joint individual effects fails to reach statistical significance), you should -cluster- your standard errors on -panelid-.

Hi Carlo, just to clarify what you're saying here:
1) in panel data: for -regress-, the robust option corrects for heteroskedasticity?

2) in panel data: for -regress-, the cluster(firmrid) option corrects for both heteroskedasticity and serial correlation (with serial correlation meaning that each firm's observations are not independent over time, is this what you mean?)

Thanks. Most importantly, I want to confirm that number 1) is correct. Cheers.

Last edited by Prathvajeeth Rajmohan; 02 Sep 2017, 18:17.
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#11

02 Sep 2017, 18:34

The correct code for fixed effects with robust standard errors clustered by panelid is:

Code:

xtreg y x, fe vce(cluster panelid)

The following code:

Code:

xtreg y x, fe robust

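Is shorthand for the same thing in -xtreg-: the robust option is a synonym for vce(robust), which -xtreg, fe- treats as clustering on panelid, so the standard errors match those of the first command.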

The following code

Code:

xtreg y x, fe vce(robust)

is equivalent to the first, correct command. This command also produces robust standard errors, clustered by panelid.

Robust standard errors account for heteroskedasticity; clustering additionally accounts for autocorrelation within panels.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#12

03 Sep 2017, 02:33

Prash:
1) having said that -regress- is (usually) not the proper statistical tool for dealing with panel data, the -robust- option in -regress- accommodates heteroskedasticity only;
2) under -xtreg-, the robust/cluster option accommodates heteroskedasticity and/or serial correlation in the idiosyncratic error.
3) About the correct code(s):

Code:

. use http://www.stata-press.com/data/r14/nlswork.dta
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage tenure i.race, vce(robust)

Random-effects GLS regression                   Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.0972                                         min =          1
     between = 0.2079                                         avg =        6.0
     overall = 0.1569                                         max =         15

                                                Wald chi2(3)      =    1797.00
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,699 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
             |
        race |
      black  |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
      other  |   .1039944    .062132     1.67   0.094     -.017782    .2257708
             |
       _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
-------------+----------------------------------------------------------------
     sigma_u |  .33623102
     sigma_e |  .30357621
         rho |  .55090591   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg ln_wage tenure i.race, vce(cluster idcode)

Random-effects GLS regression                   Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.0972                                         min =          1
     between = 0.2079                                         avg =        6.0
     overall = 0.1569                                         max =         15

                                                Wald chi2(3)      =    1797.00
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,699 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
             |
        race |
      black  |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
      other  |   .1039944    .062132     1.67   0.094     -.017782    .2257708
             |
       _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
-------------+----------------------------------------------------------------
     sigma_u |  .33623102
     sigma_e |  .30357621
         rho |  .55090591   (fraction of variance due to u_i)
------------------------------------------------------------------------------


. xtreg ln_wage tenure i.race, robust

Random-effects GLS regression                   Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.0972                                         min =          1
     between = 0.2079                                         avg =        6.0
     overall = 0.1569                                         max =         15

                                                Wald chi2(3)      =    1797.00
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,699 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
             |
        race |
      black  |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
      other  |   .1039944    .062132     1.67   0.094     -.017782    .2257708
             |
       _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
-------------+----------------------------------------------------------------
     sigma_u |  .33623102
     sigma_e |  .30357621
         rho |  .55090591   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The SEs are always the same.

See also the Technical note on this topic (-xtreg- entry, page 413, Stata 14 .pdf manual).

Kind regards,
Carlo
(Stata 19.0)


Prathvajeeth Rajmohan

Join Date: Aug 2017

Posts: 70
#13

03 Sep 2017, 04:51

Originally posted by Carlo Lazzaro

Prash:
1) having said that -regress- is (usually) not the proper statistical tool for dealing with panel data, the -robust- option in -regress- accommodates heteroskedasticity only;
2) under -xtreg-, the robust/cluster option accommodates heteroskedasticity and/or serial correlation in the idiosyncratic error.

Hi, thanks so much. I completely get that for panel data xtreg is preferred to regress, but putting that aside, just to confirm:

1) In panel data, when we use -regress-, the robust option does indeed deal with heteroskedasticity, right?
Thanks Carlo
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#14

03 Sep 2017, 04:57

Prash:
- yes: the -robust- option in -regress- deals with heteroskedasticity only.
Again, please note that if you use -regress- with panel data and omit to cluster the standard errors on -panelid-, your standard errors will be biased, as you treat all the observations as independent and neglect their panel structure.

Kind regards,
Carlo
(Stata 19.0)
Prathvajeeth Rajmohan

Join Date: Aug 2017

Posts: 70
#15

03 Sep 2017, 05:08

Originally posted by Carlo Lazzaro

Prash:
- yes: the -robust- option in -regress- deals with heteroskedasticity only.

Sorry to keep asking (and putting aside that cluster is better than robust for panel data when using -regress-):
when we have panel data and use -regress-, we can add the robust option simply to deal with heteroskedasticity, right? Is this fine?

Sorry, it's just that you keep saying "the -robust- option in -regress- deals with heteroskedasticity only" and I wanted to make sure this holds in the context of panel data as well.

Thanks so much Carlo.