  • 'xtreg' or 'areg' fixed effects command with 'robust' option: which standard errors are supposed to be more appropriate?

    Hello everybody.

    I am currently running a fixed effects regression on an unbalanced short panel of 129 companies over 6 years. I am a bit confused by the different results produced by the two commands 'xtreg' and 'areg'. In more detail, this is the code I used:

    Code:
    xtreg depvar varlist i.year, fe robust
    Code:
    areg depvar varlist i.year, absorb(Companyname) robust
    The results differ in terms of R-squared and standard errors. Although the coefficients are the same in both cases, the 'xtreg' command reports a smaller R-sq (below the traditional threshold of 10%) and larger standard errors than 'areg'. Which one should I consider more accurate? In addition, can the 'areg' command with the 'robust' option control for both heteroskedasticity and autocorrelation as well?

    Thank you in advance for your answers.

  • #2
    Hi Antonio,
    The fact of the matter is that xtreg, fe and areg are identical in point estimates because they do the same thing: control for the individual fixed effect without estimating it. The difference in R2 comes from comparing the traditional goodness of fit of the model, which includes the fixed effects (areg), with the goodness of fit after excluding the impact of the fixed effects (the within R2 reported by xtreg, fe).
    Neither will give you a more "accurate" result. But because of the nature of fixed effects in panel data, the standard errors provided by the xtreg, fe command correct for the fact that the number of parameters can increase with the number of observations.
    The robust option may correct for heteroskedasticity, while clustering might correct for some correlation, but autocorrelation is a different type of problem, for which you have to look into other methods such as xtregar.
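    A minimal sketch of this comparison, assuming Stata's nlswork example dataset and the variables ln_wage, tenure and age (stand-ins, not the data from the question). It is meant to illustrate, not settle, where the differences come from: with xtreg, fe the robust option is documented as equivalent to clustering on the panel variable, whereas with areg it is heteroskedasticity-robust only.

    ```stata
    * Illustrative only: nlswork example data, not the poster's company panel
    webuse nlswork, clear
    xtset idcode year

    * Within estimator; vce(robust) here clusters on the panel variable (idcode)
    xtreg ln_wage tenure age i.year, fe vce(robust)
    display "xtreg within R2 = " e(r2_w)

    * Same point estimates; R2 counts the absorbed effects as explained variance
    areg ln_wage tenure age i.year, absorb(idcode) vce(robust)
    display "areg R2 = " e(r2)

    * areg with SEs explicitly clustered on the panel identifier
    areg ln_wage tenure age i.year, absorb(idcode) vce(cluster idcode)
    ```

    If the areg standard errors still differ from xtreg, fe after clustering, that is usually attributed to the different degrees-of-freedom adjustments the two commands apply.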
    Hope this helps
    Fernando



    • #3
      Fernando, thank you very much for your answer.

      As you mentioned above, the 'robust' option may control for heteroskedasticity, whereas clustering could correct for serial correlation. However, when using the 'xtreg' or 'areg' commands, the options 'robust' and 'cluster(clusterid)' produce the same standard errors, which is why I thought I could use the first one just for convenience. Is that right?



      • #4
        I don't know if you are aware of a similar previous discussion in this forum. I hope it sheds some more light on your question.

        Dear all, I am doing a FE regression with year and firm fixed effects and tried both: -xtreg, fe vce(cluster ID) -areg, absorb(ID) vce(cluster ID) Both
        Alfonso Sanchez-Penalver



        • #5
          Antonio:
          as others pointed out, for -xt- (but not for -regress-, for instance), -vce(robust)- and -vce(cluster clusterid)- are interchangeable.
          Kind regards,
          Carlo
          (Stata 18.0 SE)



          • #6
            Originally posted by Carlo Lazzaro
            Antonio:
            as others pointed out, for -xt- (but not for -regress-, for instance), -vce(robust)- and -vce(cluster clusterid)- are interchangeable.
            Hi Carlo, I wanted to ask:

            This is my OLS model:

            Code:
            regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016  if inlist(year,2015,2016)
            
                  Source |       SS           df       MS      Number of obs   =       586
            -------------+----------------------------------   F(10, 575)      =    116.37
                   Model |  121.234329        10  12.1234329   Prob > F        =    0.0000
                Residual |  59.9060682       575  .104184467   R-squared       =    0.6693
            -------------+----------------------------------   Adj R-squared   =    0.6635
                   Total |  181.140398       585  .309641705   Root MSE        =    .32278
            
            ----------------------------------------------------------------------------------------
                         lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -----------------------+----------------------------------------------------------------
                          lnassets |  -.0252197   .0097835    -2.58   0.010    -.0444355   -.0060039
                   FXDerivatives10 |   .0640463   .0310722     2.06   0.040     .0030175    .1250752
                   IRDerivatives10 |  -.0676639   .0342443    -1.98   0.049    -.1349231   -.0004047
                   bookleverage_w1 |   .1292828   .0686172     1.88   0.060    -.0054881    .2640538
                            roa_w1 |   .0808277   .0030718    26.31   0.000     .0747944    .0868611
                         cratio_w1 |  -.0515988    .012241    -4.22   0.000    -.0756413   -.0275564
                        rnd_rev_w1 |   .0157829   .0027658     5.71   0.000     .0103506    .0212152
            cash_to_totalassets_w1 |   .4436779    .177796     2.50   0.013      .094469    .7928868
                      div_yield_w1 |   -.057862   .0064511    -8.97   0.000    -.0705326   -.0451914
                          year2016 |  -.0090626    .026751    -0.34   0.735    -.0616043     .043479
                             _cons |   .3793741   .0789911     4.80   0.000     .2242279    .5345203
            ----------------------------------------------------------------------------------------
            And this is the same model with industry dummies:

            Code:
            . regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016 ind2* if inlist(year,2015,2016)
            note: ind23 omitted because of collinearity
            note: ind240 omitted because of collinearity
            note: ind249 omitted because of collinearity
            
                  Source |       SS           df       MS      Number of obs   =       586
            -------------+----------------------------------   F(58, 527)      =     28.55
                   Model |  137.411853        58  2.36916987   Prob > F        =    0.0000
                Residual |  43.7285448       527  .082976366   R-squared       =    0.7586
            -------------+----------------------------------   Adj R-squared   =    0.7320
                   Total |  181.140398       585  .309641705   Root MSE        =    .28806
            
            ----------------------------------------------------------------------------------------
                         lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -----------------------+----------------------------------------------------------------
                          lnassets |  -.0177746   .0102313    -1.74   0.083    -.0378737    .0023245
                   FXDerivatives10 |   .0067225   .0331022     0.20   0.839     -.058306    .0717509
                   IRDerivatives10 |  -.0194092   .0336402    -0.58   0.564    -.0854945    .0466762
                   bookleverage_w1 |   .0040885   .0701131     0.06   0.954    -.1336471     .141824
                            roa_w1 |     .07362   .0031229    23.57   0.000     .0674851    .0797548
                         cratio_w1 |  -.0482924   .0127153    -3.80   0.000    -.0732714   -.0233135
                        rnd_rev_w1 |   .0079494   .0029869     2.66   0.008     .0020816    .0138172
            cash_to_totalassets_w1 |   .3616529    .175847     2.06   0.040     .0162058       .7071
                      div_yield_w1 |  -.0516211   .0063148    -8.17   0.000    -.0640264   -.0392157
                          year2016 |  -.0152311    .023984    -0.64   0.526    -.0623472    .0318849
                             
                             _cons |  -.4663429   .2172607    -2.15   0.032    -.8931463   -.0395395
            ----------------------------------------------------------------------------------------

            Sorry for asking all these questions, but I'm new to Stata and econometrics in general. If I wanted to use robust standard errors with each model, would it be correct to just add the robust option to each of these commands, i.e.

            for the OLS:

            Code:
            
            . regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016  if inlist(year,2015,2016), robust
            
            Linear regression                               Number of obs     =        586
                                                            F(10, 575)        =      55.35
                                                            Prob > F          =     0.0000
                                                            R-squared         =     0.6693
                                                            Root MSE          =     .32278
            
            ----------------------------------------------------------------------------------------
                                   |               Robust
                         lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -----------------------+----------------------------------------------------------------
                          lnassets |  -.0252197   .0090013    -2.80   0.005    -.0428991   -.0075403
                   FXDerivatives10 |   .0640463   .0312396     2.05   0.041     .0026887     .125404
                   IRDerivatives10 |  -.0676639   .0318751    -2.12   0.034    -.1302698    -.005058
                   bookleverage_w1 |   .1292828   .0607934     2.13   0.034     .0098787     .248687
                            roa_w1 |   .0808277   .0057895    13.96   0.000     .0694566    .0921989
                         cratio_w1 |  -.0515988   .0128861    -4.00   0.000    -.0769084   -.0262893
                        rnd_rev_w1 |   .0157829   .0035511     4.44   0.000     .0088081    .0227577
            cash_to_totalassets_w1 |   .4436779   .1617669     2.74   0.006     .1259519    .7614039
                      div_yield_w1 |   -.057862   .0095507    -6.06   0.000    -.0766205   -.0391035
                          year2016 |  -.0090626   .0262189    -0.35   0.730    -.0605591    .0424338
                             _cons |   .3793741   .0879458     4.31   0.000       .20664    .5521082
            ----------------------------------------------------------------------------------------
            and for the OLS with industry dummies:

            Code:
            . regress lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016 ind2* if inlist(year,2015,2016), robust
            note: ind23 omitted because of collinearity
            note: ind240 omitted because of collinearity
            note: ind249 omitted because of collinearity
            
            Linear regression                               Number of obs     =        586
                                                            F(57, 527)        =          .
                                                            Prob > F          =          .
                                                            R-squared         =     0.7586
                                                            Root MSE          =     .28806
            
            ----------------------------------------------------------------------------------------
                                   |               Robust
                         lntobinsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -----------------------+----------------------------------------------------------------
                          lnassets |  -.0177746   .0103867    -1.71   0.088     -.038179    .0026299
                   FXDerivatives10 |   .0067225   .0319361     0.21   0.833    -.0560153    .0694602
                   IRDerivatives10 |  -.0194092   .0305851    -0.63   0.526    -.0794928    .0406744
                   bookleverage_w1 |   .0040885   .0638413     0.06   0.949    -.1213263    .1295032
                            roa_w1 |     .07362   .0069632    10.57   0.000      .059941    .0872989
                         cratio_w1 |  -.0482924   .0140862    -3.43   0.001    -.0759643   -.0206205
                        rnd_rev_w1 |   .0079494   .0031605     2.52   0.012     .0017406    .0141581
            cash_to_totalassets_w1 |   .3616529   .1894926     1.91   0.057    -.0106007    .7339065
                      div_yield_w1 |  -.0516211   .0091091    -5.67   0.000    -.0695157   -.0337265
                          year2016 |  -.0152311   .0233949    -0.65   0.515    -.0611899    .0307276
                            
                             _cons |  -.4663429   .1755054    -2.66   0.008    -.8111189   -.1215669
            ----------------------------------------------------------------------------------------
            
            .
            or does it get more complicated, i.e. do we have to use different commands for each model, or is it correct to add "robust" to each command as I have done?

            Sorry for the odd question, I just got confused when I read about clustering etc.

            Thanks



            • #7
              Prash:
              two remarks about your post:
              - if you're dealing with panel data, -xtreg- usually outperforms -regress-;
              - under -regress- the -robust- option corrects for heteroskedasticity only (and not for serial correlation). Again, if you have panel data and want to use -regress- (when the F-test of joint individual effects fails to reach statistical significance), you should -cluster- your standard errors on -panelid-.
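              The two options in the second remark can be sketched as follows, assuming Stata's nlswork example dataset (idcode stands in for -panelid-; the variables are illustrative, not Prash's):

              ```stata
              * Illustrative only: nlswork example data, idcode plays the role of panelid
              webuse nlswork, clear

              * Heteroskedasticity-robust SEs: observations still treated as independent
              regress ln_wage tenure age, vce(robust)

              * Cluster-robust SEs: allows arbitrary correlation within each idcode
              regress ln_wage tenure age, vce(cluster idcode)
              ```

              The point estimates are the same in both runs; only the standard errors change.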
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Hi Carlo, thank you. I completely understand that FE is better suited for panel data. It's for my project, so I have to use all 3 models and show the transition of my method, i.e. OLS, OLS with dummies and then FE (it's only in my final model that I argue that FE is best suited and accounts for the unobserved heterogeneity). In my project I just want to show that I have used robust standard errors, so I was asking whether the way I've implemented the robust errors for each model is correct purely in terms of correcting for heteroskedasticity.

                - So for the OLS and the OLS with industry dummies I have just added "robust" after the original command as shown above; is that right? I'm just concerned that it might be completely different for the industry OLS.

                For the firm fixed effects model, I changed

                Code:
                 xtreg lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016  if inlist(year,2015,2016), fe
                to:

                Code:
                 xtreg lntobinsq lnassets FXDerivatives10 IRDerivatives10  bookleverage_w1 roa_w1 cratio_w1 rnd_rev_w1 cash_to_totalassets_w1 div_yield_w1 year2016  if inlist(year,2015,2016), fe robust
                - I presume this is definitely correct?

                Thanks so much for all the help, Carlo, you must be sick of me by this point, haha.






                • #9
                  Prash:
                  not sick at all, you're welcome.
                  I fail to see whether what you call the transition from OLS to panel data regression is one of your research goals or simply something you want to pursue.
                  If the latter is the case, I would focus on -xtreg- only.
                  About your second question, you're correct:
                  - since you have a large N, small T panel dataset, the -robust- (or -cluster-) option accommodates both heteroskedasticity and autocorrelation.

                  As an aside, I am a bit doubtful about the theoretical justification you give in support of -xtreg, fe- (didn't you perform a -hausman- test?): you're correct in stating that -fe- removes both observed and unobserved heterogeneity, but that holds for time-invariant predictors only. If part of the unobserved heterogeneity rests in some time-varying predictor that you did not include among the independent variables (and so lurks in the residuals), -fe- cannot do anything in that respect.
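                  For reference, a minimal sketch of the -hausman- comparison mentioned above, assuming Stata's nlswork example dataset; the variables and the stored-estimate names fe_mod and re_mod are placeholders, not Prash's model:

                  ```stata
                  * Illustrative only: nlswork example data, not the data from this thread
                  webuse nlswork, clear
                  xtset idcode year

                  * Fit both models with the default (non-robust) VCE, which -hausman- expects
                  xtreg ln_wage tenure age, fe
                  estimates store fe_mod

                  xtreg ln_wage tenure age, re
                  estimates store re_mod

                  * H0: the difference in coefficients is not systematic (RE is consistent)
                  hausman fe_mod re_mod
                  ```

                  A rejection points towards -fe-; as far as I know, -hausman- does not accept estimates fitted with robust or clustered standard errors, which is why the default VCE is used here.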
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)



                  • #10
                    Originally posted by Carlo Lazzaro
                    Prash:
                    two remarks about your post:
                    - if you're dealing with panel data, -xtreg- usually outperforms -regress-;
                    - under -regress- the -robust- option corrects for heteroskedasticity only (and not for serial correlation). Again, if you have panel data and want to use -regress- (when the F-test of joint individual effects fails to reach statistical significance), you should -cluster- your standard errors on -panelid-.
                    Hi Carlo, just to clarify what you're saying here:
                    1) in panel data: for -regress-, the robust option corrects for heteroskedasticity?

                    2) in panel data: for -regress-, the cluster(firmid) option corrects for both heteroskedasticity and serial correlation (with serial correlation meaning that each firm's observations are not independent over time; is this what you mean?)

                    Thanks. Most importantly, I want to confirm that number 1) is correct. Cheers.


                    Last edited by Prathvajeeth Rajmohan; 02 Sep 2017, 18:17.



                    • #11
                      The correct code for fixed effects with robust standard errors clustered by panelid is:

                      Code:
                      xtreg y x, fe vce(cluster panelid)
                      The following code:

                      Code:
                      xtreg y x, fe robust
                      Is also equivalent: -robust- is shorthand for -vce(robust)-, and with -xtreg, fe- that means robust standard errors clustered by panelid.

                      The following code

                      Code:
                      xtreg y x, fe vce(robust)
                      Is equivalent to the first, correct command. This command also produces robust standard errors, clustered by panelid.

                      Robust standard errors control for heteroskedasticity; clustering controls for autocorrelation.
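                      A quick way to check whether the three spellings differ in your own Stata version is to compare the stored variance matrices. This is only a sketch, assuming Stata's nlswork example dataset (idcode plays the role of panelid; V1, V2 and V3 are arbitrary matrix names):

                      ```stata
                      * Illustrative only: nlswork example data, idcode plays the role of panelid
                      webuse nlswork, clear
                      xtset idcode year

                      xtreg ln_wage tenure age, fe robust
                      matrix V1 = e(V)

                      xtreg ln_wage tenure age, fe vce(robust)
                      matrix V2 = e(V)

                      xtreg ln_wage tenure age, fe vce(cluster idcode)
                      matrix V3 = e(V)

                      * Largest relative difference between the variance matrices
                      display mreldif(V1, V2)
                      display mreldif(V1, V3)
                      ```

                      Values at or very near zero indicate that the options produce the same standard errors.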



                      • #12
                        Prash:
                        1) granted that -regress- is (usually) not the proper statistical tool for panel data, the -robust- option in -regress- accommodates heteroskedasticity only;
                        2) under -xtreg-, the robust/cluster option accommodates heteroskedasticity and/or serial correlation in the idiosyncratic error.
                        3) About the correct code(s):
                        Code:
                        . use http://www.stata-press.com/data/r14/nlswork.dta
                        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                        
                        . xtreg ln_wage tenure i.race, vce(robust)
                        
                        Random-effects GLS regression                   Number of obs     =     28,101
                        Group variable: idcode                          Number of groups  =      4,699
                        
                        R-sq:                                           Obs per group:
                             within  = 0.0972                                         min =          1
                             between = 0.2079                                         avg =        6.0
                             overall = 0.1569                                         max =         15
                        
                                                                        Wald chi2(3)      =    1797.00
                        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                        
                                                     (Std. Err. adjusted for 4,699 clusters in idcode)
                        ------------------------------------------------------------------------------
                                     |               Robust
                             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                              tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
                                     |
                                race |
                              black  |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
                              other  |   .1039944    .062132     1.67   0.094     -.017782    .2257708
                                     |
                               _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
                        -------------+----------------------------------------------------------------
                             sigma_u |  .33623102
                             sigma_e |  .30357621
                                 rho |  .55090591   (fraction of variance due to u_i)
                        ------------------------------------------------------------------------------
                        
                        . xtreg ln_wage tenure i.race, vce(cluster idcode)
                        
                        Random-effects GLS regression                   Number of obs     =     28,101
                        Group variable: idcode                          Number of groups  =      4,699
                        
                        R-sq:                                           Obs per group:
                             within  = 0.0972                                         min =          1
                             between = 0.2079                                         avg =        6.0
                             overall = 0.1569                                         max =         15
                        
                                                                        Wald chi2(3)      =    1797.00
                        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                        
                                                     (Std. Err. adjusted for 4,699 clusters in idcode)
                        ------------------------------------------------------------------------------
                                     |               Robust
                             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                              tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
                                     |
                                race |
                              black  |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
                              other  |   .1039944    .062132     1.67   0.094     -.017782    .2257708
                                     |
                               _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
                        -------------+----------------------------------------------------------------
                             sigma_u |  .33623102
                             sigma_e |  .30357621
                                 rho |  .55090591   (fraction of variance due to u_i)
                        ------------------------------------------------------------------------------
                        
                        
                        . xtreg ln_wage tenure i.race, robust
                        
                        Random-effects GLS regression                   Number of obs     =     28,101
                        Group variable: idcode                          Number of groups  =      4,699
                        
                        R-sq:                                           Obs per group:
                             within  = 0.0972                                         min =          1
                             between = 0.2079                                         avg =        6.0
                             overall = 0.1569                                         max =         15
                        
                                                                        Wald chi2(3)      =    1797.00
                        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                        
                                                     (Std. Err. adjusted for 4,699 clusters in idcode)
                        ------------------------------------------------------------------------------
                                     |               Robust
                             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                              tenure |   .0376405   .0009364    40.20   0.000     .0358052    .0394758
                                     |
                                race |
                              black  |  -.1345322   .0120266   -11.19   0.000    -.1581039   -.1109605
                              other  |   .1039944    .062132     1.67   0.094     -.017782    .2257708
                                     |
                               _cons |    1.59266   .0067239   236.86   0.000     1.579481    1.605838
                        -------------+----------------------------------------------------------------
                             sigma_u |  .33623102
                             sigma_e |  .30357621
                                 rho |  .55090591   (fraction of variance due to u_i)
                        ------------------------------------------------------------------------------
                        The SEs are always the same.

                        See also the Technical note on this topic (-xtreg- entry, page 413, Stata 14 .pdf manual).

                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)



                        • #13
                          Originally posted by Carlo Lazzaro
                          Prash:
                          1) granted that -regress- is (usually) not the proper statistical tool for panel data, the -robust- option in -regress- accommodates heteroskedasticity only;
                          2) under -xtreg-, the robust/cluster option accommodates heteroskedasticity and/or serial correlation in the idiosyncratic error.

                          Hi, thanks so much. I completely get that for panel data -xtreg- is preferred to -regress-, but putting that aside, just to confirm:

                          1) In panel data, when we use -regress-, the robust option does indeed deal with heteroskedasticity, right?
                          Thanks, Carlo



                          • #14
                            Prash:
                            - yes: the -robust- option in -regress- deals with heteroskedasticity only.
                            Again, please note that if you use -regress- with panel data and do not cluster the standard errors on -panelid-, your standard errors will be biased, because you treat all the observations as independent and neglect their panel structure.
                            Kind regards,
                            Carlo
                            (Stata 18.0 SE)



                            • #15
                              Originally posted by Carlo Lazzaro
                              Prash:
                              - yes: the -robust- option in -regress- deals with heteroskedasticity only.
                              Sorry to keep asking (and putting aside that cluster is better than robust for panel data using -regress-):
                              when we have panel data and use -regress-, we can use the robust option simply to deal with heteroskedasticity, right? Is this fine?

                              Sorry, it's just that you keep saying "the -robust- option in -regress- deals with heteroskedasticity only" and I wanted to make sure this holds in the context of panel data as well.

                              Thanks so much Carlo.
