
  • How to handle heteroskedasticity, autocorrelation, and cross-sectional dependence in panel data (xtreg, re)?

    Hi everyone,

    I have a problem and I’m not sure whether I’m doing it right. Could anyone help me with this?

    I have panel data with N = 124 companies from around the world and T = 7 years (2018–2024). I estimate the following model:
    $$\mathrm{EBITDA}_{it} = \beta_0 + \beta_1\,\mathrm{CO2Em}_{it} + \beta_2\,\mathrm{ESG}_{it} + \beta_3\,\mathrm{CO2EmXCC}_{it} + \mathrm{Control}_{it} + u_i + e_{it}$$


    Because I include two time-invariant dummy variables (Industry and Region), which a fixed-effects model would drop, I decided to use a random-effects model (REM).

    I then tested for:
    • Heteroskedasticity using xttest2
    • Cross-sectional dependence using xtcdf
    • Autocorrelation using xtserial
    The results show that my data suffer from all three problems. To address them, I use -xtreg, re vce(cluster id)-.
    Here are the results:
    Code:
    . xtreg lnEBITDA lnCO2 ESGScore lnCO2xCC Inflation lnMktCap Size Lev Industry i.region_num, re
    
    Random-effects GLS regression                   Number of obs     =        534
    Group variable: id                              Number of groups  =        124
    
    R-squared:                                      Obs per group:
         Within  = 0.4239                                         min =          1
         Between = 0.9171                                         avg =        4.3
         Overall = 0.8981                                         max =          7
    
                                                    Wald chi2(12)     =    1564.39
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
    --------------------------------------------------------------------------------
          lnEBITDA | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
             lnCO2 |   .0443468   .0216099     2.05   0.040     .0019922    .0867015
          ESGScore |   .0021128   .0022001     0.96   0.337    -.0021994    .0064249
          lnCO2xCC |   .0113222   .0096175     1.18   0.239    -.0075277    .0301722
         Inflation |   .0086087   .0033217     2.59   0.010     .0020983    .0151191
          lnMktCap |   .3800489   .0348148    10.92   0.000     .3118132    .4482846
              Size |   1.468875   .0980446    14.98   0.000     1.276712    1.661039
               Lev |  -.0007403   .0033407    -0.22   0.825    -.0072879    .0058073
           Industry |  -.0019131   .0811607    -0.02   0.981     -.160985    .1571589
                   |
        region_num |
        Australia  |   .5221762   .1792769     2.91   0.004     .1707999    .8735525
           Europe  |   .4443917   .1094821     4.06   0.000     .2298106    .6589727
    North America  |    .239526   .1074525     2.23   0.026     .0289229    .4501291
    South America  |    .703292   .2437693     2.89   0.004      .225513    1.181071
                   |
             _cons |  -3.871558   .7487461    -5.17   0.000    -5.339073   -2.404042
    ---------------+----------------------------------------------------------------
           sigma_u |   .3628966
           sigma_e |  .25249424
               rho |  .67380799   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------
    Code:
    . xtreg lnEBITDA lnCO2 ESGScore lnCO2xCC Inflation lnMktCap Size Lev Industry i.region_num, re vce(cluster id)
    
    Random-effects GLS regression                   Number of obs     =        534
    Group variable: id                              Number of groups  =        124
    
    R-squared:                                      Obs per group:
         Within  = 0.4239                                         min =          1
         Between = 0.9171                                         avg =        4.3
         Overall = 0.8981                                         max =          7
    
                                                    Wald chi2(12)     =    1383.87
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
                                         (Std. err. adjusted for 124 clusters in id)
    --------------------------------------------------------------------------------
                   |               Robust
          lnEBITDA | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
             lnCO2 |   .0443468    .025415     1.74   0.081    -.0054657    .0941593
          ESGScore |   .0021128   .0026091     0.81   0.418     -.003001    .0072265
          lnCO2xCC |   .0113222    .007854     1.44   0.149    -.0040713    .0267158
         Inflation |   .0086087    .006487     1.33   0.184    -.0041055     .021323
          lnMktCap |   .3800489    .038064     9.98   0.000     .3054449    .4546529
              Size |   1.468875   .1169659    12.56   0.000     1.239626    1.698124
               Lev |  -.0007403   .0021294    -0.35   0.728    -.0049138    .0034332
           Industry |  -.0019131   .0711128    -0.03   0.979    -.1412916    .1374655
                   |
        region_num |
        Australia  |   .5221762   .2244736     2.33   0.020     .0822161    .9621364
           Europe  |   .4443917   .1169048     3.80   0.000     .2152625    .6735208
    North America  |    .239526   .1149305     2.08   0.037     .0142664    .4647856
    South America  |    .703292   .1486125     4.73   0.000     .4120169    .9945671
                   |
             _cons |  -3.871558   .8817226    -4.39   0.000    -5.599702   -2.143413
    ---------------+----------------------------------------------------------------
           sigma_u |   .3628966
           sigma_e |  .25249424
               rho |  .67380799   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------
    My questions are:
    1. Is using vce(cluster id) with random effects sufficient when heteroskedasticity, autocorrelation, and cross-sectional dependence are all present?
    2. If not, what would be a more appropriate approach for this situation?
    Thank you!

  • #2
    Doan:
    you may want to take a look at Panel data: testing for serialcorrelation and heteroskedasticity - Statalist.
    That said:
    1) I would stick with -vce(cluster panelid)- standard errors.
    2) I do not understand why the choice between the -fe- and -re- estimators should be driven by two categorical variables rather than by comparing them in terms of consistency and efficiency.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Carlo, thank you for your comment.
      I actually did run the Hausman test to decide between -fe- and -re-, and the result suggested that -re- is more appropriate for my data.
      So my choice of -re- was based on that test, not only on the presence of the categorical variables.



      • #4
        Doan:
        thanks for clarifying.
        That said, if you use non-default standard errors, you should switch from -hausman- to the Stata community-contributed module -xtoverid-, because -hausman- does not support cluster-robust standard errors.
        Glorious but a bit old-fashioned, -xtoverid- does not support factor-variable (-fvvarlist-) notation; the -xi- prefix is a possible workaround.
        Kind regards,
        Carlo
        (Stata 19.0)
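What the -xi- workaround does for a categorical regressor can be sketched in a few lines of Python: it replaces i.region_num with one 0/1 indicator per non-base level, so -xtoverid- only sees ordinary variables. This is a hypothetical illustration: the helper `expand_categorical` and the column names are made up and are not the names -xi- generates, and it assumes, as with -xi-'s default, that the lowest level is the omitted base.

```python
def expand_categorical(values, prefix):
    """Return the omitted base level and one 0/1 column per remaining level.

    Hypothetical helper mimicking a factor-variable expansion; the lowest
    level is treated as the omitted base category.
    """
    levels = sorted(set(values))
    base = levels[0]
    columns = {}
    for level in levels[1:]:
        columns[f"{prefix}_{level}"] = [1 if v == level else 0 for v in values]
    return base, columns


# Hypothetical region codes for five firm-years
region_num = [1, 3, 1, 5, 3]
base, dummies = expand_categorical(region_num, "region")
print("omitted base level:", base)
for name, col in dummies.items():
    print(name, col)
```

The base level gets no column, exactly as with i.region_num in the regressions above, so its effect is absorbed by the constant.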



        • #5
          Hi Carlo,
          I am inclined to use -xtreg, vce(cluster panelid)- rather than -xtgls- to address heteroskedasticity and/or serial correlation in panel data. However, in my department many student theses have applied -xtgls-, and their supervisors have accepted it. Could you point me to authoritative references that support clustered standard errors (-xtreg, vce(cluster panelid)-) instead of -xtgls- for panels where N > T (with T small)?

          In addition, my model includes interaction terms with two dummy variables (covid and postcovid), namely lnCO2xCCxcovid and lnCO2xCCxpostcovid, with lnCO2xCC as the baseline. To estimate the effect of lnCO2xCC during the covid period, I plan to use -lincom lnCO2xCC + lnCO2xCCxcovid-. Could you please confirm whether this is the correct procedure?
          Thank you
          Last edited by Doan Ngan; 22 Aug 2025, 22:41.
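For reference, -lincom lnCO2xCC + lnCO2xCCxcovid- reports the sum of the two coefficients, with standard error $\sqrt{\mathrm{Var}(b_1) + \mathrm{Var}(b_2) + 2\,\mathrm{Cov}(b_1, b_2)}$ taken from the estimated VCE (the cluster-robust one if the model was fitted with vce(cluster)). A minimal NumPy sketch of that arithmetic, using made-up coefficient and covariance values rather than anything from this thread:

```python
import numpy as np


def lincom(beta, vcov, weights):
    """Linear combination w'b with standard error sqrt(w' V w)."""
    w = np.asarray(weights, dtype=float)
    estimate = float(w @ np.asarray(beta, dtype=float))
    se = float(np.sqrt(w @ np.asarray(vcov, dtype=float) @ w))
    return estimate, se, estimate / se


# Hypothetical coefficients on (lnCO2xCC, lnCO2xCCxcovid) and their 2x2 VCE block
beta = [0.011, 0.020]
vcov = [[0.0001, -0.00002],
        [-0.00002, 0.0004]]
est, se, z = lincom(beta, vcov, [1, 1])  # lnCO2xCC + lnCO2xCCxcovid
print(f"effect during covid = {est:.3f}, se = {se:.4f}, z = {z:.2f}")
```

The covariance term matters: a negative Cov(b1, b2), as assumed here, makes the standard error of the sum smaller than adding the variances alone would suggest.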



          • #6
            Hi Ngan,

            I see some misunderstandings in the way you have analyzed the data.
            First, xttest2 does not perform a heteroscedasticity test; it performs a Breusch-Pagan LM test of cross-sectional independence, which is appropriate for large-T panels. Perhaps this is a typo and you meant xttest3. However, xttest3 is a FEM post-estimation command, and it is likewise meant for large-T panels. For REM, to the best of my knowledge, there is currently no Stata command for heteroscedasticity testing. Some discussions apply the Breusch-Pagan/White procedure to the composite (pooled) error $v_{it} = u_i + e_{it}$ as a workaround.
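The workaround above can be sketched as follows: fit the pooled regression, treat its residuals as estimates of $v_{it}$, and compute Koenker's $nR^2$ form of the Breusch-Pagan statistic by regressing the squared residuals on the regressors. This is a hypothetical NumPy illustration on simulated data (the helper names are made up), not a Stata command; it also ignores the within-firm correlation of the squared residuals induced by $u_i$, so treat it as a rough diagnostic.

```python
import numpy as np


def breusch_pagan_lm(resid, Z):
    """Koenker's n*R^2 Breusch-Pagan statistic from regressing resid^2 on Z.

    Under homoskedasticity it is approximately chi2 with Z.shape[1] df.
    """
    resid = np.asarray(resid, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    n = resid.size
    e2 = resid ** 2
    D = np.column_stack([np.ones(n), Z])
    coef, *_ = np.linalg.lstsq(D, e2, rcond=None)
    ssr = np.sum((e2 - D @ coef) ** 2)
    sst = np.sum((e2 - e2.mean()) ** 2)
    return n * (1.0 - ssr / sst), Z.shape[1]


def pooled_resid(y, X):
    """Pooled OLS residuals, used as estimates of v_it = u_i + e_it."""
    D = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return y - D @ coef


# Simulated (hypothetical) panel: 200 firms x 5 years
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(200), 5)
x = rng.uniform(0, 2, ids.size)
u = rng.normal(0, 1, 200)[ids]                   # firm effect u_i
e_het = rng.normal(0, 1, ids.size) * (0.2 + x)   # error variance grows with x
e_hom = rng.normal(0, 1, ids.size)               # constant error variance

lm_het, df = breusch_pagan_lm(pooled_resid(1 + 0.5 * x + u + e_het, x), x)
lm_hom, _ = breusch_pagan_lm(pooled_resid(1 + 0.5 * x + u + e_hom, x), x)
print(f"LM heteroskedastic = {lm_het:.1f}, LM homoskedastic = {lm_hom:.1f}, df = {df}")
```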

            Second, choosing REM implies that none of the right-hand-side variables (key variables and control variables) is correlated with u_i, an assumption that is very suspicious and hard to justify. As Lazzaro suggests, you should probably run the robust version of the Hausman test with the xtoverid command (after estimating the REM with the robust option), which may provide more convincing evidence.
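To see what a robust Hausman-type test checks, here is a NumPy sketch of the regression-based (Mundlak) version: add the firm means of the time-varying regressors, estimate by pooled OLS, and run a cluster-robust Wald test that the coefficients on those means are jointly zero. A rejection says the regressors are correlated with $u_i$, which argues against REM. This is a simplified illustration on simulated data with hypothetical helper names; xtoverid computes a Sargan-Hansen statistic after the RE estimation, so its numbers will not match this pooled-OLS variant exactly.

```python
import numpy as np


def firm_means(values, ids):
    """Per-firm mean of each column, repeated on every row of that firm."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for g in np.unique(ids):
        rows = ids == g
        out[rows] = values[rows].mean(axis=0)
    return out


def cluster_vcov(D, resid, ids):
    """CR0 cluster-robust sandwich covariance, no small-sample correction."""
    bread = np.linalg.inv(D.T @ D)
    meat = np.zeros((D.shape[1], D.shape[1]))
    for g in np.unique(ids):
        score = D[ids == g].T @ resid[ids == g]
        meat += np.outer(score, score)
    return bread @ meat @ bread


def mundlak_robust_wald(y, X_tv, z_ti, ids):
    """Robust Wald statistic (approx. chi2(k) under H0) for zero coefficients on firm means."""
    X_tv = np.asarray(X_tv, dtype=float).reshape(len(y), -1)
    k = X_tv.shape[1]
    D = np.column_stack([np.ones(len(y)), X_tv, firm_means(X_tv, ids), z_ti])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    V = cluster_vcov(D, y - D @ beta, ids)
    sel = slice(1 + k, 1 + 2 * k)            # positions of the firm-mean coefficients
    gamma = beta[sel]
    return float(gamma @ np.linalg.solve(V[sel, sel], gamma)), k


# Hypothetical panel: 200 firms x 5 years, one time-invariant dummy z
rng = np.random.default_rng(2)
ids = np.repeat(np.arange(200), 5)
u = rng.normal(0, 1, 200)[ids]
z = rng.integers(0, 2, 200)[ids]
x_corr = u + rng.normal(0, 1, ids.size)      # x correlated with u_i: RE assumption fails
x_indep = rng.normal(0, 1, ids.size)         # x unrelated to u_i
noise = rng.normal(0, 1, ids.size)

w_corr, df = mundlak_robust_wald(x_corr + 0.3 * z + u + noise, x_corr, z, ids)
w_indep, _ = mundlak_robust_wald(x_indep + 0.3 * z + u + noise, x_indep, z, ids)
print(f"W correlated = {w_corr:.1f}, W independent = {w_indep:.1f}, df = {df}")
```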

            Third, xtgls performs FGLS following the Parks-Kmenta procedure (Parks, 1967; Kmenta, 1971), which is suitable for balanced panels with fixed N and large T (T >> N). Your data, however, form an unbalanced panel with T << N. If you want to apply FGLS, the xtgls2 command (ssc install xtgls2) performs the general FGLS estimation of Kiefer and Wooldridge (Kiefer, 1980; Wooldridge, 2002), which suits this context, after you balance the data with the xtbalance or xtbalance2 commands.
            As for solutions:
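Balancing means keeping only the firms observed in every year of the chosen window. Here that is costly: with 534 observations on 124 firms and at least one observation per firm, if c firms have all seven years then 7c + (124 - c) ≤ 534, so c ≤ 68, and at least 56 firms would be dropped. A plain-Python sketch of the balancing step, using a hypothetical helper rather than the xtbalance code:

```python
def balance_panel(rows, first_year, last_year):
    """Keep only firms observed in every year from first_year to last_year.

    `rows` is a list of (firm_id, year) pairs; hypothetical helper for illustration.
    """
    required = set(range(first_year, last_year + 1))
    years_by_firm = {}
    for firm, year in rows:
        years_by_firm.setdefault(firm, set()).add(year)
    complete = {f for f, years in years_by_firm.items() if required <= years}
    return [(f, y) for f, y in rows if f in complete and y in required]


# Firm A has all seven years; firms B and C do not
panel = [("A", y) for y in range(2018, 2025)] + [("B", 2018), ("B", 2019), ("C", 2020)]
balanced = balance_panel(panel, 2018, 2024)
print(len(balanced), sorted({f for f, _ in balanced}))
```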
            First, the two dummy variables. If the robust Hausman test favors FEM but you still want coefficients on the two dummy variables, you can use Mundlak's correlated random effects (CRE) model. CRE models the correlation between the right-hand-side variables and u_i, by adding the firm-level means of the time-varying variables, under some necessary assumptions. Importantly, the CRE coefficients on the time-varying variables are the same as those of the FEM.
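The equality between the CRE and FEM coefficients on time-varying variables can be checked numerically. The NumPy sketch below simulates a hypothetical panel in which x is correlated with u_i, computes the within (FE) slope, and then runs the Mundlak regression by pooled OLS with the firm mean of x added and a time-invariant dummy kept in the model. In this pooled-OLS form the two slopes coincide exactly, by the Frisch-Waugh-Lovell theorem, because x minus its firm mean is orthogonal to every regressor that is constant within a firm.

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, T = 50, 6
ids = np.repeat(np.arange(n_firms), T)
u = rng.normal(0, 1, n_firms)[ids]                 # firm effect u_i
region = rng.integers(0, 2, n_firms)[ids]          # time-invariant dummy
x = 0.8 * u + rng.normal(0, 1, ids.size)           # correlated with u_i
y = 1.0 + 0.5 * x + 0.7 * region + u + rng.normal(0, 1, ids.size)


def firm_mean(v):
    """Per-firm mean of v, repeated on each of that firm's rows."""
    sums = np.bincount(ids, weights=v)
    counts = np.bincount(ids)
    return (sums / counts)[ids]


# Fixed effects (within) estimator of the slope on x; region is wiped out here
x_w, y_w = x - firm_mean(x), y - firm_mean(y)
b_fe = (x_w @ y_w) / (x_w @ x_w)

# Mundlak CRE by pooled OLS: add the firm mean of x, keep the dummy
D = np.column_stack([np.ones(ids.size), x, firm_mean(x), region])
b_cre, *_ = np.linalg.lstsq(D, y, rcond=None)
print(f"FE slope = {b_fe:.4f}, CRE slope = {b_cre[1]:.4f}, region coef = {b_cre[3]:.3f}")
```

With the RE-GLS version of the CRE regression, the equality is exact in balanced panels; in unbalanced panels it need not hold exactly.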

            Second, regarding the three error problems (heteroscedasticity, autocorrelation, cross-sectional dependence): you can use Rogers standard errors (Rogers, 1993) by running xtreg with the vce(cluster cluster_var) option. Observations are then allowed to be arbitrarily correlated within each cluster but are assumed independent across the clusters defined by cluster_var. With firm data from several countries around the world, a candidate for cluster_var would be a country identifier or an industry identifier, because firms within a country (or industry) may suffer similar economic shocks, which can lead to cross-sectional dependence. This approach requires a sufficiently large number of clusters; if there are few (for example, fewer than 10), the wild cluster bootstrap is a good choice.
            It seems you are Vietnamese; I have a video tutorial for this case here: STATA - Hồi quy OLS, FEM, REM, FGLS - Phần 2. Kiểm định và chữa lỗi: Robust, Bootstrap & FGLS - YouTube (in English: "Regression with OLS, FEM, REM, FGLS, Part 2. Testing and fixing errors: Robust, Bootstrap & FGLS").
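A NumPy sketch of both tools, for illustration only. `cluster_se` computes the CR0 sandwich $(X'X)^{-1}\left(\sum_g X_g'\hat e_g \hat e_g' X_g\right)(X'X)^{-1}$ without Stata's finite-sample adjustment, and `wild_cluster_bootstrap_p` implements a restricted wild cluster bootstrap with Rademacher weights (one random sign per cluster) for the null that one coefficient is zero. Pooled OLS on simulated data stands in for xtreg, and all names are hypothetical; in Stata, the community-contributed boottest package provides wild cluster bootstrap tests.

```python
import numpy as np


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def cluster_se(X, resid, ids):
    """CR0 cluster-robust (Liang-Zeger/Rogers) standard errors."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(ids):
        score = X[ids == g].T @ resid[ids == g]
        meat += np.outer(score, score)
    return np.sqrt(np.diag(bread @ meat @ bread))


def wild_cluster_bootstrap_p(y, X, ids, k, reps=999, seed=0):
    """Restricted wild cluster bootstrap p-value for H0: beta_k = 0."""
    rng = np.random.default_rng(seed)
    beta = ols(X, y)
    t_obs = beta[k] / cluster_se(X, y - X @ beta, ids)[k]
    X_r = np.delete(X, k, axis=1)            # restricted model imposes beta_k = 0
    fit_r = X_r @ ols(X_r, y)
    resid_r = y - fit_r
    clusters, where = np.unique(ids, return_inverse=True)
    t_star = np.empty(reps)
    for b in range(reps):
        signs = rng.choice([-1.0, 1.0], size=clusters.size)  # one sign per cluster
        y_b = fit_r + resid_r * signs[where]
        beta_b = ols(X, y_b)
        t_star[b] = beta_b[k] / cluster_se(X, y_b - X @ beta_b, ids)[k]
    return float(np.mean(np.abs(t_star) >= abs(t_obs)))


# Hypothetical data: 8 country clusters with a common shock, 40 firm-years each
rng = np.random.default_rng(4)
ids = np.repeat(np.arange(8), 40)
shock = rng.normal(0, 1, 8)[ids]
x = rng.normal(0, 1, ids.size) + 0.5 * shock
y = 1.0 + 0.4 * x + shock + rng.normal(0, 1, ids.size)
X = np.column_stack([np.ones(ids.size), x])
beta = ols(X, y)
print("cluster SE:", cluster_se(X, y - X @ beta, ids))
print("wild cluster bootstrap p (slope):", wild_cluster_bootstrap_p(y, X, ids, k=1))
```

With only eight clusters, Rademacher weights allow at most 2^8 = 256 distinct sign patterns, which limits how finely the bootstrap p-value can be resolved.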



            • #7
              Hi Manh, yes, I'm Vietnamese; thanks for noticing. Regarding the two dummy variables: if the standard Hausman test indicates that FEM is appropriate but the robust Hausman test (-xtoverid-) supports REM, which one should I choose?

              Also, I have a question about using robust standard errors without first testing for heteroskedasticity and/or autocorrelation. As far as I know, the usual procedure is to first select the appropriate model (REM or FEM), then perform post-estimation tests for any misspecification, and only afterward apply clustered standard errors.

              Thank you



              • #8
                If so, FEM is probably the safe choice given the potential endogeneity of the explanatory variables. In addition, the classic Hausman test (hausman fe re) often runs into the problem that the (V_b - V_B) matrix is not positive definite; in that case, you should use the sigmamore or sigmaless option.
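The classic statistic is $H = (b_{FE} - b_{RE})'(V_{FE} - V_{RE})^{-1}(b_{FE} - b_{RE})$, and the problem arises when $V_{FE} - V_{RE}$ is not positive definite in a finite sample. The NumPy sketch below, with made-up numbers, computes H with a pseudo-inverse and flags the non-positive-definite case. In the second example one diagonal entry of the difference is negative, the two terms cancel, and the statistic is meaningless; basing both matrices on a common variance estimate (sigmamore or sigmaless) is meant to avoid this.

```python
import numpy as np


def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic via a pseudo-inverse, plus a positive-definiteness flag."""
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    is_pos_def = bool(np.all(np.linalg.eigvalsh(V_diff) > 0))
    stat = float(diff @ np.linalg.pinv(V_diff) @ diff)
    return stat, is_pos_def


# Hypothetical FE and RE estimates for two time-varying coefficients
H, ok = hausman([0.5, 0.2], [0.4, 0.25],
                np.diag([0.010, 0.004]), np.diag([0.006, 0.003]))
print(f"H = {H:.2f}, V_diff positive definite: {ok}")

# Same estimates, but V_FE - V_RE now has a negative diagonal entry
H2, ok2 = hausman([0.5, 0.2], [0.4, 0.25],
                  np.diag([0.010, 0.002]), np.diag([0.006, 0.003]))
print(f"H = {H2:.2f}, V_diff positive definite: {ok2}")
```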

                You have nothing to lose by using robust standard errors in the robust Hausman test based on the CRE (Mundlak) estimates, since this gives robust inference. If you want, you can also run xtoverid after estimating xtreg without the robust option and compare the results.



                • #9
                  Thank you both for providing me with such clear guidance.

