How to handle heteroskedasticity, autocorrelation, and cross-sectional dependence in panel data (xtreg, re)?

Doan Ngan

03 Aug 2025, 00:49

Hi everyone,

I have a problem and I’m not sure whether I’m doing it right. Could anyone help me with this?

I have panel data with N = 124 companies from around the world and T = 7 years (2018–2024). I estimate the following model:
$$\text{EBITDA}_{it} = \beta_0 + \beta_1\,\text{CO2Em}_{it} + \beta_2\,\text{ESG}_{it} + \beta_3\,\text{CO2EmXCC}_{it} + \gamma'\,\text{Control}_{it} + u_i + e_{it}$$

Because my model includes two time-invariant dummy variables (Industry and Region), whose effects a fixed effects model cannot estimate, I decided to use a random effects model (REM).

I then tested for:

Heteroskedasticity using xttest2
Cross-sectional dependence using xtcdf
Autocorrelation using xtserial

The results show that my data suffer from all three problems. To address them, I use -xtreg, re vce(cluster id)-.
Here are the results:

Code:

. xtreg lnEBITDA lnCO2 ESGScore lnCO2xCC Inflation lnMktCap Size Lev Industry i.region_num, re

Random-effects GLS regression                   Number of obs     =        534
Group variable: id                              Number of groups  =        124

R-squared:                                      Obs per group:
     Within  = 0.4239                                         min =          1
     Between = 0.9171                                         avg =        4.3
     Overall = 0.8981                                         max =          7

                                                Wald chi2(12)     =    1564.39
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

--------------------------------------------------------------------------------
      lnEBITDA | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
         lnCO2 |   .0443468   .0216099     2.05   0.040     .0019922    .0867015
      ESGScore |   .0021128   .0022001     0.96   0.337    -.0021994    .0064249
      lnCO2xCC |   .0113222   .0096175     1.18   0.239    -.0075277    .0301722
     Inflation |   .0086087   .0033217     2.59   0.010     .0020983    .0151191
      lnMktCap |   .3800489   .0348148    10.92   0.000     .3118132    .4482846
          Size |   1.468875   .0980446    14.98   0.000     1.276712    1.661039
           Lev |  -.0007403   .0033407    -0.22   0.825    -.0072879    .0058073
      Industry |  -.0019131   .0811607    -0.02   0.981     -.160985    .1571589
               |
    region_num |
    Australia  |   .5221762   .1792769     2.91   0.004     .1707999    .8735525
       Europe  |   .4443917   .1094821     4.06   0.000     .2298106    .6589727
North America  |    .239526   .1074525     2.23   0.026     .0289229    .4501291
South America  |    .703292   .2437693     2.89   0.004      .225513    1.181071
               |
         _cons |  -3.871558   .7487461    -5.17   0.000    -5.339073   -2.404042
---------------+----------------------------------------------------------------
       sigma_u |   .3628966
       sigma_e |  .25249424
           rho |  .67380799   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

Code:

. xtreg lnEBITDA lnCO2 ESGScore lnCO2xCC Inflation lnMktCap Size Lev Industry i.region_num, re vce(cluster id)

Random-effects GLS regression                   Number of obs     =        534
Group variable: id                              Number of groups  =        124

R-squared:                                      Obs per group:
     Within  = 0.4239                                         min =          1
     Between = 0.9171                                         avg =        4.3
     Overall = 0.8981                                         max =          7

                                                Wald chi2(12)     =    1383.87
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                                     (Std. err. adjusted for 124 clusters in id)
--------------------------------------------------------------------------------
               |               Robust
      lnEBITDA | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
         lnCO2 |   .0443468    .025415     1.74   0.081    -.0054657    .0941593
      ESGScore |   .0021128   .0026091     0.81   0.418     -.003001    .0072265
      lnCO2xCC |   .0113222    .007854     1.44   0.149    -.0040713    .0267158
     Inflation |   .0086087    .006487     1.33   0.184    -.0041055     .021323
      lnMktCap |   .3800489    .038064     9.98   0.000     .3054449    .4546529
          Size |   1.468875   .1169659    12.56   0.000     1.239626    1.698124
           Lev |  -.0007403   .0021294    -0.35   0.728    -.0049138    .0034332
      Industry |  -.0019131   .0711128    -0.03   0.979    -.1412916    .1374655
               |
    region_num |
    Australia  |   .5221762   .2244736     2.33   0.020     .0822161    .9621364
       Europe  |   .4443917   .1169048     3.80   0.000     .2152625    .6735208
North America  |    .239526   .1149305     2.08   0.037     .0142664    .4647856
South America  |    .703292   .1486125     4.73   0.000     .4120169    .9945671
               |
         _cons |  -3.871558   .8817226    -4.39   0.000    -5.599702   -2.143413
---------------+----------------------------------------------------------------
       sigma_u |   .3628966
       sigma_e |  .25249424
           rho |  .67380799   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

My questions are:

Is using vce(cluster id) with random effects sufficient when heteroskedasticity, autocorrelation, and cross-sectional dependence are all present?
If not, what would be a more appropriate approach for this situation?

Thank you!


Carlo Lazzaro

#2

03 Aug 2025, 01:11

Doan:
you may want to take a look at the Statalist thread "Panel data: testing for serialcorrelation and heteroskedasticity".
That said:
1) I would stick with -vce(cluster panelid)- standard errors.
2) I do not understand why the choice between the -fe- and -re- estimators should be driven by two categorical variables rather than by comparing the two estimators in terms of consistency and efficiency.

Kind regards,
Carlo
(Stata 19.0)
Doan Ngan

#3

03 Aug 2025, 02:58

Carlo, thank you for your comment.
I actually did run the Hausman test to decide between -fe- and -re-, and the result suggested that -re- is more appropriate for my data.
So my choice of -re- was based on that test, not only on the presence of the categorical variables.
Carlo Lazzaro

#4

03 Aug 2025, 08:28

Doan:
thanks for clarifying.
That said, if you use non-default standard errors, you should switch from -hausman- to the community-contributed module -xtoverid-.
Being glorious but a bit old-fashioned, -xtoverid- does not support factor-variable (-fvvarlist-) notation. See the -xi- prefix for a possible workaround.
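A minimal sketch of the -xi- workaround, assuming Stata's bundled nlswork data as a stand-in for the poster's panel (ln_wage, age, ttl_exp, tenure and race replace the original variables; race plays the role of the time-invariant dummies). -xtoverid- is community-contributed; the sketch also installs -ivreg2- and -ranktest-, on the assumption that -xtoverid- relies on them.

```stata
* Sketch only: nlswork stands in for the poster's data.
* -xtoverid- is community-contributed (SSC); -ivreg2- and -ranktest-
* are installed too, assuming -xtoverid- depends on them.
ssc install ivreg2, replace
ssc install ranktest, replace
ssc install xtoverid, replace

webuse nlswork, clear
xtset idcode year

* -xtoverid- does not accept factor variables such as i.race,
* so the -xi- prefix creates the indicators _Irace_2 and _Irace_3 instead.
xi: xtreg ln_wage age ttl_exp tenure i.race, re vce(cluster idcode)

* Overidentification test of RE vs FE, robust to heteroskedasticity
* and within-panel correlation because the RE model used vce(cluster).
xtoverid
```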

Kind regards,
Carlo
(Stata 19.0)
Doan Ngan

#5

22 Aug 2025, 22:38

Hi Carlo,
I am inclined to use -xtreg, vce(cluster panelid)- rather than -xtgls- to address heteroskedasticity and/or serial correlation in panel data. However, I have noticed that many student theses in my department use -xtgls-, and their supervisors have accepted it. Could you therefore point me to authoritative references that support using clustered standard errors (-xtreg, vce(cluster panelid)-) instead of -xtgls- for panels where N > T (with T small)?

In addition, my model includes interaction terms with two dummy variables (covid and postcovid), such as lnCO2xCCxcovid and lnCO2xCCxpostcovid, with lnCO2xCC serving as the baseline. To estimate the effect of lnCO2xCC during the covid period, I would use -lincom lnCO2xCC + lnCO2xCCxcovid-. Could you please confirm whether this is the correct procedure?
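A sketch of the -lincom- step, assuming nlswork as a stand-in: tenure plays the role of lnCO2xCC and the union dummy plays the role of covid (illustrative mappings only). Writing the interaction with factor-variable notation guarantees that the main effects are in the model, which the sum of coefficients presupposes; -margins- gives a cross-check.

```stata
* Sketch only: tenure ~ lnCO2xCC, union ~ covid (hypothetical mapping).
webuse nlswork, clear
xtset idcode year

* i.union##c.tenure adds 1.union, tenure and 1.union#c.tenure
xtreg ln_wage age ttl_exp i.union##c.tenure, re vce(cluster idcode)

* Slope of tenure when union == 0 is _b[tenure];
* slope when union == 1 is _b[tenure] + _b[1.union#c.tenure].
lincom tenure + 1.union#c.tenure

* Cross-check: marginal effect of tenure at each value of union
margins, dydx(tenure) at(union = (0 1))
```

With manually created interaction variables, this is the same quantity that -lincom lnCO2xCC + lnCO2xCCxcovid- computes, provided both terms are in the model.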
Thank you

Last edited by Doan Ngan; 22 Aug 2025, 22:41.
Manh Hoang Ba

#6

23 Aug 2025, 03:32

Hi Ngan,

I see some misunderstandings in the way you have analyzed the data.
First, xttest2 does not perform a heteroscedasticity test; it tests for cross-sectional dependence (the Breusch-Pagan LM test of cross-sectional independence) and is intended for large-T panels. This may be a typo, and you may mean xttest3. However, xttest3 is a post-estimation command for the FEM, and it is also intended for large-T panels. For the REM, to the best of my knowledge, there is currently no Stata command for heteroscedasticity testing. Some discussions use the Breusch-Pagan/White procedure on the pooled errors (v_it = u_i + e_it) as a workaround.
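A minimal sketch of that pooled-error workaround, assuming nlswork as a stand-in dataset: pooled OLS residuals estimate v_it = u_i + e_it, and Stata's built-in -estat hettest- (Breusch-Pagan/Cook-Weisberg) and -estat imtest, white- (White's test) are applied to them. Note that this tests the composite error, not e_it of the REM itself.

```stata
* Sketch only: nlswork stands in for the poster's data.
webuse nlswork, clear
xtset idcode year

* Pooled OLS ignores u_i, so its residuals estimate v_it = u_i + e_it
regress ln_wage age ttl_exp tenure

* Breusch-Pagan / Cook-Weisberg test (default: against the fitted values)
estat hettest

* White's general test for heteroskedasticity
estat imtest, white
```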

Second, choosing the REM implies that the right-hand-side variables (key variables and control variables) are uncorrelated with u_i, which is highly suspect and hard to justify. As Carlo suggests, you should probably run the robust version of the Hausman test using the xtoverid command (after estimating the REM with the robust option), which might provide more convincing evidence.

Third, the xtgls command performs the Parks-Kmenta FGLS procedure (Parks, 1967; Kmenta, 1971), which is suited to balanced panels with fixed N and large T (T >> N). Your data, by contrast, form an unbalanced panel with T << N. If you want to apply FGLS, the xtgls2 command (ssc install xtgls2) performs the general FGLS estimation of Kiefer and Wooldridge (Kiefer, 1980; Wooldridge, 2002), which suits this context, after you balance the data with the xtbalance or xtbalance2 command.
As for solutions:
First, regarding the two dummy variables: if the robust Hausman test favors the FEM and you still want coefficients on the two dummies, you can use Mundlak's correlated random effects (CRE) model. Under certain assumptions, the CRE models the correlation between the right-hand-side variables and u_i through the panel means of the time-varying regressors. Importantly, the CRE coefficients on the time-varying variables are the same as those of the FEM.
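A sketch of Mundlak's CRE approach, assuming nlswork as a stand-in (race plays the role of Industry/Region; touse and the m_* variables are hypothetical helper names). Panel means of the time-varying regressors are added to an RE regression with cluster-robust standard errors; a joint test that the means are zero serves as the robust Hausman-type test, while race still receives a coefficient.

```stata
* Sketch only: nlswork stands in for the poster's data.
webuse nlswork, clear
xtset idcode year

* Fix the estimation sample so the panel means use the same observations
gen byte touse = !missing(ln_wage, age, ttl_exp, tenure)

* Panel means of each time-varying regressor (hypothetical names m_*)
foreach v in age ttl_exp tenure {
    bysort idcode (year): egen double m_`v' = mean(`v') if touse
}

* CRE: random effects plus panel means, cluster-robust SEs
xtreg ln_wage age ttl_exp tenure m_age m_ttl_exp m_tenure i.race if touse, re vce(cluster idcode)

* Robust Hausman-type test: rejection favours FE over RE
test m_age m_ttl_exp m_tenure

* FE on the same sample, for comparing the time-varying coefficients
xtreg ln_wage age ttl_exp tenure if touse, fe vce(cluster idcode)
```

Compare the coefficients on age, ttl_exp and tenure from the CRE and FE runs.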

Second, regarding the three error problems (heteroscedasticity, autocorrelation, cross-sectional dependence): you can use Rogers standard errors (Rogers, 1993) by running xtreg with the vce(cluster cluster_var) option. These assume that observations are arbitrarily correlated within each cluster but independent across the clusters defined by cluster_var. With firm data from many countries, a candidate for cluster_var would be a country identifier or an industry identifier, because firms within a country (or industry) may experience similar economic shocks, which can lead to cross-sectional dependence. This approach requires a sufficiently large number of clusters; if there are few (for example, fewer than 10), the wild cluster bootstrap is a good choice.
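A sketch of clustering above the panel level, assuming nlswork as a stand-in. nlswork has no country identifier that is fixed within a panel, so grp below is a hypothetical grouping built only to show the syntax. -xtreg- requires panels to be nested within clusters; that holds here because each idcode maps to a single grp, whereas a firm that changed country or industry across years would violate it.

```stata
* Sketch only: grp is a HYPOTHETICAL stand-in for a country identifier.
webuse nlswork, clear
xtset idcode year

* Each idcode maps to exactly one grp, so panels nest within clusters
gen int grp = mod(idcode, 40)

* Panel-level clustering (as in the original post)
xtreg ln_wage age ttl_exp tenure, re vce(cluster idcode)

* Coarser clustering: arbitrary correlation within each grp,
* independence across the 40 groups
xtreg ln_wage age ttl_exp tenure, re vce(cluster grp)
display "number of clusters: " e(N_clust)
```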
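It seems you are Vietnamese; I have a video tutorial for this case here: STATA - Hồi quy OLS, FEM, REM, FGLS - Phần 2. Kiểm định và chữa lỗi: Robust, Bootstrap & FGLS ("STATA - OLS, FEM, REM, FGLS regression - Part 2. Testing and correcting errors: Robust, Bootstrap & FGLS") - YouTube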
Doan Ngan

#7

23 Aug 2025, 04:13

Hi Manh, I'm Vietnamese; thanks for noticing. By the way, regarding the analysis of the two dummy variables: if the Hausman test indicates that FEM is appropriate but the robust Hausman test (-xtoverid-) supports REM, which one should I choose?

Also, I have a question about using robust standard errors without first testing for heteroskedasticity and/or autocorrelation. As far as I know, the usual procedure is to first select the appropriate model (REM or FEM), then perform post-estimation tests for any misspecification, and only afterward apply clustered standard errors.

Thank you
Manh Hoang Ba

#8

23 Aug 2025, 04:32

Originally posted by Doan Ngan

Hi Manh, I'm Vietnamese, thanks for recognizing that. By the way, regarding the analysis of the two dummy variables: in case the Hausman test indicates that FEM is appropriate, but the robust Hausman test -xtoverid- supports REM, which one should I choose?

Also, I have a question about using robust standard errors without first testing for heteroskedasticity and/or autocorrelation. As far as I know, the usual procedure is to first select the appropriate model (REM or FEM), then perform post-estimation tests for any misspecification, and only afterward apply clustered standard errors.

Thank you

If so, FEM is probably a safe choice, given the potential endogeneity of the explanatory variables. In addition, the classical Hausman test (hausman fe re) often runs into the problem that the (V_b - V_B) matrix is not positive definite. In that case, you should use the sigmamore or sigmaless option.
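A sketch of the classical test with the -sigmamore- option, assuming nlswork as a stand-in. As Carlo notes, -hausman- assumes default (non-robust) standard errors, so both models are fit without vce(cluster).

```stata
* Sketch only: nlswork stands in for the poster's data.
webuse nlswork, clear
xtset idcode year

* Consistent estimator first, efficient estimator second
xtreg ln_wage age ttl_exp tenure, fe
estimates store fe

xtreg ln_wage age ttl_exp tenure, re
estimates store re

* Default version (may report that V_b - V_B is not positive definite)
hausman fe re

* Base both covariance matrices on the disturbance variance of the
* efficient (RE) estimator; -sigmaless- would use the consistent (FE) one.
hausman fe re, sigmamore
```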

You have nothing to lose by using robust standard errors: the robust Hausman test based on the CRE (Mundlak) estimates gives robust inference. If you want, you can also run xtoverid after estimating xtreg without the robust option and compare the results.
Doan Ngan

#9

23 Aug 2025, 08:15

Thank you both for providing me with such clear guidance.