Checking for heteroskedasticity

Alejandro Cuadros

Join Date: Jan 2021

Posts: 22
#1


13 Jan 2021, 12:03

Hi guys,
I am running a regression with country fixed effects using the command " xtreg Y X, fe". The results were fine because all the p-values were < 0.05. I ran the hausman test and the result was that the model was well explained with fixed effects. Also no multicollinearity problem. Then I realized I had to control for heteroskedasticity. In order to do so, I ran a regression using the following command; "xtreg Y X, fe robust" and the p-values are significantly higher and above 0.05. Does this mean that my model is not correct if I include standard robust errors? Can I control this heteroskedasticity problem? ( Please take under consideration that I am very new to stata and econometrics)
Thanks in advance for the help

Last edited by Alejandro Cuadros; 13 Jan 2021, 12:13.
Tags: fixed effects, heteroskedasticity, panel data, regression
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

13 Jan 2021, 12:27

Alejandro:
the -robust- option in -xtreg- does the very same job that -vce(cluster clusterid)- does, that is, it takes heteroskedasticity and/or autocorrelation into account.
You do not report (as per the FAQ) exactly what you typed and what Stata gave you back; hence, it is difficult to avoid a bit of guesswork in replying. The cluster-robust standard errors can be correct (and their default counterparts misleading) or the other way round: if you have enough panels (say, around 50), the cluster-robust standard errors are the way to go; otherwise they can be even more misleading than the default ones.
It may also be that you have serial correlation issues that are captured by the -robust- option.
As an aside, please note that it is not correct to test for possible heteroskedasticity (and/or autocorrelation) after -hausman- (which does not support non-default standard errors). You should check for it before comparing the -fe- vs -re- specifications and, if cluster-robust standard errors are necessary, you should forget -hausman- and consider the community-contributed programme -xtoverid- (which, in turn, does not support -fvvarlist- notation; see the -xi:- help file if you have categorical -i.predictors- on the right-hand side of your regression equation).
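The equivalence between -robust- and -vce(cluster clusterid)- described at the start of this reply can be checked directly. The following is only a minimal sketch: it assumes Stata's bundled -nlswork- example dataset and an arbitrary pair of regressors (age and tenure) rather than the poster's own data, and the stored-estimate names are made up for illustration.

```stata
* Sketch only: uses Stata's bundled nlswork example data, not the poster's data
webuse nlswork, clear
xtset idcode year

* Default (conventional) standard errors
xtreg ln_wage age tenure, fe
estimates store fe_default

* Robust standard errors; under -xtreg, fe- these are clustered on the panel variable
xtreg ln_wage age tenure, fe vce(robust)
estimates store fe_robust

* Standard errors explicitly clustered on the panel identifier
xtreg ln_wage age tenure, fe vce(cluster idcode)
estimates store fe_cluster

* Coefficients and standard errors side by side
estimates table fe_default fe_robust fe_cluster, b(%9.5f) se(%9.5f)
```

The coefficients should be identical in all three columns; only the standard errors differ between the default column and the other two, which should agree with each other.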

Kind regards,
Carlo
(Stata 19.0)
Alejandro Cuadros

Join Date: Jan 2021

Posts: 22
#3

13 Jan 2021, 13:21

Thanks Carlo,

I must admit that I am a little confused by your answer. I will provide more detail so you can see what comes out, first using only "xtreg y x, fe", then using "xtreg y x, fe robust", and then using "xtreg y x, fe vce(robust)".

With my model I'm trying to explain the influence of Economic Freedom on the Female Labor Force Participation Rate. For this I am using data from 100 countries over 15 years. The idea is to demonstrate this relationship using a fixed-effects model, in order to analyze the effect within countries. I added 3 independent variables besides Economic Freedom, and there is no multicollinearity problem. The result we got is that all independent variables have a significant influence (p-value < 0.05) on Female Labor Force Participation. After running the Hausman test, the result favoured the fixed-effects specification.

The commands that I used are the following:

egen Country1=group(Country)
xtset Country1 Year, yearly
xtsum Dependent Ind1 Ind2 Ind3 Ind4
xtreg Dependent Ind1 Ind2 Ind3 Ind4, fe
xtreg Dependent Ind1 Ind2 Ind3 Ind4, re
xtreg Dependent Ind1 Ind2 Ind3 Ind4, fe robust

Everything was perfect (significant) till the last step. Here are the pictures.

Thanks again..
Attached Files

Last edited by Alejandro Cuadros; 13 Jan 2021, 13:42.
Alejandro Cuadros

Join Date: Jan 2021

Posts: 22
#4

13 Jan 2021, 13:37

.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

14 Jan 2021, 01:38

Alejandro:
just to wrap up:
1) -robust- = -vce(robust)- = -vce(cluster clusterid)-: so, under -xtreg-, you will get exactly the same results;
2) given the number of your panels (94), you should invoke non-default standard errors (that is, one of the options reported above), no matter whether the p-values reach statistical significance or not (this is a minor issue, regardless of what we're usually taught at university);
3) -hausman- does not support non-default standard errors: hence, you should rely on the community-contributed command -xtoverid- (which you can easily download from SSC; see -help ssc-).
The following toy example can hopefully be helpful (please note that -xtoverid- does not support -fvvarlist- notation: hence you should prefix your code with -xi:- if you have categorical variables, and/or create interactions by hand):

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage c.age##c.age i.nev_mar, fe robust

Fixed-effects (within) regression               Number of obs     =     28,494
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1091                                         min =          1
     between = 0.0969                                         avg =        6.0
     overall = 0.0846                                         max =         15

                                                F(3,4709)         =     341.15
corr(u_i, Xb)  = 0.0391                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0515872   .0046007    11.21   0.000     .0425677    .0606067
             |
 c.age#c.age |  -.0005643   .0000757    -7.45   0.000    -.0007127   -.0004159
             |
   1.nev_mar |  -.0182635    .010661    -1.71   0.087     -.039164     .002637
       _cons |    .682259   .0686616     9.94   0.000     .5476501    .8168679
-------------+----------------------------------------------------------------
     sigma_u |  .40461866
     sigma_e |  .30234418
         rho |  .64170177   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estimates store fe

. xtreg ln_wage c.age##c.age i.nev_mar, re robust

Random-effects GLS regression                   Number of obs     =     28,494
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1024                                         avg =        6.0
     overall = 0.0876                                         max =         15

                                                Wald chi2(3)      =    1260.91
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0599506    .004303    13.93   0.000      .051517    .0683843
             |
 c.age#c.age |  -.0006892   .0000712    -9.68   0.000    -.0008288   -.0005496
             |
   1.nev_mar |   .0060386   .0090804     0.67   0.506    -.0117588    .0238359
       _cons |   .5319739   .0631005     8.43   0.000     .4082991    .6556486
-------------+----------------------------------------------------------------
     sigma_u |   .3642833
     sigma_e |  .30234418
         rho |  .59211886   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estimates store re

. hausman fe re
hausman cannot be used with vce(robust), vce(cluster cvar), or p-weighted data
r(198);

. g sq_age=age^2
(24 missing values generated)

. xi: xtreg ln_wage age sq_age i.nev_mar, re robust
i.nev_mar         _Inev_mar_0-1       (naturally coded; _Inev_mar_0 omitted)

Random-effects GLS regression                   Number of obs     =     28,494
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1024                                         avg =        6.0
     overall = 0.0876                                         max =         15

                                                Wald chi2(3)      =    1260.91
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0599506    .004303    13.93   0.000      .051517    .0683843
      sq_age |  -.0006892   .0000712    -9.68   0.000    -.0008288   -.0005496
 _Inev_mar_1 |   .0060386   .0090804     0.67   0.506    -.0117588    .0238359
       _cons |   .5319739   .0631005     8.43   0.000     .4082991    .6556486
-------------+----------------------------------------------------------------
     sigma_u |   .3642833
     sigma_e |  .30234418
         rho |  .59211886   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(idcode)
Sargan-Hansen statistic  86.932  Chi-sq(3)    P-value = 0.0000

.

The null of -xtoverid- is that -re- is the way to go. As such, unlike -hausman-, there's no need to compare -fe- vs -re-. Since the null is rejected here, the test shows that -fe- is correct for this (probably misspecified) regression model.
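To make the reading of the -xtoverid- output concrete, the p-value can be recomputed from the reported statistic and its degrees of freedom. This is a minimal sketch using the toy example's numbers (86.932 with 3 degrees of freedom); the 5% threshold and the local macro names are illustrative assumptions, not part of -xtoverid- itself.

```stata
* Upper-tail chi-squared probability for the toy example's Sargan-Hansen statistic
display chi2tail(3, 86.932)

* Assumed decision rule at the 5% level (the threshold is an illustrative choice)
local stat = 86.932
local df   = 3
if chi2tail(`df', `stat') < 0.05 {
    display "Null (-re-) rejected: use -fe-"
}
else {
    display "Null (-re-) not rejected: -re- is admissible"
}
```

With a statistic this large the tail probability is far below 0.05, consistent with the P-value of 0.0000 reported above.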

Kind regards,
Carlo
(Stata 19.0)


Alejandro Cuadros

Join Date: Jan 2021

Posts: 22
#6

14 Jan 2021, 04:41

Thank you again Carlo. I did as you said and this was the result, but I am not sure how to interpret it. Is this model better explained with re or fe? And second, can we conclude there is a (significant) effect of the independent variables on the dependent variable even with those p-values? If not, every tip is welcome.

. xi: xtreg FLFP EFI Puestos gdppercap agr, re robust

Random-effects GLS regression                   Number of obs     =      1,504
Group variable: Country1                        Number of groups  =         94

R-sq:                                           Obs per group:
     within  = 0.1229                                         min =         16
     between = 0.0398                                         avg =       16.0
     overall = 0.0416                                         max =         16

                                                Wald chi2(4)      =      31.25
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

I know you said p-values were not so important, but here they are (maybe useful for interpretation):

-------------+----------------
             |    P>|z|
-------------+----------------
         EFI |    0.149
     Puestos |    0.084
   gdppercap |    0.000
         agr |    0.379
       _cons |    0.000
-------------+----------------
     sigma_u |  12.982257
     sigma_e |  1.9932807
         rho |  .97696876   (fraction of variance due to u_i)
------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re robust cluster(Country1)
Sargan-Hansen statistic 11.177 Chi-sq(4) P-value = 0.0246

Best regards

Last edited by Alejandro Cuadros; 14 Jan 2021, 05:02.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

14 Jan 2021, 06:16

Alejandro:
the -xtoverid- outcome points you toward the -fe- specification (as the null is rejected).
What I would do now is focus on your second -xtreg, fe- model (I mean the one with robust standard errors) and test whether it is misspecified or not (we discussed this issue yesterday).
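The thread does not show which misspecification check was meant. One possibility, assumed here purely for illustration, is a RESET-type check done by hand: re-estimate the model on the fitted values and their square, and test whether the squared term matters. The variable names are taken from post #6; -fitted- and -sq_fitted- are hypothetical names introduced for this sketch.

```stata
* Assumes the poster's data are in memory and already xtset on Country1 and Year
xtreg FLFP EFI Puestos gdppercap agr, fe vce(cluster Country1)

* Linear prediction (xb) and its square; both variable names are hypothetical
predict double fitted, xb
generate double sq_fitted = fitted^2

* Auxiliary regression: a significant squared term hints at misspecification
xtreg FLFP fitted sq_fitted, fe vce(cluster Country1)
test sq_fitted
```

If the test on -sq_fitted- rejects, that suggests the functional form may be missing something (for example, nonlinear terms or omitted predictors); if it does not reject, this particular check gives no evidence of misspecification.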

Kind regards,
Carlo
(Stata 19.0)