
  • Heavy-tailed residuals / non-normality

    Dear all,

    After the great advice I got yesterday, I would be grateful if someone could help with a question I have been struggling with for quite some time.

    After running a regression with fixed effects in the form of
    Code:
     xtreg y x1 x2 i.year, fe
    I obtained and plotted the residuals. Looking at their distribution, however, there seem to be fat tails, which would suggest many outlying residuals. After doing some reading, I learned that non-normality/heteroskedasticity of the residuals can be taken into account by using clustered standard errors when N is large enough. Does this mean I should not worry about fat tails?
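    For concreteness, the clustered-standard-error variant I have in mind would look something like this (a sketch only; panelid is a placeholder name for my panel identifier):
    Code:
     xtreg y x1 x2 i.year, fe vce(cluster panelid)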


  • #2
    Mimina:
    setting aside that normality (a weak requirement) and heteroskedasticity (a minor nuisance) of the epsilon are two different beasts, it is true that if you have invoked -robust- or -vce(cluster panelid)- in your -xtreg, fe- code (both options do the very same job), you should not worry at all.
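    In code, with the variable names from #1 (and panelid standing in for your panel identifier), the two equivalent calls would be:
    Code:
     xtreg y x1 x2 i.year, fe vce(robust)
     xtreg y x1 x2 i.year, fe vce(cluster panelid)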
    As an aside, you've described in a handful of words (which are not that informative about the issue you're reporting) what you could have shared via a graph (which would give interested listers an exact idea of what is going on in your dataset).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you very much Carlo,

      I agree I should have shown a graph, so I elaborate my steps below for anyone who might have the same question in the future.

      There are three things I don't understand:

      1. Regarding the difference between normality and heteroskedasticity: are these two not the same? You wrote

      setting aside that normality (a weak requirement) and heteroskedasticity (a minor nuisance) of the epsilon are two different beasts
      and

      Originally posted by Carlo Lazzaro View Post
      Maisa:
      as you should know, doing assignments on someone else's behalf is not among the goals of this list.
      That said, you should first consult any decent textbook on panel data econometrics to answer most of your queries yourself.
      As far as your questions are concerned:
      - vce(cluster id) will solve both heteroskedasticity and autocorrelation (please, see Example 3, -xtreg- entry, Stata 14 .pdf manual);
      - heteroskedasticity exactly means that you have non-normally distributed residuals;
      - natural logging of depvar and indepvars brings about some interpretational issues of your results. It is not mandatory that you log all the variables on both sides of your regression equation.
      As a closing-out aside, I would check the literature of your research field to be sure that -xtreg,fe- specification is the right one, given that your depvar is a rate.
      2. I understand that with clustering and N > T I should not worry about heteroskedasticity. Why, however, should I not worry about heavy tails? These tails imply outliers, right? Which could influence the results?

      3. Am I correct in saying that the outliers in the first graph confirm the heavy tails and that heteroskedasticity is present? I don't see the usual cone shape, but the points are definitely not constant around 0.
      I will definitely cluster on the panel id, but I would like to know why I am seeing these patterns.

      I first estimated the regression (before clustering):

      Code:
      xtreg y x1 x2 i.year, fe
      Then I predicted the residuals and fitted values:

      Code:
      predict e if e(sample), e        // e_it, the idiosyncratic error component
      predict xb if e(sample), xb      // linear prediction x_it*b (excluding the fixed effect)
      predict xbu if e(sample), xbu    // x_it*b + u_i (including the fixed effect)
      I then plotted the residuals against the fitted values:

      Code:
      scatter e xb
      I then checked for fat tails with -qnorm-:

      Code:
      qnorm e
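      As a numeric complement to the plot (my own addition, not part of the original steps), the tails can also be checked with summary statistics and the skewness/kurtosis test, assuming the residual variable is e as above:

      Code:
      summarize e, detail    // sample kurtosis well above 3 points to heavy tails
      sktest e               // skewness/kurtosis test of normality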

      Last edited by Mimina John; 05 Jul 2022, 12:06.



      • #4
        Sorry. Somehow the graphs could not be inserted.

        [Attachment: vraag1.PNG — scatter of residuals against fitted values]

        [Attachment: vraag2.PNG — qnorm plot of the residuals]



        • #5
          Robust standard errors not only do not help with heavy tails, they make things worse. For non-robust standard errors, you need moments up to 2nd order to be finite. For robust standard errors, you need moments up to 4th order to be finite.
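          A minimal simulation sketch (my own illustration, not Joro's code) of the setting described here: homoskedastic errors drawn from a t distribution with 3 degrees of freedom, which has a finite 2nd moment but no finite 4th moment, with both variance estimators requested for comparison:

          Code:
          * toy data: heavy-tailed but homoskedastic errors
          clear
          set seed 12345
          set obs 500
          generate x = rnormal()
          generate y = 1 + 0.5*x + rt(3)
          regress y x                  // conventional standard errors
          regress y x, vce(robust)     // heteroskedasticity-robust standard errors
          A single draw does not show (in)consistency by itself; the point is only how the two variance estimators are obtained, with the moment conditions above determining which of them behaves well across repeated samples.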



          • #6
            Sorry Joro Kolev, I am afraid I don't understand… are you saying I should worry when using -xtreg- with clustered standard errors? These standard errors are the same as the robust ones in -xtreg-. Why exactly should I worry?
            From post #2 it seems like I should not?



            • #7
              This is not what Carlo said. What he said is that if you have a large sample, you should not worry about non-normality of your residuals, and that if you have calculated robust standard errors, you should not worry about heteroskedasticity.

              This is all true if you have finite moments up to a certain order. For heteroskedasticity-robust standard errors, you need finite moments up to 4th order.

              What I am saying is that if you do not have heteroskedasticity, the standard formula for standard errors needs only finite moments up to 2nd order.

              The robust standard errors need finite moments up to 4th order.

              So not only do robust standard errors not help if you have only heavy tails (and no heteroskedasticity), they might even be inconsistent while the standard standard errors are consistent, e.g., when the 2nd moments are finite but the higher ones are not.


              Originally posted by Mimina John View Post
              Sorry Joro Kolev, I am afraid I don't understand… are you saying I should worry when using -xtreg- with clustered standard errors? These standard errors are the same as the robust ones in -xtreg-. Why exactly should I worry?
              From post #2 it seems like I should not?
              Last edited by Joro Kolev; 05 Jul 2022, 13:47.





                • #9
                  Thank you for clarifying Joro. During class we were always told just to use robust and not even think further.

                  To account for autocorrelation I will go for clustered errors. I am still struggling with the interpretation of the first graph. Does it show heteroskedasticity, or just large outliers of the residuals and fat tails? The reason I am asking is that I worry that fat tails mean I have large outliers in my residuals, which could influence the results.
                  Previously I was told to check the variables for outliers before running a regression, but here I see outliers in the residuals.
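                  For reference, the kind of pre-regression screen I was taught looks roughly like this (a sketch using the placeholder variable names y, x1 and x2 from #1):

                  Code:
                  summarize y x1 x2, detail    // extreme percentiles flag unusual values of each variable
                  graph box y x1 x2            // box plots mark outlying values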



                  • #10
                    Mimina:
                    how large are N and T in your panel dataset?
                    Could you please share the outcome of your -xtreg- regression?
                    Have you already checked that the functional form of your regressand is correctly specified?
                    Thanks.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Originally posted by Mimina John View Post
                      Thank you for clarifying Joro. During class we were always told just to use robust and not even think further.

                      To account for autocorrelation I will go for clustered errors. I am still struggling with the interpretation of the first graph. Does it show heteroskedasticity, or just large outliers of the residuals and fat tails? The reason I am asking is that I worry that fat tails mean I have large outliers in my residuals, which could influence the results.
                      Previously I was told to check the variables for outliers before running a regression, but here I see outliers in the residuals.
                      They (and I) tell students in class to just use robust and not worry, because the technical conditions on finite moments are hard to verify, so we simply assume they hold. Still, it is wrong to think that for homoskedastic, heavy-tailed data the robust errors will help at all. On the contrary, for homoskedastic, very heavy-tailed data the robust standard errors might be inconsistent where the standard standard errors are consistent.

                      Reading and properly understanding graphs of the type you are showing is an art in itself, and I used to teach this art a long time ago, probably for the last time in 2006–2007 at Universitat Pompeu Fabra. There is an excellent book that explains this art (among other regression-related matters), "Regression with Graphics" by Lawrence Hamilton. As I have not gazed at such graphs for at least 15 years now, I have lost my graph-gazing skills. I think your graphs show heavy tails and no heteroskedasticity, but this is based on 15-year-old graph-gazing skills, so it is not a very reliable judgment on my side.

                      These graphs are very relevant when we deal with small(ish) samples. As nowadays we deal with huge samples, we do not gaze that much at such graphs anymore.

                      Overall Carlo is leading you in the right direction: for your fixed-effects regression it is crucial how large your N and your T are. If you have huge N and small T, you can skip gazing at such graphs and just do a fixed-effects regression with cluster-robust standard errors.




                      • #12
                        Thank you both Carlo and Joro. This is immensely appreciated.
                        I am looking at the effect of a certain policy on the revenue-generating process of different districts. My unbalanced dataset consists of nearly 300 districts over 11 years (T); my N is about 3,200 observations in total. During this period a district may or may not implement the policy (P), depending on the year. As the RHS variables most likely have a lagged effect, I included lags. Moreover, there most likely is an interaction effect depending on the intensity of the policy. I therefore estimated:

                        Code:
                         xtreg lnDistrict_Revenue L.i.P##L.c.Intensity c.L.lnUrbanPopulation##c.L.lnUrbanPopulation  c.L.lnPropertyvalue c.L.lnGrant##c.L.lnGrant c.L.lnIncome_percapita c.L.ShareUnemployed c.L.ShareElderly c.L.ShareYoung L.lnSpending i.Year if group==0, fe cluster(District)
                        
                        Fixed-effects (within) regression               Number of obs     =      3,204
                        Group variable: District                        Number of groups  =        298
                        
                        R-sq:                                           Obs per group:
                             within  = 0.7285                                         min =          6
                             between = 0.0179                                         avg =       10.8
                             overall = 0.0325                                         max =         11
                        
                                                                        F(23,297)         =     172.78
                        corr(u_i, Xb)  = -0.9326                        Prob > F          =     0.0000
                        
                                                                                  (Std. Err. adjusted for 298 clusters in District)
                        -----------------------------------------------------------------------------------------------------------
                                                                  |               Robust
                                               lnDistrict_Revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        ------------------------------------------+----------------------------------------------------------------
                                                              L.P |
                                                               1  |   .0351146   .0151471     2.32   0.021     .0053053    .0649238
                                                                  |
                                                        Intensity |
                                                              L1. |     .19091   .0975182     1.96   0.051    -.0010041    .3828242
                                                                  |
                                                 L.P#cL.Intensity |
                                                               1  |  -.1870479   .1081467    -1.73   0.085    -.3998788     .025783
                                                                  |
                                                lnUrbanPopulation |
                                                              L1. |   2.370568   1.676327     1.41   0.158    -.9284165    5.669553
                                                                  |
                        cL.lnUrbanPopulation#cL.lnUrbanPopulation |  -.1596966   .0787233    -2.03   0.043    -.3146229   -.0047704
                                                                  |
                                                  lnPropertyvalue |
                                                              L1. |  -.7620712   .0773144    -9.86   0.000    -.9142247   -.6099178
                                                                  |
                                                          lnGrant |
                                                              L1. |   .9593062   .2999702     3.20   0.002     .3689699    1.549642
                                                                  |
                                            cL.lnGrant#cL.lnGrant |  -.0228989   .0081084    -2.82   0.005    -.0388561   -.0069416
                                                                  |
                                               lnIncome_percapita |
                                                              L1. |   .0228362   .1156852     0.20   0.844    -.2048304    .2505027
                                                                  |
                                                  ShareUnemployed |
                                                              L1. |  -.0163021   .0095847    -1.70   0.090    -.0351648    .0025605
                                                                  |
                                                     ShareElderly |
                                                              L1. |  -.0026668    .002733    -0.98   0.330    -.0080453    .0027117
                                                                  |
                                                       ShareYoung |
                                                              L1. |  -.0039125    .003059    -1.28   0.202    -.0099326    .0021076
                                                                  |
                                                       lnSpending |
                                                              L1. |  -.0034049   .0030913    -1.10   0.272    -.0094886    .0026787
                                                                  |
                                                             Year |
                                                            2002  |   .0454305   .0154763     2.94   0.004     .0149733    .0758876
                                                            2003  |    .110855   .0188005     5.90   0.000      .073856     .147854
                                                            2004  |   .1813692   .0234291     7.74   0.000     .1352611    .2274773
                                                            2005  |   .1985591   .0317878     6.25   0.000     .1360012     .261117
                                                            2006  |   .1323212   .0377318     3.51   0.001     .0580657    .2065767
                                                            2007  |   .1298228    .042398     3.06   0.002     .0463842    .2132615
                                                            2008  |   .1263483   .0453511     2.79   0.006     .0370981    .2155986
                                                            2009  |   .1168673   .0481765     2.43   0.016     .0220568    .2116778
                                                            2010  |   .1327302   .0562972     2.36   0.019     .0219383    .2435221
                                                            2011  |   .1738778   .0612029     2.84   0.005     .0534314    .2943241
                                                                  |
                                                            _cons |  -5.173469   8.324948    -0.62   0.535    -21.55683    11.20989
                        ------------------------------------------+----------------------------------------------------------------
                                                          sigma_u |  .66077134
                                                          sigma_e |  .06458734
                                                              rho |  .99053626   (fraction of variance due to u_i)
                        -----------------------------------------------------------------------------------------------------------
                        The interaction terms between the logged variables are included because of the specification test.
                        I then tested for groupwise heteroskedasticity with the community-contributed -xttest3- command from SSC.

                        Code:
                          
                        . predict xb if e(sample), xb
                        (2,368 missing values generated)
                        
                        .
                        . generate xb2=xb^2
                        (2,368 missing values generated)
                        
                        .
                        . generate xb3=xb^3
                        (2,368 missing values generated)
                        
                        .
                        . generate xb4=xb^4
                        (2,368 missing values generated)
                        
                        . xttest3
                        
                        Modified Wald test for groupwise heteroskedasticity
                        in fixed effect regression model
                        
                        H0: sigma(i)^2 = sigma^2 for all i
                        
                        chi2 (298)  =  14176.94
                        Prob>chi2 =      0.0000

                        If I am not mistaken, this indicates that the variance of the residuals is not constant across districts.
                        I then tested again for misspecification, which now seems fine:

                        Code:
                        xtreg lnDistrict_Revenue L.i.P##L.c.Intensity c.L.lnUrbanPopulation##c.L.lnUrbanPopulation  c.L.lnPropertyvalue c.L.lnGrant##c.L.l
                        > nGrant c.L.lnIncome_percapita c.L.ShareUnemployed c.L.ShareElderly c.L.ShareYoung L.lnSpending i.Year xb2 xb3 xb4 if group==0, fe
                        > cluster(District)
                        
                        Fixed-effects (within) regression               Number of obs     =      3,204
                        Group variable: District                        Number of groups  =        298
                        
                        R-sq:                                           Obs per group:
                             within  = 0.7290                                         min =          6
                             between = 0.0185                                         avg =       10.8
                             overall = 0.0324                                         max =         11
                        
                                                                        F(26,297)         =     186.31
                        corr(u_i, Xb)  = -0.9360                        Prob > F          =     0.0000
                        
                                                                                  (Std. Err. adjusted for 298 clusters in District)
                        -----------------------------------------------------------------------------------------------------------
                                                                  |               Robust
                                               lnDistrict_Revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        ------------------------------------------+----------------------------------------------------------------
                                                              L.P |
                                                               1  |   .0336473   .0150911     2.23   0.027     .0039483    .0633463
                                                                  |
                                                        Intensity |
                                                              L1. |   .1756475   .0999316     1.76   0.080    -.0210161    .3723112
                                                                  |
                                                 L.P#cL.Intensity |
                                                               1  |  -.1731547   .1087956    -1.59   0.113    -.3872627    .0409533
                                                                  |
                                                lnUrbanPopulation |
                                                              L1. |   2.840668   1.774916     1.60   0.111    -.6523364    6.333673
                                                                  |
                        cL.lnUrbanPopulation#cL.lnUrbanPopulation |  -.1805637   .0835569    -2.16   0.031    -.3450024   -.0161251
                                                                  |
                                                  lnPropertyvalue |
                                                              L1. |  -.7429113   .0892687    -8.32   0.000    -.9185906    -.567232
                                                                  |
                                                          lnGrant |
                                                              L1. |   .8768876   .2961761     2.96   0.003      .294018    1.459757
                                                                  |
                                            cL.lnGrant#cL.lnGrant |  -.0208299   .0080221    -2.60   0.010    -.0366173   -.0050425
                                                                  |
                                               lnIncome_percapita |
                                                              L1. |   .0122068   .1165146     0.10   0.917    -.2170921    .2415056
                                                                  |
                                                  ShareUnemployed |
                                                              L1. |  -.0150679   .0100154    -1.50   0.134     -.034778    .0046423
                                                                  |
                                                     ShareElderly |
                                                              L1. |  -.0027982   .0027062    -1.03   0.302    -.0081241    .0025276
                                                                  |
                                                       ShareYoung |
                                                              L1. |  -.0033152   .0031507    -1.05   0.294    -.0095158    .0028854
                                                                  |
                                                       lnSpending |
                                                              L1. |  -.0028392   .0032665    -0.87   0.385    -.0092676    .0035893
                                                                  |
                                                             Year |
                                                            2002  |    .043793   .0165126     2.65   0.008     .0112965    .0762895
                                                            2003  |   .1053737   .0222043     4.75   0.000     .0616761    .1490714
                                                            2004  |   .1712626     .03015     5.68   0.000      .111928    .2305972
                                                            2005  |   .1870068   .0387219     4.83   0.000     .1108026    .2632109
                                                            2006  |   .1244704   .0413466     3.01   0.003     .0431009    .2058399
                                                            2007  |   .1229813    .045523     2.70   0.007     .0333928    .2125697
                                                            2008  |   .1208621    .048003     2.52   0.012      .026393    .2153312
                                                            2009  |    .113623   .0500109     2.27   0.024     .0152024    .2120435
                                                            2010  |   .1312226   .0580849     2.26   0.025     .0169124    .2455327
                                                            2011  |   .1721488   .0634854     2.71   0.007     .0472105    .2970871
                                                                  |
                                                              xb2 |   .0233128   .0243301     0.96   0.339    -.0245684     .071194
                                                              xb3 |  -.0037314   .0058588    -0.64   0.525    -.0152613    .0077985
                                                              xb4 |   .0002341   .0019449     0.12   0.904    -.0035934    .0040616
                                                            _cons |  -7.237485   8.843269    -0.82   0.414    -24.64089    10.16592
                        ------------------------------------------+----------------------------------------------------------------
                                                          sigma_u |  .67667806
                                                          sigma_e |  .06456705
                                                              rho |  .99097761   (fraction of variance due to u_i)
                        -----------------------------------------------------------------------------------------------------------
                        
                        .
                        . test xb2=xb3=xb4=0
                        
                         ( 1)  xb2 - xb3 = 0
                         ( 2)  xb2 - xb4 = 0
                         ( 3)  xb2 = 0
                        
                               F(  3,   297) =    1.60
                                    Prob > F =    0.1897
                        I still have a question regarding normality of the residuals versus heteroskedasticity. Based on the previous posts these are not the same thing, yet a teacher once told me they were, and the post I quoted in #3 says the same thing, right? Based on -xttest3- I can reject the null, which means the variance is not constant across groups. What does the normality (or lack of it) tell me, however? Does it also tell me that I have outliers? And that the variance is not constant?
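                        A toy illustration of why the two problems are separate (my own sketch, not part of the thread): one simulated error is heavy-tailed but homoskedastic, the other is normal but heteroskedastic, and the two diagnostic plots react differently:

                        Code:
                        * non-normality and heteroskedasticity simulated separately
                        clear
                        set seed 2022
                        set obs 1000
                        generate x  = runiform(0, 10)
                        generate y1 = 1 + 0.5*x + rt(3)              // homoskedastic, heavy-tailed error
                        generate y2 = 1 + 0.5*x + rnormal(0, 1 + x)  // normal error, variance grows with x
                        regress y1 x
                        predict r1, residuals
                        qnorm r1             // tails depart from the reference line
                        scatter r1 x         // but the spread of r1 does not fan out with x
                        regress y2 x
                        predict r2, residuals
                        scatter r2 x         // clear cone: the spread of r2 widens with x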



                        • #13
                          Mimina:
                          this thread might be useful: https://stats.stackexchange.com/ques...-about-my-data.
                          Kind regards,
                          Carlo
                          (Stata 19.0)



                          • #14
                            Thank you Carlo.

                            I came across this thread explaining why you can have both problems or just one of the two, and I found it quite useful. I now understand how these two outcomes can differ. For anyone who was just as confused as me: it has nice graphs, which is always useful.

                            https://stats.stackexchange.com/ques...uals-normality



                            I know I can account for the -xttest3- result (groupwise heteroskedasticity) by clustering.
                            Is it true that non-normality does not affect the coefficients, and that removing the outliers will therefore not change the coefficients greatly? The plot would suggest that my model does not do well for certain observations. However, a PDF provided by my institution says non-normality of the residuals will not affect the coefficients, but it can affect the standard errors and therefore the p-values. I would think that those outliers might mean I should run separate regressions, as there might be different effects when those outliers are not included?

                            I have one last question: I know it is possible to identify the observations for which e is an outlier. But is it possible to identify the observations for which the effect of the policy does not hold? So although there might be an overall effect, the effect might not be present in certain districts.
                            I don't think there really is a way, right? Because the coefficients in the output are based on all observations. I was thinking about running separate regressions with and without the policy variable and then comparing the fitted values or residuals, but that would not really do the trick, I think, because I would be looking at model fit. Do you think running separate regressions for different subsamples would be better?
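                            On the first point, flagging the observations with unusually large residuals could look something like this (a sketch only; it assumes the residual was predicted into e as in #3, and the three-standard-deviation cutoff is arbitrary):

                            Code:
                            summarize e
                            generate byte big_resid = abs(e) > 3*r(sd) if !missing(e)
                            list District Year e if big_resid==1, sepby(District)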



                            • #15
                              Mimina:
                              1) removing outliers requires, first, deciding what an outlier is. Usually, apparent data-entry mistakes aside, removing "weirdly behaving" observations is not advisable, as you may end up with a dataset that is only loosely related to your original one;
                              2) I share all your concerns and appreciate the way you tried your best to address them yourself (which is laudable indeed). That said, I would stick with the data as they are. All in all, you cannot rule out that the data-generating process you are investigating produces sizeable residuals for some observations, simply because the model fit cannot be at its best everywhere (whereas your test based on the powers of the fitted values looks encouraging in terms of absence of evidence of misspecification of the functional form of the regressand).
                              Kind regards,
                              Carlo
                              (Stata 19.0)
