
  • Heavy-tailed residuals / non-normality

    Dear all,

    After the great advice I got yesterday, I would be grateful if someone could help with a question I have been struggling with for quite some time.

    After running a regression with fixed effects in the form of
    Code:
     xtreg y x1 x2 i.year, fe
    I obtained and plotted the residuals. Looking at their distribution, however, there seem to be fat tails, which would suggest many outlying residuals. After doing some reading, I learned that non-normality/heteroskedasticity of the residuals can be taken into account by using clustered standard errors when N is large enough. Does this mean I should not worry about fat tails?
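    For concreteness, the clustered-standard-error variant I have in mind would look something like this (a sketch only; panelid is a placeholder name for my panel identifier):
    Code:
     xtreg y x1 x2 i.year, fe vce(cluster panelid)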


  • #2
    Mimina:
    setting aside that normality (a weak requirement) and heteroskedasticity (a minor nuisance) of the epsilon are two different beasts, it is true that if you have invoked -robust- or -vce(cluster panelid)- in your -xtreg, fe- code (both options do the very same job), you should not worry at all.
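    In code, with the variable names from #1 (and panelid standing in for your panel identifier), the two equivalent calls would be:
    Code:
     xtreg y x1 x2 i.year, fe vce(robust)
     xtreg y x1 x2 i.year, fe vce(cluster panelid)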
    As an aside, you've described in a handful of words (which are not that informative about the issue you're reporting) what you could have shared via a graph (which would give interested listers an exact idea of what is going on in your dataset).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you very much Carlo,

      I agree I should have shown a graph, so I elaborate my steps below for anyone who might have the same question in the future.

      There are three things I don't understand:

      1. Regarding the difference between normality and heteroskedasticity: are these two not the same? You wrote

      setting aside that normality (a weak requirement) and heteroskedasticity (a minor nuisance) of the epsilon are two different beasts
      and

      Originally posted by Carlo Lazzaro View Post
      Maisa:
      as you should know, doing assignments on someone else's behalf is not among the goals of this list.
      That said, you should first consult any decent textbook on panel data econometrics to answer most of your queries yourself.
      As far as your questions are concerned:
      - vce(cluster id) will solve both heteroskedasticity and autocorrelation (please, see Example 3, -xtreg- entry, Stata 14 .pdf manual);
      - heteroskedasticity exactly means that you have non-normally distributed residuals;
      - natural logging of depvar and indepvars brings about some interpretational issues of your results. It is not mandatory that you log all the variables on both sides of your regression equation.
      As a closing-out aside, I would check the literature of your research field to be sure that -xtreg,fe- specification is the right one, given that your depvar is a rate.
      2. I understand that with clustering and N > T I should not worry about heteroskedasticity. Why, however, should I not worry about heavy tails? These tails imply outliers, right? Which could influence the results?

      3. Am I correct in saying that the outliers in the first graph confirm the heavy tails and that heteroskedasticity is present? I don't see the usual cone shape, but the points are definitely not constant around 0.
      I will definitely cluster on the panel id, but I would like to know why I am seeing these patterns.

      I first estimated the regression (before clustering):

      Code:
      xtreg y x1 x2 i.year, fe
      Then I predicted the residuals and fitted values:

      Code:
      predict e if e(sample), e        // e_it, the idiosyncratic error component
      predict xb if e(sample), xb      // linear prediction x_it*b (excluding the fixed effect)
      predict xbu if e(sample), xbu    // x_it*b + u_i (including the fixed effect)
      I then plotted the residuals against the fitted values:

      Code:
      scatter e xb
      I then checked for fat tails with -qnorm-:

      Code:
      qnorm e
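      As a numeric complement to the plot (my own addition, not part of the original steps), the tails can also be checked with summary statistics and the skewness/kurtosis test, assuming the residual variable is e as above:

      Code:
      summarize e, detail    // sample kurtosis well above 3 points to heavy tails
      sktest e               // skewness/kurtosis test of normality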

      Last edited by Mimina John; 05 Jul 2022, 12:06.



      • #4
        Sorry. Somehow the graphs could not be inserted.

        [Attachment: vraag1.PNG — scatter of residuals against fitted values]

        [Attachment: vraag2.PNG — qnorm plot of the residuals]



        • #5
          Robust standard errors not only do not help with heavy tails, they make things worse. For non-robust standard errors, you need moments up to 2nd order to be finite. For robust standard errors, you need moments up to 4th order to be finite.
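          A minimal simulation sketch (my own illustration, not Joro's code) of the setting described here: homoskedastic errors drawn from a t distribution with 3 degrees of freedom, which has a finite 2nd moment but no finite 4th moment, with both variance estimators requested for comparison:

          Code:
          * toy data: heavy-tailed but homoskedastic errors
          clear
          set seed 12345
          set obs 500
          generate x = rnormal()
          generate y = 1 + 0.5*x + rt(3)
          regress y x                  // conventional standard errors
          regress y x, vce(robust)     // heteroskedasticity-robust standard errors
          A single draw does not show (in)consistency by itself; the point is only how the two variance estimators are obtained, with the moment conditions above determining which of them behaves well across repeated samples.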



          • #6
            Sorry Joro Kolev, I am afraid I don't understand… are you saying I should worry when using -xtreg- with clustered standard errors? These standard errors are the same as the robust ones in -xtreg-. Why exactly should I worry?
            From post #2 it seems like I should not?



            • #7
              This is not what Carlo said. What he said is that if you have a large sample, you should not worry about non-normality of your residuals, and that if you have calculated robust standard errors, you should not worry about heteroskedasticity.

              This is all true if you have finite moments up to a certain order. For heteroskedasticity-robust standard errors, you need finite moments up to 4th order.

              What I am saying is that if you do not have heteroskedasticity, the standard formula for standard errors needs only finite moments up to 2nd order.

              The robust standard errors need finite moments up to 4th order.

              So not only do robust standard errors not help if you have only heavy tails (and no heteroskedasticity), they might even be inconsistent while the standard standard errors are consistent, e.g., when the 2nd moments are finite but the higher ones are not.


              Originally posted by Mimina John View Post
              Sorry Joro Kolev, I am afraid I don't understand… are you saying I should worry when using -xtreg- with clustered standard errors? These standard errors are the same as the robust ones in -xtreg-. Why exactly should I worry?
              From post #2 it seems like I should not?
              Last edited by Joro Kolev; 05 Jul 2022, 13:47.





                • #9
                  Thank you for clarifying Joro. During class we were always told just to use robust and not even think further.

                  To account for autocorrelation I will go for clustered errors. I am still struggling with the interpretation of the first graph. Does it show heteroskedasticity, or just large outliers of the residuals and fat tails? The reason I am asking is that I worry that fat tails mean I have large outliers in my residuals, which could influence the results.
                  Previously I was told to check the variables for outliers before running a regression, but here I see outliers in the residuals.
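                  For reference, the kind of pre-regression screen I was taught looks roughly like this (a sketch using the placeholder variable names y, x1 and x2 from #1):

                  Code:
                  summarize y x1 x2, detail    // extreme percentiles flag unusual values of each variable
                  graph box y x1 x2            // box plots mark outlying values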



                  • #10
                    Mimina:
                    how large are N and T in your panel dataset?
                    Could you please share the outcome of your -xtreg- regression?
                    Have you already checked that the functional form of your regressand is correctly specified?
                    Thanks.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Originally posted by Mimina John View Post
                      Thank you for clarifying Joro. During class we were always told just to use robust and not even think further.

                      To account for autocorrelation I will go for clustered errors. I am still struggling with the interpretation of the first graph. Does it show heteroskedasticity, or just large outliers of the residuals and fat tails? The reason I am asking is that I worry that fat tails mean I have large outliers in my residuals, which could influence the results.
                      Previously I was told to check the variables for outliers before running a regression, but here I see outliers in the residuals.
                      They (and I) tell students in class to just use robust and not worry, because the technical conditions on finite moments are hard to verify, so we simply assume they hold. Still, it is wrong to think that for homoskedastic, heavy-tailed data the robust errors will help at all. On the contrary, for homoskedastic, very heavy-tailed data the robust standard errors might be inconsistent where the standard standard errors are consistent.

                      Reading and properly understanding graphs of the type you are showing is an art in itself, and I used to teach this art a long time ago, probably for the last time in 2006–2007 at Universitat Pompeu Fabra. There is an excellent book that explains this art (among other regression-related matters), "Regression with Graphics" by Lawrence Hamilton. As I have not gazed at such graphs for at least 15 years now, I have lost my graph-gazing skills. I think your graphs show heavy tails and no heteroskedasticity, but this is based on 15-year-old graph-gazing skills, so it is not a very reliable judgment on my side.

                      These graphs are very relevant when we deal with small(ish) samples. As nowadays we deal with huge samples, we do not gaze that much at such graphs anymore.

                      Overall Carlo is leading you in the right direction: for your fixed-effects regression it is crucial how large your N and your T are. If you have huge N and small T, you can skip gazing at such graphs and just do a fixed-effects regression with cluster-robust standard errors.




                      • #12
                        Thank you both Carlo and Joro. This is immensely appreciated.
                        I am looking at the effect of a certain policy on the revenue-generating process of different districts. My unbalanced dataset consists of nearly 300 districts over 11 years (T); my N is about 3,200 observations in total. During this period a district may or may not implement the policy (P), depending on the year. As the RHS variables most likely have a lagged effect, I included lags. Moreover, there most likely is an interaction effect depending on the intensity of the policy. I therefore estimated:

                        Code:
                         xtreg lnDistrict_Revenue L.i.P##L.c.Intensity c.L.lnUrbanPopulation##c.L.lnUrbanPopulation  c.L.lnPropertyvalue c.L.lnGrant##c.L.lnGrant c.L.lnIncome_percapita c.L.ShareUnemployed c.L.ShareElderly c.L.ShareYoung L.lnSpending i.Year if group==0, fe cluster(District)
                        
                        Fixed-effects (within) regression               Number of obs     =      3,204
                        Group variable: District                        Number of groups  =        298
                        
                        R-sq:                                           Obs per group:
                             within  = 0.7285                                         min =          6
                             between = 0.0179                                         avg =       10.8
                             overall = 0.0325                                         max =         11
                        
                                                                        F(23,297)         =     172.78
                        corr(u_i, Xb)  = -0.9326                        Prob > F          =     0.0000
                        
                                                                                  (Std. Err. adjusted for 298 clusters in District)
                        -----------------------------------------------------------------------------------------------------------
                                                                  |               Robust
                                               lnDistrict_Revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        ------------------------------------------+----------------------------------------------------------------
                                                              L.P |
                                                               1  |   .0351146   .0151471     2.32   0.021     .0053053    .0649238
                                                                  |
                                                        Intensity |
                                                              L1. |     .19091   .0975182     1.96   0.051    -.0010041    .3828242
                                                                  |
                                                 L.P#cL.Intensity |
                                                               1  |  -.1870479   .1081467    -1.73   0.085    -.3998788     .025783
                                                                  |
                                                lnUrbanPopulation |
                                                              L1. |   2.370568   1.676327     1.41   0.158    -.9284165    5.669553
                                                                  |
                        cL.lnUrbanPopulation#cL.lnUrbanPopulation |  -.1596966   .0787233    -2.03   0.043    -.3146229   -.0047704
                                                                  |
                                                  lnPropertyvalue |
                                                              L1. |  -.7620712   .0773144    -9.86   0.000    -.9142247   -.6099178
                                                                  |
                                                          lnGrant |
                                                              L1. |   .9593062   .2999702     3.20   0.002     .3689699    1.549642
                                                                  |
                                            cL.lnGrant#cL.lnGrant |  -.0228989   .0081084    -2.82   0.005    -.0388561   -.0069416
                                                                  |
                                               lnIncome_percapita |
                                                              L1. |   .0228362   .1156852     0.20   0.844    -.2048304    .2505027
                                                                  |
                                                  ShareUnemployed |
                                                              L1. |  -.0163021   .0095847    -1.70   0.090    -.0351648    .0025605
                                                                  |
                                                     ShareElderly |
                                                              L1. |  -.0026668    .002733    -0.98   0.330    -.0080453    .0027117
                                                                  |
                                                       ShareYoung |
                                                              L1. |  -.0039125    .003059    -1.28   0.202    -.0099326    .0021076
                                                                  |
                                                       lnSpending |
                                                              L1. |  -.0034049   .0030913    -1.10   0.272    -.0094886    .0026787
                                                                  |
                                                             Year |
                                                            2002  |   .0454305   .0154763     2.94   0.004     .0149733    .0758876
                                                            2003  |    .110855   .0188005     5.90   0.000      .073856     .147854
                                                            2004  |   .1813692   .0234291     7.74   0.000     .1352611    .2274773
                                                            2005  |   .1985591   .0317878     6.25   0.000     .1360012     .261117
                                                            2006  |   .1323212   .0377318     3.51   0.001     .0580657    .2065767
                                                            2007  |   .1298228    .042398     3.06   0.002     .0463842    .2132615
                                                            2008  |   .1263483   .0453511     2.79   0.006     .0370981    .2155986
                                                            2009  |   .1168673   .0481765     2.43   0.016     .0220568    .2116778
                                                            2010  |   .1327302   .0562972     2.36   0.019     .0219383    .2435221
                                                            2011  |   .1738778   .0612029     2.84   0.005     .0534314    .2943241
                                                                  |
                                                            _cons |  -5.173469   8.324948    -0.62   0.535    -21.55683    11.20989
                        ------------------------------------------+----------------------------------------------------------------
                                                          sigma_u |  .66077134
                                                          sigma_e |  .06458734
                                                              rho |  .99053626   (fraction of variance due to u_i)
                        -----------------------------------------------------------------------------------------------------------
                        The interaction terms between the logged variables are included because of the specification test.
                        I then tested for groupwise heteroskedasticity with the community-contributed -xttest3- command from SSC.

                        Code:
                          
                        . predict xb if e(sample), xb
                        (2,368 missing values generated)
                        
                        .
                        . generate xb2=xb^2
                        (2,368 missing values generated)
                        
                        .
                        . generate xb3=xb^3
                        (2,368 missing values generated)
                        
                        .
                        . generate xb4=xb^4
                        (2,368 missing values generated)
                        
                        . xttest3
                        
                        Modified Wald test for groupwise heteroskedasticity
                        in fixed effect regression model
                        
                        H0: sigma(i)^2 = sigma^2 for all i
                        
                        chi2 (298)  =  14176.94
                        Prob>chi2 =      0.0000

                        If I am not mistaken, this indicates that the variance of the residuals is not constant across districts.
                        I then tested again for misspecification, which now seems fine:

                        Code:
                        xtreg lnDistrict_Revenue L.i.P##L.c.Intensity c.L.lnUrbanPopulation##c.L.lnUrbanPopulation  c.L.lnPropertyvalue c.L.lnGrant##c.L.l
                        > nGrant c.L.lnIncome_percapita c.L.ShareUnemployed c.L.ShareElderly c.L.ShareYoung L.lnSpending i.Year xb2 xb3 xb4 if group==0, fe
                        > cluster(District)
                        
                        Fixed-effects (within) regression               Number of obs     =      3,204
                        Group variable: District                        Number of groups  =        298
                        
                        R-sq:                                           Obs per group:
                             within  = 0.7290                                         min =          6
                             between = 0.0185                                         avg =       10.8
                             overall = 0.0324                                         max =         11
                        
                                                                        F(26,297)         =     186.31
                        corr(u_i, Xb)  = -0.9360                        Prob > F          =     0.0000
                        
                                                                                  (Std. Err. adjusted for 298 clusters in District)
                        -----------------------------------------------------------------------------------------------------------
                                                                  |               Robust
                                               lnDistrict_Revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        ------------------------------------------+----------------------------------------------------------------
                                                              L.P |
                                                               1  |   .0336473   .0150911     2.23   0.027     .0039483    .0633463
                                                                  |
                                                        Intensity |
                                                              L1. |   .1756475   .0999316     1.76   0.080    -.0210161    .3723112
                                                                  |
                                                 L.P#cL.Intensity |
                                                               1  |  -.1731547   .1087956    -1.59   0.113    -.3872627    .0409533
                                                                  |
                                                lnUrbanPopulation |
                                                              L1. |   2.840668   1.774916     1.60   0.111    -.6523364    6.333673
                                                                  |
                        cL.lnUrbanPopulation#cL.lnUrbanPopulation |  -.1805637   .0835569    -2.16   0.031    -.3450024   -.0161251
                                                                  |
                                                  lnPropertyvalue |
                                                              L1. |  -.7429113   .0892687    -8.32   0.000    -.9185906    -.567232
                                                                  |
                                                          lnGrant |
                                                              L1. |   .8768876   .2961761     2.96   0.003      .294018    1.459757
                                                                  |
                                            cL.lnGrant#cL.lnGrant |  -.0208299   .0080221    -2.60   0.010    -.0366173   -.0050425
                                                                  |
                                               lnIncome_percapita |
                                                              L1. |   .0122068   .1165146     0.10   0.917    -.2170921    .2415056
                                                                  |
                                                  ShareUnemployed |
                                                              L1. |  -.0150679   .0100154    -1.50   0.134     -.034778    .0046423
                                                                  |
                                                     ShareElderly |
                                                              L1. |  -.0027982   .0027062    -1.03   0.302    -.0081241    .0025276
                                                                  |
                                                       ShareYoung |
                                                              L1. |  -.0033152   .0031507    -1.05   0.294    -.0095158    .0028854
                                                                  |
                                                       lnSpending |
                                                              L1. |  -.0028392   .0032665    -0.87   0.385    -.0092676    .0035893
                                                                  |
                                                             Year |
                                                            2002  |    .043793   .0165126     2.65   0.008     .0112965    .0762895
                                                            2003  |   .1053737   .0222043     4.75   0.000     .0616761    .1490714
                                                            2004  |   .1712626     .03015     5.68   0.000      .111928    .2305972
                                                            2005  |   .1870068   .0387219     4.83   0.000     .1108026    .2632109
                                                            2006  |   .1244704   .0413466     3.01   0.003     .0431009    .2058399
                                                            2007  |   .1229813    .045523     2.70   0.007     .0333928    .2125697
                                                            2008  |   .1208621    .048003     2.52   0.012      .026393    .2153312
                                                            2009  |    .113623   .0500109     2.27   0.024     .0152024    .2120435
                                                            2010  |   .1312226   .0580849     2.26   0.025     .0169124    .2455327
                                                            2011  |   .1721488   .0634854     2.71   0.007     .0472105    .2970871
                                                                  |
                                                              xb2 |   .0233128   .0243301     0.96   0.339    -.0245684     .071194
                                                              xb3 |  -.0037314   .0058588    -0.64   0.525    -.0152613    .0077985
                                                              xb4 |   .0002341   .0019449     0.12   0.904    -.0035934    .0040616
                                                            _cons |  -7.237485   8.843269    -0.82   0.414    -24.64089    10.16592
                        ------------------------------------------+----------------------------------------------------------------
                                                          sigma_u |  .67667806
                                                          sigma_e |  .06456705
                                                              rho |  .99097761   (fraction of variance due to u_i)
                        -----------------------------------------------------------------------------------------------------------
                        
                        .
                        . test xb2=xb3=xb4=0
                        
                         ( 1)  xb2 - xb3 = 0
                         ( 2)  xb2 - xb4 = 0
                         ( 3)  xb2 = 0
                        
                               F(  3,   297) =    1.60
                                    Prob > F =    0.1897
                        I still have a question regarding normality of the residuals versus heteroskedasticity. Based on the previous posts these are not the same thing, yet a teacher once told me they were, and the post I quoted in #3 says the same thing, right? Based on -xttest3- I can reject the null, which means the variance is not constant across groups. What does the normality (or lack of it) tell me, however? Does it also tell me that I have outliers? And that the variance is not constant?
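                        A toy illustration of why the two problems are separate (my own sketch, not part of the thread): one simulated error is heavy-tailed but homoskedastic, the other is normal but heteroskedastic, and the two diagnostic plots react differently:

                        Code:
                        * non-normality and heteroskedasticity simulated separately
                        clear
                        set seed 2022
                        set obs 1000
                        generate x  = runiform(0, 10)
                        generate y1 = 1 + 0.5*x + rt(3)              // homoskedastic, heavy-tailed error
                        generate y2 = 1 + 0.5*x + rnormal(0, 1 + x)  // normal error, variance grows with x
                        regress y1 x
                        predict r1, residuals
                        qnorm r1             // tails depart from the reference line
                        scatter r1 x         // but the spread of r1 does not fan out with x
                        regress y2 x
                        predict r2, residuals
                        scatter r2 x         // clear cone: the spread of r2 widens with x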



                        • #13
                          Mimina:
                          this thread might be useful: https://stats.stackexchange.com/ques...-about-my-data.
                          Kind regards,
                          Carlo
                          (Stata 19.0)



                          • #14
                            Thank you Carlo.

                            I came across this thread explaining why you can have both problems or just one of the two, and I found it quite useful. I now understand how these two outcomes can differ. For anyone who was just as confused as me: it has nice graphs, which is always useful.

                            https://stats.stackexchange.com/ques...uals-normality



                            I know I can account for the -xttest3- result (groupwise heteroskedasticity) by clustering.
                            Is it true that non-normality does not affect the coefficients, and that removing the outliers will therefore not change the coefficients greatly? The plot would suggest that my model does not do well for certain observations. However, a PDF provided by my institution says non-normality of the residuals will not affect the coefficients, but it can affect the standard errors and therefore the p-values. I would think that those outliers might mean I should run separate regressions, as there might be different effects when those outliers are not included?

                            I have one last question: I know it is possible to identify the observations for which e is an outlier. But is it possible to identify the observations for which the effect of the policy does not hold? So although there might be an overall effect, the effect might not be present in certain districts.
                            I don't think there really is a way, right? Because the coefficients in the output are based on all observations. I was thinking about running separate regressions with and without the policy variable and then comparing the fitted values or residuals, but that would not really do the trick, I think, because I would be looking at model fit. Do you think running separate regressions for different subsamples would be better?
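                            On the first point, flagging the observations with unusually large residuals could look something like this (a sketch only; it assumes the residual was predicted into e as in #3, and the three-standard-deviation cutoff is arbitrary):

                            Code:
                            summarize e
                            generate byte big_resid = abs(e) > 3*r(sd) if !missing(e)
                            list District Year e if big_resid==1, sepby(District)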



                            • #15
                              Mimina:
                              1) removing outliers requires, first, deciding what an outlier is. Usually, apparent data-entry mistakes aside, removing "weirdly behaving" observations is not advisable, as you may end up with a dataset that is only loosely related to your original one;
                              2) I share all your concerns and appreciate the way you tried your best to address them yourself (which is laudable indeed). That said, I would stick with the data as they are. All in all, you cannot rule out that the data-generating process you are investigating produces sizeable residuals for some observations, simply because the model fit cannot be at its best everywhere (whereas your test based on the powers of the fitted values looks encouraging in terms of absence of evidence of misspecification of the functional form of the regressand).
                              Kind regards,
                              Carlo
                              (Stata 19.0)
