correcting heteroskedasticity but obtaining not normally distributed residuals

ana maria giraldo

Join Date: Sep 2019

Posts: 13
#1


17 Sep 2019, 10:55

Hi everyone,

I am working with panel data. I have approximately 200 observations in total. My time variable is birth decade (from 1880 until 1960) and my cross-sectional variable is province of birth.

My dependent variable is years of schooling and my main explanatory variable is migrant share. I had been trying to correct non-normally distributed errors and, after transforming the variables, I solved that problem with the residuals.

regression:
xi: reg yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

Then I wanted to correct heteroskedasticity, so I transformed yrsc to lnyrsc. The heteroskedasticity was gone, but the residuals are again not normally distributed.

At this point I do not know which is better: having heteroskedasticity or having non-normally distributed residuals. The tests I used were -estat imtest, white- and -sktest- for the residuals.
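For reference, the diagnostics named above are usually run right after the regression, roughly as sketched here. The variable names are those from the post; `res` is a hypothetical name for the saved residuals, and the factor-variable terms assume that region and bdec are stored as numeric variables.

```stata
* Fit the model from the post (assumes region and bdec are numeric)
regress yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

* White's test plus the Cameron & Trivedi decomposition of the IM test
estat imtest, white

* Save the residuals (res is a hypothetical name) and test skewness and kurtosis
predict double res, residuals
sktest res
```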

Could someone help me with this problem?
Thanks in advance
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

17 Sep 2019, 11:23

Welcome to the Stata Forum/Statalist.

To start, unless you're using an old Stata version, you may get rid of the "xi: " prefix. Use factor notation instead.
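As a minimal sketch of that change, the regression from #1 could be written without the prefix as follows. It assumes region and bdec are numeric; if region were stored as a string, it would first need -encode- (region_id below is a hypothetical name).

```stata
* Older style: the xi: prefix creates the indicator variables
* xi: reg yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

* Factor-variable notation: the same model, no prefix needed
regress yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

* If region is a string variable, encode it first (region_id is a hypothetical name)
* encode region, generate(region_id)
* regress yrsc i.region_id i.bdec logmigrantshare logurbanshare gapmalefemale cattle
```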

With regards to the model, since the DV conveys years of schooling, you may prefer a Poisson or Negative binomial model. You may do this under a panel data as well.

Best regards,

Marcos
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17701
#3

17 Sep 2019, 12:08

Ana Maria:
as an aside to Marcos' helpful insight, please note that non-normality of the OLS residuals is rarely a concern, as the normality assumption can be relaxed via asymptotic theory.
Heteroskedasticity (if it is not a warning sign of model misspecification) can be handled with robust or clustered standard errors in a panel data setting. By the way, as Marcos wisely suggested, take a look at the Stata suite of commands for panel data regression prefixed with -xt-.
Finally, if, as Marcos skillfully surmised, your dependent variable takes on positive integer values only, you should consider -xtpoisson-, or -xtnbreg- if there is overdispersion.
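A rough sketch of what these suggestions might look like with the variables from #1 follows. Several things here are assumptions rather than anything stated in the thread: that region is a numeric province identifier, that fixed effects (rather than random effects) are wanted, and, for the count models, that yrsc really is a non-negative integer count.

```stata
* Declare the panel: province as the unit, birth decade as the time variable
* (assumes region is a numeric identifier)
xtset region bdec

* Linear fixed-effects model with standard errors clustered on province
xtreg yrsc i.bdec logmigrantshare logurbanshare gapmalefemale cattle, fe vce(cluster region)

* Count-data alternative: fixed-effects Poisson with robust standard errors
* (assumes yrsc is a non-negative count)
xtpoisson yrsc i.bdec logmigrantshare logurbanshare gapmalefemale cattle, fe vce(robust)

* Negative binomial counterpart, should overdispersion be a concern
xtnbreg yrsc i.bdec logmigrantshare logurbanshare gapmalefemale cattle, fe
```

The region dummies are dropped from the covariate list because the fixed-effects estimators already absorb province-level differences.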

Kind regards,
Carlo
(Stata 19.0)
ana maria giraldo

Join Date: Sep 2019

Posts: 13
#4

18 Sep 2019, 06:36

Dear Marcos and Carlo,

Thank you so much for your suggestions.
I will review the commands suitable for my Stata version (15).

Regarding Carlo's suggestion to use robust or clustered standard errors, I tried it but it did not change the outcome of the heteroskedasticity test. I do not know if I am doing something wrong, but it does not get rid of the problem.

xi: reg yrsc i.bdec logmigrantshare logurbanshare femaleshare cattle, vce(robust)

The result of White's test:

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(53)     =    105.74
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |     105.74     53    0.0000
            Skewness |      14.74     12    0.2560
            Kurtosis |       4.06      1    0.0439
---------------------+-----------------------------
               Total |     124.54     66    0.0000
---------------------------------------------------
ana maria giraldo

Join Date: Sep 2019

Posts: 13
#5

18 Sep 2019, 06:44

Hi Marcos,

I have also already run a log-log model, and with it the heteroskedasticity is gone. But I still have the problem of non-normality, so in this case I guess I will relax that assumption.

xi: reg lnyrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(77)     =     86.31
         Prob > chi2  =    0.2190

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      86.31     77    0.2190
            Skewness |      25.48     14    0.0301
            Kurtosis |       3.84      1    0.0499
---------------------+-----------------------------
               Total |     115.64     92    0.0484
---------------------------------------------------
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

18 Sep 2019, 09:32

Hello Ana,

To start, since you have Stata 15 and adopted factor notation, you may just "forget" the "xi:" prefix. If you still feel uncomfortable with this, just try the command for yourself and check that the results are the same.

That said, your results with -regress- will not be the same as those from a Poisson-family model.

What is more, you underlined in your first post that the study has a panel data structure. That being so, you should follow Carlo's advice in #3, which, in short, points to the "xt" prefix.

To sum up, I believe that trying to get rid of a non-normal distribution is not the real problem. Rather, I recommend thinking about a) the type of DV, which seems to behave like count data; b) the panel data structure, which will provide the overarching approach for, well, a longitudinal study.

Hopefully that helps.

Best regards,

Marcos
Nick Cox

Join Date: Mar 2014

Posts: 35641
#7

18 Sep 2019, 10:08

Robust standard errors at most give a more honest summary of residual variability. They don't change the parameter estimates, so the residuals remain as they were before. Personally I think most of these tests are over-rated. I'd rather look at a plot of residuals versus fitted and one of observed versus fitted. Can the model be improved? is a better question than Is the model acceptable?

As suggested by others I'd see Poisson or related models as more natural than vanilla regression, even with a log transformation.
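The two plots mentioned above could be produced along these lines. This is only a sketch with the variable names from #1; `yhat` is a hypothetical name for the fitted values, and the axis titles are illustrative.

```stata
* Refit the model (variable names taken from #1; assumes region and bdec are numeric)
regress yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

* Residuals versus fitted values, with a reference line at zero
rvfplot, yline(0)

* Observed versus fitted values, with the line of equality for reference
predict double yhat, xb
twoway (scatter yrsc yhat) (function y = x, range(yhat)), ///
    xtitle("Fitted years of schooling") ytitle("Observed years of schooling") legend(off)
```

Curvature or a fan shape in the first plot, or systematic departures from the line of equality in the second, would point to ways the model could be improved.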

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17701
#8

18 Sep 2019, 11:19

Ana Maria:
re-testing heteroskedasticity after robust/clustered standard errors have been imposed is not helpful if, as happens here, the test uses the original residual distribution (i.e., it ignores that you took heteroskedasticity into account via non-default standard errors):

Code:

. use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
. reg price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
         Ho: Constant variance
         Variables: fitted values of price

         chi2(1)      =     7.14
         Prob > chi2  =   0.0075

. predict resi1, res

. reg price mpg, rob

Linear regression                               Number of obs     =         74
                                                F(1, 72)          =      17.28
                                                Prob > F          =     0.0001
                                                R-squared         =     0.2196
                                                Root MSE          =     2623.7

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   57.47701    -4.16   0.000    -353.4727    -124.316
       _cons |   11253.06   1376.393     8.18   0.000     8509.272    13996.85
------------------------------------------------------------------------------

. predict resi2, res

. list resi* in 1/5

     +-----------------------+
     |     resi1       resi2 |
     |-----------------------|
  1. | -1898.385   -1898.385 |
  2. | -2442.857   -2442.857 |
  3. | -2198.385   -2198.385 |
  4. | -1659.174   -1659.174 |
  5. |  157.3545    157.3545 |
     +-----------------------+

.
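The same point can be checked in a few lines without listing the residuals. This is a minimal sketch; `r_default` and `r_robust` are hypothetical variable names, and -sysuse auto- loads the same example dataset shipped with Stata.

```stata
sysuse auto, clear

* Default standard errors
quietly regress price mpg
predict double r_default, residuals

* Heteroskedasticity-robust standard errors: same coefficients, different SEs
quietly regress price mpg, vce(robust)
predict double r_robust, residuals

* The two residual series should coincide up to rounding;
* -assert- stops with an error if any observation differs
assert reldif(r_default, r_robust) < 1e-10
```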

Kind regards,
Carlo
(Stata 19.0)


ana maria giraldo

Join Date: Sep 2019

Posts: 13
#9

15 Oct 2019, 01:15

Hello everyone,

I would like to thank all of you for your help. I will try all of your suggestions.

It is really amazing to have people helping you with programming questions when you have almost no previous experience in the field :D