
  • Contradictory results of tests for conditional normality of residuals in OLS

    Hello,

    I am performing OLS and I have conducted two tests of the assumption of conditional normality of residuals.

    For the first test I simply used the summarize command. My understanding is that skewness or kurtosis values below 0.05 imply a deviation from normality, so these results seem solid.
    [Image: Summarize.png]


    However, I then performed a skewness/kurtosis test and got a p-value of 0.0000. My understanding is that this implies a deviation from normality:
    [Image: Sktest.png]


    How should I think about these results?
    Last edited by Mark Edgeworth; 03 Oct 2018, 12:02.

  • #2
    You misunderstand skewness and kurtosis: if data are normally distributed, then skewness is 0 and kurtosis is 3, so your results imply a lack of normality.

    More important, this is generally something that can be handled either in the modeling or, sometimes, by using robust estimates of the SEs; try
    Code:
    help regress
    and scroll down to the "vce()" option and click on "vcetype" to learn more
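    For instance, a minimal sketch (y and the x's are just stand-ins for your own variables):
    Code:
    * OLS with heteroskedasticity-robust (Huber/White) standard errors
    regress y x1 x2 x3, vce(robust)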



    • #3
      The null hypothesis in sktest, swilk, and sfrancia is normality, so a significant test statistic indicates that you reject the null. You can graph the residuals to get an idea beforehand:

      Code:
      qnorm residual2
      hist residual2, normal
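      Note that residual2 has to exist first — a minimal sketch, assuming you have just run regress:
      Code:
      * store the residuals from the most recent regress fit
      predict residual2, residuals
      * formal skewness/kurtosis test on those residuals
      sktest residual2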
      That said, several influential people argue that testing the normality of the residuals is not the same as establishing whether the error term is normally distributed, which makes this exercise not very useful except for academic purposes.



      • #4
        Thank you both for your help; that was a simple misunderstanding on my part.

        I have run the regression using robust standard errors, as suggested, and also log-transformed my outcome variable (its distribution was heavily right-skewed). Here are the same results as above but using the new specification, as well as the QQ plot and a histogram of the residuals (as suggested). To me the graphics look pretty good, apart from the last couple of residuals in the top right, which veer off. But the skewness and kurtosis say otherwise. What do you think?

        Where should I go from here? Most of my RHS variables are binary, so I don't have many options when it comes to transformations.
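        For reference, the new specification was along these lines (the variable names here are generic stand-ins rather than my actual ones):
        Code:
        * log-transform the right-skewed outcome
        generate ln_y = ln(y)
        * refit with robust standard errors
        regress ln_y x1 x2 x3, vce(robust)
        * residuals for the diagnostics below
        predict residual2, residuals
        sktest residual2
        qnorm residual2
        histogram residual2, normal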

        [Image: SK2.png]

        [Image: Sk3.png]

        [Image: BSC93 conditional normality of residuals second visual test.png]

        [Image: BSC93 residuals histogram.png]



        • #5
          Follow the discussion here on StackExchange for more on the point in #3. You have 200+ observations, so just carry on with your regression and concentrate on other regression diagnostics. Forget about the normality of the residuals.



          • #6
            Mark:
            as an aside to the previous helpful comments, and in the same vein as Andrew's last reply, I would be more concerned about exploring the relationships between your predictors and the regressand. Are they all linear? Is the data-generating process fairly and truly represented in your regression model? Just run -estat ovtest- after regress to get an idea of what's going on.
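            Something along these lines (just a sketch; substitute your own variables):
            Code:
            regress y x1 x2 x3
            * Ramsey RESET test for omitted variables / functional-form misspecification
            estat ovtest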
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              I agree with all previous discussants (except that, contrary to Andrew, the phrase "academic purposes" is not necessarily pejorative!).

              I find normal quantile plots (sometimes called normal probability plots) enormously more helpful here than histograms with normal density functions superimposed. The latter tend to draw attention to apparent discrepancies in the middle of the distribution, which are often of no consequence, and make it hard to assess discrepancies in the tails, which can be important and informative. Histograms are also sensitive to the choice of bin start and bin width, and Stata's defaults are not optimized on your behalf.

              Formal tests of residuals for normality here divide statistical people into two camps, with no doubt some people wandering around confused in between. One camp (no names, no fields, for the sake of discretion) teaches that everything has to pass a significance test before you can possibly make a decision or an inference regarding it as established. The other camp, including myself FWIW, tends to sit loose and regard tests as providing some guidance, but much less guidance than graphs provide. The biggest deal here, which isn't desperately controversial, is that if your sample size is large enough, unimportant deviations from normality will be declared significant at conventional levels, while with small samples failure to reject the null may arise just because you don't have enough data. These two cases don't exhaust the logical possibilities, but they make significance tests problematic unless other evidence is also considered.

              Here you have a hint from the normal quantile plot that two observations have slightly high positive residuals and it would always be worth going back to the data to examine which they are. They might turn out to be two really big countries or companies or two years that you know are unusual. As you don't name your variables or give example data, these are just indications.

              If you have one predictor, then the main thing to show us is a plot of the data with the regression line superimposed. If you have several predictors, then show us added-variable plots. A plot of residuals versus fitted values is also often helpful.
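              For example (a sketch with placeholder variable names; avplots and rvfplot assume you have just run regress):
              Code:
              * one predictor: data with the fitted line superimposed
              twoway (scatter y x) (lfit y x)
              * several predictors: added-variable plots and residual-versus-fitted plot
              avplots
              rvfplot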

