Multiple linear regression - residuals not normal?

Jean Hadji

Join Date: Nov 2018

Posts: 11
#1

10 Sep 2020, 07:10

Hi everyone,

I'm running a multiple linear regression with 170 cases. To check for normality of residuals, I use the following commands.

Code:

predict x, resid
histogram x, kdensity normal
qnorm x
pnorm x

These yield the following plots. Can the results still be considered acceptable? (The qnorm and pnorm plots look OK to me, while the histogram shows skewness and an outlier.)

If not, can I perform any internal robustness checks like transforming the DV to see if the results from this regression are "valid"?

Last edited by Jean Hadji; 10 Sep 2020, 07:27.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

10 Sep 2020, 07:27

Jean:
what does -estat hettest- give you back?

Kind regards,
Carlo
(Stata 19.0)
Comment
Jean Hadji

Join Date: Nov 2018

Posts: 11
#3

10 Sep 2020, 07:53

Chi-square is small and not significant. Thus, heteroskedasticity does not appear to be a problem.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#4

10 Sep 2020, 07:55

How large is your sample size?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

10 Sep 2020, 08:06

Jeff:
the sample size seems to be 170 observations, as per #1.

Kind regards,
Carlo
(Stata 19.0)
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#6

10 Sep 2020, 08:26

Thanks Carlo. I read past that.

This is always a Catch-22. Is n = 170 large enough to invoke the central limit theorem? If so, then normality is not an issue. If not, then how can we justify using tests for normality that are based on asymptotic analysis?

I’d ignore nonnormality or, if y > 0, try using log(y).
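
A minimal sketch of the log(y) option, assuming Stata's bundled auto dataset as a stand-in for the real data. Here price, weight and mpg are placeholders for the actual DV and regressors, and ln_y and r_ln are illustrative names:

```stata
* Illustrative only: auto.dta stands in for the real data
sysuse auto, clear

* log(y) is defined only for y > 0, so verify that first
assert price > 0

* Regress the logged outcome on the same regressors
generate ln_y = ln(price)
regress ln_y weight mpg

* Re-inspect the residuals from the log model
predict r_ln, residuals
qnorm r_ln
```

With a logged outcome, each coefficient is read approximately as a proportional change in y per one-unit change in the regressor.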
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

10 Sep 2020, 08:40

Jeff quoted Catch-22 (https://en.wikipedia.org/wiki/Catch-22_(logic)), a paradox that seems ubiquitous in the labour market when we read in vacancy notices: "Those applying for their very first job should have at least 2 years of experience".

Kind regards,
Carlo
(Stata 19.0)
Comment
Jean Hadji

Join Date: Nov 2018

Posts: 11
#8

11 Sep 2020, 04:52

Thank you very much for your answers!

Since my DV (>0) has a negative skew, I used the following transformation: log10(K-DV), where K is the max value of my DV +1. The plots shown above look better when using the transformed DV.
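
The reflect-and-log transformation described above might be coded along these lines. This is a sketch under assumptions, not the code actually used: auto.dta and the variables mpg, weight and length are stand-ins, and K and dv_t are illustrative names.

```stata
* Illustrative only: mpg plays the role of the DV
sysuse auto, clear

* K = max(DV) + 1, so that K - DV >= 1 and the log is defined
summarize mpg, meanonly
scalar K = r(max) + 1

* Reflect the DV, then take the base-10 log
generate dv_t = log10(scalar(K) - mpg)

* Standardized coefficients for the untransformed and transformed DV
regress mpg weight length, beta
regress dv_t weight length, beta
```

Note that the reflection reverses the direction of the DV, so coefficient signs in the transformed model are expected to be opposite to those in the untransformed model.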

Since the regression results (size and direction of the standardized coefficients, and significance levels) with the transformed DV are comparable to those with the untransformed DV, is it acceptable to simply report the regression using the untransformed DV? I'm asking because transforming the DV makes the results difficult to interpret, and it's also quite uncommon in my field.

Well, I guess it's acceptable, since Jeff already suggested ignoring non-normality and relying on the CLT.

Last edited by Jean Hadji; 11 Sep 2020, 05:35.
Comment
