Help identifying normal distribution

Oliver Adamson

Join Date: Apr 2022

Posts: 14
#1

18 Aug 2022, 04:34

I want to run an OLS regression on my panel dataset. After the regression, the histogram and Q-Q plot of the residuals appear to show a slightly skewed distribution, which would violate the OLS assumptions. However, I have a very large sample (~8,000 observations), and I don't see how it would be possible to get a perfectly normal distribution. Would it be okay to use OLS or not?
Nick Cox

Join Date: Mar 2014

Posts: 36054
#2

18 Aug 2022, 04:57

Normal error distribution is just about the least important “assumption” (meaning, ideal condition) for regression. There may be ways of getting an even better distribution but this plot does not lead to suggestions about what they are.
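
To put a number on "slightly skewed" instead of judging the plots by eye, here is a minimal sketch in Python, purely illustrative (the function name and the residual values are made up), computing the moment-based skewness and excess kurtosis of a residual vector. Both are 0 for a normal distribution. With ~8,000 observations a formal normality test is likely to flag even tiny departures, so the size of these numbers says more than a p-value.

```python
def moment_shape(residuals):
    """Return (skewness, excess kurtosis) from population moments.

    Both are 0 for a normal distribution; positive skewness means a
    longer right tail.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Made-up residuals with one large positive value
print(moment_shape([0, 0, 0, 0, 10]))
```

For these five made-up values the skewness works out to 1.5, a clearly long right tail; "slightly skewed" residuals would give a value much closer to 0.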
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#3

18 Aug 2022, 06:12

Oliver:
a slightly off-topic question: why go with OLS (-regress-?) as a first-line approach if you're dealing with a panel dataset? Wouldn't -xtreg- be the first command to think of?

Kind regards,
Carlo
(Stata 19.0)
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#4

18 Aug 2022, 08:29

Originally posted by Carlo Lazzaro

Oliver:
a slightly off-topic question: why go with OLS (-regress-?) as a first-line approach if you're dealing with a panel dataset? Wouldn't -xtreg- be the first command to think of?

In all honesty, my dissertation supervisor recommended that I use OLS. However, I have just tried your approach, and when I try to run a Hausman test of fixed vs random effects I get this error:

"e(b) not found in fixed"

Do you know what might cause this?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#5

18 Aug 2022, 08:40

Oliver:
the only reason that springs to mind for preferring (pooled) OLS to -xtreg- is a lack of evidence of a panel-wise effect.
That said, as far as your question is concerned, you may want to take a look at https://www.stata.com/statalist/arch.../msg01274.html.
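
For context on what -hausman- needs: it compares the coefficient vector and covariance matrix of the consistent (fixed-effects) model with those of the efficient (random-effects) model, which is why it stops when a stored result lacks e(b). A minimal Python sketch of the textbook statistic for two coefficients, H = (b_FE - b_RE)' (V_FE - V_RE)^(-1) (b_FE - b_RE), compared with a chi-squared distribution with 2 degrees of freedom. The function name and numbers are hypothetical, and this is not Stata's implementation, which handles more cases (for example a difference matrix that is not positive definite).

```python
def hausman_2x2(b_fe, b_re, V_fe, V_re):
    """Textbook Hausman statistic for two coefficients.

    b_fe, b_re: coefficient lists of length 2 (fixed vs random effects).
    V_fe, V_re: 2x2 covariance matrices given as lists of rows.
    Returns H, to be compared with a chi-squared(2) distribution.
    """
    # Difference in coefficients, d = b_FE - b_RE
    d = [b_fe[i] - b_re[i] for i in range(2)]
    # Difference in covariance matrices, D = V_FE - V_RE
    D = [[V_fe[i][j] - V_re[i][j] for j in range(2)] for i in range(2)]
    det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
    # This sketch only handles a positive-definite D
    if D[0][0] <= 0 or det <= 0:
        raise ValueError("V_FE - V_RE is not positive definite")
    # Closed-form inverse of a 2x2 matrix
    D_inv = [[D[1][1] / det, -D[0][1] / det],
             [-D[1][0] / det, D[0][0] / det]]
    # Quadratic form d' D_inv d
    return sum(d[i] * D_inv[i][j] * d[j] for i in range(2) for j in range(2))


H = hausman_2x2([0.5, 1.0], [0.4, 0.9],
                [[0.02, 0.0], [0.0, 0.03]],
                [[0.01, 0.0], [0.0, 0.02]])
print(round(H, 6))
```

With these made-up inputs H comes out at about 2.0, well below the 5% critical value of about 5.99 for 2 degrees of freedom.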

Kind regards,
Carlo
(Stata 19.0)
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#6

18 Aug 2022, 08:59

Originally posted by Carlo Lazzaro

Oliver:
the only reason that springs to mind for preferring (pooled) OLS to -xtreg- is a lack of evidence of a panel-wise effect.
That said, as far as your question is concerned, you may want to take a look at https://www.stata.com/statalist/arch.../msg01274.html.

OK, I believe I have worked it out and concluded that I need the random-effects model. Does this model have any assumption that requires a normal distribution, or can I stop worrying?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#7

18 Aug 2022, 09:30

Oliver:
if a researcher were obsessed with normality (by the way, theoretically speaking, normality also applies to the u component of the composite panel error under the -re- specification), she/he would be better off changing jobs.
That said, you can use the community-contributed module -xtoverid- (beware that, being a bit old-fashioned, it does not support -fvvarlist- notation; see the -xi:- prefix as a possible workaround). In brief, the null hypothesis of -xtoverid- is that -re- is the way to go.
Another option is the Mundlak approach (https://blog.stata.com/2015/10/29/fi...dlak-approach/).
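
The core of the Mundlak approach is to add, for each time-varying regressor, its panel-specific mean as an extra regressor in the random-effects model. A minimal Python sketch of that step (function name and data hypothetical); in Stata the same variable is typically built with something like -bysort id: egen mean_x = mean(x)-.

```python
from collections import defaultdict


def mundlak_means(panel_ids, x):
    """Return, for every observation, the mean of x within its panel."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate sums and counts per panel
    for pid, value in zip(panel_ids, x):
        totals[pid] += value
        counts[pid] += 1
    # Second pass: attach the panel mean to each observation
    return [totals[pid] / counts[pid] for pid in panel_ids]


ids = ["A", "A", "B", "B", "B"]
x = [1.0, 3.0, 2.0, 4.0, 6.0]
print(mundlak_means(ids, x))  # [2.0, 2.0, 4.0, 4.0, 4.0]
```

A joint test that the coefficients on these means are all zero then plays a role similar to the Hausman test: rejecting it is evidence against the plain -re- specification.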

Kind regards,
Carlo
(Stata 19.0)
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#8

18 Aug 2022, 10:24

Originally posted by Carlo Lazzaro

Oliver:
if a researcher were obsessed with normality (by the way, theoretically speaking, normality also applies to the u component of the composite panel error under the -re- specification), she/he would be better off changing jobs.
That said, you can use the community-contributed module -xtoverid- (beware that, being a bit old-fashioned, it does not support -fvvarlist- notation; see the -xi:- prefix as a possible workaround). In brief, the null hypothesis of -xtoverid- is that -re- is the way to go.
Another option is the Mundlak approach (https://blog.stata.com/2015/10/29/fi...dlak-approach/).

OK, thank you Carlo! One more thing: I am using multiple imputation (you will have seen my recent post on this), so I have run -xtreg ..., re- under the -mi estimate- prefix. The results table does not show the within, between, or overall R-squared values. How do I find these?
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#9

18 Aug 2022, 10:27

I think this is a good place to mention a saying (I forget who said it): "In this business, there are no standard solutions, only standard problems." If I worried about every assumption of every model I've run holding perfectly, I wouldn't run any of them.

I'll also plug my usual point here: the choice of estimator (and that's all OLS and -xtreg- are, estimators) is much less important than the design of your paper. Naturally, I don't know what you're studying, but there are circumstances where either OLS or logit, or either OLS or a negative binomial model, may be defensible. In my opinion, though, none of this really matters if the design of the paper (assuming causality is the goal) is deficient.
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#10

18 Aug 2022, 10:37

Originally posted by Jared Greathouse

I think this is a good place to mention a saying (I forget who said it): "In this business, there are no standard solutions, only standard problems." If I worried about every assumption of every model I've run holding perfectly, I wouldn't run any of them.

I'll also plug my usual point here: the choice of estimator (and that's all OLS and -xtreg- are, estimators) is much less important than the design of your paper. Naturally, I don't know what you're studying, but there are circumstances where either OLS or logit, or either OLS or a negative binomial model, may be defensible. In my opinion, though, none of this really matters if the design of the paper (assuming causality is the goal) is deficient.

Hi Jared, the goal of my paper is to analyse the effect of educational factors on income inequality (Gini index) in post-Communist nations. I am focussing on Eastern Europe and the Baltics, which gives a total of 22 countries over the period 1999–2020.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#11

18 Aug 2022, 11:42

Oliver:
under -mi-, -xtreg- does not return the typical panel R-squared values, because -mi- follows different metrics.
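
One way to see why the usual R-squared values disappear: -mi estimate- pools each coefficient across the imputed datasets with Rubin's rules, which need a point estimate and its sampling variance from every imputation, and the R-squared values are not quantities it combines this way. A minimal Python sketch of Rubin's rules for a single coefficient (function name and numbers hypothetical):

```python
import statistics


def rubin_combine(estimates, variances):
    """Pool one coefficient across m imputations with Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, t


est, total_var = rubin_combine([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(est, total_var ** 0.5)  # pooled estimate and its standard error
```

With these made-up inputs the pooled estimate is 1.0 and the total variance is 0.04 + (4/3)(0.04), roughly 0.093.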

Kind regards,
Carlo
(Stata 19.0)
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#12

18 Aug 2022, 12:15

Originally posted by Carlo Lazzaro

Oliver:
under -mi-, -xtreg- does not return the typical panel R-squared values, because -mi- follows different metrics.

What alternative measure is there to assess explanatory power under -mi-?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#13

19 Aug 2022, 01:43

Oliver:
none that I know of.

Kind regards,
Carlo
(Stata 19.0)
Rich Goldstein

Join Date: Mar 2014

Posts: 4546
#14

19 Aug 2022, 03:31

You might want to take a look at the example code in #6 at https://www.statalist.org/forums/for...ng-mi-estimate
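
One approach from the multiple-imputation literature, not necessarily the one in the linked code, is to compute R-squared separately in each imputed dataset and average on the Fisher z scale: take R = sqrt(R-squared), transform with atanh, average, and transform back. A minimal Python sketch (function name and values hypothetical). The per-imputation values might come, for instance, from running -xtreg- on each imputation with -mi xeq- and reading e(r2_w), e(r2_b) or e(r2_o), pooling each kind separately.

```python
import math
import statistics


def pooled_r2(r2_values):
    """Pool R-squared values from m imputed datasets on the Fisher z scale.

    Assumes every value satisfies 0 <= R2 < 1 (atanh(1) is undefined).
    """
    # R = sqrt(R2), then Fisher's z = atanh(R)
    z = [math.atanh(math.sqrt(r2)) for r2 in r2_values]
    # Average on the z scale, then back-transform to R and square
    return math.tanh(statistics.fmean(z)) ** 2


# Hypothetical within R-squared values from five imputations
print(round(pooled_r2([0.31, 0.29, 0.33, 0.30, 0.32]), 3))
```

The transformation matters little when the per-imputation values are close together, as here; it keeps the pooled value inside the 0 to 1 range and is a heuristic summary rather than an estimate with a Rubin's-rules standard error.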
Oliver Adamson

Join Date: Apr 2022

Posts: 14
#15

21 Aug 2022, 05:18

Originally posted by Rich Goldstein

You might want to take a look at the example code in #6 at https://www.statalist.org/forums/for...ng-mi-estimate

Thank you Rich!