  • Help identifying normal distribution

    I want to run an OLS regression on my panel dataset. After the regression, the histogram and Q-Q plot of the residuals appear to show a slightly skewed distribution, which would violate the OLS normality assumption. However, I have a very large sample (~8000 observations), and I'm puzzled as to how a perfectly normal distribution could ever be obtained. Is it okay to use OLS here or not?

    [Image: hist.png (histogram of the residuals)]
    [Image: qq.png (Q-Q plot of the residuals)]
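
    The commands behind these two plots are not shown in the post. A minimal sketch of how such residual diagnostics are usually produced after -regress- follows; the auto dataset and the variable name resid are assumptions for illustration, not the poster's data.

    ```stata
    * Illustrative only: the auto dataset ships with Stata and stands in for the real data
    sysuse auto, clear
    regress price mpg weight

    * Residuals from the model just fitted
    predict double resid, residuals

    * Histogram with a normal density overlaid
    histogram resid, normal name(hist, replace)

    * Quantiles of the residuals against normal quantiles (Q-Q plot)
    qnorm resid, name(qq, replace)
    ```

    Naming the graphs keeps both windows open so they can be compared side by side.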

  • #2
    Normal error distribution is just about the least important “assumption” (meaning, ideal condition) for regression. There may be ways of getting an even better distribution but this plot does not lead to suggestions about what they are.


    • #3
      Oliver:
      a slightly off-topic question: why go with OLS (-regress-?) as a first-line approach if you're dealing with a panel dataset (wouldn't -xtreg- be the first command to think of)?
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Originally posted by Carlo Lazzaro (#3)
        In all honesty, my supervisor for this dissertation recommended that I use OLS. However, I have just tried your approach, and when I try to run a Hausman test for random vs fixed effects I get this error:

        "e(b) not found in fixed"

        Do you know what might cause this?
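
        The commands that produced this error are not shown in the thread, so its cause cannot be pinned down from the post alone. For comparison, here is a minimal sketch of the usual sequence, in which each model is saved with -estimates store- immediately after it is fitted and before -hausman- is called. The nlswork data and the names fixed and random are assumptions for illustration.

        ```stata
        * Illustrative panel data shipped with Stata's online examples
        webuse nlswork, clear
        xtset idcode year

        * Fixed-effects model, stored under the name "fixed"
        xtreg ln_wage age tenure, fe
        estimates store fixed

        * Random-effects model, stored under the name "random"
        xtreg ln_wage age tenure, re
        estimates store random

        * The estimator that is consistent under both hypotheses (fe) goes first
        hausman fixed random
        ```

        Comparing one's own do-file against this order (fit, store, fit, store, test) is a quick way to check whether a stored result is missing or was overwritten.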


        • #5
          Oliver:
          the only reason that springs to my mind about preferring (pooled) OLS to -xtreg- is the lack of evidence of a panel-wise effect.
          That said, as far as your question is concerned, you may want to take a look at https://www.stata.com/statalist/arch.../msg01274.html.
          Kind regards,
          Carlo
          (Stata 19.0)
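
          One common way to look for evidence of a panel-wise effect is the Breusch-Pagan Lagrange multiplier test, available as -xttest0- after -xtreg, re-. The sketch below uses the nlswork data purely as an assumption for illustration.

          ```stata
          * Illustrative panel data, not the poster's dataset
          webuse nlswork, clear
          xtset idcode year

          * Random-effects model
          xtreg ln_wage age tenure, re

          * H0: the variance of the panel-level effect is zero (pooled OLS would do)
          xttest0
          ```

          The output of -xtreg, fe- also reports an F test that all u_i are zero, which addresses the same question from the fixed-effects side.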


          • #6
            Originally posted by Carlo Lazzaro (#5)
            Ok, I believe I have worked it out and concluded that I need the random effects model. Does any assumption of this model require normally distributed errors, or can I stop worrying?


            • #7
              Oliver:
              if a researcher were obsessed with normality (by the way, normality, theoretically speaking, also affects the u component of the composed panel error under the -re- specification) she/he would be better off changing her/his job.
              That said, you can use the community-contributed module -xtoverid- to test -re- against -fe- (beware that, being a bit old-fashioned, it does not support -fvvarlist- notation; see the -xi:- prefix as a possible workaround). In brief, the null of -xtoverid- is that -re- is the way to go.
              Another option is the Mundlak approach (https://blog.stata.com/2015/10/29/fi...dlak-approach/).
              Kind regards,
              Carlo
              (Stata 19.0)
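
              Both suggestions can be sketched as follows. This is only an illustration under stated assumptions: the nlswork data, the regressors, and the names mean_age and mean_tenure are hypothetical stand-ins, and -xtoverid- must be installed separately (it may also need other community-contributed packages such as -ivreg2- and -ranktest-).

              ```stata
              * Illustrative panel data; keep complete cases so that the
              * panel means below match the estimation sample
              webuse nlswork, clear
              keep if !missing(ln_wage, age, tenure)
              xtset idcode year

              * Option 1: -xtoverid- after a random-effects fit
              * Install once with: ssc install xtoverid
              xtreg ln_wage age tenure, re vce(cluster idcode)
              xtoverid

              * Option 2: Mundlak approach, adding panel means of the
              * time-varying regressors to the random-effects model
              bysort idcode: egen mean_age = mean(age)
              bysort idcode: egen mean_tenure = mean(tenure)
              xtreg ln_wage age tenure mean_age mean_tenure, re vce(cluster idcode)

              * H0: the coefficients on the panel means are jointly zero
              test mean_age mean_tenure
              ```

              In both cases, failing to reject the null is read as support for -re-; a rejection points towards -fe-.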


              • #8
                Originally posted by Carlo Lazzaro (#7)
                Ok, thank you Carlo! One more thing: I am using multiple imputation (you will have seen my recent post on this), so I have run -xtreg ..., re- within the -mi estimate- command. The results table does not show the between, within, or overall R-squared values. How do I find these?


                • #9
                  I think this is a good place to comment that (I forget who said this) "In this business, there are no standard solutions, only standard problems". If I worried about every assumption of every model I've run holding perfectly, I wouldn't run them at all.

                  I'll also plug what I usually do here, in saying that the choice of estimator (that's all OLS/xtreg-OLS are anyway, estimators) is much less important than the design of your paper. Naturally, I don't know what you're studying, but there are circumstances where using OLS or logit, or OLS and a negative binomial, may be defensible. In my opinion, though, none of this really matters if the design of the paper (assuming causality is the goal) is deficient.


                  • #10
                    Originally posted by Jared Greathouse (#9)
                    Hi Jared, the goal of my paper is to analyse the effect of educational factors on income inequality (Gini index) in post-Communist nations. I am focussing on Eastern Europe and the Baltics, giving a total of 22 countries over the period 1999 - 2020.


                    • #11
                      Oliver:
                      under -mi-, -xtreg- does not return the typical panel R-squared values, because -mi- follows different metrics.
                      Kind regards,
                      Carlo
                      (Stata 19.0)


                      • #12
                        Originally posted by Carlo Lazzaro (#11)
                        What alternative measure is there to assess explanatory power under -mi-?


                        • #13
                          Oliver:
                          none that I know.
                          Kind regards,
                          Carlo
                          (Stata 19.0)


                          • #14
                            You might want to take a look at the example code in #6 at https://www.statalist.org/forums/for...ng-mi-estimate
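
                            The code in the linked post is not reproduced in this thread. As a separate, hedged illustration, one simple descriptive option is to fit the model on each completed dataset and average the R-squared values. This is an assumption-laden sketch, not the method from the link: the nlswork data, the imputation model, and the scalar name r2_sum are all hypothetical, and the average is a summary only, with no standard error attached.

                            ```stata
                            * Build a small illustrative mi dataset from nlswork
                            webuse nlswork, clear
                            keep idcode year ln_wage age tenure
                            drop if missing(ln_wage, age)
                            xtset, clear

                            mi set flong
                            mi xtset idcode year
                            mi register imputed tenure
                            mi impute regress tenure age ln_wage, add(5) rseed(12345)

                            * Pooled coefficients; no panel R-squared values are reported here
                            mi estimate: xtreg ln_wage age tenure, re

                            * Overall R-squared in each completed dataset, then the average
                            local M = 5
                            scalar r2_sum = 0
                            forvalues m = 1/`M' {
                                preserve
                                mi extract `m', clear
                                xtset idcode year
                                quietly xtreg ln_wage age tenure, re
                                scalar r2_sum = r2_sum + e(r2_o)
                                restore
                            }
                            display "Mean overall R-squared across imputations: " scalar(r2_sum)/`M'
                            ```

                            Replacing e(r2_o) with e(r2_w) or e(r2_b) gives the within or between version.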


                            • #15
                              Originally posted by Rich Goldstein (#14)
                              Thank you Rich!