  • Residuals of OLS and 2SLS not normally distributed

    Hi

    I am running an OLS and a 2SLS regression with a continuous dependent variable, BMI. My independent variables include a few categorical variables. On inspection, my residuals are not normally distributed. Should I be worried, or is this expected?

    Kind Regards
    Nonsi Nkomo

  • #2
    Do not worry; the residuals do not have to be normal.

    Joao



    • #3
      Hello Nonsi. A while ago, I cobbled together some slides summarizing what Wooldridge (2005) says about the assumptions for OLS linear regression. Perhaps you'll find them helpful. Cheers,
      Bruce
      --
      Bruce Weaver
      Version: Stata/MP 18.5 (Windows)



      • #4
        Thank you Bruce for the link.

        Joao: What a relief! Is it feasible to justify this non-normality of the residuals by saying that most of my independent variables are not normally distributed?

        Kind Regards
        Nonsi Nkomo



        • #5
          Dear Nonsi,

          You do not really have to justify it, but indeed very few variables (if any) are normally distributed.

          Best wishes,

          Joao



          • #6
            Joao, you appear to be saying that the errors do not have to be (approximately) normal under any circumstances, and you make no mention of sample size or of the purpose of the model. Therefore, I am curious to hear what you think about Wooldridge's MLR.6, which states that the population error is assumed to be independent of the explanatory variables and normally distributed with a mean of 0 and variance σ².

            A bit further on, he says that normality of the errors translates to normal sampling distributions of the OLS estimators. And later still, he says that the sampling distributions of the OLS estimators will still be approximately normal in large samples, even if the errors are not normally distributed. I conclude from this that normality of the errors is a sufficient condition, but not a necessary condition for normality of the sampling distributions of the OLS estimators. And it is the latter that really matters for testing and computing confidence intervals.
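            To make the large-sample argument concrete, here is a minimal Python simulation sketch. It is not taken from Wooldridge; the regressor design x_i = (i/n)², the coefficients and the seed are hypothetical choices for illustration. It holds a right-skewed regressor fixed, draws clearly non-normal errors (Exponential(1) minus 1, which has mean 0 and skewness 2), and compares the simulated skewness of the OLS slope's sampling distribution with the exact value implied by the OLS weights. Skewness shrinking toward 0 as n grows is one symptom of the sampling distribution approaching normality, not a proof of it.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness, m3 / m2**1.5."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def exact_slope_skewness(x, error_skewness=2.0):
    """Exact skewness of the OLS slope for a fixed design x and iid errors.

    The slope equals beta + sum(w_i * u_i) with w_i = (x_i - xbar) / Sxx, so its
    skewness is error_skewness * sum(d**3) / Sxx**1.5, where d = x - xbar.
    """
    xbar = statistics.fmean(x)
    d = [xi - xbar for xi in x]
    sxx = sum(di ** 2 for di in d)
    return error_skewness * sum(di ** 3 for di in d) / sxx ** 1.5


def simulated_slope_skewness(x, reps=4000, beta=0.5, seed=2024):
    """Monte Carlo skewness of the OLS slope with skewed, mean-zero errors."""
    rng = random.Random(seed)
    xbar = statistics.fmean(x)
    d = [xi - xbar for xi in x]
    sxx = sum(di ** 2 for di in d)
    slopes = []
    for _ in range(reps):
        # Exponential(1) - 1: mean 0, variance 1, skewness 2 (clearly non-normal)
        y = [1.0 + beta * xi + rng.expovariate(1.0) - 1.0 for xi in x]
        # OLS slope with an intercept: sum((x - xbar) * y) / Sxx
        slopes.append(sum(di * yi for di, yi in zip(d, y)) / sxx)
    return skewness(slopes)


for n in (10, 50, 500):
    # Hypothetical fixed, right-skewed design: x_i = (i / n)**2
    x = [(i / n) ** 2 for i in range(1, n + 1)]
    print(n, round(exact_slope_skewness(x), 3), round(simulated_slope_skewness(x), 3))
```

            The skewness of the slope should fall roughly in proportion to 1/√n, even though the errors themselves are equally skewed at every sample size.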

            Thanks in advance for clarifying what you mean when you say that the errors do not have to be normal, including the role of sample size, the purpose of the model, etc.

            Cheers,
            Bruce

            p.s. - Note that I have used the word errors rather than residuals. I think it's important to make that distinction when talking about the assumptions for OLS models. (This Wikipedia page has a nice explanation of the difference, for anyone who is unsure.)
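            A small Python sketch of the distinction, assuming a hypothetical simple regression y = 1 + 0.5x + u with normal errors (all values are illustrative). The errors are deviations from the population line and are unobservable with real data; the residuals are deviations from the fitted line. Because OLS with an intercept forces the residuals to sum to zero and to be orthogonal to the regressor, the residuals satisfy constraints that the errors do not, so a residual plot is only an estimate of the error distribution.

```python
import random
import statistics


def errors_and_residuals(n=50, beta0=1.0, beta1=0.5, seed=3):
    """Simulate y = beta0 + beta1*x + u and return (x, errors, OLS residuals)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    errors = [rng.gauss(0.0, 2.0) for _ in range(n)]  # unobservable in real data
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, errors)]
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    b0 = ybar - b1 * xbar
    # Residuals: deviations from the fitted line, not the population line
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return x, errors, residuals


x, u, e = errors_and_residuals()
print("sum of errors:       ", round(sum(u), 4))
print("sum of residuals:    ", round(sum(e), 10))
print("sum of x * residuals:", round(sum(xi * ei for xi, ei in zip(x, e)), 10))
```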



            • #7
              Dear Bruce,

              Thanks for giving me the opportunity to clarify. The original question mentioned residuals from 2SLS, which is valid only asymptotically, so I assumed that the sample is large enough for Nonsi not to worry about normality.

              More generally, of course I agree with what is in Wooldridge's book (one of the very best books around), but I really do not believe in such a thing as normal errors. So:

              1) Many of the techniques we use (2SLS, robust standard errors, FGLS, ML, etc.) are only valid asymptotically, and in that case normality is not needed; it is always a good idea to work with large samples :-)

              2) If we cannot get a large sample, I would be more worried about the lack of power because all we need for t-tests and F-tests to be reliable is that the central limit theorem works reasonably well, and this can happen even with small samples as long as the errors are reasonably behaved.

              3) If we were to require the errors to be truly normal, the range of models we could estimate would be really, really small.

              4) The most popular normality tests are only valid asymptotically (ironically, when normality is not needed) and so I would not trust their results in small samples anyway.

              5) I see the normality assumption as a pedagogically convenient device to introduce inference, but when I teach this (in economics) I always make the point that it is not credible and drop it as soon as possible (typically in the following session). In fact, there are so many misconceptions about it that I wonder whether we should mention it at all.
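              Point 2 can be checked with a quick Monte Carlo sketch in Python. The sample size (n = 30), the evenly spaced design, the seed and the two error distributions are illustrative assumptions, not taken from the thread. It estimates how often a nominal 95% t interval for an OLS slope, built with the usual (non-robust) standard error, covers the true value, first with normal errors and then with strongly skewed ones. The critical value 2.048 is the tabulated 0.975 quantile of Student's t with 28 degrees of freedom, hard-coded because Python's standard library has no t distribution.

```python
import math
import random
import statistics


def slope_ci_coverage(n=30, t_crit=2.048, reps=4000, beta=0.5, skewed=True, seed=7):
    """Share of nominal 95% t intervals for the OLS slope that contain beta.

    t_crit must be the 0.975 quantile of Student's t with n - 2 degrees of
    freedom; 2.048 is the tabulated value for n = 30 (df = 28).
    """
    rng = random.Random(seed)
    x = [i / n for i in range(1, n + 1)]  # hypothetical fixed, evenly spaced design
    xbar = statistics.fmean(x)
    d = [xi - xbar for xi in x]
    sxx = sum(di * di for di in d)
    hits = 0
    for _ in range(reps):
        if skewed:
            u = [rng.expovariate(1.0) - 1.0 for _ in range(n)]  # mean 0, skewness 2
        else:
            u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [1.0 + beta * xi + ui for xi, ui in zip(x, u)]
        b1 = sum(di * yi for di, yi in zip(d, y)) / sxx
        b0 = statistics.fmean(y) - b1 * xbar
        ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        se_b1 = math.sqrt(ssr / (n - 2) / sxx)  # usual (non-robust) OLS standard error
        if abs(b1 - beta) <= t_crit * se_b1:
            hits += 1
    return hits / reps


print("normal errors:", slope_ci_coverage(skewed=False))
print("skewed errors:", slope_ci_coverage(skewed=True))
```

              If the central limit theorem is doing its job, both coverage rates should be close to 0.95; smaller samples or heavier-tailed errors are where the approximation would be expected to break down first.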

              Of course, my views on this are shaped (biased!) by my experience of working with economic data and there may be other fields where things are very different; I would be interested to hear about those.

              Best wishes and thanks again,

              Joao



              • #8
                Thank you for clarifying, Joao. I'll just note that in #1, Nonsi said (emphasis added), "I am running an OLS and a 2SLS regression with a continuous dependent variable...". ;-)

                Cheers,
                Bruce



                • #9
                  Thank you all for the help, and my apologies for the very late reply. I will take everything that was said into consideration.

                  Kind Regards
                  Nonsi Nkomo
