
  • Contradictory results of tests for conditional normality of residuals in OLS

    Hello,

    I am performing OLS and I have conducted two tests of the assumption of conditional normality of residuals.

    For the first test I simply used the summarize command. My understanding is that skewness or kurtosis values below 0.05 imply a deviation from normality, so these results seem solid.
    [Image: Summarize.png]


    However, I then performed a skewness/kurtosis test and got a p-value of 0.0000. My understanding is that this implies a deviation from normality:
    [Image: Sktest.png]


    How should I think about these results?
    Last edited by Mark Edgeworth; 03 Oct 2018, 12:02.

  • #2
    You misunderstand skewness and kurtosis: if data are normally distributed, then skewness is 0 and kurtosis is 3, so your results imply a lack of normality.

    More important, this is generally something that can be handled either in the modeling or, sometimes, by using robust estimates of the SEs; try
    Code:
    help regress
    and scroll down to the "vce()" option and click on "vcetype" to learn more
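    For instance, a minimal sketch (y and the x's are just stand-ins for your own variables):
    Code:
    * OLS with heteroskedasticity-robust (Huber/White) standard errors
    regress y x1 x2 x3, vce(robust)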



    • #3
      The null hypothesis in sktest, swilk, and sfrancia is normality, so a significant test statistic indicates that you reject the null. You can graph the residuals to get an idea beforehand:

      Code:
      qnorm residual2
      hist residual2, normal
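      Note that residual2 has to exist first — a minimal sketch, assuming you have just run regress:
      Code:
      * store the residuals from the most recent regress fit
      predict residual2, residuals
      * formal skewness/kurtosis test on those residuals
      sktest residual2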
      That said, several influential people argue that testing the normality of the residuals is not the same as establishing whether the error term is normally distributed, which makes this exercise not very useful except for academic purposes.



      • #4
        Thank you both for your help; that was a simple misunderstanding on my part.

        I have run the regression using robust standard errors, as suggested, and also log-transformed my outcome variable (its distribution was heavily right-skewed). Here are the same results as above but using the new specification, as well as the QQ plot and a histogram of the residuals (as suggested). To me the graphics look pretty good, apart from the last couple of residuals in the top right, which veer off. But the skewness and kurtosis say otherwise. What do you think?

        Where should I go from here? Most of my RHS variables are binary, so I don't have many options when it comes to transformations.
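        For reference, the new specification was along these lines (the variable names here are generic stand-ins rather than my actual ones):
        Code:
        * log-transform the right-skewed outcome
        generate ln_y = ln(y)
        * refit with robust standard errors
        regress ln_y x1 x2 x3, vce(robust)
        * residuals for the diagnostics below
        predict residual2, residuals
        sktest residual2
        qnorm residual2
        histogram residual2, normal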

        [Image: SK2.png]

        [Image: Sk3.png]

        [Image: BSC93 conditional normality of residuals second visual test.png]

        [Image: BSC93 residuals histogram.png]



        • #5
          Follow the discussion here on StackExchange for more on the point in #3. You have 200+ observations, so just carry on with your regression and concentrate on other regression diagnostics. Forget about the normality of the residuals.



          • #6
            Mark:
            as an aside to the previous helpful comments, and in the same vein as Andrew's last reply, I would be more concerned about exploring the relationships between your predictors and the regressand. Are they all linear? Is the data-generating process fairly and truly represented in your regression model? Just run -estat ovtest- after regress to get an idea of what's going on.
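            Something along these lines (just a sketch; substitute your own variables):
            Code:
            regress y x1 x2 x3
            * Ramsey RESET test for omitted variables / functional-form misspecification
            estat ovtest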
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              I agree with all previous discussants (except that, contrary to Andrew, the phrase "academic purposes" is not necessarily pejorative!).

              I find normal quantile plots (sometimes called normal probability plots) enormously more helpful here than histograms with normal density functions superimposed. The latter tend to draw attention to apparent discrepancies in the middle of the distribution, which are often of no consequence, and make it hard to assess discrepancies in the tails, which can be important and informative. Histograms are also sensitive to the choice of bin start and bin width, and Stata's defaults are not optimized on your behalf.

              Formal tests of residuals for normality here divide statistical people into two camps, with no doubt some people wandering around confused in between. One camp (no names, no fields, for the sake of discretion) teaches that everything has to pass a significance test before you can possibly make a decision or an inference regarding it as established. The other camp, including myself FWIW, tends to sit loose and regard tests as providing some guidance, but much less guidance than graphs provide. The biggest deal here, which isn't desperately controversial, is that if your sample size is large enough, unimportant deviations from normality will be declared significant at conventional levels, while with small samples failure to reject the null may arise just because you don't have enough data. These two cases don't exhaust the logical possibilities, but they make significance tests problematic unless other evidence is also considered.

              Here you have a hint from the normal quantile plot that two observations have slightly high positive residuals and it would always be worth going back to the data to examine which they are. They might turn out to be two really big countries or companies or two years that you know are unusual. As you don't name your variables or give example data, these are just indications.

              If you have one predictor, then the main thing to show us is a plot of the data with the regression line superimposed. If you have several predictors, then show us added-variable plots. A plot of residuals versus fitted values is also often helpful.
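              For example (a sketch with placeholder variable names; avplots and rvfplot assume you have just run regress):
              Code:
              * one predictor: data with the fitted line superimposed
              twoway (scatter y x) (lfit y x)
              * several predictors: added-variable plots and residual-versus-fitted plot
              avplots
              rvfplot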

