
  • Non-normal distribution solution


    I am currently researching the impact of the Great Recession on suicide rates across 51 U.S. states, using a 2000-2010 data set.
    I have a balanced panel with 561 observations (51 states × 11 years).
    I checked for normality by running a skewness-kurtosis test (sktest) on the residuals of my regression.
    My results were as follows:

    Skewness/Kurtosis tests for Normality
                                                      ------- joint ------
    Variable |  Obs  Pr(Skewness)  Pr(Kurtosis)  adj chi2(2)  Prob>chi2
    ---------+---------------------------------------------------------
           e |  561        0.0000        0.0000        28.54     0.0000

    I then changed my dependent variable from the suicide rate to the log suicide rate and ran the test again on the new residuals:
    Skewness/Kurtosis tests for Normality
                                                      ------- joint ------
    Variable |  Obs  Pr(Skewness)  Pr(Kurtosis)  adj chi2(2)  Prob>chi2
    ---------+---------------------------------------------------------
           e |  561        0.0029        0.0006        17.89     0.0001

    In both cases I reject the null hypothesis of normality.
    Will my results be greatly affected by this? I would be grateful if anyone could suggest a solution.

    Cheers!

  • #2
    You don't tell us what model you are using and in what sense it respects or reflects the panel structure.

    Perhaps the major point is that the skewness-kurtosis test is fairly useless for telling you whether either regression is poorly specified. Conditional normality is usually the least important assumption behind a regression model. A very low P-value may just mean that the sample size is large enough to find a little non-normal structure.
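    To see the sample-size point concretely: Stata's sktest uses its own, more involved statistic, so as a simpler stand-in this sketch uses the Jarque-Bera statistic, JB = n/6 * (S^2 + (K - 3)^2 / 4), which is approximately chi-squared with 2 degrees of freedom under normality. For a fixed sample skewness S and kurtosis K, JB grows in proportion to n, so the same mild departure moves from unremarkable to "highly significant" as the sample grows. The skewness and kurtosis values below are assumptions for illustration, not the residuals from #1.

```python
import math


def jarque_bera(n: int, skew: float, kurt: float) -> float:
    """Jarque-Bera statistic for a sample of size n with sample skewness
    `skew` and (non-excess) kurtosis `kurt`."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def chi2_2df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-squared variable with 2 degrees of freedom.
    For 2 df the survival function is exactly exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# Hypothetical, mildly non-normal residual shape (assumed, not real data)
SKEW, KURT = 0.3, 3.6

for n in (50, 561, 5000):
    jb = jarque_bera(n, SKEW, KURT)
    print(f"n={n:5d}  JB={jb:7.2f}  p={chi2_2df_pvalue(jb):.4f}")
```

    With these assumed values the P-value is about 0.47 at n = 50 but far below 0.001 at n = 561, even though the shape of the residual distribution is identical in both cases.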

    You might be better off using diagnostic plots to check for structure in the residuals that might be incorporated in a revised model.


    • #3
      Hi Nick,

      Thank you for the reply.
      I am using a fixed effect panel data model for my research.
      However, I don't quite understand what you meant by asking in what sense the model reflects the panel structure.

      So, since the p-value is really low, does that mean the non-normality has little impact on the accuracy of the results I obtained?



      • #4
        You are fitting a panel model to panel data; that answers my first point, although a little imprecisely.
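        One way a fixed-effects model reflects the panel structure is that it compares each state only with itself over time. Below is a minimal sketch of the within (fixed-effects) estimator for a single regressor, assuming the model y_it = a_i + b*x_it + e_it: demean x and y within each state, then fit least squares through the origin. This is the kind of estimate xtreg, fe reports; the toy data and the function names here are hypothetical.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """Ordinary least-squares slope that ignores the panel structure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(ids, x, y):
    """Fixed-effects (within) slope for y_it = a_i + b*x_it + e_it.

    Demeans x and y within each panel unit, then regresses demeaned y on
    demeaned x through the origin. Returns the slope and the residuals."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for i, xi, yi in zip(ids, x, y):
        totals[i][0] += xi
        totals[i][1] += yi
        totals[i][2] += 1
    means = {i: (sx / n, sy / n) for i, (sx, sy, n) in totals.items()}
    xd = [xi - means[i][0] for i, xi in zip(ids, x)]
    yd = [yi - means[i][1] for i, yi in zip(ids, y)]
    b = sum(a * c for a, c in zip(xd, yd)) / sum(a * a for a in xd)
    resid = [c - b * a for a, c in zip(xd, yd)]
    return b, resid


# Hypothetical toy panel: two states, three years each, true slope 2
ids = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 8.0, 10.0, 12.0]  # a_A = 10, a_B = 0

b_fe, resid = within_slope(ids, x, y)
print(f"pooled slope = {pooled_slope(x, y):.3f}, within slope = {b_fe:.3f}")
```

        In the toy data the state with the higher intercept also has lower x, so the pooled slope comes out negative (about -0.571) while the within estimate recovers the true slope of 2.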

        Your new question underlines why this kind of normality test is not much use, or to put it a little more positively is of secondary utility.

        You seem to expect it to do something it can't possibly do, namely judge the "accuracy" of your model fit. I don't know exactly what you mean by that, but examining how closely the residuals approximate normality, via skewness and kurtosis, tells you no more than exactly that. A non-normal distribution of residuals could be consistent with an excellent model fit, and a normal distribution could be consistent with a poor one. Presumably whatever commands you used (you still aren't telling us what they are, but I guess xtreg) in principle work best with normal disturbances, but non-normal disturbances could pull the model towards them, so you wouldn't necessarily detect them easily through this test.
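        The claim that non-normal residuals can coexist with an excellent fit is easy to check by simulation. This sketch (hypothetical data and helper names, with exponential errors chosen purely as an example of strong skew) simulates y = 1 + 2x + e, fits a least-squares line by hand, and reports the residual skewness.

```python
import random


def ols(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def skewness(v):
    """Moment-based sample skewness."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((t - m) ** 2 for t in v) / n
    m3 = sum((t - m) ** 3 for t in v) / n
    return m3 / m2 ** 1.5


rng = random.Random(12345)
n = 561  # same size as the panel in #1; the data themselves are simulated
x = [rng.uniform(0, 10) for _ in range(n)]
# Exponential(1) errors shifted to mean zero: population skewness 2
e = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

intercept, slope = ols(x, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"intercept={intercept:.3f}  slope={slope:.3f}  "
      f"residual skewness={skewness(resid):.2f}")
```

        Exact numbers depend on the seed, but the slope should land close to 2 while the residuals stay strongly right-skewed, so a normality test would reject even though the fitted line is essentially correct.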

        If there's a villain here, it's mediocre textbook literature that places too much emphasis on itty-bitty formal tests and avoids the easy-to-ask, difficult-to-answer question of whether the model is qualitatively right for a dataset, given ideas about the real generating process.

        Better diagnostics are likely to include plots of:

        1. Observed vs predicted

        2. Residuals vs predicted

        3. Residuals vs time
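        Behind each of these plots is just a pair of columns. As a rough sketch (Python rather than Stata, with a hypothetical function name and made-up values), this builds those pairs from observed and predicted values and also averages the residuals by year; years whose mean residual sits consistently above or below zero would be the numeric counterpart of visible structure in a residuals-versus-time plot.

```python
from collections import defaultdict


def diagnostic_series(years, observed, predicted):
    """Return the (x, y) pairs behind three diagnostic plots, plus the
    mean residual in each year:
      1. observed vs predicted
      2. residual vs predicted
      3. residual vs time
    """
    resid = [o - p for o, p in zip(observed, predicted)]
    obs_vs_pred = list(zip(predicted, observed))
    resid_vs_pred = list(zip(predicted, resid))
    resid_vs_time = list(zip(years, resid))
    by_year = defaultdict(list)
    for t, r in resid_vs_time:
        by_year[t].append(r)
    mean_by_year = {t: sum(rs) / len(rs) for t, rs in sorted(by_year.items())}
    return obs_vs_pred, resid_vs_pred, resid_vs_time, mean_by_year


# Hypothetical values for two states over three years
years = [2007, 2008, 2009, 2007, 2008, 2009]
observed = [11.0, 13.5, 14.0, 12.0, 14.5, 15.0]
predicted = [11.5, 13.0, 14.5, 11.5, 13.0, 14.5]

obs_vs_pred, resid_vs_pred, resid_vs_time, mean_by_year = diagnostic_series(
    years, observed, predicted)
print(mean_by_year)  # {2007: 0.0, 2008: 1.0, 2009: 0.0}
```

        Here the made-up 2008 residuals are both positive, which is the kind of pattern the third plot is meant to reveal.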

        By the way, a very low P-value flags that the null hypothesis (here, normality) is implausible, not that there is no problem with the model.


        • #5
          Hi everyone,
          I am working in Stata 16.1, and I have a question about my graph of residuals. I first estimated the regression and then predicted the residuals. Now, when I estimate the kernel density using:

          kdensity r

          [Figure: kernel density estimate of the residuals (normality-graph.png)]

          My graph looks like the picture above. This is the first time I have come across a graph like this in my data analysis. Could you please suggest what might make the kernel density estimate look like this? I tried googling and looking in the FAQs, and came across this thread, so I was hoping someone could help me understand it.

          Thank you!



          • #6
            #5 could mean anything from "no problem" to "some bimodality in your residuals may hint at structure your model is not quite catching".

            I like density estimation a lot, but (consistently with that, I think) I find it oversold. Often it is hard to relate back to the data without a great deal of other information.

            Never believe apparent structure in a density estimate unless and until you have

            * found it to be stable over a range of bandwidths and even across different kernels (a very large fraction of Stata users just seem to use the default)

            * checked with a different kind of plot (I would usually prefer qnorm on the residuals)

            * ideally have a story for why the structure exists
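            The bandwidth check in the first bullet can be sketched outside Stata too (in Stata you would vary kdensity's bwidth() and kernel() options). This is a hand-rolled Gaussian kernel density estimate on simulated residuals drawn from a two-component mixture, an assumption made purely so there is something to find, counting local maxima at several bandwidths. The helper names are hypothetical.

```python
import math
import random


def gaussian_kde(data, h, grid):
    """Gaussian kernel density estimate of `data` with bandwidth h,
    evaluated at each point of `grid`."""
    n = len(data)
    c = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - d) / h) ** 2) for d in data)
            for g in grid]


def count_modes(density):
    """Number of strict interior local maxima of a density on a grid."""
    return sum(1 for i in range(1, len(density) - 1)
               if density[i - 1] < density[i] > density[i + 1])


rng = random.Random(2020)
# Simulated residuals: 50/50 mixture of normals centred at -1.5 and +1.5
resid = [rng.gauss(-1.5 if rng.random() < 0.5 else 1.5, 0.7)
         for _ in range(400)]
lo, hi = min(resid), max(resid)
grid = [lo + (hi - lo) * k / 200 for k in range(201)]

for h in (0.2, 0.4, 0.8, 1.6):
    print(f"bandwidth={h:.1f}  modes={count_modes(gaussian_kde(resid, h, grid))}")
```

            With these simulated data, two modes show up across small and moderate bandwidths and merge into one once the bandwidth is large; very small bandwidths may also add spurious bumps. Structure that survives a range of reasonable bandwidths is more believable than structure seen at one setting.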

            Normal distribution of errors is the least important assumption (read: ideal condition) in regression anyway!
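            The qnorm check in the second bullet plots sorted residuals against quantiles of a normal distribution with the same mean and standard deviation. A minimal sketch of the coordinates, assuming plotting positions i/(n + 1) (qnorm's exact choice may differ) and made-up residuals:

```python
from statistics import NormalDist, mean, stdev


def normal_quantile_pairs(resid):
    """(expected normal quantile, observed residual) pairs for a normal
    quantile plot, matching mean and SD to the residuals and using
    plotting positions i/(n + 1)."""
    xs = sorted(resid)
    n = len(xs)
    ref = NormalDist(mean(xs), stdev(xs))
    return [(ref.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


sample = [0.3, -1.2, 0.8, -0.1, 1.9, -0.6, 0.0]  # hypothetical residuals
pairs = normal_quantile_pairs(sample)
for expected, observed in pairs:
    print(f"{expected:7.3f}  {observed:7.3f}")
```

            Points close to a straight line suggest approximate normality; a bend at one end suggests skewness, bends at both ends suggest heavy or light tails, and a flat step in the middle is how bimodality tends to show up.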


            • #7
              Hi Nick,

              Thank you so much for these helpful suggestions. I took your advice, plotted qnorm on the residuals, and obtained the attached graph. You pointed out that the bimodal structure might mean my model is not capturing some structure in my data. Could it be because my dependent variables are censored? They are censored below at 0 and above at 70. So maybe, if I use Tobit analysis, this bimodality could be captured? I hope I am making sense.
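              For reference, here is a sketch of what a two-limit Tobit model with the limits from this post (0 and 70) assumes: a latent normal outcome that is observed only between the limits and piles up at them. In Stata this corresponds to tobit with the ll(0) and ul(70) options. The function below only evaluates the log-likelihood for given linear predictions and sigma; it is an illustrative sketch, not an estimator, and whether censoring explains the bimodality is left open.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tobit_loglik(y, xb, sigma, lower=0.0, upper=70.0):
    """Log-likelihood of a two-limit Tobit model.

    Latent y* ~ N(xb, sigma^2); observed y = lower if y* <= lower,
    y = upper if y* >= upper, otherwise y = y*."""
    ll = 0.0
    for yi, mu in zip(y, xb):
        if yi <= lower:
            # P(y* <= lower)
            ll += math.log(STD_NORMAL.cdf((lower - mu) / sigma))
        elif yi >= upper:
            # P(y* >= upper), written via symmetry to avoid 1 - cdf underflow
            ll += math.log(STD_NORMAL.cdf((mu - upper) / sigma))
        else:
            # Normal density of an uncensored observation
            z = (yi - mu) / sigma
            ll += math.log(STD_NORMAL.pdf(z)) - math.log(sigma)
    return ll


# Hypothetical observations: one at each limit and one in between
print(tobit_loglik([0.0, 35.0, 70.0], [5.0, 30.0, 65.0], sigma=10.0))
```

              Observations at 0 or 70 contribute a probability of being at or beyond the limit, while uncensored values contribute a density; ordinary linear regression treats all three the same way.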

              [Attached graph: qnorm plot of the residuals]


              • #8
                The bimodality to me looks genuine but minor.