  • assumption of normality

    Hello,
    I would like to know whether using the xtreg command with panel data requires assuming that the sample is normally distributed.
    Thank you
    Erasmo

  • #2
    First let's be clear what assumption of normality we're talking about here. People are often under the impression that the predictor variables or the outcome variable need to have a normal distribution--this is not even remotely true. There is not, and never has been, any such restriction on any kind of regression (panel or otherwise). Where normality sometimes comes into play is that when the residuals of the regression are normally distributed, it can be easily proved that the coefficients divided by their standard errors have t-distributions, and you can do hypothesis testing based on those t-statistics.

    So normality of residuals is a sufficient condition for correct inference based on t- or z- statistics. It is not, however, always necessary. If the sample size is large, then it can be shown using the central limit theorem that the sampling distributions of the coefficients and their standard errors are (asymptotically) normal and chi square (respectively) anyway, so that the t-/z- statistics are again correct.
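
The large-sample argument above can be checked with a small simulation. This is only a sketch under stated assumptions, not part of the original reply: it uses a plain one-regressor OLS rather than xtreg, the errors are Exponential(1) shifted to mean zero (strongly skewed), and the sample size (200), replication count, seed, and function names are all illustrative choices. The true slope is zero, so the share of replications with |t| > 1.96 estimates the Type I error rate.

```python
import math
import random


def slope_t_stat(x, y):
    """Return the OLS slope divided by its conventional standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def rejection_rate(n, reps, seed=12345):
    """Share of replications with |t| > 1.96 although the true slope is 0."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]  # regressor held fixed
    rejections = 0
    for _ in range(reps):
        # Exponential(1) minus 1: mean zero, but strongly right-skewed errors.
        y = [1.0 + (rng.expovariate(1.0) - 1.0) for _ in range(n)]
        if abs(slope_t_stat(x, y)) > 1.96:
            rejections += 1
    return rejections / reps


rate = rejection_rate(n=200, reps=2000)
print(f"Empirical Type I error rate at nominal 5%: {rate:.3f}")
```

If the central limit theorem argument holds, the printed rate should land close to 0.05 despite the non-normal errors.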

    So normality of residuals is only a concern in small samples. Even there, if the residual distribution is not too far from normal (especially if it is symmetric) then the t- and z- statistics' sampling distributions are reasonably well approximated by the corresponding t- and z- distributions so that hypothesis testing using them will have nearly the nominal Type I error rates.

    Finally, I want to emphasize that even when all of those "rescues" fail and non-normality is a problem (i.e. small sample with a nasty residual distribution), it only affects the validity of p-values. Even in this worst case scenario, it remains true that the estimated coefficients are unbiased (ordinary least squares regression) or consistent (fixed-effects panel regression) estimators of the population-level coefficients, and the standard errors are good estimates of the standard deviation of the sampling distribution. So if p-values are not important to answering your research question, you don't have to think about normality in any circumstance.
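
The unbiasedness claim for ordinary least squares can be illustrated the same way. Again this is a hedged sketch, not the original poster's code: the sample is deliberately small (10 observations), the errors are skewed (Exponential(1) minus 1), and the true slope of 2, the intercept, the seed, and the helper name `ols_slope` are all made up for the illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least squares slope for a single regressor."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(2018)
TRUE_SLOPE = 2.0          # hypothetical population coefficient
x = list(range(10))       # small sample: only 10 observations

slopes = []
for _ in range(5000):
    # Mean-zero but strongly right-skewed errors.
    y = [1.0 + TRUE_SLOPE * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = statistics.fmean(slopes)
print(f"Average estimated slope over 5000 samples: {mean_slope:.3f}")
```

The average of the estimated slopes should sit very near the true value, even though the sample is small and the error distribution is far from normal; only the p-values are at risk in that setting.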




    • #3
      Thank you!



      • #4
        Hello everybody,
        I have a small sample 28 companies studied on 6 years. I used panel data (pooled ols model). When I calculated the mean and standard deviation of my variables, I found that the standard deviation is larger than the mean for the majority of the variables. Does that mean that the distribution is not normal?
        When I read Clyde's answer, I understood that normality is not important. Is it also not important for my model?
        Thank you



        • #5
          It depends! After all, it is common to refer to a normal distribution with mean 0 and SD 1, so it is perfectly possible to have SD > mean together with a normal distribution.

          However, if your variable is such that only positive values are possible, or only zero and positive, then SD > mean is not consistent with a normal distribution. But you don't need to rely on this as a criterion; just use (e.g.) qnorm to check.
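
The reasoning behind that criterion can be made concrete with a little arithmetic. As a sketch (the mean and SD values below are hypothetical, and a positive mean is assumed): a normal distribution whose SD is at least as large as its mean puts a substantial share of its probability below zero, which cannot happen for a variable that is never negative.

```python
from statistics import NormalDist


def prob_negative(mean, sd):
    """P(X < 0) for a normal variable with the given mean and SD."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


# Hypothetical values: SD equal to the mean, then SD twice the mean.
p_equal = prob_negative(mean=1.0, sd=1.0)
p_larger = prob_negative(mean=1.0, sd=2.0)

print(f"SD = mean:     P(X < 0) = {p_equal:.4f}")
print(f"SD = 2 * mean: P(X < 0) = {p_larger:.4f}")
```

With SD equal to the mean, roughly 16% of a normal distribution would lie below zero, and the share grows as the SD increases relative to the mean.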

          I don't think I can improve on Clyde's answer at all except to add a very short executive summary: No. None of what Clyde says contradicts the idea that with very skewed distributions another model and/or estimation method might be preferable for your data.
          Last edited by Nick Cox; 04 Oct 2018, 07:09.



          • #6
            Am I obliged to test for normality? I found that the Shapiro-Wilk test is the best for small samples. Is that right?

            Many thanks.



            • #7
              You're obliged to do that if your teacher, supervisor, examiner, or paper reviewer or referee insists on it. Otherwise, #2 implies clearly that there is no need to check for a condition that's not an assumption of regression. (My personal view is that a great deal would be clearer if people talked about "ideal conditions" rather than "assumptions".)

              It's always a good idea, however, to know how your variables are distributed, as a matter of general statistical prudence.

              If your textbook or teacher tells you that marginal normality is a requirement for regression, you need a better textbook (or teacher).



              • #8
                Thank you very much. So now I can rely on my results and present means and standard deviations without a problem.



                • #9
                  I didn't say that at all! Here it is in black and white from #5:

                  "another model and/or estimation method might be preferable for your data"

                  Statalist is not the Oracle. All you have told us is

                  "a small sample 28 companies studied on 6 years. I used panel data (pooled ols model)"

                  and without seeing any data or results we cannot comment on whether your means and SDs are reliable. They will be accurately calculated for the data you present, but that's all that can be promised.

