  • Does the regress command assume normality of errors?

    Dear list,

    When I estimate a simple linear regression model, the result is the same with either "regress" or "ml model", the latter assuming normality. I know that OLS with normal errors is exactly the same as ML. However, I have "never told" Stata to assume normal errors in "regress". According to the Methods and formulas section, "regress" uses the normal equations to calculate the betas, as expected. Why, then, are the two results exactly equivalent? Where does normality come into play?
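
    For concreteness, here is a minimal sketch of the comparison I mean, using the auto data (the evaluator name mynormal is just an illustrative name):

    Code:
    sysuse auto, clear
    regress mpg weight foreign

    capture program drop mynormal
    program define mynormal
        args lnf xb lnsigma
        quietly replace `lnf' = lnnormalden($ML_y1, `xb', exp(`lnsigma'))
    end

    ml model lf mynormal (xb: mpg = weight foreign) (lnsigma:)
    ml maximize

    The point estimates for weight, foreign, and the constant come out the same as from "regress", up to numerical tolerance.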

    Thanks.

  • #2
    Alan:
    as far as I know, -regress- does not assume normality of errors.
    You can check whether the error distribution is homoskedastic or not graphically via -rvfplot- or analytically via -estat hettest-.
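    For example (a minimal sketch on the auto data):

    Code:
    sysuse auto, clear
    regress price mpg weight
    rvfplot, yline(0)    // residuals versus fitted values
    estat hettest        // Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
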
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    • #3
      Thank you Carlo. So the point is just that ML under normality reproduces the OLS formulas for the betas, not that OLS itself assumes normality per se. And a genuine check of normality would then be based on the OLS residuals (OLS does not assume it), rather than on a test of MLE residuals?

      • #4
        -regress- can be applied to any data set. The estimation of the coefficients and the standard errors is accomplished using matrix algebra and makes no reference to any distribution. Now, if you want the p-values that are calculated from the t-statistics (coefficients divided by standard errors) to accurately reflect the sampling distribution of the t-statistic, then the assumption that the residual distribution is normal is a sufficient condition for that.
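
        Here is a sketch of that matrix algebra in Mata, using the auto data for illustration; note that no distribution is referenced anywhere:

        Code:
        sysuse auto, clear
        regress price mpg weight

        mata:
            y  = st_data(., "price")
            X  = (st_data(., ("mpg", "weight")), J(st_nobs(), 1, 1))   // append a constant
            b  = invsym(cross(X, X)) * cross(X, y)                     // solve the normal equations
            e  = y - X*b
            s2 = cross(e, e) / (rows(X) - cols(X))                     // residual variance
            se = sqrt(diagonal(s2 * invsym(cross(X, X))))              // conventional standard errors
            b, se
        end

        The column of coefficients and the column of standard errors should reproduce what -regress- reports.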

        When you do a ML estimation, the normal distribution is baked into the likelihood function. However, in terms of finding the parameter estimates that maximize that particular likelihood, ML is just doing things the hard way. It is a longer, slower, more approximate way of solving the normal equations! That's because the argument values that maximize the normal-based likelihood are, in fact, the same as the ones gotten by solving the normal equations. If you have the time and like algebra, you can actually work that through by applying some calculus to the normal-based likelihood and solving. So the two approaches will give the same results, except for perhaps very minor numerical errors.
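
        For anyone who wants that algebra spelled out, here is the sketch. With y = Xb + e and normally distributed errors, the log likelihood is

        \[
        \ln L(\beta,\sigma) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\,(y - X\beta)'(y - X\beta),
        \]

        and setting its derivative with respect to the coefficients to zero gives

        \[
        \frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}\,X'(y - X\beta) = 0
        \quad\Longrightarrow\quad X'X\beta = X'y,
        \]

        which are exactly the normal equations, whatever the value of sigma. The variance parameter drops out of the first-order condition for the betas, which is why normality never affects the point estimates.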

        All of that said, I want to emphasize that the assumption that the residual distribution is normal is a sufficient, but not necessary, condition for the p-values to accurately reflect the sampling distribution of the calculated t-statistics. The inferences made based on the assumption of normality are actually pretty robust to violations of the residual normality assumption. If the residual distribution is symmetric, or at least not highly skewed, and if the sample isn't too tiny, the sampling distribution of b/se (as calculated from the normal equations) will be very close to a t-distribution with the usual number of df anyway, close enough for nearly all practical purposes. Similarly, if the predictors are all dichotomous, all you need is a reasonably large sample and the central limit theorem will come to the rescue in this regard. That's why on this forum you will find that people usually advise against testing for normality using any of the common statistical tests. Those tests are sensitive to minor departures from normality that are of no importance with regard to the robustness of inference from normal-theory regression. (Nick Cox often uses a compelling simile: statistical tests for normality are like sending a rowboat out to sea to determine whether the waters are calm enough for the Queen Elizabeth to safely set sail.)
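
        If you would rather see that robustness than take it on faith, here is a small simulation sketch (the program name skewsim and all the numbers are purely illustrative); it checks the type I error rate of the usual t-test when the errors are strongly skewed:

        Code:
        capture program drop skewsim
        program define skewsim, rclass
            drop _all
            set obs 50
            generate x = rnormal()
            generate y = 1 + (rchi2(2) - 2)    // skewed, mean-zero errors; x truly has no effect
            regress y x
            return scalar p = 2*ttail(e(df_r), abs(_b[x]/_se[x]))
        end

        set seed 12345
        simulate p=r(p), reps(2000) nodots: skewsim
        count if p < .05    // should be close to 5% of the replications if the t approximation holds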

        • #5
          Thank you Clyde, that is very clear. I will try the math at some point. My sample has around 2,200 observations and just two dichotomous regressors, so it seems I'm safe. Just out of curiosity, why is a skewed residual distribution problematic? In principle, you could have some chi-squared or beta or exponential distribution, although I guess in that case you don't use -regress- but something like -streg-.

          • #6
            Alan, it's problematic for inference in small samples. Asymptotically, the sampling distribution of the OLS estimators can be approximated by a normal distribution, so in a large sample like yours it should not be a problem.
            Alfonso Sanchez-Penalver

            • #7
              Just out of curiosity, why is a skewed residual distribution problematic? In principle, you could have some chi-squared or beta or exponential distribution, although I guess in that case you don't use -regress- but something like -streg-.
              Well, in order for the sampling distribution of the calculated t-statistic to be close to a t-distribution, if the residual distribution is not normal, something else has to "rescue" the distributions of b and se. With dichotomous predictors, as you have, it is the central limit theorem that does that. b and se are, after all, calculated from sums and sums of squares of the variables in the regression. The central limit theorem will, if your sample is large enough, cause those to have approximately normal distributions (and chi-squared distributions for the quadratic forms will follow from that). But if the underlying residual distribution is highly skewed, then the "large enough" sample will have to be truly gargantuan, whereas the distribution of the sample mean from a symmetric distribution becomes very close to normal with much more modest Ns.

              And, as you said, when your sample really defies these effects, there are other models available. -streg- is one if your outcome variable is always non-negative. There are also generalized linear models, or models using transformations of some of the variables, etc. etc.
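
              For instance, a generalized linear model for a skewed, strictly positive outcome might look like the sketch below (y, x1, and x2 are hypothetical variables here; the family and link would have to be chosen for the actual data):

              Code:
              glm y x1 x2, family(gamma) link(log)

              * or model a transformed outcome instead
              generate ln_y = ln(y)
              regress ln_y x1 x2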
