  • Multiple linear regression - residuals not normal?

    Hi everyone,

    I'm running a multiple linear regression with 170 cases. To check the residuals for normality, I use the following commands:

    Code:
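    regress y x1 x2                  // hypothetical DV and regressors; substitute your own model
    predict res, residuals           // residuals from the last regression, stored in res
    histogram res, kdensity normal   // histogram with kernel-density and normal overlays
    qnorm res                        // quantiles of res vs. normal quantiles; sensitive in the tails
    pnorm res                        // standardized normal probability plot; sensitive mid-range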

    These commands yield the following plots. Can this still be considered acceptable? The qnorm and pnorm plots look OK to me, while the histogram shows skewness and an outlier.

    If not, can I perform internal robustness checks, such as transforming the DV, to see whether the results of this regression are "valid"?
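Besides the plots, Stata offers formal tests of residual normality. The sketch below assumes Stata's shipped auto dataset as a stand-in (price, mpg and weight are not the poster's variables). With 170 observations these tests can reject for departures too small to matter for inference on the coefficients, so they complement the plots rather than replace them.

```stata
* Stand-in data and model: Stata's example auto dataset, not the poster's data
sysuse auto, clear
regress price mpg weight

* Residuals from the regression just fitted
predict res, residuals

* Skewness/kurtosis test of the null that res is normally distributed
sktest res

* Shapiro-Wilk W test of the same null
swilk res
```

Both commands test the same null hypothesis; sktest reports separate p-values for skewness and kurtosis, which can help show whether the problem is mainly the skew or mainly the tails.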



    Last edited by Jean Hadji; 10 Sep 2020, 07:27.

  • #2
    Jean:
    what does -estat hettest- give you back?
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Chi-square is small and not significant. Thus, heteroskedasticity does not appear to be a problem.
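For reference, estat hettest is run after regress. By default it computes the Breusch-Pagan / Cook-Weisberg test, which assumes normally distributed errors; given the doubts about normality in #1, the iid option (Koenker's version, which drops that assumption) may be the more relevant check. A sketch, again assuming the auto dataset as a stand-in for the poster's model:

```stata
* Stand-in data and model (auto dataset, not the poster's)
sysuse auto, clear
regress price mpg weight

* Default: Breusch-Pagan / Cook-Weisberg test against the fitted values, assuming normal errors
estat hettest

* Same alternative, but only assuming i.i.d. errors (no normality)
estat hettest, iid

* Variance as a function of the regressors themselves, i.i.d. errors
estat hettest, rhs iid
```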



      • #4
        How large is your sample size?



        • #5
          Jeff:
          the sample size seems to be 170 observations, as per #1.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thanks Carlo. I read past that.

            This is always a Catch-22. Is n = 170 large enough to invoke the central limit theorem? If so, then non-normal errors are not an issue, because the usual t and F tests on the coefficients remain approximately valid. If not, then how can we justify using tests for normality that are based on asymptotic analysis?

            I’d ignore nonnormality or, if y > 0, try using log(y).
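The central-limit argument can be illustrated by simulation. This is a sketch under assumed conditions, not a statement about the poster's data: one standard-normal regressor, errors drawn from a negated chi-squared(2) distribution (mean zero, strongly negatively skewed), n = 170 and 1,000 replications; the program name olssim is hypothetical. If the approximation holds, the simulated slope estimates should be centred on the true value 0.5 and look close to normal in qnorm, even though the errors are far from normal.

```stata
* Hypothetical data-generating process: y = 1 + 0.5*x + e,
* with e = -(chi2(2) - 2): mean zero, skewness of about -2
capture program drop olssim
program define olssim, rclass
    drop _all
    set obs 170
    generate x = rnormal()
    generate e = -(rchi2(2) - 2)
    generate y = 1 + 0.5*x + e
    regress y x
    return scalar b = _b[x]
end

* Draw 1,000 samples of size 170 and keep the slope estimate from each
set seed 12345
simulate b = r(b), reps(1000) nodots: olssim

* Sampling distribution of the slope: mean, sd, skewness, kurtosis, plots
summarize b, detail
qnorm b
histogram b, normal
```

Rerunning with a much smaller set obs (say 10) is an easy way to see where the approximation starts to strain.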



            • #7
              Jeff quoted Catch-22 (https://en.wikipedia.org/wiki/Catch-22_(logic)), a paradox that seems ubiquitous in the labour market when we read vacancy notices such as: "Those applying for their very first job should have at least 2 years of experience".
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Thank you very much for your answers!

                Since my DV (> 0) is negatively skewed, I used a reflected log transformation, log10(K - DV), where K = max(DV) + 1. The plots shown above look better with the transformed DV.
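A minimal sketch of the reflected log transformation on simulated data; the variable names dv and x1 and the negatively skewed data-generating process are assumptions for illustration only. Because K - DV decreases as DV increases, the transformed variable runs in the opposite direction to the original.

```stata
* Hypothetical negatively skewed DV (simulated, not the poster's data)
clear
set seed 2020
set obs 170
generate x1 = rnormal()
generate dv = 50 + 3*x1 - 2*rchi2(2)

* K = max(DV) + 1, so that K - DV >= 1 and log10(K - DV) >= 0
summarize dv, meanonly
scalar K = r(max) + 1
generate dv_rlog = log10(scalar(K) - dv)

* Compare skewness before and after the transformation
summarize dv dv_rlog, detail
```

Using a scalar rather than a local macro keeps K in full double precision, so the largest DV maps to log10(1) = 0.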

                Since the regression results (size and direction of the standardized coefficients, and significance levels) are comparable whether I use the transformed or the untransformed DV, is it acceptable to simply report the regression with the untransformed DV? I'm asking because transforming the DV makes the results difficult to interpret, and it's also quite uncommon in my field.
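One way to make this comparison explicit is to fit both models and tabulate them side by side. The sketch below uses simulated data; dv, x1, x2 and the data-generating process are hypothetical, not the poster's. Because of the reflection, coefficients on the reflected-log DV carry the opposite sign, so "same direction" here means reversed signs.

```stata
* Hypothetical setup: simulated negatively skewed DV and two made-up regressors
clear
set seed 2020
set obs 170
generate x1 = rnormal()
generate x2 = rnormal()
generate dv = 50 + 3*x1 + 1*x2 - 2*rchi2(2)
summarize dv, meanonly
scalar K = r(max) + 1
generate dv_rlog = log10(scalar(K) - dv)

* Untransformed DV; the beta option also displays standardized coefficients
regress dv x1 x2, beta
estimates store raw

* Reflected-log DV: signs are expected to be reversed relative to the model above
regress dv_rlog x1 x2, beta
estimates store rlog

* Side-by-side (unstandardized) coefficients, standard errors and p-values
estimates table raw rlog, b se p
```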

                Well, I guess it's acceptable, since Jeff already suggested ignoring non-normality and relying on the CLT.
                Last edited by Jean Hadji; 11 Sep 2020, 05:35.
