  • Checking Normality of Residuals and Homoscedasticity in Data with multiple imputations

    Hi,

    I have a data set from a cohort study. I have imputed missing data using multiple imputation (40 imputations), and I am now fitting a linear regression with the following command (Stata version 14.0):

    mi estimate, post: regress log_IgE c.log_PFOS i.birthseason i.parity_gr i.smoking

    I would like to check for homoscedasticity and normality of the residuals, but I am not sure how to do that. I was hoping someone here might be able to help.
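One pragmatic option, sketched here under stated assumptions rather than as an established procedure, is to run the usual regress diagnostics on a few of the imputed datasets individually, since these tests are not pooled across imputations the way coefficients are. The sketch assumes the variable names from the command above; looping over imputations 1 to 3 is an arbitrary choice, and `resid`, `qnorm_m1`, `rvf_m1` and the local `covars` are hypothetical names.

```stata
// Sketch: standard residual diagnostics on individual imputed datasets.
// Assumption: variable names as in the first post; imputations 1-3 only.
local covars c.log_PFOS i.birthseason i.parity_gr i.smoking

forvalues m = 1/3 {
    preserve
    // Replace the mi data in memory with imputed dataset m as plain data
    mi extract `m', clear

    quietly regress log_IgE `covars'
    display as text _n "Imputation `m'"

    // Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
    estat hettest

    // Residuals: Shapiro-Wilk test and a normal quantile plot
    predict double resid, residuals
    swilk resid
    qnorm resid, name(qnorm_m`m', replace)

    // Residual-versus-fitted plot
    rvfplot, name(rvf_m`m', replace)

    // Bring back the full mi data before the next imputation
    restore
}
```

If the plots and tests look similar across the imputations checked, that is informal reassurance; with 559 observations, formal normality tests will also flag deviations that matter little for inference.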


    A small note:
    I have added the post option so that I can use the coefficient estimates to calculate the change in the outcome for a doubling of the exposure, since both the exposure and the outcome are log-transformed (log10):

    di "Change in IgE with a doubling of PFOS "
    * point estimate (% change)
    di ((2^_b[log_PFOS])-1)*100
    * lower and upper 95% limits (% change), normal approximation
    di ((2^(_b[log_PFOS] - invnormal(.975)*_se[log_PFOS]))-1)*100
    di ((2^(_b[log_PFOS] + invnormal(.975)*_se[log_PFOS]))-1)*100
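The display lines follow from the log10 transforms. A short derivation, assuming the model above with the intercept written as α and the covariate terms omitted; z₀.₉₇₅ is the value returned by invnormal(.975), about 1.96:

```latex
\begin{aligned}
\log_{10}(\mathrm{IgE}) &= \alpha + \beta \log_{10}(\mathrm{PFOS}) + \dots \\
\Delta \log_{10}(\mathrm{IgE}) &= \beta \bigl[\log_{10}(2\,\mathrm{PFOS}) - \log_{10}(\mathrm{PFOS})\bigr] = \beta \log_{10} 2 \\
\frac{\mathrm{IgE}_{\text{doubled PFOS}}}{\mathrm{IgE}} &= 10^{\beta \log_{10} 2} = 2^{\beta} \\
\%\ \text{change} &= \bigl(2^{\beta} - 1\bigr) \times 100,
\qquad 95\%\ \text{CI: } \bigl(2^{\beta \pm z_{0.975}\,\mathrm{SE}(\beta)} - 1\bigr) \times 100
\end{aligned}
```

Because the base cancels, the same formula holds for any logarithm base, provided the exposure and the outcome use the same one.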

    Best regards,
    Amalie

  • #2
    I'm not sure why you want to test for normality, and you likely don't need to test for heteroskedasticity either. Hopefully your sample is large enough to justify relying on asymptotic analysis. If you have a small sample and have imputed the data, the estimators don't have known properties anyway.

    How many complete and imputed observations do you have?
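For reference, these counts can be read off Stata's mi suite directly: mi describe reports the number of complete and incomplete observations and the number of imputations, and mi misstable summarize counts missing values in the original (m = 0) data. The variable list below is assumed from the model in the first post.

```stata
// Sketch: how much was missing before imputation, and how the mi data are set up
mi describe

// Missing-value counts in the original (m = 0) data for the model variables
// (assumption: these are the only variables in the model)
mi misstable summarize log_IgE log_PFOS birthseason parity_gr smoking
```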

    The mi procedure should automatically compute standard errors robust to heteroskedasticity. I'm not sure if the "robust" option is needed; it shouldn't be.
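To be certain that heteroskedasticity-robust standard errors are used, they can be requested explicitly: regress reports conventional OLS standard errors unless vce(robust) is specified, and mi estimate pools whatever variance each per-imputation fit reports, using Rubin's rules. A minimal sketch, assuming the model from the first post:

```stata
// Sketch: same model as in the first post, with robust (Huber/White)
// standard errors requested explicitly in each imputed dataset;
// mi estimate then combines them with Rubin's rules.
mi estimate, post: regress log_IgE c.log_PFOS i.birthseason i.parity_gr i.smoking, vce(robust)

// _b[] and _se[] now hold the pooled robust results, so the doubling
// calculation from the first post can be rerun unchanged.
display ((2^_b[log_PFOS]) - 1)*100
display ((2^(_b[log_PFOS] - invnormal(.975)*_se[log_PFOS])) - 1)*100
display ((2^(_b[log_PFOS] + invnormal(.975)*_se[log_PFOS])) - 1)*100
```

Comparing _se[log_PFOS] from this fit with the one from the original command gives a rough sense of whether heteroskedasticity matters much for the inference here.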



    • #3
      Dear Jeff,

      Thank you for your reply. I have 559 individuals in my data, and I run several regression analyses on the imputed data. I imputed between 30 and 74 missing values for the exposure variables, between 9 and 132 for the outcome variables, and between 0 and 125 for the covariates.

      I am not very familiar with mi, but I assumed that a regression model fitted to my data would have to satisfy the usual assumptions for linear regression, namely normality of the residuals (needed for the hypothesis tests to be valid) and homoscedasticity: http://www.ats.ucla.edu/stat/stata/w.../statareg2.htm
      Of course, by log-transforming my outcomes I hoped to overcome any problems with non-normal residuals and heteroskedasticity, but, as with the same analyses on the non-imputed data, I would like to check the assumptions.

      I apologize for my ignorance on this subject, but I don't quite understand why I do not need to check the assumptions.

      Best regards,
      Amalie
