Checking Normality of Residuals and Homoscedasticity in Data with multiple imputations

Amalie Timmermann

Join Date: Sep 2015

Posts: 23
#1

28 Sep 2015, 06:53

Hi,

I have a data set from a cohort study. I have imputed missing data using multiple imputation (40 imputations), and I am now fitting linear regression models with the following command (Stata version 14.0):

mi estimate, post: regress log_IgE c.log_PFOS i.birthseason i.parity_gr i.smoking

I would like to check the residuals for homoscedasticity and normality, but I am not sure how to do that. I was hoping someone here might be able to help.

A small note:
I added the post option so that I can use the coefficient estimates to calculate the percentage change in the outcome for a doubling of the exposure, since both the exposure and the outcome are log-transformed (log10):

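* Percentage change in IgE for a doubling of PFOS (log10-log10 model):
* 100*(2^b - 1), with Wald 95% limits from b -/+ invnormal(.975)*SE
di "Change in IgE with a doubling of PFOS "
di "Estimate (%):    " ((2^_b[log_PFOS])-1)*100
di "Lower 95% limit: " ((2^(_b[log_PFOS] - invnormal(.975)*_se[log_PFOS]))-1)*100
di "Upper 95% limit: " ((2^(_b[log_PFOS] + invnormal(.975)*_se[log_PFOS]))-1)*100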
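The arithmetic in these display lines can be checked outside Stata. With log10(IgE) regressed on log10(PFOS), the coefficient b is an elasticity: doubling PFOS shifts the expected log outcome by b·log(2), i.e. it multiplies the geometric mean of IgE by 2^b with the covariates held fixed. The same result holds for any log base, provided both variables use the same base. The Python sketch below reproduces the three calculations; the function name and the example coefficient and standard error are hypothetical, not results from this study. It uses a normal critical value, as `invnormal(.975)` does, whereas `mi estimate` reports t-based intervals with its own degrees of freedom, so these limits may differ slightly from the interval Stata prints.

```python
from statistics import NormalDist


def pct_change_per_doubling(b, se, level=0.95):
    """Percent change in Y for a doubling of X in a log-log model, with Wald limits."""

    def to_pct(beta):
        # Doubling X multiplies the geometric mean of Y by 2**beta.
        return (2 ** beta - 1) * 100

    # Normal critical value; for level=0.95 this matches Stata's invnormal(.975).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return to_pct(b), to_pct(b - z * se), to_pct(b + z * se)


# Hypothetical coefficient and standard error, for illustration only.
est, lower, upper = pct_change_per_doubling(b=0.10, se=0.04)
print(f"{est:.2f}% (95% CI {lower:.2f}% to {upper:.2f}%)")
```

Because the transformation 100·(2^b − 1) is monotone, transforming the endpoints of the interval for b gives a valid interval for the percentage change.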

Best regards,
Amalie
Tags: multiple imputation, regression
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#2

28 Sep 2015, 08:21

I'm not sure why you want to test for normality, and you likely don't need to test for heteroskedasticity either. Hopefully you have a sample large enough to justify using asymptotic analysis. If you have a small sample and have imputed the data, the estimators don't have known properties anyway.

How many complete and imputed observations do you have?

The mi procedure should automatically compute standard errors robust to heteroskedasticity. I'm not sure if the "robust" option is needed; it shouldn't be.
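For reference, `mi estimate` combines the per-imputation results with Rubin's rules: the pooled coefficient is the average of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The within-imputation part is whatever variance `regress` reports in each completed dataset, so heteroskedasticity-robust within-imputation variances come from adding `vce(robust)` to the `regress` call. The Python sketch below shows the pooling for a single coefficient, using the large-sample Rubin degrees of freedom; the function name and inputs are hypothetical, and Stata may apply a small-sample degrees-of-freedom adjustment by default, so the df it reports can differ.

```python
import math
from statistics import fmean, variance


def rubin_pool(estimates, variances):
    """Pool one coefficient across m imputations with Rubin's rules.

    estimates: the coefficient from each completed-data fit
    variances: its squared standard error from each fit
    Returns (pooled estimate, total standard error, large-sample Rubin df).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = fmean(estimates)        # pooled point estimate
    w = fmean(variances)            # average within-imputation variance
    b = variance(estimates)         # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b         # total variance
    if b == 0:
        df = math.inf               # no between-imputation variation
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, math.sqrt(t), df
```

Testing the residuals of one completed dataset says nothing about the between-imputation term, which is one reason the pooled standard error is usually the quantity of interest.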
Amalie Timmermann

Join Date: Sep 2015

Posts: 23
#3

08 Oct 2015, 06:59

Dear Jeff,

Thank you for your reply. I have 559 individuals in my data, and I run several regression analyses on the imputed data. I have imputed between 30 and 74 missing values in the exposure variables, between 9 and 132 in the outcome variables, and between 0 and 125 in the covariates.

I am not very familiar with mi, but I assumed that any regression model I fit to my data would have to satisfy the usual assumptions of linear regression, namely normality of the residuals (necessary for hypothesis tests to be valid) and homoscedasticity: http://www.ats.ucla.edu/stat/stata/w.../statareg2.htm
Of course, by log-transforming my outcomes I hoped to avoid problems with non-normal residuals and heteroskedasticity, but, as I would when performing the same analyses on the unimputed data, I would like to check the assumptions.

I apologize for my ignorance on this subject, but I don't quite understand why I do not need to check the assumptions.

Best regards,
Amalie