Variables chosen for multiple imputation

Yue YY

Join Date: May 2018

Posts: 41
#1

29 Apr 2019, 14:47

Dear Statalist,

I have a question about choosing variables for multiple imputation. For example, I have missing values for smoking, and I'd like to investigate the relationship between smoking and cancer, controlling for age and sex in the regression. There are also some variables that I'd like to adjust for, e.g., education and occupation. If I want to obtain a crude OR adjusted only for age and sex, and an adjusted OR that is also adjusted for education and occupation, should I include all of these variables when imputing smoking, even for the logistic regression that gives the crude OR? Or should I include only age and sex, because those are the only variables I will adjust for in the crude OR? Thank you!

Yue
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

29 Apr 2019, 15:00

You should use all variables that are relevant to the prediction of the missing values when you impute, regardless of whether those same variables all appear in the subsequent analyses. You can create a single multiply imputed data set using all relevant variables. Then you can use it for whatever analyses you like afterwards.
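A minimal sketch of this workflow, using the variables named in the question above. The variable names, the 0/1 coding of smoking and cancer, and the choice of 20 imputations are assumptions for illustration only. The outcome (cancer) is included in the imputation model here, which is the usual recommendation when imputing a covariate.

```stata
* Sketch only: variable names and codings are assumed, not taken from real data
mi set mlong
mi register imputed smoking
mi register regular cancer age sex education occupation

* Impute smoking once, using every relevant variable (including the outcome)
mi impute logit smoking cancer age i.sex i.education i.occupation, ///
    add(20) rseed(1234)

* OR adjusted for age and sex only
mi estimate, or: logistic cancer i.smoking age i.sex

* OR additionally adjusted for education and occupation
mi estimate, or: logistic cancer i.smoking age i.sex i.education i.occupation
```

Both models are fitted on the same multiply imputed data set, so the imputation step does not need to be repeated for each analysis. If smoking had more than two categories, mlogit or ologit would replace logit in the imputation step.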
Yue YY

Join Date: May 2018

Posts: 41
#3

29 Apr 2019, 17:42

Originally posted by Clyde Schechter

You should use all variables that are relevant to the prediction of the missing values when you impute, regardless of whether those same variables all appear in the subsequent analyses. You can create a single multiply imputed data set using all relevant variables. Then you can use it for whatever analyses you like afterwards.

OK! Thank you very much, Clyde!
Nerea Becerra

Join Date: Mar 2019

Posts: 14
#4

21 May 2019, 01:44

Dear all,

I have problems when running multiple imputation with chained equations.
After running

Code:

mi set mlong
mi misstable sum
mi misstable patterns
mi misstable nested
mi register imputed hta_0 smoking education hdl_0 estimated_ldl_0
mi register regular sex age diabetes PA energy alcoholg_0
mi impute chained (logit) hta_0 (mlogit, augment) smoking education ///
    (regress) hdl_0 estimated_ldl_0 = age sex diabetes PA energy alcoholg_0, ///
    add(20) rseed(1234)

I obtain the following warning: the sets of predictors of the imputation model vary across imputations or iterations.

Can I ignore this warning? Do I have to change the variables used for the imputation model?

Smoking and education are categorical variables with three categories each. I use the "augment" option because of the perfect prediction issue.

Thank you.

Nerea
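One way to look into a warning like this, sketched here with the variable names from the post above, is to check for sparse cells among the categorical variables and to list the conditional models before imputing. This is only a diagnostic sketch: it assumes hta_0 is binary (it is imputed with logit), and it does not by itself show whether the warning can be ignored.

```stata
* Sketch only: look for empty or very small cells in the original data (m = 0),
* which are a common source of perfect prediction
mi xeq 0: tabulate smoking education, missing
mi xeq 0: tabulate hta_0 smoking, missing
mi xeq 0: tabulate hta_0 education, missing

* Show the conditional specification for each imputed variable
* without actually creating imputations
mi impute chained (logit) hta_0 (mlogit, augment) smoking education ///
    (regress) hdl_0 estimated_ldl_0 = age sex diabetes PA energy alcoholg_0, ///
    add(20) rseed(1234) dryrun
```

Adding the noisily option to a real run prints the fitted conditional models, which may help identify which predictors are being dropped in some iterations.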