Multiple imputation

Wali Amar

Join Date: Oct 2021

Posts: 9
#1

08 May 2022, 17:14

When running multiple imputation in Stata, can we split the variables with missing data into separate steps according to their type? For example, can we impute the continuous variables separately, e.g., as

mi impute chained (regress) total_diseases number_pateints age = education employment , add(20) noisily augment

the categorical variables as,

mi impute chained (logit) score_scale drug_pregference = education employment, add(20) noisily augment

further categorical variables as,

mi impute chained (logit) q5 q33 q56 = education employment, add(20) noisily augment

and not as,

mi impute chained (regress) total_diseases number_pateints age (logit) score_scale drug_pregference (logit) q5 q33 q56 = education employment, add(20) noisily augment

Just wondering whether this is possible.

Last edited by Wali Amar; 08 May 2022, 17:18.
daniel klein

Join Date: Mar 2014

Posts: 3912
#2

08 May 2022, 23:07

The first two questions that come to mind:

1. Why would you want to do this?
2. What do you mean by "possible"?

Only you can answer the first question, so I will address the second. In general, it is technically possible to run

Code:

mi impute chained (regress) total_diseases number_pateints age = education employment , add(20) noisily augment
mi impute chained (logit) score_scale drug_pregference = education employment, add(20) noisily augment
mi impute chained (logit) q5 q33 q56 = education employment, add(20) noisily augment

There is no need to augment the linear regression model because perfect prediction is not an issue there. In any case, you will end up with 3*20 = 60 partly completed datasets. The first 20 datasets will contain imputed values for the continuous variables while retaining the missing values for the categorical variables.

This situation nicely illustrates the main conceptual problem with that approach: the imputed values of the continuous variables are generated without any reference to the categorical variables, and vice versa. Thus, you will badly underestimate the relationship between continuous and categorical variables. One of the principles of multiple imputation is to use the (multivariate) correlations among all variables to choose imputed values in a way that recovers the overall distribution of the complete (but only partly observed) data. The suggested approach will not do that.
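For contrast, here is a minimal sketch of a single chained specification, in which each incomplete variable is imputed conditional on all the others. It makes several assumptions that are not in the original post: the data are in memory and have not been mi set yet, education and employment are completely observed, the variables imputed with logit are binary, and the wide style and the seed value are arbitrary choices for illustration.

```stata
* Sketch only; see the assumptions stated above.
* Declare the mi style (wide is an arbitrary choice here).
mi set wide

* Register the incomplete variables as imputed and the
* (assumed complete) predictors as regular.
mi register imputed total_diseases number_pateints age ///
    score_scale drug_pregference q5 q33 q56
mi register regular education employment

* One chained model: every imputation variable serves as a predictor
* for every other, in addition to education and employment.
* augment is meant to guard the logit equations against perfect
* prediction; the regress equations do not need it.
* rseed() uses a hypothetical seed so the run is reproducible.
mi impute chained (regress) total_diseases number_pateints age ///
    (logit) score_scale drug_pregference q5 q33 q56 ///
    = education employment, add(20) augment rseed(12345)
```

This yields 20 completed datasets rather than 60 partly completed ones, and the imputed values of the continuous and categorical variables are drawn with reference to each other.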

Summing up: It is (technically) possible to do as you suggest. However, unless you are running simulations to assess how such specifications affect the imputed values, it is most likely not what you want.

Last edited by daniel klein; 08 May 2022, 23:12. Reason: added summary