Hello all,
I am looking for some guidance on how to impute categorical variables, both nominal and ordinal (up to 7 categories).
Overall, fewer than 2% of the values in my dataset are missing. These variables are included in a principal component analysis to derive a socio-economic status score, which will be used as a covariate in my final model.
1) Is there a threshold below which single imputation is preferred over multiple imputation? I imagine that when the proportion of missing data is very low, single imputation is acceptable, but I can't find any reference that quantifies what "low" means here. Or is multiple imputation the only method used nowadays?
2) First, I wanted to do a single imputation by randomly assigning values according to the observed distribution of the variable (please see the command below, where SESWall indicates the wall material):
Is this a good option? I think it is similar to mean substitution.
Code:
gen rand = uniform()
tab SESWall, nolabel

            |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        171       17.34       17.34
          2 |        103       10.45       27.79
          3 |        117       11.87       39.66
          4 |        595       60.34      100.00
------------+-----------------------------------
      Total |        986      100.00

gen drawwall = cond(rand < .17, 1, cond(rand < .28, 2, cond(rand < .40, 3, 4)))
gen iSESWall = SESWall
replace iSESWall = drawwall if iSESWall == .
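The same idea could probably be written without typing the cut-points by hand, by computing the cumulative proportions from the observed data. This is only a rough sketch: it assumes SESWall is coded 1 to 4 as in the table, and the names u and iSESWall2 and the seed value are made up for illustration.

Code:
* Sketch: draw from the observed distribution of SESWall (assumed coded 1..4)
set seed 12345                        // arbitrary seed so the draws can be reproduced
gen double u = runiform()             // u is a hypothetical helper variable
gen iSESWall2 = SESWall               // hypothetical name for the imputed copy
quietly count if !missing(SESWall)
local n = r(N)                        // number of observed values
local cum = 0
forvalues k = 1/3 {
    quietly count if SESWall == `k'
    local cum = `cum' + r(N)/`n'      // cumulative observed proportion up to category k
    replace iSESWall2 = `k' if missing(iSESWall2) & u <= `cum'
}
replace iSESWall2 = 4 if missing(iSESWall2)   // remaining draws fall in the last category

Filling the last category with whatever is left avoids relying on the cumulative proportion summing to exactly 1.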
3) Using the multiple imputation command mi impute chained (mlogit) would allow me to do a multivariate imputation, including the outcome variable (as recommended) and other auxiliary variables. Is multiple imputation with m=1 equivalent to single imputation? I would use only the imputed dataset m=1 (and not m=0, which holds the observed data).
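To make the question concrete, the call I have in mind would look roughly like the sketch below. SESRoof (an ordinal item), outcome, age and sex are hypothetical variable names used only for illustration, and the seed is arbitrary.

Code:
* Sketch only: SESRoof, outcome, age and sex are hypothetical variable names
mi set wide
mi register imputed SESWall SESRoof
* mlogit for the nominal item, ologit for the ordinal one; add(1) requests a single imputation
mi impute chained (mlogit) SESWall (ologit) SESRoof = outcome age sex, add(1) rseed(12345)
* keep only the imputed dataset m=1
mi extract 1, clear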
Many thanks,
Carole