Multiple imputation of multilevel data

Julia Finster

Join Date: Feb 2017

Posts: 5
#1

02 Feb 2017, 07:50

Hi,

I have two questions concerning the imputation (techniques) of multilevel data. I have a multilevel data set (students in classes, in schools, in areas, in districts) and would like to impute variables on each of the four higher levels. My solution thus far has been:

- imputing all variables in one common model (wide format):

mi impute chained (pmm, knn(5)) varl1 varl1 varl1 varl2 varl2 varl2 varl3 varl3 varl3 varl4 varl4 varl4 = varl1 varl1 varl1, add(15) noisily rseed(52312)

- and then calculating the mean (metric variables), or the median (categorical variables) across imputations for all higher levels (in case imputed values differ on these levels):

foreach x of numlist 1/15 {
egen _`x'_foo= median(_`x'_varl2), by(level2)
replace _`x'_varl2= _`x'_foo
drop _`x'_foo

egen _`x'_foo= mean(_`x'_varl2), by(level2)
replace _`x'_varl2= _`x'_foo
drop _`x'_foo

.......

egen _`x'_foo= median(_`x'_varl3), by(level3)
replace _`x'_varl3= _`x'_foo
drop _`x'_foo

egen _`x'_foo= mean(_`x'_varl3), by(level3)
replace _`x'_varl3= _`x'_foo
drop _`x'_foo

...

egen _`x'_foo= median(_`x'_varl4), by(level4)
replace _`x'_varl4= _`x'_foo
drop _`x'_foo

egen _`x'_foo= mean(_`x'_varl4), by(level4)
replace _`x'_varl4= _`x'_foo
drop _`x'_foo
}
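For readers who want to try the aggregation step on something runnable, here is a minimal sketch on toy data. Everything in it is an assumption made for illustration: the 20 classes of 3 students, the predictor x, the use of mi impute pmm with 5 imputations instead of the chained model above, and the choice to overwrite only cells that were missing in the original data. It shows the mechanics of making imputed values constant within a cluster, not that the approach is statistically valid.

```stata
* Toy data: 20 hypothetical classes (level2) with 3 students each.
clear
set seed 52312
set obs 60
generate level2 = ceil(_n/3)
generate x = rnormal()                        // complete student-level predictor
bysort level2: generate varl2 = rnormal() if _n == 1
by level2: replace varl2 = varl2[1]           // class-level variable, constant within class
replace varl2 = . if mod(level2, 5) == 0      // missing for 4 whole classes

* Impute in wide style; imputing at student level lets values differ within a class.
mi set wide
mi register imputed varl2
mi impute pmm varl2 x, add(5) knn(5) rseed(52312)

* Replace imputed values by the class mean, only where varl2 was originally missing.
forvalues m = 1/5 {
    tempvar agg
    egen `agg' = mean(_`m'_varl2), by(level2)
    replace _`m'_varl2 = `agg' if missing(varl2)
    drop `agg'
}
mi update                                     // let mi re-check the data after manual edits
```

Restricting the replace to originally missing cells leaves observed values untouched, and a tempvar avoids clashing with an existing variable name such as foo.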

The commands are all working and I can run analyses, which I do with:

mi est, post noisily cmdok: gllamm DV varl1 varl2 varl3 varl4, i(level2 level3 level4) link(logit) f(binom)

However, I'm not sure whether this procedure is correct or can adequately address the structure in my data. Does anybody have any thoughts on whether the procedure is OK? And as a follow-up question: is it OK that I force mi est to run gllamm by using the cmdok option? (Being an .ado, gllamm would otherwise not run with mi est.)

I appreciate all thoughts. Thank you,

Julia
Tim Morris

Join Date: Apr 2014

Posts: 92
#2

02 Feb 2017, 09:57

Hi Julia

Unfortunately this is a very difficult problem and Stata doesn't as yet have any facility for proper multiple imputation of multilevel data.

Matteo Quartagno has done this for two-level data with his jomo package in R, but even this can't handle more than two levels. He is sitting next to me now and I have shown him your proposed solution. Apparently Matthieu Resche-Rigon and Ian White's recent paper talks about the theoretical importance of including means in the imputation model (see its appendix), but you should be nervous about including only the mean. There is a worry about ecological bias, so proceed cautiously!

Sorry, I haven't got any constructive solution. Tim
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

02 Feb 2017, 10:34

I ran into this problem personally, although almost all of my missing data were at level 1. I wasn't even able to find a workable Stata solution for my case. I wound up dropping one nursing facility with all missing race data (our key independent variable there, so I didn't feel justified in imputing that). There were one or two facilities (level 2 units) with one variable missing, and I wound up just including the variable in the imputation model. And yes, I realize that a) this isn't the theoretically correct thing to do, and b) this is not what Julia asked.

In case it helps anyone else, I found that a) Stata had posted some solutions for level 1 missing data on its blog, and b) none of these solutions applied to me. However, this might help someone.

http://www.stata.com/support/faqs/st...and-mi-impute/

I, too, regret not having any constructive solutions to offer. I think there's a lot more theoretical work needing to be done!

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Julia Finster

Join Date: Feb 2017

Posts: 5
#4

03 Feb 2017, 07:43

Thank you both! That was actually very helpful, and it is good to know that I am not alone. I also searched around a bit, and an alternative might be to split the data into different data sets (one for each level), impute the variables for each level separately, and merge them back together afterwards. I don't know how "good" that is, but I'm thinking about changing my model to that. Have a nice weekend and thank you, Julia