
  • Multiple imputation of multilevel data

    Hi,

    I have two questions concerning the imputation (techniques) of multilevel data. I have a multilevel data set (students in classes, in schools, in areas, in districts) and would like to impute variables on each of the four higher levels. My solution thus far has been:

    - imputing all variables in one common model (wide format):

    mi impute chained (pmm, knn(5)) varl1 varl1 varl1 varl2 varl2 varl2 varl3 varl3 varl3 varl4 varl4 varl4 = varl1 varl1 varl1, add(15) noisily rseed(52312)

    - and then, within each imputation, replacing the imputed values of each higher-level variable by their mean (metric variables) or median (categorical variables) within each higher-level unit, in case the imputed values differ within a unit:

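    foreach x of numlist 1/15 {
        * categorical level-2 variable: median of the imputed values within each level-2 unit
        egen _`x'_foo = median(_`x'_varl2), by(level2)
        replace _`x'_varl2 = _`x'_foo
        drop _`x'_foo

        * metric level-2 variable: mean of the imputed values within each level-2 unit
        egen _`x'_foo = mean(_`x'_varl2), by(level2)
        replace _`x'_varl2 = _`x'_foo
        drop _`x'_foo

        * .......

        * level 3: median (categorical) or mean (metric) within each level-3 unit
        egen _`x'_foo = median(_`x'_varl3), by(level3)
        replace _`x'_varl3 = _`x'_foo
        drop _`x'_foo

        egen _`x'_foo = mean(_`x'_varl3), by(level3)
        replace _`x'_varl3 = _`x'_foo
        drop _`x'_foo

        * ...

        * level 4: median (categorical) or mean (metric) within each level-4 unit
        egen _`x'_foo = median(_`x'_varl4), by(level4)
        replace _`x'_varl4 = _`x'_foo
        drop _`x'_foo

        egen _`x'_foo = mean(_`x'_varl4), by(level4)
        replace _`x'_varl4 = _`x'_foo
        drop _`x'_foo
    }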

    The commands are all working and I can run analyses, which I do with:

    mi est, post noisily cmdok: gllamm DV varl1 varl2 varl3 varl4, i(level2 level3 level4) link(logit) f(binom)

    However, I'm not sure whether that procedure is correct or adequately addresses the structure of my data. Does anybody have any thoughts on whether it is OK? And as a follow-up question: is it OK that I force mi est to run gllamm by using the cmdok option (gllamm, being a user-written .ado command, would otherwise not run with mi est)?


    I appreciate any thoughts. Thank you,


    Julia
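A minimal, self-contained sketch of the averaging step in the loop above, run on simulated toy data so it can be executed as is. Everything in it is an assumption for illustration: the data (200 students in 20 classes), the names `level2`, `x1` and `x2`, M = 5 imputations, and imputing the class-level `x2` from the student-level `x1` alone. In wide style, imputation m of `x2` is stored as `_m_x2`; because `pmm` imputes each student row separately, students in the same class can receive different values, and the loop then replaces them by the class mean. It shows the mechanics only, not whether averaging is statistically appropriate.

```stata
clear all
set seed 52312

* Hypothetical toy data: 200 students (level 1) in 20 classes (level2)
set obs 200
generate int level2 = ceil(_n/10)
generate double x1 = rnormal()            // complete student-level variable
sort level2
by level2: generate double x2 = rnormal() if _n == 1
by level2: replace x2 = x2[1]             // x2 is constant within a class
replace x2 = . if level2 <= 5             // x2 missing for five whole classes

* Impute x2 row by row, as a single-level model would
mi set wide
mi register imputed x2
mi impute pmm x2 x1, add(5) knn(5) rseed(52312)

* Within each imputation m, replace _m_x2 by its class mean
forvalues m = 1/5 {
    egen double tmpmean = mean(_`m'_x2), by(level2)
    replace _`m'_x2 = tmpmean
    drop tmpmean
}
mi update    // re-check mi consistency after editing _m_x2 directly
```

After the loop, every student in a class carries the same value of `x2` within each imputation, while the values still differ across imputations.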

  • #2
    Hi Julia

    Unfortunately this is a very difficult problem and Stata doesn't as yet have any facility for proper multiple imputation of multilevel data.

    Matteo Quartagno has done this for two-level data in his jomo package in R, but even this can't handle more than two levels. He is sitting next to me now, and I have shown him your proposed solution. Apparently Matthieu Resche-Rigon and Ian White's recent paper discusses, in its appendix, the theoretical importance of including means in the imputation model, but you should be nervous about including only the mean. There is a worry about ecological bias, so proceed cautiously!

    Sorry, I haven't got any constructive solution. Tim



    • #3
      I ran into this problem myself, although almost all of my missing data were at level 1. I wasn't even able to find a workable Stata solution for my case. I wound up dropping one nursing facility whose race data were entirely missing (race was our key independent variable there, so I didn't feel justified in imputing it). There were one or two facilities (level-2 units) with one variable missing, and I wound up just including that variable in the imputation model. And yes, I realize that a) this isn't the theoretically correct thing to do, and b) this is not what Julia asked.

      In case it helps anyone else: Stata has posted some solutions for level-1 missing data in its FAQs. None of them applied to my case, but they might help someone else.

      http://www.stata.com/support/faqs/st...and-mi-impute/

      I, too, regret not having any constructive solutions to offer. I think a lot more theoretical work needs to be done!
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



      • #4
        Thank you both! That was actually very helpful, and it is good to know that I am not alone. I also searched around a bit, and an alternative might be to split the data into separate data sets (one for each level), impute the variables for each level separately, and merge them back together afterwards. I don't know how "good" that is, but I'm thinking about changing my model to that. Have a nice weekend, and thank you, Julia
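A minimal sketch of the one-record-per-unit idea above, on simulated toy data. Instead of physically splitting and re-merging files, it imputes the class-level `x2` on a single tagged row per class and then copies each class's imputed value to all of its students, imputation by imputation. The class mean of `x1` is used as the class-level predictor, an assumption loosely following the point about means made earlier in the thread. All names (`level2`, `x1`, `x2`, `onerow`, `x1mean`) and settings (M = 5, `knn(5)`) are hypothetical, and the sketch inherits the caveats raised above.

```stata
clear all
set seed 52312

* Hypothetical toy data: 200 students in 20 classes, x2 missing for 5 classes
set obs 200
generate int level2 = ceil(_n/10)
generate double x1 = rnormal()
sort level2
by level2: generate double x2 = rnormal() if _n == 1
by level2: replace x2 = x2[1]
replace x2 = . if level2 <= 5

* One record per class, plus the class mean of x1 as a class-level predictor
egen byte onerow = tag(level2)
egen double x1mean = mean(x1), by(level2)

mi set wide
mi register imputed x2
* Impute x2 on the class records only; the other student rows stay missing for now
mi impute pmm x2 x1mean if onerow, add(5) knn(5) rseed(52312)

* Copy each class's imputed value to all of its students, imputation by imputation
forvalues m = 1/5 {
    egen double tmpval = max(_`m'_x2), by(level2)
    replace _`m'_x2 = tmpval
    drop tmpval
}
mi update    // re-check mi consistency after editing _m_x2 directly
```

Keeping everything in one mi dataset sidesteps the merge step: with separate files, imputation m of the class-level file would have to be matched to imputation m of the student-level file.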
