  • Multiple imputation for longitudinal data: accounting for repeated measures within individuals

    Hello,
    I am working with data from a randomized controlled trial in which the primary outcome is infant length-for-age z-score (LAZ) at 1 year of age (compared between treatment groups using ANOVA). Although the main analysis will follow an intention-to-treat approach, as a sensitivity analysis we want to impute LAZ at 12 months and run the ANOVA again with the imputed values. The trial is still ongoing (almost 80% complete), but so far little of the 12-month LAZ data is missing (<4%).

    LAZ is measured at several time points (birth, 3 months, 6 months, 9 months) before the final endpoint (12 months). In the imputation model, I want to use these earlier measurements to help predict 12-month LAZ. The measurements at these time points are also incomplete: they have a similar proportion of missing data (<5%). Ideally, I would like the imputation model to account for clustering within individuals (because length is measured repeatedly on each infant). Is there a way to do this?

    I have researched this extensively and cannot find a good approach. Many resources don't account for clustering at the imputation stage but then fit a multilevel model at the estimation stage, which is not what I want to do. Other references suggest organizing the data in wide form, making each measurement time point a separate variable (e.g., LAZ0, LAZ3, LAZ6, etc.), and then fitting a MICE model (mi impute chained). However, I don't think this accounts for clustering by participant.

    Any help is appreciated, thank you!

  • #2
    Read this posting and related FAQ.


    • #3
      Jill Korsiak

      Because of your randomised trial setting, where each participant is intended to have the same number of measurements, the imputation is not too difficult to handle. Your data will need to be in wide form, so that you have five LAZ variables: laz0 laz3 laz6 laz9 laz12. You then use multivariate imputation with all of the incomplete laz variables on the left-hand side.

      If you have a monotone pattern of missingness (check with misstable patterns) then you can use mi impute monotone. If not, you will need mi impute mvn or mi impute chained.
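
      As a rough sketch of the workflow described above (not a definitive implementation), the steps might look like the following. The variable names id, month and arm are assumptions for illustration, as are the number of imputations and the seed. The final model uses regress with i.arm as the regression form of the one-way ANOVA comparison.

      ```stata
      * Sketch only. Assumes a long-format dataset with hypothetical variables:
      *   id (participant), month (0, 3, 6, 9, 12), laz, arm (treatment group,
      *   fully observed and constant within participant).
      reshape wide laz, i(id) j(month)

      * Check whether the missingness pattern is monotone
      misstable patterns laz0 laz3 laz6 laz9 laz12

      * Declare the mi structure and the roles of the variables
      mi set wide
      mi register imputed laz0 laz3 laz6 laz9 laz12
      mi register regular arm

      * Non-monotone case: chained equations, imputing separately by arm.
      * (With a monotone pattern, mi impute monotone could be used instead.)
      mi impute chained (regress) laz0 laz3 laz6 laz9 laz12, ///
          by(arm) add(20) rseed(12345)

      * Repeat the between-arm comparison of 12-month LAZ on the imputed data
      mi estimate: regress laz12 i.arm
      ```

      Because each row is one participant in wide form, every laz variable is imputed conditional on that participant's other laz variables. If imputing separately by arm is not feasible, one alternative is to drop by(arm) and add = i.arm to the imputation model, at the cost of assuming the relationships between time points are the same in every arm.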

      There is a command (mimix) designed for exactly this setting, which can do this for you. It imputes assuming missing-at-random within each trial arm. However, it is really important to do sensitivity analysis in trials with incomplete outcomes, and mimix will also do 'reference-based imputation' for sensitivity analysis. This imputation makes assumptions other than missing-at-random about the distribution of the unobserved data. The Stata Journal paper can be found here and slides from Suzie Cro here.

      Hope that helps, Tim


      • #4
        Tim Morris

        Thank you for the suggestions, Tim. The server I use to access journal articles is currently down, so I cannot view the article by Cro describing mimix, but I will review it later.
        In the meantime, I have a couple more questions that I hope you (or anyone else) could address:

        1) How do mi impute mvn and mi impute chained account for clustering by participant? My understanding is that they don't, but I'm not sure.
        2) I will likely include additional predictors (which will also have some missing data) in the imputation models (e.g., age, education, income, etc.). Unlike LAZ, these variables are not all continuous and normally distributed. I understand that mi impute mvn assumes joint multivariate normality, which would not hold once categorical/binary variables are included. Is this problematic (i.e., is mi impute chained the better approach in this situation)?

        Thank you!
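
        To make the second question concrete, here is a minimal sketch of how mi impute chained lets each incomplete variable have its own conditional model. The names mage (maternal age), educ (ordered categories), lowincome (0/1) and arm are hypothetical, and the choice of method for each variable is an assumption for illustration, not a recommendation from this thread.

        ```stata
        * Sketch only. Assumes wide-format data (one row per participant) that
        * have already been mi set, with arm fully observed.
        * mage, educ and lowincome are hypothetical variable names.
        mi register imputed laz0 laz3 laz6 laz9 laz12 mage educ lowincome

        * One univariate method per variable (or group of variables):
        *   regress for the continuous LAZ measurements,
        *   pmm     for a possibly skewed continuous covariate,
        *   ologit  for an ordered categorical covariate,
        *   logit   for a binary covariate.
        mi impute chained (regress) laz0 laz3 laz6 laz9 laz12 ///
                          (pmm, knn(5)) mage                 ///
                          (ologit) educ                      ///
                          (logit) lowincome                  ///
                          = i.arm, add(20) rseed(2718)
        ```

        In this layout the repeated LAZ measurements are separate variables on one row per participant, so each is imputed conditional on the same participant's other measurements and covariates.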
