  • How can I do multiple imputation for longitudinal, binary data?

    Hello everybody and thanks in advance for your help. I am working on a database of 318 patients who received a lung ultrasound at T0 for suspected pneumonia. At T1, only 212 showed up, so 106 patients have missing data. Is multiple imputation a sensible way to analyze the T1 data at all, given that about a third of the patients are missing?
    I saw that those who did not show up were younger and differed in two clinical parameters (crackles/rales and reduced ventilation). So, if MI is feasible given the large amount of missing data, I would impute using age, crackles and ventilation. I would like to see the effect of antibiotics, among those who received them, on the evolution of big posterior consolidations (a dummy variable I created; I have it for T0 and T1).
    Here is the data:
    [CODE]
    clear
    input int age_mo byte(crackles_rales_0 reduced_ventilation_0 t1_t0_dist t0_oral_antibiotic) float(post_bigcons_0 post_bigcons_1)
    1 1 0 3 0 0 0
    24 1 0 2 0 0 0
    1 2 0 2 0 0 0
    25 0 0 4 0 0 0
    78 0 0 4 0 0 0
    72 2 0 4 0 0 0
    8 2 0 . 0 . .
    133 2 2 3 1 0 0
    0 2 0 . 0 . .
    55 0 1 2 1 1 1
    27 2 0 . 1 . .
    2 2 0 . 1 . .
    27 1 1 5 1 0 0
    93 1 0 2 1 1 0
    3 2 0 . 1 . .
    12 2 0 . 1 . .
    0 2 0 . 0 . .
    102 0 0 . 1 . .
    179 2 0 3 0 0 0
    34 1 1 2 1 0 1
    8 1 1 2 1 0 0
    47 1 0 3 1 0 0
    39 0 0 . . . .
    32 2 0 3 1 0 0
    25 1 0 2 1 0 0
    83 2 2 2 1 0 0
    103 1 1 . 1 . .
    15 2 0 . 0 . .
    206 1 1 3 1 0 0
    65 1 0 2 1 0 0
    1 2 0 . 1 . .
    7 0 1 2 0 1 1
    2 2 0 4 1 0 0
    83 0 1 3 1 1 1
    34 1 0 . 0 . .
    55 1 0 . 1 0 0
    99 1 0 2 1 0 0
    37 1 1 3 1 1 0
    18 2 0 . 0 . .
    2 2 0 3 0 0 0
    52 1 0 4 1 0 0
    12 2 0 . 1 . .
    194 1 1 8 1 0 0
    4 2 0 3 0 0 0
    44 1 1 2 1 1 0
    1 2 0 . 0 . .
    0 2 0 4 1 1 0
    85 1 0 3 1 1 0
    60 2 0 5 0 0 0
    7 0 0 2 1 1 0
    7 2 0 . 0 . .
    5 2 0 2 0 0 0
    34 2 0 4 0 0 1
    2 2 0 . 0 . .
    1 2 0 4 0 0 0
    7 2 0 2 0 0 0
    4 1 0 3 0 0 0
    21 2 0 . 1 . .
    4 0 0 . 1 . .
    6 2 0 5 0 0 0
    190 1 0 3 0 0 0
    14 2 0 . 1 . .
    52 2 2 3 1 1 0
    4 2 0 . 0 . .
    1 2 0 . 0 . .
    17 2 0 . 0 . .
    138 1 0 2 0 0 0
    17 2 0 . 1 0 0
    40 1 1 5 1 0 0
    55 0 0 . 0 . .
    80 1 0 3 0 1 0
    64 0 1 4 1 1 0
    0 2 0 . 1 . .
    41 1 0 3 0 0 0
    30 2 2 3 0 0 0
    1 2 0 . 1 . .
    54 1 1 5 1 0 0
    60 1 1 3 0 1 0
    4 2 0 2 0 0 0
    106 1 1 4 1 0 0
    59 1 1 3 1 0 0
    64 1 0 2 1 1 0
    66 2 0 5 1 0 0
    0 2 0 . 1 . .
    1 1 0 2 0 0 0
    0 2 0 . 0 . .
    134 1 1 4 1 1 1
    5 0 0 . 1 . .
    155 2 2 4 1 1 0
    53 0 0 1 1 1 1
    37 0 2 2 1 0 1
    36 1 0 3 0 0 0
    109 1 1 2 1 0 0
    96 1 1 3 1 0 0
    162 1 0 2 1 0 0
    63 1 0 . 0 . .
    66 2 2 4 1 1 0
    10 0 0 . 0 . .
    41 . 1 . 1 . .
    28 2 0 . 1 . .
    end
    label values crackles_rales_0 crackles_rales
    label def crackles_rales 0 "No", modify
    label def crackles_rales 1 "Localized", modify
    label def crackles_rales 2 "Diffuse", modify
    label values reduced_ventilation_0 reduced_vent
    label def reduced_vent 0 "No", modify
    label def reduced_vent 1 "Yes localized", modify
    label def reduced_vent 2 "Yes bilateral", modify
    label values t0_oral_antibiotic yes_no
    label def yes_no 0 "No", modify
    label def yes_no 1 "Yes", modify
    [/CODE]

    My question is: if MI is feasible, I should use the mlong format, right? After that, should I simply run an mlogit model with, for example, "post_bigcons_1" as the outcome and "post_bigcons_0" (its value at T0) as a covariate, or should I reshape the data to long format? If so, should the "reshape long" happen before or after the imputation?
    I am using Stata 19 BE.
    Many thanks in advance for any answers to these questions.

    Anna


  • #2
    I have recently given an example of how to impute longitudinal data here: https://www.statalist.org/forums/for...ple-imputation

    If the variable of interest is binary, both logit and pmm work. If you have more than 2 levels, mlogit is fine.
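
    A minimal sketch of these two univariate options for the binary T1 outcome, using variable names from the data example in #1. It assumes the predictors (age_mo, reduced_ventilation_0) are complete in the full dataset, as they are in the posted extract; the number of imputations, the knn() value and the seed are arbitrary illustrative choices, not recommendations from this thread.

    [CODE]
    * Declare the mi style and the variable to be imputed
    mi set mlong
    mi register imputed post_bigcons_1

    * Option 1: logistic-regression imputation
    * (add the augment option if Stata reports perfect predictors)
    mi impute logit post_bigcons_1 age_mo i.reduced_ventilation_0, add(20) rseed(12345)

    * Option 2 (run instead of option 1): predictive mean matching
    * mi impute pmm post_bigcons_1 age_mo i.reduced_ventilation_0, add(20) knn(5) rseed(12345)
    [/CODE]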
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      It looks like the multiple imputation (MI) you're using is more straightforward than what Felix suggested, since it only involves data from before (baseline) and after (follow-up) the event. However, I am not sure what the advantage of using MI is in your case. The missing data seems to be monotone, and it is not obvious from your data that the other variables can predict the outcome very well. Do you have additional variables potentially associated with the outcome?



      • #4
        Thanks Tiago, yes in fact I do have other variables potentially associated with the outcome, I just presented the very essentials to show a sort of "skeleton" of the project. Which method do you think I would benefit from using?



        • #5
          Multiple imputation via chained equations and logit regression would provide a nice sensitivity analysis.
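
          One way this could look in practice, as a hedged sketch rather than a definitive specification: the data stay in wide form (one row per patient), the T0 outcome enters the analysis model as a covariate, and variable names are taken from the data example in #1. The choice of imputation predictors, the mlogit method for the three-level crackles variable, 20 imputations and the seed are all assumptions made for illustration; the additional predictors mentioned in #4 would be added to the imputation model.

          [CODE]
          * Wide data, mlong style; register variables with missing values
          mi set mlong
          mi register imputed post_bigcons_0 post_bigcons_1 t0_oral_antibiotic crackles_rales_0
          mi register regular age_mo reduced_ventilation_0

          * Chained equations: logit for binary variables, mlogit for the
          * three-level variable; augment guards against perfect prediction
          mi impute chained ///
              (logit, augment) post_bigcons_0 post_bigcons_1 t0_oral_antibiotic ///
              (mlogit, augment) crackles_rales_0 ///
              = age_mo i.reduced_ventilation_0, add(20) rseed(12345)

          * Analysis model: T1 outcome on antibiotic use, adjusting for the T0 outcome
          mi estimate, or: logit post_bigcons_1 i.t0_oral_antibiotic i.post_bigcons_0 age_mo
          [/CODE]

          Comparing these estimates with the complete-case logit fit is what makes this a sensitivity analysis.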



          • #6
            I'm skeptical of imputing the outcome variable in this setting. First, there's the issue of why people attrited from the study. Imputation assumes attrition is not systematically related to the outcome. Even if you think missing at random makes sense, mechanically I don't see how it can make much of a difference. If we were using linear models and using data on all of X to impute missing data on Y, with a regression predicting Y out of sample, we would wind up with the same estimates as using the complete cases. I know imputation adds some noise, but how can that really help?

            It seems one should at least first test for attrition bias by putting in the future value of the sample selection indicator to see if it predicts the current outcome.
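
            A rough two-period sketch of such a check, using variable names from the data example in #1; the indicator name dropout and the choice of covariates are hypothetical. It assumes post_bigcons_0 is observed for patients who missed T1. In the posted extract the T0 outcome appears to be missing for those patients, in which case the indicator has no variation in the estimation sample and the check cannot be carried out as written.

            [CODE]
            * 1 if the patient has no T1 outcome, 0 otherwise
            generate byte dropout = missing(post_bigcons_1)

            * Does later dropout predict the baseline outcome, given covariates?
            * A clearly nonzero coefficient on dropout would point to attrition bias.
            logit post_bigcons_0 i.dropout age_mo i.crackles_rales_0 i.reduced_ventilation_0
            [/CODE]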



            • #7
              Anna:
              From your first post, your data might be missing not at random.
              If this is the case, you may want to take a look at:
              Van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med. 1999; 18(6): 681-694. doi:10.1002/(sici)1097-0258(19990330)18:6<681::aid-sim71>3.0.co;2-r.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Carlo Lazzaro thank you!
                @Jeff Wooldridge: It is an observational study, not a randomized one (I should have mentioned that earlier), and those who have no follow-up were probably less severely ill at the beginning. I don't know whether they were not required to come back or decided not to on their own.



                • #9
                  ...those who have no follow up were, probably, less severe at the beginning.
                  That's the trick, Anna: does the missingness mechanism depend solely on the observed data (MAR) or on unobserved ones (with a possible contribution of the observed values) (MNAR)?
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    I agree with Carlo's wise observation. I was under the impression that the missing at random assumption was reasonable, given that age and two other variables were associated with the missing data status.
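
                    The association between observed baseline variables and missingness described in #1 could be examined along these lines. This is a sketch using variable names from the data example; the indicator name miss_t1 is hypothetical. Such checks only show which observed variables predict missingness, so they can support including those variables in the imputation model but cannot distinguish MAR from MNAR.

                    [CODE]
                    * 1 if the T1 outcome is missing, 0 if observed
                    generate byte miss_t1 = missing(post_bigcons_1)

                    * Compare baseline characteristics by missingness status
                    ttest age_mo, by(miss_t1)
                    tabulate crackles_rales_0 miss_t1, column chi2
                    tabulate reduced_ventilation_0 miss_t1, column chi2

                    * Joint model of missingness on the observed baseline variables
                    logit miss_t1 age_mo i.crackles_rales_0 i.reduced_ventilation_0
                    [/CODE]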



                    • #11
                      I cannot be sure, but I think that considering them as MAR is reasonable. Because if they are MNAR, I should do no analysis at all of the remaining ones, right?



                      • #12
                        Carlo Lazzaro and Tiago Pereira: I cannot be sure, but I think that considering them as MAR is reasonable. Because if they are MNAR, I should do no analysis at all of the remaining ones, right?



                        • #13
                          Not quite, Anna.
                          If your data are MNAR, you should do multiple imputation (as if they were MAR) plus a sensitivity analysis (since they are not MAR).
                          Van Buuren and colleagues' paper explains this issue.
                          Kind regards,
                          Carlo
                          (Stata 19.0)

