How can I do multiple imputation for longitudinal, binary data?

anna campo

Join Date: Jan 2024

Posts: 15
#1

28 Jul 2025, 11:18

Hello everybody, and thanks in advance for your help. I am working on a database of 318 patients who received a lung ultrasound at T0 for suspected pneumonia. At T1, only 212 showed up, so 106 have missing data. Is multiple imputation a good idea at all for analysing the T1 data, considering that about a third of the patients are missing?
I saw that those who did not show up were younger and differed in two clinical parameters (crackles/rales and reduced ventilation). So, if MI is feasible given the large amount of missing data, I would impute using age, crackles and ventilation. I would like to see the effect of antibiotics, in those who received them, on the evolution of posterior big consolidations (a dummy variable I created: I have it for T0 and T1).
Here is the data:
[CODE]
clear
input int age_mo byte(crackles_rales_0 reduced_ventilation_0 t1_t0_dist t0_oral_antibiotic) float(post_bigcons_0 post_bigcons_1)
1 1 0 3 0 0 0
24 1 0 2 0 0 0
1 2 0 2 0 0 0
25 0 0 4 0 0 0
78 0 0 4 0 0 0
72 2 0 4 0 0 0
8 2 0 . 0 . .
133 2 2 3 1 0 0
0 2 0 . 0 . .
55 0 1 2 1 1 1
27 2 0 . 1 . .
2 2 0 . 1 . .
27 1 1 5 1 0 0
93 1 0 2 1 1 0
3 2 0 . 1 . .
12 2 0 . 1 . .
0 2 0 . 0 . .
102 0 0 . 1 . .
179 2 0 3 0 0 0
34 1 1 2 1 0 1
8 1 1 2 1 0 0
47 1 0 3 1 0 0
39 0 0 . . . .
32 2 0 3 1 0 0
25 1 0 2 1 0 0
83 2 2 2 1 0 0
103 1 1 . 1 . .
15 2 0 . 0 . .
206 1 1 3 1 0 0
65 1 0 2 1 0 0
1 2 0 . 1 . .
7 0 1 2 0 1 1
2 2 0 4 1 0 0
83 0 1 3 1 1 1
34 1 0 . 0 . .
55 1 0 . 1 0 0
99 1 0 2 1 0 0
37 1 1 3 1 1 0
18 2 0 . 0 . .
2 2 0 3 0 0 0
52 1 0 4 1 0 0
12 2 0 . 1 . .
194 1 1 8 1 0 0
4 2 0 3 0 0 0
44 1 1 2 1 1 0
1 2 0 . 0 . .
0 2 0 4 1 1 0
85 1 0 3 1 1 0
60 2 0 5 0 0 0
7 0 0 2 1 1 0
7 2 0 . 0 . .
5 2 0 2 0 0 0
34 2 0 4 0 0 1
2 2 0 . 0 . .
1 2 0 4 0 0 0
7 2 0 2 0 0 0
4 1 0 3 0 0 0
21 2 0 . 1 . .
4 0 0 . 1 . .
6 2 0 5 0 0 0
190 1 0 3 0 0 0
14 2 0 . 1 . .
52 2 2 3 1 1 0
4 2 0 . 0 . .
1 2 0 . 0 . .
17 2 0 . 0 . .
138 1 0 2 0 0 0
17 2 0 . 1 0 0
40 1 1 5 1 0 0
55 0 0 . 0 . .
80 1 0 3 0 1 0
64 0 1 4 1 1 0
0 2 0 . 1 . .
41 1 0 3 0 0 0
30 2 2 3 0 0 0
1 2 0 . 1 . .
54 1 1 5 1 0 0
60 1 1 3 0 1 0
4 2 0 2 0 0 0
106 1 1 4 1 0 0
59 1 1 3 1 0 0
64 1 0 2 1 1 0
66 2 0 5 1 0 0
0 2 0 . 1 . .
1 1 0 2 0 0 0
0 2 0 . 0 . .
134 1 1 4 1 1 1
5 0 0 . 1 . .
155 2 2 4 1 1 0
53 0 0 1 1 1 1
37 0 2 2 1 0 1
36 1 0 3 0 0 0
109 1 1 2 1 0 0
96 1 1 3 1 0 0
162 1 0 2 1 0 0
63 1 0 . 0 . .
66 2 2 4 1 1 0
10 0 0 . 0 . .
41 . 1 . 1 . .
28 2 0 . 1 . .
end
label values crackles_rales_0 crackles_rales
label def crackles_rales 0 "No", modify
label def crackles_rales 1 "Localized", modify
label def crackles_rales 2 "Diffuse", modify
label values reduced_ventilation_0 reduced_vent
label def reduced_vent 0 "No", modify
label def reduced_vent 1 "Yes localized", modify
label def reduced_vent 2 "Yes bilateral", modify
label values t0_oral_antibiotic yes_no
label def yes_no 0 "No", modify
label def yes_no 1 "Yes", modify
[/CODE]

My question is: if MI is feasible, I should use the mlong format, right? After that, should I simply run an mlogit model with, for example, "post_bigcons_1" as the outcome and "post_bigcons_0" (the status at T0) as a covariate, or should I reshape the data into long format? If so, should the "reshape long" happen before or after the imputation?
I have Stata v 19 BE.
Many thanks in advance for any answers to these questions.

Anna
Felix Bittmann

Join Date: Aug 2018

Posts: 750
#2

28 Jul 2025, 12:14

I have recently given an example of how to impute longitudinal data here: https://www.statalist.org/forums/for...ple-imputation

If the variable of interest is binary, both logit and pmm work. If you have more than 2 levels, mlogit is fine.
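
On the question of reshaping before or after imputation: one common workflow (a sketch, not necessarily the one in the linked example) is to impute in wide form, so that the T0 value can help predict the T1 value, and only then reshape with mi reshape. The name patient_id is hypothetical, the predictors are simply the two variables that are complete in the posted excerpt, and add(20) and the seed are arbitrary choices.

[CODE]
* Assumes the wide data from post #1 are in memory, one row per patient.
* patient_id is a hypothetical identifier, created before mi set.
generate long patient_id = _n
mi set mlong
mi register imputed post_bigcons_0 post_bigcons_1
* augment guards against perfect prediction in small samples
mi impute chained (logit, augment) post_bigcons_0 post_bigcons_1 ///
    = age_mo i.reduced_ventilation_0, add(20) rseed(12345)
* Reshape the imputed data; time takes the values 0 and 1
mi reshape long post_bigcons_, i(patient_id) j(time)
[/CODE]

If the analysis model is simply post_bigcons_1 regressed on post_bigcons_0 and other covariates, the reshape step is not needed at all.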

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Comment
Tiago Pereira

Join Date: Jan 2016

Posts: 409
#3

28 Jul 2025, 16:46

It looks like the multiple imputation (MI) you're using is more straightforward than what Felix suggested, since it only involves data from before (baseline) and after (follow-up) the event. However, I am not sure what the advantage of using MI is in your case. The missing data seems to be monotone, and it is not obvious from your data that the other variables can predict the outcome very well. Do you have additional variables potentially associated with the outcome?
Comment
anna campo

Join Date: Jan 2024

Posts: 15
#4

29 Jul 2025, 01:23

Thanks Tiago, yes in fact I do have other variables potentially associated with the outcome, I just presented the very essentials to show a sort of "skeleton" of the project. Which method do you think I would benefit from using?
Comment
Tiago Pereira

Join Date: Jan 2016

Posts: 409
#5

29 Jul 2025, 12:22

Multiple imputation via chained equations with logit regression would provide a nice sensitivity analysis.
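
A minimal sketch of what that could look like, assuming the wide data from post #1 are in memory. Every variable that has missing values in the posted excerpt is put on the left-hand side, since missing predictors would otherwise leave some values unimputed; the full dataset may need a different list. The number of imputations, the seed, and the analysis model are illustrative assumptions, not a recommendation.

[CODE]
* Assumes the wide data from post #1 are in memory and not yet mi set.
mi set mlong
mi register imputed post_bigcons_0 post_bigcons_1 t0_oral_antibiotic crackles_rales_0
mi register regular age_mo reduced_ventilation_0

* Binary variables via logit, the three-level variable via mlogit;
* augment guards against perfect prediction in small cells.
mi impute chained ///
    (logit, augment) post_bigcons_0 post_bigcons_1 t0_oral_antibiotic ///
    (mlogit, augment) crackles_rales_0 ///
    = age_mo i.reduced_ventilation_0, add(20) rseed(12345)

* Illustrative analysis model, pooled with Rubin's rules
mi estimate, or: logit post_bigcons_1 i.t0_oral_antibiotic i.post_bigcons_0 c.age_mo
[/CODE]

Comparing these estimates with the complete-case logit shows how much the conclusions depend on the handling of the missing follow-up data.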
2 likes
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#6

29 Jul 2025, 21:09

I'm skeptical of imputing the outcome variable in this setting. First, there's the issue of why people attrited from the study. Imputation assumes attrition is not systematically related to the outcome. Even if you think missing at random makes sense, mechanically I don't see how it can make much of a difference. If we were using linear models and using data on all of X to impute missing data on Y by using a regression to predict Y out-of-sample, we would wind up with the same estimates as using the complete cases. I know imputation adds some noise, but how can that really help?

It seems one should at least first test for attrition bias by putting in the future value of the sample selection indicator to see if it predicts the current outcome.
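
A rough sketch of such a check for the two-wave data in post #1 follows. The variable name followed_up is hypothetical, and the covariate list is only an example. The check requires the T0 outcome to be observed for patients who did not return; in the posted excerpt post_bigcons_0 appears to be missing for most of them, in which case the indicator is dropped from the model and the test is not informative.

[CODE]
* Assumes the wide data from post #1 are in memory and not yet mi set.
* followed_up (hypothetical name) is 1 if the T1 outcome was observed.
generate byte followed_up = !missing(post_bigcons_1)

* See how much baseline outcome information exists for dropouts
tabulate followed_up post_bigcons_0, missing

* Does later follow-up status predict the baseline outcome?
logit post_bigcons_0 followed_up c.age_mo
[/CODE]

A clearly nonzero coefficient on followed_up would suggest that attrition is related to the outcome.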
1 like
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#7

30 Jul 2025, 00:08

Anna:
from your first post your data might be missing not at random.
If this were the case, you may want to take a look at:
Van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med. 1999; 18(6): 681-694. doi:10.1002/(sici)1097-0258(19990330)18:6<681::aid-sim71>3.0.co;2-r.

Kind regards,
Carlo
(Stata 19.0)
Comment
anna campo

Join Date: Jan 2024

Posts: 15
#8

30 Jul 2025, 00:30

Carlo Lazzaro thank you!
@Jeff Wooldridge: It is an observational study, not a randomized one (I should have mentioned it before), and those who have no follow-up were probably less severe at the beginning. I don't know whether they were not required to come back or decided not to by themselves.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#9

30 Jul 2025, 00:38

...those who have no follow up were, probably, less severe at the beginning.

That's the trick, Anna: does the missingness mechanism depend solely on the observed data (MAR) or on unobserved ones (with a possible contribution of the observed values) (MNAR)?

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment
Tiago Pereira

Join Date: Jan 2016

Posts: 409
#10

30 Jul 2025, 02:42

I agree with Carlo's wise observation. I was under the impression that the missing at random assumption was reasonable, given that age and two other variables were associated with the missing data status.
1 like
Comment
anna campo

Join Date: Jan 2024

Posts: 15
#11

31 Jul 2025, 03:29

I cannot be sure, but I think that considering them as MAR is reasonable. Because if they are MNAR, I should do no analysis at all of the remaining ones, right?
Comment
anna campo

Join Date: Jan 2024

Posts: 15
#12

31 Jul 2025, 10:55

Carlo Lazzaro and Tiago Pereira: I cannot be sure, but I think that considering them as MAR is reasonable. Because if they are MNAR, I should do no analysis at all of the remaining ones, right?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#13

31 Jul 2025, 11:19

Not quite, Anna.
If your data are MNAR, you should do multiple imputation (as if they were MAR) plus a sensitivity analysis (since they are not MAR).
Van Buuren and colleagues' paper explains this issue.

Kind regards,
Carlo
(Stata 19.0)
Comment
