Hello everybody, and thanks in advance for your help. I am working on a database of 318 patients who received a lung ultrasound at T0 for suspected pneumonia. At T1 only 212 showed up, so T1 data are missing for 106 patients. Is multiple imputation (MI) a sensible way to analyze the T1 data at all, given that about a third of the patients are missing?
I saw that those who did not show up were younger and differed in two clinical parameters (crackles/rales and reduced ventilation). So, if MI is feasible despite the large amount of missing data, I would impute using age, crackles and ventilation. I would like to estimate the effect of the antibiotic, among those who received it, on the evolution of posterior big consolidations (a dummy variable I created; I have it for T0 and T1).
Here is the data:
[CODE]
clear
input int age_mo byte(crackles_rales_0 reduced_ventilation_0 t1_t0_dist t0_oral_antibiotic) float(post_bigcons_0 post_bigcons_1)
1 1 0 3 0 0 0
24 1 0 2 0 0 0
1 2 0 2 0 0 0
25 0 0 4 0 0 0
78 0 0 4 0 0 0
72 2 0 4 0 0 0
8 2 0 . 0 . .
133 2 2 3 1 0 0
0 2 0 . 0 . .
55 0 1 2 1 1 1
27 2 0 . 1 . .
2 2 0 . 1 . .
27 1 1 5 1 0 0
93 1 0 2 1 1 0
3 2 0 . 1 . .
12 2 0 . 1 . .
0 2 0 . 0 . .
102 0 0 . 1 . .
179 2 0 3 0 0 0
34 1 1 2 1 0 1
8 1 1 2 1 0 0
47 1 0 3 1 0 0
39 0 0 . . . .
32 2 0 3 1 0 0
25 1 0 2 1 0 0
83 2 2 2 1 0 0
103 1 1 . 1 . .
15 2 0 . 0 . .
206 1 1 3 1 0 0
65 1 0 2 1 0 0
1 2 0 . 1 . .
7 0 1 2 0 1 1
2 2 0 4 1 0 0
83 0 1 3 1 1 1
34 1 0 . 0 . .
55 1 0 . 1 0 0
99 1 0 2 1 0 0
37 1 1 3 1 1 0
18 2 0 . 0 . .
2 2 0 3 0 0 0
52 1 0 4 1 0 0
12 2 0 . 1 . .
194 1 1 8 1 0 0
4 2 0 3 0 0 0
44 1 1 2 1 1 0
1 2 0 . 0 . .
0 2 0 4 1 1 0
85 1 0 3 1 1 0
60 2 0 5 0 0 0
7 0 0 2 1 1 0
7 2 0 . 0 . .
5 2 0 2 0 0 0
34 2 0 4 0 0 1
2 2 0 . 0 . .
1 2 0 4 0 0 0
7 2 0 2 0 0 0
4 1 0 3 0 0 0
21 2 0 . 1 . .
4 0 0 . 1 . .
6 2 0 5 0 0 0
190 1 0 3 0 0 0
14 2 0 . 1 . .
52 2 2 3 1 1 0
4 2 0 . 0 . .
1 2 0 . 0 . .
17 2 0 . 0 . .
138 1 0 2 0 0 0
17 2 0 . 1 0 0
40 1 1 5 1 0 0
55 0 0 . 0 . .
80 1 0 3 0 1 0
64 0 1 4 1 1 0
0 2 0 . 1 . .
41 1 0 3 0 0 0
30 2 2 3 0 0 0
1 2 0 . 1 . .
54 1 1 5 1 0 0
60 1 1 3 0 1 0
4 2 0 2 0 0 0
106 1 1 4 1 0 0
59 1 1 3 1 0 0
64 1 0 2 1 1 0
66 2 0 5 1 0 0
0 2 0 . 1 . .
1 1 0 2 0 0 0
0 2 0 . 0 . .
134 1 1 4 1 1 1
5 0 0 . 1 . .
155 2 2 4 1 1 0
53 0 0 1 1 1 1
37 0 2 2 1 0 1
36 1 0 3 0 0 0
109 1 1 2 1 0 0
96 1 1 3 1 0 0
162 1 0 2 1 0 0
63 1 0 . 0 . .
66 2 2 4 1 1 0
10 0 0 . 0 . .
41 . 1 . 1 . .
28 2 0 . 1 . .
end
label values crackles_rales_0 crackles_rales
label def crackles_rales 0 "No", modify
label def crackles_rales 1 "Localized", modify
label def crackles_rales 2 "Diffuse", modify
label values reduced_ventilation_0 reduced_vent
label def reduced_vent 0 "No", modify
label def reduced_vent 1 "Yes localized", modify
label def reduced_vent 2 "Yes bilateral", modify
label values t0_oral_antibiotic yes_no
label def yes_no 0 "No", modify
label def yes_no 1 "Yes", modify
[/CODE]
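The comparison between patients seen and not seen at T1 mentioned above can be sketched roughly as follows. This is only an illustration run on the extract above: `lost` and `lost_lbl` are hypothetical helper names that are not in the dataset, dropout is assumed to mean a missing `post_bigcons_1`, and the tests shown are one reasonable choice, not necessarily the ones behind the differences reported above.

```stata
* Hypothetical helper: 1 if the T1 outcome is missing, 0 otherwise
generate byte lost = missing(post_bigcons_1)
label define lost_lbl 0 "Seen at T1" 1 "Not seen at T1"
label values lost lost_lbl

* Overview of which variables are missing, and in which combinations
misstable summarize
misstable patterns

* Compare baseline characteristics by follow-up status
ttest age_mo, by(lost)
tabulate crackles_rales_0 lost, column chi2
tabulate reduced_ventilation_0 lost, column chi2
```

Because `age_mo` is in months and looks right-skewed in the extract, `ranksum age_mo, by(lost)` may be a more suitable alternative to the t test.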
My questions are: if MI is feasible, should I use the mlong format? After that, should I simply run an mlogit model with, for example, "post_bigcons_1" as the outcome and "post_bigcons_0" (its value at T0) as a covariate, or should I reshape the data to long format? And if so, should the "reshape long" happen before or after the imputation?
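To make the question concrete, the wide-format route (no reshape) could look roughly like the sketch below. It is not a validated analysis, and several things in it are assumptions: the number of imputations (20) and the seed are arbitrary, `age_mo` and `reduced_ventilation_0` are assumed to be complete in the full dataset as they are in the extract, and `logit` is used in place of `mlogit` because the outcome is coded 0/1 (for a two-category outcome the two give the same fit). In the extract, `post_bigcons_0` is also missing for most patients without T1 data, and `crackles_rales_0` and `t0_oral_antibiotic` each have a missing value, so they are imputed together with `post_bigcons_1` using chained equations.

```stata
* Declare the MI storage style and the role of each variable
mi set mlong
mi register imputed post_bigcons_0 post_bigcons_1 t0_oral_antibiotic crackles_rales_0
mi register regular age_mo reduced_ventilation_0

* Chained equations: logit for the 0/1 variables, mlogit for the
* three-level one. augment guards against perfect prediction.
* Assumption: age_mo and reduced_ventilation_0 have no missing values.
mi impute chained                                                  ///
    (logit, augment) post_bigcons_0 post_bigcons_1 t0_oral_antibiotic ///
    (mlogit, augment) crackles_rales_0                             ///
    = age_mo i.reduced_ventilation_0, add(20) rseed(12345)

* Analysis model on the imputed data, reported as odds ratios
mi estimate, or: logit post_bigcons_1 i.t0_oral_antibiotic i.post_bigcons_0
```

In this layout each patient stays on one row, with the T0 value entering as a covariate, so no "reshape long" is involved.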
I am using Stata 19 BE.
Many thanks in advance for any answers to these questions.
Anna