Multiple imputation to estimate missing values for filtering observations

MATTHEW IPPOLITO

Join Date: Nov 2015

Posts: 7
#1

30 Oct 2022, 13:01

Is there a way in Stata to use multiply-imputed variables to subset data?

I am analyzing a clinical dataset with a high degree of missingness in the two variables, P and W, that are required to subset the data.

I would like to impute P and W, and then filter out observations for which P > 200 and W > 20.

However, -mi est- does not permit "if" statements, understandably: "estimation sample varies between m=1 and m=2 ... subsample [] changes from one imputation to another."

Collapsing the imputed dataset to the mean of the imputed values for each observation seems statistically problematic: it appears to undermine the entire premise of multiple imputation. That is only my intuition, though, as I am not a card-carrying statistician.

Imputation using standard regression methods does not perform very well (e.g., adjusted R^2 < 0.10 in the best-fitting models), which may indicate that this exercise is a lost cause, but I was hoping to get others' input before giving up.

Thank you.
Felix Bittmann

Join Date: Aug 2018

Posts: 710
#2

30 Oct 2022, 15:08

This is an interesting question, and I have two ideas, neither of them validated or backed by references:

1. Impute and then create a dummy variable that fits the cases you want to select, something like

Code:

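* mi passive recomputes touse within each imputation, so it tracks the imputed P and W
mi passive: generate touse = (P < 200) & (W < 20)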

Then run the regression model with the touse variable as an interaction term, like:

Code:

mi estimate: reg y (c.x1 c.x2 c.x3)##i.touse

2. Maybe collapsing is fine. Potentially, the median might be more robust. Then I would use something like:

Code:

bysort ID: egen medvalue = median(P)

In the end the overall quality will depend a lot on the overall share of missingness and the quality of your imputation model or available prediction variables, I suppose.
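
Put together, idea 1 might look like the sketch below. This is only an illustration under assumptions: y, x1 and x2 are placeholder names for fully observed variables, the chained-regression imputation model, the 20 imputations and the seed are arbitrary choices, and the cutoffs simply repeat the ones above.

```stata
* Sketch only: y, x1, x2 are hypothetical, fully observed variables
mi set wide
mi register imputed P W

* Impute P and W jointly; model, number of imputations and seed are arbitrary
mi impute chained (regress) P W = y x1 x2, add(20) rseed(12345)

* Passive indicator, recomputed from the imputed P and W in every imputation
mi passive: generate touse = (P < 200) & (W < 20)

* Full sample in every imputation; touse enters through interactions, not -if-
mi estimate, dots: regress y i.touse c.x1 c.x2 i.touse#c.x1 i.touse#c.x2
```

Because every imputation uses all observations, the estimation sample no longer changes between imputations, which is what -mi estimate- objects to when an "if" qualifier depends on imputed values.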

Last edited by Felix Bittmann; 30 Oct 2022, 15:11.

Best wishes

MATTHEW IPPOLITO

Join Date: Nov 2015

Posts: 7
#3

30 Oct 2022, 16:18

Thank you, Felix! Your option #1 is very clever, and it worked! After running the -mi- regression, I was able to combine the coefficients with -mi est (_b[X1] + _b[X2] + _b[X1#X2]), dots: logistic Y X1##X2-, following advice found here.
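
For anyone adapting this, a minimal sketch of the same kind of call follows, assuming Y, X1 and X2 are placeholder 0/1 variables in an mi dataset. With factor-variable notation the coefficients are addressed by level (for example _b[1.X1]), and "combined" is just an arbitrary label for the expression.

```stata
* Sketch only: Y, X1, X2 are hypothetical 0/1 variables in an mi dataset
* The expression in parentheses is evaluated in each imputation and then pooled
mi estimate (combined: _b[1.X1] + _b[1.X2] + _b[1.X1#1.X2]), dots: logistic Y i.X1##i.X2
```

The _b[] values after -logistic- are on the log-odds scale, so the combined quantity is a sum of log odds ratios rather than an odds ratio.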