
  • Multiple imputation to estimate missing values for filtering observations

    Is there a way in Stata to use multiply-imputed variables to subset data?

    I am analyzing a clinical dataset with a high degree of missingness in the two variables, P and W, that are required to subset the data.

    I would like to impute P and W and then effectively filter out observations for which P > 200 and W > 20.

    However, -mi estimate- rejects an -if- qualifier based on the imputed variables, understandably, because the estimation sample would then differ across imputations: "estimation sample varies between m=1 and m=2 ... subsample [] changes from one imputation to another."

    Collapsing the imputed dataset by observation to get the mean of the imputed values seems statistically problematic: it appears to undermine the whole premise of multiple imputation. That is only my intuition, though, and I am not a card-carrying statistician.

    Imputation with standard regression models does not perform very well (e.g., adjusted R^2 < 0.10 in the best-fitting models), which may mean this exercise is a lost cause, but I was hoping to get others' input before giving up.

    Thank you.

  • #2
    This is an interesting question, and I have two ideas, neither of them validated or backed by references:

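    1. Impute, and then create a dummy variable that flags the cases you want to select. Because it depends on imputed values, create it with -mi passive- so that it is computed within each imputation, something like

    Code:
    mi passive: generate touse = P < 200 & W < 20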
    Then run the regression model with the touse variable as an interaction term, like:

    Code:
    mi estimate: reg y (c.x1 c.x2 c.x3)##i.touse
    2. Maybe collapsing is fine. The median might be more robust than the mean. Then I would use something like:

    Code:
    bysort ID: egen medvalue = median(P)
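    Option 1 can be sketched end to end as below. This is a minimal sketch under stated assumptions: the data are simulated, the variables id, y, x1, x2, P, and W are hypothetical stand-ins for the clinical dataset, the data are stored in flong style, and P and W are imputed with chained linear regressions (20 imputations). Swap in your own variables and imputation models.

```stata
* Hypothetical simulated data standing in for the clinical dataset
clear all
set seed 12345
set obs 500
generate id = _n
generate x1 = rnormal()
generate x2 = rnormal()
generate P  = 200 + 40*x1 + rnormal(0, 30)
generate W  = 20 + 5*x2 + rnormal(0, 4)
generate y  = 1 + 0.5*x1 - 0.3*x2 + rnormal()
* Make roughly 30% of P and of W missing at random
replace P = . if runiform() < 0.3
replace W = . if runiform() < 0.3

* Declare mi data (flong stores every imputation as a full copy)
mi set flong
mi register imputed P W
mi register regular y x1 x2 id

* Impute P and W with chained linear regressions (assumed model)
mi impute chained (regress) P W = y x1 x2, add(20) rseed(2022)

* Selection flag computed separately within each imputation;
* left missing in m=0 wherever P or W is unobserved
mi passive: generate byte touse = (P < 200 & W < 20) if !missing(P, W)

* Interact every covariate with touse instead of using an -if-
* qualifier whose sample would change across imputations
mi estimate: regress y (c.x1 c.x2)##i.touse
```

    With this parameterization the main-effect coefficients describe the touse == 0 group, and adding the matching interaction coefficient gives the slope in the selected (touse == 1) subgroup.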

    In the end, the overall quality will depend heavily on the share of missing values and on the quality of your imputation model and the available predictors, I suppose.
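    The median idea in option 2 can be sketched as follows, again as a minimal sketch on hypothetical simulated data (variables id, y, x1, x2, P, W; flong storage; 20 chained-regression imputations). Taking each person's median over the imputed copies (_mi_m > 0) yields a selection flag that is identical in every imputation, so -mi estimate- accepts it in an -if- qualifier. The trade-off is that the flag no longer reflects uncertainty about who belongs to the subset, which is the concern raised in the question.

```stata
* Hypothetical simulated data standing in for the clinical dataset
clear all
set seed 12345
set obs 500
generate id = _n
generate x1 = rnormal()
generate x2 = rnormal()
generate P  = 200 + 40*x1 + rnormal(0, 30)
generate W  = 20 + 5*x2 + rnormal(0, 4)
generate y  = 1 + 0.5*x1 - 0.3*x2 + rnormal()
replace P = . if runiform() < 0.3
replace W = . if runiform() < 0.3

mi set flong
mi register imputed P W
mi register regular y x1 x2 id
mi impute chained (regress) P W = y x1 x2, add(20) rseed(2022)

* Per-person median over the imputed copies only (_mi_m > 0);
* the result is the same on every row of a given id, including m = 0
egen P_med = median(cond(_mi_m > 0, P, .)), by(id)
egen W_med = median(cond(_mi_m > 0, W, .)), by(id)
generate byte keep_subset = (P_med < 200 & W_med < 20)
mi register regular P_med W_med keep_subset

* The estimation sample now does not vary across imputations
mi estimate: regress y x1 x2 if keep_subset
```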
    Last edited by Felix Bittmann; 30 Oct 2022, 15:11.
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      Thank you, Felix! Your option #1 is very clever, and it worked! After running the -mi- regression, I was able to combine the coefficients with -mi estimate (_b[X1] + _b[X2] + _b[X1#X2]), dots: logistic Y X1##X2-, following advice found here.
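      Combining coefficients this way relies on the expression syntax of -mi estimate-, which computes the expression in each imputation and then combines it with Rubin's rules, instead of combining each coefficient separately. The sketch below is an assumption-laden illustration on simulated data: the binary outcome Y, the predictor x, and the variables P and W are hypothetical, and touse is the flag from option 1. With i.touse##c.x, the slope of x within the touse == 1 subgroup is _b[x] + _b[1.touse#c.x], on the log-odds scale.

```stata
* Hypothetical simulated data: binary outcome Y, predictor x, P and W with gaps
clear all
set seed 2022
set obs 1000
generate id = _n
generate x  = rnormal()
generate P  = 200 + 30*x + rnormal(0, 30)
generate W  = 20 + 5*rnormal()
generate Y  = runiform() < invlogit(-0.5 + 0.8*x)
replace P = . if runiform() < 0.3
replace W = . if runiform() < 0.3

mi set flong
mi register imputed P W
mi register regular Y x id
mi impute chained (regress) P W = Y x, add(20) rseed(2022)
mi passive: generate byte touse = (P < 200 & W < 20) if !missing(P, W)

* Slope of x (log odds) in the touse == 1 subgroup, pooled across imputations
mi estimate (slope_sel: _b[x] + _b[1.touse#c.x]), dots: logistic Y i.touse##c.x
```

      Exponentiating the pooled estimate gives the odds ratio per unit of x in the selected subgroup; the pooling itself is done on the log-odds scale.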
