
  • Multiple imputation for correlated exposure variables

    Hello. I am trying to perform multiple imputation in my dataset using mi impute chained.
    The dataset has three exposure variables, e.g. smoking at time 1, smoking at time 2, and smoking at time 3. I generated a fourth exposure variable: smoking at any time. All exposure variables are binary (0 vs 1).
    Code:
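    * 0 only when all three time points are observed and equal to 0
    gen smoking_any=0 if smoking1==0 & smoking2==0 & smoking3==0
    * 1 when smoking is observed at any time point; otherwise left missing
    replace smoking_any=1 if smoking1==1 | smoking2==1 | smoking3==1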
    smoking1  smoking2  smoking3  smoking_any
           1         0         .            1
           0         0         .            .
           0         1         .            1
           .         .         1            1
           0         0         0            0
           0         0         0            0
           0         0         0            0
           .         0         0            .
           0         0         .            .
    My question is how to impute "smoking_any".
    When I include all four exposure variables in the imputation model, it always says ".... predicts data perfectly" or "convergence not achieved".

    If I impute "smoking_any" separately from the other three variables, it looks like the prevalence of smoking at any time would be overestimated.

    Can I use a passive imputation approach after I impute smoking1, smoking2, and smoking3? I have read, however, that "this method is actually a misspecification of your imputation model and will lead to biased parameter estimates in your analytic model".

    Thank you very much.

  • #2
    Still waiting for advice...



    • #3
      Hi Jeff,

      I am wondering what you ended up doing to get the imputation to work, as I am encountering the same issue. I have tried using the "augment" option for the logit variables, which lets the imputation run, but some missing values are still not imputed.
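
      For reference, a minimal sketch of how augment can be specified per variable inside mi impute chained, followed by one way to count values that remain missing afterwards. The covariates age and sex, the number of imputations, and the seed are hypothetical placeholders; the sketch also assumes the covariates are fully observed.

      ```stata
      * Sketch only. age and sex are hypothetical, fully observed covariates.
      mi set mlong
      mi register imputed smoking1 smoking2 smoking3

      * augment is passed as an option of the logit method
      mi impute chained (logit, augment) smoking1 smoking2 smoking3 = age sex, add(20) rseed(12345)

      * Count observations still missing any smoking variable in imputation 1
      mi xeq 1: count if missing(smoking1, smoking2, smoking3)
      ```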

      Thanks,
      Mari



      • #4
        Hi Mari,

        I chose to impute smoking1, smoking2, and smoking3 first, and then calculate smoking_any passively. This is not a perfect approach (as mentioned above), but I think it is acceptable if the percentage of missing values is not high. A sensitivity analysis that imputes smoking_any directly might also be helpful.
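
        A minimal sketch of this impute-then-derive approach, not necessarily the exact commands used. The covariates age and sex, the outcome variable outcome, the number of imputations, and the seed are hypothetical placeholders; the covariates are assumed to be fully observed.

        ```stata
        * Sketch only. age, sex, and outcome are hypothetical placeholders.
        * Drop the pre-computed indicator so it can be rebuilt in every imputed dataset
        drop smoking_any

        mi set mlong
        mi register imputed smoking1 smoking2 smoking3

        * Impute the three time-specific indicators with chained logistic models
        mi impute chained (logit) smoking1 smoking2 smoking3 = age sex, add(20) rseed(12345)

        * Derive smoking_any passively, using the same rule as the original code
        mi passive: generate byte smoking_any = 0 if smoking1==0 & smoking2==0 & smoking3==0
        mi passive: replace smoking_any = 1 if smoking1==1 | smoking2==1 | smoking3==1

        * Fit the analysis model across imputations (hypothetical outcome model)
        mi estimate: logit outcome smoking_any age sex
        ```

        Because smoking_any is rebuilt from the imputed components, it stays logically consistent with smoking1 to smoking3 in every imputed dataset.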

        Jeff

