  • Multiple imputation

    When running multiple imputation in Stata, can the variables with missing data be split by type and imputed in separate steps? For example, can we impute the continuous variables separately, e.g., as

    mi impute chained (regress) total_diseases number_pateints age = education employment , add(20) noisily augment

    the categorical variables as

    mi impute chained (logit) score_scale drug_pregference = education employment, add(20) noisily augment

    the remaining categorical variables as

    mi impute chained (logit) q5 q33 q56 = education employment, add(20) noisily augment

    rather than in a single command, as

    mi impute chained (regress) total_diseases number_pateints age (logit) score_scale drug_pregference (logit) q5 q33 q56 = education employment, add(20) noisily augment

    Is this possible?

    Last edited by Wali Amar; 08 May 2022, 17:18.

  • #2
    The first two questions that come to mind:

    1. Why would you want to do this?
    2. What do you mean by "possible"?

    Only you can answer the first question, so I will address the second. In general, it is technically possible to run

    Code:
    mi impute chained (regress) total_diseases number_pateints age = education employment , add(20) noisily augment
    mi impute chained (logit) score_scale drug_pregference = education employment, add(20) noisily augment
    mi impute chained (logit) q5 q33 q56 = education employment, add(20) noisily augment

    There is no need to augment the linear regression model because perfect prediction is not an issue there. In any case, you will end up with 3*20=60 partly completed datasets. The first 20 datasets will contain imputed values for the continuous variables while retaining the missing values for the categorical variables.

    This situation nicely illustrates the main conceptual problem with that approach: the imputed values from the three runs are totally unrelated to each other. Thus, you will badly underestimate the relationship between continuous and categorical variables. One of the principles of multiple imputation is to utilize the (multivariate) correlations among all variables to choose imputed values in a way that recovers the overall distribution of the complete (but only partly observed) data. The suggested approach will not do that.
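
    For comparison, here is a minimal sketch of a single joint specification in which every imputation variable can inform every other one. It assumes that the data are already mi set, that all eight variables are registered as imputed, and that the categorical variables are binary (otherwise logit would not be the appropriate method). The rseed() value is arbitrary and only makes the run reproducible; augment is requested for the logit equations only.

    ```stata
    * Sketch only: assumes the data are already -mi set- and that the
    * imputation variables are registered as imputed.
    * The categorical variables are assumed to be binary (hence logit).
    * The seed 12345 is an arbitrary value chosen for illustration.
    mi impute chained                                             ///
        (regress) total_diseases number_pateints age              ///
        (logit, augment) score_scale drug_pregference q5 q33 q56  ///
        = education employment, add(20) rseed(12345)

    * Report the mi settings, including the number of imputations
    mi describe
    ```

    Run on data without prior imputations, this adds 20 imputations in which all eight variables are filled in, rather than 60 in which only one group of variables is filled in at a time.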

    Summing up: It is (technically) possible to do as you suggest. However, unless you are running simulations to assess how such specifications affect the imputed values, it is most likely not what you want.
    Last edited by daniel klein; 08 May 2022, 23:12. Reason: added summary
