Hi everyone,
I am trying to use -mi impute chained- to impute missing values. At the end of the imputation process, the log shows the following table:
------------------------------------------------------------------
                   |              Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
               edu |       1858            2         2 |      1860
              apoe |       1804           56        49 |      1860
         wmhvolume |       1858            2         2 |      1860
          icvolume |       1858            2         2 |      1860
         neurocog5 |       1859            1         1 |      1860
              bmi1 |       1859            1         0 |      1860
              sbp1 |       1859            1         0 |      1860
              dbp1 |       1859            1         0 |      1860
          glucose1 |       1804           56        45 |      1860
              hdl1 |       1836           24        17 |      1860
              tch1 |       1835           25        18 |      1860
            trigs1 |       1836           24        17 |      1860
        cholmeds11 |       1840           20        19 |      1860
              bmi5 |       1852            8         8 |      1860
            smoke5 |       1741          119        50 |      1860
              sbp5 |       1855            5         5 |      1860
              dbp5 |       1855            5         5 |      1860
          glucose5 |       1754          106        98 |      1860
              hdl5 |       1847           13        13 |      1860
              tch5 |       1847           13        13 |      1860
            trigs5 |       1847           13        13 |      1860
        cholmeds15 |       1855            5         5 |      1860
           dmmeds1 |       1532          328       326 |      1860
           dmmeds5 |       1855            5         5 |      1860
------------------------------------------------------------------
As you can see, for some variables every incomplete observation is imputed, but for others some are left missing (e.g., apoe has 56 incomplete observations but only 49 imputed, and bmi1, sbp1, and dbp1 each have 1 incomplete observation and 0 imputed). My goal is to build a sound model that imputes all of the incomplete values for every variable in the model.
Here is the meat of my -mi impute chained- code:
*Impute data
mi impute chained ///
    (ologit, omit(i.dmmeds1 i.dmmeds5)) edu ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) apoe ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) wmhvolume ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) icvolume ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) neurocog5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) bmi1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) sbp1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) dbp1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.glucose2)) glucose1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) hdl1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.tch2)) tch1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) trigs1 ///
    (logit, omit(i.dmmeds1 i.dmmeds5 i.smoke5 i.black)) cholmeds11 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) bmi5 ///
    (ologit, omit(i.dmmeds1 i.dmmeds5 i.cholmeds11) include(i.smoke4)) smoke5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) sbp5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) dbp5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.glucose4)) glucose5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) hdl5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) tch5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) trigs5 ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) cholmeds15 ///
    (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
        include(c.bmi1 c.sbp1 c.dbp1 c.hdl1 c.tch1 c.trigs1)) dmmeds1 ///
    (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
        include(c.bmi5 c.sbp5 c.dbp5 c.hdl5 c.tch5 c.trigs5)) dmmeds5 ///
    = i.male i.black i.center c.age1 c.agesq1 i.smoke1 ///
      i.htnmeds1 i.htnmeds5 ///
      c.fa1 c.fa2 c.fa3 c.fa4 c.fa5 c.fa6 c.fa7 ///
      c.md1 c.md2 c.md3 c.md4 c.md5 c.md6 c.md7 ///
    , add(5) burnin(100) rseed(`seed') augment force noisily ///
      savetrace("1-data\stata\MItrace_seed`seed'.dta", replace)
My unsuccessful attempts to address this issue so far have included:
1) Varying the seed
2) Increasing the number of imputations and/or burn-in iterations
3) Tweaking the variable-specific mi models ("including" or "omitting" more variables)
4) Looking at the MI manual. I've looked for example tables similar to the one I pasted above ("observations per m"), and the only somewhat-helpful resource I have found is this:
Usually, the number of complete observations in the imputation sample...will be equal to the number of observations used in the estimation. Sometimes, however, observations may be dropped from the estimation—for example, if independent variables contain missing values. In this case, the number of complete observations in the imputation sample and the number of observations used in the estimation will be different, and the following note will appear following the table output: "Note: right-hand-side variables (or weights) have missing values; model parameters estimated using listwise deletion" You should evaluate such cases to verify that results are as expected. In general, missing values in independent variables (or in a weighting variable) do not affect the imputation sample but they may lead to missing imputed values.
I don't really understand, though, how "too much missingness" (my own words) could be a problem to the point that missing values are imputed with missing values. Wouldn't -mi impute- at least return a "wild guess" imputation rather than just another missing value? Or am I misreading this manual note?
5) Googling for what feels like an eternity.
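In case it helps, here is a small self-contained sketch with synthetic data that, as far as I can tell, reproduces the pattern the manual note describes. It is only my guess at the mechanism, not a confirmed diagnosis of my own data, and the variables y and x, the seed, and the sample size are all made up. Here x plays the role of a predictor listed after "=" (and therefore treated as complete) that actually has one missing value. Since each imputed y is drawn from a prediction that needs x, the observation missing x cannot receive an imputed y, and the -force- option (which my command above also specifies) lets mi impute carry on and leave that value missing.

```stata
* Hypothetical example: y is the variable to impute, x is a "complete" predictor
clear
set seed 12345
set obs 200
generate x = rnormal()
generate y = 1 + 2*x + rnormal()

* Make y missing in the first 10 observations...
replace y = . in 1/10
* ...and make the predictor x missing in observation 1 as well
replace x = . in 1

mi set mlong
mi register imputed y

* The model is fit by listwise deletion; -force- tells mi impute to proceed
* even though some imputed values end up missing
mi impute regress y = x, add(5) rseed(12345) force

* Observation 1 has no x, so no prediction of y can be formed for it;
* y should stay missing for that observation in each imputation m = 1,...,5
mi xeq 1/5: count if missing(y)
```

If this is what is going on in my data, running -misstable summarize- on the predictors after "=" (male, black, center, age1, agesq1, smoke1, htnmeds1, htnmeds5, fa1-fa7, md1-md7) before imputing should show whether any of them actually contain missing values.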
I'd really appreciate any thoughts or suggestions. Thank you.