MI Impute Chained -- Not all incomplete values imputed

Jonathan Tingle

Join Date: Sep 2015

Posts: 4
#1

09 Sep 2015, 07:53

Hi everyone,

I am trying to use -mi impute chained- to impute missing values. At the end of the imputation process, the log shows the following table:

------------------------------------------------------------------
                   |               Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
               edu |       1858            2         2 |      1860
              apoe |       1804           56        49 |      1860
         wmhvolume |       1858            2         2 |      1860
          icvolume |       1858            2         2 |      1860
         neurocog5 |       1859            1         1 |      1860
              bmi1 |       1859            1         0 |      1860
              sbp1 |       1859            1         0 |      1860
              dbp1 |       1859            1         0 |      1860
          glucose1 |       1804           56        45 |      1860
              hdl1 |       1836           24        17 |      1860
              tch1 |       1835           25        18 |      1860
            trigs1 |       1836           24        17 |      1860
        cholmeds11 |       1840           20        19 |      1860
              bmi5 |       1852            8         8 |      1860
            smoke5 |       1741          119        50 |      1860
              sbp5 |       1855            5         5 |      1860
              dbp5 |       1855            5         5 |      1860
          glucose5 |       1754          106        98 |      1860
              hdl5 |       1847           13        13 |      1860
              tch5 |       1847           13        13 |      1860
            trigs5 |       1847           13        13 |      1860
        cholmeds15 |       1855            5         5 |      1860
           dmmeds1 |       1532          328       326 |      1860
           dmmeds5 |       1855            5         5 |      1860
------------------------------------------------------------------

As you can see, sometimes all of my incomplete observations are imputed, and sometimes they are not all imputed (e.g., apoe, bmi1, sbp1, etc.). My goal is to build a sound model that will be able to impute all of the incomplete values for each variable in the model.

Here is the meat of my -mi impute chained- code:

*Impute data
mi impute chained ///
    (ologit, omit(i.dmmeds1 i.dmmeds5)) edu ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) apoe ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) wmhvolume ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) icvolume ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) neurocog5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) bmi1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) sbp1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) dbp1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.glucose2)) glucose1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) hdl1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.tch2)) tch1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) trigs1 ///
    (logit, omit(i.dmmeds1 i.dmmeds5 i.smoke5 i.black)) cholmeds11 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) bmi5 ///
    (ologit, omit(i.dmmeds1 i.dmmeds5 i.cholmeds11) include(i.smoke4)) smoke5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) sbp5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) dbp5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.glucose4)) glucose5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) hdl5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) tch5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) trigs5 ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) cholmeds15 ///
    (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
        include(c.bmi1 c.sbp1 c.dbp1 c.hdl1 c.tch1 c.trigs1)) dmmeds1 ///
    (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
        include(c.bmi5 c.sbp5 c.dbp5 c.hdl5 c.tch5 c.trigs5)) dmmeds5 ///
    = i.male i.black i.center c.age1 c.agesq1 i.smoke1 ///
    i.htnmeds1 i.htnmeds5 ///
    c.fa1 c.fa2 c.fa3 c.fa4 c.fa5 c.fa6 c.fa7 ///
    c.md1 c.md2 c.md3 c.md4 c.md5 c.md6 c.md7 ///
    , add(5) burnin(100) rseed(`seed') augment force noisily ///
    savetrace("1-data\stata\MItrace_seed`seed'.dta", replace)

My unsuccessful attempts to address this issue so far have included:
1) Varying the seed
2) Increasing the # of imputations and/or burn-ins
3) Tweaking the variable-specific mi models ("including" or "omitting" more variables)
4) Looking at the MI manual. I've looked for example tables similar to the one I pasted above ("observations per m"), and the only somewhat-helpful resource I have found is this:
Usually, the number of complete observations in the imputation sample...will be equal to the number of observations used in the estimation. Sometimes, however, observations may be dropped from the estimation—for example, if independent variables contain missing values. In this case, the number of complete observations in the imputation sample and the number of observations used in the estimation will be different, and the following note will appear following the table output: "Note: right-hand-side variables (or weights) have missing values; model parameters estimated using listwise deletion" You should evaluate such cases to verify that results are as expected. In general, missing values in independent variables (or in a weighting variable) do not affect the imputation sample but they may lead to missing imputed values.

I don't really understand, though, how "too much missingness" (my own words) could be a problem to the point that missing values are imputed with missing values. Wouldn't -mi impute- at least return a "wild guess" imputation, before it would just return another missing value? Though maybe I'm not reading this manual note correctly?

5) Googling for what feels like an eternity.

I'd really appreciate any thoughts or suggestions. Thank you.
Tags: multiple imputation
Jonathan Tingle

Join Date: Sep 2015

Posts: 4
#2

16 Sep 2015, 09:09

Being new to the Statalist forum, I'm not sure whether it's frowned upon to bump a thread, but I'm going to ask for forgiveness instead of permission.
Rich Goldstein

Join Date: Mar 2014

Posts: 4460
#3

16 Sep 2015, 11:21

I don't know about others, but I had a lot of trouble reading your question, so I stopped. Please read the FAQ, especially the parts about (1) using CODE delimiters to make posts easier to read and (2) how to ask a question. Bumping something that takes a great deal of work to even read will not help.

Jonathan Tingle

Join Date: Sep 2015
Posts: 4
#4

16 Sep 2015, 12:52

Originally posted by Rich Goldstein

I don't know about others, but I had a lot of trouble reading your question, so I stopped. Please read the FAQ, especially the parts about (1) using CODE delimiters to make posts easier to read and (2) how to ask a question. Bumping something that takes a great deal of work to even read will not help.

My goal:

I'd like to use Stata's mi impute chained to impute all missing values across all imputed datasets.

My problem:

In the table that Stata returns at the end of mi impute chained, I see that not all incomplete values were imputed. Here is a subset of the table, reformatted in Excel:

Observations per m
variable     complete   incomplete   imputed   total
dbp1             1859            1         0    1860
glucose1         1804           56        44    1860
hdl1             1836           24        16    1860
tch1             1835           25        17    1860
trigs1           1836           24        16    1860
cholmeds1        1840           20        19    1860

I am concerned that not all of the incomplete observations were imputed.

My code:

Code:

*Impute data
mi impute chained ///
     (ologit) edu ///
     (logit) apoe ///
     (regress) wmhvolume ///
     (regress) icvolume ///
     (logit) neurocog5 ///
     (regress) bmi1 ///
     (regress) sbp1 ///
     (regress) dbp1 ///
     (regress, include(c.glucose2)) glucose1 ///
     (regress) hdl1 ///
     (regress, include(c.tch2)) tch1 ///
     (regress) trigs1 ///
     (logit, omit(i.smoke5 i.black)) cholmeds11 ///
     (regress) bmi5 ///
     (ologit, omit(i.cholmeds11) include(i.smoke4)) smoke5 ///
     (regress) sbp5 ///
     (regress) dbp5 ///
     (regress, include(c.glucose4)) glucose5 ///
     (regress) hdl5 ///
     (regress) tch5 ///
     (regress) trigs5 ///
     (logit) cholmeds15 ///
     (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
            include(c.bmi1 c.sbp1 c.dbp1 c.hdl1 c.tch1 c.trigs1)) dmmeds1 ///
     (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
            include(c.bmi5 c.sbp5 c.dbp5 c.hdl5 c.tch5 c.trigs5)) dmmeds5 ///
     = i.male i.black i.center c.age1 c.agesq1 i.smoke1 ///
     i.htnmeds1 i.htnmeds5 ///
     c.fa1 c.fa2 c.fa3 c.fa4 c.fa5 c.fa6 c.fa7 ///
     c.md1 c.md2 c.md3 c.md4 c.md5 c.md6 c.md7 ///
     , add(5) burnin(100) rseed(`seed') augment force noisily ///
     savetrace("1-data\stata\MItrace_seed`seed'.dta", replace)

*This code is edited from the OP, but it still yields the same problem.

My previous unsuccessful attempts to solve the problem:

1) Varying the seed
2) Increasing the number of imputations and/or burn-ins
3) Including or Omitting additional variables in the imputation equations
4) Looking through the Stata MI manual for the "observations per m" table, like the one I posted above. Generally, these tables were only accompanied by notes explaining that Stata had, indeed, imputed all incomplete missing values. I did not find much in the way of useful information regarding how to troubleshoot in the event that Stata had not imputed all incomplete missing values. The one exception to this was the following passage:

"Usually, the number of complete observations in the imputation sample...will be equal to the number of observations used in the estimation. Sometimes, however, observations may be dropped from the estimation—for example, if independent variables contain missing values. In this case, the number of complete observations in the imputation sample and the number of observations used in the estimation will be different, and the following note will appear following the table output: "Note: right-hand-side variables (or weights) have missing values; model parameters estimated using listwise deletion" You should evaluate such cases to verify that results are as expected. In general, missing values in independent variables (or in a weighting variable) do not affect the imputation sample but they may lead to missing imputed values."

I do not understand why missing values in the independent variables would lead to missing imputed values, as the final sentence of the note above states. Regardless, I would still like more information on how to address that issue if it does exist.
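
One way to check whether the manual's note applies is to count the missing values in the variables named in include() and in the variables to the right of the = sign. The following is a minimal sketch, assuming the dataset from the command above is in memory and is run before any imputations are added; the variable names are copied from that command, and the pairing of glucose1 with glucose2 comes from its include() option.

```stata
* Variables named in include() options in the command above
local included glucose2 tch2 smoke4 glucose4

* Variables to the right of the = sign (treated as complete predictors)
local rhs male black center age1 agesq1 smoke1 htnmeds1 htnmeds5 ///
    fa1 fa2 fa3 fa4 fa5 fa6 fa7 md1 md2 md3 md4 md5 md6 md7

* Tabulate missing-value counts; fully observed variables are not listed
misstable summarize `included' `rhs'

* Incomplete glucose1 values with no glucose2 available to predict from
count if missing(glucose1) & missing(glucose2)
```

If any of these predictors has missing values in rows where the imputed variable is also missing, those rows are candidates for the "missing imputed values" the manual describes.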

___

Thank you for your help and for taking the time to read this. Please let me know if the question, or the formatting of the question, is unclear.


Rich Goldstein

Join Date: Mar 2014

Posts: 4460
#5

16 Sep 2015, 13:28

I have had better luck using "ice" (type "search ice" to find and download it) than using "mi impute chained", though I don't know why. After using ice, the results can be imported with the "mi import ice" command. I don't know why your example failed, but thank you for making it legible.
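
For readers who want to try this route, here is a rough sketch of the ice workflow. It assumes ice is installed from SSC and uses a shortened, hypothetical variable list drawn from the thread; the seed and the output file name ice_imputed are placeholders, and the options shown (m(), seed(), saving()) should be checked against "help ice" for the installed version.

```stata
* ice is a user-written command; install it once
ssc install ice

* Hypothetical, shortened variable list for illustration only
ice bmi1 sbp1 dbp1 glucose1 glucose2 male age1, m(5) seed(12345) ///
    saving(ice_imputed, replace)

* Load the stacked ice output and convert it to mi format
use ice_imputed, clear
mi import ice, automatic
mi describe
```

Note that ice imputes every variable in its list that has missing values, so an ancillary variable such as glucose2 is filled in rather than left as an incomplete predictor.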
Jonathan Tingle

Join Date: Sep 2015

Posts: 4
#6

30 Sep 2015, 08:28

In case anyone else runs into this problem, too, I finally figured it out (or at least, figured it out in my case). Maddeningly simple, but important nonetheless...

Identifying the problem: All of the variables you "include" in your variable-specific MI regressions need to be free of missing values. Otherwise, you just impute missing values with new missing values. So, for instance, in my code above, if I "include" smoke4 in the model to impute smoke5, then smoke4 can't have any missing values, or else mi impute chained might not be able to impute the missing values for smoke5.
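
The mechanism can be reproduced on a toy dataset. This is only an illustrative sketch with made-up data (x, y, and the seed are hypothetical, not from the study above): a fitted model cannot produce a prediction for a row whose predictor is missing, and the force option lets mi impute finish anyway instead of stopping with an error.

```stata
* Toy data: y depends on x; both have gaps
clear
set seed 12345
set obs 100
generate x = rnormal()
generate y = 1 + 2*x + rnormal()
replace y = . in 1/10      // y is incomplete in 10 rows
replace x = . in 1/3       // x is also missing in 3 of those rows

* A fitted model cannot predict where a predictor is missing
regress y x
predict double yhat
count if missing(y) & missing(yhat)    // the rows where x is also missing

* Same situation inside mi; force lets the run finish despite the gaps
mi set wide
mi register imputed y
mi impute regress y x, add(1) rseed(12345) force
```

In the resulting "Observations per m" table, the Imputed count for y should fall short of the Incomplete count, which is the pattern in the tables above.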

Identifying the solution: Instead of "including" ancillary variables in variable-specific MI regressions, just put those variables in as additional variables to be imputed (and "omit" them, if necessary, from the other models, instead). That way, the missing values of the ancillary variables will be imputed, too, which in turn allows mi impute chained to impute all missing values for the "main" variables.
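
A minimal sketch of that restructuring, again on made-up data: the variable names mirror the thread for readability, but the values, seed, and the short burn-in are hypothetical. Here glucose2 is registered and imputed alongside the other variables instead of being passed through include(), and omit() keeps it out of an equation where it is not wanted.

```stata
* Toy data with hypothetical values
clear
set seed 2015
set obs 500
generate male     = runiform() < 0.5
generate age1     = 45 + 20*runiform()
generate glucose2 = 90 + 0.3*age1 + rnormal(0, 10)
generate glucose1 = 10 + 0.9*glucose2 + rnormal(0, 5)
generate bmi1     = 20 + 0.05*glucose1 + rnormal(0, 3)

* Punch holes, including in the ancillary variable glucose2
replace glucose1 = . if runiform() < 0.10
replace glucose2 = . if runiform() < 0.10
replace bmi1     = . if runiform() < 0.05

mi set wide
mi register imputed glucose1 glucose2 bmi1

* glucose2 is imputed too, so it never enters an equation as a missing value
mi impute chained                      ///
    (regress) glucose1                 ///
    (regress) glucose2                 ///
    (regress, omit(glucose2)) bmi1     ///
    = i.male c.age1, add(5) burnin(10) rseed(2015)
```

Because every predictor is either complete or imputed within the chain, force should no longer be needed, and leaving it off means Stata will stop with an error if missing imputed values reappear.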

Hope this helps.