  • MI Impute Chained -- Not all incomplete values imputed

    Hi everyone,

    I am trying to use -mi impute chained- to impute missing values. At the end of the imputation process, the log shows the following table:

    ----------------------------------------------------------------
                       |       Observations per m        |
                       |---------------------------------|
              Variable | Complete   Incomplete   Imputed |     Total
    -------------------+---------------------------------+----------
                   edu |     1858            2         2 |      1860
                  apoe |     1804           56        49 |      1860
             wmhvolume |     1858            2         2 |      1860
              icvolume |     1858            2         2 |      1860
             neurocog5 |     1859            1         1 |      1860
                  bmi1 |     1859            1         0 |      1860
                  sbp1 |     1859            1         0 |      1860
                  dbp1 |     1859            1         0 |      1860
              glucose1 |     1804           56        45 |      1860
                  hdl1 |     1836           24        17 |      1860
                  tch1 |     1835           25        18 |      1860
                trigs1 |     1836           24        17 |      1860
            cholmeds11 |     1840           20        19 |      1860
                  bmi5 |     1852            8         8 |      1860
                smoke5 |     1741          119        50 |      1860
                  sbp5 |     1855            5         5 |      1860
                  dbp5 |     1855            5         5 |      1860
              glucose5 |     1754          106        98 |      1860
                  hdl5 |     1847           13        13 |      1860
                  tch5 |     1847           13        13 |      1860
                trigs5 |     1847           13        13 |      1860
            cholmeds15 |     1855            5         5 |      1860
               dmmeds1 |     1532          328       326 |      1860
               dmmeds5 |     1855            5         5 |      1860
    ----------------------------------------------------------------

    As you can see, sometimes all of my incomplete observations are imputed, and sometimes they are not all imputed (e.g., apoe, bmi1, sbp1, etc.). My goal is to build a sound model that will be able to impute all of the incomplete values for each variable in the model.

    Here is the meat of my -mi impute chained- code:

    *Impute data
    mi impute chained ///
    (ologit, omit(i.dmmeds1 i.dmmeds5)) edu ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) apoe ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) wmhvolume ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) icvolume ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) neurocog5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) bmi1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) sbp1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) dbp1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.glucose2)) glucose1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) hdl1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.tch2)) tch1 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) trigs1 ///
    (logit, omit(i.dmmeds1 i.dmmeds5 i.smoke5 i.black)) cholmeds11 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) bmi5 ///
    (ologit, omit(i.dmmeds1 i.dmmeds5 i.cholmeds11) include(i.smoke4)) smoke5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) sbp5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) dbp5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5) include(c.glucose4)) glucose5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) hdl5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) tch5 ///
    (regress, omit(i.dmmeds1 i.dmmeds5)) trigs5 ///
    (logit, omit(i.dmmeds1 i.dmmeds5)) cholmeds15 ///
    (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
    include(c.bmi1 c.sbp1 c.dbp1 c.hdl1 c.tch1 c.trigs1)) dmmeds1 ///
    (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
    include(c.bmi5 c.sbp5 c.dbp5 c.hdl5 c.tch5 c.trigs5)) dmmeds5 ///
    = i.male i.black i.center c.age1 c.agesq1 i.smoke1 ///
    i.htnmeds1 i.htnmeds5 ///
    c.fa1 c.fa2 c.fa3 c.fa4 c.fa5 c.fa6 c.fa7 ///
    c.md1 c.md2 c.md3 c.md4 c.md5 c.md6 c.md7 ///
    , add(5) burnin(100) rseed(`seed') augment force noisily ///
    savetrace("1-data\stata\MItrace_seed`seed'.dta", replace)

    My unsuccessful attempts to address this issue so far have included:
    1) Varying the seed
    2) Increasing the # of imputations and/or burn-ins
    3) Tweaking the variable-specific mi models ("including" or "omitting" more variables)
    4) Looking at the MI manual. I've looked for example tables similar to the one I pasted above ("observations per m"), and the only somewhat-helpful resource I have found is this:
    "Usually, the number of complete observations in the imputation sample...will be equal to the number of observations used in the estimation. Sometimes, however, observations may be dropped from the estimation—for example, if independent variables contain missing values. In this case, the number of complete observations in the imputation sample and the number of observations used in the estimation will be different, and the following note will appear following the table output: 'Note: right-hand-side variables (or weights) have missing values; model parameters estimated using listwise deletion'. You should evaluate such cases to verify that results are as expected. In general, missing values in independent variables (or in a weighting variable) do not affect the imputation sample but they may lead to missing imputed values."

    I don't really understand, though, how "too much missingness" (my own words) could be a problem to the point that missing values are imputed with missing values. Wouldn't -mi impute- at least return a "wild guess" imputation, before it would just return another missing value? Though maybe I'm not reading this manual note correctly?

    5) Googling for what feels like an eternity.

    I'd really appreciate any thoughts or suggestions. Thank you.

  • #2
    Being new to the Statalist forum, I'm not sure whether it's frowned upon to bump a thread, but I'm going to ask for forgiveness instead of permission.



    • #3
      I don't know about others, but I had a lot of trouble reading your question - so I stopped; please read the FAQ, esp. the part about (1) using CODE delimiters to make posts easier to read and (2) the advice on how to ask a question - bumping something that takes a great deal of work to even read will not help



      • #4
        Originally posted by Rich Goldstein
        I don't know about others, but I had a lot of trouble reading your question - so I stopped; please read the FAQ, esp. the part about (1) using CODE delimiters to make posts easier to read and (2) the advice on how to ask a question - bumping something that takes a great deal of work to even read will not help
        My goal:

        I'd like to use Stata's mi impute chained to impute all missing values across all imputed datasets.

        My problem:

        In the table that Stata returns at the end of mi impute chained, I see that not all incomplete values were imputed. Here is a subset of the table, reformatted in Excel:

        Observations per m
        variable      complete  incomplete  imputed  total
        dbp1              1859           1        0   1860
        glucose1          1804          56       44   1860
        hdl1              1836          24       16   1860
        tch1              1835          25       17   1860
        trigs1            1836          24       16   1860
        cholmeds11        1840          20       19   1860

        I am concerned that not all of the incomplete observations were imputed.

        My code:
        Code:
        *Impute data
        mi impute chained ///
             (ologit) edu ///
             (logit) apoe ///
             (regress) wmhvolume ///
             (regress) icvolume ///
             (logit) neurocog5 ///
             (regress) bmi1 ///
             (regress) sbp1 ///
             (regress) dbp1 ///
             (regress, include(c.glucose2)) glucose1 ///
             (regress) hdl1 ///
             (regress, include(c.tch2)) tch1 ///
             (regress) trigs1 ///
             (logit, omit(i.smoke5 i.black)) cholmeds11 ///
             (regress) bmi5 ///
             (ologit, omit(i.cholmeds11) include(i.smoke4)) smoke5 ///
             (regress) sbp5 ///
             (regress) dbp5 ///
             (regress, include(c.glucose4)) glucose5 ///
             (regress) hdl5 ///
             (regress) tch5 ///
             (regress) trigs5 ///
             (logit) cholmeds15 ///
             (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
                    include(c.bmi1 c.sbp1 c.dbp1 c.hdl1 c.tch1 c.trigs1)) dmmeds1 ///
             (logit, noimputed omit(i.smoke1 i.htnmeds1 i.htnmeds5 i.center) ///
                    include(c.bmi5 c.sbp5 c.dbp5 c.hdl5 c.tch5 c.trigs5)) dmmeds5 ///
             = i.male i.black i.center c.age1 c.agesq1 i.smoke1 ///
             i.htnmeds1 i.htnmeds5 ///
             c.fa1 c.fa2 c.fa3 c.fa4 c.fa5 c.fa6 c.fa7 ///
             c.md1 c.md2 c.md3 c.md4 c.md5 c.md6 c.md7 ///
             , add(5) burnin(100) rseed(`seed') augment force noisily ///
             savetrace("1-data\stata\MItrace_seed`seed'.dta", replace)
        *This code is edited from the OP, but it still yields the same problem.

        My previous unsuccessful attempts to solve the problem:

        1) Varying the seed
        2) Increasing the number of imputations and/or burn-ins
        3) Including or Omitting additional variables in the imputation equations
        4) Looking through the Stata MI manual for the "observations per m" table, like the one I posted above. Generally, these tables were only accompanied by notes explaining that Stata had, indeed, imputed all incomplete missing values. I did not find much in the way of useful information regarding how to troubleshoot in the event that Stata had not imputed all incomplete missing values. The one exception to this was the following passage:

        "Usually, the number of complete observations in the imputation sample...will be equal to the number of observations used in the estimation. Sometimes, however, observations may be dropped from the estimation—for example, if independent variables contain missing values. In this case, the number of complete observations in the imputation sample and the number of observations used in the estimation will be different, and the following note will appear following the table output: "Note: right-hand-side variables (or weights) have missing values; model parameters estimated using listwise deletion" You should evaluate such cases to verify that results are as expected. In general, missing values in independent variables (or in a weighting variable) do not affect the imputation sample but they may lead to missing imputed values."


        I do not understand why missing values in the independent variables would lead to missing imputed values, per the final sentence of the note above. Regardless, I would still like more information on how to address that issue if it does exist.
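
        One way to check whether that note applies is to look for missing values in the right-hand-side variables that are not themselves being imputed. The following is a minimal sketch, assuming it is run on the original data before mi impute chained; the variable names are taken from the command above, and the two pairings in the count lines are just examples.

        ```stata
        * Variables that enter only through include() and are not imputed
        misstable summarize glucose2 tch2 smoke4 glucose4, all

        * Complete predictors listed after the "=" sign
        misstable summarize male black center age1 agesq1 smoke1 htnmeds1 htnmeds5 ///
            fa1 fa2 fa3 fa4 fa5 fa6 fa7 md1 md2 md3 md4 md5 md6 md7, all

        * Observations where both the target and its include() variable are missing
        count if missing(smoke5) & missing(smoke4)
        count if missing(glucose1) & missing(glucose2)
        ```

        A nonzero count in either of the last two commands would point to observations whose imputed value could come out missing.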

        ___

        Thank you for your help and for taking the time to read this. Please let me know if the question, or the formatting of the question, is unclear.



        • #5
          I have had better luck using "ice" (type "search ice" to find and download it) than using "mi impute chained", though I don't know why. After running ice, the results can be brought into mi format with the "mi import ice" command. I don't know why your example failed, but thank you for making it legible.
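
          For readers who want to try that route, here is a rough sketch of the workflow. ice is a community-contributed command, so the options shown (m(), seed(), saving()) should be checked against "help ice" for the installed version. The variable subset, the seed, and the file name ice_imputed are placeholders, not a recommendation for this particular model.

          ```stata
          * Sketch only: ice is community-contributed; check -help ice- for your version
          search ice                      // locate and install the package

          * Impute a hypothetical subset of the variables, saving the stacked results
          ice edu apoe wmhvolume icvolume male black age1, ///
              m(5) seed(12345) saving(ice_imputed, replace)

          * Convert the ice-format dataset to official mi format
          use ice_imputed, clear
          mi import ice, automatic
          mi describe
          ```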



          • #6
            In case anyone else runs into this problem, too, I finally figured it out (or at least, figured it out in my case). Maddeningly simple, but important nonetheless...

            Identifying the problem: Every variable you "include" in a variable-specific imputation equation must be free of missing values, unless it is itself being imputed. Otherwise, wherever the included variable is missing, the missing value is simply imputed with another missing value. For instance, in my code above, I "include" smoke4 in the model that imputes smoke5. If smoke4 has missing values, mi impute chained cannot impute smoke5 for those observations.
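
            The mechanism can be illustrated outside of mi with a small sketch. It uses Stata's bundled auto dataset rather than the data from this thread, and an ordinary regress/predict pair as a stand-in for a single imputation equation. This is an analogy, not what mi impute chained runs internally. Wherever a right-hand-side variable is missing, the linear prediction is missing too, so there is nothing on which to base an imputed value.

            ```stata
            * Illustration with Stata's shipped auto data (not the data from this thread)
            sysuse auto, clear

            * rep78 is a predictor with some missing values
            count if missing(rep78)

            * Fit on complete cases, then predict for every observation
            regress mpg weight rep78
            predict double mpg_hat, xb

            * The prediction is missing exactly where the predictor is missing
            count if missing(mpg_hat)
            assert missing(mpg_hat) == missing(rep78)
            ```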

            Identifying the solution: Instead of "including" ancillary variables in variable-specific MI regressions, just put those variables in as additional variables to be imputed (and "omit" them, if necessary, from the other models, instead). That way, the missing values of the ancillary variables will be imputed, too, which in turn allows mi impute chained to impute all missing values for the "main" variables.
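
            As a sketch of what that restructuring could look like, here is a trimmed-down version of the command, not the full model. It assumes the data are already mi set with the other imputation variables registered. The seed is arbitrary, and the choice of regress/ologit for the ancillary variables is an assumption.

            ```stata
            * Sketch only: a trimmed-down version of the command above, not the full model.
            * glucose2, tch2, smoke4, and glucose4 move from include() to the left of "=".
            mi register imputed glucose2 tch2 smoke4 glucose4

            mi impute chained ///
                (regress) glucose1 glucose2 glucose4 glucose5 tch1 tch2 ///
                (ologit)  smoke4 smoke5 ///
                = i.male i.black i.center c.age1 c.agesq1, ///
                add(5) burnin(100) rseed(12345)
            ```

            If the variables after "=" are complete, the Imputed and Incomplete columns of the "Observations per m" table should then agree. Add omit() to individual equations where an ancillary variable should not enter as a predictor.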

            Hope this helps.

