  • Initializing MICE to avoid patchy imputation

    When using multiple imputation by chained equations to impute several MAR variables, some sources suggest the following steps:
    1. Use mean value imputation for all missing variables as placeholders
    2. Set the placeholder back to missing for one variable to be imputed ("var")
    3. Use regression imputation for "var" (benefitting from complete case data thanks to step 1)
    4. Repeat steps 2-3 for each variable you want to impute
    5. Repeat steps 2-4 for a given number of cycles, updating the imputations each cycle, resulting in one imputed dataset
    6. Repeat steps 1-5 for a given number of imputations
    https://onlinelibrary.wiley.com/doi/...0.1002/mpr.329
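
    For illustration only, steps 1-5 above could be hand-rolled roughly as follows for two continuous variables. This is a simplified sketch under stated assumptions, not what mi impute chained runs internally: it uses the auto dataset with artificially deleted values, plain linear regression, and residual noise without redrawing the regression parameters, so it understates uncertainty. The indicator names miss_mpg and miss_weight are made up for the example.

    Code:
    * Hand-rolled illustration of steps 1-5 only (one imputed dataset)
    sysuse auto, clear
    set seed 42
    replace mpg    = . if runiform() < .2
    replace weight = . if runiform() < .2

    local vars mpg weight
    foreach v of local vars {
        generate byte miss_`v' = missing(`v')     // remember the originally missing cells
        quietly summarize `v'
        replace `v' = r(mean) if miss_`v'         // step 1: mean placeholders
    }

    forvalues cycle = 1/10 {                      // step 5: repeat for several cycles
        foreach v of local vars {                 // step 4: visit each variable in turn
            local others : list vars - v
            // steps 2-3: fit on rows where the target was observed,
            // using the currently filled-in values of the other variable
            quietly regress `v' `others' length i.foreign if !miss_`v'
            tempvar xb
            quietly predict double `xb', xb
            quietly replace `v' = `xb' + e(rmse)*rnormal() if miss_`v'
            drop `xb'
        }
    }

    Step 6 would repeat the whole block with fresh random draws for each imputed dataset.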

    The problem is that mi impute chained does not do steps 1 and 2!

    As a result, many values cannot be imputed because of missingness in the independent / auxiliary variables.

    Is there a way to get Stata's mi package, or a third-party package, to do steps 1 and 2 as part of the mi workflow?

  • #2
    Originally posted by Jonathan Afilalo
    As a result, many values cannot be imputed because of missingness in the independent / auxiliary variables.
    I think this is a misunderstanding. Please show syntax (and example data, if possible).

    My guess is that you are typing something like

    Code:
    mi impute chained ... varname ... = varname ...
    and have missing values in variables on both sides of the equals sign. You should have registered all variables with missing values as imputed and include them to the left of the equals sign. Variables to the right of the equals sign should not have missing values.
    Last edited by daniel klein; 29 Jun 2023, 07:17.

    • #3
      Originally posted by daniel klein
      Variables to the right of the equals sign should not have missing values.
      Of course. In the example below, imputation of hospitalized is incomplete because of missing values in bmi. This is precisely why some experts suggest initializing the imputation procedure with a simple mean imputation of all missing values as a "placeholder", and then resetting one "placeholder" at a time back to missing in order to impute it using a complete dataset. See steps 1-2 in the aforementioned workflow. How can I achieve this in Stata's mi package?

      Code:
      mi set wide
      mi register imputed hospitalized bmi
      mi impute chained (logit) hospitalized = age female bmi (regress) bmi = age female, add(20)
      Last edited by Jonathan Afilalo; 29 Jun 2023, 07:44.

      • #4
        Simple. Code:

        Code:
        [...]
        mi impute chained (logit) hospitalized (regress) bmi = age female, add(20)

        • #5
          Originally posted by daniel klein
          Simple. Code:
          Code:
          mi impute chained (logit) hospitalized (regress) bmi = age female, add(20)
          In this example, which independent variables will Stata use to impute the dependent variables hospitalized and bmi? I assume that it will use the stated auxiliary variables age and female, but will it also use other variables in the dataset? (I may be missing something basic here or just not getting through...)

          • #6
            Code:
            mi impute chained (logit) hospitalized (regress) bmi = age female, add(20)

            I am pretty sure that the independent variables (predictors) in any imputation model are a combination of (a) the auxiliary variables + (b) the other imputed variables, equivalent to:

            Code:
            logit hospitalized age female bmi
            regress bmi age female hospitalized

            Naturally, since there are missing values for bmi, I get this error:

            Code:
            hospitalized: missing imputed values produced
                This may occur when imputation variables are used as independent variables or when independent variables contain
                missing values. You can specify option force if you wish to proceed anyway.

            If I specify force then it will work but it will generate an incomplete imputed dataset. This is exactly what steps 1-2 in the aforementioned workflow aim to circumvent!!

            Still unsolved...

            • #7
              It's documented.

              For starters, type

              Code:
              mi impute chained ... , dryrun
              to see the model specifications. You will find something like

              Code:
              logit hospitalized bmi age female
              regress bmi i.hospitalized age female
              meaning that by default all variables are used in all equations. And, Stata does use steps 1 and 2 to fill in missing values in bmi and hospitalized where these variables appear as predictors.


              btw. female should probably be i.female

              Edit: The conditional specifications are shown differently from what I suggest here (see #9); the information is the same
              Last edited by daniel klein; 29 Jun 2023, 09:05.

              • #8
                Originally posted by Jonathan Afilalo
                Still unsolved...
                Really? Did you change your syntax as I advised? Please show the exact commands that you type and also the output that you get. Ideally, provide example data to reproduce the problem.

                • #9
                  Here is an example that should replicate your case:

                  Code:
                  sysuse auto
                  
                  summarize price
                  generate expensive = price > r(mean)
                  
                  keep expensive mpg weight foreign
                  
                  set seed 42
                  
                  replace expensive = . if runiform() < .2
                  replace mpg = . if runiform() < .2
                  
                  mi set wide
                  mi register imputed expensive mpg
                  mi impute chained (logit) expensive (regress) mpg = weight i.foreign , add(5)
                  Here is the relevant output

                  Code:
                  . mi impute chained (logit) expensive (regress) mpg = weight i.foreign , add(5)
                  
                  Conditional models:
                                 mpg: regress mpg i.expensive weight i.foreign
                           expensive: logit expensive mpg weight i.foreign
                  
                  Performing chained iterations ...
                  
                  Multivariate imputation                     Imputations =        5
                  Chained equations                                 added =        5
                  Imputed: m=1 through m=5                        updated =        0
                  
                  Initialization: monotone                     Iterations =       50
                                                                  burn-in =       10
                  
                           expensive: logistic regression
                                 mpg: linear regression
                  
                  ------------------------------------------------------------------
                                     |               Observations per m             
                                     |----------------------------------------------
                            Variable |   Complete   Incomplete   Imputed |     Total
                  -------------------+-----------------------------------+----------
                           expensive |         60           14        14 |        74
                                 mpg |         62           12        12 |        74
                  ------------------------------------------------------------------
                  (Complete + Incomplete = Total; Imputed is the minimum across m
                   of the number of filled-in observations.)
                  Note how i.expensive is a predictor for mpg and mpg is a predictor for expensive. Note that both expensive and mpg have missing values. Note that all missing values are imputed.
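
                  One way to double-check this after the run, sketched here with standard mi commands: the count should be 0 in each imputed dataset if every missing value was filled in.

                  Code:
                  mi describe
                  mi xeq 1 5: count if missing(expensive, mpg)

                  Only m=1 and m=5 are checked here to keep the output short; any list of imputation numbers works.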

                  • #10
                    I think it's OK now, thanks! (I may have missed a variable with missing values on my right-hand side.)

                    Was Stata doing steps 1-2 all along? Is this built into the standard mi impute workflow?

                    • #11
                      Originally posted by Jonathan Afilalo
                      Was Stata doing steps 1-2 all along? Is this built into the standard mi impute workflow?
                      Yes, since Stata 12, which introduced mi impute chained.

                      I believe the misunderstanding in syntax might arise because the help file for mi impute chained (more precisely, the syntax diagram in that help file) does not further explain the term indepvars. The documentation might benefit from adding something along these lines: indepvars are names of variables with no missing values that are used as predictors in all equations.
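
                      A quick way to see which variables belong on which side of the equals sign, sketched with the variable names from earlier in the thread: misstable summarize lists only the variables that actually contain missing values, so anything it reports should be registered as imputed and placed to the left.

                      Code:
                      misstable summarize hospitalized bmi age female

                      If a variable intended as an auxiliary predictor shows up in that table, it either needs to be imputed as well or left out of indepvars.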

                      • #12
                        Yes indeed, the documentation could have benefitted from an example and explanation like this:

                        Code:
                        mi set wide
                        mi register imputed IMP_VAR1_CONT IMP_VAR2_CONT IMP_VAR3_DICHOT
                        mi impute chained (regress) IMP_VAR1_CONT IMP_VAR2_CONT (logit) IMP_VAR3_DICHOT = AUX_VAR1_CONT i.AUX_VAR2_DICHOT, add(10)
                        mi estimate : regress IMP_VAR1_CONT IMP_VAR2_CONT AUX_VAR1_CONT
                        • IMP_VARs = variables with missing data (registered as imputed)
                        • AUX_VARs = complete predictors; they cannot have missing data
                        • in the later analysis model, IMP_VARs and AUX_VARs can be dependent vars, independent vars, or vars used only to impute
                        • a given IMP_VAR is imputed from all of (a) the other IMP_VARs and (b) the AUX_VARs, unless you specify omit(VAR)
                        • for example, the conditional models in the example above correspond to these 3 regression commands:
                          • regress IMP_VAR1_CONT IMP_VAR2_CONT i.IMP_VAR3_DICHOT AUX_VAR1_CONT i.AUX_VAR2_DICHOT
                          • regress IMP_VAR2_CONT IMP_VAR1_CONT i.IMP_VAR3_DICHOT AUX_VAR1_CONT i.AUX_VAR2_DICHOT
                          • logit IMP_VAR3_DICHOT IMP_VAR1_CONT IMP_VAR2_CONT AUX_VAR1_CONT i.AUX_VAR2_DICHOT
                        • the models are fit in order from the most observed to the least observed imputation variable, unless you specify orderasis
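
                        As a hedged sketch of the omit() option mentioned above, using the same placeholder variable names: adding dryrun prints the conditional models without imputing anything, which makes it easy to confirm that the omitted predictor has disappeared from the first equation only.

                        Code:
                        mi impute chained (regress, omit(IMP_VAR2_CONT)) IMP_VAR1_CONT (regress) IMP_VAR2_CONT (logit) IMP_VAR3_DICHOT = AUX_VAR1_CONT i.AUX_VAR2_DICHOT, dryrun

                        Once the specification looks right, replace dryrun with add(10) to actually create the imputations.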
