  • Stratifying after multiple imputation


    I have estimated a full regression model after conducting a multiple imputation. My outcome is dichotomous (delivery) and one of my predictors is race. I would like to estimate a full regression model stratifying by race, but am unable to find the code that will do this. I have searched the forum and Stata help as suggested and have not come up with anything. I prefer to use stratification rather than interaction terms.

    Any suggestions would be greatly appreciated.

    My apologies in advance if this is too elementary a question. I am a doctoral candidate, so I am somewhat inexperienced.

    Thank you in advance. Am looking forward to assistance since time is of the essence!


  • #2
    Welcome to this forum.
    Including -i.race- or imposing an -if- condition would probably do the trick.
    Finally, two asides, both covered in the FAQ:
    - your chances of getting helpful replies depend on posting exactly what you typed and what Stata gave you back;
    - highlighting the urgency of your post (which, as always, is the original poster's business) often decreases the likelihood of your query getting a reply.
    Kind regards,
    (Stata 15.1 SE)


    • #3
      Does race contain missing values in the original dataset? If not, just impute by(race) then impose the if conditions in mi estimate, as Carlo suggests. Also as Carlo suggests, show us the code that you have used.



      • #4
        Thank you Carlo and Daniel. My apologies for not including the original code... the crux of the problem is that I'm not quite sure how to write up my code to stratify by race after the MI has been conducted (since I would like to use the full dataset).

        Race did not contain any missing values and was not one of the imputed variables. For my original full regression model, I ran 35 imputations using the following code:

        mi estimate, dots or: logistic deliverytype i.age i.race i.marital .... i.x
        I then sorted by race and attempted to use an -if- condition, but am unable to figure out the proper code. As an example, I used the following code, but I have tried placing the -if- condition in various positions with no result.

        mi estimate, dots or by (race): logistic deliverytype i.age i.marital .... i.x
        Each time I receive the following error:

        option by(race) not allowed
        Appreciative of the help,


        • #5
          Hi Roxanne,

          I have exactly the same question as you, so I'm wondering whether you found out how to do the stratification after multiple imputation?
          I would love to hear from you.




          • #6
            I do not recall why I did not follow up on the original thread, but shouldn't a simple

            mi estimate : logit depvar indepvars if race==#
            do what you ask for?
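
            As a minimal sketch of that idea, assuming race is numeric and the data are already mi set and imputed, one could loop over the observed race codes. The variable names are taken from #4; the covariates elided there are left out here rather than guessed, so the model shown is incomplete on purpose.

            ```stata
            * Sketch only: deliverytype, age and marital come from #4;
            * the covariates elided in #4 are not reproduced here.
            * levelsof collects the distinct race codes present in the data.
            levelsof race, local(races)
            foreach r of local races {
                display as text _newline "Stratum: race == `r'"
                * The if qualifier restricts the analysis model to one stratum
                mi estimate, dots or: logistic deliverytype i.age i.marital if race == `r'
            }
            ```

            If mi estimate reports that the estimation sample varies across imputations, its esampvaryok option suppresses that error, though a varying sample is usually worth investigating before overriding the check.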



            • #7
              Hi Roxanne and Milou

              The error is appearing for a good reason. The validity of multiple imputation inference depends partly on the analysis model (that you specify after mi estimate:) and imputation model (specified within mi impute) being 'compatible'. This comes from Meng's seminal paper 'Multiple-Imputation Inferences with Uncongenial Sources of Input'.

              When the two models make different assumptions, the Rubin variance estimator (on which the standard errors and confidence intervals produced by mi estimate are based) fails. Here, your imputation model is based on the whole sample, but the analysis is done separately for each value of race (each on a smaller sample than that used in the imputation model), so the conditioning sets are different. If you wanted to stratify by a partially observed (and imputed) variable, this would be a pretty tricky problem. But since race is fully observed, you can just form separate datasets for each value of race and do the imputation and analysis separately within each.
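
              For reference, the Rubin combination rules mentioned above can be written as follows. The notation is an assumption of this note rather than anything used in the thread: M is the number of imputations (35 in #4), \hat{Q}_m is the estimate from the m-th completed dataset, and U_m is its estimated variance.

              ```latex
              \bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
              \bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m, \qquad
              B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m - \bar{Q}\bigr)^2, \qquad
              T = \bar{U} + \Bigl(1 + \frac{1}{M}\Bigr) B
              ```

              T is the total variance behind the reported standard errors: the within-imputation part \bar{U} plus the inflated between-imputation part B.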

              You may be thinking 'But I want to use all the data together for imputation.' If you were to do this, you would likely get bias (the imputation model is implicitly borrowing information from one race to help impute information about another, but the analysis part is not aware of this assumption: they are incompatible) and Rubin's variance estimator also goes wrong (see above).

              In summary: if you want to stratify the analysis by race, do so before doing multiple imputation, not after.



              • #8
                While I naturally agree with the reminder that the imputation and estimation models must be compatible, I am not so sure that you really need different (completed) datasets, stratified by race. I believe that imputing [not estimating] separately by(race), as I suggested in #3, should be OK. I am happy to be proven wrong.
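
                A rough sketch of that workflow, starting from the original (not yet mi set) data, might look like the following. The thread does not say which variables were imputed or by which method, so the imputed variables, the mlogit method, the seed and the race code 1 are all placeholders.

                ```stata
                * Hypothetical setup: age and marital stand in for the imputed
                * variables, which the thread does not name.
                mi set mlong
                mi register imputed age marital
                mi register regular race deliverytype

                * by(race) runs the imputation separately within each race group;
                * add(35) matches the 35 imputations mentioned in #4.
                mi impute chained (mlogit) age marital = i.deliverytype, add(35) by(race) rseed(12345)

                * Analyse one stratum at a time (race code 1 is an assumed value)
                mi estimate, dots or: logistic deliverytype i.age i.marital if race == 1
                ```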

                Last edited by daniel klein; 27 Nov 2018, 10:19.


                • #9
                  Yes, you are quite right that your way is sound. Under the bonnet/hood, using by() with mi impute does the imputation separately for each value of race, then combines the imputed datasets back into one.

                  My suggestion was mainly a) to make explicit that the imputation and analysis are being done separately for each value of race, and b) to avoid, in practice, the errors you get when trying to use by(), if and similar with mi estimate (I recall something strange can happen when you use if with the estimation command in mi estimate... though I now can't recall what it was).