
  • Multiple imputation for only one group in the dataset?

    Hello - long time reader, first time poster

    I have a survey of ~900 people including males and females. A small number of females (~20) were not asked a few questions on pregnancy and breastfeeding (we added those questions to the survey in week 2 of the study - oops!). I would like to use multiple imputation to address this missingness. Obviously I don't want to impute values for the males in the survey - they should continue to have missing values for these questions. I have done this successfully and conducted univariable analyses using code that looks like this, just dropping men and then running the MI:

    Code:
    preserve 
    drop if gender==1
    
    mi set mlong
    
    mi register imputed pregnant breastfeeding
        
    mi impute monotone (logit) pregnant breastfeeding = age totalchildren, add(10) rseed(12345) augment
    
    mi estimate: proportion pregnant
    mi estimate, or: xtlogit outcome pregnant, re
        
    mi estimate: proportion breastfeeding 
    mi estimate, or: xtlogit outcome breastfeeding, re
    
    restore
    But now, I would like to conduct multivariable analyses including both male and female participants, and including both the pregnancy/breastfeeding variables as well as other variables with no missingness. Is there a way to only conduct the MI for females but use the imputed values in a regression that includes males? I tried adding "replace pregnant = . if gender==1" after my "mi impute" command, but the estimates are different, I'm guessing because the imputation values are very different when including the ~400 men in the survey.

    Thank you so much for any advice!

  • #2
    I think you have two options here. You can use conditional imputation or, which might be simpler in your case, set the variables of interest to a constant for the male subsample, like:
    Code:
    replace pregnant = 999 if gender == 1
    and then run the imputation with the by option, like
    Code:
    mi impute monotone (logit) pregnant = age totalchildren, add(10) rseed(12345) augment by(gender)
    Best wishes

    (Stata 16.1 MP)



    • #3
      Thank you so much, Felix! Unfortunately when I do the above, I am given an error when I try to run the imputation - because the values do not vary across men:

      Code:
       mi impute monotone (logit) _pregnant _breastfeeding = _age a4b, add(10) rseed(12
      > 345) augment by(_gender)
      
      Performing setup for each by() group:
      
      -> _gender = Man
      outcome does not vary; remember:
                                        0 = negative outcome,
              all other nonmissing values = positive outcome
       -- above applies to specification (logit ) _pregnant = _age a4b
      I tried to introduce variation (give some men 0s, some 1s using another variable to decide who gets which value), and the imputation then runs correctly, but I think I then need to replace the values before I do any estimation commands. I tried just using the following:

      Code:
         
      replace _pregnant = . if _gender==1
      replace _breastfeeding = . if _gender==1
      But of course this doesn't replace the imputed values, so the estimations I'm getting are way off. So, if I use that strategy, is there a way to replace the values in each m#? (I hope I'm asking this question right!)

      Thank you again
      M



      • #4
        I think the technical difficulty can be resolved by adding:

        Code:
        mi impute monotone (logit) _pregnant _breastfeeding = ///
        _age a4b, add(10) rseed(12345) augment by(_gender, nostop)
        However, having a breastfeeding variable in a sample that includes males may have its own conceptual issues.



        • #5
          If Andrew's solution does not help you, the Stata manual provides an example for conditional imputation as such:
          Code:
          webuse mheart10s0, clear
          mi impute chained ///
                      (pmm, knn(5)) bmi ///
                      (pmm, knn(5)) age ///
                      (logit, cond(if smokes==1) omit(i.smokes)) hightar ///
                      (logit) smokes = attack hsgrad female, add(10)
          If a person smokes, they may also smoke high-tar cigarettes; if they do not smoke, hightar should always be 0. Smoking itself can also be missing. I think this example translates well to your problem. Just make sure that pregnancy and breastfeeding have the value 0 for all men before you impute the data.
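
          For the variables in the first post, a minimal sketch of that idea might look like the following. It is untested against the actual data and rests on several assumptions: that gender==1 codes men (as in the original drop statement), that gender, age and totalchildren are complete, that the women's missing values follow a monotone pattern, and that the panel settings are already declared (for example with mi xtset). The covariates in the final model are placeholders only.
          Code:
          * assumption: gender==1 is men; give them the structural value 0 before mi set
          replace pregnant = 0 if gender == 1
          replace breastfeeding = 0 if gender == 1

          mi set mlong
          mi register imputed pregnant breastfeeding
          mi register regular gender age totalchildren

          * impute only within the conditional sample (everyone who is not coded as a man);
          * observations outside it keep the constant 0 in every imputed dataset
          mi impute monotone (logit, conditional(if gender != 1)) pregnant breastfeeding ///
              = age totalchildren, add(10) rseed(12345) augment

          * multivariable model on the full sample; covariates here are placeholders
          mi estimate, or: xtlogit outcome pregnant breastfeeding i.gender age, re
          Because the men carry a nonmissing 0 rather than a missing value, they stay in the estimation sample, and the imputation model for the women is fitted without them.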


          Best wishes

          (Stata 16.1 MP)
