  • Running MICE with birthyear or age?

    Hi all,

    This is a bit of a general question and I may be overlooking a rather simple answer. I currently have a wide format dataset in which all respondents have a single variable measuring year of birth. Further, every wave has an age variable which captures the respondent's age in years, but if the respondent is missing in a particular wave, the age for the wave is system missing.

    When running a multiple imputation, should I use birth year as a variable that has full information and place it to the right of the equals sign in the MICE command? Or should I manually fill in age from birth year and the known interview year for waves in which respondents were not present in the data, and then impute using the wave-specific age measures?

    I hope this makes sense and I can provide data examples if needed.

  • #2
    If you can derive age when it's missing at a given wave then you should do that instead. It makes little sense to use imputation in this case.
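A minimal sketch of the derivation suggested above, written in Python for illustration only. The interview years per wave, the variable names, and the example respondent are all hypothetical; the calculation also assumes that interview year minus birth year is an acceptable approximation of age, ignoring whether the birthday had already passed at interview.

```python
from typing import Optional

# Hypothetical interview years per wave; replace with the survey's actual fieldwork years.
INTERVIEW_YEAR = {1: 2010, 2: 2012, 3: 2014}

def fill_age(birth_year: int, observed_age: Optional[int], wave: int) -> int:
    """Return the observed age if present; otherwise derive it from birth year."""
    if observed_age is not None:
        return observed_age
    # Approximation: ignores whether the birthday had passed at interview time.
    return INTERVIEW_YEAR[wave] - birth_year

# One made-up respondent in wide format: age is missing in wave 2.
respondent = {"birth_year": 1968, "age": {1: 42, 2: None, 3: 46}}

filled = {
    wave: fill_age(respondent["birth_year"], age, wave)
    for wave, age in respondent["age"].items()
}
```

Because the filled-in value is a deterministic function of birth year, it carries no information beyond what birth year already provides.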

    • #3
      I do not plan to impute age itself; rather, I want to use age as an auxiliary variable in the multiple imputation equation.

      • #4
        Are you going to impute in a wide format, i.e., with one observation per respondent (as often suggested)? If so, then multiple age variables that all increase by exactly 1 for all observations in each wave will be collinear and, thus, dropped from the model.
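The collinearity point can be checked on toy data. The sketch below uses made-up birth years and two hypothetical interview years one year apart, and assumes every respondent is interviewed in the same year within a wave. The helper `pearson` is written out by hand only to keep the example free of dependencies.

```python
# Made-up birth years and two hypothetical waves interviewed in 2010 and 2011.
birth_year = [1950, 1968, 1975, 1982, 1990]
age_w1 = [2010 - b for b in birth_year]
age_w2 = [2011 - b for b in birth_year]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# The wave-specific ages differ by the same constant for everyone ...
diffs = {later - earlier for earlier, later in zip(age_w1, age_w2)}

# ... so they are perfectly correlated with each other and with birth year.
r_ages = pearson(age_w1, age_w2)       # correlation of +1
r_birth = pearson(birth_year, age_w1)  # correlation of -1
```

With a single distinct difference across respondents, the second age variable is the first plus a constant, which is why an imputation model with an intercept cannot use both.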

        • #5
          Yes, the data is currently in wide format for that reason. So continue on with birth year in the equation?

          • #6
            Given the wide format, birth year and age are collinear, i.e. contain the same information; it does not matter which one you use. You just cannot use both or use age in more than one wave.

            • #7
              Thanks, Daniel! I just wasn't sure whether imputing from age in a single year (e.g., 55 years old) would give significantly different results from imputing on birth year (e.g., 1968), given the difference in distribution.

              • #8
                I do not completely follow. Age in years and birth year have the same distribution in terms of the 2nd and 4th moments, and the 3rd moment only changes sign, because age is just the interview year minus birth year. The mean will obviously differ, but that only changes the constant in the regression model. Either way, the results should be identical.
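A small numerical check of this argument, using made-up data. The interview year (2023), the birth years, and the outcome values are all hypothetical, and a hand-written single-predictor least-squares fit stands in for the imputation model. Since age is the interview year minus birth year, the even central moments should match, the odd ones should flip sign, and the fitted values should agree.

```python
def central_moment(x, k):
    """k-th central moment of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** k for v in x) / len(x)

def ols(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

INTERVIEW_YEAR = 2023  # hypothetical single interview year
birth_year = [1950, 1968, 1975, 1982, 1990, 1991]
age = [INTERVIEW_YEAR - b for b in birth_year]
outcome = [3.1, 2.4, 2.9, 1.8, 1.5, 1.1]  # made-up values of some other variable

# 2nd, 3rd, and 4th central moments of each predictor.
moments_birth = [central_moment(birth_year, k) for k in (2, 3, 4)]
moments_age = [central_moment(age, k) for k in (2, 3, 4)]

# Same regression fitted on either predictor.
const_birth, slope_birth = ols(birth_year, outcome)
const_age, slope_age = ols(age, outcome)

fitted_birth = [const_birth + slope_birth * b for b in birth_year]
fitted_age = [const_age + slope_age * a for a in age]
```

In this sketch the slope changes sign and the constant shifts, but the predictions are the same, which is the sense in which the choice between age and birth year does not matter.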
