  • Data management regarding new variables in imputed data

    Hi All

    I have some general inquiries about data management post multiple imputation:

    I have a longitudinal dataset with imputed data (data was imputed in the wide format). Post imputation, I converted the dataset to mlong (so data stays in wide format but imputations are in rows).

    Here comes my confusion:

    I was under the impression that data management in the mlong format (such as recoding or generating new variables based on existing imputed variables) wouldn't require mi passive:. But I think this is wrong?

    For example, I have a categorical variable childhood social class, that I would like to recode to have fewer categories:

    Code:
    tab childses
    tab childses, nolab
    recode childses (1 2 = 1) (3 = 2) (4 = 3) (5 = 4), gen(childsesx)
    tab childsesx
    
    label define childsesx 1 "I&II Professional/Managerial" 2 "IV&V partly/unskilled" 3 "III non-manual" 4 "III manual" 
    label values childsesx childsesx
    tab childsesx
    I don't know how to correctly recode a categorical variable like the one above, as mi est: does not support recode. And the above only recodes observations where _mi_m==0 (in hindsight, I should have done this before imputing).

    For a continuous variable like BMI, I require quadratic BMI for my models:

    mi passive: gen bmi3sq = bmi3*bmi3

    But the following:

    Code:
    gen bmi3sq = bmi3*bmi3
    works only for _mi_m==0, and this would affect the N in any regression model, even when run with mi est:

    Code:
    mi est: regress bmi i.childsesx
    will not include the total N...

    What are the best rules for data management post imputation?

    Many Thanks
    /Amal

  • #2
    Originally posted by Amal Khanolkar
    I have a longitudinal dataset with imputed data (data was imputed in the wide format). Post imputation, I converted the dataset to mlong (so data stays in wide format but imputations are in rows).
    Make sure to distinguish between wide format (as in: reshape wide) and (mi) wide style (as in: mi set wide); they are not the same thing!
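
    A minimal sketch of this distinction, assuming toy data: the variable names, seed, and number of imputations below are hypothetical and not taken from the thread. The data layout stays wide throughout (one row per person, one BMI column per wave), while the mi style, that is, how the imputations are stored, changes from mlong to wide.

```stata
* Toy data in wide LAYOUT: one row per person, bmi1 and bmi2 as columns
clear
set seed 1
set obs 50
generate id   = _n
generate bmi1 = rnormal(25, 4)
generate bmi2 = bmi1 + rnormal()
replace  bmi2 = . in 1/10            // 10 incomplete observations

* mi STYLE mlong: imputations are stored as extra rows
mi set mlong
mi register imputed bmi2
mi impute regress bmi2 bmi1, add(3) rseed(1)
mi query                             // reports the current style
display _N                           // original rows plus the imputed copies

* mi STYLE wide: imputations are stored as extra variables (_1_bmi2, ...)
mi convert wide, clear
mi query
display _N                           // back to one row per person
describe _*_bmi2
```

    In both styles the layout is still what reshape wide would produce; only the storage of the imputed values differs.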


    Originally posted by Amal Khanolkar
    I was under the impression that data management in the mlong format (such as recoding or generating new variables based on existing imputed variables) wouldn't require mi passive:. But I think this is wrong?
    Yes, that is wrong (except for specific situations).


    Originally posted by Amal Khanolkar
    [...] the above only recodes for those where _mi_m==0 (In hindsight I should have done this before imputing).
    I do not believe that is true either. When you store the imputed data in mlong style, only observations with missing values in at least one variable will have _mi_m > 0. However, for those observations that have missing values (in any variable), any data modification should definitely affect the values in the _mi_m > 0 observations, too. Check the results of your recode command to confirm this.
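
    One way to run such a check, sketched on toy data (the variables y, x, x2, the seed, and the number of imputations are assumptions made for illustration). It only shows which rows an ordinary generate touches in mlong style; it is not an argument for skipping mi passive.

```stata
* Toy data: 10 observations, x missing in 3 of them
clear
set seed 12345
set obs 10
generate y = rnormal()
generate x = y + rnormal()
replace  x = . in 1/3

mi set mlong
mi register imputed x
mi impute regress x y, add(5) rseed(12345)

* Rows per imputation: all observations at m = 0,
* only the incomplete ones at m > 0
tabulate _mi_m

* An ordinary generate runs over every row in memory
generate x2 = x^2

* Rows with _mi_m > 0 whose new variable is still missing
count if _mi_m > 0 & missing(x2)

* Rows at m = 0 where x2 is missing because x was never observed
count if _mi_m == 0 & missing(x2)
```

    If the first count is zero, the command did reach the imputed rows.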

    Concerning your idea of recoding before imputation, I would not necessarily agree. It might well make sense to use the information contained in the original data to create the imputed values. This is an interesting topic, by the way, but it is probably beyond the scope of this post.


    What could you do? I would probably just convert to flong style and use recode. However, the recommended (safe) way is probably to stick with mi passive. You can always express recode statements in terms of cond(). For example, you could write:

    Code:
    mi passive : generate childsesx = ///
        cond(childses == 1, 1,        ///
        cond(childses == 2, 1,        ///
        cond(childses == 3, 2,        ///
        cond(childses == 4, 3,        ///
        cond(childses == 5, 4,        ///
             childses)))))
    Because recode is implemented in terms of generate and replace, other solutions in terms of the latter two commands are naturally possible.
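
    The two alternatives mentioned above (generate plus replace under mi passive, and recode after converting to flong) could look roughly like this. It is a sketch on simulated data: y, bmisq, childsesr, the imputation models, the seed, and the number of imputations are all assumptions, not part of the original question.

```stata
* Toy data; only childses, childsesx and bmi echo names from the thread
clear
set seed 2020
set obs 300
generate y        = rnormal()
generate bmi      = 25 + 2*y + rnormal(0, 3)
generate childses = ceil(5 * runiform())
replace  childses = . in 1/30
replace  bmi      = . in 20/60

mi set mlong
mi register imputed childses bmi
mi impute chained (ologit) childses (regress) bmi = y, add(5) rseed(2020)

* Route 1: generate + replace, each wrapped in mi passive
mi passive: generate childsesx = childses
mi passive: replace  childsesx = 1 if childses == 2
mi passive: replace  childsesx = childses - 1 if inrange(childses, 3, 5)

* Squared term built from the imputed bmi values
mi passive: generate bmisq = bmi^2

* Route 2: flong keeps every observation in every imputation,
* so recode reaches all rows; register the result as passive
mi convert flong, clear
recode childses (1 2 = 1) (3 = 2) (4 = 3) (5 = 4), generate(childsesr)
mi register passive childsesr

* Both routes should agree in every row
assert childsesx == childsesr

mi estimate: regress y i.childsesx c.bmi c.bmisq
```

    Route 1 works in any mi style, while Route 2 trades a larger dataset in memory for the convenience of using recode directly.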

    By the way, if childses holds EGP classes, remember that IV represents the self-employed (not the unskilled).
    Last edited by daniel klein; 25 Nov 2020, 13:01.



    • #3
      Hello, I am trying to use collapse with MI data. I am imputing categorical variables (the dependent variable is a count). I am trying to aggregate the data across all five imputations, but collapse won't work. Any idea how I can perform collapse manually using mi xeq?



      • #4
        Please do not post the same question twice in different threads/topics. Start a new thread/topic for a new question.
