Data management regarding new variables in imputed data

Amal Khanolkar

Join Date: Feb 2015

Posts: 142
#1


25 Nov 2020, 11:16

Hi All

I have some general inquiries about data management post multiple imputation:

I have a longitudinal dataset with imputed data (the data were imputed in wide format). After imputation, I converted the dataset to mlong (so the data stay in wide format, but the imputations are stacked in rows).

Here comes my confusion:

I was under the impression that data management in the mlong format (such as recoding or generating new variables based on existing imputed variables) wouldn't require mi passive. But I think this is wrong?

For example, I have a categorical variable childhood social class, that I would like to recode to have fewer categories:

Code:

tab childses
tab childses, nolab
recode childses (1 2 = 1) (3 = 2) (4 = 3) (5 = 4), gen(childsesx)
tab childsesx
label define childsesx 1 "I&II Professional/Managerial" 2 "IV&V partly/unskilled" 3 "III non-manual" 4 "III manual"
label values childsesx childsesx
tab childsesx

I don't know how to correctly recode a categorical variable like the one above, as recode is not supported with the mi passive: prefix. And the above only recodes for those where _mi_m==0 (in hindsight, I should have done this before imputing).

For a continuous variable like BMI, I require quadratic BMI for my models:

mi passive: generate bmi3sq = bmi3*bmi3

But the following:

Code:

generate bmi3sq = bmi3*bmi3

works only for _mi_m==0, and this would affect the N in any regression model, even when run with mi est:

Code:

mi est: bmi i.childsesx

will not include the total N...
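For the quadratic term specifically, a new variable is not strictly needed. A minimal sketch, assuming the data are mi set and bmi3 is a registered imputed variable; outcome is a hypothetical placeholder for the model's dependent variable, which the post does not name. Factor-variable notation builds the squared term inside each per-imputation fit that mi estimate runs:

```stata
* Build the quadratic term inside the model with factor-variable notation,
* so no new variable has to be created or registered in the mi data.
* c.bmi3##c.bmi3 expands to bmi3 and bmi3*bmi3.
* outcome is a hypothetical placeholder for the dependent variable.
mi estimate: regress outcome c.bmi3##c.bmi3 i.childsesx
```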

What are the best rules for data management post imputation?

Many Thanks
/Amal
daniel klein

Join Date: Mar 2014

Posts: 3860
#2

25 Nov 2020, 12:57

Originally posted by Amal Khanolkar

I have a longitudinal dataset with imputed data (the data were imputed in wide format). After imputation, I converted the dataset to mlong (so the data stay in wide format, but the imputations are stacked in rows).

Make sure to distinguish between wide format (as in: reshape wide) and (mi) wide style (as in: mi set wide); they are not the same thing!
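The distinction can be checked directly. A minimal sketch, assuming the data are already mi set: mi query reports the current mi style, and mi convert changes only how the imputations are stored, not the substantive layout produced by reshape.

```stata
* Report how the mi data are stored (the "style"):
* wide, mlong, flong, or flongsep. This is unrelated to reshape wide/long.
mi query

* Change only the mi storage style; substantive variables such as
* bmi1, bmi2, bmi3 (one row per person) keep their reshape layout
mi convert mlong, clear
mi query
```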

Originally posted by Amal Khanolkar

I was under the impression that data management in the mlong format (such as recoding or generating new variables based on existing imputed variables) wouldn't require mi passive. But I think this is wrong?

Yes, that is wrong (except for specific situations).

Originally posted by Amal Khanolkar

[...] the above only recodes for those where _mi_m==0 (in hindsight, I should have done this before imputing).

I do not believe that is true either. When you store the imputed data in mlong style, only observations with missing values in at least one variable will have rows with _mi_m > 0. However, for those observations that have missing values (in any variable), any data modifications should definitely affect the values in the _mi_m > 0 rows, too. Check the results of your recode command to confirm this.
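One way to run that check, sketched under the assumption that childsesx was created by the recode in the question and the data are mi set: mi xeq runs a command separately on the original data (m = 0) and on each listed imputation, so cross-tabulating the original and recoded variables shows whether imputed values were recoded as well.

```stata
* Cross-tabulate original and recoded codes in m = 0 and in the first
* two imputations; the missing option keeps rows where childsesx is
* missing, which would flag imputed values that were not recoded
mi xeq 0 1 2: tabulate childses childsesx, missing
```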

Concerning your idea of recoding before imputation, I would not necessarily agree. It might well make sense to use the information contained in the original data to create the imputed values. This is an interesting topic, by the way, but it is probably beyond the scope of this post.

What could you do? I would probably just convert to flong style and use recode. However, the recommended (safe) way is probably to stick with mi passive. You can always express recode statements in terms of cond(). For example, you could type

Code:

mi passive : generate childsesx = ///
    cond(childses == 1, 1, ///
    cond(childses == 2, 1, ///
    cond(childses == 3, 2, ///
    cond(childses == 4, 3, ///
    cond(childses == 5, 4, ///
    childses)))))

Because recode is implemented in terms of generate and replace, other solutions in terms of the latter two commands are naturally possible.
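As an illustration, a sketch of the same category mapping written with mi passive and generate/replace, assuming childses is a registered imputed variable coded 1 to 5. Each replace conditions on the original childses rather than on childsesx, so the order of the steps does not matter and no value is recoded twice; codes outside 1 to 5 (including missing) are carried over unchanged, as in the cond() version.

```stata
* Start from a copy of the original codes (code 1 already maps to 1)
mi passive: generate childsesx = childses

* Collapse categories; every condition refers to childses, not childsesx
mi passive: replace childsesx = 1 if childses == 2
mi passive: replace childsesx = 2 if childses == 3
mi passive: replace childsesx = 3 if childses == 4
mi passive: replace childsesx = 4 if childses == 5
```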

By the way, if childses holds EGP classes, remember that IV represents the self-employed (not the unskilled).
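For the flong route suggested above, a minimal sketch, assuming childses is registered as imputed and the data are currently mi set in mlong style. In flong style every imputation is a complete copy of the data, so an ordinary recode processes each copy; registering the result as passive tells mi that its values are derived from imputed data and may differ across imputations.

```stata
* Store every imputation (m = 0, 1, ..., M) as a complete copy of the data
mi convert flong, clear

* An ordinary recode now processes the rows of every copy
recode childses (1 2 = 1) (3 = 2) (4 = 3) (5 = 4), generate(childsesx)

* Register the new variable as passive (derived from an imputed variable)
mi register passive childsesx

* Optionally switch back to the more compact mlong style
mi convert mlong, clear
```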

Last edited by daniel klein; 25 Nov 2020, 13:01.
Mohammad Mansour

Join Date: Jan 2021

Posts: 21
#3

23 Jan 2021, 15:13

Hello, I am trying to use collapse with MI data. I am imputing categorical variables (the dependent variable is a count). I am trying to aggregate the data across all five imputations, but collapse won't work. Any idea how I can perform the collapse manually using mi xeq?
daniel klein

Join Date: Mar 2014

Posts: 3860
#4

23 Jan 2021, 15:40

Please do not post the same question twice in different threads/topics. Start a new thread/topic for a new question.
