Using mi xeq for making new variables

Senor Massao

Join Date: Dec 2015

Posts: 26
#1

22 Mar 2016, 20:59

Hi,
I am having trouble creating a new variable in a multiply imputed dataset.
I need to create a new variable (h) that equals 1 if x1-x5 are all equal to 1, and 0 otherwise (excluding observations with any missing value on x1-x5, which arise in the m=0 dataset).
I used the following commands, but the number of nonmissing observations on h in datasets m=1-100 is still the same as in the m=0 dataset (in effect, the commands did nothing in datasets m=1-100 for the observations with missing values).
mi xeq: gen h =1 if !missing(x1, x2, x3, x4, x5)
mi xeq: replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
Can anyone tell me how to fix this?

Thanks,
Massao
Tags: categorical, data, multiple imputation
Clyde Schechter

Join Date: Apr 2014

Posts: 30191
#2

22 Mar 2016, 21:47

So, pretend you are Stata following these commands. If none of x1 through x5 is missing, you set h = 1. Then if all of those x1 through x5 are equal to 1, you change it to 0. Notice that you haven't been given any commands for what to do if any of the x's are missing: so those still have missing values!

So you need to give another command to cover the case where one or more of the x1-x5 are missing. From your written description, I can't figure out what it is you want to do in that case. But you need to figure that out and then tell Stata with an additional command to cover that case.

I should also add that your commands for creating h seem to be opposite to what you say in words. Your words say that h should be equal to 1 if x1-x5 are all equal to 1, but your command sets it to zero in that case.
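
As a rough sketch of covering every case in one command, and taking the wording in #1 at face value (h = 1 when all of x1-x5 equal 1), something like the following could be used on a plain, non-mi dataset. It assumes only the variable names x1-x5 from the post; whether rows with missing values should stay missing is still your call.

```stata
* Sketch only: follows the wording in #1 (h = 1 when x1-x5 all equal 1, else 0).
* The -if- qualifier leaves h missing whenever any of x1-x5 is missing.
generate byte h = (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1) if !missing(x1, x2, x3, x4, x5)
* Show all three outcomes (0, 1, and missing) to confirm each case is handled.
tabulate h, missing
```

Flipping the coding only requires negating the expression in parentheses.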
Comment
Senor Massao

Join Date: Dec 2015

Posts: 26
#3

23 Mar 2016, 00:24

Hi,

Thank you very much for the detailed reply, and sorry for the mistake: I wrote the exact opposite earlier. I need to create a new variable (h) that equals 0 if x1-x5 are all equal to 1, and 1 otherwise (excluding observations with any missing value on x1-x5, which arise in the m=0 dataset).

x1-x5 are different domains of health, each taking values from 1 to 3, where 1 represents healthy and 2 and 3 represent relatively unhealthy on that domain. The objective is to create a binary variable (h) that equals 0 for those in perfect health (i.e., with the value 1 on all of x1-x5) and 1 for everyone else (except those with any missing value on x1-x5). I have used -mi- to multiply impute x1-x5, but I need to run the analysis both on the unimputed data (excluding missing values) and on the multiply imputed data, for comparison.

When I do this:
gen h =1 if !missing(x1, x2, x3, x4, x5)
replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
It works on m=0: I obtain a binary variable that is missing for people with a missing value on any of x1-x5, 0 for those in perfect health, and 1 for any other combination of values on x1-x5.

Probably the problem is that, in the commands below, Stata still looks at x1-x5 in m=0 and not at the corresponding values in m=1-100. So I think I need to tell Stata somehow to use x1-x5 from m=0 only when it makes changes in m=0, and to use the values from the corresponding dataset for m=1-100.

mi xeq: gen h =1 if !missing(x1, x2, x3, x4, x5)
mi xeq: replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
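
A quick way to check this guess might be to look at how the data are mi set and to count missing values separately in m=0 and in one imputed dataset. This is only a sketch; it assumes the data are already mi set and that x1-x5 are registered as imputed.

```stata
* Sketch only: assumes the data are already -mi set- with x1-x5 registered as imputed.
* Report the mi style (wide, mlong, flong, or flongsep) and the number of imputations.
mi query
* List the registered variables and the number of incomplete observations.
mi describe
* Count rows with any missing x in the original data (m=0) and in the first imputation (m=1).
mi xeq 0 1: count if missing(x1, x2, x3, x4, x5)
```

If the count is zero in m=1 but positive in m=0, the imputed values are there, and the question becomes how h is being created rather than where the x values live.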

I would appreciate any clues on how to do that.

Thanks,
Massao.
Comment
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#4

23 Mar 2016, 01:51

Variables of this kind are easier to create before the imputation procedure; after imputation, every observation will then carry the same value in all the datasets.
Assuming your data are in long format (mi set flong) and you want to refer only to observations with complete data, you need to tell Stata to use only the information from the _mi_m==0 data. Something like this:

Code:

bys _mi_id (_mi_m): g H_cap=h[1]

Otherwise, Stata will generate the variable according to the imputed values specific to each of the mi datasets.
Comment
Senor Massao

Join Date: Dec 2015

Posts: 26
#5

23 Mar 2016, 02:43

Thank you for the reply. The data is in wide format.
Comment

Oded Mcdossi

Join Date: Jun 2014
Posts: 577
#6

23 Mar 2016, 03:35

I think there is some confusion about the terms: your data are in wide format but, according to your description in #1 and #3, your imputed data are in long format. See this example with fake data:

Code:

//CREATE FAKE DATA
clear*
set obs 50
g id=_n
forval i=1/3 {
    gen x`i' = floor((4-1+1)*runiform() + 1)
    replace x`i'=. if x`i'==4
}    
g b6=runiform()
g b7=runiform()
su x*

//IMPUTATION OF MISSING VALUES

mi set flong
set seed 1234
mi register imputed x*
mi register regular b*
mi impute chain (mlogit) x*=b*, augment add(3) 

//GEN THE INDICATOR
mi xeq: gen h =1 if !missing(x1,x2)
mi xeq: replace h=0 if (x1 == 1 & x2 == 1)

//THE PROPOSED SOLUTION TO FLAG OBSERVATIONS
bys _mi_id (_mi_m): g H_cap=h[1] 

// COMPARE THE RESULTS
list _mi_m x1 x2 h H_cap if _mi_id<5, sepby(_mi_id)

Comment

Senor Massao

Join Date: Dec 2015

Posts: 26
#7

24 Mar 2016, 23:07

Thank you very much.
I tried the following commands, and they worked.
Code:

mi passive: generate h = 1 if (x1<. & x2<. & x3<. & x4<. & x5<.)
mi xeq: replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
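
The two steps could probably be collapsed into a single -mi passive- call. What follows is only a sketch, assuming the data are already mi set and x1-x5 are registered as imputed; h2 is a made-up name so that it does not clash with the h created above.

```stata
* Sketch only: h2 is a hypothetical name, used so it does not clash with the existing h.
* The expression is 0 for perfect health (all of x1-x5 equal 1) and 1 for any other combination.
* The -if- qualifier leaves h2 missing wherever any of x1-x5 is missing, which should
* happen only in m=0 once x1-x5 have been fully imputed.
mi passive: generate byte h2 = !(x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1) if !missing(x1, x2, x3, x4, x5)
* Compare the distribution in the original data (m=0) and in the first imputation (m=1).
mi xeq 0 1: tabulate h2, missing
```

Keeping the whole definition inside -mi passive- means the variable is registered as passive and computed from each dataset's own values of x1-x5.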
Comment
