  • Using mi xeq for making new variables

    Hi,
    I am having trouble creating a new variable in a multiply imputed dataset.
    I need to make a new variable (h) that is equal to 1 if x1-x5 are all equal to 1, and 0 otherwise (left missing if any of x1-x5 is missing, which happens in the m=0 dataset).
    I used the following commands, but the number of non-missing observations of 'h' in datasets m=1-100 is still equal to the number in the m=0 dataset (basically, these commands did nothing in datasets m=1-100 for the observations with missing values).
    mi xeq: gen h =1 if !missing(x1, x2, x3, x4, x5)
    mi xeq: replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
    Can anyone tell me how to fix this?

    Thankfully,
    Massao

  • #2
    So, pretend you are Stata following these commands. If none of x1 through x5 is missing, you set h = 1. Then if all of those x1 through x5 are equal to 1, you change it to 0. Notice that you haven't been given any commands for what to do if any of the x's are missing: so those still have missing values!

    So you need to give another command to cover the case where one or more of the x1-x5 are missing. From your written description, I can't figure out what it is you want to do in that case. But you need to figure that out and then tell Stata with an additional command to cover that case.

    I should also add that your commands for creating h seem to be opposite to what you say in words. Your words say that h should be equal to 1 if x1-x5 are all equal to 1, but your command sets it to zero in that case.
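
    To make the three cases concrete, here is a minimal sketch on a small made-up dataset (the four rows below are hypothetical, not the poster's data, and no mi commands are involved). It runs the two commands from #1 as written and shows that rows with any missing x never receive a value:

    ```stata
    * Hypothetical toy data: five health items, two rows with a missing x2
    clear
    input x1 x2 x3 x4 x5
    1 1 1 1 1
    1 2 1 3 1
    1 . 1 1 1
    2 . 3 1 1
    end

    * Case 1: none of x1-x5 is missing, so h starts at 1
    generate h = 1 if !missing(x1, x2, x3, x4, x5)

    * Case 2: all of x1-x5 equal 1, so h is changed to 0
    replace h = 0 if x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1

    * Case 3: some x is missing; no command covers it, so h stays missing
    list, clean
    count if missing(h)
    ```

    Here the last two rows keep a missing h, because neither command's if condition is true for them.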


    • #3
      Hi,

      Thank you very much for the detailed reply, and sorry for the mistake: my earlier description was the opposite of what I meant. I need to make a new variable (h) that is equal to 0 if x1-x5 are all equal to 1, and 1 otherwise (left missing if any of x1-x5 is missing, which happens in the m=0 dataset).

      x1-x5 are different domains of health, each taking values from 1 to 3, where 1 represents healthy and 2 and 3 represent relatively unhealthy on that domain. The objective is to create a binary variable (h) that takes the value 0 for those with perfect health (i.e. the value 1 on all of x1-x5) and the value 1 for everyone else (except those with any missing value on x1-x5). I have used mi for multiple imputation of the x1-x5 variables. But I need to run the analysis both on the unimputed data (excluding missing values) and on the multiply imputed data, for comparison.

      When I do this:
      gen h =1 if !missing(x1, x2, x3, x4, x5)
      replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
      It works on m=0: I obtain a binary variable that is missing for people with a missing value on any of x1-x5, 0 for those with perfect health, and 1 for those with any other combination of values on x1-x5.

      Probably the problem is that, with the commands below, Stata still looks at x1-x5 in m=0 and not at the corresponding values in m=1-100. So I think I need to tell Stata somehow that it should look at x1-x5 in m=0 only when it makes changes in m=0, and that for m=1-100 it should look at the values in the corresponding dataset.

      mi xeq: gen h =1 if !missing(x1, x2, x3, x4, x5)
      mi xeq: replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)

      I would appreciate any clues on how to do that.

      Thankfully,
      Massao.


      • #4
        These kinds of variables are easier to create before the imputation procedure, because each observation then carries the same value in all the datasets.
        Assuming your data are in long format (mi set flong) and you want the variable to reflect only the originally observed data, you need to tell Stata to use only the information from the _mi_m==0 data, something like this:
        Code:
        bys _mi_id (_mi_m): g H_cap=h[1]
        Otherwise, Stata will generate the variable from the imputed values specific to each of the mi datasets.


        • #5
          Thank you for the reply. The data is in wide format.


          • #6
            I think there is some confusion about the terms: your data may be in wide format, but according to your description in #1 and #3, your imputed data are in long format. See this example with fake data:
            Code:
            //CREATE FAKE DATA
            clear*
            set obs 50
            g id=_n
            forval i=1/3 {
                gen x`i' = floor((4-1+1)*runiform() + 1)
                replace x`i'=. if x`i'==4
            }    
            g b6=runiform()
            g b7=runiform()
            su x*
            
            //IMPUTATION OF MISSING VALUES
            
            mi set flong
            set seed 1234
            mi register imputed x*
            mi register regular b*
            mi impute chain (mlogit) x*=b*, augment add(3) 
            
            //GEN THE INDICATOR
            mi xeq: gen h =1 if !missing(x1,x2)
            mi xeq: replace h=0 if (x1 == 1 & x2 == 1)
            
            //THE PROPOSED SOLUTION TO FLAG OBSERVATIONS
            bys _mi_id (_mi_m): g H_cap=h[1] 
            
            // COMPARE THE RESULTS
            list _mi_m x1 x2 h H_cap if _mi_id<5, sepby(_mi_id)


            • #7
              Thank you very much.
              I tried the following commands, and it worked.
              Code:

              mi passive: generate h = 1 if (x1<. & x2<. & x3<. & x4<. & x5<.)
              mi xeq: replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
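
              For anyone who wants to try this idea end to end, below is a rough sketch on fake data, not the data from this thread. Everything in it is an assumption made for illustration: two items instead of five, the covariate b, the wide style, predictive mean matching via mi impute pmm, and runiformint() (which needs a reasonably recent Stata). It builds h in a single mi passive command and then tabulates it in m=0 and m=1.

              ```stata
              * Fake data (hypothetical): two 1-3 items, x1 has 10 missing values
              clear
              set seed 1234
              set obs 50
              generate x1 = runiformint(1, 3)
              generate x2 = runiformint(1, 3)
              generate b  = runiform()
              replace x1 = . in 1/10

              * Declare the mi structure and impute x1 (pmm is an assumption,
              * chosen only because it returns observed values 1, 2 or 3)
              mi set wide
              mi register imputed x1
              mi register regular x2 b
              mi impute pmm x1 x2 b, add(3) knn(5) rseed(1234)

              * h = 0 for perfect health, 1 otherwise, missing if any x is missing;
              * mi passive evaluates the expression separately in each m
              mi passive: generate h = !(x1 == 1 & x2 == 1) if !missing(x1, x2)

              * Compare the original data (m=0) with the first imputation (m=1)
              mi xeq 0 1: tabulate h, missing
              ```

              If this runs as intended, h should be missing for the 10 incomplete observations in m=0 and filled in for them in m=1, which is the comparison asked about in #3.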
