  • Create new variable by combining different variables

    Hello,

    I hope someone can help me. I want to create a new variable by combining five other variables. The five variables all take the value "TRUE" or "FALSE". I would like the new variable to show "TRUE" if any of the five is "TRUE", and "FALSE" only if all five are "FALSE".
    The values are coded FALSE=1 and TRUE=2. Would it be possible to change the FALSE value from 1 to 0 and then count the observations that end up with a value of 0? If so, how would this be done?

    Does anyone know how to do this in Stata? I have a very large database, so doing it all by hand would take very long.

    Thank you very much for your help; it is much appreciated.

    Isabel
    Last edited by Isabel Hostettler; 29 Jan 2015, 11:30.

  • #2
    So, let's call your five starting variables var1 var2 var3 var4 and var5

    Code:
    gen new_var = "FALSE"
    foreach v of varlist var1 var2 var3 var4 var5 {
         replace new_var = "TRUE" if `v' == "TRUE"
    }
    Now, that said, there are a few catches. First, string variables can be treacherous: this code will only work properly if everything is in upper case and there are no extra spaces around the variables, and no misspellings.
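
One way to guard against the case and spacing problems mentioned above is to normalize the strings before comparing them. This is only a sketch on toy data: var1 and var2 stand in for the five real variables, and it assumes the only defects are stray blanks and lower-case letters (misspellings would still trip the assert).

```stata
* Toy data with deliberately messy strings (hypothetical values).
clear
input str8 var1 str8 var2
"TRUE"   "false "
" true"  "FALSE"
"FALSE"  "FALSE"
end

foreach v of varlist var1 var2 {
    * Remove leading/trailing blanks and force upper case.
    replace `v' = upper(trim(`v'))
    * Stop with an error if anything other than TRUE/FALSE remains.
    assert inlist(`v', "TRUE", "FALSE")
}

list
```

If the assert fails, tabulating the offending variable shows which unexpected spellings are present.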

    Moreover, you don't say how you want to handle missing values of var1 through var5.

    Finally, for most purposes it is better to deal with true/false variables by coding them numerically as 0/1. So here's what you might think about doing:

    Code:
    label define boolean 0 "FALSE" 1 "TRUE"
    foreach v of varlist var1 var2 var3 var4 var5 {
         assert inlist(`v', "TRUE", "FALSE")
         // the label name must match the one defined above (label names are case-sensitive)
         encode `v', gen(numeric_`v') label(boolean)
    }
    
    // AND NOW CALCULATE THE NEW VARIABLE FROM THESE USING THE LOGICAL OR OPERATOR
    gen byte new_var = numeric_var1 | numeric_var2 | numeric_var3 | numeric_var4 | numeric_var5
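
On the missing-values question raised above: Stata treats a numeric missing value as nonzero, so a plain | counts it as true. The sketch below uses toy data, with numeric_var1 and numeric_var2 standing in for the five encoded variables. The rule it applies (leave the result missing when nothing is true but something is missing) is an assumption for illustration, not something stated in this thread.

```stata
* Toy data: 0 = false, 1 = true, . = missing.
clear
input byte numeric_var1 byte numeric_var2
0 0
1 .
0 .
end

* Naive OR: the missing value in row 3 is nonzero, so it counts as true.
gen byte naive = numeric_var1 | numeric_var2

* Explicit comparisons: true only if some variable actually equals 1.
gen byte new_var = (numeric_var1 == 1) | (numeric_var2 == 1)

* Assumed policy: result is unknown if nothing is true but something is missing.
replace new_var = . if new_var == 0 & missing(numeric_var1, numeric_var2)

list
```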


    • #3
      Dear Clyde,

      thank you very much for your answer. I have encoded all the variables so they are numerical now.




      This is just an example of one of the variables. None of the five variables has any missing values. Applying the command you wrote above to my data, I would write:
      gen byte DCI = fndasDCI | gcsdropasDCI | dysphasiaasDCI | hemiasDCI | infarctasDCI
      but then the new variable consists only of ones, although there are people who surely should have a "FALSE" (that is, a "1") on the new variable, which would be called DCI.

      Do you know where I've made the mistake?

      Thank you very much for your help.

      Kind regards,

      Isabel


      • #4
        The value of your new variable will be 1 (true) if any of the five variables is 1 (true). Isn't that precisely what you asked for in #1?


        • #5
          Yes, that's what I asked for, but the new variable consists only of 1. There is no other value, so no "false", and this is not correct. Even going over the first few patients, there are some who don't have a TRUE in any of the variables but end up with a 1. So there is a mistake somewhere.


          • #6
            We therefore need to see the code that produced the wrong result, with sufficient data to reproduce it. Note that if you worked with 1 and 2 as the original codes for false and true, then any "or" operation will always produce 1 as a result.

            Code:
             
            . di 1 | 2
            1
            
            . di 1 | 1
            1
            
            . di 2 | 2
            1
            See also http://www.stata.com/support/faqs/da...lse/index.html
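
If you would rather not recode, one alternative is to compare each variable with the code that means true instead of using its raw value. A minimal sketch on toy data, assuming 1 = false and 2 = true as described in #1; a and b are hypothetical stand-ins for the real variables.

```stata
* Toy data coded 1 = false, 2 = true.
clear
input byte a byte b
1 1
1 2
2 2
end

* Raw OR: 1 and 2 are both nonzero, so this is 1 in every row.
gen byte any_naive = a | b

* Compare against the code for true (assumed to be 2) instead.
gen byte any_true = (a == 2) | (b == 2)

* Equivalent shorthand: is 2 among the listed values?
gen byte any_true2 = inlist(2, a, b)

list
```

The inlist() form scales more comfortably to five variables than a chain of == comparisons.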


            • #7
              Okay, I have now recoded the five variables so that TRUE is 0 and FALSE is 1. Now it gives me some 0 and some FALSE, but it is still not the case that every patient who has a TRUE/0 in at least one of the five variables has a TRUE/0 in the new variable. I'm really sorry to bother you. My statistician is on leave for another 2 weeks and I need the new variable because it is the outcome. I've attached the five variables.

              Thank you very much for your help.


              • #8
                We need to see therefore the code that produced the wrong result
                Still true.


                • #9
                  That was the code:

                  gen byte DCI = fndasDCI | gcsdropasDCI | dysphasasDCI | hemiasDCI | infarctasDCI


                  • #10
                    Thanks for providing data and code.

                    The problem is exactly as I explained in #6.
                    The result of applying | to values that are 1 or 2 will always be 1. The FAQ referred to in #6 explains that logical operations in Stata take 0 to mean false and any other number to mean true.

                    Although you claim in #7 to have recoded, the dataset you posted is in terms of 1 and 2.


                    Nevertheless the solution is still to shift your false = 1, true = 2 coding by subtracting 1. The groups command used here is not part of official Stata; to use it, you would first need to install it with ssc inst groups.

                    Code:
                    . count
                     1729
                    
                    . su
                    
                        Variable |       Obs        Mean    Std. Dev.       Min        Max
                    -------------+--------------------------------------------------------
                         studyid |         0
                    dysphasasDCI |      1729    1.086755    .2815576          1          2
                    infarctasDCI |      1729    1.114517    .3185304          1          2
                    gcsdropasDCI |      1729    1.107577    .3099346          1          2
                        fndasDCI |      1729     1.17004     .375777          1          2
                    -------------+--------------------------------------------------------
                       hemiasDCI |      1729    1.131868    .3384452          1          2
                    
                    . foreach v of var *DCI {
                      2. replace `v' = `v' - 1
                      3. }
                    (1729 real changes made)
                    (1729 real changes made)
                    (1729 real changes made)
                    (1729 real changes made)
                    (1729 real changes made)
                    
                    . gen byte DCI = fndasDCI | gcsdropasDCI | dysphasasDCI | hemiasDCI | infarctasDCI
                    
                    . label def boolean 0 false 1 true
                    
                    . foreach v of var *DCI {
                      2. label val `v' boolean
                      3. }
                    
                    . groups *DCI
                    
                      +--------------------------------------------------------------------------------+
                      | dyspha~I   infarc~I   gcsdro~I   fndasDCI   hemias~I     DCI   Freq.   Percent |
                      |--------------------------------------------------------------------------------|
                      |    false      false      false      false      false   false    1348     77.96 |
                      |    false      false      false      false       true    true       5      0.29 |
                      |    false      false      false       true      false    true      11      0.64 |
                      |    false      false      false       true       true    true      35      2.02 |
                      |    false      false       true      false      false    true      19      1.10 |
                      |--------------------------------------------------------------------------------|
                      |    false      false       true       true      false    true      10      0.58 |
                      |    false      false       true       true       true    true      27      1.56 |
                      |    false       true      false      false      false    true      33      1.91 |
                      |    false       true      false      false       true    true       7      0.40 |
                      |    false       true      false       true      false    true       6      0.35 |
                      |--------------------------------------------------------------------------------|
                      |    false       true      false       true       true    true      29      1.68 |
                      |    false       true       true      false      false    true      10      0.58 |
                      |    false       true       true      false       true    true       2      0.12 |
                      |    false       true       true       true      false    true       2      0.12 |
                      |    false       true       true       true       true    true      35      2.02 |
                      |--------------------------------------------------------------------------------|
                      |     true      false      false      false      false    true       2      0.12 |
                      |     true      false      false      false       true    true       3      0.17 |
                      |     true      false      false       true      false    true      17      0.98 |
                      |     true      false      false       true       true    true      18      1.04 |
                      |     true      false       true      false      false    true       2      0.12 |
                      |--------------------------------------------------------------------------------|
                      |     true      false       true       true      false    true      13      0.75 |
                      |     true      false       true       true       true    true      21      1.21 |
                      |     true       true      false      false      false    true       2      0.12 |
                      |     true       true      false       true      false    true      12      0.69 |
                      |     true       true      false       true       true    true      15      0.87 |
                      |--------------------------------------------------------------------------------|
                      |     true       true       true      false      false    true       2      0.12 |
                      |     true       true       true       true      false    true      12      0.69 |
                      |     true       true       true       true       true    true      31      1.79 |
                      +--------------------------------------------------------------------------------+
                    .
                    Last edited by Nick Cox; 30 Jan 2015, 08:19.


                    • #11
                      Thank you. I'll download the command and try. Yes, it's still 1 and 2, as I didn't save it after changing it. Thank you very much for your great help.


                      • #12
                        Isabel: Please register with your full name. See FAQ Advice 6 for why and how.


                        • #13
                          The usual boolean coding is 0 for false and 1 for true; in post #7 you did it the other way round. To modify the dataset you attached to post #7, you should:
                          Code:
                          // variables are listed explicitly: a range such as fndasDCI - infarctasDCI depends on their order in the dataset
                          foreach V of varlist fndasDCI gcsdropasDCI dysphasasDCI hemiasDCI infarctasDCI {
                             replace `V' = `V' - 1
                          }
                          label define boole 0 "false" 1 "true"
                          label values fndasDCI gcsdropasDCI dysphasasDCI hemiasDCI infarctasDCI boole
                          and this should give the desired result:
                          Code:
                          egen ntrue = rowtotal(fndasDCI gcsdropasDCI dysphasasDCI hemiasDCI infarctasDCI)
                          Now, ntrue holds the number of variables with the value 1, meaning true.
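
A count of true values can be turned into the yes/no outcome asked for in #1 with one more step. The sketch below uses toy 0/1 data; x1 to x3 are hypothetical names standing in for the five DCI variables, and the data are assumed to have no missing values.

```stata
* Toy data already coded 0 = false, 1 = true.
clear
input byte x1 byte x2 byte x3
0 0 0
0 1 0
1 1 1
end

* Number of variables equal to 1 in each row.
egen ntrue = rowtotal(x1 x2 x3)

* 1 if at least one variable is true, 0 otherwise.
gen byte any = ntrue > 0

* Same result for 0/1 data: the row maximum is 1 exactly when some value is 1.
egen byte any2 = rowmax(x1 x2 x3)

list
```

With missing values the two routes differ: rowtotal() treats missing as 0, whereas rowmax() ignores missing values and returns missing only when every variable in the row is missing.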


                          • #14
                            Thank you very much. Really. I will register with my full name.
