  • Creating Combination variable from two variables

    Dear forum members,
    I am reasonably familiar with Stata data management but have recently been stumped by a problem that I am sure has a simple answer:
    1. I have two variables, phakia_re and phakia_le, each taking the values 0 to 2 (0 = phakia, 1 = aphakia, 2 = pseudophakia), one for each eye of a person.
    2. Each observation represents a person.
    3. I need to create a person-level status from combinations of these two variables. The possible combinations are: 00, 01, 02, 11, 12, 22.
    The simple way is to hand-code these, but I was looking for a command that does this more elegantly.
    I tried generating operatedCatType_person using egen, group(), but it did not help, since the results were permutations rather than combinations.

    Code:
     list phakia_re phakia_le , nol clean noobs
        phakia_re   phakia_le  
               0          2  
               2          0  
               2          0  
               2          2  
               2          0  
               2          0  
               2          0  
               2          2  
               2          0
    egen operatedCatType_person = group(phakia_re phakia_le)
    list phakia_re phakia_le operatedCatType_person , nol clean noobs
        phaki~re   phaki~le   operat~1  
               0          2            2  
               2          0            6  
               2          0            6  
               2          2            8  
               2          0            6  
               2          0            6  
               2          0            6
    The same happened with the groups command from SSC. Any help will be much appreciated.
    Thanks
    Vivek

    Stata 15.1 (MP 2 core)
    https://www.epidemiology.tech/category/stata/
    Google Scholar Profile

  • #2
    Generalizing with a random dataset
    Code:
    clear
    set seed 956
    set obs 500
    generate a = floor((4-1+1)*runiform() + 1)
    generate b = floor((4-1+1)*runiform() + 1)
    tab a b
    egen c = group(a b), lab
    tab c 
     group(a b) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            1 1 |         25        5.00        5.00
            1 2 |         41        8.20       13.20
            1 3 |         27        5.40       18.60
            1 4 |         29        5.80       24.40
            2 1 |         36        7.20       31.60
            2 2 |         36        7.20       38.80
            2 3 |         31        6.20       45.00
            2 4 |         27        5.40       50.40
            3 1 |         32        6.40       56.80
            3 2 |         32        6.40       63.20
            3 3 |         33        6.60       69.80
            3 4 |         33        6.60       76.40
            4 1 |         33        6.60       83.00
            4 2 |         27        5.40       88.40
            4 3 |         31        6.20       94.60
            4 4 |         27        5.40      100.00
    ------------+-----------------------------------
          Total |        500      100.00
    
     groups a b
    
      +-------------------------+
      | a   b   Freq.   Percent |
      |-------------------------|
      | 1   1      25      5.00 |
      | 1   2      41      8.20 |
      | 1   3      27      5.40 |
      | 1   4      29      5.80 |
      | 2   1      36      7.20 |
      |-------------------------|
      | 2   2      36      7.20 |
      | 2   3      31      6.20 |
      | 2   4      27      5.40 |
      | 3   1      32      6.40 |
      | 3   2      32      6.40 |
      |-------------------------|
      | 3   3      33      6.60 |
      | 3   4      33      6.60 |
      | 4   1      33      6.60 |
      | 4   2      27      5.40 |
      | 4   3      31      6.20 |
      |-------------------------|
      | 4   4      27      5.40 |
      +-------------------------+


    • #3
      Hi Vivek,

      While there might be a more elegant solution to this problem, this gets the job done:
      Code:
      egen min = rowmin(phakia_re phakia_le)
      egen max = rowmax(phakia_re phakia_le)
      egen operatedCatType_person = concat(min max)
      Here, I assume that a value of 20 is the same as 02, so I use the auxiliary egen variables to determine the values' positions in the combination.
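
One way to check this approach is to run it on all nine ordered pairs and confirm that only six values come out. This is a minimal sketch, not part of the original post: the helper names lo and hi are made up for illustration, and the nine-row dataset is constructed only for the check.

```stata
clear
* Build all nine ordered pairs of the values 0, 1, 2
set obs 9
generate phakia_re = floor((_n - 1) / 3)
generate phakia_le = mod(_n - 1, 3)

* Smaller value first, larger value second, then glue them together
egen lo = rowmin(phakia_re phakia_le)
egen hi = rowmax(phakia_re phakia_le)
egen operatedCatType_person = concat(lo hi)

* The table should show the six combinations 00 01 02 11 12 22
tabulate operatedCatType_person
```

Because the smaller value always comes first, 20 and 02 end up in the same category.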



      • #4
        Hi Mathias, thanks! This indeed does the job. 20 is the same as 02, as you correctly assumed. I wonder, though, how we could scale this to three or more variables.


        • #5
          You can generalize with the package rowsort.

          Take a look at this example script, where I generate combinations with five variables (all between 0-4):
          Code:
          clear all
          set obs 20
          
          forvalues i = 1/5 {
              qui gen var`i' = .
              qui replace var`i' = runiformint(0,4)
          }
          
          rowsort var1-var5, gen(s1-s5)
          egen combination = concat(s*)
          
          drop s*
          sort combination
          list, noobs
          The result is the output below. Note, for instance, the value 03444 of the variable combination. It occurs twice, even though the values of the original variables are in different orders.
          Code:
          . clear all
          
          . set obs 20
          number of observations (_N) was 0, now 20
          
          .
          . forvalues i = 1/5 {
            2.         qui gen var`i' = .
            3.         qui replace var`i' = runiformint(0,4)
            4. }
          
          .
          . rowsort var1-var5, gen(s1-s5)
          
          . egen combination = concat(s*)
          
          .
          . drop s*
          
          . sort combination
          
          . list, noobs
          
            +---------------------------------------------+
            | var1   var2   var3   var4   var5   combin~n |
            |---------------------------------------------|
            |    0      0      0      1      2      00012 |
            |    1      0      4      0      1      00114 |
            |    0      1      3      3      0      00133 |
            |    3      4      0      0      4      00344 |
            |    1      1      3      0      4      01134 |
            |---------------------------------------------|
            |    1      4      1      3      0      01134 |
            |    0      4      1      2      2      01224 |
            |    4      0      3      1      2      01234 |
            |    4      4      1      2      0      01244 |
            |    4      0      4      1      3      01344 |
            |---------------------------------------------|
            |    2      4      2      4      0      02244 |
            |    3      3      0      3      2      02333 |
            |    4      4      2      3      0      02344 |
            |    4      4      2      0      4      02444 |
            |    3      4      0      3      3      03334 |
            |---------------------------------------------|
            |    3      4      4      0      4      03444 |
            |    4      0      4      4      3      03444 |
            |    2      4      1      3      1      11234 |
            |    2      3      3      2      1      12233 |
            |    2      2      2      2      3      22223 |
            +---------------------------------------------+
          I hope this is what you wanted to obtain.
          Last edited by Mathias Pedersen Heinze; 20 Oct 2016, 03:28.
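
If a numeric identifier is preferred over a string, the sorted variables can also be passed to egen's group() function. The following is a sketch under assumptions: rowsort must already be installed (it is the user-written command from the Stata Journal discussed in this thread), the seed is arbitrary, and the name combo_id is made up for illustration.

```stata
clear
set seed 2016        // arbitrary seed, only to make the sketch reproducible
set obs 20

* Five variables, each a random integer between 0 and 4
forvalues i = 1/5 {
    generate var`i' = runiformint(0, 4)
}

* Sort the values within each observation (requires rowsort to be installed)
rowsort var1-var5, generate(s1-s5)

* One integer per distinct sorted combination, labelled with the sorted values
egen combo_id = group(s1-s5), label

drop s1-s5
tabulate combo_id
```

A numeric variable with value labels sorts and tabulates like the string version, and it does not become ambiguous if some values have more than one digit.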



          • #6
            It's always pleasant to hear any of my programs mentioned (groups (SSC) in #1 and rowsort (SJ) in #5).

            On rowsort the implicit reference is

            SJ-9-1 pr0046: Speaking Stata: Rowwise (N. J. Cox)
            (help rowsort, rowranks if installed)
            Q1/09 SJ 9(1):137--157
            shows how to exploit functions, egen functions, and Mata
            for working rowwise; rowsort and rowranks are introduced


            http://www.stata-journal.com/sjpdf.h...iclenum=pr0046

            I note here that the problem originally posed for two variables yields to simple trickery with functions and no intermediate variables or special commands are needed at all.

            Directly to the point here, similar problems with pairs of variables were discussed in

            SJ-8-4 dm0043: Tip 71: The problem of split identity, or how to group dyads (N. J. Cox)
            Q4/08 SJ 8(4):588--591 (no commands)
            tip on how to handle dyadic identifiers


            http://www.stata-journal.com/sjpdf.h...iclenum=dm0043

            The availability of free .pdf versions should not discourage people from citing these, and freely.


            Code:
            clear
            input phakia_re   phakia_le  
                       0          2  
                       2          0  
                       2          0  
                       2          2  
                       2          0  
                       2          0  
                       2          0  
                       2          2  
                       2          0
            end
            
            gen class = string(min(phakia_re, phakia_le)) + string(max(phakia_re, phakia_le))
            
            list, sep(0)
            
                 +-----------------------------+
                 | phaki~re   phaki~le   class |
                 |-----------------------------|
              1. |        0          2      02 |
              2. |        2          0      02 |
              3. |        2          0      02 |
              4. |        2          2      22 |
              5. |        2          0      02 |
              6. |        2          0      02 |
              7. |        2          0      02 |
              8. |        2          2      22 |
              9. |        2          0      02 |
                 +-----------------------------+
            The logic is thus that the first character is the smaller digit and the second character is the larger digit. We can be sure that ties don't bite or even complicate the problem.
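
Ties are harmless, but missing values deserve a guard. Stata's min() and max() ignore missing arguments, so a person with one eye missing would otherwise be coded as if both eyes had the same status. A minimal sketch, using made-up rows with missings that are not in the original data:

```stata
clear
input phakia_re phakia_le
0 2
2 0
. 2
1 .
. .
end

* Only classify people with both eyes recorded; the rest get an empty string
generate class = string(min(phakia_re, phakia_le)) + string(max(phakia_re, phakia_le)) ///
    if !missing(phakia_re, phakia_le)

list, sep(0)
```

Without the if qualifier, the third row would come out as 22 rather than as unclassified.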

            Mathias is clearly right that other tools are needed to make this easy for three or more such variables.
            Last edited by Nick Cox; 20 Oct 2016, 04:12.



            • #7
              Nick has the better solution for the two variables case.

              Originally posted by Nick Cox:
              It's always pleasant to hear any of my programs mentioned (groups (SSC) in #1 and rowsort (SJ) in #5).
              By the way, thanks for writing these very useful programs, and sorry about the missing credits and reference.
              Last edited by Mathias Pedersen Heinze; 20 Oct 2016, 04:22.



              • #8
                Thanks, Mathias. I was not aware of rowsort but am looking into it now. As you have amply demonstrated, it gets the job done, and at the end of the day that's what matters.
                Many thanks, Nick, for the plethora of useful packages you have contributed, including groups and rowsort. I just read through the article on dyads, Stata tip 71, and it is very interesting indeed.
                Best,
                Vivek


                • #9
                  Vivek Gupta and Nick Cox: I would like to create a new variable holding the different possible combinations (without duplicates) of six variables (each one from 1 to 3). Of note, the code 000003 is different from 300000. The final variable should display a space between the numbers, that is, 0 0 0 0 0 3 instead of 000003. Sorry if my request is not a standard one. Regards.



                  • #10
                    I don't really understand #9. But I will try. It seems that you have 6 variables, each of which may vary from 1 to 3 -- or is it 0 to 3? The post seems contradictory. If so, then there are 3^6 = 729 or 4^6 = 4096 possibilities, say

                    1 1 1 1 1 1
                    1 1 1 1 1 2

                    to

                    3 3 3 3 3 2
                    3 3 3 3 3 3

                    or starting

                    0 0 0 0 0 0
                    0 0 0 0 0 1

                    Do you want these as a new dataset or do you wish to check which of those occur in an existing dataset? Or is it something different completely?
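
Under the first reading (a new dataset holding every ordered combination), and assuming the digits run from 0 to 3, a minimal sketch could look like this. The names d1-d6 and combo are made up for illustration; punct() is the option of egen's concat() function that inserts the given text between values.

```stata
clear
set obs 4096         // 4^6 ordered combinations of the digits 0 to 3

* Decode the observation number (0 to 4095) into six base-4 digits
forvalues j = 1/6 {
    generate byte d`j' = mod(floor((_n - 1) / 4^(6 - `j')), 4)
}

* Join the digits with a space between them, e.g. 0 0 0 0 0 3
egen combo = concat(d1-d6), punct(" ")

list combo in 1/2
list combo in -2/l
```

Since order matters here, no sorting within rows is needed; 0 0 0 0 0 3 and 3 0 0 0 0 0 stay distinct.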



                    • #11
                      Sorry Nick Cox for the mistake.

                      starting :
                      0 0 0 0 0 0
                      0 0 0 0 0 1

                      So I am trying to construct a loop to run a group-based multi-trajectory model, testing different combinations of polynomial fitting. I was thinking of using a "for each value" construct in this loop, over the different combinations that I asked about.
                      I hope that this is clear.

                      traj , multgroups(6) var1(beverage1*) indep1(age_*) model1(beta) order1(3 3 3 3 3 3) ///
                      var2(beverage2*) indep2(age_*) model2(zip) order2(3 3 3 3 3 2) iorder2(-1) ///
                      var3(beverage3*) indep3(age_*) model3(zip) order3(3 3 3 3 3 1) iorder3(-1) ///
                      var4(beverage4*) indep4(age_*) model4(logit) order4(3 3 3 3 3 0)
                      Thank you Sir.



                      • #12
                        I think this is becoming clearer as a question about applying traj -- about which I know nothing. I suggest you start a new thread naming traj in the title and spell out your full question. (And please explain where that command comes from.)



                        • #13
                          Many Thanks Nick
