Creating Combination variable from two variables

Vivek Gupta

Join Date: Sep 2016

Posts: 25
#1

Creating Combination variable from two variables

20 Oct 2016, 02:02

Dear forum members,
I am reasonably familiar with Stata data management but have recently been stumped by this problem, which I am sure has a simple answer. The problem is that
1. I have two variables, phakia_re and phakia_le, both of which take values from 0 to 2 (0 = phakia, 1 = aphakia, 2 = pseudophakia), one for each eye of a person
2. Each observation represents a person
3. I need to create a person-level status from combinations of these two variables. The possible combinations are: 00, 01, 02, 11, 12, 22
The simple way is to hand-code these, but I was looking for a more elegant command designed for this purpose.
I have tried generating operatedCatType_person using egen, group(), but it did not help, since the results were permutations rather than combinations.

Code:

list phakia_re phakia_le , nol clean noobs

    phakia_re   phakia_le
            0           2
            2           0
            2           0
            2           2
            2           0
            2           0
            2           0
            2           2
            2           0

egen operatedCatType_person = group(phakia_re phakia_le)

list phakia_re phakia_le operatedCatType_person , nol clean noobs

    phaki~re   phaki~le   operat~1
           0          2          2
           2          0          6
           2          0          6
           2          2          8
           2          0          6
           2          0          6
           2          0          6

Same was the case with the SSC groups command. Any help will be much appreciated.
Thanks
Vivek

Stata 15.1 (MP 2 core)
https://www.epidemiology.tech/category/stata/
Google Scholar Profile

Vivek Gupta

Join Date: Sep 2016
Posts: 25
#2

20 Oct 2016, 02:41

Generalizing with a random dataset

Code:

clear
set seed 956
set obs 500
generate a = floor((4-1+1)*runiform() + 1)
generate b = floor((4-1+1)*runiform() + 1)
tab a b
egen c = group(a b), lab
tab c 
 group(a b) |      Freq.     Percent        Cum.
------------+-----------------------------------
        1 1 |         25        5.00        5.00
        1 2 |         41        8.20       13.20
        1 3 |         27        5.40       18.60
        1 4 |         29        5.80       24.40
        2 1 |         36        7.20       31.60
        2 2 |         36        7.20       38.80
        2 3 |         31        6.20       45.00
        2 4 |         27        5.40       50.40
        3 1 |         32        6.40       56.80
        3 2 |         32        6.40       63.20
        3 3 |         33        6.60       69.80
        3 4 |         33        6.60       76.40
        4 1 |         33        6.60       83.00
        4 2 |         27        5.40       88.40
        4 3 |         31        6.20       94.60
        4 4 |         27        5.40      100.00
------------+-----------------------------------
      Total |        500      100.00

 groups a b

  +-------------------------+
  | a   b   Freq.   Percent |
  |-------------------------|
  | 1   1      25      5.00 |
  | 1   2      41      8.20 |
  | 1   3      27      5.40 |
  | 1   4      29      5.80 |
  | 2   1      36      7.20 |
  |-------------------------|
  | 2   2      36      7.20 |
  | 2   3      31      6.20 |
  | 2   4      27      5.40 |
  | 3   1      32      6.40 |
  | 3   2      32      6.40 |
  |-------------------------|
  | 3   3      33      6.60 |
  | 3   4      33      6.60 |
  | 4   1      33      6.60 |
  | 4   2      27      5.40 |
  | 4   3      31      6.20 |
  |-------------------------|
  | 4   4      27      5.40 |
  +-------------------------+


Mathias Pedersen Heinze

Join Date: Jun 2015

Posts: 78
#3

20 Oct 2016, 02:43

Hi Vivek,

While there might be a more elegant solution to this problem, this gets the job done:

Code:

egen min = rowmin(phakia_re phakia_le)
egen max = rowmax(phakia_re phakia_le)
egen operatedCatType_person = concat(min max)

Here, I assume that a value of 20 is the same as 02, and thus I use the auxiliary egen variables to determine the values' positions in the combination.
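If a labelled numeric variable is preferred to a string, the same min/max idea can be sketched as below. This is only an illustration: the variable names eye_lo, eye_hi and status, the value label, and the four example rows are assumptions, and the sketch presumes both eye variables are non-missing and coded 0, 1 or 2.

```stata
clear
input phakia_re phakia_le
0 2
2 0
1 2
2 2
end

* order-free pair: smaller code first, larger code second
egen byte eye_lo = rowmin(phakia_re phakia_le)
egen byte eye_hi = rowmax(phakia_re phakia_le)

* hypothetical numeric person-level code: 10*smaller + larger
gen byte status = 10*eye_lo + eye_hi
label define status 0 "00" 1 "01" 2 "02" 11 "11" 12 "12" 22 "22"
label values status status

drop eye_lo eye_hi
tabulate status
```

A numeric variable with value labels can be passed directly to tabulate and to factor-variable notation, which a string variable cannot.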
Vivek Gupta

Join Date: Sep 2016

Posts: 25
#4

20 Oct 2016, 02:57

Hi Mathias, thanks! This indeed does the job. 20 is the same as 02, as you have correctly assumed. I am wondering, though, how we could scale this to three or more variables.


Mathias Pedersen Heinze

Join Date: Jun 2015
Posts: 78
#5

20 Oct 2016, 03:25

You can generalize with the package rowsort.

Take a look at this example script, where I generate combinations with five variables (all between 0-4):

Code:

clear all
set obs 20

forvalues i = 1/5 {
    qui gen var`i' = .
    qui replace var`i' = runiformint(0,4)
}

rowsort var1-var5, gen(s1-s5)
egen combination = concat(s*)

drop s*
sort combination
list, noobs

The result is the output below. Note, for instance, the value 03444 of the variable combination. It appears twice, even though the values of the original variables are in a different order in the two observations.

Code:

. clear all

. set obs 20
number of observations (_N) was 0, now 20

.
. forvalues i = 1/5 {
  2.         qui gen var`i' = .
  3.         qui replace var`i' = runiformint(0,4)
  4. }

.
. rowsort var1-var5, gen(s1-s5)

. egen combination = concat(s*)

.
. drop s*

. sort combination

. list, noobs

  +---------------------------------------------+
  | var1   var2   var3   var4   var5   combin~n |
  |---------------------------------------------|
  |    0      0      0      1      2      00012 |
  |    1      0      4      0      1      00114 |
  |    0      1      3      3      0      00133 |
  |    3      4      0      0      4      00344 |
  |    1      1      3      0      4      01134 |
  |---------------------------------------------|
  |    1      4      1      3      0      01134 |
  |    0      4      1      2      2      01224 |
  |    4      0      3      1      2      01234 |
  |    4      4      1      2      0      01244 |
  |    4      0      4      1      3      01344 |
  |---------------------------------------------|
  |    2      4      2      4      0      02244 |
  |    3      3      0      3      2      02333 |
  |    4      4      2      3      0      02344 |
  |    4      4      2      0      4      02444 |
  |    3      4      0      3      3      03334 |
  |---------------------------------------------|
  |    3      4      4      0      4      03444 |
  |    4      0      4      4      3      03444 |
  |    2      4      1      3      1      11234 |
  |    2      3      3      2      1      12233 |
  |    2      2      2      2      3      22223 |
  +---------------------------------------------+

I hope this is what you wanted to obtain.
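When rowsort is not installed and the variables take only a small, known set of single-digit integer values, the same order-free string can be sketched in base Stata by looping over the values in ascending order. This is a rough alternative, not the rowsort method itself; the four example rows are taken from the listing above, and the range 0/4 and the five variables var1-var5 are assumed to match the example.

```stata
clear
input var1 var2 var3 var4 var5
0 0 0 1 2
3 4 0 0 4
3 4 4 0 4
4 0 4 4 3
end

* append each value once per variable that holds it, smallest value first,
* so the result is the row's values in sorted order
gen str5 combination = ""
forvalues v = 0/4 {
    forvalues i = 1/5 {
        quietly replace combination = combination + "`v'" if var`i' == `v'
    }
}

list, noobs
```

The loop visits the values in increasing order, so no explicit row sort is needed. It would not work for values with more than one digit unless a separator were added.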

Last edited by Mathias Pedersen Heinze; 20 Oct 2016, 03:28.


Nick Cox

Join Date: Mar 2014

Posts: 35599
#6

20 Oct 2016, 03:39

It's always pleasant to hear any of my programs mentioned (groups (SSC) in #1 and rowsort (SJ) in #5).

On rowsort the implicit reference is

SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
(help rowsort, rowranks if installed) . . . . . . . . . . . N. J. Cox
Q1/09 SJ 9(1):137--157
shows how to exploit functions, egen functions, and Mata
for working rowwise; rowsort and rowranks are introduced

http://www.stata-journal.com/sjpdf.h...iclenum=pr0046

I note here that the problem originally posed for two variables yields to simple trickery with functions and no intermediate variables or special commands are needed at all.

Directly to the point here, similar problems with pairs of variables were discussed in

SJ-8-4 dm0043 . Tip 71: The problem of split identity, or how to group dyads
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q4/08 SJ 8(4):588--591 (no commands)
tip on how to handle dyadic identifiers

http://www.stata-journal.com/sjpdf.h...iclenum=dm0043

The availability of free .pdf versions should not discourage people from citing these, and freely.

Code:

clear
input phakia_re phakia_le
0 2
2 0
2 0
2 2
2 0
2 0
2 0
2 2
2 0
end

gen class = string(min(phakia_re, phakia_le)) + string(max(phakia_re, phakia_le))

list, sep(0)

     +-----------------------------+
     | phaki~re   phaki~le   class |
     |-----------------------------|
  1. |        0          2      02 |
  2. |        2          0      02 |
  3. |        2          0      02 |
  4. |        2          2      22 |
  5. |        2          0      02 |
  6. |        2          0      02 |
  7. |        2          0      02 |
  8. |        2          2      22 |
  9. |        2          0      02 |
     +-----------------------------+

The logic is thus that the first character is the smaller digit and the second character is the larger digit. We can be sure that ties don't bite or even complicate the problem.

Mathias is clearly right that other tools are needed to make this easy for three or more such variables.
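For exactly three variables, the function approach can still be stretched a little, because the middle value is the row sum minus the smallest and largest values. The sketch below is an illustration only: the variables a, b, c and the helper names lo, mid, hi are hypothetical, and it assumes no missing values, since min() and max() ignore missings while the sum does not. Beyond three variables this trick no longer applies.

```stata
clear
input a b c
0 2 1
2 0 2
1 1 0
end

gen lo  = min(a, b, c)
gen hi  = max(a, b, c)
* the remaining (middle) value is whatever is left of the row sum
gen mid = a + b + c - lo - hi

gen class = string(lo) + string(mid) + string(hi)

list, sep(0)
```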

Last edited by Nick Cox; 20 Oct 2016, 04:12.
Mathias Pedersen Heinze

Join Date: Jun 2015

Posts: 78
#7

20 Oct 2016, 04:05

Nick has the better solution for the two variables case.

Originally posted by Nick Cox

It's always pleasant to hear any of my programs mentioned (groups (SSC) in #1 and rowsort (SJ) in #5).

By the way, thanks for writing these very useful programs, and sorry about the missing credits and reference.

Last edited by Mathias Pedersen Heinze; 20 Oct 2016, 04:22.
Vivek Gupta

Join Date: Sep 2016

Posts: 25
#8

20 Oct 2016, 05:30

Thanks Mathias, I was not aware of rowsort but am looking into it now. As you have amply demonstrated, it gets the job done, and at the end of the day that's what matters.
Thank you, Nick, for the plethora of useful packages that you have contributed, including groups and rowsort. I just read through the article on dyads, Stata tip 71, and it is very interesting indeed.
Best
Vivek

Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#9

09 Oct 2020, 07:42

Vivek Gupta and Nick Cox: I would like to create a new variable holding the different possible combinations (without duplicates) of six variables (each one from 1 to 3). Of note, the code 000003 is different from 300000. The final variable should display a space between the numbers, that is, 0 0 0 0 0 3 instead of 000003. Sorry if my request is not a standard one. Regards.
Nick Cox

Join Date: Mar 2014

Posts: 35599
#10

09 Oct 2020, 08:23

I don't really understand #9, but I will try. It seems that you have 6 variables, each of which may vary from 1 to 3 -- or is it 0 to 3? The post seems contradictory. If so, then there are 3^6 = 729 or 4^6 = 4096 possibilities, say

1 1 1 1 1 1
1 1 1 1 1 2

to

3 3 3 3 3 2
3 3 3 3 3 3

or starting

0 0 0 0 0 0
0 0 0 0 0 1

Do you want these as a new dataset or do you wish to check which of those occur in an existing dataset? Or is it something different completely?
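If the answer is a new dataset holding every ordered combination, one possible sketch is to write the observation number in base 4, one digit per variable. This assumes six variables each coded 0 to 3, which gives 4^6 = 4096 rows; the names order1-order6 and combo are hypothetical. The punct(" ") option of egen's concat() function supplies the spaces between the numbers.

```stata
clear
set obs 4096                      // 4^6 ordered combinations

forvalues j = 1/6 {
    * digit j of (_n - 1) written in base 4, most significant digit first
    gen byte order`j' = mod(floor((_n - 1) / 4^(6 - `j')), 4)
}

* join the six digits into one string, separated by spaces
egen combo = concat(order1-order6), punct(" ")

list combo in 1/2
list combo in 4096
```

For values coded 1 to 3 instead, the same idea would use base 3, 3^6 = 729 observations, and add 1 to each digit.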
Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#11

09 Oct 2020, 10:40

Sorry Nick Cox for the mistake.

starting :
0 0 0 0 0 0
0 0 0 0 0 1

So I am trying to construct a loop that runs a group-based multi-trajectory model, testing different combinations of polynomial orders. I was thinking of using "foreach" in this loop, over the different combinations that I asked about.
I hope that this is clear.

traj , multgroups(6) var1(beverage1*) indep1(age_*) model1(beta) order1(3 3 3 3 3 3) ///
var2(beverage2*) indep2(age_*) model2(zip) order2(3 3 3 3 3 2) iorder2(-1) ///
var3(beverage3*) indep3(age_*) model3(zip) order3(3 3 3 3 3 1) iorder3(-1) ///
var4(beverage4*) indep4(age_*) model4(logit) order4(3 3 3 3 3 0)
Thank you Sir.
Nick Cox

Join Date: Mar 2014

Posts: 35599
#12

09 Oct 2020, 11:25

I think this is becoming clearer as a question about applying traj -- about which I know nothing. I suggest you start a new thread naming traj in the title and spell out your full question. (And please explain where that command comes from.)
Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#13

09 Oct 2020, 11:41

Many Thanks Nick