Moving values from observations into new variables by group

Benjamin Martinez-Velasco

Join Date: Mar 2019

Posts: 8
#1

Moving values from observations into new variables by group

19 Mar 2019, 17:57

Hello everyone,

I'm working on a dataset. As you can observe in the example below, I got two variables: ID and KEY. I need to change the data from the KEY variable into several new dichotomous variables. This has to be done by creating a new variable for each of the different values that KEY can take (about 30 values). In the end, I'd need to keep only one observation per ID, in order for me to merge the data with another dataset. Can you provide me with any suggestions? I was thinking of working with grups (using by) but I couldn't find out how. I know that my algorithm has to go though each of the observations in every ID group to find which KEY values are recorded and then replace the values in the corresponding new variables in all of the group, and finally collapse the dataset.

Thank you very much.

P.S.: This is my first post, but I have been reading this forum for a long time already, and have learned a lot from it. Thank you for that too!

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int(ID KEY) 1 12 1 14 1 29 1 9999 2 9999 3 9999 4 9999 5 9999 6 9999 7 9999 8 14 8 15 9 9999 10 9999 11 9999 12 9999 13 9999 14 9999 15 6 15 11 15 12 15 13 15 14 15 18 15 21 15 26 15 29 15 34 16 6 16 12 16 14 16 15 16 17 16 26 16 29 16 34 16 35 17 9999 18 9999 19 9999 20 9999 21 9999 22 21 23 9999 24 9999 25 9999 26 9999 27 9999 28 9999 29 6 29 34 30 6 30 12 30 14 30 15 30 17 30 26 30 29 30 34 30 35 31 9999 32 14 32 34 33 21 34 9999 35 6 35 14 35 26 36 11 37 9999 38 9999 39 9999 40 6 40 12 40 14 40 15 40 17 40 26 40 29 40 34 40 35 41 6 41 11 41 12 41 13 41 14 41 18 41 21 41 26 41 29 41 34 42 6 42 12 42 14 42 15 42 17 42 26 42 29 42 34 42 35 end
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

19 Mar 2019, 19:02

I'm not entirely sure I understand what you want, but I think it might be this:

Code:

levelsof KEY, local(keys) foreach k of local keys { by ID, sort: gen dichot_`k' = KEY == `k' } collapse (max) dichot_*, by(ID)

If this isn't what you had in mind, when posting back, please give an example (worked by hand) of what you want the end result to look like.
1 like
Comment

David Benson

Join Date: Oct 2018
Posts: 489

19 Mar 2019, 23:36

Code:

. tabulate KEY, gen(ng)  // ng stands for "new_group"

        KEY |      Freq.     Percent        Cum.
------------+-----------------------------------
          6 |          8        8.00        8.00
         11 |          3        3.00       11.00
         12 |          7        7.00       18.00
         13 |          2        2.00       20.00
         14 |         10       10.00       30.00
         15 |          5        5.00       35.00
         17 |          4        4.00       39.00
         18 |          2        2.00       41.00
         21 |          4        4.00       45.00
         26 |          7        7.00       52.00
         29 |          7        7.00       59.00
         34 |          8        8.00       67.00
         35 |          4        4.00       71.00
       9999 |         29       29.00      100.00
------------+-----------------------------------
      Total |        100      100.00

sort KEY ID
list KEY ID ng1 ng2 ng3 ng4 ng5 if KEY<=14, sepby(KEY) noobs

  +----------------------------------------+
  | KEY   ID   ng1   ng2   ng3   ng4   ng5 |
  |----------------------------------------|
  |   6   15     1     0     0     0     0 |
  |   6   16     1     0     0     0     0 |
  |   6   29     1     0     0     0     0 |
  |   6   30     1     0     0     0     0 |
  |   6   35     1     0     0     0     0 |
  |   6   40     1     0     0     0     0 |
  |   6   41     1     0     0     0     0 |
  |   6   42     1     0     0     0     0 |
  |----------------------------------------|
  |  11   15     0     1     0     0     0 |
  |  11   36     0     1     0     0     0 |
  |  11   41     0     1     0     0     0 |
  |----------------------------------------|
  |  12    1     0     0     1     0     0 |
  |  12   15     0     0     1     0     0 |
  |  12   16     0     0     1     0     0 |
  |  12   30     0     0     1     0     0 |
  |  12   40     0     0     1     0     0 |
  |  12   41     0     0     1     0     0 |
  |  12   42     0     0     1     0     0 |
  |----------------------------------------|
  |  13   15     0     0     0     1     0 |
  |  13   41     0     0     0     1     0 |
  |----------------------------------------|
  |  14    1     0     0     0     0     1 |
  |  14    8     0     0     0     0     1 |
  |  14   15     0     0     0     0     1 |
  |  14   16     0     0     0     0     1 |
  |  14   30     0     0     0     0     1 |
  |  14   32     0     0     0     0     1 |
  |  14   35     0     0     0     0     1 |
  |  14   40     0     0     0     0     1 |
  |  14   41     0     0     0     0     1 |
  |  14   42     0     0     0     0     1 |
  +----------------------------------------+

* You could then collapse down the data as shown in Clyde's code

Last edited by David Benson; 19 Mar 2019, 23:38.

Comment

Benjamin Martinez-Velasco

Join Date: Mar 2019
Posts: 8

20 Mar 2019, 14:29

Originally posted by Clyde Schechter View Post

I'm not entirely sure I understand what you want, but I think it might be this:

Code:

levelsof KEY, local(keys)
foreach k of local keys {
by ID, sort: gen dichot_`k' = KEY == `k'
}
collapse (max) dichot_*, by(ID)

If this isn't what you had in mind, when posting back, please give an example (worked by hand) of what you want the end result to look like.

Yes! this is exactly what I wanted to do!

Here's an example of the resulting data (I couldn't paste all of the resulting variables since dataex has an output limit)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int ID float(dichot_1 dichot_2 dichot_5 dichot_6 dichot_7 dichot_39 dichot_41)
  1 0 0 0 0 0 0 0
  2 0 0 0 0 0 0 0
  3 0 0 0 0 0 0 0
  4 0 0 0 0 0 0 0
  5 0 0 0 0 0 0 0
  6 0 0 0 0 0 0 0
  7 0 0 0 0 0 0 0
  8 0 0 0 0 0 0 0
  9 0 0 0 0 0 0 0
 10 0 0 0 0 0 0 0
 11 0 0 0 0 0 0 0
 12 0 0 0 0 0 0 0
 13 0 0 0 0 0 0 0
 14 0 0 0 0 0 0 0
 15 0 0 0 1 0 0 0
 16 0 0 0 1 0 0 0
 17 0 0 0 0 0 0 0
 18 0 0 0 0 0 0 0
 19 0 0 0 0 0 0 0
 20 0 0 0 0 0 0 0
 21 0 0 0 0 0 0 0
 22 0 0 0 0 0 0 0
 23 0 0 0 0 0 0 0
 24 0 0 0 0 0 0 0
 25 0 0 0 0 0 0 0
 26 0 0 0 0 0 0 0
 27 0 0 0 0 0 0 0
 28 0 0 0 0 0 0 0
 29 0 0 0 1 0 0 0
 30 0 0 0 1 0 0 0
 31 0 0 0 0 0 0 0
 32 0 0 0 0 0 0 0
 33 0 0 0 0 0 0 0
 34 0 0 0 0 0 0 0
 35 0 0 0 1 0 0 0
 36 0 0 0 0 0 0 0
 37 0 0 0 0 0 0 0
 38 0 0 0 0 0 0 0
 39 0 0 0 0 0 0 0
 40 0 0 0 1 0 0 0
 41 0 0 0 1 0 0 0
 42 0 0 0 1 0 0 0
 43 0 0 0 0 0 0 0
 44 0 0 0 1 0 0 0
 45 0 0 0 0 0 0 0
 46 0 0 0 0 0 0 0
 47 0 0 0 0 0 0 0
 48 0 0 0 1 0 0 0
 49 0 0 0 0 0 0 0
 50 0 0 0 1 0 0 0
 51 0 0 0 0 0 0 0
 52 0 0 0 0 0 0 0
 53 0 0 0 0 0 0 0
 54 0 0 0 0 0 0 0
 55 0 0 0 0 0 0 0
 56 0 0 0 1 0 0 0
 57 0 0 0 1 0 0 0
 58 0 0 0 0 0 0 0
 59 0 0 0 0 0 0 0
 60 0 0 0 1 0 0 0
 61 0 0 0 1 0 0 0
 62 0 0 0 1 0 0 0
 63 0 0 0 0 0 0 0
 64 0 0 0 0 0 0 0
 65 0 0 0 0 0 0 0
 66 0 0 0 0 0 0 0
 67 0 0 0 0 0 0 0
 68 0 0 0 1 0 0 0
 69 0 0 0 1 0 0 0
 70 0 0 0 0 0 0 0
 71 0 0 0 0 0 0 0
 72 0 0 0 0 0 0 0
 73 0 0 0 0 0 0 0
 74 0 0 0 1 0 0 0
 75 0 0 0 0 0 0 0
 76 0 0 0 0 0 0 0
 77 0 0 0 0 0 0 0
 78 0 0 0 0 0 0 0
 79 0 0 0 0 0 0 0
 80 0 0 0 0 0 0 0
 81 0 0 0 0 0 0 0
 82 0 0 0 0 0 0 0
 83 0 0 0 1 0 0 0
 84 0 0 0 0 0 0 0
 85 0 0 0 1 0 0 0
 86 0 0 0 1 0 0 0
 87 0 0 0 1 0 0 0
 88 0 0 0 0 0 0 0
 89 0 0 0 1 0 0 0
 90 0 0 0 0 0 0 0
 91 0 0 0 0 0 0 0
 92 0 0 0 0 0 0 0
 93 0 0 0 0 0 0 0
 94 0 0 0 0 0 0 0
 95 0 0 0 0 0 0 0
 96 0 0 0 1 0 0 0
 97 0 0 0 1 0 0 0
 98 0 0 0 1 0 0 0
 99 0 0 0 1 0 0 0
100 0 0 0 1 0 0 0
end

I do understand loops and collapses, but would you explain to me briefly how does the code work?

Thank you very much!

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#5

20 Mar 2019, 15:13

So, we loop over the different values of key. For each of them, we create a dichotomous 0/1 variable: 1 if the value of KEY equals the current value of `k' in the loop, 0 otherwise.

Now, let's unpack the -collapse- command. Imagine an ID where at least one observation for that ID has KEY = 2. Then for that observation dichot_2 == 1. All values of dichot_2 are either 0 or 1. So the maximum value of dichot_2 will be 1. But if we imagine a different ID for which KEY is never 2, then dichot_2 will always be 0 for that ID and the maximum value of dichot_2 will be 0. Voila!
1 like
Comment
Benjamin Martinez-Velasco

Join Date: Mar 2019

Posts: 8
#6

20 Mar 2019, 17:52

Originally posted by Clyde Schechter View Post

So, we loop over the different values of key. For each of them, we create a dichotomous 0/1 variable: 1 if the value of KEY equals the current value of `k' in the loop, 0 otherwise.

Now, let's unpack the -collapse- command. Imagine an ID where at least one observation for that ID has KEY = 2. Then for that observation dichot_2 == 1. All values of dichot_2 are either 0 or 1. So the maximum value of dichot_2 will be 1. But if we imagine a different ID for which KEY is never 2, then dichot_2 will always be 0 for that ID and the maximum value of dichot_2 will be 0. Voila!

Thank you so much Clyde!
Comment

Announcement