Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Moving values from observations into new variables by group

    Hello everyone,

    I'm working on a dataset. As you can observe in the example below, I got two variables: ID and KEY. I need to change the data from the KEY variable into several new dichotomous variables. This has to be done by creating a new variable for each of the different values that KEY can take (about 30 values). In the end, I'd need to keep only one observation per ID, in order for me to merge the data with another dataset. Can you provide me with any suggestions? I was thinking of working with grups (using by) but I couldn't find out how. I know that my algorithm has to go though each of the observations in every ID group to find which KEY values are recorded and then replace the values in the corresponding new variables in all of the group, and finally collapse the dataset.

    Thank you very much.

    P.S.: This is my first post, but I have been reading this forum for a long time already, and have learned a lot from it. Thank you for that too!


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(ID KEY)
     1   12
     1   14
     1   29
     1 9999
     2 9999
     3 9999
     4 9999
     5 9999
     6 9999
     7 9999
     8   14
     8   15
     9 9999
    10 9999
    11 9999
    12 9999
    13 9999
    14 9999
    15    6
    15   11
    15   12
    15   13
    15   14
    15   18
    15   21
    15   26
    15   29
    15   34
    16    6
    16   12
    16   14
    16   15
    16   17
    16   26
    16   29
    16   34
    16   35
    17 9999
    18 9999
    19 9999
    20 9999
    21 9999
    22   21
    23 9999
    24 9999
    25 9999
    26 9999
    27 9999
    28 9999
    29    6
    29   34
    30    6
    30   12
    30   14
    30   15
    30   17
    30   26
    30   29
    30   34
    30   35
    31 9999
    32   14
    32   34
    33   21
    34 9999
    35    6
    35   14
    35   26
    36   11
    37 9999
    38 9999
    39 9999
    40    6
    40   12
    40   14
    40   15
    40   17
    40   26
    40   29
    40   34
    40   35
    41    6
    41   11
    41   12
    41   13
    41   14
    41   18
    41   21
    41   26
    41   29
    41   34
    42    6
    42   12
    42   14
    42   15
    42   17
    42   26
    42   29
    42   34
    42   35
    end

  • #2
    I'm not entirely sure I understand what you want, but I think it might be this:
    Code:
    levelsof KEY, local(keys)
    foreach k of local keys {
        by ID, sort: gen dichot_`k' = KEY == `k'
    }
    collapse (max) dichot_*, by(ID)
    If this isn't what you had in mind, when posting back, please give an example (worked by hand) of what you want the end result to look like.

    Comment


    • #3
      Code:
      . tabulate KEY, gen(ng)  // ng stands for "new_group"
      
              KEY |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                6 |          8        8.00        8.00
               11 |          3        3.00       11.00
               12 |          7        7.00       18.00
               13 |          2        2.00       20.00
               14 |         10       10.00       30.00
               15 |          5        5.00       35.00
               17 |          4        4.00       39.00
               18 |          2        2.00       41.00
               21 |          4        4.00       45.00
               26 |          7        7.00       52.00
               29 |          7        7.00       59.00
               34 |          8        8.00       67.00
               35 |          4        4.00       71.00
             9999 |         29       29.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      
      sort KEY ID
      list KEY ID ng1 ng2 ng3 ng4 ng5 if KEY<=14, sepby(KEY) noobs
      
        +----------------------------------------+
        | KEY   ID   ng1   ng2   ng3   ng4   ng5 |
        |----------------------------------------|
        |   6   15     1     0     0     0     0 |
        |   6   16     1     0     0     0     0 |
        |   6   29     1     0     0     0     0 |
        |   6   30     1     0     0     0     0 |
        |   6   35     1     0     0     0     0 |
        |   6   40     1     0     0     0     0 |
        |   6   41     1     0     0     0     0 |
        |   6   42     1     0     0     0     0 |
        |----------------------------------------|
        |  11   15     0     1     0     0     0 |
        |  11   36     0     1     0     0     0 |
        |  11   41     0     1     0     0     0 |
        |----------------------------------------|
        |  12    1     0     0     1     0     0 |
        |  12   15     0     0     1     0     0 |
        |  12   16     0     0     1     0     0 |
        |  12   30     0     0     1     0     0 |
        |  12   40     0     0     1     0     0 |
        |  12   41     0     0     1     0     0 |
        |  12   42     0     0     1     0     0 |
        |----------------------------------------|
        |  13   15     0     0     0     1     0 |
        |  13   41     0     0     0     1     0 |
        |----------------------------------------|
        |  14    1     0     0     0     0     1 |
        |  14    8     0     0     0     0     1 |
        |  14   15     0     0     0     0     1 |
        |  14   16     0     0     0     0     1 |
        |  14   30     0     0     0     0     1 |
        |  14   32     0     0     0     0     1 |
        |  14   35     0     0     0     0     1 |
        |  14   40     0     0     0     0     1 |
        |  14   41     0     0     0     0     1 |
        |  14   42     0     0     0     0     1 |
        +----------------------------------------+
      
      * You could then collapse down the data as shown in Clyde's code
      Last edited by David Benson; 19 Mar 2019, 23:38.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        I'm not entirely sure I understand what you want, but I think it might be this:
        Code:
        levelsof KEY, local(keys)
        foreach k of local keys {
        by ID, sort: gen dichot_`k' = KEY == `k'
        }
        collapse (max) dichot_*, by(ID)
        If this isn't what you had in mind, when posting back, please give an example (worked by hand) of what you want the end result to look like.
        Yes! this is exactly what I wanted to do!

        Here's an example of the resulting data (I couldn't paste all of the resulting variables since dataex has an output limit)

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int ID float(dichot_1 dichot_2 dichot_5 dichot_6 dichot_7 dichot_39 dichot_41)
          1 0 0 0 0 0 0 0
          2 0 0 0 0 0 0 0
          3 0 0 0 0 0 0 0
          4 0 0 0 0 0 0 0
          5 0 0 0 0 0 0 0
          6 0 0 0 0 0 0 0
          7 0 0 0 0 0 0 0
          8 0 0 0 0 0 0 0
          9 0 0 0 0 0 0 0
         10 0 0 0 0 0 0 0
         11 0 0 0 0 0 0 0
         12 0 0 0 0 0 0 0
         13 0 0 0 0 0 0 0
         14 0 0 0 0 0 0 0
         15 0 0 0 1 0 0 0
         16 0 0 0 1 0 0 0
         17 0 0 0 0 0 0 0
         18 0 0 0 0 0 0 0
         19 0 0 0 0 0 0 0
         20 0 0 0 0 0 0 0
         21 0 0 0 0 0 0 0
         22 0 0 0 0 0 0 0
         23 0 0 0 0 0 0 0
         24 0 0 0 0 0 0 0
         25 0 0 0 0 0 0 0
         26 0 0 0 0 0 0 0
         27 0 0 0 0 0 0 0
         28 0 0 0 0 0 0 0
         29 0 0 0 1 0 0 0
         30 0 0 0 1 0 0 0
         31 0 0 0 0 0 0 0
         32 0 0 0 0 0 0 0
         33 0 0 0 0 0 0 0
         34 0 0 0 0 0 0 0
         35 0 0 0 1 0 0 0
         36 0 0 0 0 0 0 0
         37 0 0 0 0 0 0 0
         38 0 0 0 0 0 0 0
         39 0 0 0 0 0 0 0
         40 0 0 0 1 0 0 0
         41 0 0 0 1 0 0 0
         42 0 0 0 1 0 0 0
         43 0 0 0 0 0 0 0
         44 0 0 0 1 0 0 0
         45 0 0 0 0 0 0 0
         46 0 0 0 0 0 0 0
         47 0 0 0 0 0 0 0
         48 0 0 0 1 0 0 0
         49 0 0 0 0 0 0 0
         50 0 0 0 1 0 0 0
         51 0 0 0 0 0 0 0
         52 0 0 0 0 0 0 0
         53 0 0 0 0 0 0 0
         54 0 0 0 0 0 0 0
         55 0 0 0 0 0 0 0
         56 0 0 0 1 0 0 0
         57 0 0 0 1 0 0 0
         58 0 0 0 0 0 0 0
         59 0 0 0 0 0 0 0
         60 0 0 0 1 0 0 0
         61 0 0 0 1 0 0 0
         62 0 0 0 1 0 0 0
         63 0 0 0 0 0 0 0
         64 0 0 0 0 0 0 0
         65 0 0 0 0 0 0 0
         66 0 0 0 0 0 0 0
         67 0 0 0 0 0 0 0
         68 0 0 0 1 0 0 0
         69 0 0 0 1 0 0 0
         70 0 0 0 0 0 0 0
         71 0 0 0 0 0 0 0
         72 0 0 0 0 0 0 0
         73 0 0 0 0 0 0 0
         74 0 0 0 1 0 0 0
         75 0 0 0 0 0 0 0
         76 0 0 0 0 0 0 0
         77 0 0 0 0 0 0 0
         78 0 0 0 0 0 0 0
         79 0 0 0 0 0 0 0
         80 0 0 0 0 0 0 0
         81 0 0 0 0 0 0 0
         82 0 0 0 0 0 0 0
         83 0 0 0 1 0 0 0
         84 0 0 0 0 0 0 0
         85 0 0 0 1 0 0 0
         86 0 0 0 1 0 0 0
         87 0 0 0 1 0 0 0
         88 0 0 0 0 0 0 0
         89 0 0 0 1 0 0 0
         90 0 0 0 0 0 0 0
         91 0 0 0 0 0 0 0
         92 0 0 0 0 0 0 0
         93 0 0 0 0 0 0 0
         94 0 0 0 0 0 0 0
         95 0 0 0 0 0 0 0
         96 0 0 0 1 0 0 0
         97 0 0 0 1 0 0 0
         98 0 0 0 1 0 0 0
         99 0 0 0 1 0 0 0
        100 0 0 0 1 0 0 0
        end
        I do understand loops and collapses, but would you explain to me briefly how does the code work?

        Thank you very much!

        Comment


        • #5
          So, we loop over the different values of key. For each of them, we create a dichotomous 0/1 variable: 1 if the value of KEY equals the current value of `k' in the loop, 0 otherwise.

          Now, let's unpack the -collapse- command. Imagine an ID where at least one observation for that ID has KEY = 2. Then for that observation dichot_2 == 1. All values of dichot_2 are either 0 or 1. So the maximum value of dichot_2 will be 1. But if we imagine a different ID for which KEY is never 2, then dichot_2 will always be 0 for that ID and the maximum value of dichot_2 will be 0. Voila!

          Comment


          • #6
            Originally posted by Clyde Schechter View Post
            So, we loop over the different values of key. For each of them, we create a dichotomous 0/1 variable: 1 if the value of KEY equals the current value of `k' in the loop, 0 otherwise.

            Now, let's unpack the -collapse- command. Imagine an ID where at least one observation for that ID has KEY = 2. Then for that observation dichot_2 == 1. All values of dichot_2 are either 0 or 1. So the maximum value of dichot_2 will be 1. But if we imagine a different ID for which KEY is never 2, then dichot_2 will always be 0 for that ID and the maximum value of dichot_2 will be 0. Voila!
            Thank you so much Clyde!

            Comment

            Working...
            X