Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge groups if they have at least one common ID

    Hello.

    I have the following dataset with variables id and group. IDs are repeated because right now they are allowed to be in more than one group and my dataset is in long format. Ultimately, what I would like to do is to merge groups if they have at least one common ID, something along the lines of variable idgroup.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str1 group str2 idgroup
    1 "A" "AD"
    2 "B" "B"
    3 "C" "CG"
    1 "D" "AD"
    4 "E" "E"
    5 "F" "F"
    3 "G" "CG"
    6 "A" "AD"
    end
    However, although I think I might need to use something along the lines of ‘cond’ to generate idgroup, I am still missing how could I include id values to determine whether to capture the value of group. Also, I would need to make sure the variable idgroup is actually merging groups with at least one common id. For example, id==6 should be part of the idgroup==”AD” since they share id==1 even though id==6 only appears on group==”A”. I appreciate any suggestions.

  • #2
    The following uses group_id from SSC by Robert Picard. Because you have a string variable, you need to be careful when encoding it such that its values are different from your id values. The condition that I specify below is that its values are negative because id in your example is strictly positive.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str1 group str2 idgroup
    1 "A" "AD"
    2 "B" "B"
    3 "C" "CG"
    1 "D" "AD"
    4 "E" "E"
    5 "F" "F"
    3 "G" "CG"
    6 "A" "AD"
    end
    
    
    encode group, gen(ngroup)
    replace ngroup= -5000-ngroup
    gen obsid=_n
    expand 2
    bysort obsid: gen long idpair = cond(_n == 1, id, ngroup)
    clonevar wanted = id
    *ssc install group_id
    group_id wanted, matchby(idpair)
    bysort obsid: keep if _n == 1
    sort wanted
    l, sepby(wanted)

    Code:
     
    . l, sepby(wanted)
    
         +---------------------------------------------------------+
         | id   group   idgroup   ngroup   obsid   idpair   wanted |
         |---------------------------------------------------------|
      1. |  1       D        AD    -5004       4        1        1 |
      2. |  6       A        AD    -5001       8        6        1 |
      3. |  1       A        AD    -5001       1        1        1 |
         |---------------------------------------------------------|
      4. |  2       B         B    -5002       2    -5002        2 |
         |---------------------------------------------------------|
      5. |  3       G        CG    -5007       7        3        3 |
      6. |  3       C        CG    -5003       3    -5003        3 |
         |---------------------------------------------------------|
      7. |  4       E         E    -5005       5        4        4 |
         |---------------------------------------------------------|
      8. |  5       F         F    -5006       6        5        5 |
         +---------------------------------------------------------+

    Comment


    • #3
      Hi Andrew.

      Thank you so much. Your solution is much better than I expected as it also already returns me the lower id as the grouping variable under wanted.

      Thanks again!

      Comment

      Working...
      X