
  • Generating Clusters for cell-mates

    Hi Everybody

    I have the following data with three variables: (i) "person", identifying a subject in the study; (ii) "mate", identifying a subject who stayed in the same room as "person" (not necessarily at the same time); and (iii) "overlap_duration", the number of hours that "person" and "mate" actually spent in the room at the same time. Please see the example below.

    Now I would like to create a variable that assigns these observations to clusters of people who spent time together in the same room. For example, person 1 should get the same cluster ID as person 49, but not the same as person 75: 1 and 75 were in the same room but had no overlap. Person 2 would get the same cluster ID as 77 and 85, but not as 91, and so forth. Can you help me?

    Thanks a lot!


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(person mate) double overlap_duration
     1  49 13.207777777777778
     1  75                  0
     2  77  3.112777777709961
     2  85 14.054722222154405
     2  91                  0
     3  34                  0
     3 108 13.466666666734483
     4  90                  0
     5  57                  0
     5  70 15.940277777845594
     5 105                  0
     6  36                  0
     6  15                  0
     7   .                  .
     8  93                  0
     8  19 10.896944444512261
     9   .                  .
    10  31                  0
    10  87                  0
    10 134                  0
    10  23                  0
    10  98                  0
    11  27              2.435
    11  13 2.3219444443766277
    12  17                  0
    13  27                  0
    13  11 2.3219444443766277
    14  24 13.554444444512262
    14  44 12.386666666666667
    15   6                  0
    15  36                  0
    16  35                  0
    16 103  16.77027777777778
    17  12                  0
    18  89                  0
    19   8 10.896944444512261
    19  93                  0
    20  52                  0
    20  96 17.812777777845593
    20  65                  0
    21  78  23.52833333326552
    21 133                  0
    22  47                  0
    22  42  23.80388888888889
    23  31                  0
    23  87 12.374166666666667
    23  98                  0
    23  10                  0
    23 134                  0
    24  44                  0
    24  14 13.554444444512262
    25  83  2.137777777777778
    25  29 16.116666666666667
    26  64                  0
    27  13                  0
    27  11              2.435
    28 104                  0
    28 117                  0
    28 129 12.519722222222223
    28  99                  0
    29  25 16.116666666666667
    29  83                  0
    30  80                  0
    30 122                  0
    30  54  18.00861111117893
    31  10                  0
    31  23                  0
    31 134                  0
    31  87  5.771111111043294
    31  98                  0
    32  81                  0
    32 106            27.5575
    33   .                  .
    34   3                  0
    34 108                  0
    35 103                  0
    35  16                  0
    36  15                  0
    36   6                  0
    37 112 11.449444444444444
    38 101                  0
    38  71  10.95111111111111
    39  51                  0
    40  50 18.378333333265516
    40 132                  0
    40 124                  0
    41   .                  .
    42  47                  0
    42  22  23.80388888888889
    43  73                  0
    44  14 12.386666666666667
    44  24                  0
    45  60                  0
    45  56                  0
    46 116                  0
    46  82                  0
    47  22                  0
    47  42                  0
    48   .                  .
    49   1 13.207777777777778
    end


  • #2
    Code:
    ssc install group_id, replace
    Code:
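    * start from the mate's ID; persons without a mate get a unique negative value
    gen cluster= cond(missing(mate), -_n, mate)
    * group_id merges observations that share a cluster value or a person ID
    * and replaces cluster with the smallest value in each merged group
    group_id cluster, matchby(person)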
    Res.:

    Code:
    . sort cluster person
    
    . l, sepby(cluster)
    
         +-------------------------------------+
         | person   mate   overlap~n   cluster |
         |-------------------------------------|
      1. |     48      .           .       -99 |
         |-------------------------------------|
      2. |     41      .           .       -87 |
         |-------------------------------------|
      3. |     33      .           .       -73 |
         |-------------------------------------|
      4. |      9      .           .       -17 |
         |-------------------------------------|
      5. |      7      .           .       -14 |
         |-------------------------------------|
      6. |     49      1   13.207778         1 |
         |-------------------------------------|
      7. |      3     34           0         3 |
      8. |      3    108   13.466667         3 |
      9. |     34      3           0         3 |
     10. |     34    108           0         3 |
         |-------------------------------------|
     11. |      6     15           0         6 |
     12. |      6     36           0         6 |
     13. |     15     36           0         6 |
     14. |     15      6           0         6 |
     15. |     36     15           0         6 |
     16. |     36      6           0         6 |
         |-------------------------------------|
     17. |      8     93           0         8 |
     18. |      8     19   10.896944         8 |
     19. |     19     93           0         8 |
     20. |     19      8   10.896944         8 |
         |-------------------------------------|
     21. |     10     23           0        10 |
     22. |     10     87           0        10 |
     23. |     10     98           0        10 |
     24. |     10    134           0        10 |
     25. |     10     31           0        10 |
     26. |     23     31           0        10 |
     27. |     23    134           0        10 |
     28. |     23     87   12.374167        10 |
     29. |     23     98           0        10 |
     30. |     23     10           0        10 |
     31. |     31     10           0        10 |
     32. |     31     23           0        10 |
     33. |     31     98           0        10 |
     34. |     31    134           0        10 |
     35. |     31     87   5.7711111        10 |
         |-------------------------------------|
     36. |     11     27       2.435        11 |
     37. |     11     13   2.3219444        11 |
     38. |     13     27           0        11 |
     39. |     13     11   2.3219444        11 |
     40. |     27     13           0        11 |
     41. |     27     11       2.435        11 |
         |-------------------------------------|
     42. |     17     12           0        12 |
         |-------------------------------------|
     43. |     14     44   12.386667        14 |
     44. |     14     24   13.554444        14 |
     45. |     24     44           0        14 |
     46. |     24     14   13.554444        14 |
     47. |     44     14   12.386667        14 |
     48. |     44     24           0        14 |
         |-------------------------------------|
     49. |     16     35           0        16 |
     50. |     16    103   16.770278        16 |
     51. |     35     16           0        16 |
     52. |     35    103           0        16 |
         |-------------------------------------|
     53. |     12     17           0        17 |
         |-------------------------------------|
     54. |     22     47           0        22 |
     55. |     22     42   23.803889        22 |
     56. |     42     47           0        22 |
     57. |     42     22   23.803889        22 |
     58. |     47     42           0        22 |
     59. |     47     22           0        22 |
         |-------------------------------------|
     60. |     25     83   2.1377778        25 |
     61. |     25     29   16.116667        25 |
     62. |     29     83           0        25 |
     63. |     29     25   16.116667        25 |
         |-------------------------------------|
     64. |      1     49   13.207778        49 |
     65. |      1     75           0        49 |
         |-------------------------------------|
     66. |     40     50   18.378333        50 |
     67. |     40    132           0        50 |
     68. |     40    124           0        50 |
         |-------------------------------------|
     69. |     39     51           0        51 |
         |-------------------------------------|
     70. |     20     96   17.812778        52 |
     71. |     20     52           0        52 |
     72. |     20     65           0        52 |
         |-------------------------------------|
     73. |     30    122           0        54 |
     74. |     30     80           0        54 |
     75. |     30     54   18.008611        54 |
         |-------------------------------------|
     76. |     45     56           0        56 |
     77. |     45     60           0        56 |
         |-------------------------------------|
     78. |      5    105           0        57 |
     79. |      5     70   15.940278        57 |
     80. |      5     57           0        57 |
         |-------------------------------------|
     81. |     26     64           0        64 |
         |-------------------------------------|
     82. |     38    101           0        71 |
     83. |     38     71   10.951111        71 |
         |-------------------------------------|
     84. |     43     73           0        73 |
         |-------------------------------------|
     85. |      2     91           0        77 |
     86. |      2     85   14.054722        77 |
     87. |      2     77   3.1127778        77 |
         |-------------------------------------|
     88. |     21    133           0        78 |
     89. |     21     78   23.528333        78 |
         |-------------------------------------|
     90. |     32     81           0        81 |
     91. |     32    106     27.5575        81 |
         |-------------------------------------|
     92. |     46    116           0        82 |
     93. |     46     82           0        82 |
         |-------------------------------------|
     94. |     18     89           0        89 |
         |-------------------------------------|
     95. |      4     90           0        90 |
         |-------------------------------------|
     96. |     28    104           0        99 |
     97. |     28    117           0        99 |
     98. |     28     99           0        99 |
     99. |     28    129   12.519722        99 |
         |-------------------------------------|
    100. |     37    112   11.449444       112 |
         +-------------------------------------+



    • #3
      Thanks Andrew Musau, that's almost it! Right now it puts everybody who has been in a cell together into the same cluster. I want something slightly different: it should only put people together in a cluster if the time they spent in a cell together (overlap_duration) is not zero. Can you help me with this?



      • #4
        Can you provide an example of how you want the output to look for a few cases?



        • #5
          For example, in your output persons 3, 34 and 108 are all put together in cluster 3, since they were all in the same room. However, only 3 and 108 spent time together in that room (overlap = 13.47 h). Thus, I'd like my code to put only 3 and 108 in the same cluster and not 34, since 34 was also in that room but did not spend any time there with 3 or with 108. Does that make sense?

          Maybe a more intuitive explanation is the following: I'm trying to build clusters for groups of people who could have influenced each other. I'm not really interested in clustering people who were in the same room, but in groups of people who were in the same room and actually had the chance to talk to each other. Therefore, I don't want to put 34 in the same cluster, since 34 did not have the chance to talk to 3 or 108, either directly or indirectly.
          Last edited by Arto Arman; 14 Dec 2023, 00:19.



          • #6
            Still not clear: if a cluster identifies a group of people, shouldn't it contain all observations of those people? Or are you implying that the same person can be in more than one cluster? The point of the clarification is to see how you want to handle such exclusions, and that is best shown with a data example of how the result should look rather than described in words.



            • #7
              Yes, but the cluster should identify only people who could influence each other, i.e., who spent some time together. I created some example data that may show better what I need:


              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float(ID mate overlap cluster cell)
               1  . . 1 101
               2  3 5 2 122
               2 11 0 2 122
               3  2 5 2 122
               3 11 2 2 122
               4  6 0 3 193
               5  . . 4 104
               6  4 0 5 193
               7  . . 6 202
               8  9 0 7 274
               8 12 0 7 274
               8 13 0 7 274
               9  8 0 8 274
               9 12 4 8 274
               9 13 9 8 274
              10  . . 9 199
              11  2 0 2 122
              11  3 2 2 122
              12  8 0 8 274
              12  9 4 8 274
              12 13 0 8 274
              13  8 0 8 274
              13  9 9 8 274
              13 12 0 8 274
              end



              • #8
                Code:
                clear
                input float(ID mate overlap cluster cell)
                 1  . . 1 101
                 2  3 5 2 122
                 2 11 0 2 122
                 3  2 5 2 122
                 3 11 2 2 122
                 4  6 0 3 193
                 5  . . 4 104
                 6  4 0 5 193
                 7  . . 6 202
                 8  9 0 7 274
                 8 12 0 7 274
                 8 13 0 7 274
                 9  8 0 8 274
                 9 12 4 8 274
                 9 13 9 8 274
                10  . . 9 199
                11  2 0 2 122
                11  3 2 2 122
                12  8 0 8 274
                12  9 4 8 274
                12 13 0 8 274
                13  8 0 8 274
                13  9 9 8 274
                13 12 0 8 274
                end
                
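                * tag = 1 if the person has at least one mate with nonzero, nonmissing overlap
                bys ID: egen tag= max(overlap<. & overlap)
                * people without any such mate get a unique negative value (own cluster);
                * everyone else starts from the mate's ID, as in #2
                gen cluster2= cond(!tag, -_n, mate)
                group_id cluster2, matchby(ID)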
                Res.:

                Code:
                . sort cluster ID
                
                . l, sepby(cluster)
                
                     +-------------------------------------------------------+
                     | ID   mate   overlap   cluster   cell   tag   cluster2 |
                     |-------------------------------------------------------|
                  1. |  1      .         .         1    101     0         -1 |
                     |-------------------------------------------------------|
                  2. |  2     11         0         2    122     1          2 |
                  3. |  2      3         5         2    122     1          2 |
                  4. |  3     11         2         2    122     1          2 |
                  5. |  3      2         5         2    122     1          2 |
                  6. | 11      3         2         2    122     1          2 |
                  7. | 11      2         0         2    122     1          2 |
                     |-------------------------------------------------------|
                  8. |  4      6         0         3    193     0         -6 |
                     |-------------------------------------------------------|
                  9. |  5      .         .         4    104     0         -7 |
                     |-------------------------------------------------------|
                 10. |  6      4         0         5    193     0         -8 |
                     |-------------------------------------------------------|
                 11. |  7      .         .         6    202     0         -9 |
                     |-------------------------------------------------------|
                 12. |  8     13         0         7    274     0        -12 |
                 13. |  8     12         0         7    274     0        -12 |
                 14. |  8      9         0         7    274     0        -12 |
                     |-------------------------------------------------------|
                 15. |  9     12         4         8    274     1          8 |
                 16. |  9      8         0         8    274     1          8 |
                 17. |  9     13         9         8    274     1          8 |
                 18. | 12      8         0         8    274     1          8 |
                 19. | 12      9         4         8    274     1          8 |
                 20. | 12     13         0         8    274     1          8 |
                 21. | 13     12         0         8    274     1          8 |
                 22. | 13      8         0         8    274     1          8 |
                 23. | 13      9         9         8    274     1          8 |
                     |-------------------------------------------------------|
                 24. | 10      .         .         9    199     0        -16 |
                     +-------------------------------------------------------+
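A note on the approach in #2 and #8, with a hedged alternative. Both set the cluster variable to the mate's ID and use matchby() on the person ID, which joins observations that share a mate or a person. Two cellmates who list only each other share neither, which is why persons 12 and 17 end up in clusters 17 and 12 in the listing in #2; a two-person cell with positive overlap would be split the same way in #8. Also, in #8 a person with at least one positive overlap is linked through all of their mates, including zero-overlap ones (in the #7 example, 2 and 3 are joined through their common mate 11). If clusters should follow only positive-overlap links, directly or through a chain as described in #5, one option is to treat the pairs as edges of a graph and compute its connected components. The following is a minimal sketch under that assumption, not the method used in the thread: it uses only official Stata commands, keeps pairs with overlap_duration > 0 as links, runs on a few rows copied from the #1 example, and labels each component with its smallest person ID. The variable names comp, lab, nblab and node are illustrative only. Run it as a whole do-file, since the tempfile names are local macros.

```stata
* Sketch (assumption): clusters = connected components of positive-overlap links,
* using official commands only (no group_id). Rows copied from the #1 example.
clear
input int(person mate) double overlap_duration
 1  49 13.207777777777778
 1  75                  0
 3  34                  0
 3 108 13.466666666734483
 7   .                  .
12  17                  0
17  12                  0
23  31                  0
23  87 12.374166666666667
31  23                  0
31  87  5.771111111043294
34   3                  0
34 108                  0
49   1 13.207777777777778
end

tempfile orig edges labs
save `orig'

* 1. Links: pairs with positive overlap, stored in both directions
keep if overlap_duration > 0 & !missing(overlap_duration)
keep person mate
save `edges'
rename (person mate) (mate person)
append using `edges'
duplicates drop
save `edges', replace

* 2. Nodes: every person plus every linked mate; each starts with its own ID
use person using `orig', clear
append using `edges', keep(person)
duplicates drop
generate long lab = person
save `labs'

* 3. Give each node the smallest label among itself and its neighbours,
*    and repeat until no label changes
local changed = 1
while `changed' {
    use `edges', clear
    rename (person mate) (node person)      // look up each neighbour's label
    merge m:1 person using `labs', keep(match) nogenerate
    collapse (min) nblab = lab, by(node)
    rename node person
    merge 1:1 person using `labs', nogenerate
    generate long newlab = min(lab, nblab)  // min() skips a missing nblab
    quietly count if newlab != lab
    local changed = r(N) > 0
    keep person newlab
    rename newlab lab
    save `labs', replace
}

* 4. Attach the component label to the original observations
use `orig', clear
merge m:1 person using `labs', keep(master match) nogenerate
rename lab comp
sort comp person
list, sepby(comp)
```

With these rows, 1 and 49 share comp 1, 3 gets 3 while 34 keeps 34, 23 and 31 share 23 (both linked to 87), and 7, 12 and 17 stay on their own. To apply it to the full data, drop the clear/input lines and start from the data in memory. To reproduce a "same room" grouping like #2 instead, replace the keep condition in step 1 with keep if !missing(mate); unlike #2, this also joins two-person cells such as 12 and 17. Each pass of the loop spreads labels by one link, so the number of passes grows with the longest chain inside a component.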



                • #9
                  That's it! Thanks a lot!
