  • Creating groups based on the differences between values

    Hi,

    Probably it's quite simple, but I have been struggling to find the right command.

    I have a list of individuals who are ranked by their performance. I would like to group peers based on two conditions: a group should start once x > 5.5 s, and it should then take in the following observations that are less than 2 s from each other. The variable difference measures how far individuals are from each other, so it is the variable to condition on. The grouping should restart within each raceid.

    . list raceid finalrank difference new_group in 1/20

         +-----------------------------------------+
         | raceid   finalr~k   differ~e   new_gr~p |
         |-----------------------------------------|
      1. |  36426          1      0:0.0          . |
      2. |  36426          2     0:32.2          1 |
      3. |  36426          3      0:7.3          2 |
      4. |  36426          4      0:6.6          3 |
      5. |  36426          5      0:3.7          . |
         |-----------------------------------------|
      6. |  36426          6      0:6.1          4 |
      7. |  36426          7      0:0.2          4 |
      8. |  36426          8      0:1.2          4 |
      9. |  36426          9      0:3.7          . |
     10. |  36426         10      0:0.6          . |
         |-----------------------------------------|
     11. |  36426         11      0:6.1          5 |
     12. |  36426         12      0:0.6          5 |
     13. |  36426         13      0:2.7          . |
     14. |  36426         14      0:4.4          . |
     15. |  36426         15      0:3.0          . |
         |-----------------------------------------|
     16. |  36426         16      0:0.2          . |
     17. |  36426         17      0:5.5          6 |
     18. |  36426         18      0:2.9          6 |
     19. |  36426         19      0:0.5          6 |
     20. |  36426         20      0:0.5          6 |
         +-----------------------------------------+

    Thank you in advance for any hint!

  • #2
    Welcome to Statalist. Your question is not clear. I would suggest that you manually create a variable called "wanted" illustrating what you want, say from the first 20 observations. Then provide a data example including this variable, e.g., by copying and pasting the result of

    Code:
    dataex raceid finalrank difference new_group wanted in 1/20



    • #3
      Thanks for your message. The variable new_group is the wanted variable; I have renamed it wanted. The data are already sorted by raceid and finalrank. I want the variable wanted to group observations together and assign a group number based on two conditions: 1) a group starts where difference is above 5.5 sec, and 2) the observations after it that are below 2 sec join that group. The group numbers need to restart within each raceid.

      An example is below. The 1st group is the individual at 32.2 sec, the 2nd the one at 7.3 sec, and the 3rd the one at 6.6 sec. The 4th group is larger because the second condition applies: the individuals after it are all within 2 sec.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long raceid int finalrank float(difference wanted)
      36426  1     0 .
      36426  2 32200 1
      36426  3  7300 2
      36426  4  6600 3
      36426  5  3700 .
      36426  6  6100 4
      36426  7   200 4
      36426  8  1200 4
      36426  9  3700 .
      36426 10   600 .
      36426 11  6100 5
      36426 12   600 5
      36426 13  2700 .
      36426 14  4400 .
      36426 15  3000 .
      36426 16   200 .
      36426 17  5500 6
      36426 18  2900 6
      36426 19   500 6
      36426 20   500 6
      end
      format %tcmm:ss.s difference

      I hope this makes more sense.
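A minimal Stata sketch of the two conditions described above, written as separate indicator variables so that each condition maps to one line. It assumes, as the dataex listing suggests, that difference holds milliseconds (the %tcmm:ss.s format only changes how the value is displayed) and that the start threshold is inclusive (>= 5500), since 0:5.5 opens group 6 in wanted. The names starts, joins and group are hypothetical. The last line lists the rows where the rule as stated disagrees with the hand-made wanted: rows 18 to 20 are not grouped, because 0:2.9 is not below 2 seconds.

```stata
* Sketch: the two grouping conditions as explicit indicators.
* Assumes difference is in milliseconds; starts, joins and group are
* hypothetical variable names.
clear
input long raceid int finalrank float(difference wanted)
36426  1     0 .
36426  2 32200 1
36426  3  7300 2
36426  4  6600 3
36426  5  3700 .
36426  6  6100 4
36426  7   200 4
36426  8  1200 4
36426  9  3700 .
36426 10   600 .
36426 11  6100 5
36426 12   600 5
36426 13  2700 .
36426 14  4400 .
36426 15  3000 .
36426 16   200 .
36426 17  5500 6
36426 18  2900 6
36426 19   500 6
36426 20   500 6
end
format %tcmm:ss.s difference

* Condition 1: a gap of at least 5.5 s (5500 ms) opens a new group
gen byte starts = difference >= 5500
* Condition 2: a gap below 2 s (2000 ms) lets a row join the group above it
gen byte joins = difference < 2000

* Number the group starts within each race; all other rows are missing
bysort raceid (finalrank): gen group = sum(starts) if starts
* Carry the number down through consecutive joiners; -replace- runs through
* the observations in order, so a whole chain is filled in one pass
by raceid: replace group = group[_n-1] if joins & missing(group)

* Rows where the rule as stated disagrees with the hand-made wanted
list finalrank difference wanted group if group != wanted
```

Keeping the two conditions in their own variables makes it easy to change a threshold, or to switch the start condition to a strict > 5500, without touching the numbering logic.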



      • #4
        Thanks for the data example. I guess there is an error in the final group, as 2900 milliseconds is more than 2 seconds. Here is one technique:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long raceid int finalrank float(difference wanted)
        36426  1     0 .
        36426  2 32200 1
        36426  3  7300 2
        36426  4  6600 3
        36426  5  3700 .
        36426  6  6100 4
        36426  7   200 4
        36426  8  1200 4
        36426  9  3700 .
        36426 10   600 .
        36426 11  6100 5
        36426 12   600 5
        36426 13  2700 .
        36426 14  4400 .
        36426 15  3000 .
        36426 16   200 .
        36426 17  5500 6
        36426 18  2900 6
        36426 19   500 6
        36426 20   500 6
        end
        format %tcmm:ss.s difference
        
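        * Number the group starts (gaps of at least 5.5 s = 5500 ms) within each race
        bys raceid (finalrank): g Wanted= sum(difference>=(5.5*1000)) if difference>=(5.5*1000)
        * Carry the group number down while gaps stay below 2 s; the -by raceid:-
        * prefix keeps a group from spilling over into the next race
        by raceid: replace Wanted= Wanted[_n-1] if (difference<(2*1000)) & missing(Wanted) & !missing(Wanted[_n-1])

        Result: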

        Code:
        . l, sep(0)
        
             +------------------------------------------------+
             | raceid   finalr~k   differ~e   wanted   Wanted |
             |------------------------------------------------|
          1. |  36426          1      0:0.0        .        . |
          2. |  36426          2     0:32.2        1        1 |
          3. |  36426          3      0:7.3        2        2 |
          4. |  36426          4      0:6.6        3        3 |
          5. |  36426          5      0:3.7        .        . |
          6. |  36426          6      0:6.1        4        4 |
          7. |  36426          7      0:0.2        4        4 |
          8. |  36426          8      0:1.2        4        4 |
          9. |  36426          9      0:3.7        .        . |
         10. |  36426         10      0:0.6        .        . |
         11. |  36426         11      0:6.1        5        5 |
         12. |  36426         12      0:0.6        5        5 |
         13. |  36426         13      0:2.7        .        . |
         14. |  36426         14      0:4.4        .        . |
         15. |  36426         15      0:3.0        .        . |
         16. |  36426         16      0:0.2        .        . |
         17. |  36426         17      0:5.5        6        6 |
         18. |  36426         18      0:2.9        6        . |
         19. |  36426         19      0:0.5        6        . |
         20. |  36426         20      0:0.5        6        . |
             +------------------------------------------------+
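The example data contain a single race, so they cannot show whether the numbering restarts by raceid. A hedged check on made-up data (raceid 1 and 2 and their times are hypothetical, not from the thread): race 1 ends inside a group, and race 2 starts with a winner whose difference is 0, which is below 2 seconds. With the carry-forward step run under by raceid:, Wanted[_n-1] is missing at the first observation of each race, so the race-2 winner does not inherit race 1's last group.

```stata
* Hypothetical two-race data: race 1 ends inside a group; race 2 starts
* with a winner (difference 0) followed by a runner 1 s behind.
clear
input long raceid int finalrank float difference
1 1    0
1 2 6000
1 3  500
2 1    0
2 2 1000
2 3 7000
2 4  100
end

* Group starts: gaps of at least 5.5 s, numbered separately within each race
bysort raceid (finalrank): gen Wanted = sum(difference >= 5500) if difference >= 5500
* Carry forward within a race only: under -by-, Wanted[_n-1] is missing at _n == 1
by raceid: replace Wanted = Wanted[_n-1] if difference < 2000 & missing(Wanted) & !missing(Wanted[_n-1])

list, sepby(raceid)
```

Without the by raceid: prefix on the replace line, the first two rows of race 2 would instead get Wanted = 1, carried over from the last runner of race 1.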
