  • Generating a Unique within group Identifier

    I have data on clusters. Within each cluster (read "administrative rural village"), I have a number of observations. I now want to generate a unique identifier for every observation, assigning ==1 to the highest value in each cluster, ==2 to the second highest value, and so on for all clusters. I will then be able to use a qualifier such as ..., if id==1 when I want to analyse just the highest scores for all clusters in the dataset. I am having problems figuring out how to go about it.

    Thanks

  • #2
    George: You should know Statalist well enough to know that we ask for data examples. I could guess at your data structure, variable names, and so forth, but it's your job to show us more (please). Highest value of what? What do you want to happen if there are ties?

    Alternatively, isn't this just egen's rank() function?


    • #3
      Here is what I mean, with a sample of my dataset. I have a district_id variable that uniquely identifies my cluster. In the example, 239 is the district ID, while the variable turnout_total is the number of people in each village in the district who turned up for a relief ration. I have many districts presented in a similar way but with a different district_id. I have sorted the data so that, within each district, villages run from the highest turnout to the lowest. I now want a new unique ID with ==1 for the highest turnout in each district, ==2 for the second highest, and so on. An example dataset for one district is as follows:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(district_id turnout_total)
      239 959
      239 902
      239 892
      239 841
      239 806
      239 769
      239 764
      239 746
      239 736
      239 719
      239 705
      239 700
      239 690
      239 687
      239 683
      239 678
      239 677
      239 676
      239 676
      239 675
      end


      • #4
        Thanks. So you can use egen here.

        Code:
        egen rank = rank(-turnout_total), by(district_id)
        Don't forget the minus sign.

        (You didn't answer my question about ties.)
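
        As a minimal sketch of how that command could be applied, consider the toy data below. The variable names district_id and turnout_total come from the data example in #3; the rows for district 240 are invented purely for illustration, and no ties are present.

        ```stata
        * Toy data: the district 240 rows are hypothetical
        clear
        input int(district_id turnout_total)
        239 959
        239 902
        239 892
        240 510
        240 640
        end

        * Negating turnout makes the largest turnout rank 1 within each district
        egen rank = rank(-turnout_total), by(district_id)

        * List only the top village of each district
        list district_id turnout_total rank if rank == 1
        ```

        Without the minus sign, the smallest turnout in each district would be ranked 1 instead.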


        • #5
          Admittedly, I had not thought about such a situation with tallying turnouts, but for the purposes of my analysis I would want tied villages assigned the same value. That is, if two or more villages have the equal highest turnout in a given district, assign all of them ==1. Running the egen command above works, but it seems to average the ranks of the observations that are tied.


          • #6
            The unique option seems to assign successive ranks to the ties arbitrarily, or else I did not properly grasp how to use the option.
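
            To make the two behaviours concrete, here is a small sketch using the last five rows of the district 239 example from #3, which contain one tie (676 twice). The variable names r_default and r_unique are hypothetical, chosen only for this illustration.

            ```stata
            * Tail of the district 239 example, including the tied pair 676, 676
            clear
            input int(district_id turnout_total)
            239 678
            239 677
            239 676
            239 676
            239 675
            end

            * Default: tied observations receive the mean of the ranks they span
            egen r_default = rank(-turnout_total), by(district_id)

            * unique: ties are broken arbitrarily, so each rank is used exactly once
            egen r_unique = rank(-turnout_total), by(district_id) unique

            list, sepby(district_id)
            ```

            Here the two tied villages both get 3.5 under the default, while unique gives one of them 3 and the other 4, and which one gets which is arbitrary.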


            • #7
              Correct about the unique option -- which is really intended for graphical purposes -- but is the field option then what you need?

              (Note: when Rich Goldstein and I wrote the original code behind these options about 1999 I searched high and low for standard names: in the end I gave up and invented "field" and "track". Even as someone with minimal interest in sports I was aware that big numbers win in field events and small numbers win in track events. Rich may have alternative facts.)

              (Since then I discovered "schoolmaster's rank" in print as an alternative to "field", which is probably too obscure (hinging on certain British practices probably now obsolete) and too sexist to pass muster, and in any case what about track?)

              Code:
              . sysuse auto, clear
              (1978 Automobile Data)
              
              . egen rank = rank(mpg), field
              
              . sort mpg rank
              
              . sort rank
              
              . list rank  mpg, sepby(rank)
              
                   +------------+
                   | rank   mpg |
                   |------------|
                1. |    1    41 |
                   |------------|
                2. |    2    35 |
                3. |    2    35 |
                   |------------|
                4. |    4    34 |
                   |------------|
                5. |    5    31 |
                   |------------|
                6. |    6    30 |
                7. |    6    30 |
                   |------------|
                8. |    8    29 |
                   |------------|
                9. |    9    28 |
               10. |    9    28 |
               11. |    9    28 |
                   |------------|
               12. |   12    26 |
               13. |   12    26 |
               14. |   12    26 |
                   |------------|
               15. |   15    25 |
               16. |   15    25 |
               17. |   15    25 |
               18. |   15    25 |
               19. |   15    25 |
                   |------------|
               20. |   20    24 |
               21. |   20    24 |
               22. |   20    24 |
               23. |   20    24 |
                   |------------|
               24. |   24    23 |
               25. |   24    23 |
               26. |   24    23 |
                   |------------|
               27. |   27    22 |
               28. |   27    22 |
               29. |   27    22 |
               30. |   27    22 |
               31. |   27    22 |
                   |------------|
               32. |   32    21 |
               33. |   32    21 |
               34. |   32    21 |
               35. |   32    21 |
               36. |   32    21 |
                   |------------|
               37. |   37    20 |
               38. |   37    20 |
               39. |   37    20 |
                   |------------|
               40. |   40    19 |
               41. |   40    19 |
               42. |   40    19 |
               43. |   40    19 |
               44. |   40    19 |
               45. |   40    19 |
               46. |   40    19 |
               47. |   40    19 |
                   |------------|
               48. |   48    18 |
               49. |   48    18 |
               50. |   48    18 |
               51. |   48    18 |
               52. |   48    18 |
               53. |   48    18 |
               54. |   48    18 |
               55. |   48    18 |
               56. |   48    18 |
                   |------------|
               57. |   57    17 |
               58. |   57    17 |
               59. |   57    17 |
               60. |   57    17 |
                   |------------|
               61. |   61    16 |
               62. |   61    16 |
               63. |   61    16 |
               64. |   61    16 |
                   |------------|
               65. |   65    15 |
               66. |   65    15 |
                   |------------|
               67. |   67    14 |
               68. |   67    14 |
               69. |   67    14 |
               70. |   67    14 |
               71. |   67    14 |
               72. |   67    14 |
                   |------------|
               73. |   73    12 |
               74. |   73    12 |
                   +------------+


              • #8
                Thank you so much. The explanation of the field and track options is undoubtedly my big "learn" for today. With the code sequence above, I think track works well for me and gives the answer I am looking for. Field leads to numbers larger than the score by some +1 (obvious from its construction). The two options seem to mirror each other, which lets users take either approach: mine or the opposite. Appreciated!
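
                For anyone reading later, a short sketch of that mirror relationship, again on the tail of the district 239 example from #3. It assumes track was combined with the minus sign from #4; the names r_field and r_track are hypothetical.

                ```stata
                clear
                input int(district_id turnout_total)
                239 678
                239 677
                239 676
                239 676
                239 675
                end

                * field: rank is 1 + the number of strictly higher values, so ties share a rank
                egen r_field = rank(turnout_total), by(district_id) field

                * track: rank is 1 + the number of strictly lower values; applied to the
                * negated variable, this counts the higher turnouts, just as field does
                egen r_track = rank(-turnout_total), by(district_id) track

                * The two constructions should agree observation by observation
                assert r_field == r_track

                list, sepby(r_field)
                ```

                Both variables give 1, 2, 3, 3, 5 here: the tied villages share rank 3 and the next village is ranked 5, so a qualifier such as if r_field == 1 picks up every village tied for the highest turnout in its district.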
