Generating a Unique within group Identifier

George Kariuki

Join Date: Jul 2015

Posts: 93
#1


03 Mar 2017, 05:56

I have data on clusters. Within each cluster (read "administrative rural village"), I have a number of observations. I now want to generate a unique identifier for every observation, assigning ==1 to the highest value in each cluster, ==2 to the second highest value, and so on for all clusters. I will then be able to use a qualifier such as ..., if id==1 when I want to analyse just the highest scores across all clusters in the dataset. I am having problems figuring out how to go about it.

Thanks
Nick Cox

Join Date: Mar 2014

Posts: 35717
#2

03 Mar 2017, 06:41

George: You should know Statalist well enough to know that we ask for data examples. I could guess at your data structure, variable names, and so forth, but it's your job to show us more (please). Highest value of what? What do you want to happen if there are ties?

Alternatively, isn't this just egen's rank() function?
George Kariuki

Join Date: Jul 2015

Posts: 93
#3

03 Mar 2017, 08:25

Here is what I mean, with a sample of my dataset. I have a district_id variable that uniquely identifies my cluster. In the example, 239 is the district ID, while the variable turnout_total is the number in each village in the district who turned up for a relief ration. I have many districts laid out the same way, each with a different district_id. I have sorted the data so that within each district the villages run from the highest turnout to the lowest. I now want a new unique ID with ==1 for the highest turnout in each district, ==2 for the second highest, and so on. An example dataset for one district is as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(district_id turnout_total)
239 959
239 902
239 892
239 841
239 806
239 769
239 764
239 746
239 736
239 719
239 705
239 700
239 690
239 687
239 683
239 678
239 677
239 676
239 676
239 675
end
Nick Cox

Join Date: Mar 2014

Posts: 35717
#4

03 Mar 2017, 08:30

Thanks. So you can use egen here.

Code:

egen rank = rank(-turnout), by(district_id)

Don't forget the minus sign.

(You didn't answer my question about ties.)
George Kariuki

Join Date: Jul 2015

Posts: 93
#5

03 Mar 2017, 08:53

Admittedly, I had not thought about ties when tallying turnouts, but for the purposes of my analysis I would want tied observations to be assigned the same value. That is, if two or more villages share the highest turnout in a given district, all of them should get ==1. Running the egen command above works but seems to average the ranks of observations that are tied.
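The averaging of tied observations can be reproduced on a small scale. The sketch below is only an illustration: it uses a six-row subset of the district 239 data from #3 (so the ranks differ from those in the full example), and the variable name rank_default is made up here.

```stata
* Subset of the example data from #3 (district 239), including the tied pair
clear
input int(district_id turnout_total)
239 959
239 902
239 677
239 676
239 676
239 675
end

* Default behaviour: tied observations share the mean of the ranks they span
egen rank_default = rank(-turnout_total), by(district_id)

* The two villages with turnout 676 occupy positions 4 and 5, so both get 4.5
list district_id turnout_total rank_default, sepby(rank_default)
```

With no option specified, egen's rank() treats ties this way, which is why the tied villages end up with a non-integer value.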
George Kariuki

Join Date: Jul 2015

Posts: 93
#6

03 Mar 2017, 08:59

The unique option seems to assign successive ranks to tied observations arbitrarily, or else I have not properly grasped how to use the option.

Nick Cox

Join Date: Mar 2014
Posts: 35717

03 Mar 2017, 09:10

Correct about the unique option -- which is really intended for graphical purposes -- but is the field option then what you need?

(Note: when Rich Goldstein and I wrote the original code behind these options about 1999 I searched high and low for standard names: in the end I gave up and invented "field" and "track". Even as someone with minimal interest in sports I was aware that big numbers win in field events and small numbers win in track events. Rich may have alternative facts.)

(Since then I discovered "schoolmaster's rank" as an alternative in print to "field", which is probably too obscure (hinging on certain British practices probably now obsolete) and too sexist to pass muster, and in any case what about track?)

Code:

. sysuse auto, clear
(1978 Automobile Data)

. egen rank = rank(mpg), field

. sort mpg rank

. sort rank

. list rank  mpg, sepby(rank)

     +------------+
     | rank   mpg |
     |------------|
  1. |    1    41 |
     |------------|
  2. |    2    35 |
  3. |    2    35 |
     |------------|
  4. |    4    34 |
     |------------|
  5. |    5    31 |
     |------------|
  6. |    6    30 |
  7. |    6    30 |
     |------------|
  8. |    8    29 |
     |------------|
  9. |    9    28 |
 10. |    9    28 |
 11. |    9    28 |
     |------------|
 12. |   12    26 |
 13. |   12    26 |
 14. |   12    26 |
     |------------|
 15. |   15    25 |
 16. |   15    25 |
 17. |   15    25 |
 18. |   15    25 |
 19. |   15    25 |
     |------------|
 20. |   20    24 |
 21. |   20    24 |
 22. |   20    24 |
 23. |   20    24 |
     |------------|
 24. |   24    23 |
 25. |   24    23 |
 26. |   24    23 |
     |------------|
 27. |   27    22 |
 28. |   27    22 |
 29. |   27    22 |
 30. |   27    22 |
 31. |   27    22 |
     |------------|
 32. |   32    21 |
 33. |   32    21 |
 34. |   32    21 |
 35. |   32    21 |
 36. |   32    21 |
     |------------|
 37. |   37    20 |
 38. |   37    20 |
 39. |   37    20 |
     |------------|
 40. |   40    19 |
 41. |   40    19 |
 42. |   40    19 |
 43. |   40    19 |
 44. |   40    19 |
 45. |   40    19 |
 46. |   40    19 |
 47. |   40    19 |
     |------------|
 48. |   48    18 |
 49. |   48    18 |
 50. |   48    18 |
 51. |   48    18 |
 52. |   48    18 |
 53. |   48    18 |
 54. |   48    18 |
 55. |   48    18 |
 56. |   48    18 |
     |------------|
 57. |   57    17 |
 58. |   57    17 |
 59. |   57    17 |
 60. |   57    17 |
     |------------|
 61. |   61    16 |
 62. |   61    16 |
 63. |   61    16 |
 64. |   61    16 |
     |------------|
 65. |   65    15 |
 66. |   65    15 |
     |------------|
 67. |   67    14 |
 68. |   67    14 |
 69. |   67    14 |
 70. |   67    14 |
 71. |   67    14 |
 72. |   67    14 |
     |------------|
 73. |   73    12 |
 74. |   73    12 |
     +------------+
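Applied to the turnout data from #3, the field option might be used along the lines of the sketch below. It is an illustration only: it takes a six-row subset of the posted district 239 data, assumes the variable names district_id and turnout_total from the dataex listing, and borrows the name id from the qualifier in #1.

```stata
* Subset of the example data from #3 (district 239), including the tied pair
clear
input int(district_id turnout_total)
239 959
239 902
239 677
239 676
239 676
239 675
end

* field: the highest value gets rank 1, and ties share the best rank they span
egen id = rank(turnout_total), field by(district_id)

* The tied villages (turnout 676) both get 4, and the next village gets 6
list district_id turnout_total id, sepby(id)

* The qualifier from #1 then picks out the top village(s) in each district
list district_id turnout_total if id == 1
```

Note that with field there is no minus sign, because the option already ranks from the largest value downwards.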


George Kariuki

Join Date: Jul 2015

Posts: 93
#8

03 Mar 2017, 09:45

Thank you so much. The explanation of the field and track options is undoubtedly my big "learn" for today. With that sequence of code, I think track works great for me and gives the answer I am looking for. Field leads to numbers larger than the score by some +1 (obvious from its construction). The two options seem to mirror each other, and so allow for both approaches: mine and its opposite. Appreciated!
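The mirror relationship can be checked directly. A minimal sketch, assuming the same six-row subset of the district 239 data from #3 and two made-up variable names (rank_field and rank_track): track applied to the negated turnout should give the same ranks as field applied to the turnout itself.

```stata
* Subset of the example data from #3 (district 239), including the tied pair
clear
input int(district_id turnout_total)
239 959
239 902
239 677
239 676
239 676
239 675
end

* field on the original values: highest turnout is ranked 1
egen rank_field = rank(turnout_total), field by(district_id)

* track on the negated values: lowest of -turnout (highest turnout) is ranked 1
egen rank_track = rank(-turnout_total), track by(district_id)

* assert stops with an error if the two variables differ in any observation
assert rank_field == rank_track

list district_id turnout_total rank_field rank_track
```

Either form gives ==1 to every village tied for the highest turnout in its district, so the choice between them is a matter of taste.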
