RANK.AVG equivalent

Constantin Alba

Join Date: Sep 2014

Posts: 80
#1

RANK.AVG equivalent

14 Sep 2016, 06:14

Hi,

I wonder whether there is an equivalent of Excel's RANK.AVG function in Stata?
So far I have only found the regular egen function rank().

thank you

C.
Nick Cox

Join Date: Mar 2014

Posts: 35613
#2

14 Sep 2016, 07:32

Please explain what it does precisely for the benefit of people who don't use Excel.
Comment
Constantin Alba

Join Date: Sep 2014

Posts: 80
#3

14 Sep 2016, 07:38

Originally posted by Nick Cox

Please explain what it does precisely for the benefit of people who don't use Excel.

You're right, apologies.

RANK.AVG returns the rank of a number in a list of numbers, that is, its size relative to the other values in the list; if more than one value has the same rank (a tie), the average rank is returned.

The closest Stata command is egen with rank(), but it has no correction for ties, unlike RANK.AVG.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35613
#4

14 Sep 2016, 07:46

The default of that function does adjust for ties: check e.g.

Code:

sysuse auto, clear
egen rank = rank(mpg)
tabdisp mpg, c(rank)
Comment
Constantin Alba

Join Date: Sep 2014

Posts: 80
#5

14 Sep 2016, 07:56

Originally posted by Nick Cox

The default of that function does adjust for ties: check e.g.

Code:

sysuse auto, clear
egen rank = rank(mpg)
tabdisp mpg, c(rank)

You're right, I missed this in the description. However, there are two problems:
1. I am using the "field" option, as I need the highest value to be ranked 1. The default ranks the smallest value as #1.
2. I ran your code; here is the output (top 4):
-------------------------------
    (mpg) | rank of (mpg)
----------+--------------------
       12 |           1.5
       14 |           5.5
       15 |           9.5
       16 |          12.5

All the numbers are different, i.e. there are no equal observations, so no averaging is supposed to happen.
But that is not what I get. I expected the ranks to be 1, 2, 3, 4 in the above case and 1, 2.5, 2.5, 4 in the following case:

-------------------------------
    (mpg) | rank of (mpg)
----------+--------------------
       12 |             1
       14 |           2.5
       14 |           2.5
       16 |             4
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35613
#6

14 Sep 2016, 08:06

I think you're misinterpreting tabdisp, which shows distinct values but not their frequencies unless that is a supplied variable.

See in conjunction with

Code:

tabulate mpg
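
To see the frequencies and the ranks side by side in one table, here is a minimal sketch. The variable name freq is introduced only for illustration; it holds the number of observations sharing each value of mpg, so a rank such as 1.5 can be read as the average over the tied observations.

```stata
sysuse auto, clear

* default ranks: smallest value ranked 1, tied values get the average rank
egen rank = rank(mpg)

* freq (hypothetical name) = number of observations with the same mpg
bysort mpg: gen freq = _N

* both variables are constant within each value of mpg, so tabdisp can show them
tabdisp mpg, c(freq rank)
```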
Comment
Constantin Alba

Join Date: Sep 2014

Posts: 80
#7

14 Sep 2016, 08:10

Originally posted by Nick Cox

I think you're misinterpreting tabdisp, which shows distinct values but not their frequencies unless that is a supplied variable.

See in conjunction with

Code:

tabulate mpg

Yes, now I see, thanks. But how can I get the averaging and the "field" ranking together, i.e. ranking the largest value first?

thanks!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35613
#8

14 Sep 2016, 08:25

If you specify -field-, tied values get the same rank, just as with other options.
Comment
Constantin Alba

Join Date: Sep 2014

Posts: 80
#9

14 Sep 2016, 08:32

You're right once again. I checked it with the auto dataset. What confused me is the description of the rank() function. It says (at least this is how I understood it) that there is no averaging:

The field option calculates the field rank of exp: the highest value is ranked 1, and there is no correction for ties. That is, the field rank is 1 + the number of values that are higher.
Comment
Constantin Alba

Join Date: Sep 2014

Posts: 80
#10

14 Sep 2016, 08:37

Actually, as I can see on a second look, there is no averaging when -field- is used; tied values just keep the same rank. I am not sure whether it matters, but it does affect my later calculations, which use the rank value itself and not just the ordering.

Last edited by Constantin Alba; 14 Sep 2016, 08:46.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35613
#11

14 Sep 2016, 08:59

I am puzzled on what you want here as your desiderata are quite contradictory: you can't insist that ranks are as low as possible and also that the average rank is preserved. The point of the variant options, which mostly go back to code Richard Goldstein and I wrote in 1999, is that one desideratum or another may be key in particular problems, even insisting on unique ranks (which makes sense for various graphs).
Comment
Constantin Alba

Join Date: Sep 2014

Posts: 80
#12

14 Sep 2016, 17:44

Originally posted by Nick Cox

I am puzzled on what you want here as your desiderata are quite contradictory: you can't insist that ranks are as low as possible and also that the average rank is preserved. The point of the variant options, which mostly go back to code Richard Goldstein and I wrote in 1999, is that one desideratum or another may be key in particular problems, even insisting on unique ranks (which makes sense for various graphs).

I am looking to do -field- ranking as Stata currently performs it, but with one small difference: instead of giving equal observations the same rank, average the ranks among them. Example:

Current output of rank(mpg), field:

-------------------------------
    (mpg) | rank of (mpg)
----------+--------------------
       16 |             1
       14 |             2
       14 |             2
       12 |             4

Desired:

-------------------------------
    (mpg) | rank of (mpg)
----------+--------------------
       16 |             1
       14 |           2.5
       14 |           2.5
       12 |             4

The rank 2.5 is the average of ranks 2 and 3: (2 + 3) / 2.

In the next stage I use the rank as an input to a formula for a concentration index: 1 / [2 * Sigma(rank * ranked_var) - 1]
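
A rough sketch of that calculation on the four-value example, under a loudly stated assumption: the post does not say what ranked_var is, so it is taken here to be each observation's share of the total. The names total, share, term, sumterm and index are made up for illustration, and the ranks use rank(-x) so that the largest value is ranked 1 with ties averaged.

```stata
clear
input x
16
14
14
12
end

* largest value ranked 1, tied values get the average rank
egen rank = rank(-x)

* ASSUMPTION: ranked_var is the share of each observation in the total
egen double total = total(x)
gen double share = x / total

* Sigma(rank * ranked_var)
gen double term = rank * share
egen double sumterm = total(term)

* index = 1 / [2 * Sigma(rank * ranked_var) - 1], as written in the post
gen double index = 1 / (2 * sumterm - 1)

list x rank share index, clean
```

With average ranks, the two tied values contribute 14 * 2.5 + 14 * 2.5 = 70 to the unscaled sum, the same as distinct ranks 2 and 3 would (14 * 2 + 14 * 3 = 70). With -field- ranks they would contribute 14 * 2 + 14 * 2 = 56, which is why the choice of tie handling changes the result.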
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4457
#13

14 Sep 2016, 19:03

This looks to me like the reverse of the default method, so I would use the default method and then, for the example shown, reverse the ranks by subtracting each from 5:

Code:

. input x

             x
  1. 16
  2. 14
  3. 14
  4. 12
  5. end

. egen rank=rank(x)

. replace rank=5-rank

. li x rank, clean

        x   rank
  1.   16      1
  2.   14    2.5
  3.   14    2.5
  4.   12      4
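
The constant 5 in this example is the number of observations plus 1. A hedged generalisation for data of any length follows; the helper variable n is a hypothetical name, and the sketch assumes that missing values of x should simply be left unranked.

```stata
clear
input x
16
14
14
12
end

* default ranks: smallest value ranked 1, tied values get the average rank
egen rank = rank(x)

* n (hypothetical helper) = number of nonmissing values of x
egen n = count(x)

* reverse the ranks so that the largest value is ranked 1
replace rank = n + 1 - rank

list x rank, clean
```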
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35613
#14

15 Sep 2016, 01:33

I see; you just want ranks reversed. Rich's solution is fine; here's another one. It's explicit in the help that you can rank expressions (not just variables), and explicit in the manual entry

http://www.stata.com/manuals14/degen.pdf

that

Most applications of rank() will be to one variable, but the argument exp can be more general, namely, an expression. In particular, rank(-varname) reverses ranks from those obtained by rank(varname).

Thus, this works in the example given:

Code:

input x
16
14
14
12
end
egen rank = rank(-x)
li x rank, clean
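
For comparison, a small sketch that puts the -field- ranks and the reversed average ranks next to each other on the same four values. The variable names r_field and r_avg are chosen here only for illustration.

```stata
clear
input x
16
14
14
12
end

* field rank: 1 + the number of values that are higher; ties share one rank
egen r_field = rank(x), field

* reversed default rank: largest value ranked 1, ties get the average rank
egen r_avg = rank(-x)

list x r_field r_avg, clean
```

The two tied values of 14 get 2 under -field- and 2.5 under rank(-x); the untied values get the same rank under both.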
Comment
Constantin Alba

Join Date: Sep 2014

Posts: 80
#15

15 Sep 2016, 06:04

Thank you, Rich. Your solution does work; I considered it before, but for some reason thought it wouldn't work.

Nick, thank you for your patience in responding. Your solution is just great: simple yet elegant.
Comment