RANKING

Guest
#1

24 Aug 2014, 01:35

Hello everyone!

I recently started using Stata and I am facing some difficulties.

I have a dataset of athletes and their scores in the Olympic Games. I have computed the rank of each athlete at every stage of the competition. What I want to do now is to compute, for each observation, the potential improvement in rank if that athlete had achieved a perfect score, given the observed scores of all the other competitors.

Any thoughts?

Thanks in advance!

Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#2

24 Aug 2014, 03:17

Welcome to Statalist, "garbiel"! You are more likely to get helpful responses if you provide some sample data that readers can play with. Look at http://www.statalist.org/forums/foru...s-to-play-with for recommendations about how best to do this. We'd want to see (a) what your data structure currently looks like, and (b) precisely what you'd want the values of the new variables to look like for this sample of observations. You'll probably have to add some remarks about how tied rankings are intended to be handled.
By the way: Forum etiquette -- see the FAQ -- strongly recommends the use of real names (firstname lastname), so please re-register. It's easy to do: hit the Contact Us button at bottom right of screen, and make your request. Thank you.

Nick Cox

Join Date: Mar 2014

Posts: 35708
#3

24 Aug 2014, 03:25

I don't understand the question. If an athlete achieved a perfect score, wouldn't they achieve first position, regardless of other competitors?

Otherwise this reminds me of an FAQ:

http://www.stata.com/support/faqs/da...ng-properties/

There is in effect a second edition of that, with better explanations and (in some cases) better solutions:

http://www.stata-journal.com/article...article=dm0075

Nick Cox

Join Date: Mar 2014

Posts: 35708
#4

24 Aug 2014, 04:50

My previous post may have been a red herring. It may be as simple as

1. Compute ranks for actual scores using egen.

2. Change selected scores to perfect.

3. Repeat 1.

4. Calculate difference in ranks.
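
A minimal sketch of these four steps, assuming one observation per athlete and variables named athlete, score and perfect_score (names that only appear later in the thread). Picking athlete "a" and the new variable names rank_actual, score_alt, rank_alt and gain are purely illustrative choices:

Code:

* 1. Rank on the observed scores (field ranking: highest score gets rank 1)
egen rank_actual = rank(score), field

* 2. Copy the scores and give one athlete (here "a") the perfect score
clonevar score_alt = score
replace score_alt = perfect_score if athlete == "a"

* 3. Re-rank on the modified scores
egen rank_alt = rank(score_alt), field

* 4. Positions gained; only meaningful for the athlete changed in step 2
generate gain = rank_actual - rank_alt if athlete == "a"

Everyone else's rank_alt can shift too when "a" moves up, which is why the difference is kept only for the athlete whose score was changed.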

By the way, I second Stephen's request that you use a full real name.

Last edited by Nick Cox; 24 Aug 2014, 04:53.

Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#5

24 Aug 2014, 10:05

What I am trying to do is to make a new variable, if possible, which will contain the perfect cumulative score of one athlete and the real observed cumulative score of the other athletes. But I want to do for each and every athlete respectively. And after that to compute the new ranking.

I don't understand this. There are 12 athletes, each with 10 observations. So I don't know how you will fit the real observed cumulative scores of the other 11 athletes into the ten observations allocated to each athlete, nor do I understand how you would decide which other athlete's cumulative score would go into which observations. Can you hand-calculate the variables you want to create for just the first two athletes and show us what the data look like with those included? I don't grasp the result you are looking for.

Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#6

24 Aug 2014, 17:15

Thank you, but now I'm confused by something else. If an athlete could achieve perfect scores, how could his/her rank be anything other than #1 (unless perhaps there is some athlete in the data who actually achieved perfect scores in all attempts, in which case they would be tied for #1)? I don't understand why you need to compute anything to do this. What am I missing?

Also, your data structure doesn't make sense to me. The variable new_variable in the sample data set is focused on athlete "a," but it necessarily extends through the observations of the other athletes as well. When you want to focus on athlete "b," will you want yet another new variable? In the end, do you want to create a separate new variable for each athlete?

Richard Williams

Join Date: Apr 2014

Posts: 5006
#7

24 Aug 2014, 18:54

I think this may be starting to make sense to me. Like Clyde, I was thinking that if you got a perfect score you had to finish first. But perfect score is not the same as highest possible score, right? Rather, the score is the product of how well you did the event (which apparently maxes out at 30?) times the level of difficulty (variable dd). So a perfect score on a low difficulty effort could still place you below an imperfect performance on something that was high difficulty. For example, somebody could get a 30 for a performance with difficulty level 1.5, yielding a score of 45. But if somebody else got a 26 with difficulty = 1.8, their score would be 46.8, and they would rank higher.

So, you want to see what the person's rank would be if they had performed perfectly while everybody else had performed the same. This rank would not necessarily be first if whatever it is they are doing did not have that high a level of difficulty.
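
If the perfect-score variables are not already in the data, they could be built roughly as follows. This is only a sketch: it assumes the execution mark really does max out at 30 (so that a perfect score is 30 * dd), that the data are in long form with one row per athlete and attempt, and that the variable names match those shown in the sample data:

Code:

* Assumption: execution maxes out at 30, so perfect score = 30 * difficulty
generate perfect_score = 30 * dd

* Running totals within athlete, in attempt order
bysort athlete (attempt): generate cumulative_score = sum(score)
bysort athlete (attempt): generate cumulative_perfect_score = sum(perfect_score)

With by:, sum() is a running sum within each athlete, so the value on an athlete's last attempt is that athlete's total.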

Just off the top of my head, I am not sure how to do that, but if you confirm that I have described the task correctly maybe we can figure it out from here.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam

Richard Williams

Join Date: Apr 2014

Posts: 5006
#8

24 Aug 2014, 19:02

Also, your data structure doesn't make sense to me. The variable new_variable in the sample data set is focused on athlete "a," but it necessarily extends through the observations of the other athletes as well. When you want to focus on athlete "b," will you want yet another new variable? In the end, do you want to create a separate new variable for each athlete?

That might be one way to do it. Clone the score variable. For person one only, replace the score with the so-called perfect score (level of difficulty * 30). Compute the rank for that person.

Then, repeat the process for each subsequent person, creating a new ranking variable each time.

But, that seems pretty unwieldy. I think you would just want one newrank variable, not hundreds. Maybe you would need to set up a loop, where you literally go one case at a time computing the new rank variable. That also seems a bit unwieldy, so maybe there is some other option out there.


Richard Williams

Join Date: Apr 2014
Posts: 5006

#9

24 Aug 2014, 20:03

This is a little klutzy, but is it doing what you want?

Code:

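* Keep each athlete's last observation (totals after the final attempt)
bysort athlete (attempt): keep if _n == _N
capture gen newrank = .
forval casenum = 1/`=_N' {
    capture drop xscore xrank
    * Start from everyone's observed cumulative score
    clonevar xscore = cumulative_score
    * Give only the current athlete the perfect cumulative score
    replace xscore = cumulative_perfect_score in `casenum'
    * Re-rank; field ranking gives rank 1 to the highest score
    egen xrank = rank(xscore), field
    replace newrank = xrank in `casenum'
}
list athlete cumulative_score rank cumulative_perfect_score newrank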


. list athlete cumulative_score rank cumulative_perfect_score newrank

     +------------------------------------------------+
     | athlete   ~e_score   rank   c~perf~e   newrank |
     |------------------------------------------------|
  1. |       a     506.43      5        693         1 |
  2. |       b     481.71      7        705         1 |
  3. |       c     429.18     11        675         1 |
  4. |       d     417.42     12        681         1 |
  5. |       e     431.61     10        681         1 |
     |------------------------------------------------|
  6. |       f      450.3      9        681         1 |
  7. |       g     580.23      1        693         1 |
  8. |       h     465.45      8        693         1 |
  9. |       i     526.65      4        714         1 |
 10. |       j     534.33      2        696         1 |
     |------------------------------------------------|
 11. |       k     498.81      6        693         1 |
 12. |       l     533.19      3        705         1 |
     +------------------------------------------------+

Basically, the program goes person by person, substitutes the perfect score for their observed score, and figures out what their rank would be if they had been perfect while everyone else stayed the same. Hopefully the data set is not so enormous that this is painfully slow.

The sample data you gave may not have been the best for testing this, since everybody would have finished first if they had been "perfect" while everyone else stayed the same. So, to make it a little more interesting, in the next variation I lower the perfect scores so they are only 25 points higher than the imperfect scores:

Code:

bysort athlete (attempt): keep if _n == _N
capture gen newrank = .
replace cumulative_perfect_score = cumulative_score + 25
forval casenum = 1/ `=_N' {
    capture drop xscore xrank
    clonevar xscore = cumulative_score
    replace xscore = cumulative_perfect_score in `casenum'
    egen xrank = rank(xscore), field
    replace newrank = xrank in `casenum'
}
list athlete cumulative_score rank cumulative_perfect_score newrank

. list athlete cumulative_score rank cumulative_perfect_score newrank

     +------------------------------------------------+
     | athlete   ~e_score   rank   c~perf~e   newrank |
     |------------------------------------------------|
  1. |       a     531.43      5     556.43         2 |
  2. |       b     506.71      7     531.71         4 |
  3. |       c     454.18     11     479.18         8 |
  4. |       d     442.42     12     467.42         9 |
  5. |       e     456.61     10     481.61         8 |
     |------------------------------------------------|
  6. |       f      475.3      9      500.3         7 |
  7. |       g     580.23      1     605.23         1 |
  8. |       h     465.45      8     490.45         8 |
  9. |       i     526.65      4     551.65         2 |
 10. |       j     534.33      2     559.33         2 |
     |------------------------------------------------|
 11. |       k     498.81      6     523.81         6 |
 12. |       l     533.19      3     558.19         2 |
     +------------------------------------------------+

The way I did it, ranks improve or at least stay the same but the perfect scores aren't so much better that everyone would have finished first if they had been the only one to achieve perfection.

Anyway, see if the first code seems to do what you want, and if so it can probably be made less klutzy.
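
One possibly less klutzy variant, offered as a sketch rather than a tested replacement: with field ranking, an athlete's rank is one plus the number of competitors with a strictly higher score, so the counterfactual rank can be obtained by counting rivals whose observed score beats that athlete's perfect score. It assumes the data have already been cut down to one observation per athlete, that cumulative_perfect_score is never missing, and that the code is run from a do-file (because of the /// continuation):

Code:

generate newrank = .
forvalues i = 1/`=_N' {
    * Rivals (everyone but observation i) whose observed score beats i's perfect score
    quietly count if _n != `i' & cumulative_score > cumulative_perfect_score[`i'] ///
        & !missing(cumulative_score)
    quietly replace newrank = r(N) + 1 in `i'
}

This avoids creating and dropping xscore and xrank on every pass, although it still loops over observations.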


Richard Williams

Join Date: Apr 2014
Posts: 5006

#10

25 Aug 2014, 07:36

Looking at the first athlete,

Code:

. list athlete attempt dd score rank cumulative_score perfect_score cumulative_perfect_score if athlete=="a"

     +-------------------------------------------------------------------------+
     | athlete   attempt    dd   score   rank   ~e_score   perfec~e   c~perf~e |
     |-------------------------------------------------------------------------|
  1. |       a         1   1.6   34.08     10      34.08         48         48 |
  2. |       a         2   1.8   41.58      7      75.66         54        102 |
  3. |       a         3   1.9   43.89      6     119.55         57        159 |
  4. |       a         4   2.1   49.77      5     169.32         63        222 |
  5. |       a         5   2.1   47.25      6     216.57         63        285 |
     |-------------------------------------------------------------------------|
  6. |       a         6   2.4   53.28      7     269.85         72        357 |
  7. |       a         7   2.8   61.32      5     331.17         84        441 |
  8. |       a         8   2.8   63.84      5     395.01         84        525 |
  9. |       a         9   2.7   46.17      5     441.18         81        606 |
 10. |       a        10   2.9   65.25      5     506.43         87        693 |
     +-------------------------------------------------------------------------+

For the 2nd attempt -- was the rank of 7 based only on their score (41.58) or was it based on their cumulative score (75.66)? In other words, should they be ranked on each attempt separately, or should their rank be based on their cumulative score based on the current and all previous attempts?
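
One way to check this, sketched under the assumption that the full long data set is in memory: compute both candidate rankings within each attempt and see which one reproduces the existing rank variable. The names rank_single and rank_cumulative are made up for illustration:

Code:

* Rank within each attempt on the single-attempt score
bysort attempt: egen rank_single = rank(score), field

* Rank within each attempt on the running total
bysort attempt: egen rank_cumulative = rank(cumulative_score), field

* Compare both with the rank variable already in the data
list athlete attempt rank rank_single rank_cumulative if athlete == "a"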

Either way, you could probably repeat my earlier code 10 times, each time specifying a different attempt #, and then saving the files and merging together:

Code:

keep if attempt == 1
capture gen newrank = .
forval casenum = 1/ `=_N' {
    capture drop xscore xrank
    clonevar xscore = cumulative_score
    replace xscore = cumulative_perfect_score in `casenum'
    egen xrank = rank(xscore), field
    replace newrank = xrank in `casenum'
}
list athlete attempt cumulative_score rank cumulative_perfect_score newrank

. list athlete attempt cumulative_score rank cumulative_perfect_score newrank

     +----------------------------------------------------------+
     | athlete   attempt   ~e_score   rank   c~perf~e   newrank |
     |----------------------------------------------------------|
  1. |       a         1      34.08     10         48         1 |
  2. |       b         1      32.64     11         48         1 |
  3. |       c         1      36.21      8         51         1 |
  4. |       d         1       40.2      4         60         1 |
  5. |       e         1      31.05     12         45         2 |
     |----------------------------------------------------------|
  6. |       f         1      42.84      2         63         1 |
  7. |       g         1       38.4      5         48         1 |
  8. |       h         1      35.52      9         48         1 |
  9. |       i         1      42.66      3         54         1 |
 10. |       j         1       45.9      1         54         1 |
     |----------------------------------------------------------|
 11. |       k         1       36.9      7         45         2 |
 12. |       l         1      37.44      6         48         1 |
     +----------------------------------------------------------+

There is probably a simpler way to do this. But before bothering, are you sure it is worth it? Even after only one attempt, the worst perfect performer is ranked 2nd. After 2 or 3 attempts, it wouldn't surprise me if every perfect performer is ranked #1 (assuming you want to use cumulative scores). Even somebody who is doing pretty good with very difficult stuff is going to have a hard time competing with somebody who is perfect on easier things.

In other words, I strongly suspect somebody who is perfect on every attempt will wind up ranked #1. So, unless you have some people who are only doing incredibly easy stuff, I am not sure that this is worth it.
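
If it does turn out to be worth doing, a single pass over the long data might replace the ten separate runs and the merge. The sketch below counts, for each observation, the rivals in the same attempt whose observed cumulative score exceeds that observation's perfect cumulative score. It assumes no missing perfect scores, a do-file (for the /// continuation), and that the existing rank variable was computed on cumulative scores within attempt, which is exactly the point still to be confirmed; gain is a hypothetical name:

Code:

generate newrank = .
forvalues i = 1/`=_N' {
    * Same-attempt rivals whose observed cumulative score beats i's perfect one
    quietly count if attempt == attempt[`i'] & _n != `i' ///
        & cumulative_score > cumulative_perfect_score[`i'] ///
        & !missing(cumulative_score)
    quietly replace newrank = r(N) + 1 in `i'
}

* Positions gained (only valid if rank is based on cumulative_score within attempt)
generate gain = rank - newrank

The loop compares every observation with every other one, so it may be slow on a very large data set.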
