Create a rank variable

Sakib Anwar

Join Date: Feb 2019

Posts: 2
#1

25 Feb 2019, 09:55

mpg  rank
12   1
12   1
14   2
14   2
14   2
15   3
15   3
16   4
17   5

I want to create a rank like the one in the table above. I tried using the default rank command. None of the commands gives me the above ranking.
Tags: egen, rank, variables
Nick Cox

Join Date: Mar 2014

Posts: 35708
#2

25 Feb 2019, 10:10

I guess you're referring to the rank functionality of egen. You're correct that egen doesn't regard this numbering as a kind of ranking. The essence of ranking is to count how many observations have higher or lower values, and this method fails at showing that except in some limiting cases.

That said, you should consider

Code:

egen rank = group(mpg)

as a way to get what you want.
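
To see how this differs from the rank() function, here is a minimal sketch that enters the mpg values from #1 by hand and compares the results. The variable names r_default, r_track and r_dense are illustrative only and do not come from the thread.

```stata
clear
input mpg
12
12
14
14
14
15
15
16
17
end

* rank() with default options: tied values share the mean of their ranks
* (1.5, 1.5, 4, 4, 4, 6.5, 6.5, 8, 9)
egen r_default = rank(mpg)

* track option: 1 + number of strictly lower values
* (1, 1, 3, 3, 3, 6, 6, 8, 9)
egen r_track = rank(mpg), track

* group(): consecutive integers over the sorted distinct values,
* which is the numbering asked for in #1 (1, 1, 2, 2, 2, 3, 3, 4, 5)
egen r_dense = group(mpg)

list, sepby(mpg)
```

Only group() returns consecutive integers with no gaps after ties, which is why none of the rank() options reproduces the table in #1.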
Sakib Anwar

Join Date: Feb 2019

Posts: 2
#3

25 Feb 2019, 11:18

This is exactly what I need! Thanks a lot!
Matilde Maggi

Join Date: Oct 2021

Posts: 4
#4

18 Oct 2021, 10:05

Dear all,

I need to create a rank variable using data currently in long format:

Pid rank country
1 1 NZ
1 2 AU
1 3 CA
1 4 US
1 5 TH
2 1 US
2 2 GB
2 3 IT
2 4 DE
2 5 GR
3 1 DO
3 2 AW
3 3 AU
3 4 ES
3 5 CH
4 1 ES
4 2 GB
4 3 US
4 4 TH
4 5 CK
5 1 CA
5 2 JP
5 3 AU
5 4 GB
5 5 DK
6 1 AL
6 2 DE

For every individual, I would like to obtain an ordered variable of the destinations according to the ranking (from most preferred, value 1 in the rank variable, to least preferred, value 5 in the rank variable). Before reshaping I've tried several alternatives, including commands introduced by Nick. But none is working for this specific case. I can't figure out how to do this and the clock is ticking... Can anyone help?

Thank you in advance!
Nick Cox

Join Date: Mar 2014

Posts: 35708
#5

18 Oct 2021, 11:37

Sorry, but I don't understand what you're asking in #4. Your data example seems to show a variable that already is the rank. So, what is to be calculated?

I've tried several alternatives. Including commands introduced by Nick. But none is working for this specific case I can't figure out how to do this and the clock is ticking

There is nothing there to comment on. This is just "My code is not working". Sympathy is natural, but a reply with substance is impossible.
Matilde Maggi

Join Date: Oct 2021

Posts: 4
#6

18 Oct 2021, 13:16

Thank you for this, Nick. I wanted to be concise rather than vague, but I did not succeed!

Each individual in my sample has answered a question about a preferred destination among the countries in the world.
Each Pid ranked 5 countries among all possible destinations:
1st most preferred, 2nd most preferred, 3rd most preferred, and so on.
Now I have a dataset that for each individual has this information in this form:

I need a variable that orders each individual's destination preferences into a ranking, that is, an ordered variable Xi that takes 5 values. For instance, for Pid = 1 it should be
X1 = (1 = UK, 2=USA, 3 =NZ, 4=IT, 5=AU)

In the post above I reported my data in long format. In fact, unable to find a solution, I did the following:

keep Pid first_prefered_world second_prefered_world third_prefered_world fourth_prefered_world fifth_prefered_world
rename first_prefered_world q0_1
rename second_prefered_world q0_2
rename third_prefered_world q0_3
rename fourth_prefered_world q0_4
rename fifth_prefered_world q0_5

sort Pid
reshape long q0_, i(Pid) j(rank)

rename q0_ country

However, I'm still unable to associate the values of the country answer with their rank position in each individual's ordered preferences. I still don't know how to obtain, e.g., for Pid = 1, X1 = (1 = UK, 2=USA, 3 =NZ, 4=IT, 5=AU)

I hope this is clear and some of you can help me!
Matilde Maggi

Join Date: Oct 2021

Posts: 4
#7

18 Oct 2021, 15:45

To follow up on #6: I need a variable that for Pid = 1 takes the value X1 = (1 = NZ, 2 = AU, 3 = CA, 4 = US, 5 = TH), for Pid = 2 X2 = (1 = US, 2 = GB, 3 = IT, 4 = DE, 5 = GR), for Pid = 3 X3 = (1 = DO, 2 = AW, 3 = AU, 4 = ES, 5 = CH), and so on.
I have tried different methods but I can't figure it out. I hope the answer is not right in front of my eyes without my seeing it.
Thank you!

Matilde
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#8

18 Oct 2021, 15:55

This is still very obscure, and without a clear explanation using words (not code), you are unlikely to get any substantive help.

You appear to already have data in long format which contains each person's 5 most preferred destinations which are already ranked 1 to 5. Then you talk about a reshape to wide format, which itself is straightforward, and also preserves ranks. So, please provide a clear description of what you have (if different) and what you want.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#9

18 Oct 2021, 15:56

I don't understand how what you want differs from what you have in the variable -rank- you created in #6. It seems to me that variable, which you created yourself, is exactly what you are asking for.

Added: Crossed with #8, which expresses the same puzzlement.
Matilde Maggi

Join Date: Oct 2021

Posts: 4
#10

18 Oct 2021, 16:22

Thank you, Clyde and Leonardo, for trying to help me.

I need a variable that for each individual takes 5 values, corresponding to the 5 countries that represent their preferred destinations.
So it should not be just the 'rank' variable as in the long format in #6.

I need to associate the rank with the value of the string variable 'country'. So if individual i has replied that the US is her first preferred destination, GB the second ... and GR (Greece) her fifth preferred one,
I need a variable in which Xi = (US = 1, GB = 2, IT = 3, DE = 4, GR = 5), and not only X = (1, 2, 3, 4, 5). I need the variable 'country' to be the one ranked for each individual.

I hope this is clearer
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#11

18 Oct 2021, 17:30

But that's what you already have in the starting variables 1stMostPreferred through 5thMost Preferred, right? What am I missing?

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2403

#12

18 Oct 2021, 18:09

I'm still puzzled. Perhaps you want this instead? There is one observation per person, and one variable for each country. The values indicate the person's rank preference for that country (or missing if not ranked).

Code:

. reshape wide rank , i(pid) j(country) string
. list pid rankUS rankGB rankIT rankDE rankGR, sep(0)

     +--------------------------------------------------+
     | pid   rankUS   rankGB   rankIT   rankDE   rankGR |
     |--------------------------------------------------|
  1. |   1        4        .        .        .        . |
  2. |   2        1        2        3        4        5 |
  3. |   3        .        .        .        .        . |
  4. |   4        3        2        .        .        . |
  5. |   5        .        4        .        .        . |
  6. |   6        .        .        .        2        . |
     +--------------------------------------------------+

This is one approach, but it is hardly useful for subsequent analysis compared to the long-format dataset you have, or an alternative reshape, as below.

Code:

. reshape wide country, i(pid) j(rank)

     +------------------------------------------------------------+
     | pid   country1   country2   country3   country4   country5 |
     |------------------------------------------------------------|
  1. |   1         NZ         AU         CA         US         TH |
  2. |   2         US         GB         IT         DE         GR |
  3. |   3         DO         AW         AU         ES         CH |
  4. |   4         ES         GB         US         TH         CK |
  5. |   5         CA         JP         AU         GB         DK |
  6. |   6         AL         DE                                  |
     +------------------------------------------------------------+
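
For anyone wanting to reproduce the two layouts above, here is a minimal self-contained sketch using only the rows for Pid 1 and 2 from #4. It assumes the variable is named Pid as in #4 (the listings above show pid; Stata variable names are case-sensitive), and it uses preserve and restore so both reshapes start from the same long data.

```stata
clear
input byte Pid byte rank str2 country
1 1 "NZ"
1 2 "AU"
1 3 "CA"
1 4 "US"
1 5 "TH"
2 1 "US"
2 2 "GB"
2 3 "IT"
2 4 "DE"
2 5 "GR"
end

* Layout 1: one variable per rank position, holding the country code
preserve
reshape wide country, i(Pid) j(rank)
list, sep(0)
restore

* Layout 2: one variable per country, holding the rank (missing if not ranked)
reshape wide rank, i(Pid) j(country) string
list Pid rankUS rankGB rankIT rankDE rankGR, sep(0)
```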


Cristiano Bellavitis

Join Date: Jul 2018

Posts: 31
#13

12 Dec 2022, 07:29

Hi all,

I also need some help with the rank variable. I am trying to create two variables (rank and percentile rank), grouped by month_year. Below I copy a small example I created to test my variable.

I managed to create the rank variable with the code egen rank = rank(data), by(month_year), but I was wondering:
1) is there a way to calculate it over a moving window of the previous 30 days?
2) how can I create a percentile rank?

Thank you,

Cristiano

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(ID data) str11 date float Date str6 month_year float rank
1 50 "1 Jan 2020"  21915 "1_2020" 1
2 40 "4 Feb 2020"  21949 "2_2020" 1
3 70 "7 Jan 2020"  21921 "1_2020" 2
4 55 "15 Feb 2020" 21960 "2_2020" 2
5 65 "17 Feb 2020" 21962 "2_2020" 3
6 80 "18 Jan 2020" 21932 "1_2020" 3
end
format %td Date

------------------ copy up to and including the previous line ------------------

Last edited by Cristiano Bellavitis; 12 Dec 2022, 07:33.
Nick Cox

Join Date: Mar 2014

Posts: 35708
#14

12 Dec 2022, 08:08

#13 asks two questions I don't understand.

1) Each observation might have several percentiles depending on which window you are talking about. Consider a simplified example in which observations have values 1 up at times 1 up. Then observation 7 for example holds a value with rank 7 for a window of length 7 or more starting at 1, rank 6 for such a window starting at 2, rank 5 for such a window starting at 3, and so on. So how do you want to hold results?

2) seems to raise the same query.

Disjoint windows are fine, but you already know how to deal with those.
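
For the disjoint case (one window per month_year), a percentile rank might be sketched as below, using the data from #13. The formula 100 * (rank - 0.5) / n is only one of several conventions and is an assumption here, as are the variable names n and pcrank.

```stata
clear
input float(ID data) str6 month_year
1 50 "1_2020"
2 40 "2_2020"
3 70 "1_2020"
4 55 "2_2020"
5 65 "2_2020"
6 80 "1_2020"
end

* rank within each month (ties, if any, would get mean ranks)
egen rank = rank(data), by(month_year)

* number of non-missing values within each month
egen n = count(data), by(month_year)

* percentile rank on a 0-100 scale; (rank - 0.5)/n is an assumed convention
gen pcrank = 100 * (rank - 0.5) / n

sort month_year rank
list ID data month_year rank n pcrank, sepby(month_year)
```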
Cristiano Bellavitis

Join Date: Jul 2018

Posts: 31
#15

12 Dec 2022, 10:38

Originally posted by Nick Cox

#13 asks two questions I don't understand.

1) Each observation might have several percentiles depending on which window you are talking about. Consider a simplified example in which observations have values 1 up at times 1 up. Then observation 7 for example holds a value with rank 7 for a window of length 7 or more starting at 1, rank 6 for such a window starting at 2, rank 5 for such a window starting at 3, and so on. So how do you want to hold results?

2) seems to raise the same query.

Disjoint windows are fine, but you already know how to deal with those.

Hi Nick,
Thank you for your prompt reply. My ideal rank is based on the date.

For example, the first observation is dated 1 January 2020. I'd like to rank that observation against all the other observations occurring in the prior 30 days, which would fall in December (there are no observations then). Observation 6 would be ranked against observations 3 and 1, since they occur in the prior 30 days. And since the value of that observation is 80, it would be ranked top. It is a moving window.

I hope this clarifies. Below I copy a simplified version.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(ID data) str11 date
1 50 "1 Jan 2020"
2 40 "4 Feb 2020"
3 70 "7 Jan 2020"
4 55 "15 Feb 2020"
5 65 "17 Feb 2020"
6 80 "18 Jan 2020"
end

------------------ copy up to and including the previous line ------------------

Last edited by Cristiano Bellavitis; 12 Dec 2022, 10:44.
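
The thread ends without a worked answer to #15. One way to read the request is sketched below, under stated assumptions: each observation is ranked among all observations dated from 30 days earlier up to and including its own date (so it is compared with itself), rank 1 is the lowest value, and ties would share the lowest rank. The names Date, rank30, n30 and pcrank30 are illustrative, and the percentile formula is one convention among several.

```stata
clear
input float(ID data) str11 date
1 50 "1 Jan 2020"
2 40 "4 Feb 2020"
3 70 "7 Jan 2020"
4 55 "15 Feb 2020"
5 65 "17 Feb 2020"
6 80 "18 Jan 2020"
end

* convert the string date to a numeric daily date
gen Date = daily(date, "DMY")
format %td Date

gen rank30 = .
gen n30 = .

* loop over observations; the window for observation i is [Date - 30, Date]
forvalues i = 1/`=_N' {
    * how many observations fall in the window (including observation i)
    quietly count if inrange(Date, Date[`i'] - 30, Date[`i'])
    quietly replace n30 = r(N) in `i'
    * rank = 1 + number of strictly lower values in the window
    quietly count if inrange(Date, Date[`i'] - 30, Date[`i']) & data < data[`i']
    quietly replace rank30 = r(N) + 1 in `i'
}

* percentile rank within the window; (rank - 0.5)/n is an assumed convention
gen pcrank30 = 100 * (rank30 - 0.5) / n30

sort Date
list ID data Date rank30 n30 pcrank30, sep(0)
```

With these assumptions, observation 6 is compared with observations 1 and 3 and gets the highest rank of 3, as described in #15. The loop is quadratic in the number of observations, so it is a sketch for small data rather than a recommendation for large datasets.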
