How to rank a variable in Stata?

Lander Seyma

Join Date: Jul 2014

Posts: 19
#1


01 Aug 2014, 13:18

Hi all,

I have a panel dataset with the following variables: 'firm id', 'year', and 'Carhart alpha'.

For example, I have 100 firms (funds), each with observations from 2000 to 2013. Every year, I want to assign a fractional rank ranging from 0 to 1 to each fund based on the fund’s alpha.
I want to store these ranks in a new variable, call it 'Rank'.

I would like to ask - how can I perform this in Stata?

Let me, please, introduce a quotation from a paper (this is actually what I want to do):
"The fractional rank at time t for fund i in the bottom performance quintile is defined as LOW(i,t) = Min(Rank(i,t), 0.2), in the three medium performance quintiles as MID(i,t) = Min(0.6, Rank(i,t) − LOW(i,t)), and in the top performance quintile as HIGH(i,t) = Rank(i,t) − MID(i,t) − LOW(i,t) ...
... where Rank(i,t) is fund i’s performance percentile".

I appreciate any help on this issue.
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#2

01 Aug 2014, 14:43

Can you share example working data? It's best if you use -input- as in:

Code:

input ///
variable names
first line of data
second line of data
.
.
.
last line of data
end

See -help input- for details on how to do this.

I also suggest formatting the text as necessary using the advanced editor button A, in the top-right corner of the editor.

What's the relation between "Carhart alpha" and Rank_it ?

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
Lander Seyma

Join Date: Jul 2014

Posts: 19
#3

01 Aug 2014, 15:03

Hello Roberto,

Basically, I would like the following:
1. I have a variable that contains firms' abnormal returns (alphas).
2. I want to create another variable, where I would have ranking scores for each fund over a certain year. In other words, every year each fund would be allocated to a certain rank (from 0 to 1) according to its alpha.
3. Given these ranks I would be able to see top-performers and low-performers within a year and then perform further analysis.

Here is the sample of my dataset (in fact the dataset is extremely large):
Attached Files

Book (1).xlsx (10.0 KB, 1 view)
ben earnhart

Join Date: May 2014

Posts: 1027
#4

01 Aug 2014, 15:14

Given 100 companies per year and a variable called alpha:

Code:

sort year alpha
by year: gen alpharank=_n/100
Lander Seyma

Join Date: Jul 2014

Posts: 19
#5

01 Aug 2014, 15:31

Sorry Ben, could you please explain your code?

I am really confused with that.

In fact, the number of firms in my dataset is 953 and the time period is 2000-2013. That is, each fund has an alpha in each year.
ben earnhart

Join Date: May 2014

Posts: 1027
#6

01 Aug 2014, 15:41

It sorts cases so that all observations in the same year are together and, within each year, the lowest alphas come first and the highest alphas last.
Then the "by year" prefix takes the within-year observation number (first case in year 2000, _n=1; second case, _n=2; ... first case in year 2001, _n=1; second case, _n=2) and divides it by the total number of cases in that year, which I had set to 100.

Actually a more generalizable code would be:

Code:

sort year alpha
by year: gen alpharank=_n/_N

Richard Williams

Join Date: Apr 2014

Posts: 4987
#7

01 Aug 2014, 16:07

Are there ties in the data? If so, that might require some code tweaking. It is hard to check things on my iPhone, but egen has a rank() function that I think has some options for handling ties.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
ben earnhart

Join Date: May 2014

Posts: 1027
#8

01 Aug 2014, 16:29

Richard has a good point. Could use

Code:

egen alpharank=rank(alpha), by(year) unique
bysort year: replace alpharank=alpharank/_N

which might handle ties better.
I figured keep it simple; ties are unlikely with eight (or maybe more) significant digits, and not sure he cares about ties.
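
One caveat with dividing by _N: it counts every observation in the year, including any with a missing alpha, while egen's rank() leaves those observations unranked. A minimal sketch of one way around that, assuming it is acceptable to divide by the number of nonmissing alphas instead (the toy data and the name nvalid are made up for illustration):

```stata
clear
input year alpha
2000 1.5
2000 -0.3
2000 .
2000 0.7
2001 2.0
2001 -1.0
end

* rank within year; observations with missing alpha get a missing rank
egen alpharank = rank(alpha), by(year) unique

* number of nonmissing alphas in each year (hypothetical helper variable)
egen nvalid = count(alpha), by(year)

* scale to the (0, 1] interval using only the ranked observations
replace alpharank = alpharank/nvalid

list, sepby(year) noobs
```

With complete data in every year, nvalid equals _N and this gives the same result as the code above.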
Lander Seyma

Join Date: Jul 2014

Posts: 19
#9

01 Aug 2014, 16:47

To be honest, I am new to Stata; I started using it only a couple of weeks ago. So I have to ask: what are the ties you mentioned above? Does it refer to correlation?

By the way, I have tried the code you proposed above and it ranked my firms from 0 to 1 within each year. But I also want to understand what ties are; maybe I really do need to use the last code.

So please could you explain that to me.
Richard Williams

Join Date: Apr 2014

Posts: 4987
#10

01 Aug 2014, 16:54

Ties occur when multiple cases have the exact same value, e.g. if the values were

1
2
3
3
3
4
5

Three cases have the same value of 3. If you used Ben's original approach, then the first 3 would be ranked 3rd, the next 3 would be ranked 4th, and the third 3 would be ranked 5th, even though they have the exact same value. If there are no ties, this is not an issue. If there are ties, you have to figure out how you want to break the ties. Or else maybe just assign each tied value the same rank.
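
To see how egen's rank() treats the tied values in the list above, here is a small sketch using that same toy list (the variable names are illustrative). By default tied values share the average rank; the track option gives them the lowest rank in the tied group; unique forces distinct ranks and breaks ties arbitrarily.

```stata
clear
input x
1
2
3
3
3
4
5
end

egen r_default = rank(x)          // ties share the average rank
egen r_track   = rank(x), track   // ties share the lowest rank in the group
egen r_unique  = rank(x), unique  // distinct ranks 1..7, ties broken arbitrarily

list, noobs
```

Which treatment is appropriate depends on the analysis; see -help egen- for the exact definitions.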

ben earnhart

Join Date: May 2014

Posts: 1027
#11

01 Aug 2014, 17:04

Ties would simply be situations where (within the same year) the alpha for company 1 has the exact same value as the alpha for company 2. In my original code, I think tied cases would get a higher or lower ranking based on whatever the data had been sorted on last before the year and alpha sort. In the code Richard suggested, it will arbitrarily pick one over the other at computation time. Given that your alpha variable has so many digits, the odds of a tie within a given year are pretty small, so I don't think I'd worry too much about it. What to do with ties is very important when you only have a small range of values relative to the number of cases, but here it should be a non-issue. I actually ran my code against 1,300 cases (100 per year), and the two approaches did the same thing, correlated at 1.0000. You can try both, and Richard's is slightly preferable.
Lander Seyma

Join Date: Jul 2014

Posts: 19
#12

01 Aug 2014, 17:21

Oh, now I understand what this means.
I think this is not the case in my dataset, since the alpha values have 5-7 decimals. Here is an example from my sample:

-6.29765
-.9554463
2.159768
7.610835
-4.91084
11.75322
.0445696
-1.896034

Although there are approximately 400-500 observations (alphas) per year in the dataset, I reckon (and hope) it is very unlikely that two of them have the same value.
Hence, as I understand it, I can use Ben's original code for ranking.

I really appreciate the help you guys provide here to everyone! It is amazing to get advice from specialists. Thank you!
ben earnhart

Join Date: May 2014

Posts: 1027
#13

01 Aug 2014, 17:37

Ah! Just to be clear, my "original" (first post) code assumed full data with exactly 100 companies per year. The later versions, which divide by _N (the number of cases in that year) instead of a constant, are better since you don't have the same number of cases per year.
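
Once the fractional rank exists, the LOW/MID/HIGH pieces from the quotation in post #1 follow directly from it. A minimal sketch, assuming simulated data with no missing alphas and the illustrative variable names low, mid, and high (none of these names come from the paper):

```stata
clear
set seed 12345
set obs 50
gen year = 2000 + mod(_n, 5)      // 5 years, 10 made-up funds per year
gen double alpha = rnormal()      // simulated alphas, for illustration only

* fractional rank within year, as in the _n/_N code above
bysort year (alpha): gen double alpharank = _n/_N

* piecewise components as defined in the quotation
gen double low  = min(alpharank, 0.2)
gen double mid  = min(0.6, alpharank - low)
gen double high = alpharank - mid - low

* the three pieces should add back up to the rank
assert abs(low + mid + high - alpharank) < 1e-12
```

For a fund with rank 0.9, for example, this gives low = 0.2, mid = 0.6, and high = 0.1.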
Richard Williams

Join Date: Apr 2014

Posts: 4987
#14

01 Aug 2014, 17:43

Looks like Ben has nailed it. I mostly provided a distraction by suggesting a problem that almost certainly does not exist. ;-)

Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#15

01 Aug 2014, 21:55

Couldn't the existence of ties be checked? Something like:

Code:

clear all
set more off

*----- example data -----

set obs 1000
set seed 296

gen double alpha = runiform()*10

*----- check alpha -----

bysort alpha: gen counter = _N
summarize counter, meanonly
assert r(sum) == _N
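
Because the ranking in this thread is done within year, only ties within the same year matter. A sketch of the same check adapted to the panel case, using simulated data and an illustrative variable name (ntied):

```stata
clear
set seed 296
set obs 1000
gen year = 2000 + mod(_n, 10)     // 10 made-up years, 100 obs each
gen double alpha = runiform()*10

* size of each group of identical alphas within a year
bysort year alpha: gen long ntied = _N

* number of observations involved in a within-year tie
count if ntied > 1

* built-in alternative that tabulates duplicates on (year, alpha)
duplicates report year alpha
```

If the count is zero, the simple _n/_N ranking and the egen-based ranking assign the same ranks.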
