how to generate the fractional rank for a variable

Jessica Guo

Join Date: Feb 2015

Posts: 31
#1

12 May 2015, 21:40

I have unbalanced panel data on stocks, with these variables:
permno: uniquely identifies each stock
yrm: monthly date variable, such as 1998m1, 1998m2, ...
x: some stock characteristic, e.g. market capitalization
excd: exchange code, indicating whether the stock is an NYSE (New York Stock Exchange) stock or a non-NYSE stock
shcd: share code, such as 10, 11, 12, 14, 18, 30, 31, 32, ...
What I want is, for each month, each stock's percentile of x in the distribution of x across all NYSE stocks with share codes 10 or 11.

I need a variable called xpt that contains each stock's percentile rank of x. For example, if in 1998m1 a stock's x lies at the 70th percentile of the x distribution of NYSE stocks with share codes 10 or 11, its xpt for 1998m1 should be 0.7.

Can you help me with generating this variable?
Thanks a lot!
Stephen Jenkins

Join Date: Apr 2014

Posts: 1433
#2

13 May 2015, 02:00

I recommend Philippe Van Kerm's fracrank package (bundled with sgini):

Code:

net install sgini, from("http://medim.ceps.lu/stata") replace
help fracrank

fracrank generates a “fractional rank” variable, which is essentially the empirical CDF (ranging between zero and one) but with appropriate treatment of ties, so that its expected value is 0.5.
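To make the idea concrete, here is a rough first-principles sketch, not fracrank's own code. It assumes the midpoint rule (rank - 0.5) / n and relies on egen's rank(), which gives tied values the average of the ranks they occupy. The toy values and the variable names rnk, nobs and fr are hypothetical. On untied data such as the values 1, 2, 3, 6, 8 used later in the thread, the same rule gives .1, .3, .5, .7, .9, which is the unweighted fracrank column shown in #8.

```stata
* Toy data with a tie (hypothetical values)
clear
input float v
1
2
2
5
end

* egen rank() gives tied values the average of the ranks they occupy
egen double rnk = rank(v)
* Number of nonmissing values
egen double nobs = count(v)
* Midpoint rule: (rank - 0.5) / n lies strictly between 0 and 1
gen double fr = (rnk - 0.5) / nobs

list v rnk fr
* With ties averaged this way, the mean fractional rank is exactly 0.5
summarize fr
```

Here the two tied 2s share rank 2.5, so both get 0.5, and the four fractional ranks .125, .5, .5, .875 average to 0.5.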
Jessica Guo

Join Date: Feb 2015

Posts: 31
#3

20 May 2015, 08:58

I used the following code:

bysort yrm: fracrank x , gen(pct_x)

It gives me the error message:
fracrank may not be combined with by
r(190);
Nick Cox

Join Date: Mar 2014

Posts: 35664
#4

20 May 2015, 09:21

The procedures described in http://www.stata.com/support/faqs/st...ons/index.html are perfectly compatible with by:

Note that this FAQ is cited in the documentation for fracrank.

What I imagine you want follows from first principles:

Code:

bysort yrm : egen rank = rank(x)
by yrm : egen N = count(x)
gen pct_x = (rank - 0.5) / N
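The code above ranks each stock among all stocks in its month. The question in #1 asks instead for a stock's position within the distribution of NYSE stocks with share code 10 or 11 in that month, with every stock placed in that reference distribution. A first-principles sketch of that, under stated assumptions: the toy data are hypothetical, excd is assumed to be numeric with 1 meaning NYSE (adjust the condition to however excd is actually coded), and ref, nref, cumref, nle and neq are illustrative names. Each stock gets (reference stocks below x + half of the reference stocks tied at x) / (reference stocks in the month), which for reference stocks is the same midpoint rule as (rank - 0.5) / N.

```stata
* Toy panel (hypothetical): one month, five stocks
clear
input permno yrm x excd shcd
1 1 10 1 10
2 1 20 1 11
3 1 20 1 10
4 1 30 3 11
5 1 40 1 11
end

* Reference set: NYSE stocks (ASSUMED coding excd == 1) with share code 10 or 11
gen byte ref = (excd == 1) & inlist(shcd, 10, 11) & !missing(x)

* Size of the reference distribution in each month
bysort yrm : egen double nref = total(ref)

* Running count of reference stocks in ascending order of x within month
bysort yrm (x) : gen double cumref = sum(ref)

* Reference stocks with value <= x: the largest running count within a run of ties
bysort yrm x : egen double nle = max(cumref)

* Reference stocks tied exactly at x
bysort yrm x : egen double neq = total(ref)

* Midpoint rule against the reference distribution; non-reference stocks
* are placed in it but do not change it
gen double xpt = (nle - neq / 2) / nref if !missing(x)

list permno x ref xpt, sepby(yrm)
```

In the toy month the reference stocks have x = 10, 20, 20, 40, so their xpt values are .125, .5, .5 and .875, while the non-NYSE stock with x = 30 gets .75. A month with no reference stocks gives nref = 0 and hence a missing xpt.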
shem shen

Join Date: Mar 2016

Posts: 136
#5

07 Oct 2020, 13:18

Is it possible to incorporate weights into your example code above (to calculate fractional ranks by group, taking weights into account)?
Nick Cox

Join Date: Mar 2014

Posts: 35664
#6

07 Oct 2020, 15:14

No; that approach hinges on all values being equally weighted (so that weights can be ignored). I take the generalisation to variable weights to be as in this example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(value weight)
1 100
2 200
3 300
6 600
8 800
end

gen group = 1

bysort group (value) : gen double work = sum(value * weight)
by group : gen rank = (work - (value * weight) / 2) / work[_N]

list

     +--------------------------------------------+
     | value   weight   group    work        rank |
     |--------------------------------------------|
  1. |     1      100       1     100   .00438596 |
  2. |     2      200       1     500   .02631579 |
  3. |     3      300       1    1400   .08333333 |
  4. |     6      600       1    5000   .28070175 |
  5. |     8      800       1   11400   .71929825 |
     +--------------------------------------------+


Although the example is written for one group, the code should work for several.

The midpoint rule here goes back, in one sense, to Francis Galton. For much discussion and several references, see the help for distplot (Stata Journal).

Here the example is deliberately lop-sided to drive home the principle. Illustration: If 6400 / 11400 of the weighted total belongs to the highest value, then that highest value accounts for

Code:

. di 6400 / 11400
.56140351

0.561 of the weighted cumulative probability, and the midpoint of that interval lies half of that, almost 0.281, below 1, which checks out against the fractional rank of 0.719 above.
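The arithmetic in that illustration can be checked at the command line, using only the numbers from the listing above:

```stata
* Share of the value * weight total held by the highest value (8 * 800 = 6400)
display 6400 / 11400
* Half of that share: how far the midpoint of its interval lies below 1
display (6400 / 11400) / 2
* Midpoint for the highest value, as in the last row of the listing
display 1 - (6400 / 11400) / 2
* Same number from the code: (work - value * weight / 2) / work[_N]
display (11400 - 8 * 800 / 2) / 11400
```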

Notice that fractional ranks of 0 and 1 are unattainable with this rule, but it is a rule (the only rule?) that treats the weighted distribution symmetrically.

Nick Cox

Join Date: Mar 2014

Posts: 35664
#7

07 Oct 2020, 15:23

#6 is not general enough to cope with tied values. Here is an untested sketch.

Code:

bysort group (value) : gen double work = sum(value * weight)
bysort group value: replace work = work[_N] 
bysort group value: replace weight = sum(weight) 
bysort group value: replace weight = weight[_N] 
by group : gen rank = (work - (value * weight) / 2) / work[_N]
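The sketch above overwrites weight, which matters if the original weights are needed later. A variant, equally a sketch and with hypothetical helper names work_le, w_tie and work_tot, keeps the value * weight scheme of #6 but uses egen so that weight is left untouched. The toy data, with a tie at value 2, are also hypothetical.

```stata
* Toy data with a tie at value 2 (hypothetical)
clear
input float(value weight)
1 100
2 200
2 200
3 300
end
gen group = 1

* Running total of value * weight in ascending order of value, as in #6
bysort group (value) : gen double work = sum(value * weight)
* Running total at the end of each run of tied values
bysort group value : egen double work_le = max(work)
* Total weight tied at each value (weight itself is not modified)
bysort group value : egen double w_tie = total(weight)
* Grand total of value * weight in each group
bysort group : egen double work_tot = total(value * weight)
* Midpoint of each tie block on the cumulative value * weight scale
gen double rank = (work_le - value * w_tie / 2) / work_tot

list value weight work work_le rank
```

On data without ties, work_le equals work, w_tie equals weight and work_tot equals work[_N], so this reduces to the code in #6.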


shem shen

Join Date: Mar 2016

Posts: 136
#8

07 Oct 2020, 15:30

Thank you, Nick. Just to clarify: the rank generated by your code is (conceptually) different from the fractional rank generated by fracrank (as cited by Professor Jenkins above), right? I tried fracrank and got different results:

Code:

clear
input float(value weight)
1 100
2 200
3 300
6 600
8 800
end

gen group = 1

fracrank value, gen(fracrank)

fracrank value [w=weight], gen(fracrankw)
(frequency weights assumed)

list

     +-----------------------------------------------------------------+
     | value   weight   group    work       rank   fracrank   fracra~w |
     |-----------------------------------------------------------------|
  1. |     1      100       1     100    .004386         .1       .025 |
  2. |     2      200       1     500   .0263158         .3         .1 |
  3. |     3      300       1    1400   .0833333         .5       .225 |
  4. |     6      600       1    5000   .2807018         .7        .45 |
  5. |     8      800       1   11400   .7192982         .9         .8 |
     +-----------------------------------------------------------------+


Nick Cox

Join Date: Mar 2014

Posts: 35664
#9

07 Oct 2020, 15:41

Surely, as help fracrank explains, its results are scaled to ensure that the average fractional rank is 0.5, which is nowhere part of my code. If you want that, you should surely use fracrank. (I get 0.48 as an average, but I have not read all the documentation to understand that.)
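One visible difference between the columns in #8 is what gets accumulated: the code in #6 cumulates value * weight, whereas a weighted rank in the usual sense cumulates weight alone. A sketch under that assumption, using the #8 data, with hypothetical names cumw, w_le, w_eq, w_tot and wfrac: on these five untied values it reproduces the fracrankw column (.025, .1, .225, .45, .8), but that match is an observation on this example, not a reading of fracrank's source. It uses only sort, sum() and egen, so it works directly with by-groups.

```stata
* Data from #8
clear
input float(value weight)
1 100
2 200
3 300
6 600
8 800
end
gen group = 1

* Running total of weight (not value * weight) in ascending order of value
bysort group (value) : gen double cumw = sum(weight)
* Weight at or below each value: the largest running total within a run of ties
bysort group value : egen double w_le = max(cumw)
* Weight tied exactly at each value
bysort group value : egen double w_eq = total(weight)
* Total weight in each group
bysort group : egen double w_tot = total(weight)
* Weighted midpoint rule: (weight below + half the tied weight) / total weight
gen double wfrac = (w_le - w_eq / 2) / w_tot

list value weight wfrac
* Weighted mean of the fractional ranks
summarize wfrac [aweight = weight]
```

The weighted mean of wfrac is 0.5 for any data, ties included: each value sits at the midpoint of its slice of the cumulative weight, and those midpoints, weighted by the slices, average to one half.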
shem shen

Join Date: Mar 2016

Posts: 136
#10

07 Oct 2020, 16:19

Thank you! fracrank suits my purpose, but it is too slow when the data is large.
shem shen

Join Date: Mar 2016

Posts: 136
#11

26 Dec 2022, 20:43

When using fracrank on data with sampling weights, should I incorporate the weights both when creating the ranks and when using the rank variable in subsequent analysis, or should I first create the fractional ranks without weights? Thank you!
