
  • how to generate the fractional rank for a variable

    I have an unbalanced panel dataset of stocks with these variables:
    permno: uniquely identifies each stock
    yrm: monthly date variable, such as 1998m1, 1998m2, ...
    x: some stock characteristic, e.g. market capitalization
    excd: exchange code, indicating whether the stock is an NYSE (New York Stock Exchange) stock or a non-NYSE stock
    shcd: share code, such as 10, 11, 12, 14, 18, 30, 31, 32, ...
    What I want, for each month, is each stock's percentile of x in the distribution of x across all NYSE stocks with share codes of 10 or 11.

    I need a variable called xpt that contains this percentile as a fraction. For example, xpt is 0.7 for a stock in 1998m1 if its x is at the 70th percentile of the x distribution of all NYSE stocks with share codes of 10 or 11 in that month.

    Can you help me generate this variable?
    Thanks a lot!


  • #2
    I recommend Philippe Van Kerm's fracrank package (bundled with sgini)
    Code:
    net install sgini, from("http://medim.ceps.lu/stata") replace
    help fracrank
    fracrank generates a “fractional rank” variable, which is essentially the empirical CDF (ranges between zero and one), but with appropriate treatment of ties, so that the expected value is 0.5



    • #3
      I use the following code:

      bysort yrm: fracrank x , gen(pct_x)

      It gives me the error message:
      fracrank may not be combined with by
      r(190);



      • #4
        The procedures described in http://www.stata.com/support/faqs/st...ons/index.html are perfectly compatible with by:

        Note that this FAQ is cited in the documentation for fracrank.

        What I imagine you want follows from first principles:

        Code:
        bysort yrm : egen rank = rank(x)
        by yrm : egen N = count(x)
        gen pct_x = (rank - 0.5) / N
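
        The code above ranks each stock against all stocks in the same month. The original question asks for a rank against NYSE stocks with share codes 10 or 11 only, while still assigning a value to every stock. A minimal sketch of one way to do that with the same midpoint rule follows. It assumes that excd == 1 identifies NYSE stocks (the usual CRSP convention, but an assumption about this dataset), and the helper names ref, tied, first and cum are made up for illustration.

        ```stata
        * ASSUMPTION: excd == 1 codes NYSE; adjust to your own coding.
        * ref marks the reference stocks whose x values define the distribution.
        gen byte ref = excd == 1 & inlist(shcd, 10, 11) & !missing(x)

        * Number of reference stocks tied at each distinct x within a month.
        bysort yrm x : gen double tied = sum(ref)
        by yrm x : replace tied = tied[_N]

        * Count each distinct x once, then cumulate within the month:
        * cum = number of reference stocks with x at or below this value.
        by yrm x : gen double first = cond(_n == 1, tied, 0)
        by yrm : gen double cum = sum(first)

        * Midpoint rule: reference stocks strictly below, plus half of those tied,
        * divided by the number of reference stocks in the month (cum[_N]).
        by yrm : gen double xpt = (cum - tied / 2) / cum[_N] if !missing(x)
        ```

        For a reference stock with a unique x this reduces to (rank - 0.5) / N among the reference stocks. A non-NYSE stock whose x matches no reference stock gets the fraction of reference stocks below it, so it can receive 0 or 1. Months with no reference stocks yield missing xpt.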




        • #5
          Originally posted by Nick Cox:
          The procedures described in http://www.stata.com/support/faqs/st...ons/index.html are perfectly compatible with by:

          Note that this FAQ is cited in the documentation for fracrank.

          What I imagine you want follows from first principles:

          Code:
          bysort yrm : egen rank = rank(x)
          by yrm : egen N = count(x)
          gen pct_x = (rank - 0.5) / N

          Is it possible to incorporate weights into your example code above, that is, to calculate the fractional rank by group while taking weights into account?



          • #6
            No; that all hinges on values being equally weighted (so that weights can be ignored). I take the generalisation to variable weights to be as in this example:


            [CODE]
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(value weight)
            1 100
            2 200
            3 300
            6 600
            8 800
            end

            gen group = 1

            bysort group (value) : gen double work = sum(value * weight)
            by group : gen rank = (work - (value * weight) / 2) / work[_N]

            list


                 +--------------------------------------------+
                 | value   weight   group    work        rank |
                 |--------------------------------------------|
              1. |     1      100       1     100   .00438596 |
              2. |     2      200       1     500   .02631579 |
              3. |     3      300       1    1400   .08333333 |
              4. |     6      600       1    5000   .28070175 |
              5. |     8      800       1   11400   .71929825 |
                 +--------------------------------------------+

            [/CODE]

            Although the example is written for one group, the code should work for several.

            The midpoint rule here goes back to Francis Galton in one sense. For much discussion and several references, see the help of distplot (Stata Journal).

            Here the example is deliberately lop-sided to drive home the principle. Illustration: If 6400 / 11400 of the weighted total belongs to the highest value, then that highest value accounts for

            Code:
            . di 6400 / 11400
            .56140351
            0.561 of the weighted cumulative probability. The midpoint of that interval thus lies half of that, almost 0.281, below 1, which checks out above as a fractional rank of 0.719.

            Notice that fractional ranks of 0 and 1 are unattainable with this rule, but it is a rule (the only rule?) that treats the weighted distribution symmetrically.




            • #7
              #6 is not general enough to cope with tied values. Here is an untested sketch.


              Code:
              bysort group (value) : gen double work = sum(value * weight)
              bysort group value: replace work = work[_N] 
              bysort group value: replace weight = sum(weight) 
              bysort group value: replace weight = weight[_N] 
              by group : gen rank = (work - (value * weight) / 2) / work[_N]



              • #8
                Originally posted by Nick Cox:
                #6 is not general enough to cope with tied values. Here is an untested sketch.


                Code:
                bysort group (value) : gen double work = sum(value * weight)
                bysort group value: replace work = work[_N]
                bysort group value: replace weight = sum(weight)
                bysort group value: replace weight = weight[_N]
                by group : gen rank = (work - (value * weight) / 2) / work[_N]

                Thank you, Nick. Just to clarify: the rank generated by your code is conceptually different from the fractional rank generated by fracrank (as recommended by Professor Jenkins above), right? I tried fracrank and got different results:

                Code:
                clear
                input float(value weight)
                1 100
                2 200
                3 300
                6 600
                8 800
                end
                
                gen group = 1
                
                fracrank value,gen(fracrank)
                
                fracrank value [w=weight],gen(fracrankw)
                (frequency weights assumed)
                
                list
                
                     +-----------------------------------------------------------------+
                     | value   weight   group    work       rank   fracrank   fracra~w |
                     |-----------------------------------------------------------------|
                  1. |     1      100       1     100    .004386         .1       .025 |
                  2. |     2      200       1     500   .0263158         .3         .1 |
                  3. |     3      300       1    1400   .0833333         .5       .225 |
                  4. |     6      600       1    5000   .2807018         .7        .45 |
                  5. |     8      800       1   11400   .7192982         .9         .8 |
                     +-----------------------------------------------------------------+
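
                One reading of the difference in this listing: the rank column comes from cumulating value * weight, whereas the fracrankw column matches what the midpoint rule gives when the weights themselves are cumulated. The sketch below reproduces the fracrankw numbers for these five observations. It is only a check against this example, not a statement about how fracrank is implemented, and the names cumw and wrank are made up for illustration.

                ```stata
                clear
                input float(value weight)
                1 100
                2 200
                3 300
                6 600
                8 800
                end

                gen group = 1

                * Cumulate the weights (not value * weight) in order of value.
                bysort group (value) : gen double cumw = sum(weight)

                * Midpoint rule: half of each observation's own weight lies below it.
                by group : gen double wrank = (cumw - weight / 2) / cumw[_N]

                * wrank is .025, .1, .225, .45, .8, the same as fracrankw above.
                list value weight wrank

                * The weighted mean of wrank is 0.5 in this example.
                summarize wrank [aw = weight]
                ```

                The unweighted column follows the same pattern with all weights equal to 1: (1 - 0.5) / 5 = .1, (2 - 0.5) / 5 = .3, and so on. This simple version does not handle tied values.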



                • #9
                  Surely, as help fracrank explains: its results are scaled to ensure that the average fractional rank is 0.5, which is nowhere part of my code. If you want that, you should surely use fracrank. (I get 0.48 as an average, but I have not read all the documentation to understand why.)



                  • #10
                    Originally posted by Nick Cox:
                    Surely, as help fracrank explains: its results are scaled to ensure that average fractional rank is 0.5, which is nowhere part of my code. If you want that, you should surely use fracrank. (I get 0.48 as an average. but I have not read all the documentation to understand.)
                    Thank you! fracrank suits my purpose, but it is too slow when the data is large.
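
                    If speed is the concern, a calculation built on by: needs one sort and a few passes over the data, with no loop over groups. A minimal sketch, assuming variables named group, value and weight, and assuming that tied values should share the midpoint of their combined weight (an assumption about the desired rule, not a description of what fracrank does). The names w, tiedw, firstw, cumw and wrank are made up for illustration.

                    ```stata
                    * Observations with missing value or weight get zero weight
                    * and no rank.
                    gen double w = cond(missing(value, weight), 0, weight)

                    * Combined weight of all observations tied at each distinct value.
                    bysort group value : gen double tiedw = sum(w)
                    by group value : replace tiedw = tiedw[_N]

                    * Count each distinct value once, then cumulate within group:
                    * cumw = total weight at or below this value.
                    by group value : gen double firstw = cond(_n == 1, tiedw, 0)
                    by group : gen double cumw = sum(firstw)

                    * Midpoint rule on the weighted distribution; cumw[_N] is the
                    * total weight in the group.
                    by group : gen double wrank = (cumw - tiedw / 2) / cumw[_N] ///
                        if !missing(value, weight)
                    ```

                    Whether this agrees with fracrank when there are ties is worth checking on a small subsample before relying on it.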



                    • #11
                      Originally posted by Stephen Jenkins:
                      I recommend Philippe Van Kerm's fracrank package (bundled with sgini)
                      Code:
                      net install sgini, from("http://medim.ceps.lu/stata") replace
                      help fracrank
                      fracrank generates a “fractional rank” variable, which is essentially the empirical CDF (ranges between zero and one), but with appropriate treatment of ties, so that the expected value is 0.5

                      When using fracrank on data with sampling weights, should I use the weights both when creating the ranks and when using the rank variable in the subsequent analysis, or should I first create the fractional ranks without weights? Thank you!

