  • How to rank a variable in Stata?

    Hi all,

    I have a panel dataset with the following variables: 'firm id'.......'year'.......'Carhart alpha'.

    So, for example, I have 100 firms and each has observations from 2000-2013. Every year, I want to assign a fractional rank ranging from 0 to 1 to each fund based on the fund’s alpha.
    Thus I want to create a new variable, call it 'Rank', where I can store these ranks.

    I would like to ask - how can I perform this in Stata?

    Let me, please, introduce a quotation from a paper (this is actually what I want to do):
    "The fractional rank at time t for fund i in the bottom performance quintile is defined as LOW(i,t) = Min(Rank(i,t), 0.2), in the three medium performance quintiles as MID(i,t) = Min(0.6, Rank(i,t) − LOW(i,t)), and in the top performance quintile as HIGH(i,t) = Rank(i,t) − MID(i,t) − LOW(i,t) ...
    ... where Rank(i,t) is fund i’s performance percentile".
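
The quoted definitions can be sketched in Stata once a fractional rank exists. This is only a minimal sketch: it assumes a variable called rank already holds each fund's within-year percentile in (0, 1], and the names rank, low, mid and high are illustrative rather than taken from the paper.

```stata
* Toy data: a hypothetical fractional rank in (0, 1]
clear
input rank
0.10
0.50
0.95
end

* Piecewise decomposition following the quoted definitions
gen low  = min(rank, 0.2)
gen mid  = min(0.6, rank - low)
gen high = rank - mid - low

* The three pieces should add back up to the rank (up to float precision)
assert abs(low + mid + high - rank) < 1e-6
list
```

A fund at rank 0.10 sits entirely in low, while a fund at rank 0.95 has low = 0.2, mid = 0.6 and the remainder in high.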


    I appreciate any help on this issue.




  • #2
    Can you share example working data? It's best if you use -input- as in:
    Code:
    input ///
    variable names
    first line of data
    second line of data
    .
    .
    .
    last line of data
    end
    See -help input- for details on how to do this.

    I also suggest formatting the text as necessary using the advanced editor button A, in the top-right corner of the editor.

    What's the relation between "Carhart alpha" and Rank_it ?
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



    • #3
      Hello Roberto,

      Basically, I would like the following:
      1. I have a variable that contains firms' abnormal returns (alphas).
      2. I want to create another variable that holds a ranking score for each fund in each year. In other words, every year each fund would be assigned a rank (from 0 to 1) according to its alpha.
      3. Given these ranks I would be able to see top-performers and low-performers within a year and then perform further analysis.

      Here is the sample of my dataset (in fact the dataset is extremely large):
      Attached Files



      • #4
        Given 100 companies per year and a variable named alpha:
        Code:
        sort year alpha
        by year: gen alpharank=_n/100



        • #5
          Sorry Ben, could you please explain your code?

          I am really confused with that.

          In fact, the number of firms in my dataset is 953, and the time period is 2000-2013. That is, each fund has an alpha in each year.



          • #6
            It sorts cases so that all in the same year are together, and within year, the lowest alphas are first, highest alphas last.
            Then within that, the "by year" takes their internally-generated case number (first case in year 2000, _n=1, second case, _n=2... first case in year 2001, _n=1, second case, _n=2) and then divides by the total # of cases in that year, which I had set to 100.

            Actually a more generalizable code would be:
            Code:
            sort year alpha
            by year: gen alpharank=_n/_N
            Last edited by ben earnhart; 01 Aug 2014, 15:42. Reason: code didn't format right.
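
To make the _n/_N version concrete, here is a small sketch on made-up data with a different number of firms in each year. The values are invented purely for illustration.

```stata
* Toy panel: 4 firms in 2000, 2 firms in 2001 (made-up alphas)
clear
input year alpha
2000  1.5
2000 -0.3
2000  0.8
2000  2.1
2001  0.4
2001 -1.2
end

* Within each year, lowest alpha first, then position divided by group size
sort year alpha
by year: gen alpharank = _n/_N

* The highest alpha in every year gets rank 1
by year: assert alpharank[_N] == 1
list, sepby(year)
```

Note that the lowest alpha in a year gets 1/_N rather than 0, so the ranks run over (0, 1].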



            • #7
              Are there ties in the data? If so, that might require some code tweaking. It is hard to check things on my iPhone, but egen has a rank() function that I think has some options for handling ties.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Richard has a good point. Could use
                Code:
                egen alpharank=rank(alpha), by(year) unique
                bysort year: replace alpharank=alpharank/_N
                which might handle ties better.
                I figured keep it simple; ties are unlikely with eight (or maybe more) significant digits, and not sure he cares about ties.



                • #9
                  To be honest, I am new to Stata; I started using it only a couple of weeks ago. So I have to ask: what are the ties you mentioned above? Does it refer to correlation?

                  By the way, I have tried the code you proposed above and it ranked my firms from 0 to 1 within each year. But I also want to understand what ties are; maybe I really need to use the last code.

                  So please could you explain that to me.



                  • #10
                    Ties occur when multiple cases have the exact same value, e.g. if the values were

                    1
                    2
                    3
                    3
                    3
                    4
                    5

                    Three cases have the same value of 3. If you used Ben's original approach, then the first 3 would be ranked third, the second 3 would be ranked fourth, and the third 3 would be ranked fifth, even though they have the exact same value. If there are no ties, this is not an issue. If there are ties, you have to figure out how you want to break the ties. Or else maybe just assign each tied value the same rank.
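
The seven values above can be used to compare tie handling directly. This is a rough sketch; the variable names byposition, average, lowest and broken are made up for the illustration.

```stata
* The seven example values, including three tied 3s
clear
input x
1
2
3
3
3
4
5
end

sort x
gen byposition = _n              // tied 3s get 3, 4 and 5
egen average = rank(x)           // default: tied values share the mean rank
egen lowest  = rank(x), track    // tied values share the lowest rank
egen broken  = rank(x), unique   // ties broken arbitrarily, ranks 1 to 7

* With the default, all three 3s get rank 4; with track, rank 3
assert average == 4 if x == 3
assert lowest  == 3 if x == 3
list
```

Which of these is appropriate depends on how tied funds should be treated in the later analysis.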
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology



                    • #11
                      Ties would simply be situations where (within the same year) the alpha for company 1 has the exact same value as company 2. In my original code, I think it would give a higher or lower ranking based on whatever the data had been sorted on last before the year and alpha sort. In the code Richard suggested, it will arbitrarily pick one over the other at computation time. Given that your alpha variable has so many digits, the odds of a tie within a given year are pretty small, so I don't think I'd worry too much about it. What to do with ties is very important when you only have a small range of values relative to the # of cases, but here it should be a non-issue. I actually ran my code against 1,300 cases (100 per year), and the two approaches did the same thing, correlated at 1.0000. You can try both, and Richard's is slightly preferable.



                      • #12
                        Oh, now I understand what this means.
                        I think this is not the case in my dataset, since alpha values have 5-7 decimals. Here is an example from my sample:

                        -6.29765
                        -.9554463
                        2.159768
                        7.610835
                        -4.91084
                        11.75322
                        .0445696
                        -1.896034

                        Although there are approx. 400-500 observations (alphas) per year in the dataset, I reckon it is nearly impossible (I hope so) to have identical values.
                        Hence, as I understand I can use Ben's original code for ranking.


                        I really appreciate the help you guys provide here to everyone! It is amazing to get advice from specialists. Thank you!



                        • #13
                          Ah! Just to be clear, my "original" (first post) code assumed full data with exactly 100 companies per year. The versions in the later posts, which divide by _N (# cases in that year) instead of a constant, are better since you don't have the same # of cases per year.
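
One caveat worth sketching: _N counts every observation in the year, including any with a missing alpha. If some alphas were missing (an assumption here, not something stated in the thread), dividing by the count of nonmissing alphas may be closer to what is wanted. The variable name nalpha is made up for the illustration.

```stata
* Toy panel with one missing alpha in 2000 (made-up values)
clear
input year alpha
2000  1.5
2000 -0.3
2000  .
2001  0.4
2001 -1.2
end

* rank() leaves missing alphas unranked; count() counts nonmissing alphas only
egen alpharank = rank(alpha), by(year) unique
egen nalpha = count(alpha), by(year)
replace alpharank = alpharank/nalpha

* Missing alphas stay unranked; the top alpha in 2000 still gets rank 1
assert missing(alpharank) if missing(alpha)
assert alpharank == 1 if alpha == 1.5
list, sepby(year)
```

If every fund has an alpha in every year, this gives the same result as dividing by _N.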



                          • #14
                            Looks like Ben has nailed it. I mostly provided a distraction by suggesting a problem that almost certainly does not exist. ;-)
                            -------------------------------------------
                            Richard Williams, Notre Dame Dept of Sociology



                            • #15
                              Couldn't the existence of ties be checked? Something like:
                              Code:
                              clear all
                              set more off
                              
                              *----- example data -----
                              
                              set obs 1000
                              
                              set seed 296
                              gen double alpha = runiform()*10
                              
                              *----- check alpha -----
                              
                              bysort alpha: gen counter = _N
                              summarize counter, meanonly
                              
                              assert r(sum) == _N
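
For the panel in this thread, ties only matter within a year, so a per-year variant of the check may be more useful. Below is a hedged sketch on made-up data using duplicates tag; the variable name tie is illustrative.

```stata
* Toy panel: alpha 1.5 appears twice in 2000 and once in 2001 (made-up values)
clear
input year alpha
2000  1.5
2000 -0.3
2000  1.5
2001  1.5
2001  0.4
end

* tie = number of other observations sharing the same year and alpha
duplicates tag year alpha, generate(tie)

* Count and show the observations involved in within-year ties
count if tie > 0
list if tie > 0
```

Here only the two 2000 observations are flagged; the 1.5 in 2001 is not a tie because it falls in a different year.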
