  • RANK.AVG equivalent

    Hi,

    I wonder whether there is something equivalent to Excel's RANK.AVG function in Stata?
    I only found the regular egen rank() function.

    thank you

    C.


  • #2
    Please explain what it does precisely for the benefit of people who don't use Excel.


    • #3
      You're right, apologies.

      RANK.AVG: Returns the rank of a number in a list of numbers: its size relative to other values in the list; if more than one value has the same rank (A TIE), the average rank is returned.

      The closest Stata command is egen with the rank() function, but it has no correction for ties like RANK.AVG does.


      • #4
        The default of that function does adjust for ties: check e.g.

        Code:
        sysuse auto, clear 
        egen rank = rank(mpg)
        tabdisp mpg, c(rank)


        • #5
          You're right, I missed this in the description. However, there are two problems:
          1. I am using the "field" option, as I need the highest value to be ranked 1. The default ranks the smallest value as #1.
          2. I ran your code; here is the output (top 4 rows):
          --------------------------------
              (mpg) | rank of (mpg)
          ----------+---------------------
                 12 |           1.5
                 14 |           5.5
                 15 |           9.5
                 16 |          12.5

          All the values are different, i.e. there are no equal observations, so no averaging is supposed to happen.
          But that is not what I get. I expected the ranks to be 1, 2, 3, 4 in the above case and 1, 2.5, 2.5, 4 in the following case:

          --------------------------------
              (mpg) | rank of (mpg)
          ----------+---------------------
                 12 |             1
                 14 |           2.5
                 14 |           2.5
                 16 |             4


          • #6
            I think you're misinterpreting tabdisp, which shows distinct values but not their frequencies unless that is a supplied variable.

            See in conjunction with

            Code:
            tabulate mpg
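
            A minimal sketch of the same point in one table, assuming the auto dataset as above. The variable name freq is just an illustrative choice: it holds the number of cars at each mpg value, so the table shows that a rank such as 1.5 comes from averaging over tied observations.

            Code:
            sysuse auto, clear
            egen rank = rank(mpg)
            * number of observations sharing each mpg value
            bysort mpg: generate freq = _N
            * freq and rank are both constant within mpg, so tabdisp can show them side by side
            tabdisp mpg, c(freq rank)

            For example, mpg 12 should appear with a frequency of 2 and a rank of 1.5, the average of ranks 1 and 2.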


            • #7
              Yes, now I see, thanks. But how can I get the averaging and the "field" ranking together, i.e. rank the largest value first?

              thanks!


              • #8
                If you specify -field-, tied values get the same rank, just as with the other options.


                • #9
                  You're right, once again. I checked it with the auto dataset. What confused me is the description of the rank() function. It says (at least this is how I understood it) that there is no averaging:
                  The field option calculates the field rank of exp: the highest value is ranked 1, and there is no correction for ties. That is, the field rank is 1 + the number of values that are higher.
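
                  A rough check of that definition, assuming the auto dataset. The names negmpg, position and manual are made up for this sketch: it rebuilds the field rank by hand as 1 + the number of higher values and compares it with what egen returns.

                  Code:
                  sysuse auto, clear
                  egen r_field = rank(mpg), field
                  * sort from highest to lowest mpg
                  generate negmpg = -mpg
                  sort negmpg
                  generate long position = _n
                  * the first position within a group of ties is 1 + the number of higher values
                  by negmpg: generate long manual = position[1]
                  assert manual == r_field

                  If the assert passes silently, tied values share the lowest rank in their group and no averaging takes place.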


                  • #10
                    Actually, on a second look, there is no averaging when -field- is used; tied values just keep the same rank. Not sure if it matters in general, but it does affect my later calculations, which use the rank value itself and not just the ordering.
                    Last edited by Constantin Alba; 14 Sep 2016, 08:46.


                    • #11
                      I am puzzled on what you want here as your desiderata are quite contradictory: you can't insist that ranks are as low as possible and also that the average rank is preserved. The point of the variant options, which mostly go back to code Richard Goldstein and I wrote in 1999, is that one desideratum or another may be key in particular problems, even insisting on unique ranks (which makes sense for various graphs).


                      • #12
                        I am looking to do -field- ranking as Stata currently performs it, but with one small difference: instead of giving equal observations the same rank, average the ranks among them. Example:

                        Current output of egen with rank(mpg), field:

                        --------------------------------
                            (mpg) | rank of (mpg)
                        ----------+---------------------
                               16 |             1
                               14 |             2
                               14 |             2
                               12 |             4


                        Desired:

                        --------------------------------
                            (mpg) | rank of (mpg)
                        ----------+---------------------
                               16 |             1
                               14 |           2.5
                               14 |           2.5
                               12 |             4

                        The rank 2.5 is the average of ranks 2 and 3: (2 + 3) / 2.


                        At the next stage I use the rank as an input to a formula for a concentration index: 1 / [2 * Sigma(rank * ranked_var) - 1]


                        • #13
                          This looks to me like the reverse of the default method, so I would use the default method and then, for the example shown, reverse the ranks by subtracting each one from 5 (the number of observations plus 1):
                          Code:
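                          * enter the four example values
                          input x
                          16
                          14
                          14
                          12
                          end

                          * default ranks: smallest value is ranked 1, ties are averaged
                          egen rank = rank(x)
                          * reverse them: with 4 observations, new rank = (4 + 1) - old rank
                          replace rank = 5 - rank
                          list x rank, clean

                          * output of the list command:
                          *        x   rank
                          *  1.   16      1
                          *  2.   14    2.5
                          *  3.   14    2.5
                          *  4.   12      4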
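
                          The constant 5 is specific to this four-observation example. A sketch of a more general version, on the assumption that rank() leaves missing values of x unranked, so only nonmissing observations are counted:

                          Code:
                          clear
                          input x
                          16
                          14
                          14
                          12
                          end
                          egen rank = rank(x)
                          * count the observations that actually received a rank
                          quietly count if !missing(x)
                          * reversed rank = (number ranked + 1) - default rank
                          replace rank = r(N) + 1 - rank
                          list x rank, clean

                          With the four values above this gives the same ranks as subtracting from 5: 1, 2.5, 2.5 and 4.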


                          • #14
                            I see; you just want the ranks reversed. Rich's solution is fine; here's another one. It's explicit in the help that you can rank expressions (not just variables), and explicit in the manual entry

                            http://www.stata.com/manuals14/degen.pdf

                            that

                            Most applications of rank() will be to one variable, but the argument exp can be more general, namely, an expression. In particular, rank(-varname) reverses ranks from those obtained by rank(varname).
                            Thus, this works in the example given:

                            Code:
                            input x
                            16
                            14
                            14
                            12
                            end
                            egen rank = rank(-x)
                            li x rank, clean
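
                            To see the difference from -field- side by side, here is a small sketch on the same four values. The variable names r_field and r_avg are just illustrative.

                            Code:
                            clear
                            input x
                            16
                            14
                            14
                            12
                            end
                            * field ranks: highest value first, ties share the lowest rank (1, 2, 2, 4)
                            egen r_field = rank(x), field
                            * default ranks of -x: highest value first, ties averaged (1, 2.5, 2.5, 4)
                            egen r_avg = rank(-x)
                            list x r_field r_avg, clean

                            The two variables differ only on the tied rows, which is exactly the RANK.AVG behaviour asked for in this thread.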


                            • #15
                              Thank you, Rich, your solution does work. I considered it before, but for some reason thought it wouldn't work.

                              Nick, thank you for your patience with your responses; your solution is just great. Simple yet elegant.
