Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Ranking variables with Tying and between Groups

    Hello,
    I'm trying to rank companies by their profits. This is the kind of result I'm trying to create:
    company year avg_profit rank
    honda 1970 10 1
    honda 1971 10 1
    toyota 1970 10 1
    toyota 1971 10 1
    ford 1970 11 2
    ford 1972 11 2
    I've tried:
    Code:
    sort avg_profit company 
    egen rank = rank(avg_profit), by(company) track
    But of course this doesn't work because the by(company) option means that each company is its own 1st rank. If skip the "by(company)" option, then the 2nd rank isn't 2, but whatever row number the next rank occurs (so for example in a large dataset, something like 1020). If I use "unique" rather than "track," then the rank just corresponds to the row number, and "field" doesn't work either.

    How do I get the correct ranks so that they are both tied (i.e. multiple companies can rank in 1st position) and grouped by company?

    Related posts:
    - https://www.statalist.org/forums/for...he-rank-option
    - https://www.stata.com/statalist/arch.../msg00581.html

    Thank you!

  • #2
    Your description is still a bit unclear to me. How does year factor into the problem? Are you ranking companies by average profit for each year, or overall?

    Also, please read the FAQ, especially on how to provide a reproducible data example (-help dataex-). When asking for help, it's more likely someone may be inclined to help if there is a -dataex- that can be copied into Stata and use.

    Comment


    • #3
      Hello, the year is more or less irrelevant. It's average profits across the time of years. Let me see if I can make a good data example to use.

      Comment


      • #4
        Code:
        sort avg_profit
        gen wanted = sum(avg_profit != avg_profit[_n-1])
        produces the results you show in your example.

        But I do not understand what you mean here by "grouped by company," so I fear this will not work with a full data set. The only way I can understand that phrase would produce precisely what you say you don't want: each company, since the average profit never changes in the example data, is always ranked 1 in its own group.

        Added: crossed with #2 and #3.

        Comment


        • #5
          Hi Clyde,

          Thank you for your reply. I wasn't sure how to phrase it - I just meant that the rank should apply to the company -- that it is the company's rank relative to other companies that I care about.

          When there are ties, is there a way to count how many ties there are per rank? For example in this case, both honda and toyota are in rank 1. If I had 100 companies in total and I wanted to list the 30 most profitable, what would be the quickest way to get this list? (maybe there is also a different way to use it aside from using rank)

          Let me know if this data extract is helpful. I can also provide a larger one.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str6 company int year byte(yearly_profit avg_profit)
          "honda " 1970  9 10
          "honda " 1971 10 10
          "honda " 1972 11 10
          "toyota" 1970  8 10
          "toyota" 1971 12 10
          "toyota" 1972 10 10
          "ford "  1970 15 21
          "ford "  1971 18 21
          "ford "  1972 30 21
          end

          Comment


          • #6
            Hi Clyde,
            The code you provided worked - thank you! All companies with the same avg profits have the same rank. Perhaps my second question above would need to be a separate thread - I don't want to disobey forum rules.
            Thank you very much.

            Comment


            • #7
              Again, you are not being completely clear in your wording. I'm going to assume you mean you want a count of how many different companies there are per rank. That will be different from the number of ties per rank, because each company also ties with itself in different years in this data.

              Code:
              by rank (company), sort: gen companies_tied_this_rank = sum(company != company[_n-1])
              by rank (companies_tied_this_rank), sort: replace companies_tied_this_rank ///
                  = companies_tied_this_rank[_N]
              As for the list of the 30 most profitable companies, probably it cannot be done. The reason is that if you list the companies in rank order, there will likely be ties for 30th position. Any arbitrary choice to include some of them and exclude others would be misleading. It could only be done in the event that there were no such ties at 30th position--which, given the nature of your data, seems improbable.

              Comment


              • #8
                My apologies for the lack of clarity, I will try to improve my formulations. Both of the codes you provided were interesting so I'm definitely learning. What I'm asking about now is something like how the rankings are displayed for these universities: https://www.topuniversities.com/univ...-rankings/2022 . Here, Stanford and Cambridge are tied for 3rd. But rather than being ranked 4th, Harvard is ranked 5th (since there are 4 universities that are better).

                Comment


                • #9
                  Code:
                  egen flag = tag(company)
                  egen wanted = rank(avg_profit) if flag, track
                  by company (wanted), sort: replace wanted = wanted[1]

                  Comment


                  • #10
                    Thank you, Clyde. That worked!

                    Comment

                    Working...
                    X