Ranking variables with Tying and between Groups

John Singer

Join Date: Oct 2021

Posts: 46
#1

Ranking variables with Tying and between Groups

03 Oct 2021, 16:02

Hello,
I'm trying to rank companies by their profits. This is the kind of result I'm trying to create:
company year avg_profit rank

honda 1970 10 1

honda 1971 10 1

toyota 1970 10 1

toyota 1971 10 1

ford 1970 11 2

ford 1972 11 2

I've tried:

Code:

sort avg_profit company egen rank = rank(avg_profit), by(company) track

But of course this doesn't work because the by(company) option means that each company is its own 1st rank. If skip the "by(company)" option, then the 2nd rank isn't 2, but whatever row number the next rank occurs (so for example in a large dataset, something like 1020). If I use "unique" rather than "track," then the rank just corresponds to the row number, and "field" doesn't work either.

How do I get the correct ranks so that they are both tied (i.e. multiple companies can rank in 1st position) and grouped by company?

Related posts:
- https://www.statalist.org/forums/for...he-rank-option
- https://www.stata.com/statalist/arch.../msg00581.html

Thank you!
Tags: None
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#2

03 Oct 2021, 16:14

Your description is still a bit unclear to me. How does year factor into the problem? Are you ranking companies by average profit for each year, or overall?

Also, please read the FAQ, especially on how to provide a reproducible data example (-help dataex-). When asking for help, it's more likely someone may be inclined to help if there is a -dataex- that can be copied into Stata and use.
Comment
John Singer

Join Date: Oct 2021

Posts: 46
#3

03 Oct 2021, 16:16

Hello, the year is more or less irrelevant. It's average profits across the time of years. Let me see if I can make a good data example to use.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

03 Oct 2021, 16:19

Code:

sort avg_profit gen wanted = sum(avg_profit != avg_profit[_n-1])

produces the results you show in your example.

But I do not understand what you mean here by "grouped by company," so I fear this will not work with a full data set. The only way I can understand that phrase would produce precisely what you say you don't want: each company, since the average profit never changes in the example data, is always ranked 1 in its own group.

Added: crossed with #2 and #3.
Comment
John Singer

Join Date: Oct 2021

Posts: 46
#5

03 Oct 2021, 16:35

Hi Clyde,

Thank you for your reply. I wasn't sure how to phrase it - I just meant that the rank should apply to the company -- that it is the company's rank relative to other companies that I care about.

When there are ties, is there a way to count how many ties there are per rank? For example in this case, both honda and toyota are in rank 1. If I had 100 companies in total and I wanted to list the 30 most profitable, what would be the quickest way to get this list? (maybe there is also a different way to use it aside from using rank)

Let me know if this data extract is helpful. I can also provide a larger one.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str6 company int year byte(yearly_profit avg_profit) "honda " 1970 9 10 "honda " 1971 10 10 "honda " 1972 11 10 "toyota" 1970 8 10 "toyota" 1971 12 10 "toyota" 1972 10 10 "ford " 1970 15 21 "ford " 1971 18 21 "ford " 1972 30 21 end
Comment
John Singer

Join Date: Oct 2021

Posts: 46
#6

03 Oct 2021, 16:45

Hi Clyde,
The code you provided worked - thank you! All companies with the same avg profits have the same rank. Perhaps my second question above would need to be a separate thread - I don't want to disobey forum rules.
Thank you very much.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#7

03 Oct 2021, 16:47

Again, you are not being completely clear in your wording. I'm going to assume you mean you want a count of how many different companies there are per rank. That will be different from the number of ties per rank, because each company also ties with itself in different years in this data.

Code:

by rank (company), sort: gen companies_tied_this_rank = sum(company != company[_n-1]) by rank (companies_tied_this_rank), sort: replace companies_tied_this_rank /// = companies_tied_this_rank[_N]

As for the list of the 30 most profitable companies, probably it cannot be done. The reason is that if you list the companies in rank order, there will likely be ties for 30th position. Any arbitrary choice to include some of them and exclude others would be misleading. It could only be done in the event that there were no such ties at 30th position--which, given the nature of your data, seems improbable.
Comment
John Singer

Join Date: Oct 2021

Posts: 46
#8

03 Oct 2021, 17:06

My apologies for the lack of clarity, I will try to improve my formulations. Both of the codes you provided were interesting so I'm definitely learning. What I'm asking about now is something like how the rankings are displayed for these universities: https://www.topuniversities.com/univ...-rankings/2022 . Here, Stanford and Cambridge are tied for 3rd. But rather than being ranked 4th, Harvard is ranked 5th (since there are 4 universities that are better).
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30101

03 Oct 2021, 17:20

Code:

egen flag = tag(company)
egen wanted = rank(avg_profit) if flag, track
by company (wanted), sort: replace wanted = wanted[1]

Comment

John Singer

Join Date: Oct 2021

Posts: 46
#10

04 Oct 2021, 05:50

Thank you, Clyde. That worked!
Comment

company	year	avg_profit	rank
honda	1970	10	1
honda	1971	10	1
toyota	1970	10	1
toyota	1971	10	1
ford	1970	11	2
ford	1972	11	2

Announcement

Ranking variables with Tying and between Groups

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment