Copy highest, 2nd highest,... nth highest value into columns 1, 2, ..., n

John Stata

Join Date: Aug 2018
Posts: 4

Copy highest, 2nd highest,... nth highest value into columns 1, 2, ..., n

12 Aug 2018, 05:31

Hi to all!
I have a data set with Countries, operators and dates. I know want to have the highest date (most recent) per group into a new column that only contains most the recent dates across groups. The same with the 2nd ... nth highest dates. There's about up to 10 of those different dates. The dates within a country group and in one column should obviously be constant.

Country	Operator	Date	highest date	2nd highest date	3 highest date
US	AT&T	2017	2017	2016	2004
US	Sprint	2016	2017	2016	2004
US	AT&T	2004	2017	2016	2004
UK	EE	2004	2004	2001	.
UK	O2	2001	2004	2001	.
France	Orange	2012	2012	.	.
France	SFR	2012	2012	.	.

Can anyone help me set up those columns to the right, please? I tried to work with groups and ranks but nothing really worked.
Thanks a lot!
Bests

Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35698

12 Aug 2018, 05:36

Please read and act on FAQ Advice #12 and #6.

Data example in requested form; show us your code attempted; error reports like "nothing really worked" don't allow us to explain what you did wrong; real names are preferred here.

Stata has variables, not columns.

That said, this shows some technique.

Code:

. webuse grunfeld, clear 

. 
. bysort company (mvalue) : gen year1 = year[_N] 

. by company (mvalue) : gen year2 = year[_N-1] 

. by company (mvalue) : gen year3 = year[_N-2] 

. 
. list company year mvalue year? in 18/20 

     +-------------------------------------------------+
     | company   year   mvalue   year1   year2   year3 |
     |-------------------------------------------------|
 18. |       1   1937   5387.1    1953    1954    1937 |
 19. |       1   1954   5593.6    1953    1954    1937 |
 20. |       1   1953   6241.7    1953    1954    1937 |
     +-------------------------------------------------+

Last edited by Nick Cox; 12 Aug 2018, 05:45.

Comment

John Stata

Join Date: Aug 2018

Posts: 4
#3

12 Aug 2018, 06:09

Thanks! Sorry, I've just read over the relevant FAQ paragraphs and will change that.

Thanks for your help. This approach however, only gives me the highest value as long as those observations have distinct years assigned to them. So, if there's 1954 assigned twice it gives me that as the highest and second highest value. Can this be prevented? Thanks!

I tried

Code:

sort company year egen date_ranks = rank(year), by(company) bysort company: gen year_1 == year if rank == 1 bysort company: gen year_2 == year if rank == 2

and so on but my ranks are partly decimal numbers (avg ranks). Any idea how to fix this?
It is more about putting the right year in the right variable (year_n). I can "fill up" all the observations per company easily.
Thanks!

Last edited by John Stata; 12 Aug 2018, 06:18.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

12 Aug 2018, 07:16

Data example in requested form that shows the difficulty please.

Also cross-posted at https://stackoverflow.com/questions/...n-new-variable

Please read the entirety of the FAQ Advice -- as every prompt reminds you -- including policy on cross-posting.

Last edited by Nick Cox; 12 Aug 2018, 07:20.
Comment
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#5

12 Aug 2018, 21:23

Code:

bys company: egen rank = rank(year), f tab rank, gen(ear_) foreach v of var ear_* { by company: egen y`v' = max(cond(`v'==1,year,.)) } drop rank ear_*

For future posts, please notice to use -dataex- (as guided in section 12.2 of FAQ) to give out a small example of your data.
1 like
Comment

Announcement

Copy highest, 2nd highest,... nth highest value into columns 1, 2, ..., n

Comment

Comment

Comment

Comment