How to keep 5 observations per id?

Geert van Hooff

Join Date: Apr 2017

Posts: 11
#1

How to keep 5 observations per id?

29 Apr 2017, 10:41

Hi all,

I've got a problem with reformatting my dataset. I've got multiple observations of the percentage of shares held by an institution (minimum 20) per id (Cusip in my case).

I would like to keep the top 5 of those observations (percentages held by institution) per Cusip and delete all the other observations for that cusip. But I can't find anywhere the command for this.

Is there someone who could help me with this matter?

Thanks in advance,

Geert
Tags: None
Daniel Bela

Join Date: Apr 2014

Posts: 246
#2

29 Apr 2017, 10:58

Hi Geert,

please have a look at the FAQ, especially section 12.2 regarding data examples, in order to get quick answers to your questions.
Without seeing your data (or a sample from it), it may be very hard to understand some data questions, and find appropriate answers to it.

That said, I suspect something like this could help:

Code:

* input example data clear input cusip share 1 0.1 1 0.2 1 0.3 1 0.4 1 0.5 1 0.6 1 0.7 1 0.8 1 0.9 1 1.0 2 0.1 2 0.2 2 0.3 2 0.4 2 0.5 2 0.6 2 0.7 2 0.8 2 0.9 2 1.0 end * only keep highest 5 observations per cusip bysort cusip (share) : drop if _n<=_N-5

Regards
Bela
1 like
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17080
#3

29 Apr 2017, 12:01

Geert.
I do share Bela's advice about posting an example/excerpt of your data via -dataex- (see -search dataex- to install).
From you post, it is not clear whether you consider keeping missing values, too (as Stata ranks missing values as the highest one; hence they were automatically included among the top five percentages per id).
If you would get rid of missing values, you can tweak Bela's code this way:

Code:

bysort cusip (share) : drop if _n<=_N-5 | share==.

Kind regards,
Carlo
(Stata 18.0 SE)
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

29 Apr 2017, 12:09

I agree with Carlo's concern about missing values, but I fear that his approach will include the missing values in the count of the top 5 shares. The following would avoid that by dropping all observations with share missing first, then choosing the top 5 of the remaining observations for each cusip.

Code:

drop if share==. bysort cusip (share) : drop if _n<=_N-5
1 like
Comment
Geert van Hooff

Join Date: Apr 2017

Posts: 11
#5

29 Apr 2017, 12:41

Thanks everyone for the fast response! Sorry I am new here, so my next post will be with an example from -dataex-.

Already dropped all the missing variables before the calculation, therefore it isnt neccessary.
Comment
Geert van Hooff

Join Date: Apr 2017

Posts: 11
#6

29 Apr 2017, 13:13

Quick follow-up question.
This is an example from dataex of my sample.

I would like to add up all the perc2012 for the same cusip. As I looked on other topics, they used the collapse command. Which would be collapse (sum) perc2012, by (cusip) in this case. However, as I do that I'm losing my variable indcode which I still need. Is there a solution in Stata for this, or do I have to download the data again and merge it with the ''new'' file?

Attached Files
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

29 Apr 2017, 14:01

To post dataex output, follow the instructions to select and copy from your Stata Results window the lines starting with the line after "copy starting from the next line" and ending with the line before "copy up to and including the previous line", thus

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str18 make int price
"AMC Concord" 4099
"AMC Pacer" 4749
"AMC Spirit" 3799
"Buick Century" 4816
"Buick Electra" 7827
"Buick LeSabre" 5788
"Buick Opel" 4453
"Buick Regal" 5189
"Buick Riviera" 10372
"Buick Skylark" 4082
end
[/CODE]

and then paste these lines into your post, where they will appear as

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str18 make int price "AMC Concord" 4099 "AMC Pacer" 4749 "AMC Spirit" 3799 "Buick Century" 4816 "Buick Electra" 7827 "Buick LeSabre" 5788 "Buick Opel" 4453 "Buick Regal" 5189 "Buick Riviera" 10372 "Buick Skylark" 4082 end

from which they can be copied and pasted into Stata to develop and test an answer to your question.

Please try again.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

29 Apr 2017, 14:09

Without data to test on, I cannot be sure this will work for you, but I think

Code:

collapse (max) indcode (sum) perc2012, by (cusip)

will work for you, as long as you want the result to be just one line per cusip, and assuming the indcode is the same in all observations with the same cusip. If you would like to preserve your data as it is and just add a new variable with the total, something like

Code:

by cusip, sort: egen sum2012 = total(perc2012)

might work.

Last edited by William Lisowski; 29 Apr 2017, 14:11.
1 like
Comment
Geert van Hooff

Join Date: Apr 2017

Posts: 11
#9

29 Apr 2017, 14:58

That worked! You're a genius. Thank you!
Comment

Announcement

How to keep 5 observations per id?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment