Drop top 1% and 10% of a variable

Ronan Mack

Join Date: Dec 2022

Posts: 12
#1


19 Dec 2022, 04:28

Hello, I am wondering how to drop the top 1% and the top 10% of observations, ranked by the size of a certain variable. This is for a robustness test on my data. Thanks.
Daniel Feenberg

Join Date: Oct 2014

Posts: 323
#2

19 Dec 2022, 06:43

Code:

sort var
gen sum=sum(1)
keep if sum>.99

But see https://www.nber.org/stata/efficient/percentiles.html for some alternatives. The only documentation of sum() I can find at the moment is in https://www.stata.com/manuals/fn.pdf, and it is pretty sketchy.
Nick Cox

Join Date: Mar 2014

Posts: 35661
#3

19 Dec 2022, 07:01

The code in #2 will not drop any of your data. The new variable is by construction the integers from 1 up, so every observation satisfies sum > .99, and the inequality is the wrong way round too. Further, the code is not subtle about missing values.

I believe that the official command cumul offers what Daniel is thinking of here. But so does summarize.
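For anyone who wants to see the cumul route spelled out, here is a minimal sketch, not a definitive recipe. The auto dataset and its price variable are stand-ins I am assuming purely for illustration, and ecdf is a made-up variable name. The equal option gives tied values the same cumulative proportion, and the !missing() guard stops observations with a missing value from being dropped along with the top 1%.

```stata
* Illustration only: auto.dta and price stand in for your data and variable
sysuse auto, clear
* ecdf holds the empirical cumulative proportion of price (missing if price is missing)
cumul price, generate(ecdf) equal
* drop the top 1%; use 0.90 instead of 0.99 for the top 10%
drop if ecdf > 0.99 & !missing(ecdf)
```

Unlike the sort-based approach, this leaves the observations in their original order.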
Daniel Feenberg

Join Date: Oct 2014

Posts: 323
#4

19 Dec 2022, 08:02

Of course I was too hasty and got it wrong. Should be (hopefully)

Code:

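sort var
gen sum=sum(1-missing(var))
* sum[_N] is the count of nonmissing values; keep the bottom 99% of them
* (observations with missing var sort last and are dropped as well)
keep if sum<=.99*sum[_N]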

but as Nick points out, -cumul- is one step and returns the data to the original sort. It is documented at page 433 (out of 3077!) in https://www.stata.com/manuals/r.pdf
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#5

20 Dec 2022, 01:17

I would do it with -summarize-. Something like:

Code:

summ size, detail
keep if size< r(p99)

to drop the top 1% and

Code:

summ size, detail
keep if size< r(p90)

to drop the top 10%.
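One thing to watch if both tests are run in the same session: after the first keep, a second summarize would compute the 90th percentile of the already trimmed sample. A possible sketch that avoids this is to save both cutoffs from the full sample and flag observations instead of dropping them. The auto dataset, price (standing in for size), and the names cut99, cut90, top1 and top10 are all assumptions for illustration. The >= mirrors the strict < used above, so observations exactly at the cutoff count as part of the top group.

```stata
* Illustration only: auto.dta and price stand in for your data and size
sysuse auto, clear
summarize price, detail
* save both cutoffs from the full sample before anything is dropped
scalar cut99 = r(p99)
scalar cut90 = r(p90)
* flag instead of dropping, so both robustness samples use the same cutoffs
generate byte top1  = price >= scalar(cut99) & !missing(price)
generate byte top10 = price >= scalar(cut90) & !missing(price)
* hypothetical use: add "if !top1" or "if !top10" to the estimation command
```

Flagging also keeps observations with missing values in the data, whereas keep if size< r(p99) drops them because missing counts as larger than any number.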
