Drop entire individual if number of outliers exceeds threshold level

Jonas Boehlke

Join Date: May 2019
Posts: 22


18 May 2019, 04:08

Hello everyone,

For a dataset that contains many countries, I have generated a tag that flags outliers according to a previously defined criterion: it takes the value 1 if the value of a variable is an outlier and 0 otherwise.

I have counted the number of outliers per country (c_id) with

Code:

bysort c_id: egen n_outliers=count(Z_mean_d_occ_SD2) if Z_mean_d_occ_SD2 == 1

Now I want to drop the entire country (c_id) if its number of outliers exceeds a certain threshold, e.g. 3 or 4.

Here is an extract of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
7 "Bangladesh" 0 .
7 "Bangladesh" 0 .
7 "Bangladesh" 0 .
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 0 .
7 "Bangladesh" 1 5
end

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 1 2
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 1 2
end

So I would like to drop Bangladesh completely (5 outliers), but keep Indonesia (only 2 outliers).


Liu Qiang

Join Date: Jun 2018
Posts: 135

18 May 2019, 04:18

Hello, Jonas. I hope this is what you want.

Code:

clear
input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
7 "Bangladesh" 0 .
7 "Bangladesh" 0 .
7 "Bangladesh" 0 .
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 0 .
7 "Bangladesh" 1 5
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 1 2
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 1 2
end
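* discard the earlier count, which is missing on non-outlier rows
drop n_outliers
* total() is the documented name of the egen function formerly called sum()
bysort c_id: egen n_outliers = total(Z_mean_d_occ_SD2)
* drop every observation of a country with more than 3 outliers
drop if n_outliers > 3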
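
A short note on why the count has to be recomputed, as a sketch rather than a definitive recipe. With the if qualifier, egen fills in only the rows where the tag equals 1, so every non-outlier row gets a missing value. Stata treats missing as larger than any number, so a drop based on that variable would also remove the non-outlier rows of countries that should be kept. The sketch below assumes the example data entered above is in memory again (before any drop); the names n_if, n_all and the local macro threshold are made up for illustration and do not come from the thread.

Code:

* hypothetical cutoff, adjust as needed
local threshold 3

* with the if qualifier: missing on all rows where the tag is 0
bysort c_id: egen n_if = count(Z_mean_d_occ_SD2) if Z_mean_d_occ_SD2 == 1

* without the qualifier: the country's outlier count on every row
bysort c_id: egen n_all = total(Z_mean_d_occ_SD2)

* compare the two variables side by side
list c_id country Z_mean_d_occ_SD2 n_if n_all, sepby(c_id)

* drop whole countries above the cutoff
drop if n_all > `threshold'

* check which countries remain
tabulate country

Because the tag is coded 0/1, its within-country total equals the number of outliers, and the result is constant within c_id, so the drop removes a country completely or not at all.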

2B or not 2B, that's a question!


Jonas Boehlke

Join Date: May 2019

Posts: 22
#3

18 May 2019, 05:44

Yes, perfect! Thanks a lot, Liu!