Hello everyone,
For a dataset that contains many countries, I have generated a tag that flags outliers according to a previously defined criterion: it takes the value 1 if the value of a variable is an outlier and 0 otherwise.
Now I want to drop an entire country (c_id) if its number of outliers is above a certain threshold, e.g. above 3 or 4. In the extract below, I would like to drop Bangladesh completely (5 outliers) but keep Indonesia (only 2 outliers).
So far, I have counted the number of outliers per country (c_id) with
Code:
bysort c_id: egen n_outliers=count(Z_mean_d_occ_SD2) if Z_mean_d_occ_SD2 == 1
Here is an extract of my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
7 "Bangladesh" 0 .
7 "Bangladesh" 0 .
7 "Bangladesh" 0 .
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 1 5
7 "Bangladesh" 0 .
7 "Bangladesh" 1 5
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 1 2
41 "Indonesia" 0 .
41 "Indonesia" 0 .
41 "Indonesia" 1 2
end
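One possible approach, as a minimal sketch rather than a definitive solution: because of the if qualifier, n_outliers is missing on every non-outlier row, and Stata treats missing as larger than any number, so a condition like n_outliers > 3 would also catch those rows. The sketch below instead stores the count on every row of a country and drops by that. It assumes Z_mean_d_occ_SD2 is a 0/1 tag; the variable name n_out_all and the cutoff of 3 are illustrative choices, not part of the original data.

```stata
* Count outliers per country and store the result on every row of that country.
* The expression (Z_mean_d_occ_SD2 == 1) evaluates to 1 for outliers and 0 otherwise,
* so total() sums the number of outliers within each c_id.
* n_out_all is a hypothetical variable name.
bysort c_id: egen n_out_all = total(Z_mean_d_occ_SD2 == 1)

* Drop every observation of countries with more than 3 outliers (illustrative cutoff).
drop if n_out_all > 3
```

With the extract above, n_out_all would be 5 on all Bangladesh rows and 2 on all Indonesia rows, so only Bangladesh is dropped.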