  • Drop entire individual if number of outliers exceeds threshold level

    Hello everyone,

    For a dataset that contains many countries, I have generated a flag variable that marks outliers according to a previously defined criterion: it takes the value 1 if the value of a variable is an outlier and 0 otherwise.

    I have counted the number of outliers per country (c_id) with

    Code:
    bysort c_id: egen n_outliers=count(Z_mean_d_occ_SD2) if Z_mean_d_occ_SD2 == 1
    Now I want to drop the entire country (c_id) if its number of outliers is above a certain threshold, e.g. above 3 or 4.
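One caveat about the counting step above, shown here as a minimal sketch using the example data from this thread: because of the `if` qualifier, `n_outliers` is filled in only on rows flagged as outliers and is missing on all other rows. Stata treats missing as larger than any number, so a plain `drop if n_outliers > 3` would also delete every non-outlier row of every country. One way around this is to copy the count to all rows of the country before dropping; `n_out_all` is a hypothetical variable name chosen for illustration.

```stata
clear
input float c_id str33 country float Z_mean_d_occ_SD2
7 "Bangladesh" 0
7 "Bangladesh" 0
7 "Bangladesh" 0
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 0
7 "Bangladesh" 1
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 1
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 1
end

* Original approach: the count is non-missing only on outlier rows
bysort c_id: egen n_outliers = count(Z_mean_d_occ_SD2) if Z_mean_d_occ_SD2 == 1

* Spread the country's count to all of its rows (egen max() ignores missing values)
* n_out_all is a hypothetical helper variable name
bysort c_id: egen n_out_all = max(n_outliers)

* A country with no outliers at all has n_out_all missing, so exclude missing explicitly
drop if n_out_all > 3 & !missing(n_out_all)
```

After this, only the nine Indonesia rows should remain.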

    Here is an extract of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
    7 "Bangladesh" 0 .
    7 "Bangladesh" 0 .
    7 "Bangladesh" 0 .
    7 "Bangladesh" 1 5
    7 "Bangladesh" 1 5
    7 "Bangladesh" 1 5
    7 "Bangladesh" 1 5
    7 "Bangladesh" 0 .
    7 "Bangladesh" 1 5
    end
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 1 2
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 1 2
    end
    So I would like to drop Bangladesh completely (5 outliers), but keep Indonesia (only 2 outliers).
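The same goal can also be sketched without `egen`, assuming the flag is coded 0/1: under the `by` prefix, the `sum()` function returns a running sum within each country, so its value on the last row of the group (subscript `[_N]`) is that country's total number of outliers. Missing values count as 0 in `sum()`, so no missing-value guard is needed when dropping. The names `running` and `n_total` are hypothetical.

```stata
clear
input float c_id str33 country float Z_mean_d_occ_SD2
7 "Bangladesh" 0
7 "Bangladesh" 0
7 "Bangladesh" 0
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 0
7 "Bangladesh" 1
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 1
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 1
end

* Running count of outliers within each country (running is a hypothetical name)
bysort c_id: generate running = sum(Z_mean_d_occ_SD2)

* The last row of each country holds the country total; copy it to every row
* (n_total is a hypothetical name)
by c_id: generate n_total = running[_N]

* Drop all rows of countries above the threshold
drop if n_total > 3
```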




  • #2
    Hello, Jonas. I hope this is what you want.
    Code:
    clear
    input float c_id str33 country float(Z_mean_d_occ_SD2 n_outliers)
    7 "Bangladesh" 0 .
    7 "Bangladesh" 0 .
    7 "Bangladesh" 0 .
    7 "Bangladesh" 1 5
    7 "Bangladesh" 1 5
    7 "Bangladesh" 1 5
    7 "Bangladesh" 1 5
    7 "Bangladesh" 0 .
    7 "Bangladesh" 1 5
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 1 2
    41 "Indonesia" 0 .
    41 "Indonesia" 0 .
    41 "Indonesia" 1 2
    end
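    * Remove the original count, which is missing on non-outlier rows
    drop n_outliers
    * Sum of the 0/1 flag = number of outliers, repeated on every row of the country
    bysort c_id: egen n_outliers=sum(Z_mean_d_occ_SD2)
    * Drop all rows of countries with more than 3 outliers
    drop if n_outliers>3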
    2B or not 2B, that's a question!
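In Stata 9 and later, `total()` is the documented name of this `egen` function, and `sum()` is kept as an older synonym. A minimal variant of the answer above, using the documented name and storing the cutoff in a local macro (`maxout` is a hypothetical name), might look like the following. Because `total()` treats missing values as 0 and returns 0 for a country without outliers, the `drop` condition needs no missing-value guard.

```stata
clear
input float c_id str33 country float Z_mean_d_occ_SD2
7 "Bangladesh" 0
7 "Bangladesh" 0
7 "Bangladesh" 0
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 1
7 "Bangladesh" 0
7 "Bangladesh" 1
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 1
41 "Indonesia" 0
41 "Indonesia" 0
41 "Indonesia" 1
end

* Threshold stored once (maxout is a hypothetical macro name)
local maxout 3

* Number of flagged rows per country, repeated on every row of that country
bysort c_id: egen n_outliers = total(Z_mean_d_occ_SD2 == 1)

* Drop every row of countries above the cutoff
drop if n_outliers > `maxout'

* Quick check of which countries remain
tabulate country
```

Changing the cutoff to 4 then only requires editing the `local` line.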



    • #3
      Yes, perfect! Thanks a lot, Liu!
