
  • Deleting the upper and lower 0.5 percentile of a variable from my data set

    Hello everybody,

    I have the following problem. I would like to delete the upper and lower 0.5 percentile of a variable from my data set. I attempted the following approach.

    // Drop observations if the upper or lower 0.5 percentile of total output or total costs is exceeded.

    centile log_d_k log_d_gsl, centile(0.5 99.5)

    drop if (log_d_k <= -0.5972232 | log_d_k >= 0.4187304) & !missing(log_d_k)

    drop if (log_d_gsl <= -0.5456777 | log_d_gsl >= 0.4488837) & !missing(log_d_gsl)

    From my point of view, this approach is not elegant, because the cutoffs have to be copied by hand from the output of centile.

    Thank you!
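When a `drop if` condition mixes `|`, `&` and `!missing()`, two rules of Stata's expression syntax decide what is actually dropped. A numeric missing value (`.`) counts as larger than every nonmissing number, so an upper-tail test such as `x >= c` is also true when `x` is missing. In addition, `&` is evaluated before `|`, so `a | b & c` means `a | (b & c)`. The short check below illustrates both rules. The cutoff value is just the one typed in the question; nothing else is assumed.

```stata
* A missing value (.) is larger than every nonmissing number,
* so an upper-tail test on its own is also true for missing values.
* The next two lines display 1 and 0.
display . >= 0.4187304
display !missing(.)

* & is evaluated before |, so 1 | 0 & 0 is read as 1 | (0 & 0).
* The next two lines display 1 and 0.
display 1 | 0 & 0
display (1 | 0) & 0
```

Writing the parentheses explicitly, as in `(x <= lo | x >= hi) & !missing(x)`, makes the intended grouping visible and keeps observations with missing values.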

  • #2
    First, I strongly believe that it is an extremely bad idea to mechanically remove outliers that way. If you want to do something wrong, then here are two ways in which you can do it:

    Code:
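    * way 1: explicit comparisons with the stored centiles r(c_1) and r(c_2)
    centile log_d_k, centile(0.5 99.5)
    drop if (log_d_k < r(c_1) | log_d_k > r(c_2)) & !missing(log_d_k)
    * way 2: inrange(); !missing() keeps observations with missing values
    centile log_d_gsl, centile(0.5 99.5)
    drop if !inrange(log_d_gsl, r(c_1), r(c_2)) & !missing(log_d_gsl)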
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
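A minimal sketch, assuming the goal is still the one stated in the question: drop observations outside the 0.5th and 99.5th centiles of log_d_k or log_d_gsl, while keeping observations with missing values. When `centile` and `drop` are run one variable after the other, the second set of cutoffs is computed after the first drop. This sketch instead flags observations first and drops them in one step, so both sets of cutoffs come from the full sample. The simulated data at the top (seed, sample size, distribution) are illustrative stand-ins only; with the real data set, start at `tempvar`. Run the block as a whole from a do-file, because the temporary variable exists only while the block runs.

```stata
* Illustration only: simulated stand-ins for the variables in the question.
* Skip these lines when working with the real data set.
clear
set seed 12345
set obs 1000
generate double log_d_k   = rnormal(0, 0.2)
generate double log_d_gsl = rnormal(0, 0.2)
replace log_d_gsl = . in 1/5

* Flag observations outside the 0.5th to 99.5th centile range of either variable.
* Nothing is dropped inside the loop, so each variable's centiles
* are computed on the same full sample.
tempvar out
generate byte `out' = 0
foreach v of varlist log_d_k log_d_gsl {
    quietly centile `v', centile(0.5 99.5)
    * !missing() keeps observations where this variable is missing,
    * which would otherwise fail inrange() and be flagged
    quietly replace `out' = 1 if !inrange(`v', r(c_1), r(c_2)) & !missing(`v')
}
drop if `out'
```

The flag-then-drop design also makes it easy to inspect the flagged observations before deleting them, for example with `list if` on the flag variable placed before the final `drop`.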



    • #3
      Daniel: I strongly agree with Maarten.

      On a different note: Inserting code as code (as Maarten did) makes for material easier to read than attached images.
