
  • Identifying outliers using as little code as possible

    Hi there,

    I'm trying to create a dummy variable that takes the value 1 for outliers in GDP growth from 1990-2000, using the 1.5*IQR rule, in as little code as possible. Apart from ending up with a very complex set of code, I am also having problems telling Stata to replace 0 with 1 when the value is >x and <y. I also have missing data, which is being replaced by 1 when I use this type of code.

    Thank you in advance,

    Daniel Greener

  • #2
    There can be a trade-off between the number of lines and how readable your code is. You can shorten the code below, but that will probably make it less readable.

    Code:
    sysuse nlsw88, clear
    
    // find the bounds
    sum hours, detail
    local p25 = r(p25)
    local p75 = r(p75)
    local iqr = `p75'-`p25'
    local lb = `p25' - 1.5*`iqr'
    local ub = `p75' + 1.5*`iqr'
    
    // make the variable
    gen byte outlier = !inrange(hours,`lb',`ub') if !missing(hours)
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Daniel does not show code, so his difficulties defy detailed comment. But the criterion for outliers here cannot be that a value is above the upper threshold and also below the lower threshold, as no value can satisfy both. Perhaps the code needs | not &.
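
      To illustrate the difference between | and &, here is a minimal sketch that reuses the nlsw88 example and the bounds from #2. The variable names out_and, out_or and outlier are made up for illustration, and since Daniel's own code was not posted this is only a guess at what went wrong. It also shows why missing values end up coded 1: Stata treats numeric missing as larger than any number, so a test like hours > `ub' is true for missing hours unless they are excluded.

      ```stata
      sysuse nlsw88, clear

      // bounds from the 1.5*IQR rule, as in #2
      summarize hours, detail
      local lb = r(p25) - 1.5*(r(p75) - r(p25))
      local ub = r(p75) + 1.5*(r(p75) - r(p25))

      // & requires both conditions at once, which no value can satisfy: always 0
      gen byte out_and = (hours < `lb') & (hours > `ub')

      // | flags either tail, but missing hours count as > `ub' and are coded 1
      gen byte out_or = (hours < `lb') | (hours > `ub')

      // same test, restricted to nonmissing hours: missing stays missing
      gen byte outlier = (hours < `lb' | hours > `ub') if !missing(hours)

      // compare the three versions, showing missing values as their own category
      tabulate out_and, missing
      tabulate out_or, missing
      tabulate outlier, missing
      ```

      The last variable should match the one created with !inrange() in #2; which form to use is a matter of taste.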
