  • ADO file to mark extreme values - now working - help

    I am trying to write an ado routine that creates a column marking the extreme values of a variable, but I cannot get it to work properly: it only works for a single variable.

    Ideally, I would run the ado file on a list of variables, and give it arguments for the upper and lower cut-off percentile.

    Any help is welcome!
    Martien


    Explanation:

    My main variable should have values of -3 to +3, so -99 and 287 are clearly extreme values. These extreme values should be marked. See the table below and the attachment.

    I want to rely on percentiles and use the summarize, detail command. The result is the ado file shown below, which unfortunately gives the error message ("varname is not a numeric variable and cannot be trimmed") when I apply it to two or more variables.

    See the code for the ADO below:


    program define trimmer
        // Loop over every token passed to the program (local macro 0)
        foreach var of local 0 {
            capture confirm numeric variable `var'
            if _rc==0 {
                qui sum `var', detail
                // 1 if the value lies between the 1st and 99th percentiles
                gen to_use_`var' = inrange(`var', r(p1), r(p99)) if `var'!=.
            }
            else di "`var' is not a numeric variable and cannot be trimmed."
        }
    end


    Example data, where the marker is to_use_Variable:

    Variable; to_use_Variable
    -99; 1
    ..
    -2.8; 0
    -1.3; 0
    -1.9; 0
    0.5; 0
    2.0; 0
    ..
    287; 1
    Last edited by Martien Lubberink; 14 Aug 2015, 20:55.

  • #2
    Stata has a built-in programming feature to trap non-numeric variables in arguments to programs.
    Code:
    help syntax##description_of_varlist
    I'm not sure that the first and ninety-ninth centiles are an infallible way to accomplish what you want to do, but your trimmer program could look something like
    Code:
    program define trimmer
        version 14.0
        syntax varlist(numeric)
    
        foreach var of varlist `varlist' {
            quietly centile `var', centile(1 99)
            generate byte to_use_`var' = inrange(`var', r(c_1), r(c_2))
        }
    end
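
    One detail worth checking in the program above: inrange() returns 0, not missing, when its first argument is missing, so observations with a missing value end up flagged 0. A minimal sketch of the difference, using a made-up three-observation dataset (the names x, flag_all and flag_nonmissing are illustrative only):

    ```stata
    clear
    set obs 3
    generate x = _n
    replace x = . in 3

    // inrange() alone: the missing observation is flagged 0
    generate byte flag_all = inrange(x, 1, 2)

    // with an if qualifier: the missing observation stays missing
    generate byte flag_nonmissing = inrange(x, 1, 2) if !missing(x)

    list
    ```

    Here flag_all is 0 in the third observation, while flag_nonmissing is missing there. Whether that matters depends on how the marker is used later.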



    • #3
      On second reading, you don't want the program to balk, but rather just to warn. In that case you could do something like
      Code:
      version 14.0
      
      clear *
      set more off
      
      program define trimmer
          version 14.0
          syntax varlist
      
          foreach var of varlist `varlist' {
              local variable_type : type `var'
              if strpos("`variable_type'", "str") != 1 {
                  // do your thing here
              }
              else display in smcl as input "`var'" as error " is not a numeric variable and cannot be trimmed."
          }
      end
      
      sysuse auto
      
      trimmer _all
      
      exit
      At any rate, I would recommend taking advantage of Stata's programming syntax rather than relying on parsing the zero local macro.
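
      The placeholder above could be filled in along these lines. This is only a sketch: it assumes the 1st and 99th centiles from centile are the cut-offs you want, and it swaps the string-type check for capture confirm numeric variable, as in your original program.

      ```stata
      capture program drop trimmer
      program define trimmer
          version 14.0
          syntax varlist

          foreach var of varlist `varlist' {
              // Warn and move on to the next variable if this one is not numeric
              capture confirm numeric variable `var'
              if _rc {
                  display as error "`var' is not a numeric variable and cannot be trimmed."
                  continue
              }
              quietly centile `var', centile(1 99)
              // 1 = within the centile range, 0 = outside, missing stays missing
              generate byte to_use_`var' = inrange(`var', r(c_1), r(c_2)) if !missing(`var')
          }
      end

      sysuse auto, clear
      trimmer _all
      ```

      With the auto data this should warn about make and create a to_use_ marker for each numeric variable. Running it a second time on the same data will fail because the markers already exist, and a long variable name could push to_use_ plus the name past Stata's 32-character limit.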
      Last edited by Joseph Coveney; 14 Aug 2015, 22:16. Reason: Indented a closing brace.



      • #4
        Also check out -extremes- from SSC.



        • #5
          Dear all, many thanks for your contributions!

          I have now solved this by writing the attached ado file. The most challenging part was converting the boundary inputs (e.g. 1, 99) into expressions that can be read from the results of the summarize command. The number of quotes is just staggering!

          For example, it requires this snippet for the upper boundary:

          local i = "r(p`upper')"
          local U =`"`i'"'

          sum `var', detail
          noisily gen `new_var' = inrange(`var', `L', `U') if `var'!=.
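
          For comparison, the quoting can be sidestepped by letting syntax parse the cut-offs as options and passing them to centile, as suggested in #2. This is a minimal sketch, not the attached ado file: the option names lower() and upper() and their defaults are assumptions. Note also that summarize, detail only returns a fixed set of percentiles (1, 5, 10, 25, 50, 75, 90, 95, 99), so r(p`upper') exists only for those values, whereas centile accepts other values too.

          ```stata
          capture program drop trimmer
          program define trimmer
              version 14.0
              // lower() and upper() are hypothetical option names; defaults 1 and 99
              syntax varlist(numeric) [, Lower(real 1) Upper(real 99)]

              if `lower' >= `upper' {
                  display as error "lower() must be smaller than upper()"
                  exit 198
              }

              foreach var of varlist `varlist' {
                  // centile returns the requested centiles in r(c_1), r(c_2), in order
                  quietly centile `var', centile(`lower' `upper')
                  generate byte to_use_`var' = inrange(`var', r(c_1), r(c_2)) if !missing(`var')
              }
          end

          sysuse auto, clear
          trimmer price mpg, lower(5) upper(95)
          ```

          Because the cut-offs arrive as ordinary local macros, no compound quotes are needed, and varlist(numeric) makes Stata reject string variables before the loop starts.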


