  • Drop top 1% and 10% of a variable

    Hello, I am wondering how to drop the top 1% and the top 10% of observations, ranked by the size of a certain variable. This is to be used as a robustness test for my data. Thanks.

  • #2
    Code:
    sort var
    gen sum=sum(1)
    keep if sum>.99
    but see https://www.nber.org/stata/efficient/percentiles.html for some alternatives. The only documentation of sum() I can find at the moment is in https://www.stata.com/manuals/fn.pdf but it is pretty sketchy.

    • #3
      The code in #2 will not drop anything. The new variable is by construction the integers 1 upwards, so sum>.99 is true for every observation, and the inequality is the wrong way round for dropping the top of the distribution too. Further, the code is not subtle about missing values.

      I believe that the official command cumul offers what Daniel is thinking of here. But so does summarize.
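
      As a rough sketch of the cumul route (not necessarily what either poster had in mind), the following uses the auto dataset shipped with Stata so that it is self-contained, with price standing in for the variable of interest. The names cdf, top1 and top10 are illustrative choices. Note that cumul's cutoff can differ slightly from the percentiles that summarize reports, because the two are defined differently.

      ```stata
      sysuse auto, clear
      * empirical cumulative distribution of price; -equal- gives tied values the same value
      cumul price, generate(cdf) equal
      * flag the top 1% and top 10%; cdf is missing wherever price is missing
      generate byte top1  = cdf > 0.99 & !missing(cdf)
      generate byte top10 = cdf > 0.90 & !missing(cdf)
      * use the flags in an if qualifier, or drop outright with: drop if top1
      summarize price if !top1
      summarize price if !top10
      ```

      Flagging rather than dropping keeps the full data in memory, so both cutoffs can be checked in one session.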


      • #4
        Of course I was too hasty and got it wrong. Should be (hopefully)
        Code:
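        sort var
        * running count of nonmissing values; missing values sort last
        gen sum=sum(1-missing(var))
        * keep the bottom 99%; observations with missing var are dropped too
        keep if sum<=.99*sum[_N]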
        but as Nick points out, -cumul- is one step and returns the data to the original sort. It is documented at page 433 (out of 3077!) in https://www.stata.com/manuals/r.pdf

        • #5
          I would do it with -summarize-. Something like:

          Code:
          summ size, detail
          keep if size< r(p99)
          to drop the top 1% and

          Code:
          summ size, detail
          keep if size< r(p90)
          to drop the top 10%.
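
          If both cutoffs are needed for the same robustness exercise, one option is to save the percentiles and build indicator variables instead of dropping observations, so that the 1% and 10% versions can be run from the same data. A minimal sketch, assuming the auto dataset shipped with Stata with price standing in for size; the scalar and variable names (cut99, cut90, in99, in90) and the regression are purely illustrative:

          ```stata
          sysuse auto, clear
          summarize price, detail
          * save the cutoffs before another r-class command overwrites r()
          scalar cut99 = r(p99)
          scalar cut90 = r(p90)
          * 1 if the observation survives trimming; a missing price gives 0
          generate byte in99 = price < scalar(cut99)
          generate byte in90 = price < scalar(cut90)
          * same model on each trimmed sample
          regress mpg weight if in99
          regress mpg weight if in90
          ```

          As with the code above, the strict inequality also excludes observations exactly equal to the cutoff, and missing values never pass the test because Stata treats them as larger than any number.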
