  • Scaled Winsorization in Stata

    Hello

    In Stata, is there any way to apply a scaled winsorization? Basically, instead of winsorizing all the outliers to the nth percentile, I want to winsorize them at intervals according to their values (the lowest outlier is winsorized to the nth percentile, the next higher outlier to the nth percentile + 1, etc.). I am using Stata 13, and any help is appreciated.

  • #2
    I suspect it's programmable, although I don't understand your recipe, as you use terms (e.g. "lowest outlier") and notation you don't define.

    My own winsor (SSC) was written in support of univariate summary; any use of winsorization in several-variable modelling lies far beyond what I want to support. There's an independent winsor2 (SSC) from someone else that I've never used or studied.

    I think you'll need to write your own code if winsor2 doesn't do what you want.

    Using rank, or any function thereof, to define outliers seems to me a confusion of concepts.
    Last edited by Nick Cox; 12 Jan 2017, 02:50.



    • #3
      I'm sorry for not being clear enough in my previous post. I hope the following explanation clears things up. In my data, there are three outliers. To reduce their effect, I winsorize them to the 99th percentile. However, this sets all three outliers equal to the value of the 99th percentile, which is, let's say, 10. What I want is to scale the winsorization: instead of all 3 outliers having the value 10, I want them to have the values 10, 11, and 12, ordered according to their values before the winsorization.
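
      For reference, the conventional step described above might look like the following minimal sketch. The toy data (1,000 observations with three extreme values), the variable names x and x_w, and the scalar name w_cut are all hypothetical and chosen only so that the 99th percentile comes out at 10, as in the example.

      ```stata
      * Hypothetical toy data: values 1..10 plus three extreme observations
      clear
      set obs 1000
      generate x = mod(_n, 10) + 1
      replace x = 85  in 998
      replace x = 120 in 999
      replace x = 300 in 1000

      * Conventional upper-tail winsorizing at the 99th percentile
      summarize x, detail
      scalar w_cut = r(p99)                  // hypothetical scalar name
      generate x_w = min(x, w_cut) if !missing(x)

      * All three extreme values now share the cutoff value
      list x x_w in 998/1000
      ```

      This only handles the upper tail; every value above the cutoff ends up identical, which is the behaviour the scaled version is meant to avoid.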



      • #4
        Sorry, but that is no clearer to me and sounds indefensibly ad hoc in any case.



        • #5
          Jonathan - I agree with Nick that this is a terribly ad hoc way to handle outliers. Even conventional winsorizing (which finance scholars use almost automatically) has no statistical properties that I know of and you're deviating from that for reasons that are not apparent.

          However, if you really know that you have only 3 observations to change, you can easily do it with 3 replace statements, e.g., replace x=12 if x==85 (assuming the outlier is 85). Alternatively, you can identify the observations to replace with their usual identifiers, e.g., replace x=12 if year==2017 & firmnum==201.
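
          As a minimal sketch of the manual approach, assuming a small panel with identifiers year and firmnum: the data values below (including the outliers 120 and 300 and the other firm numbers) are invented for illustration, and only 85, 12, 2017 and 201 come from the examples above.

          ```stata
          * Hypothetical data: three ordinary values and three outliers
          clear
          input year firmnum x
          2015 101 4
          2016 150 7
          2016 160 9
          2017 201 85
          2017 202 120
          2017 203 300
          end

          * One replace per outlier, matched either by value or by identifiers
          replace x = 10 if x == 85
          replace x = 11 if year == 2017 & firmnum == 202
          replace x = 12 if year == 2017 & firmnum == 203

          list, clean
          ```

          Matching on identifiers is usually the safer of the two: an exact comparison such as x == 85 can fail for non-integer values stored as float, and it would also change any other observation that happens to hold the same value.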

          If you have a much larger number of "outliers" and you want to automate the entire process, then you've got a much more complex programming problem.
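
          One possible way to automate this is sketched below, purely as an illustration. It assumes that "outlier" means any value above the 99th percentile and that the step between successive outliers is 1, as in the example in #3; both are assumptions rather than a recommended rule. The toy data and the names x, above, rank_above, x_sw, sw_cut and sw_step are hypothetical.

          ```stata
          * Hypothetical toy data: values 1..10 plus three extreme observations
          clear
          set obs 1000
          generate x = mod(_n, 10) + 1
          replace x = 85  in 998
          replace x = 120 in 999
          replace x = 300 in 1000

          * Cutoff and step are assumptions taken from the example in #3
          summarize x, detail
          scalar sw_cut  = r(p99)
          scalar sw_step = 1

          * Flag values above the cutoff and rank them (1 = lowest outlier);
          * the track option gives tied outliers the same rank
          generate byte above = x > sw_cut & !missing(x)
          egen rank_above = rank(x) if above, track

          * Lowest outlier goes to the cutoff, the next to cutoff + step, and so on
          generate x_sw = x
          replace x_sw = sw_cut + sw_step * (rank_above - 1) if above

          list x x_sw in 998/1000
          ```

          With these toy data the three extreme values become 10, 11 and 12. The lower tail is left untouched, and the choice of cutoff and step remains as ad hoc as noted earlier in the thread.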
