  • Winsorize in cross-section

    Hi guys,

    I am very new to Stata and have to winsorize stock characteristics in each cross-section (month), at the 0.5th and 99.5th percentiles.

    I first tried winsor2 variable, replace cuts(0.5, 99.5). However, this command cannot be combined with the "by" prefix, so it does not allow cross-sectional winsorizing: by month_id: winsor2 does not work.

    I also tried :

    bys month_id: summarize marketcap,de
    bys month_id: replace marketcap = r(p1) if marketcap < r(p1) & marketcap !=.
    bys month_id: replace marketcap = r(p99) if marketcap > r(p99) & marketcap !=.

    However, this approach can only winsorize at the 1st and 99th percentiles, and I need to winsorize at the 0.5th and 99.5th.

    I appreciate any help provided!

    Best, BAT

  • #2
    Your code is incorrect even for what you say it does. The effect of

    Code:
    bys month_id: summarize marketcap,de
    is just to leave the last summarize result in memory and that result in

    Code:
     
    bys month_id: replace marketcap = r(p1) if marketcap < r(p1) & marketcap !=.
    bys month_id: replace marketcap = r(p99) if marketcap > r(p99) & marketcap !=.
    will not do anything different from

    Code:
     
    replace marketcap = r(p1) if marketcap < r(p1) & marketcap !=.
    replace marketcap = r(p99) if marketcap > r(p99) & marketcap !=.
    as r(p1) and r(p99) are constants, not variables. So, you would be applying the 1st and 99th percentiles for the last month to all months.
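
    A minimal sketch of this behaviour, assuming Stata's bundled auto dataset (with foreign and price standing in for month_id and marketcap, purely for illustration): after by: summarize, the r() results match those from summarizing the last group on its own.

    ```stata
    * Illustration only: the bundled auto data, not the poster's dataset
    sysuse auto, clear
    bysort foreign: summarize price, detail
    * r() now holds the results for the last by-group only (foreign == 1)
    display r(N) "  " r(p1)
    * Summarizing that last group directly gives the same r() values
    summarize price if foreign == 1, detail
    display r(N) "  " r(p1)
    ```

    Both display lines should show the same pair of numbers, which is why the by: prefix on the replace lines changes nothing.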

    See help for pctile for various relevant commands. But to apply such calculations repeatedly you could adopt a technique from http://www.stata.com/support/faqs/st...ing-positions/

    Let's suppose that you work with plotting positions (rank - 0.5)/count and thus look for the smallest value with plotting position >= 0.005 and the largest value with plotting position <= 0.995. Then

    Code:
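    * Rank and count nonmissing values within each month
    egen rank = rank(marketcap), by(month_id)
    egen count = count(marketcap), by(month_id)
    * Plotting position of each observation within its month
    gen pp = (rank - 0.5)/count
    * Division by zero yields missing, which min() and max() ignore
    egen p005 = min(marketcap / (pp >= 0.005)), by(month_id)
    egen p995 = max(marketcap / (pp <= 0.995)), by(month_id)
    * Clamp to [p005, p995]; the if qualifier keeps missing values missing
    gen marketcapW = max(p005, min(marketcap, p995)) if !missing(marketcap)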
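
    As an alternative, here is a rough sketch that loops over months and takes the cut points from _pctile. It assumes month_id is numeric and that every month has at least one nonmissing marketcap; marketcapW2 is just an illustrative name. Note that _pctile uses its own percentile definition, not the plotting-position rule above, so the cut points may differ slightly, especially in small months.

    ```stata
    * Assumptions: month_id is numeric; each month has nonmissing marketcap
    tempname lo hi
    gen double marketcapW2 = marketcap
    levelsof month_id, local(months)
    foreach m of local months {
        * 0.5th and 99.5th percentiles for this month only
        _pctile marketcap if month_id == `m', percentiles(0.5 99.5)
        scalar `lo' = r(r1)
        scalar `hi' = r(r2)
        * Pull the tails in to the cut points; missing values stay missing
        quietly replace marketcapW2 = `lo' if month_id == `m' & marketcap < `lo'
        quietly replace marketcapW2 = `hi' if month_id == `m' & marketcap > `hi' & !missing(marketcap)
    }
    * Quick check: compare the range before and after, month by month
    tabstat marketcap marketcapW2, by(month_id) statistics(min max)
    ```

    The loop is slower than the egen approach when there are many months, but it makes the per-month logic explicit.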


    • #3
      Thank you very much for your reply, Nick! It helped a lot.
