Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Winsorization in Cross Sectional Data Setup

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int stock_id str52 stock float mdate byte time float(rt idiovol beta size bmratio mom rev illiq coskew idioskew)
    1 "3M India Ltd." 510 1 .05207785 .0917159 .728636 5.772126 .2301245 -.13974656 -.10039777 .53863716 -9.826381 .741343
    1 "3M India Ltd." 511 2 .05333333 .1186561 .7557594 5.822898 .23466666 -.12463696 .008666431 .08341017 -10.59874 .8733395
    1 "3M India Ltd." 512 3 -.003164557 .0338021 .7641004 5.874875 .2227848 .12028617 .05207785 .11840295 -10.59904 .9627776
    1 "3M India Ltd." 513 4 .003174603 .0558378 .6181145 5.871695 .23796825 .15732026 .05333333 .09225776 -17.38132 1.017631
    1 "3M India Ltd." 514 5 -.008227848 .0225026 .6659399 5.874875 .2372152 .07240003 -.003164557 .48637635 -17.98597 1.058033
    1 "3M India Ltd." 515 6 -.034620292 . .6515402 5.86661 .23918316 .16849355 .003174603 .26876998 -18.51118 1.10018
    1 "3M India Ltd." 516 7 -.022806147 .0447686 .6263238 5.831355 .24785987 .186991 -.008227848 .06716504 -18.73217 1.29784
    1 "3M India Ltd." 517 8 -.06308135 .0643021 .6145906 5.808293 .25364453 .11012773 -.034620292 .13017114 -17.98717 1.527457
    1 "3M India Ltd." 518 9 -.13267148 .084525 .59209937 5.743131 .27072203 -.05206249 -.022806147 2.3642542 -12.11243 1.13165
    1 "3M India Ltd." 519 10 .1862643 .1171338 .55323493 5.60079 .3351925 -.12617134 -.06308135 1.9620322 -15.2693 .28583
    1 "3M India Ltd." 520 11 .10526316 .1154546 .7139889 5.771597 .2825614 -.1627062 -.13267148 2.644915 -17.68667 .38381
    1 "3M India Ltd." 521 12 .06095238 .0747316 .7005213 5.871695 .2556508 -.000526177 .1862643 .07487273 -21.36877 .3219087
    2 "A B B India Ltd." 510 1 .031069767 .0905763 .50186515 7.037915 .3767442 -.024300825 -.07641757 .06449259 -7.053874 .9268938
    2 "A B B India Ltd." 511 2 .01930711 .0664654 .46918815 7.068512 .3931433 .16202657 .09270177 .03930691 -7.208749 .9409047
    2 "A B B India Ltd." 512 3 -.004956629 .0326465 .4646652 7.087632 .3856966 .3383867 .031069767 .03146813 -7.526149 .9585953
    2 "A B B India Ltd." 513 4 -.14659314 .047887 .3562272 7.082666 .4021704 .3880917 .01930711 .1093973 -2.476258 1.032589
    2 "A B B India Ltd." 514 5 -.000208464 .1514549 .4090988 6.92414 .4712529 .3162546 -.004956629 .07628962 -2.743639 .0579131
    2 "A B B India Ltd." 515 6 .036905754 .0800709 .4863302 6.923934 .4713511 .16189374 -.14659314 .02386482 -1.431137 .1245402
    2 "A B B India Ltd." 516 7 .12306455 .039354 .4671087 6.960177 .4709431 .07487814 -.000208464 .02468353 -2.03507 .11236
    2 "A B B India Ltd." 517 8 .14753805 .078451 .5041491 7.07624 .4193375 -.03244597 .036905754 .05760874 -2.584214 .2062848
    2 "A B B India Ltd." 518 9 -.1015759 .0698819 .4845561 7.213857 .3654236 .05030429 .12306455 .009260001 16.52197 -.3169694
    2 "A B B India Ltd." 519 10 .07033692 .0813391 .4809295 7.106745 .41392845 .18510276 .14753805 .04597291 9.059702 -.3749099
    2 "A B B India Ltd." 520 11 .11487912 .0719442 .53784853 7.174716 .3867272 .15748493 -.1015759 .0758109 16.52873 -.5729252
    2 "A B B India Ltd." 521 12 .07335177 .0411989 .6065983 7.283462 .3468782 .1368051 .07033692 .02363835 20.45452 -.6339179
    3 "A C C Ltd." 510 1 -.1629956 .0494383 1.0135643 7.906322 .3757709 .08446016 .003982741 .000169294 -3.780895 .6208696
    3 "A C C Ltd." 511 2 .03759399 .057254 1.0164183 7.728468 .4576692 .2166355 .05057851 .000326287 -3.636427 .7166465
    3 "A C C Ltd." 512 3 -.001811594 .0478862 1.0125653 7.76537 .44108695 .066043414 -.1629956 .000302232 -3.862317 .7233284
    3 "A C C Ltd." 513 4 -.036660615 .0545035 .9185891 7.763659 .4440653 .035401925 .03759399 .000416217 -.9943126 .4085064
    3 "A C C Ltd." 514 5 .23247927 .0545389 .9776259 7.726332 .4609646 -.21417423 -.001811594 .00040734 -.620329 .0993094
    3 "A C C Ltd." 515 6 .009477224 .0679345 .9839109 7.93543 .3740752 -.13447292 -.036660615 .000301316 -.0955215 .1465145
    3 "A C C Ltd." 516 7 -.13416111 .048411 .9351818 7.944861 .3777105 .02194494 .23247927 .000308467 -.9959579 .2074354
    3 "A C C Ltd." 517 8 .07764953 .0739988 .9043338 7.801068 .43609655 .009738361 .009477224 .000375364 -1.213868 .0164527
    3 "A C C Ltd." 518 9 -.10094126 .085914 .9575355 7.875917 .4047387 -.07413296 -.13416111 .000380308 1.900243 .3630781
    3 "A C C Ltd." 519 10 -.05487365 .0731907 .8840985 7.769539 .4547292 .02231796 .07764953 .000468042 -6.269318 .4172227
    3 "A C C Ltd." 520 11 .11000764 .0545616 .8759854 7.713101 .4811306 -.08806377 -.10094126 .000592079 -3.216002 .2910195
    3 "A C C Ltd." 521 12 .14934617 .0546525 .8888509 7.817489 .43344805 -.1938414 -.05487365 .000360854 2.097508 .2552451
    4 "A D C India Communications Ltd." 510 1 -.1504425 .0897897 1.527426 3.663049 .8875516 -.04868094 -.10838446 .15264645 -10.04435 1.010176
    4 "A D C India Communications Ltd." 511 2 .022916667 .0696889 1.4991747 3.500137 1.0645833 -.26892313 -.02809633 .478931 -9.47834 1.120804
    4 "A D C India Communications Ltd." 512 3 -.09029192 .0908449 1.473083 3.522825 1.0407332 -.07940148 -.1504425 .7972757 -8.492423 1.342241
    4 "A D C India Communications Ltd." 513 4 -.08955224 . 1.2794074 3.428164 1.1616418 -.024808513 .022916667 1.8302515 -12.26315 1.393121
    4 "A D C India Communications Ltd." 514 5 .09180328 . 1.321978 3.334345 1.2759017 -.3187251 -.09029192 .568933 -11.6449 1.470866
    4 "A D C India Communications Ltd." 515 6 .013513514 .0816319 1.2205865 3.422306 1.1686186 -.3550343 -.08955224 1.403591 -15.83261 1.53574
    4 "A D C India Communications Ltd." 516 7 -.1525926 .075157 1.1036876 3.435599 1.1797037 -.2498118 .09180328 .5731305 -20.72038 1.794577
    4 "A D C India Communications Ltd." 517 8 -.08216783 .0859583 1.1581147 3.269949 1.392133 -.20006445 .013513514 2.918923 -20.01536 1.855321
    4 "A D C India Communications Ltd." 518 9 -.0895238 .1204352 1.2924528 3.1842844 1.516762 -.36199296 -.1525926 1.1755368 -16.89581 1.834721
    4 "A D C India Communications Ltd." 519 10 .17573223 .1484415 1.321028 3.090588 1.7414227 -.6221114 -.08216783 .414616 -10.60291 2.17644
    4 "A D C India Communications Ltd." 520 11 .18683274 .1644031 1.1657108 3.2523105 1.481139 -.6011787 -.0895238 .26050207 5.980072 1.166037
    4 "A D C India Communications Ltd." 521 12 .11244377 .1989271 .9353671 3.423611 1.247976 -.410789 .17573223 .11136226 .8642985 1.321339
    5 "A G C Networks Ltd." 510 1 -.2367297 .071817 1.7776214 5.452411 .39926785 .8099551 -.10675381 .012668496 -2.008504 1.368494
    5 "A G C Networks Ltd." 511 2 .11510792 .1417156 1.8288053 5.182289 .53197443 .8759325 -.000609756 .019332485 -2.207755 1.411894
    5 "A G C Networks Ltd." 512 3 -.06050179 .1398576 1.847679 5.291192 .4770609 .9781151 -.2367297 .02580632 -2.206439 1.394687
    5 "A G C Networks Ltd." 513 4 -.1571799 .1039934 1.9271867 5.228753 .5256772 .9490805 .11510792 .05270022 -16.98932 1.389085
    5 "A G C Networks Ltd." 514 5 .1819663 .0866766 1.8660675 5.057582 .6238117 .7598754 -.06050179 .033067085 -14.41975 1.471493
    5 "A G C Networks Ltd." 515 6 .13311887 .1175809 1.7479874 5.224725 .5277671 .434956 -.1571799 .022889167 -18.17974 1.451653
    5 "A G C Networks Ltd." 516 7 -.153035 .0672816 1.8438486 5.349865 .4909091 .12951286 .1819663 .008367621 -15.47684 1.404069
    5 "A G C Networks Ltd." 517 8 -.07214685 .1157435 1.771812 5.183692 .57964885 .11400487 .13311887 .02172558 -10.7831 1.559955
    5 "A G C Networks Ltd." 518 9 -.176845 .0434618 1.7264304 5.108729 .6247742 -.3036748 -.153035 .0481924 -1.276651 1.030801
    5 "A G C Networks Ltd." 519 10 .065412745 .0687537 1.6099724 4.914198 .7905956 -.4569304 -.07214685 .09379279 -8.940264 .9531597
    5 "A G C Networks Ltd." 520 11 .20047078 .1616858 1.618795 4.977492 .7421285 -.5386481 -.176845 .03792356 -8.306302 .8918644
    5 "A G C Networks Ltd." 521 12 -.001633987 .1845315 1.5032448 5.160262 .6181372 -.4746759 .065412745 .017859912 -18.83033 .9789386
    6 "Aarti Industries Ltd." 510 1 -.11272727 .0736996 . 4.383151 1.5577272 .7606374 .10981309 3.5199125 . .
    6 "Aarti Industries Ltd." 511 2 .07172131 .1305479 . 4.263384 1.873954 .7152069 .15789473 8.524462 . .
    6 "Aarti Industries Ltd." 512 3 -.05927342 .0837324 .5340315 4.332705 1.748526 .59189343 -.11272727 4.852062 -4.298619 .2721486
    6 "Aarti Industries Ltd." 513 4 .067073174 .0846716 .29095903 4.2727695 1.9989848 .5726061 .07172131 19.9523 -2.924355 .2739745
    6 "Aarti Industries Ltd." 514 5 -.034285713 .0633012 .25289702 4.3374214 1.873751 .51489896 -.05927342 7.869869 -4.783156 .265698
    6 "Aarti Industries Ltd." 515 6 .025641026 . .2859934 4.3019004 1.9414955 .6321865 .067073174 5.380274 -4.501383 .2803007
    6 "Aarti Industries Ltd." 516 7 .023076924 .105984 .24630985 4.3262496 2.0012832 .5517572 -.034285713 8.844972 -6.962467 .2519754
    6 "Aarti Industries Ltd." 517 8 -.07518797 .1041367 .20042898 4.349245 1.9557993 .4190181 .025641026 5.612284 -9.001595 -.0277025
    6 "Aarti Industries Ltd." 518 9 -.034552846 .051413 .344461 4.2710953 2.1149154 .424441 .023076924 7.338286 -11.18496 .0615053
    6 "Aarti Industries Ltd." 519 10 .069473684 .1425339 .44265455 4.236567 2.103158 .13935553 -.07518797 20.2322 -11.70739 -.0012475
    6 "Aarti Industries Ltd." 520 11 .07480315 . .4229978 4.304335 1.965246 0 -.034552846 7.427134 -11.24708 .1291737
    6 "Aarti Industries Ltd." 521 12 .2765568 .0532721 .41676605 4.3755054 1.830229 -.07943683 .069473684 6.172074 -19.85964 -.0126953
    7 "Aban Offshore Ltd." 510 1 -.01454716 .1616635 .7934055 4.364499 1.677616 .536468 .16863905 .27568695 1.907376 .3076815
    7 "Aban Offshore Ltd." 511 2 .11904762 .1593376 .9350693 4.3497615 1.752 .839973 .3487342 .12449858 2.9744 .3753761
    7 "Aban Offshore Ltd." 512 3 -.0719149 .1595069 .930231 4.462223 1.565617 1.0971848 -.01454716 .111181 3.17497 .396613
    7 "Aban Offshore Ltd." 513 4 .07290234 .0699518 1.0041386 4.3876357 1.738377 1.147445 .11904762 .27724665 17.25444 .4062088
    7 "Aban Offshore Ltd." 514 5 .14358975 .0848001 1.1255673 4.4579453 1.6202564 1.0372597 -.0719149 .2621013 16.49516 .4453102
    7 "Aban Offshore Ltd." 515 6 .107997 .1112041 1.1992325 4.5921865 1.416816 1.137833 .07290234 .240048 17.54756 .4030659
    7 "Aban Offshore Ltd." 516 7 .11298482 .0938857 1.205394 4.6947365 1.244317 1.3128272 .14358975 .16456696 19.93164 .4283593
    7 "Aban Offshore Ltd." 517 8 -.08424243 .0936676 1.2734697 4.801805 1.118 1.3100206 .107997 .18487276 19.32786 .4419501
    7 "Aban Offshore Ltd." 518 9 -.1528789 .0507527 1.4282002 4.7137556 1.220847 .8553097 .11298482 .6507711 .6497346 .6223291
    7 "Aban Offshore Ltd." 519 10 .0609375 .0702426 1.445556 4.5478587 1.4485937 .8043339 -.08424243 .7413275 1.864021 .5414243
    7 "Aban Offshore Ltd." 520 11 .4554492 .1707218 1.4533854 4.6069684 1.3653903 .4825824 -.1528789 .4639569 -.6797072 .769806
    7 "Aban Offshore Ltd." 521 12 .495067 .1588599 1.377454 4.982305 .9381229 .24256885 .0609375 .14630084 -14.73992 .8392541
    8 "Abbott India Ltd." 510 1 -.0880134 .037752 .47158355 6.180348 .3274099 .1107169 -.02148787 .08508382 -6.301948 1.068925
    8 "Abbott India Ltd." 511 2 .013786765 .0385153 .4842104 6.088228 .3590074 .06510867 -.007652637 .12055866 -6.190862 1.265891
    8 "Abbott India Ltd." 512 3 -.004533092 . .49346435 6.10191 .3541251 .10863384 -.0880134 .1603456 -5.911369 1.40919
    8 "Abbott India Ltd." 513 4 .0564663 .0438339 .3424161 6.097377 .3832787 .11660513 .013786765 1.718447 -4.470796 1.451965
    8 "Abbott India Ltd." 514 5 -.010344828 .0468644 .27984938 6.152307 .3627931 .06160415 -.004533092 .14157175 -4.913676 1.464376
    8 "Abbott India Ltd." 515 6 -.02543554 .0409994 .27989966 6.141908 .3665854 .2003451 .0564663 .1517245 -3.645518 1.707177
    8 "Abbott India Ltd." 516 7 .09617447 .0455453 .28707138 6.116135 .3687165 .15171462 -.010344828 .21580985 -2.411963 1.620872
    8 "Abbott India Ltd." 517 8 .02609263 .0724428 .29192433 6.207966 .3363666 -.09638219 -.02543554 .015271345 -2.543012 1.889616
    8 "Abbott India Ltd." 518 9 -.13715829 .0446494 .2225017 6.233724 .3480293 .000326211 .09617447 .012206397 -3.96243 .6325883
    8 "Abbott India Ltd." 519 10 -.04715417 .0464689 .29565212 6.027748 .425419 .023965763 .02609263 .11409093 -4.465439 .5295605
    8 "Abbott India Ltd." 520 11 .04774792 .0514049 .303146 5.979443 .4464721 -.10183617 -.13715829 .03045167 -2.773615 .5889145
    8 "Abbott India Ltd." 521 12 .0789668 .0499418 .3377377 6.026083 .4261255 -.14245626 -.04715417 .018171808 -2.487709 .5782627
    9 "Accelya Solutions India Ltd." 510 1 -.352459 .0902148 2.4291296 4.345881 .6719821 .7689072 .007930786 .007841749 -.5605962 .8591197
    9 "Accelya Solutions India Ltd." 511 2 .25431532 .0791339 2.419326 3.911423 1.040046 .7498674 -.04005722 .02595398 .0380721 .8658273
    9 "Accelya Solutions India Ltd." 512 3 -.13486238 .1268215 2.450457 4.1378827 .8291743 .6382929 -.352459 .017579073 -.179331 .8659876
    9 "Accelya Solutions India Ltd." 513 4 -.14103924 .0755205 2.5982895 3.99305 .9664899 .3645697 .25431532 .015291488 -3.949402 .8701398
    end
    format %tm mdate


    Appended above is the data file containing details of my variables in the cross sectional mode. I want to winsorize the variables in example data set in each cross-section (month) at the 0.5 and 99.5 % -tiles. I found a code that is suggested by Nick Cox. I modified this code to suit one of my variables: rt. You can see that below. In this code, the mdate is the month year variable. However, the original code uses month_id.

    So, i have two queries about this code: One, that will this code serve my above stated purpose of winsorisation. Second, in this code if a month_id is required to be there instead of mdate, then how can the code be modified to create such id or else will mdate work in the given case?

    egen rank = rank(rt), by(mdate)
    egen count = count(rt), by(mdate)
    gen pp = (rank - 0.5)/count
    egen p005 = min(rt / (pp >= 0.005)), by(mdate)
    egen p995 = max(rt / (pp <= 0.995)), by(mdate)
    gen rtW = max(p005, min(rt, p995))

    Thanks in anticipation!

  • #2
    The code will work exactly as is. It is a correct winsorization, and mdate will serve just fine as the month identifier.

    The construction -egen result = egen_function(input/condition)- is equivalent to the more transparent -egen result = egen_function(cond(condition, input, .))- It works because Stata evaluates logical expressions as 0 if false and 1 if true. So input/condition = input when condition is true, and is missing when input is false. I think the former was very popular here on Statalist several years back, but is less commonly used nowadays. In any case, they are equivalent.

    And -gen result = max(lower_bound, min(input, upper_bound))- has the desired effect of restrict trimming the value of input to lie between the specified lower and upper bounds.

    Comment


    • #3
      Thank you Clyde. I have another query. If i run the following code on the data it gives month wise cross-sectional descriptive statistics. However, i want a summary or average of all the cross sections or months at the end. How can i get that please.

      bys mdate: summarize rt idiovol beta size bmratio mom rev illiq coskew idioskew, de

      Comment


      • #4
        The command you show in #3 will give you a summary month by month. For an overall summary, just the same -summarize- command without the -bys mdate:- prefix.

        Comment


        • #5
          But i doubt summarize without bys-mdate will not give average of month by month values. Will it give or will it give a pooled statistics of all the data. Kindly clarify. I basically want to look at final average of month by month summary statistics.

          or else can i have output of bys-mdate: summarize in the data editor mode so that i can calculate average separately.
          Last edited by Sartaj Hussain; 06 Jul 2021, 10:02.

          Comment


          • #6
            It gives pooled statistics of all the data. That is what I thought you meant by "i want a summary or average of all the cross sections or months at the end." To get a summary of all the monthly averages is a little different:

            Code:
            collapse rt idiovol beta size bmratio mom rev illiq coskew idioskew, by(mdate)
            summ, detail

            Comment


            • #7
              Doesn't work. Error: invalid 'detail' r(198).

              Moreover, I need mean value of cross sectional summary statistics not the mean of cross sectional variables.
              Last edited by Sartaj Hussain; 07 Jul 2021, 00:01.

              Comment


              • #8
                Doesn't work. Error: invalid 'detail' r(198).
                I have no idea what that's about. I ran the code using your example data, and it gave me no error messages. Check your code carefully--make sure it is exactly the same as posted.

                I need mean value of cross sectional summary statistics not the mean of cross sectional variables
                And that is exactly what the code in #6 will give you. The -collapse- command creates the monthly summary statistics, and then -summ- calculates their averages (and standard deviations, etc.)

                Comment


                • #9
                  Thanks, It worked now. I was really committing mistake in running the code. Thanks for pointing out.

                  Comment


                  • #10
                    If the code in #1 were to be applied on daily data instead of monthly data, what changes would it require to have.

                    Comment


                    • #11
                      Nothing in the code in #1 is specific to the unit of time in the date variable. It would work equally well with a date variable in any unit. In fact, the code doesn't even depend on mdate being a time variable at all. The same code would also work for a variable denoting spatial units, or any other grouping of the observations.

                      Comment


                      • #12
                        So irrespective of daily or monthly date, it will do same thing and yield same results in a cross section of daily data. Is that what it means.

                        Comment


                        • #13
                          Yes.

                          Comment


                          • #14
                            One more thing. When code #1 is run, it also replaces missing observations in data, how can it leave missing observations as it is.

                            Comment


                            • #15
                              I don't understand. The code in #1 doesn't replace anything anywhere. It only creates new variables.

                              Comment

                              Working...
                              X