Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Maximum Daily Return

    I use below code to generate maximum daily return for a stock over the past month. However, i get repeated values in many cases. Just wanted to get it checked if the code that is being used is correct. Below is the illustrative data set and the code:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int stock_id str10 month str52 stock float pr
    1 "31-05-2002" "3M India Ltd."  282.7
    1 "03-06-2002" "3M India Ltd." 275.25
    1 "04-06-2002" "3M India Ltd."  288.5
    1 "05-06-2002" "3M India Ltd."  282.6
    1 "06-06-2002" "3M India Ltd."  284.9
    1 "07-06-2002" "3M India Ltd." 276.05
    1 "10-06-2002" "3M India Ltd."      .
    1 "11-06-2002" "3M India Ltd."  287.9
    1 "12-06-2002" "3M India Ltd."    285
    1 "13-06-2002" "3M India Ltd."  278.7
    1 "14-06-2002" "3M India Ltd."    283
    1 "17-06-2002" "3M India Ltd."  287.9
    1 "18-06-2002" "3M India Ltd." 294.95
    1 "19-06-2002" "3M India Ltd."  294.5
    1 "20-06-2002" "3M India Ltd."    294
    1 "21-06-2002" "3M India Ltd."    284
    1 "24-06-2002" "3M India Ltd." 290.05
    1 "25-06-2002" "3M India Ltd."  289.1
    1 "26-06-2002" "3M India Ltd."  282.5
    1 "27-06-2002" "3M India Ltd."    290
    1 "28-06-2002" "3M India Ltd." 285.15
    1 "01-07-2002" "3M India Ltd."  287.2
    1 "02-07-2002" "3M India Ltd." 285.05
    1 "03-07-2002" "3M India Ltd."    295
    1 "04-07-2002" "3M India Ltd."  324.5
    1 "05-07-2002" "3M India Ltd."  318.3
    1 "08-07-2002" "3M India Ltd."    318
    1 "09-07-2002" "3M India Ltd."  323.6
    1 "10-07-2002" "3M India Ltd."  313.4
    1 "11-07-2002" "3M India Ltd."    302
    1 "12-07-2002" "3M India Ltd."  301.7
    1 "15-07-2002" "3M India Ltd."    295
    1 "16-07-2002" "3M India Ltd."    300
    1 "17-07-2002" "3M India Ltd."    292
    1 "18-07-2002" "3M India Ltd." 295.85
    1 "19-07-2002" "3M India Ltd."  303.5
    1 "22-07-2002" "3M India Ltd."    291
    1 "23-07-2002" "3M India Ltd." 302.75
    1 "24-07-2002" "3M India Ltd." 299.45
    1 "25-07-2002" "3M India Ltd."    301
    1 "26-07-2002" "3M India Ltd."    300
    1 "29-07-2002" "3M India Ltd."  301.7
    1 "30-07-2002" "3M India Ltd."    300
    1 "31-07-2002" "3M India Ltd."    300
    1 "01-08-2002" "3M India Ltd."    300
    1 "02-08-2002" "3M India Ltd."  302.9
    1 "05-08-2002" "3M India Ltd." 303.75
    1 "06-08-2002" "3M India Ltd."    304
    1 "07-08-2002" "3M India Ltd."  306.8
    1 "08-08-2002" "3M India Ltd."    300
    1 "09-08-2002" "3M India Ltd." 300.75
    1 "12-08-2002" "3M India Ltd."  300.4
    1 "13-08-2002" "3M India Ltd."    300
    1 "14-08-2002" "3M India Ltd."    300
    1 "16-08-2002" "3M India Ltd."  304.1
    1 "19-08-2002" "3M India Ltd."    300
    1 "20-08-2002" "3M India Ltd."    303
    1 "21-08-2002" "3M India Ltd." 301.95
    1 "22-08-2002" "3M India Ltd."  302.1
    1 "23-08-2002" "3M India Ltd."  303.4
    1 "26-08-2002" "3M India Ltd."      .
    1 "27-08-2002" "3M India Ltd."    305
    1 "28-08-2002" "3M India Ltd." 305.25
    1 "29-08-2002" "3M India Ltd."    305
    1 "30-08-2002" "3M India Ltd."    316
    1 "02-09-2002" "3M India Ltd."    315
    1 "03-09-2002" "3M India Ltd."  313.5
    1 "04-09-2002" "3M India Ltd."    315
    1 "05-09-2002" "3M India Ltd."    315
    1 "06-09-2002" "3M India Ltd."  310.4
    1 "09-09-2002" "3M India Ltd."  306.5
    1 "11-09-2002" "3M India Ltd."    310
    1 "12-09-2002" "3M India Ltd."      .
    1 "13-09-2002" "3M India Ltd."    310
    1 "16-09-2002" "3M India Ltd."    310
    1 "17-09-2002" "3M India Ltd." 328.35
    1 "18-09-2002" "3M India Ltd."    325
    1 "19-09-2002" "3M India Ltd." 320.05
    1 "20-09-2002" "3M India Ltd."  316.9
    1 "23-09-2002" "3M India Ltd." 315.35
    1 "24-09-2002" "3M India Ltd."  312.6
    1 "25-09-2002" "3M India Ltd."  314.4
    1 "26-09-2002" "3M India Ltd." 316.75
    1 "27-09-2002" "3M India Ltd."  315.1
    1 "30-09-2002" "3M India Ltd."    315
    1 "01-10-2002" "3M India Ltd."    311
    1 "03-10-2002" "3M India Ltd."    315
    1 "04-10-2002" "3M India Ltd."    312
    1 "07-10-2002" "3M India Ltd."      .
    1 "08-10-2002" "3M India Ltd."  311.8
    1 "09-10-2002" "3M India Ltd."  315.1
    1 "10-10-2002" "3M India Ltd."    315
    1 "11-10-2002" "3M India Ltd."    315
    1 "14-10-2002" "3M India Ltd."    315
    1 "16-10-2002" "3M India Ltd."    315
    1 "17-10-2002" "3M India Ltd."    315
    1 "18-10-2002" "3M India Ltd."    315
    1 "21-10-2002" "3M India Ltd."    314
    1 "22-10-2002" "3M India Ltd."    313
    1 "23-10-2002" "3M India Ltd."  313.2
    end
    Code:
    gen date1=date(month,"DMY")
    format date1 %td
    drop month
    rename date1 date
    gen mdate= mofd(date)
    format mdate %tm
    bysort stock_id:gen rt =((pr[_n]-pr[_n-1])/pr[_n-1])
    rangestat (max) rt, int(mdate -1 0) by(stock_id)
    collapse(mean) rt_max, by(stock_id stock mdate)
    rename rt_max max
    Last edited by Sartaj Hussain; 20 Nov 2021, 10:22.

  • #2
    I use appended code to generate for each stock and for each month maximum of daily return within each month.
    I don't understand the rationale for the line -rangestat (max) rt, int(mdate -1 0) by(stock_id)- which calculates the maximum value of rt for each stock over the current and immediately preceding month. So I think your code is not doing what you want.

    Also, your line -bysort stock_id:gen rt =((pr[_n]-pr[_n-1])/pr[_n-1])- is also incorrect in a subtle way. The problem is that by sorting on stock_id without also specifying sorting on mdate within stock_id, the data for each stock_id will be sorted in random and irreproducible order, so that your variable rt is not based on the chronologically previous month but just some other month that happens to now be earlier in the data set but might well be in the future compared to the current observation.

    So, I suggest
    Code:
    gen date1=date(month,"DMY")
    format date1 %td
    drop month
    rename date1 date
    gen mdate= mofd(date)
    format mdate %tm
    bysort stock_id (mdate): gen rt =((pr[_n]-pr[_n-1])/pr[_n-1])
    //  NO -rangestat- COMMAND
    collapse(mean) rt, by(stock_id stock mdate)

    Comment


    • #3
      Thanks indeed for pointing out. I was doing it wrongly by having current and previous month as condition while it should have been maximum rt value observed during the preceding month. I basically need to calculate for each stock maximum value of rt observed over the preceding or previous month. And i feel that
      Code:
      collapse(mean) rt, by(stock_id stock mdate)
      will not yield that. Then what to do to get maximum rt for a stock over the previous month?

      Moreover, putting (mdate) condition in rt calculation doesn't make any difference than without it. Results are similar in both.
      Last edited by Sartaj Hussain; 20 Nov 2021, 12:00.

      Comment


      • #4
        However, putting (mdate) condition in rt calculation doesn't make any difference than without it. Results are similar in both.
        That may be true in this case, but you cannot rely on it. The general rule about -sort- is that it sorts on the variables given but randomizes the order otherwise. So if it matters that the order be chronological, you have to write (mdate) in the code. Maybe you will get away with omitting it sometimes, but you cannot count on it.

        Now, there is serious ambiguity in the way the problem is stated. I don't know if English is your native language or not. It is mine. And "over the previous month," when uttered in a given month, normally means that given month, not the month before it. That seems paradoxical, but it is so. I think the notion is that it is interpreted as "the month, meaning previous days this month up to today" or something like that. Anyway, based on your restatement in #3, I'm thinking you actually intend the month before it. Moreover, the use of -int(mdate -1 0)- in your original code in #1 suggess that maybe you mean to include both the current month and the month before it. Rather than go through several rounds back and forth, here is the code for both possibilities. Choose the one that suits your need:

        Code:
        // FOR THE MONTH BEFORE ONLY
        rangestat (max) rt, by(stock_id) interval(mdate -1 -1)
        Code:
        FOR THE CURRENT MONTH AND THE PREVIOUS MONTH COMBINED
        rangestat (max) rt, by(stock_id) interval(mdate -1 0)
        And on the chance that you actually did mean the current month:
        Code:
        FOR THE CURRENT MONTH ONLY
        by stock_id, sort: egen rt_max = max(rt)
        Following any of those, you can reduce to one observation per month with:
        Code:
        by stock_id mdate: keep if _n == 1

        Comment


        • #5
          That may be true in this case, but you cannot rely on it. The general rule about -sort- is that it sorts on the variables given but randomizes the order otherwise. So if it matters that the order be chronological, you have to write (mdate) in the code. Maybe you will get away with omitting it sometimes, but you cannot count on it.
          Well chronological order is must in rt calculation. But i have done calculations of rt in many cases ignoring (mdate) in command and only taking stock_id as sort variable. Do i require to cross check all?

          Comment


          • #6
            Yes, I think you should. I'll tell you one group you can skip over. If you have -xtset- the data with a time variable before doing the calculation and used the lag operator instead of [_n-1] to access the preceding observation, then you don't have to worry, because if the data had been sorted into non-chronological order, the presence of the lag-operator would cause Stata to throw an error message and abort execution.

            Here's another group you can skip: if the data had been previously sorted by stock_id and chronologically within stock_id, and if nothing after that changed the sort order of the data, then -bysort stock_id:- will be OK. The reason is that Stata will notice that the data are already sorted by stock_id and will not re-sort the data. Consequently, the previously existing chronological order is left undisturbed. But if the data were not already sorted by stock_id at the time of the -bysort-, Stata would invoke -sort stock_id-, and this could randomize the order within stock_id. I suspect this is why your data didn't get scrambled in this case. I'm guessing that the data had been sorted by stock_id and date1 or mdate already, so the sort in -bysort stock_id- resulted in no action at all.

            Comment


            • #7
              The rule given in first case is not followed.

              As far as second case is concerned, To this, i would like to mention that the data is by default in chronological order before importing it to STATA. After import, date is changed into STATA readable format. Then following command is applied to generate returns:

              Code:
               bysort stock_id: gen rt =((pr[_n]-pr[_n-1])/pr[_n-1])
              That is it. Actually, it was mentioned in some other STATALIST post. Hope you get my point well.
              Last edited by Sartaj Hussain; 20 Nov 2021, 12:43.

              Comment


              • #8
                the data is by default in chronological order before importing it to STATA
                So, it depends on what you mean by this. If the original data is from some other sources, say an Excel file, and you import it into Stata, perhaps you then save it as a .dta file, and perhaps before doing that you have a -sort stock_id date- command, then that sort order and the fact that it is sorted that way are preserved in the saved .dta file. When you subsequently -use- that file, the data starts out sorted by stock_id and chronologically within stock_id, and Stata is on notice of that fact. When you then -bysort stock_id-, Stata notes that the data is already sorted on stock_id, so no re-sorting is done.

                If, however, you mean that you begin your code by importing some non-dta data source into Stata, which happens to come into Stata chronologically ordered within stock_ids, then Stata does not know that the data are already sorted on stock_id, and if you do not issue a -sort- command, then as far as Stata knows, the data are unsorted. When you then -bysort stock_id-, Stata does not recognize that the data are already sorted on stock_id (even though, in fact, they are) and it will run the -sort- command, which may shuffle the order of the data within stock_ids.

                Here is the relevant quote from the help file of the -sort- command (which is called by bysort):
                Option

                stable specifies that observations with the same values of the variables in varlist keep the same relative order in the sorted data that they had previously. For
                instance, consider the following data:

                x b
                3 1
                1 2
                1 1
                1 3
                2 4

                Typing sort x without the stable option produces one of the following six orderings:

                x b | x b | x b | x b | x b | x b
                1 2 | 1 2 | 1 1 | 1 1 | 1 3 | 1 3
                1 1 | 1 3 | 1 3 | 1 2 | 1 1 | 1 2
                1 3 | 1 1 | 1 2 | 1 3 | 1 2 | 1 1
                2 4 | 2 4 | 2 4 | 2 4 | 2 4 | 2 4
                3 1 | 3 1 | 3 1 | 3 1 | 3 1 | 3 1

                Without the stable option, the ordering of observations with equal values of varlist is randomized. With sort x, stable, you will always get the first ordering
                and never the other five.

                If your intent is to have the observations sorted first on x and then on b within tied values of x (the fourth ordering above), you should type sort x b rather
                than sort x, stable.

                stable is seldom used, and, when specified, causes sort to execute more slowly.

                Comment


                • #9
                  Well, I understand your explanations. For return calculations, I used the code with stock_id followed by date as sort variable. However, i get similar results if i do not use date as within stock_id sort condition. I am in a flux and wonder why it doesn't lead to any difference. Just need to be clear about it. I append the data file and the code so that you can also verify and get the insight why different conditions do not lead to different results.

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input int stock_id str10 month str56 stock float(pr mk rf smb hml)
                  1 "29-05-2001" "3M India Ltd." 430.05 1172.16 .000207292   -.0054214   .011421468
                  1 "30-05-2001" "3M India Ltd."    428 1146.01  .00019764  .011475002   .014434357
                  1 "31-05-2001" "3M India Ltd."  424.9 1137.03  .00019764  -.00026487   .016957238
                  1 "01-06-2001" "3M India Ltd."    422 1121.17  .00019764  -.00808847   .015388196
                  1 "04-06-2001" "3M India Ltd."    420 1096.77  .00019764  .006565923   .003483099
                  1 "05-06-2001" "3M India Ltd."    415 1085.55  .00019764 -.008254495   .010829374
                  1 "06-06-2001" "3M India Ltd."    414 1091.81 .000200859 -.004299154  -.002903867
                  1 "07-06-2001" "3M India Ltd." 399.35 1088.13 .000200859 -.010484487   .003137065
                  1 "08-06-2001" "3M India Ltd."    395 1101.86 .000200859 -.003384564  -.009378028
                  1 "11-06-2001" "3M India Ltd."  396.8 1105.58 .000200859 -.005761349   .005517818
                  1 "12-06-2001" "3M India Ltd."    370 1105.33 .000200859   .00774814  -.008734042
                  1 "13-06-2001" "3M India Ltd."  393.1 1110.46 .000194416  .001998395   .015888773
                  1 "14-06-2001" "3M India Ltd."  385.1  1094.7 .000194416  .001650696    .01361049
                  1 "15-06-2001" "3M India Ltd."  385.2 1068.32 .000194416  -.00225018    .02104307
                  1 "18-06-2001" "3M India Ltd."    380 1061.35 .000194416 -.005109954   .005487084
                  1 "19-06-2001" "3M India Ltd." 371.75 1075.31 .000194416   -.0124533   .004721273
                  1 "20-06-2001" "3M India Ltd." 383.75 1073.99 .000191194  .001233963   .010340611
                  1 "21-06-2001" "3M India Ltd." 386.75 1067.85 .000191194  .007179361  -.018780584
                  1 "22-06-2001" "3M India Ltd." 374.25 1048.61 .000191194  .010892835  -.002877336
                  1 "25-06-2001" "3M India Ltd."      . 1017.11 .000191194  .007288968   .028876463
                  1 "26-06-2001" "3M India Ltd."    373 1043.58 .000191194  -.01553188  -.026826525
                  1 "27-06-2001" "3M India Ltd."    373 1043.73 .000190118 -.005021413  -.003626265
                  1 "28-06-2001" "3M India Ltd."    380 1040.34 .000190118  .006743999  -.005887604
                  1 "29-06-2001" "3M India Ltd."  380.1 1050.43 .000190118  .006906632   -.02635773
                  1 "02-07-2001" "3M India Ltd."    380  1044.7 .000190118    .0243053   .012395788
                  1 "03-07-2001" "3M India Ltd."    382 1016.12 .000190118  .009193858   .011030175
                  1 "04-07-2001" "3M India Ltd."    373 1012.19 .000189043 -.003779778   .002102887
                  1 "05-07-2001" "3M India Ltd."    370 1016.32 .000189043 -.002112604  -.009983404
                  1 "06-07-2001" "3M India Ltd."    363 1008.44 .000189043 -.004746309   .004176213
                  1 "09-07-2001" "3M India Ltd."  370.1 1005.61 .000189043  .004967289  -.000901557
                  1 "10-07-2001" "3M India Ltd."    365 1015.86 .000189043 -.004156345  -.005973499
                  1 "11-07-2001" "3M India Ltd."    365  1023.1 .000189043 -.009570085 -.0000385042
                  1 "12-07-2001" "3M India Ltd."      . 1045.89 .000189043 -.008317466  -.014657998
                  1 "13-07-2001" "3M India Ltd."  379.8 1047.82 .000189043 -.009258101   .003756839
                  1 "16-07-2001" "3M India Ltd."  374.5 1043.17 .000189043  .008141486   .001153436
                  1 "17-07-2001" "3M India Ltd."    384 1038.48 .000189043  .002734654  -.006272981
                  1 "18-07-2001" "3M India Ltd."      . 1035.52 .000191194  .002666798  -.007612133
                  1 "19-07-2001" "3M India Ltd."    380 1029.28 .000191194 -.010764934   .002164563
                  1 "20-07-2001" "3M India Ltd."    367 1021.48 .000191194 -.003531434   .006274182
                  1 "23-07-2001" "3M India Ltd."      . 1017.91 .000191194  .003957119    .00298848
                  1 "24-07-2001" "3M India Ltd."    364 1007.34 .000191194  .003522038  -.002557836
                  1 "25-07-2001" "3M India Ltd."  362.1  992.87 .000190118 -.005112115  -.003178912
                  1 "26-07-2001" "3M India Ltd."    345  983.87 .000190118  .008406641  -.007936762
                  1 "27-07-2001" "3M India Ltd."    335  985.08 .000190118  .007523834  -.007662709
                  1 "30-07-2001" "3M India Ltd."      .  997.06 .000190118 -.001052119   .003796941
                  1 "31-07-2001" "3M India Ltd."  325.1 1007.38 .000190118 -.006281636  -.006013084
                  1 "01-08-2001" "3M India Ltd." 321.35 1002.07 .000187969 -.003827866   .007759471
                  1 "02-08-2001" "3M India Ltd."    318 1007.21 .000187969 -.002262957  -.006467539
                  1 "03-08-2001" "3M India Ltd."    323 1016.31 .000187969  .002335253  -.005333087
                  1 "06-08-2001" "3M India Ltd."  332.5 1016.84 .000187969  .013860058  -.002403292
                  1 "07-08-2001" "3M India Ltd."    332  1009.8 .000187969 .0000310135   .008616665
                  1 "08-08-2001" "3M India Ltd."    330 1005.47 .000186894  .002830373   .006939544
                  1 "09-08-2001" "3M India Ltd."    330 1005.16 .000186894  .000680501 -.0000935619
                  1 "10-08-2001" "3M India Ltd."  325.9 1001.29 .000186894 -.003242821   .004484232
                  1 "13-08-2001" "3M India Ltd."    330  991.02 .000186894  .016223654  -.003393324
                  1 "14-08-2001" "3M India Ltd."    325 1004.51 .000184742 -.009961346  -.007970894
                  1 "16-08-2001" "3M India Ltd."    320 1006.83 .000184742 -.001116778    .00184716
                  1 "17-08-2001" "3M India Ltd."    320  998.73 .000184742  .003973659     .0040762
                  1 "20-08-2001" "3M India Ltd."  301.3  993.36 .000184742    .0070569  -.002350738
                  1 "21-08-2001" "3M India Ltd." 309.55  998.08 .000184742  .003531019  -.011190037
                  1 "23-08-2001" "3M India Ltd."    305  998.48 .000184742 -.004440534   .004637646
                  1 "24-08-2001" "3M India Ltd."  302.5 1001.45 .000184742  .002634362  -.011495728
                  1 "27-08-2001" "3M India Ltd."    315 1005.27 .000184742  .017009705  -.007403183
                  1 "28-08-2001" "3M India Ltd."    315 1003.13 .000184742 -.002180334   .002409128
                  1 "29-08-2001" "3M India Ltd."      . 1001.18 .000183666  .006914516   .001008481
                  1 "30-08-2001" "3M India Ltd."    323  999.71 .000183666  .007821609  -.001851162
                  1 "31-08-2001" "3M India Ltd."      .  986.25 .000183666  .005843469   .008782397
                  1 "03-09-2001" "3M India Ltd." 311.05  976.87 .000183666  .018127613  -.008090637
                  1 "04-09-2001" "3M India Ltd."    324  980.81 .000183666  .000421051  -.007871736
                  1 "05-09-2001" "3M India Ltd."    321  976.21 .000183666  .001085125    .00383226
                  1 "06-09-2001" "3M India Ltd."  316.9  967.46 .000183666  .001295311  -.002122955
                  1 "07-09-2001" "3M India Ltd."    318  968.13 .000183666 -.005481656  -.000656931
                  1 "10-09-2001" "3M India Ltd."    317  965.93 .000183666  .008949677   .004532822
                  1 "11-09-2001" "3M India Ltd."    310  959.32 .000183666  .004254708   .002196711
                  1 "12-09-2001" "3M India Ltd." 303.85  924.96 .000187969  -.00078408   .005701145
                  1 "13-09-2001" "3M India Ltd."  305.4  915.84 .000187969 -.002050521   .014715027
                  1 "14-09-2001" "3M India Ltd."    290  861.02 .000187969  .006616689   .013136523
                  1 "17-09-2001" "3M India Ltd."  257.3  804.92 .000187969  .006880768   .001179116
                  1 "18-09-2001" "3M India Ltd."    280   836.2 .000187969 -.008791488   .004700812
                  1 "19-09-2001" "3M India Ltd."    290  840.08 .000193344  .007614173   .017786305
                  1 "20-09-2001" "3M India Ltd." 278.95  826.37 .000193344 -.004123228   .008266608
                  1 "21-09-2001" "3M India Ltd."    272  792.18 .000193344   .01756763    .02199402
                  1 "24-09-2001" "3M India Ltd."    271  815.18 .000193344 -.006995669  -.017225754
                  1 "25-09-2001" "3M India Ltd."    270  807.94 .000193344  .004815123   .009335892
                  1 "26-09-2001" "3M India Ltd."  261.5  821.67 .000187969 -.008984162  -.012087896
                  1 "27-09-2001" "3M India Ltd."    260  825.81 .000187969 -.004142652   .006669711
                  1 "28-09-2001" "3M India Ltd."    266  850.56 .000187969 -.017421316   .000376217
                  1 "01-10-2001" "3M India Ltd."  276.5   844.2 .000187969  .005426327   .001641476
                  1 "03-10-2001" "3M India Ltd."    270  832.46 .000184742  .003100564   .014084384
                  1 "04-10-2001" "3M India Ltd."  263.8  842.15 .000184742  -.00491618  -.009198776
                  1 "05-10-2001" "3M India Ltd."      .  844.87 .000184742  .002996849   -.00724678
                  1 "08-10-2001" "3M India Ltd."  273.5  831.16 .000184742  .003219688   .010580573
                  1 "09-10-2001" "3M India Ltd."    270  842.27 .000184742 -.012359478   .002310888
                  1 "10-10-2001" "3M India Ltd."    278  861.97 .000185818  -.00611104  -.014635766
                  1 "11-10-2001" "3M India Ltd."    270  875.14 .000185818 -.008688306  -.003316985
                  1 "12-10-2001" "3M India Ltd."  265.5  878.54 .000185818  .000993069  -.005601964
                  1 "15-10-2001" "3M India Ltd."  262.2  880.39 .000185818   .01196225   .001652932
                  1 "16-10-2001" "3M India Ltd." 259.85  891.59 .000185818  .001527667  -.007085373
                  1 "17-10-2001" "3M India Ltd."    260  906.58 .000187969 -.003437033  -.001419346
                  1 "18-10-2001" "3M India Ltd."  261.9  890.81 .000187969  .001909316    .01943403
                  end

                  Code:
                  * Date formatting and returns calculation
                  gen date1=date(month,"DMY")
                  format date1 %td
                  drop month
                  rename date1 date
                  gen mdate= mofd(date)
                  format mdate %tm
                  bysort stock_id (date):gen rt_r =((pr[_n]-pr[_n-1])/pr[_n-1])
                  bysort stock_id (date):gen mkt_r =((mk[_n]-mk[_n-1])/mk[_n-1])
                  gen rt = rt_r-rf
                  gen mkt = mkt_r-rf
                  drop if mdate<ym(2001,6)
                  gen month = month(date)
                  gen year = year(date)
                  drop pr mk rf rt_r mkt_r
                  * Idiosyncratic Volatility estimation
                  capture program drop one_stock_month
                  program define one_stock_month
                      capture regress rt mkt smb hml
                      if c(rc) == 0 {
                          predict resid, resid
                          summ resid
                          if `r(N)' >= 17 {
                              gen idio_vol = r(sd)
                              gen n_obs = e(N)
                          }
                      }
                      else if !inlist(c(rc), 2000, 2001) { // ERROR OTHER THAN INSUFFICIENT OBSERVATIONS
                          gen unexpected_error_code = c(rc)
                          assert 0
                      }
                      exit
                  end
                  
                  //gen mdate = ym(year, month)
                  assert missing(mdate) == missing(month, year)
                  format mdate %tm
                  runby one_stock_month, by(stock mdate)
                  gen idiovol_m = idio_vol * sqrt(n_obs)
                  collapse(mean) idio_vol idiovol_m n_obs, by (stock mdate year month)
                  Last edited by Sartaj Hussain; 20 Nov 2021, 20:33.

                  Comment


                  • #10
                    Well, the code you show does sort on date, so it is fine.

                    When I rerun the code removing the (date) from the -bysort- commands, so that we are only sorting on stock_id, I do find that the results are the same. This surprises me. I also tried running with a separate -sort stock_id- command followed by -by stock_id- (not -bysort-) on the theory that perhaps -bysort- has its own, stable, sort that would preserve the initial chronological order of the data (even though it is not Stata-sorted on that). But even doing it that way leads to the same results. Interestingly, when I query Stata about what the sort order of the data is at the outset, and just before your first -bysort- command, it tells me that the data is unsorted. And when I remove (date) from the -bysort- command and ask Stata what the sort order of the data is after -bysort-, it tells me it is sorted on stock_id and makes no mention of any other variable. So Stata does not think the data is sorted on date, even though apparently it is in chronological order within stock_id.

                    I have to say I don't understand why this is happening. I have, in the past, personally been bitten by the bug that results from failing to specify secondary sort variables and getting wrong and irreproducible results from random shuffling within the groups. Perhaps Stata has changed its sort program so that it defaults to stable sorting--but I would think they would have changed their documentation accordingly.

                    All I can tell you is that either by luck relating to some aspect of the particular data in question, or due to some undocumented change in Stata, your code works with this data as intended even if you omit the (date) from the -bysort- commands. But I would still advise you, going forward, not to do that. Because this favorable behavior is not documented, and, in fact, contradicts the documentation, you cannot rely on it. Even if it is some change that StataCorp is considering implementing and eventually making "official," there is no guarantee they won't change their minds and revert to the older, documented behavior. If that happens, your previous code relying on stability of the sort order would be broken and re-running the code could yield different, incorrect results.

                    tl;dr You've been lucky so far. I don't know why. But don't press your luck in the future--it may not hold up.

                    Comment


                    • #11
                      Well, I am really pleased to go through your elaborate explanation. In future, i will surely take care of it. But since, it doesn't seem to make difference as of now which is also pointed out by you as well. I wanted to ask you that should i retain the previous calculations done following the code without this within stock_id: date sort condition? Or re-do the calculations after inclusion of this condition as well. This will then consume some more days for me.

                      Comment


                      • #12
                        I would redo them. Now, if there were some that represent discarded work that you no longer consider of any importance, it probably isn't worth the trouble of fixing them up for archival purposes. Purpose just adding a comment in the file that there is this problem in the event the code ever needs to be resurrected. But I would redo the runs for anything where correctness of the results still matters.

                        Comment


                        • #13
                          Thanks. I appreciate your response. Will follow in letter and spirit. Well, i have used a few more codes that i want to share in a new thread for your green signal. This is important so that no such back turn is needed in my work again.

                          Comment


                          • #14
                            In thread #2, we used mdate, however, these are daily returns, i feel it should have been date (daily date) instead.

                            Comment


                            • #15
                              Yes, for daily returns it is -by stock_id (date), sort: gen rt = (pr-pr[_n-1])/pr[_n-1]-.

                              Comment

                              Working...
                              X