Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Rolling mean with condition

    I want to find rolling (24-60 months, as available) mean of the variable rateofreturn when it falls below the variable pct5. To be precise, the new variable, say rmean2, will have its first value in row 24; the value will be the mean of those observations(row1-24) of rateofreturn which fall below the value of pct5 in row 24. Similarly, in row 25 the variable rmean2 will contain the mean of rateofreturn (row1-25) which fall below the value of pct5 in row 25. The number of observations will continue to increase until it reaches 60.

    Based on answers of my previous post, which was similar in nature, I, as a neophyte, tried the following the code.
    Code:
    program below_pct5
        sum rateofreturn if rateofreturn < rpct5
        gen rmean2 = r(mean)
    end
    rangerun below_pct5, by(id) interval(mdate -59 0) use(rateofreturn pct5) sprefix(rr_)
    Unofortuantely, the above code didn't work.

    Thank you for your attention.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id mdate) double(rateofreturn n1) float pct5
    1 420  .48  1     .
    1 421  .89  2     .
    1 422 1.18  3     .
    1 423    1  4     .
    1 424  1.5  5     .
    1 425  1.6  6     .
    1 426  1.3  7     .
    1 427 1.01  8     .
    1 428 1.44  9     .
    1 429 1.55 10     .
    1 430  1.2 11     .
    1 431 1.51 12     .
    1 432  .97 13     .
    1 433 1.21 14     .
    1 434  .42 15     .
    1 435  .82 16     .
    1 436 1.14 17     .
    1 437 1.09 18     .
    1 438  .18 19     .
    1 439  .62 20     .
    1 440 1.24 21     .
    1 441 1.18 22     .
    1 442  .85 23     .
    1 443    1 24   .42
    1 444   .7 25   .42
    1 445  .84 26   .42
    1 446  .66 27   .42
    1 447  .82 28   .42
    1 448  .71 29   .42
    1 449 1.38 30   .42
    1 450 2.28 31   .42
    1 451  .86 32   .42
    1 452  .66 33   .42
    1 453  .96 34   .42
    1 454  .75 35   .42
    1 455  .76 36   .42
    1 456  .77 37   .42
    1 457  .78 38   .42
    1 458 1.03 39   .42
    1 459  .39 40  .405
    1 460  .43 41   .42
    1 461  .34 42   .39
    1 462  .58 43   .39
    1 463 -.43 44   .34
    1 464 -.55 45   .18
    1 465 -.23 46  -.23
    1 466 1.27 47  -.23
    1 467 1.05 48  -.23
    1 468 1.38 49  -.23
    1 469  .75 50  -.23
    1 470 1.53 51  -.23
    1 471 1.61 52  -.23
    1 472  .51 53  -.23
    1 473  .53 54  -.23
    1 474  .71 55  -.23
    1 475  .14 56  -.23
    1 476   .4 57  -.23
    1 477  .45 58  -.23
    1 478  .89 59  -.23
    1 479 1.53 60 -.045
    1 480   .9 60 -.045
    1 481 1.98 60 -.045
    1 482  .21 60 -.045
    1 483 -.09 60  -.16
    1 484   .1 60  -.16
    1 485 1.01 60  -.16
    1 486  .67 60  -.16
    1 487  .48 60  -.16
    1 488    1 60  -.16
    1 489  .28 60  -.16
    1 490  .05 60  -.16
    1 491   .5 60  -.16
    1 492 1.94 60  -.16
    1 493 1.44 60  -.16
    1 494 1.64 60  -.16
    1 495 1.36 60  -.16
    1 496 -.31 60  -.27
    1 497 -.07 60  -.27
    1 498  .85 60  -.27
    1 499 1.53 60  -.27
    1 500   .4 60  -.27
    1 501  .92 60  -.27
    1 502  .58 60  -.27
    1 503  .21 60  -.27
    1 504  .91 60  -.27
    1 505 -.31 60  -.31
    1 506  .56 60  -.31
    1 507  .75 60  -.31
    1 508 -.13 60  -.31
    1 509  .32 60  -.31
    1 510 -1.6 60  -.37
    1 511 -.77 60  -.49
    1 512 1.57 60  -.49
    1 513  .77 60  -.49
    1 514    2 60  -.49
    1 515 2.02 60  -.49
    1 516 3.45 60  -.49
    1 517 1.02 60  -.49
    1 518 1.09 60  -.49
    1 519 1.22 60  -.49
    end
    format %tm mdate

  • #2
    Your program below_pct5 tries to summarize those values of rateofreturn which are < rpct5. But there is no variable rpct5 in the data. So the program just generates an error message (which is not shown because -rangerun- suppresses output of the program it runs) and that's the end of it. I think you mean pct5, not rpct5. If you make that change to the program, then it runs.

    When working with -rangerun-, if you get no results, or strange results, the first step is to add the -verbose- option to your -rangerun- command. That will keep Stata from suppressing the output of the program: you will see any error messages that are generated. In this case, you would see a long series of "rpct5 not found" error messages and you would have immediately known how to fix this problem. So the first step in troubleshooting problems with -rangerun- is to use the -verbose- option.

    Comment


    • #3
      Clyde Schechter thank you for your response. The code runs after writing pct5. Could you please confirm me if the code is correct in terms of my requirement? Thanks.

      Comment


      • #4
        Yes, it appears to be suitable for your stated purpose.

        By the way, one tip about posting. It's not a good idea to ask for help by saying that some code "didn't work." In this case, it was pretty obvious what was wrong, but often it is not. And without knowing in which of the many ways something could "not work" the code actually failed, it becomes difficult or impossible to troubleshoot. So if you have code that "doesn't work," always show the actual results you got from it (including any error messages) and, then, if it isn't obvious, indicate what you were hoping to get. In general, just saying "didn't work" will get you the only reasonable response: a reply asking "in what way did it not work."

        Comment


        • #5
          Thanks for the tip Clyde Schechter . I am afraid I am not getting the desired result. For example, in row 24 the value of pct5 is .42 and rmean2 is 1.06. This is strange. I want mean of those observations of rateofreturn which fall below pct5. So, rmean2 should be smaller than pct5. In fact, in row 1-24, there is only one observation (.18) which is less than .42. Please have a look. Thanks.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(id mdate) double(rateofreturn n1) float(pct5 rmean2)
          1 420  .48  1     .        .48
          1 421  .89  2     .       .685
          1 422 1.18  3     .        .85
          1 423    1  4     .      .8875
          1 424  1.5  5     .       1.01
          1 425  1.6  6     .  1.1083333
          1 426  1.3  7     .  1.1357143
          1 427 1.01  8     .       1.12
          1 428 1.44  9     .  1.1555556
          1 429 1.55 10     .      1.195
          1 430  1.2 11     .  1.1954546
          1 431 1.51 12     .  1.2216667
          1 432  .97 13     .  1.2023077
          1 433 1.21 14     .   1.202857
          1 434  .42 15     .  1.1506667
          1 435  .82 16     .       1.13
          1 436 1.14 17     .  1.1305883
          1 437 1.09 18     .  1.1283333
          1 438  .18 19     .   1.078421
          1 439  .62 20     .     1.0555
          1 440 1.24 21     .  1.0642858
          1 441 1.18 22     .  1.0695455
          1 442  .85 23     .       1.06
          1 443    1 24   .42       1.06
          1 444   .7 25   .42       1.06
          1 445  .84 26   .42       1.06
          1 446  .66 27   .42       1.06
          1 447  .82 28   .42       1.06
          1 448  .71 29   .42       1.06
          1 449 1.38 30   .42       1.06
          1 450 2.28 31   .42       1.06
          1 451  .86 32   .42       1.06
          1 452  .66 33   .42       1.06
          1 453  .96 34   .42       1.06
          1 454  .75 35   .42       1.06
          1 455  .76 36   .42       1.06
          1 456  .77 37   .42       1.06
          1 457  .78 38   .42       1.06
          1 458 1.03 39   .42       1.06
          1 459  .39 40  .405  1.0320834
          1 460  .43 41   .42  1.0320834
          1 461  .34 42   .39     1.0044
          1 462  .58 43   .39     1.0044
          1 463 -.43 44   .34   .9492308
          1 464 -.55 45   .18   .8937037
          1 465 -.23 46  -.23   .8937037
          1 466 1.27 47  -.23   .8937037
          1 467 1.05 48  -.23   .8937037
          1 468 1.38 49  -.23   .8937037
          1 469  .75 50  -.23   .8937037
          1 470 1.53 51  -.23   .8937037
          1 471 1.61 52  -.23   .8937037
          1 472  .51 53  -.23   .8937037
          1 473  .53 54  -.23   .8937037
          1 474  .71 55  -.23   .8937037
          1 475  .14 56  -.23   .8937037
          1 476   .4 57  -.23   .8937037
          1 477  .45 58  -.23   .8937037
          1 478  .89 59  -.23   .8937037
          1 479 1.53 60 -.045   .8937037
          1 480   .9 60 -.045   .9096154
          1 481 1.98 60 -.045      .9104
          1 482  .21 60 -.045   .8991666
          1 483 -.09 60  -.16   .8947826
          1 484   .1 60  -.16   .8672727
          1 485 1.01 60  -.16    .832381
          1 486  .67 60  -.16       .809
          1 487  .48 60  -.16    .798421
          1 488    1 60  -.16   .7627778
          1 489  .28 60  -.16   .7164706
          1 490  .05 60  -.16     .68625
          1 491   .5 60  -.16   .6313334
          1 492 1.94 60  -.16   .6071429
          1 493 1.44 60  -.16  .56076926
          1 494 1.64 60  -.16      .5725
          1 495 1.36 60  -.16        .55
          1 496 -.31 60  -.27   .4181818
          1 497 -.07 60  -.27       .351
          1 498  .85 60  -.27        .37
          1 499 1.53 60  -.27     .33875
          1 500   .4 60  -.27        .21
          1 501  .92 60  -.27  .04833333
          1 502  .58 60  -.27      -.112
          1 503  .21 60  -.27      -.112
          1 504  .91 60  -.27      -.112
          1 505 -.31 60  -.31      -.112
          1 506  .56 60  -.31      -.112
          1 507  .75 60  -.31      -.112
          1 508 -.13 60  -.31      -.112
          1 509  .32 60  -.31      -.112
          1 510 -1.6 60  -.37       -.36
          1 511 -.77 60  -.49  -.4185714
          1 512 1.57 60  -.49  -.4185714
          1 513  .77 60  -.49  -.4185714
          1 514    2 60  -.49  -.4185714
          1 515 2.02 60  -.49  -.4185714
          1 516 3.45 60  -.49  -.4185714
          1 517 1.02 60  -.49  -.4185714
          1 518 1.09 60  -.49  -.4185714
          1 519 1.22 60  -.49 -.55333334
          end
          format %tm mdate

          Comment


          • #6
            Ah, your explanation is much clearer this time. The program in #1, as corrected, calculates the mean of those observations within the -59 to 0 window where rateofreturn is less than the value of pct5 in its own observation. It is now clear that you want to calculate the mean of those observations within that window where rateofreturn is less than the value of pct5 in the last observation of the window. The fix is easy:

            Code:
            capture program drop below_pct5
            program below_pct5
                sum rateofreturn if rateofreturn < pct5[_N]
                gen rmean2 = r(mean)
            end
            (Change italicized.)
            ​​​​​​​

            Comment


            • #7
              Thanks a lot Clyde Schechter . It works perfectly.

              Comment


              • #8
                I haven't read this thread carefully but if the intent is to use the value of pct5 for the current observation, you can use a scalar to pass along the value to the program:

                Code:
                capture program drop below_pct5
                program below_pct5
                    sum rateofreturn if rateofreturn < rr_pct5
                    gen rmean = r(mean)
                end
                
                rangerun below_pct5, by(id) interval(mdate -59 0) use(rateofreturn pct5) sprefix(rr_)

                Comment


                • #9
                  Robert Picard raises a good point. In this particular application, it makes no difference, because the value pct5 that serves as the cutoff point is from an observation that is included within the window, and, in fact, is predictably the last such observation in the window. But if the window had been defined as ending in the month before the current observation (-interval(mdate -60 -1)-) then the relevant value of pct5 would not even be in the data when program below_pct5 runs. In that situation, using scalar rr_pct5 would be essential.

                  Comment


                  • #10
                    Clyde Schechter and Robert Picard I need your help again. I tried your codes using another data, and got strange results. Here is the problem-- In row 24 the value of pct5 is -7.573072. Between row 1 and 24, there is only one value of rateofreturn ,-8.166529225, which is less than -7.573072. Since the goal is to find mean of those rateofreturns which are less than pct5, in row 24 the variable rmean(or rmean2) should have the value ,-8.166529225. However, the following codes give -7.869801. Please advise. Thanks.

                    Code:
                    *----------------------
                    *Clyde Schechter
                    *----------------------
                    capture program drop below_pct5
                    program below_pct5
                        sum rateofreturn if rateofreturn < pct5[_N]
                        gen rmean2 = r(mean)
                    end
                    rangerun below_pct5, by(id) interval(mdate -59 0) use(rateofreturn pct5) sprefix(rr_)
                    *----------------------
                    * Robert Picard
                    *----------------------
                    capture program drop below_pct5
                    program below_pct5
                        sum rateofreturn if rateofreturn < rr_pct5
                        gen rmean = r(mean)
                    end
                    rangerun below_pct5, by(id) interval(mdate -59 0) use(rateofreturn pct5) sprefix(rr_)
                    ​​​​​​​

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input double rateofreturn float(mdate id pct5 rmean2 rmean)
                       10.86090420925 552 1          .  10.860904  10.860904
                            1.2890625 553 1          .   6.074983   6.074983
                     3.15850366370999 554 1          .   5.102823   5.102823
                     7.56663800515908 555 1          .   5.718777   5.718777
                    -7.57307197720085 556 1          .   3.060407   3.060407
                    -2.60209069714974 557 1          .  2.1166577  2.1166577
                    -.405374102385899 558 1          .  1.7563673  1.7563673
                     1.64747838895996 559 1          .  1.7427562  1.7427562
                     1.81526962092899 560 1          .  1.7508132  1.7508132
                     2.86163757584837 561 1          .  1.8618957  1.8618957
                      6.0847716845095 562 1          .  2.2457936  2.2457936
                     1.28033501527478 563 1          .  2.1653388  2.1653388
                      1.9216430556497 564 1          .  2.1465929  2.1465929
                     2.88630997905098 565 1          .  2.1994298  2.1994298
                     3.57454510196826 566 1          .   2.291104   2.291104
                     3.98789278247573 567 1          .  2.3971534  2.3971534
                     7.52588147036759 568 1          .   2.698843   2.698843
                      5.9135433817989 569 1          .   2.877438   2.877438
                     5.44108347386173 570 1          .  3.0123665  3.0123665
                    -8.16652922507934 571 1          .  2.4534216  2.4534216
                     6.10356744401208 572 1          .   2.627238   2.627238
                      5.8653057037341 573 1          .   2.774423   2.774423
                    -6.44395455316263 574 1          .   2.373624   2.373624
                    -.147595743028041 575 1  -7.573072  -7.869801  -7.869801
                    -11.6280275919299 576 1   -8.16653 -11.628028 -11.628028
                     5.12940900287576 577 1   -8.16653 -11.628028 -11.628028
                    -5.40110534248869 578 1   -8.16653 -11.628028 -11.628028
                     2.71753562892803 579 1   -8.16653 -11.628028 -11.628028
                     .856026657474439 580 1   -8.16653 -11.628028 -11.628028
                    -7.76416975220736 581 1   -8.16653 -11.628028 -11.628028
                    -5.23715415019762 582 1   -8.16653 -11.628028 -11.628028
                    -7.69681960375391 583 1   -8.16653 -11.628028 -11.628028
                    -13.9553766857304 584 1 -11.628028 -13.955377 -13.955377
                    -30.6732860132113 585 1 -13.955377  -22.31433  -22.31433
                    -35.8347351577565 586 1 -30.673286 -35.834736 -35.834736
                     8.01202122897884 587 1 -30.673286 -35.834736 -35.834736
                    -5.33980582524271 588 1 -30.673286 -35.834736 -35.834736
                    -4.49030644152596 589 1 -30.673286 -35.834736 -35.834736
                     5.97826086956522 590 1 -30.673286 -35.834736 -35.834736
                     12.6166203274637 591 1  -22.31433  -33.25401  -33.25401
                     21.5833653371372 592 1 -13.955377  -26.82113  -26.82113
                    -1.52971436307026 593 1 -13.955377  -26.82113  -26.82113
                     13.4222344423059 594 1 -13.955377  -26.82113  -26.82113
                      1.9029534160236 595 1 -13.955377  -26.82113  -26.82113
                     7.31107763063993 596 1 -13.955377  -26.82113  -26.82113
                     .232764353801816 597 1 -13.955377  -26.82113  -26.82113
                     4.74031479228868 598 1 -13.955377  -26.82113  -26.82113
                     2.78726024986803 599 1 -13.955377  -26.82113  -26.82113
                    -2.43777176704215 600 1 -13.955377  -26.82113  -26.82113
                     2.98648885769433 601 1 -13.955377  -26.82113  -26.82113
                     8.58038574252028 602 1 -13.955377  -26.82113  -26.82113
                      2.5012553351745 603 1 -13.955377  -26.82113  -26.82113
                    -8.65558311135606 604 1 -13.955377  -26.82113  -26.82113
                     2.53402158610981 605 1 -13.955377  -26.82113  -26.82113
                     6.58711997384767 606 1 -13.955377  -26.82113  -26.82113
                     2.12544088330011 607 1 -13.955377  -26.82113  -26.82113
                     8.25274791278754 608 1 -13.955377  -26.82113  -26.82113
                     4.54696776341343 609 1 -13.955377  -26.82113  -26.82113
                    -2.53947193843706 610 1 -13.955377  -26.82113  -26.82113
                     4.48703986059682 611 1 -12.791702  -26.82113  -26.82113
                     -1.8501146549927 612 1 -12.791702  -26.82113  -26.82113
                    -2.21419848138906 613 1 -12.791702  -26.82113  -26.82113
                     4.17300173761946 614 1 -12.791702  -26.82113  -26.82113
                      4.6860746957179 615 1 -12.791702  -26.82113  -26.82113
                    -1.69791122065377 616 1 -12.791702  -26.82113  -26.82113
                    -.529314929720155 617 1 -12.791702  -26.82113  -26.82113
                     3.58488644464813 618 1 -12.791702  -26.82113  -26.82113
                     -9.9695211876905 619 1 -12.791702  -26.82113  -26.82113
                    -14.4561537621492 620 1 -14.205765  -26.98806  -26.98806
                     8.25327929020522 621 1 -14.205765  -26.98806  -26.98806
                    -4.83799640319586 622 1 -14.205765  -26.98806  -26.98806
                     .229258318359257 623 1 -14.205765  -26.98806  -26.98806
                     8.57752225519288 624 1 -14.205765  -26.98806  -26.98806
                     6.36262703902981 625 1 -14.205765  -26.98806  -26.98806
                    -.960869332476856 626 1 -14.205765  -26.98806  -26.98806
                     1.31340702105235 627 1 -14.205765  -26.98806  -26.98806
                    -10.4777401371069 628 1 -14.205765  -26.98806  -26.98806
                      1.8950567623134 629 1 -14.205765  -26.98806  -26.98806
                     1.34514723513758 630 1 -14.205765  -26.98806  -26.98806
                     1.38788700695386 631 1 -14.205765  -26.98806  -26.98806
                     5.04866526267858 632 1 -14.205765  -26.98806  -26.98806
                     1.37353706111833 633 1 -14.205765  -26.98806  -26.98806
                     2.49873059141079 634 1 -14.205765  -26.98806  -26.98806
                     3.55895082651092 635 1 -14.205765  -26.98806  -26.98806
                     6.51325562073567 636 1 -14.205765  -26.98806  -26.98806
                     4.69436959296554 637 1 -14.205765  -26.98806  -26.98806
                      1.4110900388332 638 1 -14.205765  -26.98806  -26.98806
                     1.94581116281141 639 1 -14.205765  -26.98806  -26.98806
                     4.02262453320521 640 1 -14.205765  -26.98806  -26.98806
                    -8.13301702601138 641 1 -14.205765  -26.98806  -26.98806
                                 2.12 621 2          .       2.12       2.12
                    -6.51194672933804 622 2          . -2.1959734 -2.1959734
                     1.87493453440872 623 2          .   -.839004   -.839004
                     6.31297552950853 624 2          .   .9489908   .9489908
                     2.91102514506769 625 2          .  1.3413976  1.3413976
                     .742411427497422 626 2          .  1.2415667  1.2415667
                     1.13805970149254 627 2          .    1.22678    1.22678
                    -5.55248109204943 628 2          .   .3793723   .3793723
                          1.435546875 629 2          .  .49672505  .49672505
                     4.78482718783094 630 2          .   .9255353   .9255353
                    end
                    format %tm mdate

                    Comment


                    • #11
                      Between row 1 and 24, there is only one value of rateofreturn ,-8.166529225, which is less than -7.573072.
                      Actually this is not true. Look at rateofreturn in observation 5. When displayed in the browser or listed, it appears to be equal to the value of pct5 being used as the threshold. But that is an illusion. rateofreturn is a double, whereas pct5 is a float. They are equal to the number of decimal places listed. But if you set the display format to %17.15f, you will see that they only agree to 7 decimal places. After that, they disagree, and rateofreturn[5] is actually smaller then pct5[24]. So the code is working correctly.

                      This is a precision problem. You need to rebuild your data set so that both rateofreturn and pct5 are floats or both are doubles. My guess is that the -7.5730719... value of pct5[24] was imported into your data from one source, and the value of rateofreturn variable was imported from another source, and the importation process treated them differently, so that they are rounded/truncated differently. Or perhaps they were calculated, and one was generated as a double and the other as a float. In any case comparisons of floating point numbers that are this close is always a treacherous undertaking and, at the least, requires great care.

                      Comment


                      • #12
                        Thanks for your feedback Clyde Schechter. Indeed, the pct5 was calculated; I used the following code:
                        Code:
                         
                        tsegen pct5 = rowpctile(L(0/59).rateofreturn,24),p(5)
                        Now, I need your suggestion. To overcome this precision problem, should I go for floats or doubles? Note that my other calculations, besides pct5 and rmean, are based on rateofreturn. In addition, I request you to kindly provide codes for double to float and float to double (I tried but failed; only managed to change the display format). Many thanks.

                        Comment


                        • #13
                          If you make it
                          Code:
                          tsegen  double pct5 = rowpctile(L(0/59).rateofreturn,24),p(5)
                          pct5 will be calculated as a double.

                          You can change existing -double- variables to -floats- by using the -recast- command: see the help file for details. You will lose precision if you do this. In particular, it is possible that -recast-ing the variable to float will not produce exactly the same result as you would have gotten from calculating the variable as a float in the first place. At each step in calculations there is the potential for some rounding or truncation to occur. Typically those differences arise in distant figures that are lost when the final result is stored as a float, but there is no guarantee of that.

                          While you can freely -recast- floats to doubles, that is accomplished by simply padding the low order digits with zeroes, so, for example -recast double pct5- will not fix the problem you encountered in that example, because the resulting pct5 double will just be extended with extra zeroes and will not change its value.

                          I think the safest way to handle this is to regenerate the data set calculating all of the variables as doubles in the first place. (i.e., use -gen double var = expression-, rather than -gen var = expression-, similarly for -egen-, -tsegen-.) This will give you your best shot at having things come out the way you expect them to. But at the end of the day, floating-point calculations are always treacherous, and code that relies on determining exact equality of two quantities, or distinguishing very, very small differences is always risky.


                          Comment


                          • #14
                            Thanks a lot Clyde Schechter! I tried the "double" code; it works perfectly.

                            Comment

                            Working...
                            X