Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating max with a rolling window

    Dear Statalist,

    I am trying to generate the maximum of the variable ret on a two year rolling window. So for year 1979, I would like to have 0.1135 (as this is the max of ret of year 1977&1978), for year 1980 I would like it to be 0.1135(as this is max of ret of year 1978&1979) and for year 1981 if would like it to be 0.1092(as this is the max of 1979&1980) ect..

    In the process of achieving this I started with forvalues, but this does not seem to work with year==`i'[_n+1], I get the error '1977 invalid name', so after some research I have now discovered tsegen and rangestat.

    I have tried the following codes that produced ret_max, max2 and max3:

    Code:
    rangestat (max) ret, interval(year 0 2)
    Code:
    tsegen max2 = rowmax(L(0/2).ret, 2)
    Code:
    tsegen max3 = rowmax(L(1/3).ret, 2)
    I understand that my code with forvalues is not correct at all, and I don't understand how I could make it work for multiple years when it doesn't work with [_n+1]. For forvalues I tried with this code:

    Code:
    gen maxret = .
    gen temp = 0
    forvalues i=1977/2016 {
    replace temp = 0
    replace temp = 1 if year==`i'
    replace temp = 1 if year==`i'[_n+1]
    egen max = max(ret) if temp==1
    replace maxret = max if year ==`i'
    drop max
    }
    I hope someone out there can help me, thank you all Statalist experts!


    My data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double year float ret double ret_max float(max2 max3)
    1975            .                   .        .        .
    1976            .                   .        .        .
    1977       -.0082  .11349999904632568        .        .
    1978        .1135  .11349999904632568    .1135        .
    1979        .1092  .33730000257492065    .1135    .1135
    1980        .0624  .33730000257492065    .1135    .1135
    1981        .3373   1.242900013923645    .3373    .1135
    1982        .1269   1.242900013923645    .3373    .3373
    1983       1.2429   1.242900013923645   1.2429    .3373
    1984        .4863  .49630001187324524   1.2429   1.2429
    1985        .4963  .49630001187324524   1.2429   1.2429
    1986    .10867027   .3816371262073517    .4963   1.2429
    1987   .011902425  1.0858999490737915    .4963    .4963
    1988     .3816371  1.0858999490737915 .3816371    .4963
    1989       1.0859  1.0858999490737915   1.0859 .3816371
    1990        .0029 .019600000232458115   1.0859   1.0859
    1991  -.011679173  1.0448999404907227   1.0859   1.0859
    1992        .0196  1.0448999404907227    .0196   1.0859
    1993       1.0449  1.0448999404907227   1.0449    .0196
    1994        .1093  .48660001158714294   1.0449   1.0449
    1995        .1048  .48660001158714294   1.0449   1.0449
    1996        .4866  .48660001158714294    .4866   1.0449
    1997        .4115   .7723000049591065    .4866    .4866
    1998   -.02070992   .7723000049591065    .4866    .4866
    1999        .7723   .7723000049591065    .7723    .4866
    2000        .3607  .36070001125335693    .7723    .7723
    2001 -.0026761396   .6405687928199768    .7723    .7723
    2002  -.002508564   .6405687928199768    .3607    .7723
    2003     .6405688   .6405687928199768 .6405688    .3607
    2004        .3816   .5199000239372253 .6405688 .6405688
    2005        .5199   .5199000239372253 .6405688 .6405688
    2006        .2403   .2402999997138977    .5199 .6405688
    2007        .1723   .8766999840736389    .5199    .5199
    2008  -.020267397   .8766999840736389    .2403    .5199
    2009        .8767   .8766999840736389    .8767    .2403
    2010        .2325  .24420000612735748    .8767    .8767
    2011        .0616   .2897999882698059    .8767    .8767
    2012        .2442   .2897999882698059    .2442    .8767
    2013        .2898   .2897999882698059    .2898    .2442
    2014  .0036222886   .3154999911785126    .2898    .2898
    2015   .007412713   .3154999911785126    .2898    .2898
    2016        .3155   .3154999911785126    .3155    .2898
    end
    Last edited by Linn Hoil; 15 Jul 2017, 17:47.

  • #2
    You were close with your first attempt at -rangestat-. You just need to understand the -interval()- option. You want the window to extend from current year minus 1 to current year minus 2, inclusive. So the code is:

    Code:
    rangestat (max) max_ret_2_yrs = ret, interval(year -2 -1)

    Comment


    • #3
      Thank you, Clyde, that worked perfectly! You are right, its my understanding of rangestat that is a little off

      Comment


      • #4
        I have an additional question, would it be possible to extract the names with the max var, such that for the maxvar of year 1979 it would say port2 in portchoice1, and for 1980 it would say port2 in portchoice1 ect.?

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double year str20 portchoice float ret double maxret2yr str7 portchoice1
        1975 ""                   .                    . ""
        1976 ""                   .                    . ""
        1977 " port1"        -.0082                    . ""
        1978 " port2"         .1135                    . ""
        1979 " port2"         .1092   .11349999904632568 ""
        1980 " port1"         .0624   .11349999904632568 ""
        1981 " port2"         .3373   .10920000076293945 ""
        1982 " port1"         .1269   .33730000257492065 ""
        1983 " port1"        1.2429   .33730000257492065 ""
        1984 " port1"         .4863    1.242900013923645 ""
        1985 " port1"         .4963    1.242900013923645 ""
        1986 " port5"     .10867027   .49630001187324524 ""
        1987 " port4"    .011902425   .49630001187324524 ""
        1988 " port5"      .3816371   .10867027193307877 ""
        1989 " port1"        1.0859    .3816371262073517 ""
        1990 " port2"         .0029   1.0858999490737915 ""
        1991 " port8"   -.011679173   1.0858999490737915 ""
        1992 " port1"         .0196  .002899999963119626 ""
        1993 " port2"        1.0449  .019600000232458115 ""
        1994 " port1"         .1093   1.0448999404907227 ""
        1995 " port2"         .1048   1.0448999404907227 ""
        1996 " port2"         .4866   .10930000245571136 ""
        1997 " port2"         .4115   .48660001158714294 ""
        1998 " port9"    -.02070992   .48660001158714294 ""
        1999 " port1"         .7723    .4115000069141388 ""
        2000 " port1"         .3607    .7723000049591065 ""
        2001 " port7"  -.0026761396    .7723000049591065 ""
        2002 " port7"   -.002508564   .36070001125335693 ""
        2003 " port5"      .6405688 -.002508563920855522 ""
        2004 " port2"         .3816    .6405687928199768 ""
        2005 " port1"         .5199    .6405687928199768 ""
        2006 " port2"         .2403    .5199000239372253 ""
        2007 " port1"         .1723    .5199000239372253 ""
        2008 " port14"  -.020267397    .2402999997138977 ""
        2009 " port2"         .8767   .17229999601840973 ""
        2010 " port1"         .2325    .8766999840736389 ""
        2011 " port2"         .0616    .8766999840736389 ""
        2012 " port1"         .2442   .23250000178813934 ""
        2013 " port2"         .2898   .24420000612735748 ""
        2014 " port6"   .0036222886    .2897999882698059 ""
        2015 " port8"    .007412713    .2897999882698059 ""
        2016 " port2"         .3155   .00741271348670125 ""
        end

        Comment


        • #5
          OK. This is actually trickier than it seemed at first glance. We actually have to go back and re-do the calculation of maxret2yr. Here's why. Your original data has ret as a float variable. -rangestat- does its calculations using doubles. In converting your values of ret to doubles, some of the values change slightly due to the inexactitude of binary representations with a finite number of digits. Consequently, if I simply go forward and ask -rangestat- to again go back and find matches to maxret2yr, some of them go unmatched. So we have to recast ret to double first and then calculate maxret2yr.

          Once we have that, there is another obstacle. portchoice is a string variable, and -rangestat- will not work with it. In fact, even in the pre-rangestat era, the various ways one would have gone about solving this problem all tend to choke on string variables. So I -encode- portchoice to create a labeled numeric variable that is easier to work with.

          Finally, your question is actually ill-posed. It is possible that the values of ret in the two year window will actually be tied, so that either of them could be considered "the one" that maximizes return. And if they are associated with different values of portchoice, there is no guidance as to which to choose. Admittedly this will occur with low probability, but not zero probability. To deal with this, the code below actually checks for the possibility and leaves a missing value for portchoice1 if that happens. (If the two values of ret are tied but also are associated with the same value of portchoice, then there is no problem and that value of portchoice is used.)

          Note that the final result, variable portchoice1, is a numeric variable with value labels. If you need it to be a string variable, use the -decode- command.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double year str20 portchoice float ret
          1975 ""                           .
          1976 ""                           .
          1977 " port1"  -.008200000040233135
          1978 " port2"    .11349999904632568
          1979 " port2"    .10920000076293945
          1980 " port1"    .06239999830722809
          1981 " port2"    .33730000257492065
          1982 " port1"    .12690000236034393
          1983 " port1"     1.242900013923645
          1984 " port1"      .486299991607666
          1985 " port1"    .49630001187324524
          1986 " port5"    .10867027193307877
          1987 " port4"   .011902425438165665
          1988 " port5"     .3816370964050293
          1989 " port1"    1.0858999490737915
          1990 " port2"   .002899999963119626
          1991 " port8"  -.011679173447191715
          1992 " port1"   .019600000232458115
          1993 " port2"    1.0448999404907227
          1994 " port1"    .10930000245571136
          1995 " port2"    .10480000078678131
          1996 " port2"    .48660001158714294
          1997 " port2"     .4115000069141388
          1998 " port9"  -.020709920674562454
          1999 " port1"     .7723000049591065
          2000 " port1"    .36070001125335693
          2001 " port7"  -.002676139585673809
          2002 " port7"  -.002508563920855522
          2003 " port5"     .6405687928199768
          2004 " port2"     .3815999925136566
          2005 " port1"     .5199000239372253
          2006 " port2"     .2402999997138977
          2007 " port1"    .17229999601840973
          2008 " port14" -.020267397165298462
          2009 " port2"     .8766999840736389
          2010 " port1"    .23250000178813934
          2011 " port2"    .06159999966621399
          2012 " port1"    .24420000612735748
          2013 " port2"     .2897999882698059
          2014 " port6"  .0036222885828465223
          2015 " port8"   .007412713021039963
          2016 " port2"     .3154999911785126
          end
          
          //    GET TWO YEAR WINDOW MAX RETURN
          recast double ret
          rangestat (max) maxret2yr = ret, interval(year -2 -1)
          
          encode portchoice, gen(n_portchoice)
          
          rangestat (min) candidate1 = n_portchoice (max) candidate2 = n_portchoice, ///
              interval(ret maxret2yr maxret2yr)
          gen portchoice1 = candidate1 if candidate1 == candidate2
          label values portchoice1 n_portchoice

          Comment


          • #6
            Again, thank you so much Clyde, this is something I would have never been able to figure out myself! I cannot express how much I appreciate all the help I can get in here!

            Comment


            • #7
              I agree with Clyde that this can lead to ties and this problem should be addressed. I also fear that Clyde's approach is vulnerable to precision issues and a fatal problem if there are duplicate values for maxret2yr. Here's another approach that uses rangerun (also from SSC). You also have the same issue with using a string variable with rangerun so you need to associate numeric values to each string values of portchoice.

              For each observation, rangerun will create a dataset with the observations that fall within the range specified in the interval(). All that your program needs to do is to order these observation by ret (and the numeric value of portchoice to break ties). Since numeric missing values are stored as numbers larger than any number that can be stored in a variable, you need to drop these if these are not to be considered a maximum value. The desired results are the ones in the last observation in memory and simply require variable renaming so that rangerun can return these results. Finally, you attach label values to the npc_max variable, based on the associations created when portchoice was encoded.

              Code:
              * associate numeric values to portchoice string values
              encode portchoice, gen(nportchoice) label(lportchoice)
              
              * define a program that sorts observations
              program doit
                  drop if mi(ret)
                  sort ret nportchoice
                  rename ret ret_max
                  rename nportchoice npc_max
              end
              
              rangerun doit, interval(year -2 -1) use(ret nportchoice)
              label values npc_max lportchoice
              Last edited by Robert Picard; 16 Jul 2017, 11:26.

              Comment


              • #8
                I also fear that Clyde's approach is vulnerable to precision issues and a fatal problem if there are duplicate values for maxret2yr.
                Robert Picard , could you elaborate?

                Comment


                • #9
                  My issue with precision is based on the observation that ret is stored as a float while rangestat returns maxret2yr in a double. Since there is no arithmetics here, just comparing values, I'll concede this is probably not a real issue once everything is upconverted to doubles in Mata (but it still triggers my spidey sense).

                  On the other hand, if there are two identical returns, your approach will fail because of how you disambiguate ties within each window. Here is an example:

                  Code:
                  clear all
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input double year str6 portchoice float ret
                  1975 ""            .
                  1976 ""            .
                  1977 " port1" -.0082
                  1978 " port2"  .1135
                  1979 " port3"  .1092
                  1980 " port4"  .0624
                  1981 " port5"  .1135
                  1982 " port6"  .1269
                  1983 " port7" 1.2429
                  1984 " port8"  .4863
                  end
                  
                  * rangerun solution
                  encode portchoice, gen(nportchoice) label(lportchoice)
                  program doit
                      drop if mi(ret)
                      sort ret nportchoice
                      rename ret ret_max
                      rename nportchoice npc_max
                  end
                  rangerun doit, interval(year -2 -1) use(ret nportchoice)
                  label values npc_max lportchoice
                  
                  * rangestat solution
                  recast double ret
                  rangestat (max) maxret2yr = ret, interval(year -2 -1)
                  encode portchoice, gen(n_portchoice)
                  rangestat (min) candidate1 = n_portchoice (max) candidate2 = n_portchoice, ///
                      interval(ret maxret2yr maxret2yr)
                  gen portchoice1 = candidate1 if candidate1 == candidate2
                  label values portchoice1 n_portchoice
                  
                  list, noobs
                  and the results:
                  Code:
                  . list, noobs
                  
                    +----------------------------------------------------------------------------------------------------------------+
                    | year   portch~e      ret   nportc~e   ret_max   npc_max   maxret~r   n_port~e   candid~1   candid~2   portch~1 |
                    |----------------------------------------------------------------------------------------------------------------|
                    | 1975                   .          .         .         .          .          .          .          .          . |
                    | 1976                   .          .         .         .          .          .          .          .          . |
                    | 1977      port1   -.0082      port1         .         .          .      port1          1          8          . |
                    | 1978      port2    .1135      port2    -.0082     port1     -.0082      port2          1          1      port1 |
                    | 1979      port3    .1092      port3     .1135     port2      .1135      port3          2          5          . |
                    |----------------------------------------------------------------------------------------------------------------|
                    | 1980      port4    .0624      port4     .1135     port2      .1135      port4          2          5          . |
                    | 1981      port5    .1135      port5     .1092     port3      .1092      port5          3          3      port3 |
                    | 1982      port6    .1269      port6     .1135     port5      .1135      port6          2          5          . |
                    | 1983      port7   1.2429      port7     .1269     port6      .1269      port7          6          6      port6 |
                    | 1984      port8    .4863      port8    1.2429     port7     1.2429      port8          7          7      port7 |
                    +----------------------------------------------------------------------------------------------------------------+
                  Last edited by Robert Picard; 16 Jul 2017, 12:17.

                  Comment


                  • #10
                    Ah, I see! I didn't consider what would happen if the tie for the max value of ret appeared outside the window. Thank you.

                    Comment

                    Working...
                    X