Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • tsegen : rolling standard deviation with condition

    The panel data I am using has 4 variables: mdate, rateofreturn, name, and id. I have already calculated rolling (24-60 months, as available) mean and standard deviation using the 'tsegen' package.

    Codes:.
    tsegen rmean = rowmean(L(0/59).rateofreturn,24)
    tsegen rstd = rowsd(L(0/59).rateofreturn,24)

    Now, with the same rolling condition, how can I find standard deviation of those observations that fall below the mean?

    Many thanks for considering my request.

  • #2
    What's "the mean"? There are all sorts of possibilities ranging from the overall mean to the mean over windows of length 60 calculated by your statement.

    Comment


    • #3
      Sorry for the lack of clarification. The mean is the "rolling mean" generated by the command tsegen rmean = rowmean(L(0/59).rateofreturn,24). To be specific, in row 24 the new variable, say rstd2, will contain the standard deviation of rateofreturn (row1-24) which fall below the rolling mean in row 24 i.e. the value of the variable rmean in row 24. Similarly, in row 25 the variable rstd2 will contain the standard deviation of rateofreturn (row1-25) which fall below the value of rmean in row 25. The number of observations will continue to increase until it reaches 60. For example, in row 61 rstd2 will consider previous 59 observations and the current observation of rateofreturn, and the value of rmean in row 61.

      Thank you again Nick Cox for considering my request.

      Comment


      • #4
        I can't think of a completely simple way to do it. But I believe the following works:

        Code:
        //    CREATE TOY DATA SET
        clear*
        set obs 100
        gen mdate = tm(2004m12) + _n
        expand 10
        by mdate, sort: gen id = _n
        gen rateofreturn = rgamma(1, .02)
        isid id mdate, sort
        
        //    DO THE CALCULATIONS
        rangestat (count) rateofreturn (mean) mean = rateofreturn (sd) rstd = rateofreturn, ///
            by(id) interval(mdate -59 0)
        foreach v of varlist mean rstd {
            replace `v' = . if rateofreturn_count < 24
        }
        
        tempfile copy
        save `copy'
        
        rangejoin mdate  -59 0 using `copy', by(id)
        keep if rateofreturn_U < mean
        
        by id mdate, sort: egen rstd2 = sd(rateofreturn_U)
        
        by id mdate: keep if _n == 1
        replace rstd2 = . if rateofreturn_count < 24
        drop rateofreturn_count *_U
        Note that I have redone your initial calculations of mean and rstd by using -rangestat- instead of -tsegen-. That's completely optional: there's nothing wrong with doing it with -tsegen- at that point. But since you will need to install -rangestat- in order to run -rangejoin- (both available from SSC*), which is the key here, I figure you may as well use it. (Also, I've always been more comfortable with -rangestat- than with -tsegen-; just a matter of taste.)

        The difficulty arises because rstd2 is defined by the conjunction of two range conditions: mdate within 59 months, and rateofreturn < mean. I don't know of any single command that will apply two such conditions simultaneously. So we do it by first letting -rangejoin- pick out the observations that meet the mdate condition, and then from there -keep- only those that also satisfy the rateofreturn < mean condition. From there it's just routine calculation of group standard deviations. Then we reduce to one observation per id mdate combination.

        Additional points:
        1. I wouldn't use mean as the name of a variable in Stata, because it is also the name of a command. I followed your lead in this response, but I would probably call it rmean in my own program.

        2. In the future, whenever you want help with code, always post an example of the data to which you want to apply the code. A problem like this is complicated enough that few people would be able to follow the transformations of the data in their heads, so testing out the code before posting it is important. You can't write code for non-existent data. I was intrigued enough by the problem to make a set of toy data for the purpose. But it is possible that my toy data deviate from what you have in a way that will break my code. If that's true, we've both wasted our time here. So, always show example data when you want code. Moreover, be sure to do that in the most helpful way, which is by using the -dataex-* command (also from SSC if it is not already part of your official Stata). Please read FAQ #12 for more information about this.

        *-rangejoin- and -dataex- are both written by Robert Picard. -rangestat- is by Robert Picard, Nick Cox, and Roberto Ferrer.

        Comment


        • #5
          Clyde beat me to it but I think the following is a cleaner and more efficient solution.

          You cannot do what you want with tsegen (or rangestat, both from SSC) as you need to apply two conditions to determine the sample in terms of calculating the statistic for the current observation (time window and below the mean threshold). You can however apply the below the mean condition in a program using rangerun (from SSC). rangerun works like rangestat but instead of using built-in statistics, rangerun loads in memory only the observations within range of the current observation and then runs a user-defined program to calculate results. Here's a quick example of how to calculate the initial rolling mean and sd using rangestat and then calculate the sd using only observations below the rolling mean for the current observation (passed along to the program using the rr_rmean1 scalar). I did not impose a minimum of 24 months; if you want to, all you need to do is replace results with missing if n1 is less than 24.

          Code:
          * create a demonstration dataset
          clear all
          set obs 300
          gen long id = _n
          expand runiformint(90,120)
          bysort id: gen mdate = _n
          format %td mdate
          gen rateofreturn = runiform(-1,1) if runiform() < .9
          
          * initial rolling window statistics
          rangestat (mean) rmean1=rateofreturn (sd) rsd1=rateofreturn ///
              (count) n1=rateofreturn, by(id) interval(mdate -59 0)
          
          * define a program to calculate the sd for obs within the window
          program below_mean
              sum rateofreturn if rateofreturn < rr_rmean1
              gen rstd2 = r(sd)
          end
          rangerun below_mean, by(id) interval(mdate -59 0) ///
              use(rateofreturn rmean1) sprefix(rr_)
              
          * spot check a few examples
          list in 500
          sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
              rateofreturn <  rmean1[500]
          
          list in 999
          sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
              rateofreturn <  rmean1[999]
          and the spot check results:
          Code:
          . list in 500
          
               +-------------------------------------------------------------------+
               | id       mdate   rateof~n      rmean1        rsd1   n1      rstd2 |
               |-------------------------------------------------------------------|
          500. |  5   31mar1960          .   .05171968   .53416681   49   .2739688 |
               +-------------------------------------------------------------------+
          
          . sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
          >         rateofreturn <  rmean1[500]
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
          rateofreturn |         24   -.4129939    .2739688  -.9221312   .0016056
          
          . 
          . list in 999
          
               +------------------------------------------------------------------+
               | id       mdate   rateof~n       rmean1       rsd1   n1     rstd2 |
               |------------------------------------------------------------------|
          999. | 10   07apr1960   .0459386   -.12599504   .5781599   54   .248878 |
               +------------------------------------------------------------------+
          
          . sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
          >         rateofreturn <  rmean1[999]
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
          rateofreturn |         29   -.5861604     .248878  -.9932978  -.1394511
          
          .

          Comment


          • #6
            On further thought, you can calculate all the statistics within the user-defined program so this reduces to (including a 24 observation minimum):

            Code:
            * create a demonstration dataset
            clear all
            set obs 300
            gen long id = _n
            expand runiformint(90,120)
            bysort id: gen mdate = _n
            format %td mdate
            gen rateofreturn = runiform(-1,1) if runiform() < .9
            
            * define a program to calculate stats within the window
            program my_stats
                sum rateofreturn
                if r(N) < 24 exit
                gen n1     = r(N)
                gen rmean1 = r(mean)
                gen rsd1   = r(sd)
                sum rateofreturn if rateofreturn < r(mean)
                gen rstd2 = r(sd)
            end
            rangerun my_stats, by(id) interval(mdate -59 0) use(rateofreturn)
                
            * spot check a few examples
            list in 500
            sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
                rateofreturn <  rmean1[500]
            
            list in 999
            sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
                rateofreturn <  rmean1[999]
            and the spot-check results:
            Code:
            . list in 500
            
                 +------------------------------------------------------------------+
                 | id       mdate   rateof~n   n1      rmean1       rsd1      rstd2 |
                 |------------------------------------------------------------------|
            500. |  5   17feb1960   .5779151   43   -.1668841   .5639926   .2827553 |
                 +------------------------------------------------------------------+
            
            . sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
            >         rateofreturn <  rmean1[500]
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
            rateofreturn |         24    -.590686    .2827553  -.9997969  -.1754138
            
            . 
            . list in 999
            
                 +------------------------------------------------------------------+
                 | id       mdate   rateof~n   n1      rmean1       rsd1      rstd2 |
                 |------------------------------------------------------------------|
            999. | 10   05mar1960   .9466972   54   -.1367374   .5228593   .2683297 |
                 +------------------------------------------------------------------+
            
            . sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
            >         rateofreturn <  rmean1[999]
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
            rateofreturn |         29   -.5322129    .2683297  -.9695668  -.1769639

            Comment


            • #7
              Many thanks Clyde Schechter and Robert Picard for solving my problem. You guys rock!

              Comment

              Working...
              X