tsegen : rolling standard deviation with condition

Budu Gulo

Join Date: Feb 2018

Posts: 238
#1

tsegen : rolling standard deviation with condition

17 Feb 2018, 15:23

The panel data I am using has 4 variables: mdate, rateofreturn, name, and id. I have already calculated rolling (24-60 months, as available) mean and standard deviation using the 'tsegen' package.

Codes:.
tsegen rmean = rowmean(L(0/59).rateofreturn,24)
tsegen rstd = rowsd(L(0/59).rateofreturn,24)

Now, with the same rolling condition, how can I find standard deviation of those observations that fall below the mean?

Many thanks for considering my request.
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35730
#2

18 Feb 2018, 06:05

What's "the mean"? There are all sorts of possibilities ranging from the overall mean to the mean over windows of length 60 calculated by your statement.
Comment
Budu Gulo

Join Date: Feb 2018

Posts: 238
#3

18 Feb 2018, 07:18

Sorry for the lack of clarification. The mean is the "rolling mean" generated by the command tsegen rmean = rowmean(L(0/59).rateofreturn,24). To be specific, in row 24 the new variable, say rstd2, will contain the standard deviation of rateofreturn (row1-24) which fall below the rolling mean in row 24 i.e. the value of the variable rmean in row 24. Similarly, in row 25 the variable rstd2 will contain the standard deviation of rateofreturn (row1-25) which fall below the value of rmean in row 25. The number of observations will continue to increase until it reaches 60. For example, in row 61 rstd2 will consider previous 59 observations and the current observation of rateofreturn, and the value of rmean in row 61.

Thank you again Nick Cox for considering my request.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#4

18 Feb 2018, 10:45

I can't think of a completely simple way to do it. But I believe the following works:

Code:

// CREATE TOY DATA SET clear* set obs 100 gen mdate = tm(2004m12) + _n expand 10 by mdate, sort: gen id = _n gen rateofreturn = rgamma(1, .02) isid id mdate, sort // DO THE CALCULATIONS rangestat (count) rateofreturn (mean) mean = rateofreturn (sd) rstd = rateofreturn, /// by(id) interval(mdate -59 0) foreach v of varlist mean rstd { replace `v' = . if rateofreturn_count < 24 } tempfile copy save `copy' rangejoin mdate -59 0 using `copy', by(id) keep if rateofreturn_U < mean by id mdate, sort: egen rstd2 = sd(rateofreturn_U) by id mdate: keep if _n == 1 replace rstd2 = . if rateofreturn_count < 24 drop rateofreturn_count *_U

Note that I have redone your initial calculations of mean and rstd by using -rangestat- instead of -tsegen-. That's completely optional: there's nothing wrong with doing it with -tsegen- at that point. But since you will need to install -rangestat- in order to run -rangejoin- (both available from SSC*), which is the key here, I figure you may as well use it. (Also, I've always been more comfortable with -rangestat- than with -tsegen-; just a matter of taste.)

The difficulty arises because rstd2 is defined by the conjunction of two range conditions: mdate within 59 months, and rateofreturn < mean. I don't know of any single command that will apply two such conditions simultaneously. So we do it by first letting -rangejoin- pick out the observations that meet the mdate condition, and then from there -keep- only those that also satisfy the rateofreturn < mean condition. From there it's just routine calculation of group standard deviations. Then we reduce to one observation per id mdate combination.

Additional points:
1. I wouldn't use mean as the name of a variable in Stata, because it is also the name of a command. I followed your lead in this response, but I would probably call it rmean in my own program.

2. In the future, whenever you want help with code, always post an example of the data to which you want to apply the code. A problem like this is complicated enough that few people would be able to follow the transformations of the data in their heads, so testing out the code before posting it is important. You can't write code for non-existent data. I was intrigued enough by the problem to make a set of toy data for the purpose. But it is possible that my toy data deviate from what you have in a way that will break my code. If that's true, we've both wasted our time here. So, always show example data when you want code. Moreover, be sure to do that in the most helpful way, which is by using the -dataex-* command (also from SSC if it is not already part of your official Stata). Please read FAQ #12 for more information about this.

*-rangejoin- and -dataex- are both written by Robert Picard. -rangestat- is by Robert Picard, Nick Cox, and Roberto Ferrer.
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

18 Feb 2018, 11:22

Clyde beat me to it but I think the following is a cleaner and more efficient solution.

You cannot do what you want with tsegen (or rangestat, both from SSC) as you need to apply two conditions to determine the sample in terms of calculating the statistic for the current observation (time window and below the mean threshold). You can however apply the below the mean condition in a program using rangerun (from SSC). rangerun works like rangestat but instead of using built-in statistics, rangerun loads in memory only the observations within range of the current observation and then runs a user-defined program to calculate results. Here's a quick example of how to calculate the initial rolling mean and sd using rangestat and then calculate the sd using only observations below the rolling mean for the current observation (passed along to the program using the rr_rmean1 scalar). I did not impose a minimum of 24 months; if you want to, all you need to do is replace results with missing if n1 is less than 24.

Code:

* create a demonstration dataset
clear all
set obs 300
gen long id = _n
expand runiformint(90,120)
bysort id: gen mdate = _n
format %td mdate
gen rateofreturn = runiform(-1,1) if runiform() < .9

* initial rolling window statistics
rangestat (mean) rmean1=rateofreturn (sd) rsd1=rateofreturn ///
    (count) n1=rateofreturn, by(id) interval(mdate -59 0)

* define a program to calculate the sd for obs within the window
program below_mean
    sum rateofreturn if rateofreturn < rr_rmean1
    gen rstd2 = r(sd)
end
rangerun below_mean, by(id) interval(mdate -59 0) ///
    use(rateofreturn rmean1) sprefix(rr_)
    
* spot check a few examples
list in 500
sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
    rateofreturn <  rmean1[500]

list in 999
sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
    rateofreturn <  rmean1[999]

and the spot check results:

Code:

. list in 500

     +-------------------------------------------------------------------+
     | id       mdate   rateof~n      rmean1        rsd1   n1      rstd2 |
     |-------------------------------------------------------------------|
500. |  5   31mar1960          .   .05171968   .53416681   49   .2739688 |
     +-------------------------------------------------------------------+

. sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
>         rateofreturn <  rmean1[500]

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
rateofreturn |         24   -.4129939    .2739688  -.9221312   .0016056

. 
. list in 999

     +------------------------------------------------------------------+
     | id       mdate   rateof~n       rmean1       rsd1   n1     rstd2 |
     |------------------------------------------------------------------|
999. | 10   07apr1960   .0459386   -.12599504   .5781599   54   .248878 |
     +------------------------------------------------------------------+

. sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
>         rateofreturn <  rmean1[999]

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
rateofreturn |         29   -.5861604     .248878  -.9932978  -.1394511

.

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

18 Feb 2018, 11:44

On further thought, you can calculate all the statistics within the user-defined program so this reduces to (including a 24 observation minimum):

Code:

* create a demonstration dataset
clear all
set obs 300
gen long id = _n
expand runiformint(90,120)
bysort id: gen mdate = _n
format %td mdate
gen rateofreturn = runiform(-1,1) if runiform() < .9

* define a program to calculate stats within the window
program my_stats
    sum rateofreturn
    if r(N) < 24 exit
    gen n1     = r(N)
    gen rmean1 = r(mean)
    gen rsd1   = r(sd)
    sum rateofreturn if rateofreturn < r(mean)
    gen rstd2 = r(sd)
end
rangerun my_stats, by(id) interval(mdate -59 0) use(rateofreturn)
    
* spot check a few examples
list in 500
sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
    rateofreturn <  rmean1[500]

list in 999
sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
    rateofreturn <  rmean1[999]

and the spot-check results:

Code:

. list in 500

     +------------------------------------------------------------------+
     | id       mdate   rateof~n   n1      rmean1       rsd1      rstd2 |
     |------------------------------------------------------------------|
500. |  5   17feb1960   .5779151   43   -.1668841   .5639926   .2827553 |
     +------------------------------------------------------------------+

. sum rateofreturn if id == id[500] & inrange(mdate, mdate[500]-59, mdate[500]) & ///
>         rateofreturn <  rmean1[500]

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
rateofreturn |         24    -.590686    .2827553  -.9997969  -.1754138

. 
. list in 999

     +------------------------------------------------------------------+
     | id       mdate   rateof~n   n1      rmean1       rsd1      rstd2 |
     |------------------------------------------------------------------|
999. | 10   05mar1960   .9466972   54   -.1367374   .5228593   .2683297 |
     +------------------------------------------------------------------+

. sum rateofreturn if id == id[999] & inrange(mdate, mdate[999]-59, mdate[999]) & ///
>         rateofreturn <  rmean1[999]

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
rateofreturn |         29   -.5322129    .2683297  -.9695668  -.1769639

Comment

Budu Gulo

Join Date: Feb 2018

Posts: 238
#7

18 Feb 2018, 11:57

Many thanks Clyde Schechter and Robert Picard for solving my problem. You guys rock!
Comment

Announcement