Calculating 36 month Volatility from CRSP dataset

Ronald Verhagen

Join Date: Aug 2016

Posts: 5
#1

Calculating 36 month Volatility from CRSP dataset

15 Aug 2016, 12:41

First post, first STATA project. I'm hoping to find some help here. Thanks in advance. I've searched the forums and other answers but they didn't seem to work for me.

My objective is to calculate 36 month volatility from monthly stock returns.

Sample of my dataset:

permno date prc ret
10001 30sep1986 6.375 -.003076924
10001 31oct1986 6.625 .039215688
10001 28nov1986 7 .056603774
10001 31dec1986 7 .015
10001 30jan1987 6.75 -.035714287
10001 27feb1987 6.25 -.074074075
10001 31mar1987 6.375 .036800001
10001 30jun1987 5.875 .051428571
10001 31jul1987 6 .021276595
10001 31aug1987 6.5 .083333336
10001 30sep1987 6.25 -.022307692
10001 30oct1987 6.375 .02
10001 30nov1987 6.1875 -.029411765
10001 31dec1987 5.875 -.033535354
10001 29jan1988 6.25 .063829787
10001 29feb1988 6.75 .079999998
10001 31mar1988 6.125 -.0762963
10001 30jun1988 6.25 -.012038835
10001 31aug1988 6.625 .029126214
10001 31oct1988 6.625 .039215688
10001 30nov1988 6.625 0

Following this topic I am able to proces this code:

Code:

by permno (date): gen ret_sq = ret^2 by permno (date): sort: gen sum_ret_sq = sum(ret_sq) by permno (date): gen int n_obs = sum(!missing(ret)) gen monthdate = month(date) gen yeardate = year(date) gen int date2 = ym(yeardate, monthdate) format date2 %tm xtset permno date2

But with the next step when I generate variance, the results are very scrambled, a lot of missing results. This is the line I've used:

Code:

gen variance = (L1.sum_ret_sq - L37.sum_ret_sq)/(L1.n_obs-L37.n_obs) - ((L1.sum_ret - L37.sum_ret)/(L1.n_obs-L37.n_obs))^2

It's a very large dataset, I am computing this for all listed companies from 1960 to 2015.

What did I do wrong?

Last edited by Ronald Verhagen; 15 Aug 2016, 12:43.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#2

15 Aug 2016, 13:14

As for the missing values, remember that a rolling 36 month variance will be undefined in the first 36 observations for each permno, because there aren't enough preceding observations to fill the window.

Next, your calculation of the monthly date is correct, but needlessly complicated.

Next, while your code seeks to calculate the variance from first principles, it is easier to do this using one of the newer programs from Robert Picard: -tsegen- or -rangestat-. So here's how I might do this:

Code:

gen monthlydate = mofd(date) xtset permno monthlydate tsegen sd_ret = rowsd(L(1/36).ret), by(permno) gen variance_ret= sd_ret^2

Alternatively, you can get the variance with:

Code:

rangestat (variance) variance_ret= ret, interval(monthlydate, -1, -36) by(permno)

Note: both -tsegen- and -rangestat- are available from SSC.

Added: While Robert Picard was the lead author of both of these programs, I neglected to mention co-authors Nick Cox (-tsegen-) and Roberto Ferrer (-rangestat-). Apologies to both.
Comment
Ronald Verhagen

Join Date: Aug 2016

Posts: 5
#3

15 Aug 2016, 14:03

Thanks for your help.

Originally posted by Clyde Schechter View Post

As for the missing values, remember that a rolling 36 month variance will be undefined in the first 36 observations for each permno, because there aren't enough preceding observations to fill the window.

I am aware of this, I meant there are missing gaps within the results which are not explained by this.

Originally posted by Clyde Schechter View Post

Next, while your code seeks to calculate the variance from first principles, it is easier to do this using one of the newer programs from Robert Picard: -tsegen- or -rangestat-. So here's how I might do this:

I've installed Tsegen and applied your code. Help Tsegen states that it does not support a -by- option.

When executing the code without the by suffix I am getting stdev results at the 3th row instead of the 37th:

permno date ret monthlydate sd_ret
10001 30sep1986 -.003076924 1986m9
10001 31oct1986 .039215688 1986m10
10001 28nov1986 .056603774 1986m11 .0299054
10001 31dec1986 .015 1986m12 .0306942
10001 30jan1987 -.035714287 1987m1 .0262946
10001 27feb1987 -.074074075 1987m2 .0361048

How is this possible?

Originally posted by Clyde Schechter View Post

Alternatively, you can get the variance with:
Code:
rangestat (variance) variance_ret= ret, interval(monthlydate, -1, -36) by(permno)
Note: both -tsegen- and -rangestat- are available from SSC.

With Rangestat I am also having some trouble:

warning: -1 > -36 in 2949220 observations; observations will be ignored
no observations
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#4

15 Aug 2016, 14:22

Right, -tsegen- does not support a -by()- option; my error. It gets the grouping information from the -xtset- panel variable. So running it without -by(permno)- is correct.

In -rangestat-, I made another error: the interval option should be -interval(monthlydate, -36, -1)- because -36 < -1. By the way, -rangestat- does have a -by()- option, and you definitely need it here.

The reason you are getting results starting at the third observation in each group is that neither-tsegen- nor -rangestat- requires the full window be present to do the calculations. So in the third month, you are getting the standard deviation (or variance) for months 1 and 2. In the fourth month, you are getting the standard deviation (resp. variance) for months 1, 2, and 3, etc. You can eliminate those, of course with:

Code:

by permno (monthlydate), sort: replace variance_ret = . if _n < 37 // SIMILARLY FOR std_ret

Sorry about my errors on -tsegen- and -rangestat-.
Comment
Ronald Verhagen

Join Date: Aug 2016

Posts: 5
#5

15 Aug 2016, 14:40

Thanks Professor Schechter.
Comment
Saad Aftab

Join Date: Dec 2020

Posts: 6
#6

30 Jul 2021, 23:54

Clyde Schechter I had the same query and ran the same code:
tsegen sd_ret = rowsd(L(1/36).ret) Only the first two months of std was missing for each firm. I do not understand how it could compute the std of the first 36 months without having the returns information for those months.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#7

31 Jul 2021, 10:18

-tsegen- will calculate the requested statistic incorporating whatever observations within the specified lags/leads is available. It does not check whether some of them are missing or not. So if you specify L(1/36) but the first two months have missing values or are non-existent, then you will get a standard deviation based on the 34 months that are available. That's what it does. If you want results only when all 36 months of data are present, you have to check for that separately and remove the unwanted results based on incomplete data.
1 like
Comment

Announcement

Calculating 36 month Volatility from CRSP dataset

Comment

Comment

Comment

Comment

Comment

Comment