Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating 36 month Volatility from CRSP dataset

    First post, first STATA project. I'm hoping to find some help here. Thanks in advance. I've searched the forums and other answers but they didn't seem to work for me.

    My objective is to calculate 36 month volatility from monthly stock returns.

    Sample of my dataset:

    permno date prc ret
    10001 30sep1986 6.375 -.003076924
    10001 31oct1986 6.625 .039215688
    10001 28nov1986 7 .056603774
    10001 31dec1986 7 .015
    10001 30jan1987 6.75 -.035714287
    10001 27feb1987 6.25 -.074074075
    10001 31mar1987 6.375 .036800001
    10001 30jun1987 5.875 .051428571
    10001 31jul1987 6 .021276595
    10001 31aug1987 6.5 .083333336
    10001 30sep1987 6.25 -.022307692
    10001 30oct1987 6.375 .02
    10001 30nov1987 6.1875 -.029411765
    10001 31dec1987 5.875 -.033535354
    10001 29jan1988 6.25 .063829787
    10001 29feb1988 6.75 .079999998
    10001 31mar1988 6.125 -.0762963
    10001 30jun1988 6.25 -.012038835
    10001 31aug1988 6.625 .029126214
    10001 31oct1988 6.625 .039215688
    10001 30nov1988 6.625 0


    Following this topic I am able to proces this code:

    Code:
    by permno (date): gen ret_sq = ret^2
    by permno (date): sort: gen sum_ret_sq = sum(ret_sq)
    by permno (date): gen int n_obs = sum(!missing(ret))
    gen monthdate = month(date)
    gen yeardate = year(date)
    gen int date2 = ym(yeardate, monthdate)
    format date2 %tm
    xtset permno date2
    But with the next step when I generate variance, the results are very scrambled, a lot of missing results. This is the line I've used:
    Code:
    gen variance = (L1.sum_ret_sq - L37.sum_ret_sq)/(L1.n_obs-L37.n_obs) - ((L1.sum_ret - L37.sum_ret)/(L1.n_obs-L37.n_obs))^2
    It's a very large dataset, I am computing this for all listed companies from 1960 to 2015.

    What did I do wrong?




    Last edited by Ronald Verhagen; 15 Aug 2016, 12:43.

  • #2
    As for the missing values, remember that a rolling 36 month variance will be undefined in the first 36 observations for each permno, because there aren't enough preceding observations to fill the window.

    Next, your calculation of the monthly date is correct, but needlessly complicated.

    Next, while your code seeks to calculate the variance from first principles, it is easier to do this using one of the newer programs from Robert Picard: -tsegen- or -rangestat-. So here's how I might do this:

    Code:
    gen monthlydate = mofd(date)
    xtset permno monthlydate
    tsegen sd_ret = rowsd(L(1/36).ret), by(permno)
    gen variance_ret= sd_ret^2
    Alternatively, you can get the variance with:
    Code:
    rangestat (variance) variance_ret= ret, interval(monthlydate, -1, -36) by(permno)
    Note: both -tsegen- and -rangestat- are available from SSC.

    Added: While Robert Picard was the lead author of both of these programs, I neglected to mention co-authors Nick Cox (-tsegen-) and Roberto Ferrer (-rangestat-). Apologies to both.

    Comment


    • #3
      Thanks for your help.

      Originally posted by Clyde Schechter View Post
      As for the missing values, remember that a rolling 36 month variance will be undefined in the first 36 observations for each permno, because there aren't enough preceding observations to fill the window.
      I am aware of this, I meant there are missing gaps within the results which are not explained by this.

      Originally posted by Clyde Schechter View Post
      Next, while your code seeks to calculate the variance from first principles, it is easier to do this using one of the newer programs from Robert Picard: -tsegen- or -rangestat-. So here's how I might do this:
      I've installed Tsegen and applied your code. Help Tsegen states that it does not support a -by- option.

      When executing the code without the by suffix I am getting stdev results at the 3th row instead of the 37th:

      permno date ret monthlydate sd_ret
      10001 30sep1986 -.003076924 1986m9
      10001 31oct1986 .039215688 1986m10
      10001 28nov1986 .056603774 1986m11 .0299054
      10001 31dec1986 .015 1986m12 .0306942
      10001 30jan1987 -.035714287 1987m1 .0262946
      10001 27feb1987 -.074074075 1987m2 .0361048

      How is this possible?

      Originally posted by Clyde Schechter View Post
      Alternatively, you can get the variance with:
      Code:
      rangestat (variance) variance_ret= ret, interval(monthlydate, -1, -36) by(permno)
      Note: both -tsegen- and -rangestat- are available from SSC.
      With Rangestat I am also having some trouble:

      warning: -1 > -36 in 2949220 observations; observations will be ignored
      no observations

      Comment


      • #4
        Right, -tsegen- does not support a -by()- option; my error. It gets the grouping information from the -xtset- panel variable. So running it without -by(permno)- is correct.

        In -rangestat-, I made another error: the interval option should be -interval(monthlydate, -36, -1)- because -36 < -1. By the way, -rangestat- does have a -by()- option, and you definitely need it here.

        The reason you are getting results starting at the third observation in each group is that neither-tsegen- nor -rangestat- requires the full window be present to do the calculations. So in the third month, you are getting the standard deviation (or variance) for months 1 and 2. In the fourth month, you are getting the standard deviation (resp. variance) for months 1, 2, and 3, etc. You can eliminate those, of course with:

        Code:
        by permno (monthlydate), sort: replace variance_ret = . if _n < 37 // SIMILARLY FOR std_ret
        Sorry about my errors on -tsegen- and -rangestat-.

        Comment


        • #5
          Thanks Professor Schechter.

          Comment


          • #6
            Clyde Schechter I had the same query and ran the same code:
            tsegen sd_ret = rowsd(L(1/36).ret) Only the first two months of std was missing for each firm. I do not understand how it could compute the std of the first 36 months without having the returns information for those months.

            Comment


            • #7
              -tsegen- will calculate the requested statistic incorporating whatever observations within the specified lags/leads is available. It does not check whether some of them are missing or not. So if you specify L(1/36) but the first two months have missing values or are non-existent, then you will get a standard deviation based on the 34 months that are available. That's what it does. If you want results only when all 36 months of data are present, you have to check for that separately and remove the unwanted results based on incomplete data.

              Comment

              Working...
              X