  • Standard deviation of returns in Stata

    Hi all,

    I am trying to calculate the standard deviation of returns in Stata and have come up with the following code:

    Code:
    rangestat (mean) return, interval (n_obs 0 3047)
    gen diff=return - return_mean
    gen diff2 = (diff^2)
    by company_id (bcal_date): gen sumdiff2 = sum(diff2)
    gen var= (sumdiff2/3047)
    gen sd= sqrt(var)


    I am using this instead of the following code because, when I use the L operators, only observations beyond the larger lag (3048) show a value for the variance, and I do not understand why:

    Code:
    // CALCULATE RUNNING TOTALS OF RETURN AND RETURN^2

    by company_id (bcal_date), sort: gen sum_ret = sum(return)
    by company_id (bcal_date): gen sum_ret_sq = sum(return^2)

    // AND A RUNNING COUNT OF NON-MISSING OBSERVATIONS
    by company_id (bcal_date): gen int n_obs = sum(!missing(return))

    sort company_id bcal_date
    // NOW CALCULATE RUNNING STANDARD DEVIATIONS
    gen variance = (L1.sum_ret_sq-L3048.sum_ret_sq)/(L1.n_obs-L3048.n_obs)-((L1.sum_ret - L3048.sum_ret)/(L1.n_obs-L3048.n_obs))^2
    gen sd = sqrt(variance)

    Will my version of the code also correctly give me the standard deviation I am looking for? If not, could someone explain what the L numbers mean and how I should set them in the second block of code so that the variance, and ultimately the standard deviation, is filled in for all observations?

    Thanks so much!

    Cen
    Last edited by Cen Sophia; 28 Jul 2023, 07:23. Reason: Needed to add tags

  • #2
    You do not show your -xtset- command, but I'll guess that it was -xtset company_id bcal_date-. If that is the case, then L1.x refers to the value of variable x on the business calendar date immediately preceding the business calendar date of the current observation. Similarly, L3048.x is the value of the variable x 3048 business days before the current observation. By convention, if there is no observation with the date that the L operator is looking for, the result is treated as a missing value. As a consequence, the formula you show will produce a missing value for an observation whenever the data set contains no observation dated exactly 1 or exactly 3048 business days earlier. In particular, any observation less than 3048 business days after a company's first date can never have an L3048 value, which is why the variance only appears after that point.
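    To see this convention in action, here is a small sketch on made-up data. The variables t and x are hypothetical, and -tsset- on a single series stands in for -xtset company_id bcal_date-. Note the deliberate gap at t = 4:

```stata
* Hypothetical toy series with a gap: there is no observation at t = 4
clear
input t x
1 10
2 20
3 30
5 50
6 60
end
tsset t

* L1.x is the value one period earlier: missing at t = 1 (nothing earlier)
* and at t = 5 (no observation at t = 4)
generate x_lag1 = L1.x

* L3.x is missing whenever t - 3 is not in the data: t = 1, 2, 3 here
generate x_lag3 = L3.x

list
```

    Scaled up, the same logic makes L3048.x missing for every observation whose date minus 3048 business days is not present in that company's data.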

    The formula given is one way of calculating a standard deviation. But the post you took that from is a rather old one and predates the availability of -rangestat-. It is much easier to get the standard deviation using -rangestat-, which, it appears, you have already installed.
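    For reference, the older formula is the running-sum version of the identity Var(r) = E(r^2) - [E(r)]^2. Writing S_t, Q_t and N_t (notation introduced here, not from the original posts) for the running totals sum_ret, sum_ret_sq and n_obs at business date t, the L1 and L3048 terms cover the window from t - 3047 to t - 1:

```latex
n_t = N_{t-1} - N_{t-3048}, \qquad
\bar r_t = \frac{S_{t-1} - S_{t-3048}}{n_t}, \qquad
\widehat{\operatorname{Var}}_t = \frac{Q_{t-1} - Q_{t-3048}}{n_t} - \bar r_t^{\,2}
```

    Because it divides by n_t, this is the population (denominator N) variance; multiplying by n_t/(n_t - 1) gives the sample (N-1) version. If any of the lagged totals is missing, the whole expression is missing.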

    That said, I have some questions about the way you are using -rangestat- to calculate the mean return, as it does not seem to correspond to what you are trying to do in calculating the variance. First, -rangestat (mean) return, interval (n_obs 0 3047)- does not mention company_id, so this mean pools the values of return from all companies whose n_obs falls within the specified interval. Next, n_obs is rather different from bcal_date: it is a running count of non-missing returns, not a date. And -interval(n_obs 0 3047)- looks forward 3047 from the current observation, whereas the code with the L operators looks back 3047 business days. So I'm not sure what you're actually trying to do. Perhaps you can explain more clearly exactly what you need to calculate the standard deviation of.
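    A small sketch on made-up data may make these differences concrete. It assumes -rangestat- is installed from SSC (-ssc install rangestat-) and uses a hypothetical two-company panel with a plain integer date; the window is shrunk to 2 days so the results can be checked by hand:

```stata
* Hypothetical toy panel: two companies, four dates each
clear
input company_id date ret
1 1 1
1 2 2
1 3 3
1 4 4
2 1 10
2 2 20
2 3 30
2 4 40
end

* Backward-looking window (current date and the 2 before), within company
rangestat (mean) ret, interval(date -2 0) by(company_id)
rename ret_mean mean_back

* Same window without by(): returns of both companies are pooled
rangestat (mean) ret, interval(date -2 0)
rename ret_mean mean_pooled

* Forward-looking window (current date and the 2 after), within company
rangestat (mean) ret, interval(date 0 2) by(company_id)
rename ret_mean mean_fwd

list, sepby(company_id)
```

    At date 3, company 1's own backward mean is 2, but the pooled mean is 11 because company 2's returns leak in; and the forward window at date 2 averages dates 2 to 4 rather than looking back.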



    • #3
      Hi Clyde, thank you so much for this response. I understand what you mean by L1.x and L3048.x now.

      I am currently doing an event study of four events (the last four national elections), studying their effect on stock market returns and volatility. I have completed the returns side and am now trying to make sense of the volatility side. I am trying to calculate the volatility for each of my 10 companies, then estimate a GARCH model and attempt to estimate the 'abnormal' volatility inside the event window compared with the estimation window.

      With my rangestat command, I was attempting to calculate the standard deviation of each company's returns within a certain timeframe. The command produced results very similar to the code above, but the numbers weren't 100% accurate. How would you suggest using rangestat so that I generate an accurate standard deviation for each company? Alternatively, how would you suggest inputting a variable that allows L.3048x to be backward looking instead of forward?

      Apologies if this is a simple question; I am fairly new to Stata and still finding my way around it. If you have any further advice, I would be ever so grateful.

      Thank you,

      Cen


      Last edited by Cen Sophia; 28 Jul 2023, 12:25. Reason: I needed to amend my question.



      • #4
        Quote: "but the numbers weren't 100% accurate"
        That is, to say the very least, strange. -rangestat- has been around a long time now and was written by some of the best Stata programmers in the world. I use it almost every day in my work and have never encountered this kind of problem with it. Of course, there might still be a bug, but I think it is very unlikely. Can you show example data where it gives the wrong numbers? (Use the -dataex- command to do that.*) I think it is more likely that you are not using it quite correctly.

        There are a couple of issues to note with it. It calculates the standard deviation over the interval you specify, even if there isn't always a complete set of data for that interval. That is, unlike with the L operators, if you specify -interval(date -3047 0)- you get the standard deviation calculated on whatever values of date fall between 3047 days before and the current date, even if there is only one such date. So if you want to restrict to observations where there is some minimum number of dates in the interval, you should also ask -rangestat- for the count of observations and then get rid of the ones where there aren't enough. For example:
        Code:
        rangestat (sd) return (count) return, by(company_id) interval(date -3047 0)
        replace return_sd = . if return_count < 1000
        will give you a result for every observation where the company has data on at least 1000 dates between 3047 days before and the current date.
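        Here is a scaled-down sketch of the same idea on made-up data (hypothetical variables; -rangestat- from SSC assumed installed), with a 2-day look-back and a minimum of 3 observations standing in for 3047 and 1000. Date 4 is deliberately absent:

```stata
* Hypothetical toy panel: one company, no observation at date 4
clear
input company_id date ret
1 1 1
1 2 3
1 3 2
1 5 6
1 6 4
end

* SD and count over [date - 2, date], using whatever observations fall in it
rangestat (sd) ret (count) ret, by(company_id) interval(date -2 0)

* Keep the SD only where the window holds at least 3 observations
replace ret_sd = . if ret_count < 3

list
```

        Only date 3 keeps a value (an SD of 1); at dates 5 and 6 the gap at date 4 leaves just two observations in each window, so those results are discarded.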

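        There is also the issue of whether the standard deviation is calculated with denominator N (population) or N-1 (sample estimate). -rangestat-, like -summarize-, uses the N-1 denominator, whereas the formula with the L operators in #1 divides by N, so the two will not agree exactly even over identical windows.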
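        The sketch below (made-up data, hypothetical variable names, -rangestat- from SSC assumed installed) shows both reasons the results can differ slightly: the denominator, and the window. The L-operator formula covers t-3047 to t-1, which excludes the current date, so in a 3-day analogue using L1 and L4 the matching -rangestat- window is -interval(date -3 -1)-; rescaling its N-1 variance by (n-1)/n then reproduces the N-denominator result:

```stata
* Hypothetical toy panel: one company, five consecutive dates
clear
input company_id date ret
1 1 1
1 2 3
1 3 2
1 4 6
1 5 4
end
xtset company_id date

* Running totals, as in the code in #1
bysort company_id (date): generate sum_ret = sum(ret)
bysort company_id (date): generate sum_ret_sq = sum(ret^2)
bysort company_id (date): generate n_obs = sum(!missing(ret))

* L-operator formula over t-3 to t-1 (L1 and L4 play the roles of L1 and L3048);
* this is the N-denominator variance
generate double n_win = L1.n_obs - L4.n_obs
generate double var_L = (L1.sum_ret_sq - L4.sum_ret_sq)/n_win - ((L1.sum_ret - L4.sum_ret)/n_win)^2

* rangestat over the same window, excluding the current date
rangestat (sd) ret (count) ret, by(company_id) interval(date -3 -1)

* Convert rangestat's N-1 variance to the N-denominator version
generate double var_rs = ret_sd^2 * (ret_count - 1) / ret_count

list date var_L var_rs ret_count
```

        At date 5 both give 26/9 (about 2.889). At dates 3 and 4 -rangestat- still reports a value from the incomplete window, while the missing L4 terms make var_L missing.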

        Quote: "How would you suggest inputting a variable that allows L.3048x to be backward looking instead of forward?"
        The L operators do look backward. Also, it's L3048.x, not L.3048x. If you want to use these operators to look forward, use F instead of L (for example, F3048.x).
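        A minimal sketch of the difference between the two operators, on a hypothetical five-period series:

```stata
* Hypothetical toy series
clear
set obs 5
generate t = _n
tsset t
generate x = 10 * t

generate x_lag2 = L2.x    // value 2 periods earlier: missing for t = 1, 2
generate x_lead2 = F2.x   // value 2 periods later: missing for t = 4, 5

list
```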

        *If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



        • #5
          Hi Clyde,

          I managed to get the variance. Your explanations were very clear and have helped me tremendously; the rangestat code was much easier for me to follow. Thank you so much!
