Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data: From quarterly to multi-year returns

    I'm writing my thesis on the relationship between a firm characteristic (let's say size) and subsequent 3 year stock returns. I have unbalanced panel data with gaps and firm characteristic data is in a separate data file (I will merge the two files later on). The data shown below contains quarterly returns of a number of firms. For each quarter, I want to construct two new variables: The first is the annualized return over the subsequent 3 years. Thus, for a given quarter and a firm on which I have complete data and no missing values, I want to take that quarter's return plus the sum of the 11 subsequent quarter returns and divide this by 3 to get the corresponding annualized 3-year holding period return. In the second variable, I want to show the annualized standard deviation of this return. How to do so?

    Given that some firms go bankrupt, merge or get acquired, I often have less than 12 quarters of returns per stock. I can't exclude these firms from the analysis because this creates a bias in the data. Furthermore, for some stocks, there is some missing data, in which I sometimes miss a few quarters of the required 20. How can I properly deal with this issue and come up with annualized returns anyway?

    (In my data file the qdate shows as 2012q1 for example, here it is in numbers)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long gvkey float qdate double logreturn
    1001  94                   0
    1001  95   .1686227098107338
    1001  96 -.12861737795174122
    1001  97 -.10318423807621002
    1001  98 -.21772348880767822
    1001  99 -.20972052961587906
    1001 100   .3829922750592232
    1001 101  .16705408319830894
    1001 102  .20763936638832092
    1001 103  .23556607961654663
    1001 104    .190518319606781
    1001 105  .20252425968647003
    1001 106 -.06899286806583405
    1003  95                   0
    1003  96 -.04879016429185867
    1003  97  -.2231435477733612
    1003  98 -.20763938128948212
    1003  99 -.16705410182476044
    1003 100  .08701137453317642
    1003 101  .15415067225694656
    1003 102   .3566749393939972
    1003 103   -.133531391620636
    1003 104 -.08961215987801552
    1003 105   .0606246180832386
    1003 106  .11122563481330872
    1003 107 -.11122563481330872
    1003 108                   0
    1003 109  .05715841054916382
    1003 110  .10536051541566849
    1003 111   -.597837008535862
    1003 112                   0
    1003 113  .31015492230653763
    1003 114 -.06899286806583405
    1003 115  -.6931471824645996
    1003 116                   0
    1003 117  -.8472978472709656
    1003 118 -.40546509623527527
    1003 119 -1.3862943947315216
    end
    format %tq qdate
    The reason I want to do this: If I manage to get annualized x-year holding period returns for any given stock for any given quarter, I will later merge this file with the firm characteristics file by gvkey (firm identifier) and qdate. Then, for each (or some) quarters, I want to form decile portfolios based on the firm characteristic and see if there are any return differentials.

    I am quite unskilled in stata and studied many topics on related issues in the past weeks, but can't figure out what I should do. Any input is much appreciated.



  • #2
    I have seen different approaches to calculating returns, or cumulative returns on this forum. As I don't work in finance, when responding to a question involving returns, I usually follow the approach the original person posting describes. I'm going to depart from that here and use a different approach to calculating annualized returns over a three year horizon. Summing quarterly returns doesn't make sense to me, nor dividing them by three at the end Moreover, your data don't have the quarterly returns in them: you have log quarterly returns. Perhaps it is those that you were planning to add and then divide by three. That makes more sense, and is what I will code for.

    If there were no missing data, this would be the same as just averaging all 12 quarterly log returns in the three year window, then multiplying by 4 to go from quarterly to annual average log return.. Fortunately, means are relatively robust to some missing values thrown in, if you are willing to assume that the missingness is itself uninformative. So the same thing works:

    Code:
    rangestat (mean) avg_quarterly_log_return = logreturn, by(gvkey) interval(qdate 0 12)
    gen avg_annual_log_return = 4*avg_quarterly_log_return
    gen annualized_return = exp(avg_annual_log_return)-1
    Note: -rangestat- is not part of official Stata. It was written by Robert Picard, Roberto Ferrer, and Nick Cox. You can get it by running -ssc install rangestat-.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      I have seen different approaches to calculating returns, or cumulative returns on this forum. As I don't work in finance, when responding to a question involving returns, I usually follow the approach the original person posting describes. I'm going to depart from that here and use a different approach to calculating annualized returns over a three year horizon. Summing quarterly returns doesn't make sense to me, nor dividing them by three at the end Moreover, your data don't have the quarterly returns in them: you have log quarterly returns. Perhaps it is those that you were planning to add and then divide by three. That makes more sense, and is what I will code for.

      If there were no missing data, this would be the same as just averaging all 12 quarterly log returns in the three year window, then multiplying by 4 to go from quarterly to annual average log return.. Fortunately, means are relatively robust to some missing values thrown in, if you are willing to assume that the missingness is itself uninformative. So the same thing works:

      Code:
      rangestat (mean) avg_quarterly_log_return = logreturn, by(gvkey) interval(qdate 0 12)
      gen avg_annual_log_return = 4*avg_quarterly_log_return
      gen annualized_return = exp(avg_annual_log_return)-1
      Note: -rangestat- is not part of official Stata. It was written by Robert Picard, Roberto Ferrer, and Nick Cox. You can get it by running -ssc install rangestat-.
      I was indeed planning to sum and average log returns instead of just normal returns. The code works perfectly, but I noticed that that using a 0-12 interval takes 13 quarters into account, thus (qdate 0 11) should do it for me. I see that rangestat has variance/stdev built in as well which is great. Thanks a lot!

      Comment


      • #4
        Yes, that was an error on my part to say interval(qdate 0 12). You are quite correct that it should be interval(qdate 0 11) to capture a total of 12 quarters.

        You will find that -rangestat- can do just about anything that requires doing some kind of statistics over a "window." But, surprisingly, it has many other applications that are not obviously designed in that way.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Yes, that was an error on my part to say interval(qdate 0 12). You are quite correct that it should be interval(qdate 0 11) to capture a total of 12 quarters.

          You will find that -rangestat- can do just about anything that requires doing some kind of statistics over a "window." But, surprisingly, it has many other applications that are not obviously designed in that way.
          Yes, I tried some different options and I see that it can be quite useful later on in my analysis. Interesting package indeed.

          Comment

          Working...
          X