I'm writing my thesis on the relationship between a firm characteristic (let's say size) and subsequent 3 year stock returns. I have unbalanced panel data with gaps and firm characteristic data is in a separate data file (I will merge the two files later on). The data shown below contains quarterly returns of a number of firms. For each quarter, I want to construct two new variables: The first is the annualized return over the subsequent 3 years. Thus, for a given quarter and a firm on which I have complete data and no missing values, I want to take that quarter's return plus the sum of the 11 subsequent quarter returns and divide this by 3 to get the corresponding annualized 3-year holding period return. In the second variable, I want to show the annualized standard deviation of this return. How to do so?
Given that some firms go bankrupt, merge or get acquired, I often have less than 12 quarters of returns per stock. I can't exclude these firms from the analysis because this creates a bias in the data. Furthermore, for some stocks, there is some missing data, in which I sometimes miss a few quarters of the required 20. How can I properly deal with this issue and come up with annualized returns anyway?
(In my data file the qdate shows as 2012q1 for example, here it is in numbers)
The reason I want to do this: If I manage to get annualized x-year holding period returns for any given stock for any given quarter, I will later merge this file with the firm characteristics file by gvkey (firm identifier) and qdate. Then, for each (or some) quarters, I want to form decile portfolios based on the firm characteristic and see if there are any return differentials.
I am quite unskilled in stata and studied many topics on related issues in the past weeks, but can't figure out what I should do. Any input is much appreciated.
Given that some firms go bankrupt, merge or get acquired, I often have less than 12 quarters of returns per stock. I can't exclude these firms from the analysis because this creates a bias in the data. Furthermore, for some stocks, there is some missing data, in which I sometimes miss a few quarters of the required 20. How can I properly deal with this issue and come up with annualized returns anyway?
(In my data file the qdate shows as 2012q1 for example, here it is in numbers)
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long gvkey float qdate double logreturn 1001 94 0 1001 95 .1686227098107338 1001 96 -.12861737795174122 1001 97 -.10318423807621002 1001 98 -.21772348880767822 1001 99 -.20972052961587906 1001 100 .3829922750592232 1001 101 .16705408319830894 1001 102 .20763936638832092 1001 103 .23556607961654663 1001 104 .190518319606781 1001 105 .20252425968647003 1001 106 -.06899286806583405 1003 95 0 1003 96 -.04879016429185867 1003 97 -.2231435477733612 1003 98 -.20763938128948212 1003 99 -.16705410182476044 1003 100 .08701137453317642 1003 101 .15415067225694656 1003 102 .3566749393939972 1003 103 -.133531391620636 1003 104 -.08961215987801552 1003 105 .0606246180832386 1003 106 .11122563481330872 1003 107 -.11122563481330872 1003 108 0 1003 109 .05715841054916382 1003 110 .10536051541566849 1003 111 -.597837008535862 1003 112 0 1003 113 .31015492230653763 1003 114 -.06899286806583405 1003 115 -.6931471824645996 1003 116 0 1003 117 -.8472978472709656 1003 118 -.40546509623527527 1003 119 -1.3862943947315216 end format %tq qdate
I am quite unskilled in stata and studied many topics on related issues in the past weeks, but can't figure out what I should do. Any input is much appreciated.
Comment