Dear Stata list users:
I need to create a moving-average over 5 years for a (ratio-)variable in my panel data time series for each company and each year.
My panel data is sorted by year (time variable) and company_id (panel variable) and declared as a time series with xtset.
At this, the rolling average over the preceding 5 years for each company in each year, should be only created if at least 3 out of the 5 needed observations are not missing.
If less than 5 observations are available the average should be computed without the number of missing observations e.g. if 3 observations (x) are available: (x1+x2+x3)/3 (instead of dividing by 5)
Here is a section of my dataset:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int year float(company_id I dummy I_mov_avg_5y I_ma_500 I_ma_5y) 2003 1 . 1 . . . 2004 1 . 1 . . . 2005 1 .2720588 . . . . 2006 1 .4888889 . . . .2720588 2007 1 .13846155 . . . .38047385 2008 1 0 . . . .2998031 2009 1 0 . . . .2248523 2010 1 0 . . .17988186 .17988186 2011 1 0 . . .12547009 .12547009 2012 1 . 1 . .02769231 .02769231 end format %ty year
These are the commands, which I used trying to build the moving average.
1. Generating of the "ratio variable" R&D Expenditures/Sales (I)
gen I = RnD_expense/net_sales, after(company_name )
2. "First try" using tssmooth
tssmooth ma I_ma_5y = I, window(5 0 0)
Problem here : moving average is always executed,even if 3 (or more) missing observations exist
3. "Second try"
by company_id: generate I_ma_500 = (L5.I + L4.I + L3.I + L2.I + L1.I)/ 5
Problem here: This command only executes if exactly 5 observations in a sequence are available;
if there is a break in the sequence (due to a missing observation) the moving average is not calculated
(even if e.g. 4 remaining observations exist).
Hence I tried to modify, this command by first creating a dummy which indicates a missing value of the variable I (shows1 for missing value)
and then setting the condition, that if the sum of the preceding 5 values of the dummy variable isn't higher than 2 this means, that at least 3 observations are available;
Generating of dummy:
gen dummy =1 if missing(I)
by company_id: generate I_mov_avg_5y = (L5.I + L4.I + L3.I + L2.I + L1.I)/ 5 ///
if (L5.dummy + L4.dummy+ L3.dummy + L2.dummy + L1.dummy)<=2
Problem with this command: all observations of the moving average took the value "."
I also tried to use commands like mdesc or asrol, but couldn't quite figure out, how to
maybe solve this problem with the help of these two commands.
Thus i would appreciate, if some of you could help me in finding a solution.
Thank you very much in advance,
Tamer
Comment