Hi, I am trying to calculate Moving Average in my panel data. I am a bit confused between various stata packages such as tsegen, tssmooth and rangestat. At present, I am using rangestat using the following code:
I have the unbalanced panel of around 3 million observations.
xtset gvkey date
rangestat (mean) mean_class = class, interval(dom -15 -1) by(gvkey)
gen class_delta2 = ln(class+1) - ln(mean_class+1)
replace class_delta2 = 0 if class_delta2 == .
I get the following result:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double gvkey float date double class float(dom class_delta1) double mean_class float class_delta2
1004 18998 1 6 0 7.614545454545454 -1.460305
1004 19002 1 10 0 7.592369477911647 -1.4577273
1004 19005 2 13 0 8.172932330827068 -1.1176447
1004 19008 1 16 0 8.637785800240675 -1.5725442
1004 19010 2 18 0 8.895432692307692 -1.193461
1004 19015 2 23 0 9.40621266427718 -1.2437907
1004 19023 1 31 0 9.390361445783132 -1.6477314
1004 19024 1 1 0 . 0
1004 19026 1 3 0 7.743119266055046 -1.47512
1004 19029 1 6 0 7.614545454545454 -1.460305
1004 19033 1 10 0 7.592369477911647 -1.4577273
1004 19038 3 15 0 8.452319587628866 -.8599658
1004 19040 5 17 0 8.78820697954272 -.4894188
1004 19043 1 20 0 9.221419975932612 -1.6313384
1004 19044 1 21 0 9.277844311377246 -1.6368433
1004 19045 1 22 0 9.342925659472423 -1.6431556
1004 19046 1 23 0 9.40621266427718 -1.649256
1004 19047 3 24 .6931472 9.657517899761336 -.9799712
1004 19050 1 27 0 9.767220902612827 -1.6833593
1004 19051 1 28 0 9.730310262529834 -1.6799253
1004 19053 1 1 0 . 0
1004 19054 1 2 0 7.6 -1.458615
1004 19058 1 6 0 7.614545454545454 -1.460305
1004 19061 3 9 0 7.497737556561086 -.7535056
1004 19062 1 10 -.6931472 7.592369477911647 -1.4577273
1004 19064 2 12 0 7.891268533772652 -1.0864575
1004 19070 1 18 0 8.895432692307692 -1.598926
1004 19071 4 19 .9162908 9.040767386091128 -.6972156
1004 19072 5 20 .18232156 9.221419975932612 -.53272605
1004 19073 8 21 .4054651 9.277844311377246 -.13276596
end
format %td date
Problem: I am not sure whether I am using the right package or not. I just need to calculate the average of previous 15 days, what happens to the last observations in the sample? How their average will be calculated? How to deal with the missing values and dates while calculating the average?
Your time and advise highly appreciated.
I have the unbalanced panel of around 3 million observations.
xtset gvkey date
rangestat (mean) mean_class = class, interval(dom -15 -1) by(gvkey)
gen class_delta2 = ln(class+1) - ln(mean_class+1)
replace class_delta2 = 0 if class_delta2 == .
I get the following result:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double gvkey float date double class float(dom class_delta1) double mean_class float class_delta2
1004 18998 1 6 0 7.614545454545454 -1.460305
1004 19002 1 10 0 7.592369477911647 -1.4577273
1004 19005 2 13 0 8.172932330827068 -1.1176447
1004 19008 1 16 0 8.637785800240675 -1.5725442
1004 19010 2 18 0 8.895432692307692 -1.193461
1004 19015 2 23 0 9.40621266427718 -1.2437907
1004 19023 1 31 0 9.390361445783132 -1.6477314
1004 19024 1 1 0 . 0
1004 19026 1 3 0 7.743119266055046 -1.47512
1004 19029 1 6 0 7.614545454545454 -1.460305
1004 19033 1 10 0 7.592369477911647 -1.4577273
1004 19038 3 15 0 8.452319587628866 -.8599658
1004 19040 5 17 0 8.78820697954272 -.4894188
1004 19043 1 20 0 9.221419975932612 -1.6313384
1004 19044 1 21 0 9.277844311377246 -1.6368433
1004 19045 1 22 0 9.342925659472423 -1.6431556
1004 19046 1 23 0 9.40621266427718 -1.649256
1004 19047 3 24 .6931472 9.657517899761336 -.9799712
1004 19050 1 27 0 9.767220902612827 -1.6833593
1004 19051 1 28 0 9.730310262529834 -1.6799253
1004 19053 1 1 0 . 0
1004 19054 1 2 0 7.6 -1.458615
1004 19058 1 6 0 7.614545454545454 -1.460305
1004 19061 3 9 0 7.497737556561086 -.7535056
1004 19062 1 10 -.6931472 7.592369477911647 -1.4577273
1004 19064 2 12 0 7.891268533772652 -1.0864575
1004 19070 1 18 0 8.895432692307692 -1.598926
1004 19071 4 19 .9162908 9.040767386091128 -.6972156
1004 19072 5 20 .18232156 9.221419975932612 -.53272605
1004 19073 8 21 .4054651 9.277844311377246 -.13276596
end
format %td date
Problem: I am not sure whether I am using the right package or not. I just need to calculate the average of previous 15 days, what happens to the last observations in the sample? How their average will be calculated? How to deal with the missing values and dates while calculating the average?
Your time and advise highly appreciated.
Comment