Dear Stata list community,
I have been computing a moving weighted average in Excel, but I am sure there is a more efficient (and less error-prone) way to do this in Stata; advice on the matter will be, as always, very much appreciated. I had a look at https://goo.gl/PkW4mI and https://goo.gl/Jg27wd, where Nick Cox proposed three alternatives: egen with filter(), a hand-written average of leads and lags, and mvsumm.
I have a multiple-record-per-subject data set: each subject ("id") was observed annually ("duration") from 1998 - 2010 (i.e., panel or time-series cross-section data). There are 50 subjects. The variables are:
"year_failure": the year in which the dependent variable ("failure") was coded "1"
"failure": binary (dependent) variable
"v1": numeric covariate
"v1_year_failure": the value of "v1" for a given subject in the year when "failure" was coded "1"
The formula that I want to use to calculate the weighted moving average is as follows:
abswma1v1 = ABS( [(most recent "v1_year_failure") + (average of previous "v1_year_failure")] / 2 − v1 )
Note that in years 1998, 2000, 2004 and 2010 more than one subject experienced the event (i.e., tied failures). I want tied failures to contribute the average of their v1_year_failure values. I computed this in Excel ("averageties") and placed the resulting average in the last cell of the tied events; e.g., subjects 2, 37 and 47 experienced the event in 1998, and the average of their corresponding values (v1_year_failure) was placed under "averageties".
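The tie averaging described above might be sketched in Stata along these lines. This is only a sketch under assumptions: the variable names yearmean and averageties_chk are made up for illustration, and the event rows are typed in by hand from the spreadsheet.

```stata
* Sketch only: yearmean and averageties_chk are hypothetical names
clear
input int year_failure byte(failure id) float v1_year_failure
1996 1 5 35.22383
1998 1 2 47.81606
1998 1 37 60.95007
1998 1 47 57.08948
1999 1 19 69.31047
2000 1 6 7.001259
2000 1 11 88.99662
2000 1 28 27.49802
2004 1 26 24.74232
2004 1 45 65.38895
2006 1 39 69.51917
2007 1 31 80.26433
2008 1 22 69.11573
2010 1 3 4.534889
2010 1 30 49.00176
2011 1 8 76.37936
end

* mean of v1_year_failure within each failure year
* (equal to the value itself when the year has a single failure)
bysort year_failure (id): egen double yearmean = mean(v1_year_failure)

* mimic the spreadsheet: show the mean only on the last row of each tied year
bysort year_failure (id): generate double averageties_chk = yearmean if _n == _N & _N > 1

list year_failure id v1_year_failure averageties_chk, sepby(year_failure)
```

Sorting on id within year is an assumption made so that "the last cell of the tied events" is well defined; any other within-year order would work as long as it is fixed.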
To compute the absolute weighted moving average for subject 1 (identified by the variable "id_1"), I manually computed in Excel a moving average of "v1_year_failure" in a variable called "wma1", as follows:
in 1996: I arbitrarily assigned 0
in 1997: the value of v1_year_failure in the year before (1996), i.e., 35.22383
in 1998: the value of v1_year_failure in the previous years (1996-1997), i.e., 35.22383
in 1999: based on the previous years (1996-1998); there was no event in 1997, and in 1998 three subjects experienced the event, so the 3 tied events from 1998 contribute as the average of "v1_year_failure" in 1998:
[ (v1_year_failure in 1996) + (average of "v1_year_failure" in 1998) ] / 2
in 2000: based on the previous years (1996-1999); the 3 tied events from 1998 keep contributing as the average of "v1_year_failure" in 1998:
[ [ (v1_year_failure in 1996) + (average of "v1_year_failure" in 1998) ] / 2 + (v1_year_failure in 1999) ] / 2
in 2001:
[ [ (v1_year_failure in 1996) + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) ] / 3 + (average of "v1_year_failure" in 2000) ] / 2
in 2002 - 2004: same value as in 2001, because no events occurred in 2001 - 2003
in 2005:
[ [ (v1_year_failure in 1996) + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000) ] / 4 + (average of "v1_year_failure" in 2004) ] / 2
in 2006: same as in 2005, because there was no failure in 2005
in 2007:
[ [ (v1_year_failure in 1996) + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000) + (average of "v1_year_failure" in 2004) ] / 5 + (v1_year_failure in 2006) ] / 2
in 2008 - 2011: the formulae follow the same logic as above and are omitted to save space.
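The year-by-year steps above can be sketched in Stata roughly as follows. This is a sketch under stated assumptions, not a tested solution: the names m, prevmean and wma1_chk are hypothetical, tied years enter as their mean, and a failure in year t is assumed to count from year t + 1 onwards. The first year (1996) gets no row here because nothing precedes it (in the spreadsheet it was set to 0 by hand), and the 2011 failure produces an extra row for 2012.

```stata
* Sketch only: m, prevmean and wma1_chk are hypothetical names
clear
input int year_failure float v1_year_failure
1996 35.22383
1998 47.81606
1998 60.95007
1998 57.08948
1999 69.31047
2000 7.001259
2000 88.99662
2000 27.49802
2004 24.74232
2004 65.38895
2006 69.51917
2007 80.26433
2008 69.11573
2010 4.534889
2010 49.00176
2011 76.37936
end

* 1. one value per failure year: tied failures enter as their mean
collapse (mean) m = v1_year_failure, by(year_failure)
sort year_failure

* 2. mean of all earlier failure years, averaged with the most recent one;
*    with a single failure year so far, use its value as is
generate double prevmean = (sum(m) - m) / (_n - 1) if _n > 1
generate double wma1_chk = cond(_n == 1, m, (m + prevmean) / 2)

* 3. a failure in year t first counts in year t + 1;
*    years without a new failure carry the last value forward
generate int duration = year_failure + 1
tsset duration
tsfill
replace wma1_chk = wma1_chk[_n-1] if missing(wma1_chk)

list duration wma1_chk, noobs
```

The resulting table has one row per year, so it could be saved and merged onto the panel by duration (for example with merge m:1), setting the unmatched first year to 0.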
Finally, I subtracted v1 from wma1 and then took the absolute value. The variable I am aiming at is abswma1v1 (the first value of "wma1v1" was arbitrarily set to 0).
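This last step is simple arithmetic; a minimal sketch on three rows copied from the spreadsheet (1997 - 1999) would be the following, where the _chk names are made up for illustration:

```stata
* Sketch only: wma1v1_chk and abswma1v1_chk are hypothetical names
clear
input int duration float(wma1 v1)
1997 35.22383 43.95443
1998 35.22383 48.30526
1999 45.25452 79.81371
end

* difference between the moving average and the covariate, then its absolute value
generate double wma1v1_chk = wma1 - v1
generate double abswma1v1_chk = abs(wma1v1_chk)

list, noobs
```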
Below is the spreadsheet I have been using to calculate this variable in Excel.

Code:
* alternatives proposed by Nick Cox in the threads linked above
egen, filter()    // alternative 1
generate movingaverage = (F1.myvar + myvar + L1.myvar) / 3    // alternative 2
mvsumm    // alternative 3
Code:
* Example generated by -dataex-. To install: ssc install dataex
* Failure events: year of failure, subject id and v1 in the year of failure
clear
input int year_failure byte(failure id) float v1_year_failure
1996 1 5 35.22383
1998 1 2 47.81606
1998 1 37 60.95007
1998 1 47 57.08948
1999 1 19 69.31047
2000 1 6 7.001259
2000 1 11 88.99662
2000 1 28 27.49802
2004 1 26 24.74232
2004 1 45 65.38895
2006 1 39 69.51917
2007 1 31 80.26433
2008 1 22 69.11573
2010 1 3 4.534889
2010 1 30 49.00176
2011 1 8 76.37936
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
* Same events, with the average of tied failures ("averageties") on the last row of each tied year
clear
input int year_failure byte(failure id) float(v1_year_failure averageties)
1996 1 5 35.22383 .
1998 1 2 47.81606 .
1998 1 37 60.95007 .
1998 1 47 57.08948 55.2852
1999 1 19 69.31047 .
2000 1 6 7.001259 .
2000 1 11 88.99662 .
2000 1 28 27.49802 41.1653
2004 1 26 24.74232 .
2004 1 45 65.38895 45.06564
2006 1 39 69.51917 .
2007 1 31 80.26433 .
2008 1 22 69.11573 .
2010 1 3 4.534889 .
2010 1 30 49.00176 26.768324
2011 1 8 76.37936 .
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
* Events (first five columns) next to the yearly records of subject 1 ("duration", "id_1") with the moving average "wma1"
clear
input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float wma1
1996 1 5 35.22383 . 1996 1 0
1998 1 2 47.81606 . 1997 1 35.22383
1998 1 37 60.95007 . 1998 1 35.22383
1998 1 47 57.08948 55.2852 1999 1 45.25452
1999 1 19 69.31047 . 2000 1 57.28249
2000 1 6 7.001259 . 2001 1 47.21923
2000 1 11 88.99662 . 2002 1 47.21923
2000 1 28 27.49802 41.1653 2003 1 47.21923
2004 1 26 24.74232 . 2004 1 47.21923
2004 1 45 65.38895 45.06564 2005 1 47.65592
2006 1 39 69.51917 . 2006 1 47.65592
2007 1 31 80.26433 . 2007 1 59.36463
2008 1 22 69.11573 . 2008 1 66.429634
2010 1 3 4.534889 . 2009 1 62.83172
2010 1 30 49.00176 26.768324 2010 1 62.83172
2011 1 8 76.37936 . 2011 1 42.44352
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
* As above, plus "v1", the difference "wma1v1" = wma1 - v1 and its absolute value "abswma1v1"
clear
input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float(wma1 v1 wma1v1 abswma1v1)
1996 1 5 35.22383 . 1996 1 0 43.95443 0 0
1998 1 2 47.81606 . 1997 1 35.22383 43.95443 -8.7306 8.7306
1998 1 37 60.95007 . 1998 1 35.22383 48.30526 -13.08143 13.08143
1998 1 47 57.08948 55.2852 1999 1 45.25452 79.81371 -34.55919 34.55919
1999 1 19 69.31047 . 2000 1 57.28249 79.81371 -22.53122 22.53122
2000 1 6 7.001259 . 2001 1 47.21923 79.81371 -32.59448 32.59448
2000 1 11 88.99662 . 2002 1 47.21923 79.81371 -32.59448 32.59448
2000 1 28 27.49802 41.1653 2003 1 47.21923 48.30526 -1.08603 1.08603
2004 1 26 24.74232 . 2004 1 47.21923 42.45374 4.76549 4.76549
2004 1 45 65.38895 45.06564 2005 1 47.65592 42.45374 5.20218 5.20218
2006 1 39 69.51917 . 2006 1 47.65592 42.45374 5.20218 5.20218
2007 1 31 80.26433 . 2007 1 59.36463 41.5652 17.79943 17.79943
2008 1 22 69.11573 . 2008 1 66.429634 41.96951 24.46012 24.46012
2010 1 3 4.534889 . 2009 1 62.83172 41.65173 21.17999 21.17999
2010 1 30 49.00176 26.768324 2010 1 62.83172 37.04457 25.78715 25.78715
2011 1 8 76.37936 . 2011 1 42.44352 37.04457 5.39895 5.39895
end