weighted moving average panel data

Victor Cruz

Join Date: Jun 2014

Posts: 54
#1

weighted moving average panel data

20 Sep 2016, 13:18

Dear Stata list community,

I have been computing a moving weighted average in excel, but I am sure there is a more efficient (and less error prone) way to do this in Stata; advice on the matter will be as always very much appreciated. I had a look in https://goo.gl/PkW4mI & https://goo.gl/Jg27wd, where Nick Cox proposed:

Code:

egen, filter() ///alternative 1 generate movingaverage = (F1.myvar + myvar + L1.myvar) / 3 ///alternative 2 mvsumm /// alternative 3

I have a multiple-record-per-subject data set; each subject ("id") was observed annually ("duration") from 1998 - 2010; a.k.a. panel or Time Series Cross Section data). There are 50 subjects.
"year_failure" specifies the year in which the dependent variable ("failure") was coded "1"
"failure" binary (dependent) variable
"v1" nummeric covariate
"v1_year_failure" value of "v1" for a given subject, when "failure" was coded as "1". the value of v1_year_failure corresponds to time when the variable "failure" was coded "1".

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float v1_year_failure 1996 1 5 35.22383 1998 1 2 47.81606 1998 1 37 60.95007 1998 1 47 57.08948 1999 1 19 69.31047 2000 1 6 7.001259 2000 1 11 88.99662 2000 1 28 27.49802 2004 1 26 24.74232 2004 1 45 65.38895 2006 1 39 69.51917 2007 1 31 80.26433 2008 1 22 69.11573 2010 1 3 4.534889 2010 1 30 49.00176 2011 1 8 76.37936 end

The formula that I want to use to calculate the weighted moving average is as follows:

abswma1v1 = ABS([(MostRecent "v1_year_failure" + average of previous "v1_year_failure"/2] − v1)

note that in years 1998, 2000, 2004 and 2010, more than one subjects experienced the event (aka tied failures). From those tied failures, I want them to contribute the average of v1_year_failure. I computed this in excel ("averageties") and placed the resulting average in the last cell of the tied events, e.g., subjects 2, 37 & 47 experienced the event in 1998. The average of their corresponding values (v1_year_failure) was placed under "averageties".

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float(v1_year_failure averageties) 1996 1 5 35.22383 . 1998 1 2 47.81606 . 1998 1 37 60.95007 . 1998 1 47 57.08948 55.2852 1999 1 19 69.31047 . 2000 1 6 7.001259 . 2000 1 11 88.99662 . 2000 1 28 27.49802 41.1653 2004 1 26 24.74232 . 2004 1 45 65.38895 45.06564 2006 1 39 69.51917 . 2007 1 31 80.26433 . 2008 1 22 69.11573 . 2010 1 3 4.534889 . 2010 1 30 49.00176 26.768324 2011 1 8 76.37936 . end

in order to compute the absolute weighted moving average for subject one (identiied with var "id_1), in excel I manually computed a moving average of "v1_year_failure" in one variable called "wma1" as follows:

in year 1996, I arbitrarily assigned 0
in 1997, the value of v1_year_failure in the year before (1996), i.e., 35.22383
in 1998, the value of v1_year_failure in the previous years (1996-1997), i.e., 35.22383
in 1999, the value of v1_year_failure in the previous years (1996-1998); however, in 1998 three subjects experienced the event, hence, the formula would be
[(v1_year_failure in 1996 + v1_year_failure in 1997) + (average of "v1_year_failure" in 1998)]/2, i.e., the 3 tied events from 1998 contribute as the average of "v1_year_failure" in 1998
in 2000, the value of v1_year_failure in the previous years (1996-1999); the 3 tied events from 1998 keep contributing as the average of "v1_year_failure" in 1998:
[ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998)]/2 + (v1_year_failure in 1999) ]/2
in 2001: [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999)/ 3 ] + (average of "v1_year_failure" in 2000 ) ] / 2
v1_year_failure in years 2002 - 2004: same value as in 2001, because there were no events (failure = 0 during this period)
v1_year_failure in year 2005:
[ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) / 4] + (average of "v1_year_failure" in 2004) ] /2
v1_year_failure in year 2006: same as in 2005, because there was no failure in year 2005
v1_year_failure in year 2007:
[ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) + (average of "v1_year_failure" in 2004 ) / 5] + v1_year_failure in 2006 ] / 2
v1_year_failure in years 2008 - 2011: formulae follows the same logic as above, omitted to save space.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float wma1 1996 1 5 35.22383 . 1996 1 0 1998 1 2 47.81606 . 1997 1 35.22383 1998 1 37 60.95007 . 1998 1 35.22383 1998 1 47 57.08948 55.2852 1999 1 45.25452 1999 1 19 69.31047 . 2000 1 57.28249 2000 1 6 7.001259 . 2001 1 47.21923 2000 1 11 88.99662 . 2002 1 47.21923 2000 1 28 27.49802 41.1653 2003 1 47.21923 2004 1 26 24.74232 . 2004 1 47.21923 2004 1 45 65.38895 45.06564 2005 1 47.65592 2006 1 39 69.51917 . 2006 1 47.65592 2007 1 31 80.26433 . 2007 1 59.36463 2008 1 22 69.11573 . 2008 1 66.429634 2010 1 3 4.534889 . 2009 1 62.83172 2010 1 30 49.00176 26.768324 2010 1 62.83172 2011 1 8 76.37936 . 2011 1 42.44352 end

finally, I substracted v1 from wma1 and then obtained its absolute value. The variable I am aiming at is abswma1v1 (the first value of "wma1v1" was arbitrarily set to 0)

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float(wma1 v1 wma1v1 abswma1v1) 1996 1 5 35.22383 . 1996 1 0 43.95443 0 0 1998 1 2 47.81606 . 1997 1 35.22383 43.95443 -8.7306 8.7306 1998 1 37 60.95007 . 1998 1 35.22383 48.30526 -13.08143 13.08143 1998 1 47 57.08948 55.2852 1999 1 45.25452 79.81371 -34.55919 34.55919 1999 1 19 69.31047 . 2000 1 57.28249 79.81371 -22.53122 22.53122 2000 1 6 7.001259 . 2001 1 47.21923 79.81371 -32.59448 32.59448 2000 1 11 88.99662 . 2002 1 47.21923 79.81371 -32.59448 32.59448 2000 1 28 27.49802 41.1653 2003 1 47.21923 48.30526 -1.08603 1.08603 2004 1 26 24.74232 . 2004 1 47.21923 42.45374 4.76549 4.76549 2004 1 45 65.38895 45.06564 2005 1 47.65592 42.45374 5.20218 5.20218 2006 1 39 69.51917 . 2006 1 47.65592 42.45374 5.20218 5.20218 2007 1 31 80.26433 . 2007 1 59.36463 41.5652 17.79943 17.79943 2008 1 22 69.11573 . 2008 1 66.429634 41.96951 24.46012 24.46012 2010 1 3 4.534889 . 2009 1 62.83172 41.65173 21.17999 21.17999 2010 1 30 49.00176 26.768324 2010 1 62.83172 37.04457 25.78715 25.78715 2011 1 8 76.37936 . 2011 1 42.44352 37.04457 5.39895 5.39895 end

below, the spreadsheet I have been using to calculate this variable in excel

Victor Cruz
Tags: egen, moving average, panel data, weighted moving average
Victor Cruz

Join Date: Jun 2014

Posts: 54
#2

21 Sep 2016, 01:35

Originally posted by Victor Cruz View Post

Dear Stata list community,

I have been computing a moving weighted average in excel, but I am sure there is a more efficient (and less error prone) way to do this in Stata; advice on the matter will be as always very much appreciated. I had a look in https://goo.gl/PkW4mI & https://goo.gl/Jg27wd, where Nick Cox proposed:

Code:

egen, filter() ///alternative 1 generate movingaverage = (F1.myvar + myvar + L1.myvar) / 3 ///alternative 2 mvsumm /// alternative 3

I have a multiple-record-per-subject data set; each subject ("id") was observed annually ("duration") from 1998 - 2010; a.k.a. panel or Time Series Cross Section data). There are 50 subjects.
"year_failure" specifies the year in which the dependent variable ("failure") was coded "1"
"failure" binary (dependent) variable
"v1" nummeric covariate
"v1_year_failure" value of "v1" for a given subject, when "failure" was coded as "1". the value of v1_year_failure corresponds to time when the variable "failure" was coded "1".

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float v1_year_failure 1996 1 5 35.22383 1998 1 2 47.81606 1998 1 37 60.95007 1998 1 47 57.08948 1999 1 19 69.31047 2000 1 6 7.001259 2000 1 11 88.99662 2000 1 28 27.49802 2004 1 26 24.74232 2004 1 45 65.38895 2006 1 39 69.51917 2007 1 31 80.26433 2008 1 22 69.11573 2010 1 3 4.534889 2010 1 30 49.00176 2011 1 8 76.37936 end

The formula that I want to use to calculate the weighted moving average is as follows:

abswma1v1 = ABS([(MostRecent "v1_year_failure" + average of previous "v1_year_failure"/2] − v1)

note that in years 1998, 2000, 2004 and 2010, more than one subjects experienced the event (aka tied failures). From those tied failures, I want them to contribute the average of v1_year_failure. I computed this in excel ("averageties") and placed the resulting average in the last cell of the tied events, e.g., subjects 2, 37 & 47 experienced the event in 1998. The average of their corresponding values (v1_year_failure) was placed under "averageties".

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float(v1_year_failure averageties) 1996 1 5 35.22383 . 1998 1 2 47.81606 . 1998 1 37 60.95007 . 1998 1 47 57.08948 55.2852 1999 1 19 69.31047 . 2000 1 6 7.001259 . 2000 1 11 88.99662 . 2000 1 28 27.49802 41.1653 2004 1 26 24.74232 . 2004 1 45 65.38895 45.06564 2006 1 39 69.51917 . 2007 1 31 80.26433 . 2008 1 22 69.11573 . 2010 1 3 4.534889 . 2010 1 30 49.00176 26.768324 2011 1 8 76.37936 . end

in order to compute the absolute weighted moving average for subject one (identiied with var "id_1), in excel I manually computed a moving average of "v1_year_failure" in one variable called "wma1" as follows:

in year 1996, I arbitrarily assigned 0
in 1997, the value of v1_year_failure in the year before (1996), i.e., 35.22383
in 1998, the value of v1_year_failure in the previous years (1996-1997), i.e., 35.22383
in 1999, the value of v1_year_failure in the previous years (1996-1998); however, in 1998 three subjects experienced the event, hence, the formula would be
[(v1_year_failure in 1996 + v1_year_failure in 1997) + (average of "v1_year_failure" in 1998)]/2, i.e., the 3 tied events from 1998 contribute as the average of "v1_year_failure" in 1998
in 2000, the value of v1_year_failure in the previous years (1996-1999); the 3 tied events from 1998 keep contributing as the average of "v1_year_failure" in 1998:
[ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998)]/2 + (v1_year_failure in 1999) ]/2
in 2001: [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999)/ 3 ] + (average of "v1_year_failure" in 2000 ) ] / 2
v1_year_failure in years 2002 - 2004: same value as in 2001, because there were no events (failure = 0 during this period)
v1_year_failure in year 2005:
[ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) / 4] + (average of "v1_year_failure" in 2004) ] /2
v1_year_failure in year 2006: same as in 2005, because there was no failure in year 2005
v1_year_failure in year 2007:
[ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) + (average of "v1_year_failure" in 2004 ) / 5] + v1_year_failure in 2006 ] / 2
v1_year_failure in years 2008 - 2011: formulae follows the same logic as above, omitted to save space.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float wma1 1996 1 5 35.22383 . 1996 1 0 1998 1 2 47.81606 . 1997 1 35.22383 1998 1 37 60.95007 . 1998 1 35.22383 1998 1 47 57.08948 55.2852 1999 1 45.25452 1999 1 19 69.31047 . 2000 1 57.28249 2000 1 6 7.001259 . 2001 1 47.21923 2000 1 11 88.99662 . 2002 1 47.21923 2000 1 28 27.49802 41.1653 2003 1 47.21923 2004 1 26 24.74232 . 2004 1 47.21923 2004 1 45 65.38895 45.06564 2005 1 47.65592 2006 1 39 69.51917 . 2006 1 47.65592 2007 1 31 80.26433 . 2007 1 59.36463 2008 1 22 69.11573 . 2008 1 66.429634 2010 1 3 4.534889 . 2009 1 62.83172 2010 1 30 49.00176 26.768324 2010 1 62.83172 2011 1 8 76.37936 . 2011 1 42.44352 end

finally, I substracted v1 from wma1 and then obtained its absolute value. The variable I am aiming at is abswma1v1 (the first value of "wma1v1" was arbitrarily set to 0)

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float(wma1 v1 wma1v1 abswma1v1) 1996 1 5 35.22383 . 1996 1 0 43.95443 0 0 1998 1 2 47.81606 . 1997 1 35.22383 43.95443 -8.7306 8.7306 1998 1 37 60.95007 . 1998 1 35.22383 48.30526 -13.08143 13.08143 1998 1 47 57.08948 55.2852 1999 1 45.25452 79.81371 -34.55919 34.55919 1999 1 19 69.31047 . 2000 1 57.28249 79.81371 -22.53122 22.53122 2000 1 6 7.001259 . 2001 1 47.21923 79.81371 -32.59448 32.59448 2000 1 11 88.99662 . 2002 1 47.21923 79.81371 -32.59448 32.59448 2000 1 28 27.49802 41.1653 2003 1 47.21923 48.30526 -1.08603 1.08603 2004 1 26 24.74232 . 2004 1 47.21923 42.45374 4.76549 4.76549 2004 1 45 65.38895 45.06564 2005 1 47.65592 42.45374 5.20218 5.20218 2006 1 39 69.51917 . 2006 1 47.65592 42.45374 5.20218 5.20218 2007 1 31 80.26433 . 2007 1 59.36463 41.5652 17.79943 17.79943 2008 1 22 69.11573 . 2008 1 66.429634 41.96951 24.46012 24.46012 2010 1 3 4.534889 . 2009 1 62.83172 41.65173 21.17999 21.17999 2010 1 30 49.00176 26.768324 2010 1 62.83172 37.04457 25.78715 25.78715 2011 1 8 76.37936 . 2011 1 42.44352 37.04457 5.39895 5.39895 end

below, the spreadsheet I have been using to calculate this variable in excel
[ATTACH=CONFIG]n1357362[/ATTACH]

For clarity, let me summarize what I am trying to do:

I want to capture in a variable (called “absolutedistance”) the absolute value of the distance of a numeric variable (called “v1”) between a given subject (identified with var “id”; total 50 subjects) in a given year (identified with var “duration”; total 13 years) that is susceptible to get sick (to experience the “event”, captured by the dependent/outcome binary var called “sick”; coded 1 when a subject experiences the event in a given year, 0 otherwise) and the value of “v1” of all the subjects that got sick (subjects that have previously experienced the event).

The value for the subjects that previously fell sick is the subjects’ value of “v1” at the time of experiencing the event (in the year when the subjects got sick). The value of “v1” at the time of falling sick is a weighted moving average wherein the most recent subject to fall sick has the same weight as all other sick subjects. Thus, for each year in the data set, I wish to calculate the distance ( “absolutedistance”) for each healthy subject using the following formula:

“sick” can be experienced more than once, but for simplicity, let us focus on a data set where we are interested in the time until the first “sick” (obs. after the first event would be “right censored” and dropped from the analysis; the data will be analyzed with survival/duration techniques).

Now, things get more complicated when we consider that some subjects experienced the event (got sick) in the same year (let´s identify these as “ties”). “ties” contribute to the “average of v1 of all other subjects that fell sick before” with the “average of v1 of those ties”. E.g., three subjects fell sick in year 2000. Those “ties” will contribute to the “average of v1 of all other subjects that fell sick before” from year 2001 onwards with the average of v1 of those three subjects in year 2000.

I will appreciate comments on this.

Last edited by Victor Cruz; 21 Sep 2016, 01:37.

Victor Cruz
Comment
Victor Cruz

Join Date: Jun 2014

Posts: 54
#3

21 Sep 2016, 10:51

cross-posted http://stackoverflow.com/questions/3...age-stata-or-r, to see whether someone proposes a workaround in R

Victor Cruz
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

21 Sep 2016, 14:54

I'm not sure I understand all the details of your problem but the following code replicates your results for the wma1 variable. You'll need to install rangestat. To do so, type in Stata's Command window:

Code:

ssc install rangestat

I start from the last dataex listing in #1.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float(wma1 v1 wma1v1 abswma1v1)
1996 1  5 35.22383         . 1996 1         0 43.95443         0        0
1998 1  2 47.81606         . 1997 1  35.22383 43.95443   -8.7306   8.7306
1998 1 37 60.95007         . 1998 1  35.22383 48.30526 -13.08143 13.08143
1998 1 47 57.08948   55.2852 1999 1  45.25452 79.81371 -34.55919 34.55919
1999 1 19 69.31047         . 2000 1  57.28249 79.81371 -22.53122 22.53122
2000 1  6 7.001259         . 2001 1  47.21923 79.81371 -32.59448 32.59448
2000 1 11 88.99662         . 2002 1  47.21923 79.81371 -32.59448 32.59448
2000 1 28 27.49802   41.1653 2003 1  47.21923 48.30526  -1.08603  1.08603
2004 1 26 24.74232         . 2004 1  47.21923 42.45374   4.76549  4.76549
2004 1 45 65.38895  45.06564 2005 1  47.65592 42.45374   5.20218  5.20218
2006 1 39 69.51917         . 2006 1  47.65592 42.45374   5.20218  5.20218
2007 1 31 80.26433         . 2007 1  59.36463  41.5652  17.79943 17.79943
2008 1 22 69.11573         . 2008 1 66.429634 41.96951  24.46012 24.46012
2010 1  3 4.534889         . 2009 1  62.83172 41.65173  21.17999 21.17999
2010 1 30 49.00176 26.768324 2010 1  62.83172 37.04457  25.78715 25.78715
2011 1  8 76.37936         . 2011 1  42.44352 37.04457   5.39895  5.39895
end

isid year_failure id, sort

* mean failure per year_failure
by year_failure: egen mfy = mean(v1_year_failure)

* use only one mfy value per year_failure
by year_failure: gen mfy1 = mfy if _n == 1

* average mfy, from the start to 2 year before duration year
gen low = 0
gen high = duration - 2
rangestat (mean) m2 = mfy1, interval(year_failure low high)

* look-up mfy for year prior to duration year
replace low = duration - 1
replace high = low
rangestat (mean) m1 = mfy1, interval(year_failure low high)

* calculate measure
gen w = (m1 + m2) / 2

* special case for the first two obs
replace w = 0 in 1
replace w = m1 in 2

* carry forward the value of the measure and compare to wma1
replace w = w[_n-1] if mi(w)
gen diff = abs(wma1 - w)

list year_failure id v1_year_failure duration wma1 w diff, sepby(year_failure)

and the results

Code:

. list year_failure id v1_year_failure duration wma1 w diff, sepby(year_failure)

     +----------------------------------------------------------------------+
     | year_f~e   id   v1_yea~e   duration       wma1          w       diff |
     |----------------------------------------------------------------------|
  1. |     1996    5   35.22383       1996          0          0          0 |
     |----------------------------------------------------------------------|
  2. |     1998    2   47.81606       1997   35.22383   35.22383          0 |
  3. |     1998   37   60.95007       1998   35.22383   35.22383          0 |
  4. |     1998   47   57.08948       1999   45.25452   45.25452   3.81e-06 |
     |----------------------------------------------------------------------|
  5. |     1999   19   69.31047       2000   57.28249   57.28249   3.81e-06 |
     |----------------------------------------------------------------------|
  6. |     2000    6   7.001259       2001   47.21923   47.21923   3.81e-06 |
  7. |     2000   11   88.99662       2002   47.21923   47.21923   3.81e-06 |
  8. |     2000   28   27.49802       2003   47.21923   47.21923   3.81e-06 |
     |----------------------------------------------------------------------|
  9. |     2004   26   24.74232       2004   47.21923   47.21923   3.81e-06 |
 10. |     2004   45   65.38895       2005   47.65592   47.65591   3.81e-06 |
     |----------------------------------------------------------------------|
 11. |     2006   39   69.51917       2006   47.65592   47.65591   3.81e-06 |
     |----------------------------------------------------------------------|
 12. |     2007   31   80.26433       2007   59.36463   59.36463          0 |
     |----------------------------------------------------------------------|
 13. |     2008   22   69.11573       2008   66.42963   66.42963          0 |
     |----------------------------------------------------------------------|
 14. |     2010    3   4.534889       2009   62.83172   62.83172          0 |
 15. |     2010   30   49.00176       2010   62.83172   62.83172          0 |
     |----------------------------------------------------------------------|
 16. |     2011    8   76.37936       2011   42.44352   42.44352   3.81e-06 |
     +----------------------------------------------------------------------+

Comment

Victor Cruz

Join Date: Jun 2014
Posts: 54

26 Sep 2016, 02:11

It did work, Robert Picard, thanks a lot!
However, I didn't describe accurately my data, so I understand why my objective might seem confusing.
Below, the way I would use Robert's kind and effective commands to calculate "w" (the weighted moving average). I would appreciate suggestions on how to improve my rather complicated way to use Robert's cmds on my data (again, Robert's suggestion was absolutely accurate, it is my bad that I describe my data accurately).

The goal is to capture in a variable (called “absolutedistance”) the absolute value of the distance of a numeric variable (called “v1”) between a given subject (var “id”; total 50 subjects) in a given year (var “year”; total 13 years) that is susceptible to get sick (to experience the “event”, captured by var “failure”; coded 1 when a subject experiences the event in a given year, 0 otherwise) and the value of “v1” of all the subjects that got sick (subjects that have previously experienced the event).

This is the data of 5 subjects. Let's refer to this data as "5 subjects data"

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double id float(year v1 failure)
1 1996  43.95443 0
1 1997  43.95443 0
1 1998  48.30526 0
1 1999  79.81371 0
1 2000  79.81371 0
1 2001  79.81371 0
1 2002  79.81371 0
1 2003  48.30526 0
1 2004  42.45374 0
1 2005  42.45374 0
1 2006  42.45374 0
1 2007   41.5652 0
1 2008  41.96951 0
1 2009  41.65173 0
1 2010  37.04457 0
1 2011  37.04457 0
2 1996  49.05129 0
2 1997  47.60572 0
2 1998  47.81606 1
2 1999  47.81606 0
2 2000  47.75643 0
2 2001  47.75643 0
2 2002  47.53053 0
2 2003  27.47109 0
2 2004  31.24092 0
2 2005  31.24092 0
2 2006  31.24092 0
2 2007  31.24092 0
2 2008  34.31742 0
2 2009  38.41942 0
2 2010  37.38018 0
2 2011  37.38018 0
3 1996  4.051216 0
3 1997  4.051216 0
3 1998  4.051216 0
3 1999  9.530586 0
3 2000  9.530586 0
3 2001 14.325027 0
3 2002 12.609268 0
3 2003  46.97109 0
3 2004  51.31606 0
3 2005  48.43472 0
3 2006  48.43472 0
3 2007  55.63807 0
3 2008  45.78523 0
3 2009  4.534889 0
3 2010         0 1
3 2011         0 0
4 1996  61.67141 0
4 1997  47.03312 0
4 1998  47.83461 0
4 1999  47.83461 0
4 2000  45.12806 0
4 2001  45.12806 0
4 2002  44.52393 0
4 2003  44.52393 0
4 2004  43.88434 0
4 2005  43.88434 0
4 2006  43.88434 0
4 2007  71.22043 0
4 2008  71.22043 0
4 2009  71.22043 0
4 2010  71.22043 0
4 2011  71.22043 0
5 1996  35.22383 1
5 1997  45.39093 0
5 1998  44.97823 0
5 1999  88.42124 0
5 2000  87.61613 0
5 2001  87.99413 0
5 2002  87.21405 0
5 2003  87.21405 0
5 2004  49.17004 0
5 2005  49.17004 0
5 2006  48.44385 0
5 2007  48.44385 0
5 2008  47.57368 0
5 2009  47.97112 0
5 2010  47.73407 0
5 2011  47.73407 0
end

The value for the subjects that previously fell sick is the subjects’ value of “v1” at the time of experiencing the event (in the year when the subjects got sick). The value of “v1” at the time of falling sick is a weighted moving average wherein the most recent subject to fall sick has the same weight as all other sick subjects. Thus, for each year in the data set, I wish to calculate the distance ( “absolutedistance”) for each healthy subject using the following formula (from my previous post):
[ATTACH=CONFIG]n1357416[/ATTACH]
In order to obtain the first 4 columns of my previous post, I just asked Stata to keep the value of "v1" and year of subjects that experienced the event and rename accordingly. Let's refer to this as "v1 when failed data"

Code:

rename (year v1) (year_failure v1_year_failure)
keep if failure==1

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double id float(year_failure v1_year_failure failure)
 2 1998  47.81606 1
 3 2010         0 1
 5 1996  35.22383 1
 6 2000  7.001259 1
 8 2011  76.37936 1
11 2000  88.99662 1
19 1999  69.31047 1
22 2008  69.11573 1
26 2004  24.74232 1
28 2000 27.498024 1
30 2010  49.00176 1
31 2007  80.26433 1
37 1998  60.95007 1
39 2006  69.51917 1
45 2004 65.388954 1
47 1998  57.08948 1
end

Given that the weighted moving average (called "wma1" by me in the previous post, "w" by Robert) would be the same for all subjects, what I could do with Robert's commands above is to "merge" ("pasted", for which I had to change the names of the vars year--changed to duration-- and id--changed to id_1) the "v1 when failed data" with the obs of only one subject from the "5 subjects data", which would end up in the data set that Robert used for his kind solution:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int year_failure byte(failure id) float v1_year_failure int duration byte id_1 float(wma1 v1)
1996 1  5 35.22383 1996 1         0 43.95443
1998 1  2 47.81606 1997 1  35.22383 43.95443
1998 1 37 60.95007 1998 1  35.22383 48.30526
1998 1 47 57.08948 1999 1  45.25452 79.81371
1999 1 19 69.31047 2000 1  57.28249 79.81371
2000 1  6 7.001259 2001 1  47.21923 79.81371
2000 1 11 88.99662 2002 1  47.21923 79.81371
2000 1 28 27.49802 2003 1  47.21923 48.30526
2004 1 26 24.74232 2004 1  47.21923 42.45374
2004 1 45 65.38895 2005 1  47.65592 42.45374
2006 1 39 69.51917 2006 1  47.65592 42.45374
2007 1 31 80.26433 2007 1  59.36463  41.5652
2008 1 22 69.11573 2008 1 66.429634 41.96951
2010 1  3 4.534889 2009 1  62.83172 41.65173
2010 1 30 49.00176 2010 1  62.83172 37.04457
2011 1  8 76.37936 2011 1  42.44352 37.04457
end

Following Robert's instructions, one can end up with "w" (the weighted moving average) for subject one, which would be the same for all years for all subjects (I guess I will have to paste "w" manually in excel and then merge it to the original .dta with obs for 50); Once in the .dta of the 50 subjects, I would substract "w" from "v1", obtain the absolute value of that difference and that would be it.

Last edited by Victor Cruz; 26 Sep 2016, 02:14.

Victor Cruz

Announcement