Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • weighted moving average panel data

    Dear Stata list community,

    I have been computing a moving weighted average in excel, but I am sure there is a more efficient (and less error prone) way to do this in Stata; advice on the matter will be as always very much appreciated. I had a look in https://goo.gl/PkW4mI & https://goo.gl/Jg27wd, where Nick Cox proposed:
    Code:
    egen, filter() ///alternative 1
    generate movingaverage = (F1.myvar + myvar + L1.myvar) / 3 ///alternative 2
    mvsumm /// alternative 3
    I have a multiple-record-per-subject data set; each subject ("id") was observed annually ("duration") from 1998 - 2010; a.k.a. panel or Time Series Cross Section data). There are 50 subjects.
    "year_failure" specifies the year in which the dependent variable ("failure") was coded "1"
    "failure" binary (dependent) variable
    "v1" nummeric covariate
    "v1_year_failure" value of "v1" for a given subject, when "failure" was coded as "1". the value of v1_year_failure corresponds to time when the variable "failure" was coded "1".


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float v1_year_failure
    1996 1  5 35.22383
    1998 1  2 47.81606
    1998 1 37 60.95007
    1998 1 47 57.08948
    1999 1 19 69.31047
    2000 1  6 7.001259
    2000 1 11 88.99662
    2000 1 28 27.49802
    2004 1 26 24.74232
    2004 1 45 65.38895
    2006 1 39 69.51917
    2007 1 31 80.26433
    2008 1 22 69.11573
    2010 1  3 4.534889
    2010 1 30 49.00176
    2011 1  8 76.37936
    end
    The formula that I want to use to calculate the weighted moving average is as follows:

    abswma1v1 = ABS([(MostRecent "v1_year_failure" + average of previous "v1_year_failure"/2] − v1)


    note that in years 1998, 2000, 2004 and 2010, more than one subjects experienced the event (aka tied failures). From those tied failures, I want them to contribute the average of v1_year_failure. I computed this in excel ("averageties") and placed the resulting average in the last cell of the tied events, e.g., subjects 2, 37 & 47 experienced the event in 1998. The average of their corresponding values (v1_year_failure) was placed under "averageties".

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float(v1_year_failure averageties)
    1996 1  5 35.22383         .
    1998 1  2 47.81606         .
    1998 1 37 60.95007         .
    1998 1 47 57.08948   55.2852
    1999 1 19 69.31047         .
    2000 1  6 7.001259         .
    2000 1 11 88.99662         .
    2000 1 28 27.49802   41.1653
    2004 1 26 24.74232         .
    2004 1 45 65.38895  45.06564
    2006 1 39 69.51917         .
    2007 1 31 80.26433         .
    2008 1 22 69.11573         .
    2010 1  3 4.534889         .
    2010 1 30 49.00176 26.768324
    2011 1  8 76.37936         .
    end
    in order to compute the absolute weighted moving average for subject one (identiied with var "id_1), in excel I manually computed a moving average of "v1_year_failure" in one variable called "wma1" as follows:

    in year 1996, I arbitrarily assigned 0
    in 1997, the value of v1_year_failure in the year before (1996), i.e., 35.22383
    in 1998, the value of v1_year_failure in the previous years (1996-1997), i.e., 35.22383
    in 1999, the value of v1_year_failure in the previous years (1996-1998); however, in 1998 three subjects experienced the event, hence, the formula would be
    [(v1_year_failure in 1996 + v1_year_failure in 1997) + (average of "v1_year_failure" in 1998)]/2, i.e., the 3 tied events from 1998 contribute as the average of "v1_year_failure" in 1998
    in 2000, the value of v1_year_failure in the previous years (1996-1999); the 3 tied events from 1998 keep contributing as the average of "v1_year_failure" in 1998:
    [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998)]/2 + (v1_year_failure in 1999) ]/2
    in 2001: [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999)/ 3 ] + (average of "v1_year_failure" in 2000 ) ] / 2
    v1_year_failure in years 2002 - 2004: same value as in 2001, because there were no events (failure = 0 during this period)
    v1_year_failure in year 2005:
    [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) / 4] + (average of "v1_year_failure" in 2004) ] /2
    v1_year_failure in year 2006: same as in 2005, because there was no failure in year 2005
    v1_year_failure in year 2007:
    [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) + (average of "v1_year_failure" in 2004 ) / 5] + v1_year_failure in 2006 ] / 2
    v1_year_failure in years 2008 - 2011: formulae follows the same logic as above, omitted to save space.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float wma1
    1996 1  5 35.22383         . 1996 1         0
    1998 1  2 47.81606         . 1997 1  35.22383
    1998 1 37 60.95007         . 1998 1  35.22383
    1998 1 47 57.08948   55.2852 1999 1  45.25452
    1999 1 19 69.31047         . 2000 1  57.28249
    2000 1  6 7.001259         . 2001 1  47.21923
    2000 1 11 88.99662         . 2002 1  47.21923
    2000 1 28 27.49802   41.1653 2003 1  47.21923
    2004 1 26 24.74232         . 2004 1  47.21923
    2004 1 45 65.38895  45.06564 2005 1  47.65592
    2006 1 39 69.51917         . 2006 1  47.65592
    2007 1 31 80.26433         . 2007 1  59.36463
    2008 1 22 69.11573         . 2008 1 66.429634
    2010 1  3 4.534889         . 2009 1  62.83172
    2010 1 30 49.00176 26.768324 2010 1  62.83172
    2011 1  8 76.37936         . 2011 1  42.44352
    end

    finally, I substracted v1 from wma1 and then obtained its absolute value. The variable I am aiming at is abswma1v1 (the first value of "wma1v1" was arbitrarily set to 0)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float(wma1 v1 wma1v1 abswma1v1)
    1996 1  5 35.22383         . 1996 1         0 43.95443         0        0
    1998 1  2 47.81606         . 1997 1  35.22383 43.95443   -8.7306   8.7306
    1998 1 37 60.95007         . 1998 1  35.22383 48.30526 -13.08143 13.08143
    1998 1 47 57.08948   55.2852 1999 1  45.25452 79.81371 -34.55919 34.55919
    1999 1 19 69.31047         . 2000 1  57.28249 79.81371 -22.53122 22.53122
    2000 1  6 7.001259         . 2001 1  47.21923 79.81371 -32.59448 32.59448
    2000 1 11 88.99662         . 2002 1  47.21923 79.81371 -32.59448 32.59448
    2000 1 28 27.49802   41.1653 2003 1  47.21923 48.30526  -1.08603  1.08603
    2004 1 26 24.74232         . 2004 1  47.21923 42.45374   4.76549  4.76549
    2004 1 45 65.38895  45.06564 2005 1  47.65592 42.45374   5.20218  5.20218
    2006 1 39 69.51917         . 2006 1  47.65592 42.45374   5.20218  5.20218
    2007 1 31 80.26433         . 2007 1  59.36463  41.5652  17.79943 17.79943
    2008 1 22 69.11573         . 2008 1 66.429634 41.96951  24.46012 24.46012
    2010 1  3 4.534889         . 2009 1  62.83172 41.65173  21.17999 21.17999
    2010 1 30 49.00176 26.768324 2010 1  62.83172 37.04457  25.78715 25.78715
    2011 1  8 76.37936         . 2011 1  42.44352 37.04457   5.39895  5.39895
    end
    below, the spreadsheet I have been using to calculate this variable in excel
    Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	81.1 KB
ID:	1357362

    Victor Cruz

  • #2
    Originally posted by Victor Cruz View Post
    Dear Stata list community,

    I have been computing a moving weighted average in excel, but I am sure there is a more efficient (and less error prone) way to do this in Stata; advice on the matter will be as always very much appreciated. I had a look in https://goo.gl/PkW4mI & https://goo.gl/Jg27wd, where Nick Cox proposed:
    Code:
    egen, filter() ///alternative 1
    generate movingaverage = (F1.myvar + myvar + L1.myvar) / 3 ///alternative 2
    mvsumm /// alternative 3
    I have a multiple-record-per-subject data set; each subject ("id") was observed annually ("duration") from 1998 - 2010; a.k.a. panel or Time Series Cross Section data). There are 50 subjects.
    "year_failure" specifies the year in which the dependent variable ("failure") was coded "1"
    "failure" binary (dependent) variable
    "v1" nummeric covariate
    "v1_year_failure" value of "v1" for a given subject, when "failure" was coded as "1". the value of v1_year_failure corresponds to time when the variable "failure" was coded "1".


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float v1_year_failure
    1996 1 5 35.22383
    1998 1 2 47.81606
    1998 1 37 60.95007
    1998 1 47 57.08948
    1999 1 19 69.31047
    2000 1 6 7.001259
    2000 1 11 88.99662
    2000 1 28 27.49802
    2004 1 26 24.74232
    2004 1 45 65.38895
    2006 1 39 69.51917
    2007 1 31 80.26433
    2008 1 22 69.11573
    2010 1 3 4.534889
    2010 1 30 49.00176
    2011 1 8 76.37936
    end
    The formula that I want to use to calculate the weighted moving average is as follows:

    abswma1v1 = ABS([(MostRecent "v1_year_failure" + average of previous "v1_year_failure"/2] − v1)


    note that in years 1998, 2000, 2004 and 2010, more than one subjects experienced the event (aka tied failures). From those tied failures, I want them to contribute the average of v1_year_failure. I computed this in excel ("averageties") and placed the resulting average in the last cell of the tied events, e.g., subjects 2, 37 & 47 experienced the event in 1998. The average of their corresponding values (v1_year_failure) was placed under "averageties".

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float(v1_year_failure averageties)
    1996 1 5 35.22383 .
    1998 1 2 47.81606 .
    1998 1 37 60.95007 .
    1998 1 47 57.08948 55.2852
    1999 1 19 69.31047 .
    2000 1 6 7.001259 .
    2000 1 11 88.99662 .
    2000 1 28 27.49802 41.1653
    2004 1 26 24.74232 .
    2004 1 45 65.38895 45.06564
    2006 1 39 69.51917 .
    2007 1 31 80.26433 .
    2008 1 22 69.11573 .
    2010 1 3 4.534889 .
    2010 1 30 49.00176 26.768324
    2011 1 8 76.37936 .
    end
    in order to compute the absolute weighted moving average for subject one (identiied with var "id_1), in excel I manually computed a moving average of "v1_year_failure" in one variable called "wma1" as follows:

    in year 1996, I arbitrarily assigned 0
    in 1997, the value of v1_year_failure in the year before (1996), i.e., 35.22383
    in 1998, the value of v1_year_failure in the previous years (1996-1997), i.e., 35.22383
    in 1999, the value of v1_year_failure in the previous years (1996-1998); however, in 1998 three subjects experienced the event, hence, the formula would be
    [(v1_year_failure in 1996 + v1_year_failure in 1997) + (average of "v1_year_failure" in 1998)]/2, i.e., the 3 tied events from 1998 contribute as the average of "v1_year_failure" in 1998
    in 2000, the value of v1_year_failure in the previous years (1996-1999); the 3 tied events from 1998 keep contributing as the average of "v1_year_failure" in 1998:
    [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998)]/2 + (v1_year_failure in 1999) ]/2
    in 2001: [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999)/ 3 ] + (average of "v1_year_failure" in 2000 ) ] / 2
    v1_year_failure in years 2002 - 2004: same value as in 2001, because there were no events (failure = 0 during this period)
    v1_year_failure in year 2005:
    [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) / 4] + (average of "v1_year_failure" in 2004) ] /2
    v1_year_failure in year 2006: same as in 2005, because there was no failure in year 2005
    v1_year_failure in year 2007:
    [ [(v1_year_failure in 1996 + (average of "v1_year_failure" in 1998) + (v1_year_failure in 1999) + (average of "v1_year_failure" in 2000 ) + (average of "v1_year_failure" in 2004 ) / 5] + v1_year_failure in 2006 ] / 2
    v1_year_failure in years 2008 - 2011: formulae follows the same logic as above, omitted to save space.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float wma1
    1996 1 5 35.22383 . 1996 1 0
    1998 1 2 47.81606 . 1997 1 35.22383
    1998 1 37 60.95007 . 1998 1 35.22383
    1998 1 47 57.08948 55.2852 1999 1 45.25452
    1999 1 19 69.31047 . 2000 1 57.28249
    2000 1 6 7.001259 . 2001 1 47.21923
    2000 1 11 88.99662 . 2002 1 47.21923
    2000 1 28 27.49802 41.1653 2003 1 47.21923
    2004 1 26 24.74232 . 2004 1 47.21923
    2004 1 45 65.38895 45.06564 2005 1 47.65592
    2006 1 39 69.51917 . 2006 1 47.65592
    2007 1 31 80.26433 . 2007 1 59.36463
    2008 1 22 69.11573 . 2008 1 66.429634
    2010 1 3 4.534889 . 2009 1 62.83172
    2010 1 30 49.00176 26.768324 2010 1 62.83172
    2011 1 8 76.37936 . 2011 1 42.44352
    end

    finally, I substracted v1 from wma1 and then obtained its absolute value. The variable I am aiming at is abswma1v1 (the first value of "wma1v1" was arbitrarily set to 0)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float(wma1 v1 wma1v1 abswma1v1)
    1996 1 5 35.22383 . 1996 1 0 43.95443 0 0
    1998 1 2 47.81606 . 1997 1 35.22383 43.95443 -8.7306 8.7306
    1998 1 37 60.95007 . 1998 1 35.22383 48.30526 -13.08143 13.08143
    1998 1 47 57.08948 55.2852 1999 1 45.25452 79.81371 -34.55919 34.55919
    1999 1 19 69.31047 . 2000 1 57.28249 79.81371 -22.53122 22.53122
    2000 1 6 7.001259 . 2001 1 47.21923 79.81371 -32.59448 32.59448
    2000 1 11 88.99662 . 2002 1 47.21923 79.81371 -32.59448 32.59448
    2000 1 28 27.49802 41.1653 2003 1 47.21923 48.30526 -1.08603 1.08603
    2004 1 26 24.74232 . 2004 1 47.21923 42.45374 4.76549 4.76549
    2004 1 45 65.38895 45.06564 2005 1 47.65592 42.45374 5.20218 5.20218
    2006 1 39 69.51917 . 2006 1 47.65592 42.45374 5.20218 5.20218
    2007 1 31 80.26433 . 2007 1 59.36463 41.5652 17.79943 17.79943
    2008 1 22 69.11573 . 2008 1 66.429634 41.96951 24.46012 24.46012
    2010 1 3 4.534889 . 2009 1 62.83172 41.65173 21.17999 21.17999
    2010 1 30 49.00176 26.768324 2010 1 62.83172 37.04457 25.78715 25.78715
    2011 1 8 76.37936 . 2011 1 42.44352 37.04457 5.39895 5.39895
    end
    below, the spreadsheet I have been using to calculate this variable in excel
    [ATTACH=CONFIG]n1357362[/ATTACH]
    For clarity, let me summarize what I am trying to do:

    I want to capture in a variable (called “absolutedistance”) the absolute value of the distance of a numeric variable (called “v1”) between a given subject (identified with var “id”; total 50 subjects) in a given year (identified with var “duration”; total 13 years) that is susceptible to get sick (to experience the “event”, captured by the dependent/outcome binary var called “sick”; coded 1 when a subject experiences the event in a given year, 0 otherwise) and the value of “v1” of all the subjects that got sick (subjects that have previously experienced the event).

    The value for the subjects that previously fell sick is the subjects’ value of “v1” at the time of experiencing the event (in the year when the subjects got sick). The value of “v1” at the time of falling sick is a weighted moving average wherein the most recent subject to fall sick has the same weight as all other sick subjects. Thus, for each year in the data set, I wish to calculate the distance ( “absolutedistance”) for each healthy subject using the following formula:
    Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	11.8 KB
ID:	1357416

    “sick” can be experienced more than once, but for simplicity, let us focus on a data set where we are interested in the time until the first “sick” (obs. after the first event would be “right censored” and dropped from the analysis; the data will be analyzed with survival/duration techniques).

    Now, things get more complicated when we consider that some subjects experienced the event (got sick) in the same year (let´s identify these as “ties”). “ties” contribute to the “average of v1 of all other subjects that fell sick before” with the “average of v1 of those ties”. E.g., three subjects fell sick in year 2000. Those “ties” will contribute to the “average of v1 of all other subjects that fell sick before” from year 2001 onwards with the average of v1 of those three subjects in year 2000.

    I will appreciate comments on this.
    Last edited by Victor Cruz; 21 Sep 2016, 01:37.
    Victor Cruz

    Comment


    • #3
      cross-posted http://stackoverflow.com/questions/3...age-stata-or-r, to see whether someone proposes a workaround in R
      Victor Cruz

      Comment


      • #4
        I'm not sure I understand all the details of your problem but the following code replicates your results for the wma1 variable. You'll need to install rangestat. To do so, type in Stata's Command window:

        Code:
        ssc install rangestat
        I start from the last dataex listing in #1.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int year_failure byte(failure id) float(v1_year_failure averageties) int duration byte id_1 float(wma1 v1 wma1v1 abswma1v1)
        1996 1  5 35.22383         . 1996 1         0 43.95443         0        0
        1998 1  2 47.81606         . 1997 1  35.22383 43.95443   -8.7306   8.7306
        1998 1 37 60.95007         . 1998 1  35.22383 48.30526 -13.08143 13.08143
        1998 1 47 57.08948   55.2852 1999 1  45.25452 79.81371 -34.55919 34.55919
        1999 1 19 69.31047         . 2000 1  57.28249 79.81371 -22.53122 22.53122
        2000 1  6 7.001259         . 2001 1  47.21923 79.81371 -32.59448 32.59448
        2000 1 11 88.99662         . 2002 1  47.21923 79.81371 -32.59448 32.59448
        2000 1 28 27.49802   41.1653 2003 1  47.21923 48.30526  -1.08603  1.08603
        2004 1 26 24.74232         . 2004 1  47.21923 42.45374   4.76549  4.76549
        2004 1 45 65.38895  45.06564 2005 1  47.65592 42.45374   5.20218  5.20218
        2006 1 39 69.51917         . 2006 1  47.65592 42.45374   5.20218  5.20218
        2007 1 31 80.26433         . 2007 1  59.36463  41.5652  17.79943 17.79943
        2008 1 22 69.11573         . 2008 1 66.429634 41.96951  24.46012 24.46012
        2010 1  3 4.534889         . 2009 1  62.83172 41.65173  21.17999 21.17999
        2010 1 30 49.00176 26.768324 2010 1  62.83172 37.04457  25.78715 25.78715
        2011 1  8 76.37936         . 2011 1  42.44352 37.04457   5.39895  5.39895
        end
        
        isid year_failure id, sort
        
        * mean failure per year_failure
        by year_failure: egen mfy = mean(v1_year_failure)
        
        * use only one mfy value per year_failure
        by year_failure: gen mfy1 = mfy if _n == 1
        
        * average mfy, from the start to 2 year before duration year
        gen low = 0
        gen high = duration - 2
        rangestat (mean) m2 = mfy1, interval(year_failure low high)
        
        * look-up mfy for year prior to duration year
        replace low = duration - 1
        replace high = low
        rangestat (mean) m1 = mfy1, interval(year_failure low high)
        
        * calculate measure
        gen w = (m1 + m2) / 2
        
        * special case for the first two obs
        replace w = 0 in 1
        replace w = m1 in 2
        
        * carry forward the value of the measure and compare to wma1
        replace w = w[_n-1] if mi(w)
        gen diff = abs(wma1 - w)
        
        list year_failure id v1_year_failure duration wma1 w diff, sepby(year_failure)
        and the results
        Code:
        . list year_failure id v1_year_failure duration wma1 w diff, sepby(year_failure)
        
             +----------------------------------------------------------------------+
             | year_f~e   id   v1_yea~e   duration       wma1          w       diff |
             |----------------------------------------------------------------------|
          1. |     1996    5   35.22383       1996          0          0          0 |
             |----------------------------------------------------------------------|
          2. |     1998    2   47.81606       1997   35.22383   35.22383          0 |
          3. |     1998   37   60.95007       1998   35.22383   35.22383          0 |
          4. |     1998   47   57.08948       1999   45.25452   45.25452   3.81e-06 |
             |----------------------------------------------------------------------|
          5. |     1999   19   69.31047       2000   57.28249   57.28249   3.81e-06 |
             |----------------------------------------------------------------------|
          6. |     2000    6   7.001259       2001   47.21923   47.21923   3.81e-06 |
          7. |     2000   11   88.99662       2002   47.21923   47.21923   3.81e-06 |
          8. |     2000   28   27.49802       2003   47.21923   47.21923   3.81e-06 |
             |----------------------------------------------------------------------|
          9. |     2004   26   24.74232       2004   47.21923   47.21923   3.81e-06 |
         10. |     2004   45   65.38895       2005   47.65592   47.65591   3.81e-06 |
             |----------------------------------------------------------------------|
         11. |     2006   39   69.51917       2006   47.65592   47.65591   3.81e-06 |
             |----------------------------------------------------------------------|
         12. |     2007   31   80.26433       2007   59.36463   59.36463          0 |
             |----------------------------------------------------------------------|
         13. |     2008   22   69.11573       2008   66.42963   66.42963          0 |
             |----------------------------------------------------------------------|
         14. |     2010    3   4.534889       2009   62.83172   62.83172          0 |
         15. |     2010   30   49.00176       2010   62.83172   62.83172          0 |
             |----------------------------------------------------------------------|
         16. |     2011    8   76.37936       2011   42.44352   42.44352   3.81e-06 |
             +----------------------------------------------------------------------+

        Comment


        • #5
          It did work, Robert Picard, thanks a lot!
          However, I didn't describe accurately my data, so I understand why my objective might seem confusing.
          Below, the way I would use Robert's kind and effective commands to calculate "w" (the weighted moving average). I would appreciate suggestions on how to improve my rather complicated way to use Robert's cmds on my data (again, Robert's suggestion was absolutely accurate, it is my bad that I describe my data accurately).

          The goal is to capture in a variable (called “absolutedistance”) the absolute value of the distance of a numeric variable (called “v1”) between a given subject (var “id”; total 50 subjects) in a given year (var “year”; total 13 years) that is susceptible to get sick (to experience the “event”, captured by var “failure”; coded 1 when a subject experiences the event in a given year, 0 otherwise) and the value of “v1” of all the subjects that got sick (subjects that have previously experienced the event).

          This is the data of 5 subjects. Let's refer to this data as "5 subjects data"

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double id float(year v1 failure)
          1 1996  43.95443 0
          1 1997  43.95443 0
          1 1998  48.30526 0
          1 1999  79.81371 0
          1 2000  79.81371 0
          1 2001  79.81371 0
          1 2002  79.81371 0
          1 2003  48.30526 0
          1 2004  42.45374 0
          1 2005  42.45374 0
          1 2006  42.45374 0
          1 2007   41.5652 0
          1 2008  41.96951 0
          1 2009  41.65173 0
          1 2010  37.04457 0
          1 2011  37.04457 0
          2 1996  49.05129 0
          2 1997  47.60572 0
          2 1998  47.81606 1
          2 1999  47.81606 0
          2 2000  47.75643 0
          2 2001  47.75643 0
          2 2002  47.53053 0
          2 2003  27.47109 0
          2 2004  31.24092 0
          2 2005  31.24092 0
          2 2006  31.24092 0
          2 2007  31.24092 0
          2 2008  34.31742 0
          2 2009  38.41942 0
          2 2010  37.38018 0
          2 2011  37.38018 0
          3 1996  4.051216 0
          3 1997  4.051216 0
          3 1998  4.051216 0
          3 1999  9.530586 0
          3 2000  9.530586 0
          3 2001 14.325027 0
          3 2002 12.609268 0
          3 2003  46.97109 0
          3 2004  51.31606 0
          3 2005  48.43472 0
          3 2006  48.43472 0
          3 2007  55.63807 0
          3 2008  45.78523 0
          3 2009  4.534889 0
          3 2010         0 1
          3 2011         0 0
          4 1996  61.67141 0
          4 1997  47.03312 0
          4 1998  47.83461 0
          4 1999  47.83461 0
          4 2000  45.12806 0
          4 2001  45.12806 0
          4 2002  44.52393 0
          4 2003  44.52393 0
          4 2004  43.88434 0
          4 2005  43.88434 0
          4 2006  43.88434 0
          4 2007  71.22043 0
          4 2008  71.22043 0
          4 2009  71.22043 0
          4 2010  71.22043 0
          4 2011  71.22043 0
          5 1996  35.22383 1
          5 1997  45.39093 0
          5 1998  44.97823 0
          5 1999  88.42124 0
          5 2000  87.61613 0
          5 2001  87.99413 0
          5 2002  87.21405 0
          5 2003  87.21405 0
          5 2004  49.17004 0
          5 2005  49.17004 0
          5 2006  48.44385 0
          5 2007  48.44385 0
          5 2008  47.57368 0
          5 2009  47.97112 0
          5 2010  47.73407 0
          5 2011  47.73407 0
          end
          The value for the subjects that previously fell sick is the subjects’ value of “v1” at the time of experiencing the event (in the year when the subjects got sick). The value of “v1” at the time of falling sick is a weighted moving average wherein the most recent subject to fall sick has the same weight as all other sick subjects. Thus, for each year in the data set, I wish to calculate the distance ( “absolutedistance”) for each healthy subject using the following formula (from my previous post):
          [ATTACH=CONFIG]n1357416[/ATTACH]
          In order to obtain the first 4 columns of my previous post, I just asked Stata to keep the value of "v1" and year of subjects that experienced the event and rename accordingly. Let's refer to this as "v1 when failed data"

          Code:
          rename (year v1) (year_failure v1_year_failure)
          keep if failure==1
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double id float(year_failure v1_year_failure failure)
           2 1998  47.81606 1
           3 2010         0 1
           5 1996  35.22383 1
           6 2000  7.001259 1
           8 2011  76.37936 1
          11 2000  88.99662 1
          19 1999  69.31047 1
          22 2008  69.11573 1
          26 2004  24.74232 1
          28 2000 27.498024 1
          30 2010  49.00176 1
          31 2007  80.26433 1
          37 1998  60.95007 1
          39 2006  69.51917 1
          45 2004 65.388954 1
          47 1998  57.08948 1
          end
          Given that the weighted moving average (called "wma1" by me in the previous post, "w" by Robert) would be the same for all subjects, what I could do with Robert's commands above is to "merge" ("pasted", for which I had to change the names of the vars year--changed to duration-- and id--changed to id_1) the "v1 when failed data" with the obs of only one subject from the "5 subjects data", which would end up in the data set that Robert used for his kind solution:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int year_failure byte(failure id) float v1_year_failure int duration byte id_1 float(wma1 v1)
          1996 1  5 35.22383 1996 1         0 43.95443
          1998 1  2 47.81606 1997 1  35.22383 43.95443
          1998 1 37 60.95007 1998 1  35.22383 48.30526
          1998 1 47 57.08948 1999 1  45.25452 79.81371
          1999 1 19 69.31047 2000 1  57.28249 79.81371
          2000 1  6 7.001259 2001 1  47.21923 79.81371
          2000 1 11 88.99662 2002 1  47.21923 79.81371
          2000 1 28 27.49802 2003 1  47.21923 48.30526
          2004 1 26 24.74232 2004 1  47.21923 42.45374
          2004 1 45 65.38895 2005 1  47.65592 42.45374
          2006 1 39 69.51917 2006 1  47.65592 42.45374
          2007 1 31 80.26433 2007 1  59.36463  41.5652
          2008 1 22 69.11573 2008 1 66.429634 41.96951
          2010 1  3 4.534889 2009 1  62.83172 41.65173
          2010 1 30 49.00176 2010 1  62.83172 37.04457
          2011 1  8 76.37936 2011 1  42.44352 37.04457
          end
          Following Robert's instructions, one can end up with "w" (the weighted moving average) for subject one, which would be the same for all years for all subjects (I guess I will have to paste "w" manually in excel and then merge it to the original .dta with obs for 50); Once in the .dta of the 50 subjects, I would substract "w" from "v1", obtain the absolute value of that difference and that would be it.
          Last edited by Victor Cruz; 26 Sep 2016, 02:14.
          Victor Cruz

          Comment

          Working...
          X