Calculating statistics based on a panel with repeated time values

Pantelis Kazakis

Join Date: Aug 2014

Posts: 123
#1

Calculating statistics based on a panel with repeated time values

08 May 2017, 02:13

Hello Stata users,

I would like to see how the average values based on a specific time window affect future values up to five calendar years in the future. Specifically, in a time series analysis, how the average values of the past three calendar years affect the future values over the next (1, 2, 3, 4, 5) calendar years. I have provided a minimal example below.

Code:

clear all set more off input /// id year var_x 1 1975 0.475697247 1 1981 0.075769328 1 1982 0.904638688 1 1983 0.4612996 1 1983 0.955458246 1 1983 0.014580401 1 1985 0.380078561 1 1985 0.340454901 1 1987 0.008309949 1 1987 0.159163017 1 1987 0.763778463 1 1989 0.003309207 2 1995 0.37667648 2 1998 0.547401227 2 1999 0.294790755 2 2001 0.484209626 2 2001 0.040294883 2 2001 0.03179457 2 2002 0.431170685 2 2003 0.743045929 2 2003 0.278046652 2 2003 0.233674965 2 2004 0.71374945 end

Having this example in mind, one can see that we can take averages within the specific three year window multiple times, as we have rolling three year averages. For example, the average value for the years 1981, 1982, and 1983, based on var_x, will affect that variable in the years '84, '85, '86, '87, and '88. This process continues for all other potential combinations as time progresses.

I have seen commands such as: rolling, mvsumm, tsegen that could deal with rolling averages, however, when I tried them I got the error that the panel has repeated time values, which is to be expected, as there might be multiple events in the same year.

What I tried, is to collapse the data based on id-year and then I used the following code:

Code:

bysort id (year): gen Past_1YR = var_x[_n-1] if year[_n] - year[_n-1] == 1 bysort id (year): gen Past_2YR = var_x[_n-2] if year[_n] - year[_n-2] == 2 bysort id (year): replace Past_2YR = var_x[_n-1] if year[_n] - year[_n-1] == 2 bysort id (year): gen Past_3YR = var_x[_n-3] if year[_n] - year[_n-3] == 3 bysort id (year): replace Past_3YR = var_x[_n-2] if year[_n] - year[_n-2] == 3 bysort id (year): replace Past_3YR = var_x[_n-1] if year[_n] - year[_n-1] == 3 egen Past_val = rmean(Past_1YR Past_2YR Past_3YR) drop Past_1YR Past_2YR Past_3YR

Nonetheless, I am not happy with this, as I would like the whole process to happen in the original file.

Eventually, I would like to run regressions of future values of var_x on past_val (of course, the example here is too small, so something like that cannot be performed).

I would be grateful for any assistance in this matter.
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35806

08 May 2017, 02:32

Thanks for the data example. mvsumm (SSC) and tsegen (SSC) are user-written commands, as you are asked to explain.

rangestat (SSC) doesn't insist on panel data. so something like this should help.

Code:

clear all
set more off

input ///
id     year    var_x
1    1975    0.475697247
1    1981    0.075769328
1    1982    0.904638688
1    1983    0.4612996
1    1983    0.955458246
1    1983    0.014580401
1    1985    0.380078561
1    1985    0.340454901
1    1987    0.008309949
1    1987    0.159163017
1    1987    0.763778463
1    1989    0.003309207
2    1995    0.37667648
2    1998    0.547401227
2    1999    0.294790755
2    2001    0.484209626
2    2001    0.040294883
2    2001    0.03179457
2    2002    0.431170685
2    2003    0.743045929
2    2003    0.278046652
2    2003    0.233674965
2    2004    0.71374945
end

rangestat mean1=var_x, interval(year -3 -1) by(id) 
rangestat mean2=var_x, interval(year 1 5) by(id) 

list, sepby(id) 

     +----------------------------------------------+
     | id   year      var_x       mean1       mean2 |
     |----------------------------------------------|
  1. |  1   1975   .4756972           .           . |
  2. |  1   1981   .0757693           .    .5094184 |
  3. |  1   1982   .9046387   .07576933   .38539039 |
  4. |  1   1983   .4612996   .49020402   .33035697 |
  5. |  1   1983   .9554582   .49020402   .33035697 |
  6. |  1   1983   .0145804   .49020402   .33035697 |
  7. |  1   1985   .3800786   .58399423   .23364015 |
  8. |  1   1985   .3404549   .58399423   .23364015 |
  9. |  1   1987   .0083099   .36026673   .00330921 |
 10. |  1   1987    .159163   .36026673   .00330921 |
 11. |  1   1987   .7637784   .36026673   .00330921 |
 12. |  1   1989   .0033092   .31041714           . |
     |----------------------------------------------|
 13. |  2   1995   .3766765           .     .421096 |
 14. |  2   1998   .5474012   .37667647    .3171285 |
 15. |  2   1999   .2947907   .54740125   .36949834 |
 16. |  2   2001   .4842096     .421096   .47993753 |
 17. |  2   2001   .0402949     .421096   .47993753 |
 18. |  2   2001   .0317946     .421096   .47993753 |
 19. |  2   2002   .4311707   .21277246   .49212925 |
 20. |  2   2003   .7430459   .24686744   .71374947 |
 21. |  2   2003   .2780466   .24686744   .71374947 |
 22. |  2   2003    .233675   .24686744   .71374947 |
 23. |  2   2004   .7137495   .32031961           . |
     +----------------------------------------------+

Comment

Pantelis Kazakis

Join Date: Aug 2014

Posts: 123
#3

08 May 2017, 03:28

Thank you Nick. This is very helpful.
Comment

Announcement

Calculating statistics based on a panel with repeated time values

Comment

Comment