Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating statistics based on a panel with repeated time values

    Hello Stata users,

    I would like to see how the average values based on a specific time window affect future values up to five calendar years in the future. Specifically, in a time series analysis, how the average values of the past three calendar years affect the future values over the next (1, 2, 3, 4, 5) calendar years. I have provided a minimal example below.

    Code:
    clear all
    set more off
    
    input ///
    id     year    var_x
    1    1975    0.475697247
    1    1981    0.075769328
    1    1982    0.904638688
    1    1983    0.4612996
    1    1983    0.955458246
    1    1983    0.014580401
    1    1985    0.380078561
    1    1985    0.340454901
    1    1987    0.008309949
    1    1987    0.159163017
    1    1987    0.763778463
    1    1989    0.003309207
    2    1995    0.37667648
    2    1998    0.547401227
    2    1999    0.294790755
    2    2001    0.484209626
    2    2001    0.040294883
    2    2001    0.03179457
    2    2002    0.431170685
    2    2003    0.743045929
    2    2003    0.278046652
    2    2003    0.233674965
    2    2004    0.71374945
    end
    Having this example in mind, one can see that we can take averages within the specific three year window multiple times, as we have rolling three year averages. For example, the average value for the years 1981, 1982, and 1983, based on var_x, will affect that variable in the years '84, '85, '86, '87, and '88. This process continues for all other potential combinations as time progresses.

    I have seen commands such as: rolling, mvsumm, tsegen that could deal with rolling averages, however, when I tried them I got the error that the panel has repeated time values, which is to be expected, as there might be multiple events in the same year.

    What I tried, is to collapse the data based on id-year and then I used the following code:

    Code:
    bysort id (year): gen Past_1YR = var_x[_n-1] if year[_n] - year[_n-1] == 1
     
    bysort id (year): gen Past_2YR = var_x[_n-2] if year[_n] - year[_n-2] == 2 
    bysort id (year): replace Past_2YR = var_x[_n-1] if year[_n] - year[_n-1] == 2 
    
    bysort id (year): gen Past_3YR = var_x[_n-3] if year[_n] - year[_n-3] == 3 
    bysort id (year): replace Past_3YR = var_x[_n-2] if year[_n] - year[_n-2] == 3
    bysort id (year): replace Past_3YR = var_x[_n-1] if year[_n] - year[_n-1] == 3
     
    egen Past_val = rmean(Past_1YR Past_2YR Past_3YR)
    
    drop Past_1YR Past_2YR Past_3YR
    Nonetheless, I am not happy with this, as I would like the whole process to happen in the original file.

    Eventually, I would like to run regressions of future values of var_x on past_val (of course, the example here is too small, so something like that cannot be performed).

    I would be grateful for any assistance in this matter.

  • #2
    Thanks for the data example. mvsumm (SSC) and tsegen (SSC) are user-written commands, as you are asked to explain.

    rangestat (SSC) doesn't insist on panel data. so something like this should help.

    Code:
    clear all
    set more off
    
    input ///
    id     year    var_x
    1    1975    0.475697247
    1    1981    0.075769328
    1    1982    0.904638688
    1    1983    0.4612996
    1    1983    0.955458246
    1    1983    0.014580401
    1    1985    0.380078561
    1    1985    0.340454901
    1    1987    0.008309949
    1    1987    0.159163017
    1    1987    0.763778463
    1    1989    0.003309207
    2    1995    0.37667648
    2    1998    0.547401227
    2    1999    0.294790755
    2    2001    0.484209626
    2    2001    0.040294883
    2    2001    0.03179457
    2    2002    0.431170685
    2    2003    0.743045929
    2    2003    0.278046652
    2    2003    0.233674965
    2    2004    0.71374945
    end
    
    rangestat mean1=var_x, interval(year -3 -1) by(id) 
    rangestat mean2=var_x, interval(year 1 5) by(id) 
    
    list, sepby(id) 
    
         +----------------------------------------------+
         | id   year      var_x       mean1       mean2 |
         |----------------------------------------------|
      1. |  1   1975   .4756972           .           . |
      2. |  1   1981   .0757693           .    .5094184 |
      3. |  1   1982   .9046387   .07576933   .38539039 |
      4. |  1   1983   .4612996   .49020402   .33035697 |
      5. |  1   1983   .9554582   .49020402   .33035697 |
      6. |  1   1983   .0145804   .49020402   .33035697 |
      7. |  1   1985   .3800786   .58399423   .23364015 |
      8. |  1   1985   .3404549   .58399423   .23364015 |
      9. |  1   1987   .0083099   .36026673   .00330921 |
     10. |  1   1987    .159163   .36026673   .00330921 |
     11. |  1   1987   .7637784   .36026673   .00330921 |
     12. |  1   1989   .0033092   .31041714           . |
         |----------------------------------------------|
     13. |  2   1995   .3766765           .     .421096 |
     14. |  2   1998   .5474012   .37667647    .3171285 |
     15. |  2   1999   .2947907   .54740125   .36949834 |
     16. |  2   2001   .4842096     .421096   .47993753 |
     17. |  2   2001   .0402949     .421096   .47993753 |
     18. |  2   2001   .0317946     .421096   .47993753 |
     19. |  2   2002   .4311707   .21277246   .49212925 |
     20. |  2   2003   .7430459   .24686744   .71374947 |
     21. |  2   2003   .2780466   .24686744   .71374947 |
     22. |  2   2003    .233675   .24686744   .71374947 |
     23. |  2   2004   .7137495   .32031961           . |
         +----------------------------------------------+

    Comment


    • #3
      Thank you Nick. This is very helpful.

      Comment

      Working...
      X