Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate a rolling sum of observations during previous week only (unbalanced panel)

    I have a very unbalanced and unevenly spaced panel dataset where the time variable is specified in the %tc format. I need to create a rolling sum of all the observations that fall within the week prior to each observation. The goal is to find a solution to create the "to_create" variable in the following minimum example:
    Code:
    clear
    input byte id double time int(x to_create)
    1 1547831786001  485  485
    1 1548022583001  343  828
    1 1548173184001  942 1770
    1 1549133620001  325  325
    1 1549219883001  641  966
    2 1547670452001    0    0
    2 1547893360001  314  314
    2 1554561731001    0    0
    2 1555571825001  257  257
    2 1555758690001  465  722
    3 1428166871001  688  688
    3 1428752209001  850 1538
    3 1428947181001 1450 2300
    3 1429353750001 1980 4280
    3 1429440331001 1450 4880
    end
    format %tc time
    I have tried to use the deduction of the two cumulative sums as
    Code:
    tsset id time
    by id time: gen aux = sum(x)
    by id: gen y = aux - L604800000.aux
    but, because the panel is not balanced and contains much more gaps than regularly spaced intervals, this solution creates all missing values (unless there are two records which are timed exactly 3600*24*7 seconds apart, which happens with probability approaching zero).

    I have also looked at the filter option of egenmore , but the issue still remains that I have not been able to find a way to identify the "correct number of lags" based on the dataset.

    A theoretical solution which would involve tsfill is not doable due to extreme memory requirements of such solution.


  • #2
    Thanks for your exemplary question.

    I'd use rangestat (SSC) for this purpose. That's a search term for previous posts in this forum.


    Code:
    clear
    input byte id double time int(x to_create)
    1 1547831786001  485  485
    1 1548022583001  343  828
    1 1548173184001  942 1770
    1 1549133620001  325  325
    1 1549219883001  641  966
    2 1547670452001    0    0
    2 1547893360001  314  314
    2 1554561731001    0    0
    2 1555571825001  257  257
    2 1555758690001  465  722
    3 1428166871001  688  688
    3 1428752209001  850 1538
    3 1428947181001 1450 2300
    3 1429353750001 1980 4280
    3 1429440331001 1450 4880
    end
    format %tc time
    
    rangestat (sum) wanted = x, int(time -6048e5 0) by(id) 
    
    list, sepby(id) 
    
         +----------------------------------------------------+
         | id                 time      x   to_cre~e   wanted |
         |----------------------------------------------------|
      1. |  1   17jan2009 17:16:26    485        485      485 |
      2. |  1   19jan2009 22:16:23    343        828      828 |
      3. |  1   21jan2009 16:06:24    942       1770     1770 |
      4. |  1   01feb2009 18:53:40    325        325      325 |
      5. |  1   02feb2009 18:51:23    641        966      966 |
         |----------------------------------------------------|
      6. |  2   15jan2009 20:27:32      0          0        0 |
      7. |  2   18jan2009 10:22:40    314        314      314 |
      8. |  2   05apr2009 14:42:11      0          0        0 |
      9. |  2   17apr2009 07:17:05    257        257      257 |
     10. |  2   19apr2009 11:11:30    465        722      722 |
         |----------------------------------------------------|
     11. |  3   03apr2005 17:01:11    688        688      688 |
     12. |  3   10apr2005 11:36:49    850       1538     1538 |
     13. |  3   12apr2005 17:46:21   1450       2300     2300 |
     14. |  3   17apr2005 10:42:30   1980       4280     4280 |
     15. |  3   18apr2005 10:45:31   1450       4880     4880 |
         +----------------------------------------------------+

    Comment


    • #3
      Thank you! This is exactly the command I was unable to find :-).

      Comment

      Working...
      X