I have a very unbalanced and unevenly spaced panel dataset where the time variable is specified in the %tc format. I need to create a rolling sum of all the observations that fall within the week prior to each observation. The goal is to find a solution to create the "to_create" variable in the following minimum example:
I have tried to use the deduction of the two cumulative sums as
but, because the panel is not balanced and contains much more gaps than regularly spaced intervals, this solution creates all missing values (unless there are two records which are timed exactly 3600*24*7 seconds apart, which happens with probability approaching zero).
I have also looked at the filter option of egenmore , but the issue still remains that I have not been able to find a way to identify the "correct number of lags" based on the dataset.
A theoretical solution which would involve tsfill is not doable due to extreme memory requirements of such solution.
Code:
clear input byte id double time int(x to_create) 1 1547831786001 485 485 1 1548022583001 343 828 1 1548173184001 942 1770 1 1549133620001 325 325 1 1549219883001 641 966 2 1547670452001 0 0 2 1547893360001 314 314 2 1554561731001 0 0 2 1555571825001 257 257 2 1555758690001 465 722 3 1428166871001 688 688 3 1428752209001 850 1538 3 1428947181001 1450 2300 3 1429353750001 1980 4280 3 1429440331001 1450 4880 end format %tc time
Code:
tsset id time by id time: gen aux = sum(x) by id: gen y = aux - L604800000.aux
I have also looked at the filter option of egenmore , but the issue still remains that I have not been able to find a way to identify the "correct number of lags" based on the dataset.
A theoretical solution which would involve tsfill is not doable due to extreme memory requirements of such solution.
Comment