Sum over nth observations

amandacpac

Join Date: Sep 2014

Posts: 55
#1

09 Jun 2025, 11:35

Hello Statalisters - I know there is probably an easy way to do this, but for the life of me I cannot figure it out. I have a dataset that has daily air pollution values for a given person (in long form, though I could put it in wide form if that's easier, especially since I will ultimately need it in wide form) and I'm looking to create weekly values. Essentially, I would like to sum up the first 7 observations, the next 7, and so on. Below is an example dataset:

id day poll
1 1 5.840e-06
1 2 5.190e-06
1 3 3.700e-06
1 4 7.420e-06
1 5 4.150e-06
1 6 2.050e-06
1 7 .0000164
1 8 .0000515
1 9 .0000293
1 10 .0000206
1 11 .000016
1 12 .0000186
1 13 .000017
1 14 .0000118
1 15 9.340e-06
1 16 .0000187
1 17 .0000239
1 18 .0000227
1 19 .0000211
1 20 .0000184
1 21 .0000199

So, I would like the sum of days 1-7, 8-14, 15-21, etc. I have over a year's worth of daily values for most people, so manually putting in the specific days to sum over would be difficult.

Thanks!
George Ford

Join Date: Aug 2014

Posts: 3144
#2

09 Jun 2025, 11:46

If every day is observed, then

egen week = seq() , from(1) to(100) block(7) by(id)

If not every day is observed, you might use fillin to add observations for the missing days first.
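
The seq() call above only creates the week identifier; the weekly sums still have to be computed. A minimal sketch of that step, assuming one observation per id and day with no gaps. The toy data (poll set equal to day) are made up for illustration, and the names poll_week and poll1, poll2, ... are not from the thread:

```stata
clear
set obs 21
gen id = 1
gen day = _n
gen poll = day                     // toy values, for illustration only

sort id day
* blocks of 7 consecutive observations within each id: 1,...,1,2,...,2,3,...
egen week = seq(), from(1) to(100) block(7) by(id)

* weekly total, repeated on every observation of that week
egen poll_week = total(poll), by(id week)

* or reduce to one observation per id and week, then go wide if needed
collapse (sum) poll, by(id week)
reshape wide poll, i(id) j(week)   // creates poll1, poll2, poll3
list
```

Note that seq() starts again at from() once it reaches to(), so to(100) would need raising for anyone with more than 700 days. Both egen's total() and collapse (sum) treat missing values as zero, which matters if fillin has added days with missing poll.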
amandacpac

Join Date: Sep 2014

Posts: 55
#3

09 Jun 2025, 11:50

This worked perfectly, thank you!

Nick Cox

Join Date: Mar 2014
Posts: 35681

09 Jun 2025, 11:55

On this information, I would use the ceiling function, as here. That would be tolerant of gaps.

Code:

clear 
input id day poll
1 1 5.840e-06
1 2 5.190e-06
1 3 3.700e-06
1 4 7.420e-06
1 5 4.150e-06
1 6 2.050e-06
1 7 .0000164
1 8 .0000515
1 9 .0000293
1 10 .0000206
1 11 .000016
1 12 .0000186
1 13 .000017
1 14 .0000118
1 15 9.340e-06
1 16 .0000187
1 17 .0000239
1 18 .0000227
1 19 .0000211
1 20 .0000184
1 21 .0000199
end 
gen week = ceil(day/7)
list, sepby(id week)

     +----------------------------+
     | id   day       poll   week |
     |----------------------------|
  1. |  1     1   5.84e-06      1 |
  2. |  1     2   5.19e-06      1 |
  3. |  1     3   3.70e-06      1 |
  4. |  1     4   7.42e-06      1 |
  5. |  1     5   4.15e-06      1 |
  6. |  1     6   2.05e-06      1 |
  7. |  1     7   .0000164      1 |
     |----------------------------|
  8. |  1     8   .0000515      2 |
  9. |  1     9   .0000293      2 |
 10. |  1    10   .0000206      2 |
 11. |  1    11    .000016      2 |
 12. |  1    12   .0000186      2 |
 13. |  1    13    .000017      2 |
 14. |  1    14   .0000118      2 |
     |----------------------------|
 15. |  1    15   9.34e-06      3 |
 16. |  1    16   .0000187      3 |
 17. |  1    17   .0000239      3 |
 18. |  1    18   .0000227      3 |
 19. |  1    19   .0000211      3 |
 20. |  1    20   .0000184      3 |
 21. |  1    21   .0000199      3 |
     +----------------------------+
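
As a rough illustration of what tolerance of gaps means here: ceil(day/7) depends only on the value of day, whereas counting observations in blocks of 7 shifts every later week once a day is missing. The small dataset below is made up for the purpose, and week_ceil and week_seq are illustrative names:

```stata
clear
input id day poll
1 1 1
1 2 1
1 4 1
1 7 1
1 8 1
1 9 1
end

sort id day
* uses the value of day, so days 8 and 9 fall in week 2 despite the gaps
gen week_ceil = ceil(day/7)
* counts observations, so all six rows land in the first block of 7
egen week_seq = seq(), from(1) to(100) block(7) by(id)

* weekly sums based on the gap-tolerant identifier
egen poll_week = total(poll), by(id week_ceil)
list, sepby(id week_ceil)
```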

I can't see that wide layout is a good idea for time series data.


Nick Cox

Join Date: Mar 2014

Posts: 35681
#5

10 Jun 2025, 01:10

Weekly cycles are common with pollution data, as any researcher will know from general knowledge alone: day of the week has implications for industrial activity, traffic patterns, and whether children or adults are at home, school or work. If that is known but a nuisance, or at any rate not of interest, then there you go.

Otherwise there is a real advantage in using Stata daily dates, from which day of the week can be extracted directly using dow().
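
A minimal sketch of that idea, assuming purely for illustration that day 1 corresponds to 1 January 2024. The start date and the variable names date, dow and wstart are hypothetical, not from the thread:

```stata
clear
set obs 21
gen id = 1
gen day = _n

* hypothetical start date: day 1 is taken to be 1 January 2024
gen date = mdy(1, 1, 2024) + day - 1
format date %td

* dow() returns 0 for Sunday through 6 for Saturday
gen dow = dow(date)

* one way to define weeks from daily dates: label each day with the
* Sunday that starts its week, keeping everything as a daily date
gen wstart = date - dow(date)
format wstart %td

list id day date dow wstart, sepby(wstart)
```

With a variable like wstart in place, weekly totals could be computed by id and wstart rather than by a running count of days, so that weeks line up with the calendar.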

Published last week and immediately available to all is a tutorial review of dates and times designed to be as friendly as possible in territory that many Stata users find complicated if not confusing. (Spoiler alert: Dates are complicated; they don't have to be confusing.)

https://journals.sagepub.com/doi/epu...6867X251341416

Key detail: If the .pdf looks odd or mangled when you view it, just download it: it should look fine when locally accessed and you use your favourite browser or pdf reader.

What's most relevant here: first, don't use Stata weeks at all, as they really won't help; second, do see the various references in the paper, which in turn give hints on managing daily data together with weeks.