  • Sum over nth observations

    Hello Statalisters - I know there is probably an easy way to do this, but for the life of me I cannot figure it out. I have a dataset that has daily air pollution values for a given person (in long form, though I could put it in wide form if that's easier, especially since I will ultimately need it in wide form) and I'm looking to create weekly values. Essentially, I would like to sum up the first 7 observations, the next 7, and so on. Below is an example dataset:

    id day poll
    1 1 5.840e-06
    1 2 5.190e-06
    1 3 3.700e-06
    1 4 7.420e-06
    1 5 4.150e-06
    1 6 2.050e-06
    1 7 .0000164
    1 8 .0000515
    1 9 .0000293
    1 10 .0000206
    1 11 .000016
    1 12 .0000186
    1 13 .000017
    1 14 .0000118
    1 15 9.340e-06
    1 16 .0000187
    1 17 .0000239
    1 18 .0000227
    1 19 .0000211
    1 20 .0000184
    1 21 .0000199

    So, I would like the sum of days 1-7, 8-14, 15-21, etc. I have over a year's worth of daily values for most people, so manually putting in the specific days to sum over would be difficult.

    Thanks!

  • #2
    If every day is observed (and the data are sorted by id and day), then

    egen week = seq() , from(1) to(100) block(7) by(id)

    If not every day is observed, you might first use fillin to add observations for the missing id-day combinations.
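A minimal sketch of the summing step that follows, assuming (as in the example) that every day is observed for each id. It reads the 21 example rows from the question, creates week with the command from #2, and then uses egen total() to attach each week's sum to every day of that week, keeping the long layout. The names weeksum and ndays are illustrative choices, not from the thread. Note that total() treats missing values as 0, which is why the count of contributing days is kept alongside.

```stata
clear
* the 21 example rows from the question
input id day poll
1 1 5.840e-06
1 2 5.190e-06
1 3 3.700e-06
1 4 7.420e-06
1 5 4.150e-06
1 6 2.050e-06
1 7 .0000164
1 8 .0000515
1 9 .0000293
1 10 .0000206
1 11 .000016
1 12 .0000186
1 13 .000017
1 14 .0000118
1 15 9.340e-06
1 16 .0000187
1 17 .0000239
1 18 .0000227
1 19 .0000211
1 20 .0000184
1 21 .0000199
end
* egen seq() numbers observations in the current sort order
sort id day
egen week = seq(), from(1) to(100) block(7) by(id)
* weekly total of poll, repeated on every day of that week (missings count as 0)
bysort id week: egen weeksum = total(poll)
* number of nonmissing daily values contributing to each total
by id week: egen ndays = count(poll)
list id day week weeksum ndays, sepby(id week)
```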


    • #3
      This worked perfectly, thank you!


      • #4
        On this information, I would use the ceiling function, as here. That would be tolerant of gaps.

        Code:
        clear 
        input id day poll
        1 1 5.840e-06
        1 2 5.190e-06
        1 3 3.700e-06
        1 4 7.420e-06
        1 5 4.150e-06
        1 6 2.050e-06
        1 7 .0000164
        1 8 .0000515
        1 9 .0000293
        1 10 .0000206
        1 11 .000016
        1 12 .0000186
        1 13 .000017
        1 14 .0000118
        1 15 9.340e-06
        1 16 .0000187
        1 17 .0000239
        1 18 .0000227
        1 19 .0000211
        1 20 .0000184
        1 21 .0000199
        end 
        gen week = ceil(day/7)
        list, sepby(id week)
        
             +----------------------------+
             | id   day       poll   week |
             |----------------------------|
          1. |  1     1   5.84e-06      1 |
          2. |  1     2   5.19e-06      1 |
          3. |  1     3   3.70e-06      1 |
          4. |  1     4   7.42e-06      1 |
          5. |  1     5   4.15e-06      1 |
          6. |  1     6   2.05e-06      1 |
          7. |  1     7   .0000164      1 |
             |----------------------------|
          8. |  1     8   .0000515      2 |
          9. |  1     9   .0000293      2 |
         10. |  1    10   .0000206      2 |
         11. |  1    11    .000016      2 |
         12. |  1    12   .0000186      2 |
         13. |  1    13    .000017      2 |
         14. |  1    14   .0000118      2 |
             |----------------------------|
         15. |  1    15   9.34e-06      3 |
         16. |  1    16   .0000187      3 |
         17. |  1    17   .0000239      3 |
         18. |  1    18   .0000227      3 |
         19. |  1    19   .0000211      3 |
         20. |  1    20   .0000184      3 |
         21. |  1    21   .0000199      3 |
             +----------------------------+
        I can't see that a wide layout is a good idea for time series data.
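Since the question ultimately asks for weekly sums, and mentions needing a wide layout, here is a minimal sketch continuing from the ceil() approach: collapse (sum) reduces the data to one observation per id and week, and reshape wide then gives one variable per week. The reshape step is included only because the question says it will be needed; the name ndays and the resulting poll1, poll2, ... are illustrative. Like total(), (sum) treats missing values as 0, so the (count) result records how many daily values went into each sum.

```stata
clear
* the 21 example rows from the question
input id day poll
1 1 5.840e-06
1 2 5.190e-06
1 3 3.700e-06
1 4 7.420e-06
1 5 4.150e-06
1 6 2.050e-06
1 7 .0000164
1 8 .0000515
1 9 .0000293
1 10 .0000206
1 11 .000016
1 12 .0000186
1 13 .000017
1 14 .0000118
1 15 9.340e-06
1 16 .0000187
1 17 .0000239
1 18 .0000227
1 19 .0000211
1 20 .0000184
1 21 .0000199
end
gen week = ceil(day/7)
* one observation per id and week: sum of poll and count of nonmissing days
collapse (sum) poll (count) ndays=poll, by(id week)
list, noobs
* only if a wide layout is really needed: poll1, poll2, ... and ndays1, ndays2, ...
reshape wide poll ndays, i(id) j(week)
list
```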


        • #5
          Weekly cycles are common with pollution data, as any researcher will know from general knowledge alone: day of the week has implications for industrial activity, traffic patterns, and whether children or adults are at home, school or work. If that is known but a nuisance, or at any rate not of interest, then there you go.

          Otherwise there is a real advantage in using Stata daily dates, from which day of the week can be extracted directly using dow().
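A minimal sketch of what working with Stata daily dates could look like. The example in the question has only a day counter, so the start date of 1 January 2020 used here is purely an assumption for illustration. dow() returns 0 for Sunday through 6 for Saturday; the variable weekstart (an illustrative name) holds the Monday that starts each calendar week, computed with mod(), so weeks line up with days of the week rather than with the first observation. That alignment produces partial weeks at the start and end of the series, which the ndays count makes visible.

```stata
clear
* Hypothetical data: 21 consecutive days for one id; the start date
* 1 January 2020 is an assumption for illustration only
set obs 21
gen id = 1
gen day = _n
gen date = mdy(1, 1, 2020) + day - 1    // Stata daily date
format date %td
gen dayofweek = dow(date)               // 0 = Sunday, 1 = Monday, ..., 6 = Saturday
* Monday on or before each date, i.e. the start of its Monday-to-Sunday week
* (Stata's mod() is nonnegative here, so Sundays map back 6 days)
gen weekstart = date - mod(dayofweek - 1, 7)
format weekstart %td
* days falling in each calendar week (first and last weeks are partial here)
bysort id weekstart: gen ndays = _N
list date dayofweek weekstart ndays, sepby(weekstart)
```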

          Published last week and immediately available to all is a tutorial review of dates and times designed to be as friendly as possible in territory that many Stata users find complicated if not confusing. (Spoiler alert: Dates are complicated; they don't have to be confusing.)

          https://journals.sagepub.com/doi/epu...6867X251341416

          Key detail: If the .pdf looks odd or mangled when you view it, just download it: it should look fine when locally accessed and you use your favourite browser or pdf reader.

          What's most relevant here is, first, don't use Stata weeks at all, as they really won't help; second, do see the various references in the paper, which in turn give hints on managing daily data together with weeks.
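To illustrate why Stata weeks are unhelpful for this purpose: Stata's weekly dates (wofd(), %tw) start week 1 on 1 January every year, so their boundaries do not line up with days of the week, and week 52 absorbs the leftover days, 8 in ordinary years and 9 in leap years. The sketch below simply counts days per Stata week across 2024, a leap year chosen only for illustration.

```stata
clear
* every day of 2024, a leap year chosen for illustration
set obs 366
gen date = mdy(1, 1, 2024) + _n - 1
format date %td
gen sweek = wofd(date)          // Stata weekly date: week 1 starts on 1 January
format sweek %tw
* number of days falling in each Stata week
bysort sweek: gen ndays = _N
* expect weeks 1 to 51 with 7 days each and week 52 with 9 days
tab ndays
```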
