
  • Want a cumulative count that resets when there's a gap between occurrences

    Somewhat new to Stata. I have data with participant IDs, year, and a binary outcome. I want to count how many times the binary outcome (unemployment) occurs in consecutive years for a given individual. I only want to count consecutive years, and the count should reset once employment is regained.

    Here is some test data I've been using, where unemp_c is the count outcome that I'm trying to create (with participants like id = 3 and id = 5 being the main problem).

    id year unemp unemp_c
    1 1 0 0
    1 2 0 0
    1 3 1 1
    1 4 1 2
    1 5 0 0
    1 6 0 0
    2 1 0 0
    2 2 0 0
    2 3 0 0
    2 4 0 0
    2 5 0 0
    2 6 0 0
    3 1 1 1
    3 2 1 2
    3 3 0 0
    3 4 0 0
    3 5 1 1
    3 6 0 0
    4 1 0 0
    4 2 1 1
    4 3 1 2
    4 4 1 3
    4 5 0 0
    4 6 0 0
    5 1 1 1
    5 2 1 2
    5 3 0 0
    5 4 1 1
    5 5 1 2
    5 6 1 3

    Initially I thought something like the following might work:

    bysort id (year) unemp : gen unemp_c = _n
    replace unemp_c = 0 if unemp == 0

    But this just ignores the unemp variable.

    I can't use
    bysort id year unemp : gen unemp_c = _n
    replace unemp_c = 0 if unemp == 0
    because there's only one observation per id and year (so unemp_c is entirely 1s).

    And
    bysort id unemp (year) : gen unemp_c = _n
    replace unemp_c = 0 if unemp == 0
    doesn't work because it doesn't account for when the unemployed years don't all occur consecutively.

    Based on how the by command works, I don't think it's possible to do it that way.

    I've considered whether creating lagged variables might work, but I can't figure out how to get the count right that way either. I was thinking I might get there through conditional statements with unemp and L.unemp, but I would still want to generate the count using _n, and for that to work (as far as I understand it) I would need a grouping variable. Then I run into the same problem again: I can only group all occurrences of unemployment together, regardless of year.

    As I said, I'm fairly new to Stata, so I might be missing something here. I tried searching for similar problems but couldn't find any that wanted similar output (i.e., they only wanted to count total cases, without resetting when observations were non-consecutive).

    It looks to me like I need to somehow create a new grouping variable to separate different periods of unemployment for an individual. I can flag the beginning of a period of unemployment using
    gen unemp_new = 0
    replace unemp_new = 1 if L.unemp == 0 & unemp == 1
    bysort id unemp_new : gen unemp_period = _n
    replace unemp_new = 0 if unemp == 0

    but I'm still stuck on how to mark consecutive years of unemployment as part of the same period of unemployment.

    Any guidance would be appreciated!

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id year unemp unemp_c)
    1 1 0 0
    1 2 0 0
    1 3 1 1
    1 4 1 2
    1 5 0 0
    1 6 0 0
    2 1 0 0
    2 2 0 0
    2 3 0 0
    2 4 0 0
    2 5 0 0
    2 6 0 0
    3 1 1 1
    3 2 1 2
    3 3 0 0
    3 4 0 0
    3 5 1 1
    3 6 0 0
    4 1 0 0
    4 2 1 1
    4 3 1 2
    4 4 1 3
    4 5 0 0
    4 6 0 0
    5 1 1 1
    5 2 1 2
    5 3 0 0
    5 4 1 1
    5 5 1 2
    5 6 1 3
    end
    
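    * Start of a spell: unemployed now, and either employed in the previous year or first observation for this id
    bys id (year): gen wanted= (unemp & !unemp[_n-1]) | _n==1 & unemp
    * Later years of a spell: add 1 to the previous observation's count
    by id: replace wanted= wanted[_n-1]+1 if unemp & unemp[_n-1] & _n>1
    * Check against the hand-coded target
    assert unemp_c== wanted
    Result: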

    Code:
    . l, sepby(id)
    
         +--------------------------------------+
         | id   year   unemp   unemp_c   wanted |
         |--------------------------------------|
      1. |  1      1       0         0        0 |
      2. |  1      2       0         0        0 |
      3. |  1      3       1         1        1 |
      4. |  1      4       1         2        2 |
      5. |  1      5       0         0        0 |
      6. |  1      6       0         0        0 |
         |--------------------------------------|
      7. |  2      1       0         0        0 |
      8. |  2      2       0         0        0 |
      9. |  2      3       0         0        0 |
     10. |  2      4       0         0        0 |
     11. |  2      5       0         0        0 |
     12. |  2      6       0         0        0 |
         |--------------------------------------|
     13. |  3      1       1         1        1 |
     14. |  3      2       1         2        2 |
     15. |  3      3       0         0        0 |
     16. |  3      4       0         0        0 |
     17. |  3      5       1         1        1 |
     18. |  3      6       0         0        0 |
         |--------------------------------------|
     19. |  4      1       0         0        0 |
     20. |  4      2       1         1        1 |
     21. |  4      3       1         2        2 |
     22. |  4      4       1         3        3 |
     23. |  4      5       0         0        0 |
     24. |  4      6       0         0        0 |
         |--------------------------------------|
     25. |  5      1       1         1        1 |
     26. |  5      2       1         2        2 |
     27. |  5      3       0         0        0 |
     28. |  5      4       1         1        1 |
     29. |  5      5       1         2        2 |
     30. |  5      6       1         3        3 |
         +--------------------------------------+
    
    .
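
    The same subscripting idea can be sketched a little more compactly, with an extra check that the previous observation really is the previous year. This is only an illustration, not the method from the reply above: spell_len is a hypothetical variable name, unemp is assumed to be coded 0/1 with no missing values, and the year comparison is an added assumption for panels that might have gaps. Only ids 3 and 5 are used, to keep the example short.

    ```stata
    * Sketch only: spell_len is a hypothetical name; assumes unemp is 0/1, never missing
    clear
    input byte(id year unemp unemp_c)
    3 1 1 1
    3 2 1 2
    3 3 0 0
    3 4 0 0
    3 5 1 1
    3 6 0 0
    5 1 1 1
    5 2 1 2
    5 3 0 0
    5 4 1 1
    5 5 1 2
    5 6 1 3
    end

    * Start from the indicator itself: 0 when employed, 1 when unemployed
    bysort id (year): gen spell_len = unemp

    * Extend the count only when the previous observation is the previous year;
    * replace works through observations in order, so spell_len[_n-1] is already updated
    by id: replace spell_len = spell_len[_n-1] + 1 if unemp & _n > 1 & year == year[_n-1] + 1

    * Same result as the hand-coded target
    assert spell_len == unemp_c
    ```

    If the previous observation was an employed year, spell_len[_n-1] is 0, so the count restarts at 1 without a separate condition.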



    • #3
      Thank you so much!



      • #4
        What it boils down to is an exercise in using by: and subscripting in various ways, as Andrew Musau's excellent reply indicates concisely. That is fiddly without some experience.
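
        One of those various ways, closer to the grouping variable asked about in #1, might look like the sketch below. It flags the first year of each spell, numbers the spells with a running sum, and then counts within spell. The names start, spell_id and spell_len are hypothetical, and unemp is assumed to be 0/1 with no missing values and no gaps in year.

        ```stata
        * Sketch only: start, spell_id and spell_len are hypothetical names
        clear
        input byte(id year unemp unemp_c)
        3 1 1 1
        3 2 1 2
        3 3 0 0
        3 4 0 0
        3 5 1 1
        3 6 0 0
        5 1 1 1
        5 2 1 2
        5 3 0 0
        5 4 1 1
        5 5 1 2
        5 6 1 3
        end

        * Flag the first year of each unemployment spell within person
        bysort id (year): gen byte start = unemp == 1 & (_n == 1 | unemp[_n-1] == 0)

        * A running sum of the flags numbers the spells; left missing in employed years
        by id: gen spell_id = sum(start) if unemp == 1

        * Within person and spell, _n counts the years so far; employed years get 0
        bysort id spell_id (year): gen spell_len = cond(unemp == 1, _n, 0)

        sort id year
        assert spell_len == unemp_c
        ```

        The spell identifier is the piece that by: needs in order to treat separate periods of unemployment as separate groups.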

        See also tsspell from SSC and https://journals.sagepub.com/doi/pdf...867X0700700209

        The lack of mention of tsspell in the 2007 paper can be explained. I was going to spell out (pun intended) the principles and then get to mentioning tsspell. But the paper got long and I was close to a deadline, so it was submitted as you see it. There was a vague intention of a sequel mentioning tsspell but that never got written and in any event the help file for tsspell is fairly detailed.

        Code:
        clear
        input id    year    unemp    unemp_c
        1    1    0    0
        1    2    0    0
        1    3    1    1
        1    4    1    2
        1    5    0    0
        1    6    0    0
        2    1    0    0
        2    2    0    0
        2    3    0    0
        2    4    0    0
        2    5    0    0
        2    6    0    0
        3    1    1    1
        3    2    1    2
        3    3    0    0
        3    4    0    0
        3    5    1    1
        3    6    0    0
        4    1    0    0
        4    2    1    1
        4    3    1    2
        4    4    1    3
        4    5    0    0
        4    6    0    0
        5    1    1    1
        5    2    1    2
        5    3    0    0
        5    4    1    1
        5    5    1    2
        5    6    1    3
        end
        
        tsset id year
        tsspell, cond(unemp == 1)
        
        list, sepby(id _spell)
        
             +----------------------------------------------------+
             | id   year   unemp   unemp_c   _seq   _spell   _end |
             |----------------------------------------------------|
          1. |  1      1       0         0      0        0      0 |
          2. |  1      2       0         0      0        0      0 |
             |----------------------------------------------------|
          3. |  1      3       1         1      1        1      0 |
          4. |  1      4       1         2      2        1      1 |
             |----------------------------------------------------|
          5. |  1      5       0         0      0        0      0 |
          6. |  1      6       0         0      0        0      0 |
             |----------------------------------------------------|
          7. |  2      1       0         0      0        0      0 |
          8. |  2      2       0         0      0        0      0 |
          9. |  2      3       0         0      0        0      0 |
         10. |  2      4       0         0      0        0      0 |
         11. |  2      5       0         0      0        0      0 |
         12. |  2      6       0         0      0        0      0 |
             |----------------------------------------------------|
         13. |  3      1       1         1      1        1      0 |
         14. |  3      2       1         2      2        1      1 |
             |----------------------------------------------------|
         15. |  3      3       0         0      0        0      0 |
         16. |  3      4       0         0      0        0      0 |
             |----------------------------------------------------|
         17. |  3      5       1         1      1        2      1 |
             |----------------------------------------------------|
         18. |  3      6       0         0      0        0      0 |
             |----------------------------------------------------|
         19. |  4      1       0         0      0        0      0 |
             |----------------------------------------------------|
         20. |  4      2       1         1      1        1      0 |
         21. |  4      3       1         2      2        1      0 |
         22. |  4      4       1         3      3        1      1 |
             |----------------------------------------------------|
         23. |  4      5       0         0      0        0      0 |
         24. |  4      6       0         0      0        0      0 |
             |----------------------------------------------------|
         25. |  5      1       1         1      1        1      0 |
         26. |  5      2       1         2      2        1      1 |
             |----------------------------------------------------|
         27. |  5      3       0         0      0        0      0 |
             |----------------------------------------------------|
         28. |  5      4       1         1      1        2      0 |
         29. |  5      5       1         2      2        2      0 |
         30. |  5      6       1         3      3        2      1 |
             +----------------------------------------------------+
        You can see that the counter variable _seq is precisely the variable requested. The other variables can help too. For example, the number of spells per person is

        Code:
        egen n_spells = max(_spell), by(id)
        and the mean length of unemployment spells per person is

        Code:
        egen mean_length = mean(cond(_end, _seq, .)), by(id)
        while the total length of unemployment is already in sight as

        Code:
        egen total_length = total(unemp), by(id) 
        Note that averaging across people, not person-years, is easier with egen, tag(); see the help for egen for more.
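
        As a rough sketch of that last point, tagging one observation per person lets summarize count each person once. The variable name person is hypothetical, and the data are just ids 3 and 5 from the example, so tsspell is not needed to run this.

        ```stata
        * Sketch only: person is a hypothetical name for the tag variable
        clear
        input byte(id year unemp)
        3 1 1
        3 2 1
        3 3 0
        3 4 0
        3 5 1
        3 6 0
        5 1 1
        5 2 1
        5 3 0
        5 4 1
        5 5 1
        5 6 1
        end

        * Total years of unemployment, repeated on every row for the person
        egen total_length = total(unemp), by(id)

        * tag() marks exactly one observation in each group with 1, the rest with 0
        egen byte person = tag(id)

        * Summarize over people rather than person-years
        summarize total_length if person
        assert r(N) == 2
        ```

        Here id 3 has 3 unemployed years and id 5 has 5, and each person enters the summary exactly once.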

        tsspell is an otherwise unpredictable search term for previous discussions here.



        • #5
          Thank you Nick - great to know about tsspell!
