Want a cumulative count that resets when there's a gap between occurrences

Miranda Chilver

Join Date: Aug 2023
Posts: 3


22 Aug 2023, 03:28

Somewhat new to Stata. I have data with participant IDs, year, and a binary outcome. I want to count how many times the binary outcome (unemployment) occurs in consecutive years for a given individual. I only want to count consecutive years and to reset the count after employment is regained.

Here is some test data I've been using, where unemp_c is the count outcome that I'm trying to create (with participants like id = 3 and id = 5 being the main problem).

id	year	unemp	unemp_c
1	1	0	0
1	2	0	0
1	3	1	1
1	4	1	2
1	5	0	0
1	6	0	0
2	1	0	0
2	2	0	0
2	3	0	0
2	4	0	0
2	5	0	0
2	6	0	0
3	1	1	1
3	2	1	2
3	3	0	0
3	4	0	0
3	5	1	1
3	6	0	0
4	1	0	0
4	2	1	1
4	3	1	2
4	4	1	3
4	5	0	0
4	6	0	0
5	1	1	1
5	2	1	2
5	3	0	0
5	4	1	1
5	5	1	2
5	6	1	3

Initially I thought something like the following might work:

bysort id (year) unemp : gen unemp_c = _n
replace unemp_c = 0 if unemp == 0

But this just ignores the unemp variable.

I can't use
bysort id year unemp : gen unemp_c = _n
replace unemp_c = 0 if unemp == 0
because there's only 1 observation per year (so unemp_c is entirely 1s)

And
bysort id unemp (year) : gen unemp_c = _n
replace unemp_c = 0 if unemp == 0
doesn't work because it doesn't account for when the unemployed years don't all occur consecutively.

Based on how the by command works, I don't think it's possible to do it that way.

I've considered whether creating lagged variables might work, but I can't figure out how to get them to count correctly either. I initially thought I might get there through conditional statements with unemp and L.unemp. But I would still need some way to generate a count using _n, and for that to work (as far as I understand it) I would need a grouping variable. I then run into the same problem again: I can only group all occurrences of unemployment together, regardless of year.

As I said, I'm fairly new to Stata so I might be missing something here. I tried searching for similar problems but couldn't find any that desired similar output (i.e., they only wanted to count total cases, not to reset the count when there were non-consecutive observations).

It looks to me like I need to somehow create a new grouping variable to separate different periods of unemployment for an individual. I can flag the beginning of a period of unemployment using
gen unemp_new = 0
replace unemp_new = 1 if L.unemp == 0 & unemp == 1
bysort id unemp_new : gen unemp_period = _n
replace unemp_new = 0 if unemp == 0
but I'm still stuck on how to mark consecutive years of unemployment as part of the same period of unemployment.

Any guidance would be appreciated!

Tags: cumulative count, data, groups, lag, panel data

Andrew Musau

Join Date: Oct 2014
Posts: 10180

22 Aug 2023, 03:45

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id year unemp unemp_c)
1 1 0 0
1 2 0 0
1 3 1 1
1 4 1 2
1 5 0 0
1 6 0 0
2 1 0 0
2 2 0 0
2 3 0 0
2 4 0 0
2 5 0 0
2 6 0 0
3 1 1 1
3 2 1 2
3 3 0 0
3 4 0 0
3 5 1 1
3 6 0 0
4 1 0 0
4 2 1 1
4 3 1 2
4 4 1 3
4 5 0 0
4 6 0 0
5 1 1 1
5 2 1 2
5 3 0 0
5 4 1 1
5 5 1 2
5 6 1 3
end

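* Flag the first year of each spell (unemp[_n-1] is missing when _n == 1)
bys id (year): gen wanted= (unemp & !unemp[_n-1]) | (_n==1 & unemp)
* Within a spell, add 1 to the previous year's count
by id: replace wanted= wanted[_n-1]+1 if unemp & unemp[_n-1] & _n>1
* Confirm the result matches the target variable
assert unemp_c== wanted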

Res.:

Code:

. l, sepby(id)

     +--------------------------------------+
     | id   year   unemp   unemp_c   wanted |
     |--------------------------------------|
  1. |  1      1       0         0        0 |
  2. |  1      2       0         0        0 |
  3. |  1      3       1         1        1 |
  4. |  1      4       1         2        2 |
  5. |  1      5       0         0        0 |
  6. |  1      6       0         0        0 |
     |--------------------------------------|
  7. |  2      1       0         0        0 |
  8. |  2      2       0         0        0 |
  9. |  2      3       0         0        0 |
 10. |  2      4       0         0        0 |
 11. |  2      5       0         0        0 |
 12. |  2      6       0         0        0 |
     |--------------------------------------|
 13. |  3      1       1         1        1 |
 14. |  3      2       1         2        2 |
 15. |  3      3       0         0        0 |
 16. |  3      4       0         0        0 |
 17. |  3      5       1         1        1 |
 18. |  3      6       0         0        0 |
     |--------------------------------------|
 19. |  4      1       0         0        0 |
 20. |  4      2       1         1        1 |
 21. |  4      3       1         2        2 |
 22. |  4      4       1         3        3 |
 23. |  4      5       0         0        0 |
 24. |  4      6       0         0        0 |
     |--------------------------------------|
 25. |  5      1       1         1        1 |
 26. |  5      2       1         2        2 |
 27. |  5      3       0         0        0 |
 28. |  5      4       1         1        1 |
 29. |  5      5       1         2        2 |
 30. |  5      6       1         3        3 |
     +--------------------------------------+

.


Miranda Chilver

Join Date: Aug 2023

Posts: 3
#3

22 Aug 2023, 03:54

Thank you so much!

Nick Cox

Join Date: Mar 2014
Posts: 35617

22 Aug 2023, 04:18

What it boils down to is an exercise in using by: and subscripting in various ways, as Andrew Musau's excellent reply indicates concisely. That is fiddly without some experience.
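As a minimal sketch of that by: and subscripting idea, and not the code from the reply above, one variant builds the grouping variable the question asked about: flag the first year of each spell, take a running sum of the flags to number the spells, then count within each spell. The names spell_start, spell_id and wanted2 are hypothetical, and the sketch assumes one observation per id and year with no gaps in year.

```stata
clear
input byte(id year unemp)
3 1 1
3 2 1
3 3 0
3 4 0
3 5 1
3 6 0
5 1 1
5 2 1
5 3 0
5 4 1
5 5 1
5 6 1
end

* 1 in the first year of each unemployment spell, 0 otherwise
bysort id (year): gen byte spell_start = unemp == 1 & (_n == 1 | unemp[_n-1] == 0)

* The running sum of starts numbers the spells within id; 0 marks employed years
by id: gen spell_id = sum(spell_start) * (unemp == 1)

* Position within each spell, with 0 outside spells
bysort id spell_id (year): gen wanted2 = cond(spell_id > 0, _n, 0)

sort id year
list, sepby(id)
```

The extra variable spell_id is what makes the final bysort possible: it separates different periods of unemployment for the same person, which grouping on unemp alone cannot do.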

See also tsspell from SSC and https://journals.sagepub.com/doi/pdf...867X0700700209

The lack of mention of tsspell in the 2007 paper can be explained. I was going to spell out (pun intended) the principles and then get to mentioning tsspell. But the paper got long and I was close to a deadline, so it was submitted as you see it. There was a vague intention of a sequel mentioning tsspell but that never got written and in any event the help file for tsspell is fairly detailed.

Code:

clear
input id    year    unemp    unemp_c
1    1    0    0
1    2    0    0
1    3    1    1
1    4    1    2
1    5    0    0
1    6    0    0
2    1    0    0
2    2    0    0
2    3    0    0
2    4    0    0
2    5    0    0
2    6    0    0
3    1    1    1
3    2    1    2
3    3    0    0
3    4    0    0
3    5    1    1
3    6    0    0
4    1    0    0
4    2    1    1
4    3    1    2
4    4    1    3
4    5    0    0
4    6    0    0
5    1    1    1
5    2    1    2
5    3    0    0
5    4    1    1
5    5    1    2
5    6    1    3
end

tsset id year
* tsspell is community-contributed; install once with: ssc install tsspell
tsspell, cond(unemp == 1)

list, sepby(id _spell)

     +----------------------------------------------------+
     | id   year   unemp   unemp_c   _seq   _spell   _end |
     |----------------------------------------------------|
  1. |  1      1       0         0      0        0      0 |
  2. |  1      2       0         0      0        0      0 |
     |----------------------------------------------------|
  3. |  1      3       1         1      1        1      0 |
  4. |  1      4       1         2      2        1      1 |
     |----------------------------------------------------|
  5. |  1      5       0         0      0        0      0 |
  6. |  1      6       0         0      0        0      0 |
     |----------------------------------------------------|
  7. |  2      1       0         0      0        0      0 |
  8. |  2      2       0         0      0        0      0 |
  9. |  2      3       0         0      0        0      0 |
 10. |  2      4       0         0      0        0      0 |
 11. |  2      5       0         0      0        0      0 |
 12. |  2      6       0         0      0        0      0 |
     |----------------------------------------------------|
 13. |  3      1       1         1      1        1      0 |
 14. |  3      2       1         2      2        1      1 |
     |----------------------------------------------------|
 15. |  3      3       0         0      0        0      0 |
 16. |  3      4       0         0      0        0      0 |
     |----------------------------------------------------|
 17. |  3      5       1         1      1        2      1 |
     |----------------------------------------------------|
 18. |  3      6       0         0      0        0      0 |
     |----------------------------------------------------|
 19. |  4      1       0         0      0        0      0 |
     |----------------------------------------------------|
 20. |  4      2       1         1      1        1      0 |
 21. |  4      3       1         2      2        1      0 |
 22. |  4      4       1         3      3        1      1 |
     |----------------------------------------------------|
 23. |  4      5       0         0      0        0      0 |
 24. |  4      6       0         0      0        0      0 |
     |----------------------------------------------------|
 25. |  5      1       1         1      1        1      0 |
 26. |  5      2       1         2      2        1      1 |
     |----------------------------------------------------|
 27. |  5      3       0         0      0        0      0 |
     |----------------------------------------------------|
 28. |  5      4       1         1      1        2      0 |
 29. |  5      5       1         2      2        2      0 |
 30. |  5      6       1         3      3        2      1 |
     +----------------------------------------------------+

You can see that the counter variable _seq is precisely the variable requested. The other variables can help too. For example, the number of spells per person is

Code:

egen n_spells = max(_spell), by(id)

and the mean length of unemployment spells per person is

Code:

egen mean_length = mean(cond(_end, _seq, .)), by(id)

while the total length of unemployment is already in sight as

Code:

egen total_length = total(unemp), by(id)

Note that averaging across people, not person-years, is easier with the egen function tag(); see the help for egen for more.
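A small sketch of the difference, assuming a toy unbalanced panel made up purely for illustration (the variable name one_per_id is hypothetical): tag() marks one observation per person, so restricting to tagged observations counts each person once rather than once per year.

```stata
clear
input byte(id year unemp)
1 1 0
1 2 1
1 3 1
2 1 0
2 2 0
3 1 1
3 2 1
3 3 1
end

* Person-level total, repeated on every row for that person
egen total_length = total(unemp), by(id)

* tag() is 1 for exactly one observation per id and 0 for the rest
egen byte one_per_id = tag(id)

* Mean over person-years: people with more years carry more weight
summarize total_length

* Mean over people: each person counted once
summarize total_length if one_per_id
```

In a balanced panel the two means coincide, but the number of observations and the standard deviation still differ, so the tagged version is the safer habit for person-level summaries.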

tsspell is an otherwise unpredictable search term for previous discussions here.


Miranda Chilver

Join Date: Aug 2023

Posts: 3
#5

22 Aug 2023, 19:41

Thank you Nick - great to know about tsspell!
