I'm somewhat new to Stata. I have data with participant IDs, year, and a binary outcome (unemployment). I want to count how many consecutive years the outcome has occurred for a given individual. Only consecutive years should count, and the count should reset once employment is regained.
Here is some test data I've been using, where unemp_c is the count outcome that I'm trying to create (with participants like id = 3 and id = 5 being the main problem).
id | year | unemp | unemp_c |
1 | 1 | 0 | 0 |
1 | 2 | 0 | 0 |
1 | 3 | 1 | 1 |
1 | 4 | 1 | 2 |
1 | 5 | 0 | 0 |
1 | 6 | 0 | 0 |
2 | 1 | 0 | 0 |
2 | 2 | 0 | 0 |
2 | 3 | 0 | 0 |
2 | 4 | 0 | 0 |
2 | 5 | 0 | 0 |
2 | 6 | 0 | 0 |
3 | 1 | 1 | 1 |
3 | 2 | 1 | 2 |
3 | 3 | 0 | 0 |
3 | 4 | 0 | 0 |
3 | 5 | 1 | 1 |
3 | 6 | 0 | 0 |
4 | 1 | 0 | 0 |
4 | 2 | 1 | 1 |
4 | 3 | 1 | 2 |
4 | 4 | 1 | 3 |
4 | 5 | 0 | 0 |
4 | 6 | 0 | 0 |
5 | 1 | 1 | 1 |
5 | 2 | 1 | 2 |
5 | 3 | 0 | 0 |
5 | 4 | 1 | 1 |
5 | 5 | 1 | 2 |
5 | 6 | 1 | 3 |
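In case it helps anyone reproduce this, the table above can be typed in directly with input. This is only a sketch of my test data. The xtset line is an assumption on my part about how the panel should be declared so that time-series operators such as L.unemp can be used.

```stata
* Recreate the test data shown in the table above
* (unemp_c is the target count, entered by hand for comparison)
clear
input byte id byte year byte unemp byte unemp_c
1 1 0 0
1 2 0 0
1 3 1 1
1 4 1 2
1 5 0 0
1 6 0 0
2 1 0 0
2 2 0 0
2 3 0 0
2 4 0 0
2 5 0 0
2 6 0 0
3 1 1 1
3 2 1 2
3 3 0 0
3 4 0 0
3 5 1 1
3 6 0 0
4 1 0 0
4 2 1 1
4 3 1 2
4 4 1 3
4 5 0 0
4 6 0 0
5 1 1 1
5 2 1 2
5 3 0 0
5 4 1 1
5 5 1 2
5 6 1 3
end

* Assumption: declare the panel so that L.unemp refers to the previous year within id
xtset id year
```

Since unemp_c is already taken by the hand-entered target, any attempt run on this data needs a different variable name, which also makes it easy to compare the two.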
Initially I thought something like the following might work:
bysort id (year) unemp : gen unemp_c = _n
replace unemp_c = 0 if unemp == 0
But this just ignores the unemp variable.
I can't use
bysort id year unemp : gen unemp_c = _n
replace unemp_c = 0 if unemp == 0
because there is only one observation per id and year, so unemp_c is 1 everywhere.
And
bysort id unemp (year) : gen unemp_c = _n
replace unemp_c = 0 if unemp == 0
doesn't work because it doesn't account for when the unemployed years don't all occur consecutively.
Based on how the by command works, I don't think it's possible to do it that way.
I've considered whether creating lagged variables might work, but I can't figure out how to get the count right that way either. I initially thought I could get there with conditional statements on unemp and L.unemp, but I would still need some way to generate the count using _n, and for that (as far as I understand it) I need a grouping variable. That brings me back to the same problem: I can only group all occurrences of unemployment together, regardless of year.
As I said, I'm fairly new to Stata, so I might be missing something here. I searched for similar problems but couldn't find any that wanted similar output (they only count total cases, without resetting when the observations are non-consecutive).
It looks to me like I need to somehow create a new grouping variable to separate different periods of unemployment for an individual. I can flag the beginning of a period of unemployment using
gen unemp_new = 0
replace unemp_new = 1 if L.unemp == 0 & unemp == 1
bysort id unemp_new : gen unemp_period = _n
replace unemp_new = 0 if unemp == 0
but I'm still stuck on how to mark consecutive years of unemployment as part of the same period of unemployment.
Any guidance would be appreciated!