I've looked high & low for a solution to this, so my apologies if I'm repeating an issue from a different post/FAQ/Stata Journal. This journal article: http://www.stata-journal.com/sjpdf.h...iclenum=dm0029, among many other sources, has gotten me pretty far already, but I can't seem to fix one last bug/issue dealing with sequencing spells.
I'm using Stata 13 SE and have a dataset that looks like:
My goal is to sequentially identify the enrollment/graduation patterns for each 'id'. As you can see, I've already sorted by id & EnrGrad_Dates and my date variables are in Stata format (e.g., days since 1/1/1960). I'm able to get most of the way there with:
egen college_tag=group(id collegeid) if collegeid!=.
by id: gen college_tag2=(college_tag!=[_n-1]) if collegeid!=.
by id: gen college_seq=sum(college_tag2) if collegeid!=.
sort id EnrGrad_Dates
This gets me:
Ideally, I'd like to have the college_seq variable arranged numerically by collegeid & EnrGrad_Dates, but if I try to bysort on collegeid & EnrGrad_Dates, it treats each date like it's own entity, which makes sense. If it wasn't the case that ids concurrently enroll in a college (two colleges--one term) or that they go back to previously enrolled in colleges, the normal approach to spells would work out great, but because I don't want an already attended (preceding row within an id) college to start a new spell, I struggle with the spell approach. Is there a way to use a spell approach that takes the repeated collegeid variable into account, or is there a way to manipulate my college_seq variable to get to the same place I'd like to. Here's what I'd want to have:
Thanks!
I'm using Stata 13 SE and have a dataset that looks like:
id | collegeid | Enroll Date | End Date | Grad Date | EnrGrad_Dates |
1 | 2341 | 13387 | 13507 | 13387 | |
1 | 8789 | 13390 | 13510 | 13390 | |
1 | 2341 | 13537 | 13657 | 13537 | |
1 | 2341 | 13743 | 13863 | 13743 | |
1 | 2341 | 13942 | 14062 | 13942 | |
1 | 2341 | 14232 | 14352 | 14232 | |
1 | 2341 | . | . | 14578 | 14578 |
2 | 4563 | 17891 | 18011 | 17891 | |
2 | 4563 | 18002 | 18122 | 18002 | |
2 | 4563 | 18250 | 18370 | 18250 | |
2 | 4563 | 18353 | 18473 | 18353 | |
2 | 4563 | 19567 | 19624 | 19567 | |
3 | . | . | . | . | . |
4 | 1289 | 18763 | 18883 | 18763 | |
4 | 1289 | 18987 | 19107 | 18987 | |
4 | 9089 | 20032 | 20152 | 20032 | |
4 | 9089 | 20198 | 20318 | 20198 | |
4 | 1289 | 20346 | 20466 | 20346 |
egen college_tag=group(id collegeid) if collegeid!=.
by id: gen college_tag2=(college_tag!=[_n-1]) if collegeid!=.
by id: gen college_seq=sum(college_tag2) if collegeid!=.
sort id EnrGrad_Dates
This gets me:
id | college id | Enroll Date | End Date | Grad Date | EnrGrad_Dates | college_tag | college_tag2 | college_seq |
1 | 8789 | 13387 | 13507 | 13387 | 1 | 1 | 3 | |
1 | 1245 | 13390 | 13510 | 13390 | 2 | 1 | 1 | |
1 | 8789 | 13537 | 13657 | 13537 | 1 | 0 | 3 | |
1 | 2341 | 13743 | 13863 | 13743 | 1 | 0 | 2 | |
1 | 2341 | 13942 | 14062 | 13942 | 1 | 0 | 2 | |
1 | 2341 | 14232 | 14352 | 14232 | 1 | 0 | 2 | |
1 | 2341 | . | . | 14578 | 14578 | 1 | 0 | 2 |
2 | 4563 | 17891 | 18011 | 17891 | 3 | 1 | 1 | |
2 | 4563 | 18002 | 18122 | 18002 | 3 | 0 | 1 | |
2 | 4563 | 18250 | 18370 | 18250 | 3 | 0 | 1 | |
2 | 4563 | 18353 | 18473 | 18353 | 3 | 0 | 1 | |
2 | 4563 | 19567 | 19624 | 19567 | 3 | 0 | 1 | |
3 | . | . | . | . | . | . | . | . |
4 | 1289 | 18763 | 18883 | 18883 | 4 | 1 | 1 | |
4 | 1289 | 18987 | 19107 | 19107 | 4 | 0 | 1 | |
4 | 9089 | 20032 | 20152 | 20152 | 5 | 1 | 2 | |
4 | 9089 | 20198 | 20318 | 20318 | 5 | 0 | 2 | |
4 | 1289 | 20346 | 20466 | 20466 | 4 | 0 | 1 |
Ideally, I'd like to have the college_seq variable arranged numerically by collegeid & EnrGrad_Dates, but if I try to bysort on collegeid & EnrGrad_Dates, it treats each date like it's own entity, which makes sense. If it wasn't the case that ids concurrently enroll in a college (two colleges--one term) or that they go back to previously enrolled in colleges, the normal approach to spells would work out great, but because I don't want an already attended (preceding row within an id) college to start a new spell, I struggle with the spell approach. Is there a way to use a spell approach that takes the repeated collegeid variable into account, or is there a way to manipulate my college_seq variable to get to the same place I'd like to. Here's what I'd want to have:
id | college id | Enroll Date | End Date | Grad Date | EnrGrad_Dates | college_tag | college_tag2 | college_seq | desired_seq |
1 | 8789 | 13387 | 13507 | 13387 | 1 | 1 | 3 | 1 | |
1 | 1245 | 13390 | 13510 | 13390 | 2 | 1 | 1 | 2 | |
1 | 8789 | 13537 | 13657 | 13537 | 1 | 0 | 3 | 1 | |
1 | 2341 | 13743 | 13863 | 13743 | 1 | 0 | 2 | 3 | |
1 | 2341 | 13942 | 14062 | 13942 | 1 | 0 | 2 | 3 | |
1 | 2341 | 14232 | 14352 | 14232 | 1 | 0 | 2 | 3 | |
1 | 2341 | . | . | 14578 | 14578 | 1 | 0 | 2 | 3 |
2 | 4563 | 17891 | 18011 | 17891 | 3 | 1 | 1 | 1 | |
2 | 4563 | 18002 | 18122 | 18002 | 3 | 0 | 1 | 1 | |
2 | 4563 | 18250 | 18370 | 18250 | 3 | 0 | 1 | 1 | |
2 | 4563 | 18353 | 18473 | 18353 | 3 | 0 | 1 | 1 | |
2 | 4563 | 19567 | 19624 | 19567 | 3 | 0 | 1 | 1 | |
3 | . | . | . | . | . | . | . | . | |
4 | 1289 | 18763 | 18883 | 18883 | 4 | 1 | 1 | 1 | |
4 | 1289 | 18987 | 19107 | 19107 | 4 | 0 | 1 | 1 | |
4 | 9089 | 20032 | 20152 | 20152 | 5 | 1 | 2 | 2 | |
4 | 9089 | 20198 | 20318 | 20318 | 5 | 0 | 2 | 2 | |
4 | 1289 | 20346 | 20466 | 20466 | 4 | 0 | 1 | 1 |
Thanks!