Hi everyone,
I've been looking through forums but I can't seem to find an answer to my question.
I have a panel dataset that often has three waves (6, 7, 8). We need to collapse this to a single line per person, so we are needing to economise. Where people have been to multiple schools we are taking the mode
However, there are some people that have been to three different schools, or two different schools with one line missing. In this instance, we would like to take the most recent wave. This must be somewhat straight forward but I cannot think of how to do this without replacing the existing data. We also need all three waves per person to be filled with the most recent school not only one line which is the solution I seem to have via
Could someone help me with this please? I've given an example of what I have (mode_school) and what I need (wanted_column)
I've been looking through forums but I can't seem to find an answer to my question.
I have a panel dataset that often has three waves (6, 7, 8). We need to collapse this to a single line per person, so we are needing to economise. Where people have been to multiple schools we are taking the mode
Code:
bysort id (wave): egen mode_school = mode(school_id)
Code:
bysort id (wave): replace mode_school = school_id if mode_school ==. & wave==8
ID | WAVE | SCHOOL_ID | MODE_SCHOOL | WANTED_COLUMN | |
1 | 6 | 12345 | 12345 | 12345 | |
1 | 7 | 12345 | 12345 | 12345 | |
1 | 8 | 12345 | 12345 | 12345 | |
2 | 6 | . | . | 45678 | |
2 | 7 | 34567 | . | 45678 | |
2 | 8 | 45678 | . | 45678 | |
3 | 6 | . | . | . | |
3 | 7 | . | . | . | |
3 | 8 | . | . | . | |
4 | 6 | 12345 | . | 89078 | |
4 | 7 | 54321 | . | 89078 | |
4 | 8 | 89078 | . | 89078 | |
5 | 6 | 45678 | 45678 | 45678 | |
5 | 8 | 45678 | 45678 | 45678 |
Comment