I am using the British Household Panel Survey (BHPS) and have appended all 18 waves together. I now want to keep only those participants who have an observation in every wave. Some entered the survey in wave 2 or wave 3, and others started in wave 1 but dropped out later; I want to drop both groups.
Here are 10 observations out of over 200,000:
1000209 2 2 3 10002251 6 0 3488.5703125 1991 1
1000381 1 2 2 10004491 6 0 1789.7335205078125 1991 1
2000148 1 1 1 10004491 6 0 6345.7158203125 1992 2
1000381 1 2 2 10004521 6 0 1789.7335205078125 1991 1
2000148 1 2 2 10004521 6 0 5826.1171875 1992 2
3000192 1 2 2 10004521 6 0 5101.91064453125 1993 3
1000667 2 2 2 10007857 3 0 7200.06005859375 1991 1
2000296 2 2 3 10007857 3 0 9829.087890625 1992 2
3000257 2 2 2 10007857 3 0 8795.4912109375 1993 3
8410658 2 2 2 10007857 3 0 2258.809814453125 1998 8
This is what I have done:
use "Q:\fulldata.dta"
(Contains individual-level data for respondents)
. tsset pid wave
panel variable: pid (unbalanced)
time variable: wave, 1 to 18, but with gaps
delta: 1 unit
. bysort pid: keep if _N ==18
(206,781 observations deleted)
My problem here is that it just deleted all the observations instead of keeping those present in every wave.
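One possible diagnostic, sketched under the assumption that pid and wave are the variables shown in the tsset output above: count how many observations each pid has and tabulate that count once per person, to see whether any pid actually reaches 18 before dropping anything. The variable names nwaves and first are made up for illustration.

```stata
* Number of observations (waves) recorded for each person
bysort pid: generate nwaves = _N

* Flag one observation per pid so each person is counted once
* (nwaves and first are illustrative names, not BHPS variables)
egen byte first = tag(pid)

* Distribution of waves per person; look for a row at 18
tabulate nwaves if first

* Only if some pids do reach 18, keep the balanced panel
keep if nwaves == 18
```

If the tabulation shows no pid with 18 observations, the keep command is doing what it was asked, and the question becomes why no person identifier spans all waves in the appended file.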
Can someone advise?
Thanks in advance
p.s. first Statalist post, forgive me if the formatting is wrong.