How to keep participants present in every wave/ year in panel data?

Jordan Louis

Join Date: Dec 2020

Posts: 2
#1

15 Dec 2020, 10:21

I am using the British Household Panel Survey (BHPS) and have appended all 18 waves together. I now want to keep only those participants who have an observation in every wave. Some entered the survey in wave 2 or wave 3, and others started in wave 1 but dropped out later; I want to drop both groups.

Here are 10 observations out of over 200,000:

1000209 2 2 3 10002251 6 0 3488.5703125 1991 1
1000381 1 2 2 10004491 6 0 1789.7335205078125 1991 1
2000148 1 1 1 10004491 6 0 6345.7158203125 1992 2
1000381 1 2 2 10004521 6 0 1789.7335205078125 1991 1
2000148 1 2 2 10004521 6 0 5826.1171875 1992 2
3000192 1 2 2 10004521 6 0 5101.91064453125 1993 3
1000667 2 2 2 10007857 3 0 7200.06005859375 1991 1
2000296 2 2 3 10007857 3 0 9829.087890625 1992 2
3000257 2 2 2 10007857 3 0 8795.4912109375 1993 3
8410658 2 2 2 10007857 3 0 2258.809814453125 1998 8

This is what I have done:

use "Q:\fulldata.dta"
(Contains individual-level data for respondents)

. tsset pid wave
panel variable: pid (unbalanced)
time variable: wave, 1 to 18, but with gaps
delta: 1 unit

. bysort pid: keep if _N ==18
(206,781 observations deleted)

My problem here is that it just deleted all the observations instead of keeping those present in every wave.

Can someone advise?

Thanks in advance

p.s. first Statalist post, forgive me if the formatting is wrong.
Tags: BHPS, data, panel data
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

15 Dec 2020, 10:50

If -bysort pid: keep if _N == 18- resulted in all observations being deleted, that would suggest that there is nobody in the data set who participated in all 18 waves. That doesn't really seem very surprising to me.
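One way to check this before dropping anything is to look at how many waves each person actually contributes. The following is a minimal sketch on a made-up toy panel; the pid and wave values are hypothetical stand-ins, not BHPS data, and nwaves and tag are names introduced here for illustration.

```stata
* Toy panel with hypothetical values (not BHPS data)
clear
input long pid byte wave
1 1
1 2
1 3
2 2
2 3
3 1
end

* Number of observations each person contributes
bysort pid: generate nwaves = _N

* Flag exactly one row per person so each person is counted once
egen byte tag = tag(pid)

* Distribution of waves per person, and the longest participation observed
tabulate nwaves if tag
summarize nwaves if tag
display "Maximum number of waves for any pid: " r(max)
```

If the maximum reported for the real data is below 18, then -keep if _N == 18- will necessarily delete every observation.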

The data example you showed is not helpful. There are no variable names: it's anybody's guess which variable is pid, which is wave, etc. Also, a data example with 10 observations is not very helpful for solving a problem relating to data chunks of 18 observations! And even had you included all that, the description shown is still missing attributes of the data that are sometimes important for answering the question posed (though probably not in this particular situation).

The useful way to show example data is with the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
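As a rough illustration of what that looks like in practice, here is a sketch that builds a small made-up data set and then calls -dataex- on the two variables that matter for this question. The values are hypothetical; with real data you would skip the -input- step and run -dataex- on your own file.

```stata
* Only needed if -help dataex- finds nothing on your installation
* ssc install dataex

* Hypothetical toy data standing in for the real file
clear
input long pid byte wave
1 1
1 2
1 3
2 2
2 3
3 1
end

* Show the identifier and time variable for the first six observations
sort pid wave
dataex pid wave in 1/6
```

The output can be pasted directly into a post, and others can copy it back into Stata to recreate the example exactly.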
Yara Issa

Join Date: Nov 2020

Posts: 42
#3

22 Feb 2021, 19:59

Hello,

I have the same issue as above. I am using the Understanding Society database and I need to keep those who participated in every wave from 1 to 5.

I tried to install dataex, but because I am connected via bin to the university server I could not install it.
Here is an example of the data, where pidp is the person identifier, followed by wave, age, and number of children.

thank you
pidp wave age nchildren
223725 5 38 1 .
261125 2 24 1 .
261125 3 25 1 .
261125 4 26 1 .
299885 4 32 1 .
537205 2 34 1 .
537205 3 35 1 .
541285 3 25 1 .
541285 4 26 1 .
541285 5 27 1 .
665045 2 28 1 .
665045 3 29 1 0
665045 4 30 1 0
665045 5 31 1 0
813285 2 40 1 .
813285 3 41 1 .
813285 4 42 1 .
813285 5 43 1 .
940445 2 30 1 .
945205 2 36 1 .
952005 2 59 1 .
956765 2 55 1 .
956765 3 56 1 .
956765 4 57 1 .
956765 5 58 1 .
1114525 2 36 1 .
1390605 2 19 1 .
1731965 5 22 1 0
1833965 2 45 1 0
1833965 3 46 1 0
1833965 4 47 1 .
1833965 5 48 1 0
2292285 3 36 1 .
2292285 4 36 1 .
2292285 5 38 1 .
2297045 5 16 1 .
2626845 2 32 1 2
2626845 4 34 1 2
2665605 2 38 1 .
2665605 3 39 1 .
2665605 4 40 1 .
2665605 5 41 1 .
2817245 2 48 1 2
2817245 4 50 1 0
2817245 5 51 1 .
2825405 3 30 1 .
2825405 4 31 1 .
2825405 5 32 1 .
2932845 2 27 1 .
2932845 3 28 1 .
2932845 4 29 1 .
3063405 3 39 1 .
3063405 4 40 1 .
3489765 2 68 1 .
3489765 3 69 1 .
3489765 4 70 1 .
3489765 5 71 1 .
3565245 4 24 1 .
3565245 5 25 1 .
3567285 4 20 1 .
3567285 5 21 1 .
3568645 4 17 1 .
3568645 5 17 1 .
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

22 Feb 2021, 21:34

Code:

isid pidp wave, sort
assert inlist(wave, 1, 2, 3, 4, 5)
by pidp (wave): keep if _N == 5

Note: In your example data, nobody is in all five waves. In fact, nobody is in wave 1. So if your real data set is like this, you will end up with an empty data set.
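To see why the two checks come before the -keep-, here is a small sketch of the same three commands on made-up data in which one person does appear in all five waves. The pidp values are hypothetical. Because _N only counts rows, it equals the number of distinct waves only when each person-wave pair is unique (which -isid- verifies) and no waves outside 1 to 5 are present (which -assert- verifies).

```stata
* Hypothetical toy data: 101 has waves 1-5, 102 misses wave 1, 103 has two waves
clear
input long pidp byte wave
101 1
101 2
101 3
101 4
101 5
102 2
102 3
102 4
102 5
103 1
103 3
end

* Stop with an error if any person-wave combination is duplicated; also sorts
isid pidp wave, sort

* Stop with an error if any wave other than 1 to 5 appears
assert inlist(wave, 1, 2, 3, 4, 5)

* Given the two checks, five rows for a person means all five waves are present
by pidp (wave): keep if _N == 5

list, sepby(pidp)
```

Only the five observations for the person with a complete record remain. If either check fails on real data, Stata halts before anything is dropped, which is the point of running them first.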
Yara Issa

Join Date: Nov 2020

Posts: 42
#5

23 Feb 2021, 21:36

Thank you for your reply. I appreciate your help.