I am having trouble restricting my sample to individuals who were interviewed in the 1st, 2nd, 3rd and 4th wave of the SHARE dataset and who were 50–75 years old at the 2nd wave interview. Specifically, the part where the interviewed individuals have to be 50–75 years old at the 5th wave interview (recoded as wave 2 below) is what I am failing to achieve without Stata dropping all other waves.
This is what I have done so far:
1. In order to keep only those who participated in all waves (1-4) I did the following (which was successful):
recode wave (4=1) (5=2) (6=3) (7=4)
isid mergeid wave, sort
assert inlist(wave, 1, 2, 3, 4)
by mergeid (wave): keep if _N == 4
My sample now only includes individuals who are interviewed in all waves. I can see this by:
tab wave
       wave |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     24,352       25.00       25.00
          2 |     24,352       25.00       50.00
          3 |     24,352       25.00       75.00
          4 |     24,352       25.00      100.00
------------+-----------------------------------
      Total |     97,408      100.00
Then I de-string the person identifier variable (called mergeid):
encode mergeid, generate(id) label(id)
2. This is the critical part. I am now trying to include only those who were interviewed in all waves AND WHO WERE 50-75 YEARS OLD AT THE 5TH WAVE INTERVIEW (which is now labelled as wave 2).
There are 4 age variables in my dataset, one for each wave (age2011 for the 1st wave, age2013 for the 2nd wave, age2015 for wave 3, ...). I have tried 2 methods:
FIRST METHOD: drop if age2013<50
drop if age2013>75
SECOND METHOD: bysort id (wave): drop if age2013<50
bysort id (wave): drop if age2013>75
In both cases, when I enter tab wave, I see that Stata has dropped the 1st, 3rd, and 4th wave, leaving me only with wave 2:
       wave |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |     19,177      100.00      100.00
------------+-----------------------------------
      Total |     19,177      100.00
I would be really thankful if someone could explain to me what I am doing wrong, and what I can do to perform this type of sample restriction, i.e. I want to keep only those individuals who are interviewed in all 4 waves, AND who were between 50–75 years old at the 2nd wave interview.
Many thanks,
Guri
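
One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: if age2013 is filled in only on each person's wave-2 row and is missing on the other rows (an assumption, the post does not show the data), then drop if age2013>75 removes every wave 1, 3 and 4 row, because Stata treats a missing value as larger than any number. Under that assumption, the restriction can be applied per person by first copying the wave-2 age to all of that person's rows. The variable name age_w2 below is hypothetical.

```stata
* Assumption (not confirmed in the post): age2013 is non-missing only in the
* wave-2 row of each person. Stata treats missing as larger than any number,
* so "drop if age2013>75" would also drop every row where age2013 is missing.

* Copy the wave-2 age to all four rows of each person.
* age_w2 is a hypothetical helper variable.
bysort id: egen age_w2 = max(cond(wave == 2, age2013, .))

* Keep whole persons whose wave-2 age lies in [50, 75].
* inrange() returns 0 when age_w2 is missing, so those persons are dropped too.
keep if inrange(age_w2, 50, 75)

* Check: every wave should now show the same frequency.
tab wave
```

Because the condition is evaluated on a person-level variable, all four rows of a person are kept or dropped together, which preserves the balanced panel built in step 1.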
