Question on how to eliminate only certain data

Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#1

14 Apr 2019, 11:03

Good Evening and sorry for the not very descriptive title. I am not sure how exactly to word my issue for a title.

I have a large amount of data that I would like to cut down to the relevant panel part. There is a variable (sa0110 in the code below) that indicates whether the household was also in the previous wave; its value is the id of the household in the first wave. The variable survey indicates whether the information is from wave 1 or wave 2.

So in the example below one can see that household 234 and 456 were in both wave 1 and 2. Households 123, 345, 567, 678, 789 and 890 were not. So I would like to eliminate all these households out of the data so I would only have household that only appear in either wave 1 or 2, in the example below that would be 234 and 456. However 234 and 456 also are in wave 1 so I am not sure how I would keep these observations in wave 1 and in wave 2 while eliminating only those observations whose household appears in only one wave. How would I do this?

The variable implicate is just an indicator of the 5 multiple imputations and the original data point.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte survey int id byte implicate int sa0110
1 123 0 .
1 123 1 .
1 123 2 .
1 123 3 .
1 123 4 .
1 123 5 .
1 234 0 .
1 234 1 .
1 234 2 .
1 234 3 .
1 234 4 .
1 234 5 .
1 345 0 .
1 345 1 .
1 345 2 .
1 345 3 .
1 345 4 .
1 345 5 .
1 456 0 .
1 456 1 .
1 456 2 .
1 456 3 .
1 456 4 .
1 456 5 .
1 567 0 .
1 567 1 .
1 567 2 .
1 567 3 .
1 567 4 .
1 567 5 .
1 678 0 .
1 678 1 .
1 678 2 .
1 678 3 .
1 678 4 .
1 678 5 .
1 789 0 .
1 789 1 .
1 789 2 .
1 789 3 .
1 789 4 .
1 789 5 .
1 890 0 .
1 890 1 .
1 890 2 .
1 890 3 .
1 890 4 .
1 890 5 .
2 234 1 234
2 234 2 234
2 234 3 234
2 234 4 234
2 234 5 234
2 234 0 234
2 456 1 456
2 456 2 456
2 456 3 456
2 456 4 456
2 456 5 456
2 456 0 456
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

14 Apr 2019, 11:21

So in the example below one can see that household 234 and 456 were in both wave 1 and 2. Households 123, 345, 567, 678, 789 and 890 were not. So I would like to eliminate all these households out of the data so I would only have household that only appear in either wave 1 or 2, in the example below that would be 234 and 456. However 234 and 456 also are in wave 1 so I am not sure how I would keep these observations in wave 1 and in wave 2 while eliminating only those observations whose household appears in only one wave.

This paragraph appears to contradict itself, and I do not understand what you actually want and why.

When you refer to "all these" households, do you mean just 123, 345, 567, 678, 789, and 890, or also 234 and 456? The reference is ambiguous. "that would be 234 and 456" is simply not true if you are referring to appearing "in either wave 1 or 2." Do you perhaps mean both waves 1 and 2?

Please clarify.
Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#3

14 Apr 2019, 14:57

Clyde,
Ah, sorry for explaining that so poorly. It should be:

So in the example below one can see that household 234 and 456 were in both wave 1 and 2. Households 123, 345, 567, 678, 789 and 890 were not. So I would like to eliminate all these households out of the data so I would only have households that appeared in BOTH wave 1 and 2; in the example below that would be 234 and 456.

Basically, as you can see, 234 and 456 appear in both waves; however, in wave 1 these two households have a missing entry in the variable sa0110. sa0110 is a variable that appears in the second wave to mark households that have also previously appeared in the first wave. So naturally I would like to keep the observations from both waves for 234 and 456.

However I am unsure how to do so, since I cannot think of a way to distinguish the data within wave 1. How can I keep 234 and 456 in wave 1 while removing, as an example, 123?
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#4

14 Apr 2019, 15:20

I'm still not sure I understand. And, in particular, the variable sa0110 strikes me as superfluous--what am I missing? Anyway, if I do understand what you want, the following will do it:

Code:

by id, sort: egen in_wave_1 = max(survey == 1)
by id: egen in_wave_2 = max(survey == 2)
keep if in_wave_1 & in_wave_2
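To illustrate why this works, here is a minimal sketch on a made-up three-row dataset (not the poster's data). The expression survey == 1 evaluates to 1 or 0 in each observation, so taking max() within id gives 1 whenever the household has at least one wave-1 observation. The final assert is an added sanity check, not part of the original answer.

```stata
clear
input byte survey int id
1 123
1 234
2 234
end

* 1 if the household has any wave-1 (wave-2) observation, else 0
by id, sort: egen in_wave_1 = max(survey == 1)
by id: egen in_wave_2 = max(survey == 2)
list, sepby(id)

* keep only households observed in both waves
keep if in_wave_1 & in_wave_2

* sanity check: within each remaining id, the lowest wave is 1 and the highest is 2
by id (survey), sort: assert survey[1] == 1 & survey[_N] == 2
```

Household 123 has in_wave_2 equal to 0 and is dropped; both rows of household 234 are kept.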
Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#5

15 Apr 2019, 00:25

Yes thank you Clyde! That seems to have worked! But sorry I didn't explain it well enough, what part is it that I need to explain more accurately?

To copy the explanation from the survey's variable list:

SA0010 household identification number (which I renamed to id in my data set above)

SA0110 past household ID (only to be provided by countries with a panel component)
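Given that description, sa0110 would matter if a household's id in wave 2 differed from its id in wave 1. In the example posted above the two coincide, so matching on id is enough. The following is only a sketch under the assumption that ids can differ across waves; the variable panel_id and the toy data (ids 901 and 902) are hypothetical and not from the survey.

```stata
clear
input byte survey int id int sa0110
1 123 .
1 234 .
2 901 234
2 902 .
end

* hypothetical linking id: the wave-1 id for every observation
generate long panel_id = cond(survey == 2, sa0110, id)

* wave-2 households with missing sa0110 have no wave-1 counterpart
drop if missing(panel_id)

by panel_id, sort: egen byte in_wave_1 = max(survey == 1)
by panel_id: egen byte in_wave_2 = max(survey == 2)
keep if in_wave_1 & in_wave_2
list
```

Here the wave-2 household 901 is linked back to wave-1 household 234, so those two rows remain, while 123 and 902 are dropped.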
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#6

15 Apr 2019, 08:38

Well, if the code has done what you want, no further explanation is needed. Glad it worked.