I have two data sets: the first is a household (HH) file and the other is an individual file. The HH and individual files share one unique ID (UID).
There are more than 55,000 observations in the HH file and 120,000 observations in the individual file.
There are many duplicate observations in both the HH and individual files. I can identify the duplicates and delete them from the HH file, but the problem is that I also have to identify the corresponding duplicates in the individual file.
Can you suggest how to identify and delete the corresponding duplicate observations from the individual file? Thank you.
Below is a sample of the data. The observations in bold are duplicates and need to be identified and deleted.
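For the HH step, a minimal sketch in Python (the post does not name a software package, so the language is an assumption). It treats a duplicate as a row identical in UID and all variables, keeps the first occurrence, and records which UIDs were duplicated so that list can be reused for the individual file. The function name `dedup_households` is hypothetical; the rows are the HH sample from the post.

```python
# Minimal sketch: drop exact duplicate household (HH) rows, keeping the
# first occurrence, and record which UIDs were duplicated.
# Assumption: a duplicate is a row identical in UID and all variables.

def dedup_households(rows):
    """Return (unique_rows, duplicated_uids) for rows of (UID, V1, ..., V5)."""
    seen = set()
    unique_rows = []
    duplicated_uids = set()
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicated_uids.add(row[0])  # UID is the first column
        else:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows, duplicated_uids


# HH sample rows from the post: UID, V1..V5
hh = [
    (1, 10, 20, 30, 40, 50),
    (2, 11, 21, 31, 41, 51),
    (3, 12, 22, 32, 42, 52),
    (4, 13, 23, 33, 43, 53),
    (5, 14, 24, 34, 44, 54),
    (6, 15, 25, 35, 45, 55),
    (7, 16, 26, 36, 46, 56),
    (8, 17, 27, 37, 47, 57),
    (1, 10, 20, 30, 40, 50),
    (3, 12, 22, 32, 42, 52),
    (2, 11, 21, 31, 41, 51),
]

hh_clean, dup_uids = dedup_households(hh)
print(len(hh_clean), sorted(dup_uids))  # 8 [1, 2, 3]
```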
HH File

| UID | V1 | V2 | V3 | V4 | V5 |
|---|---|---|---|---|---|
| 1 | 10 | 20 | 30 | 40 | 50 |
| 2 | 11 | 21 | 31 | 41 | 51 |
| 3 | 12 | 22 | 32 | 42 | 52 |
| 4 | 13 | 23 | 33 | 43 | 53 |
| 5 | 14 | 24 | 34 | 44 | 54 |
| 6 | 15 | 25 | 35 | 45 | 55 |
| 7 | 16 | 26 | 36 | 46 | 56 |
| 8 | 17 | 27 | 37 | 47 | 57 |
| 1 | 10 | 20 | 30 | 40 | 50 |
| 3 | 12 | 22 | 32 | 42 | 52 |
| 2 | 11 | 21 | 31 | 41 | 51 |

Individual File

| UID | V1 | V2 | V3 | V4 | V5 |
|---|---|---|---|---|---|
| 1 | 10 | 20 | 30 | 40 | 50 |
| 1 | 9 | 8 | 7 | 6 | 5 |
| 1 | 6 | 5 | 4 | 3 | 2 |
| 2 | 11 | 21 | 31 | 41 | 51 |
| 2 | 3 | 2 | 1 | 3 | 2 |
| 3 | 12 | 22 | 32 | 42 | 52 |
| 3 | 7 | 6 | 5 | 4 | 3 |
| 3 | 6 | 5 | 4 | 3 | 2 |
| 4 | 13 | 23 | 33 | 43 | 53 |
| 4 | 5 | 4 | 3 | 2 | 1 |
| 4 | 4 | 3 | 2 | 1 | 4 |
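One possible approach for the individual file, sketched in Python under stated assumptions: when a household was entered twice, its members were re-entered as identical rows under the same UID. The sketch drops a repeated individual row only if its UID was flagged as duplicated in the HH file, so identical rows in other households (for example, two members with the same values) are kept. The function name `dedup_individuals`, the flagged UID set, and the individual rows below are hypothetical illustrations, not taken from the post.

```python
# Minimal sketch: drop repeated individual rows, but only for households
# whose UID was flagged as duplicated in the HH file.
# Assumptions (not from the post): a duplicated household's members were
# re-entered as identical rows, and identical rows in non-flagged
# households are genuine and should be kept.

def dedup_individuals(rows, duplicated_uids):
    """Return individual rows with repeats removed within flagged UIDs."""
    seen = set()
    kept = []
    for row in rows:
        uid = row[0]  # UID is the first column
        key = tuple(row)
        if uid in duplicated_uids and key in seen:
            continue  # repeated member row of a duplicated household
        seen.add(key)
        kept.append(row)
    return kept


# Hypothetical individual rows (UID, V1..V5): household 1 was entered twice,
# so its two members appear twice; household 4 has two identical members.
individuals = [
    (1, 10, 20, 30, 40, 50),
    (1, 9, 8, 7, 6, 5),
    (4, 5, 4, 3, 2, 1),
    (4, 5, 4, 3, 2, 1),
    (1, 10, 20, 30, 40, 50),
    (1, 9, 8, 7, 6, 5),
]

# UIDs flagged while de-duplicating the HH file (assumed known here)
flagged = {1, 2, 3}

clean = dedup_individuals(individuals, flagged)
for row in clean:
    print(row)
```

With these rows, the two repeated members of household 1 are dropped and both rows of household 4 are kept, leaving four rows. If a duplicated household genuinely contains two identical members, this rule would wrongly drop one of them, so the flagged households are worth spot-checking by hand.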