Data set matching

Laksh Noori

Join Date: Dec 2018

Posts: 23
#1

Data set matching

03 Dec 2018, 18:10

I have 2 data sets with 75% similar names and 25% unique names. Values don't repeat within the data set ie there are no duplicate values within data set.
I want to compare both data sets, and delete unique values from each data set, so that both data sets have names of same group of individuals only. They will remain 2 separate data sets, I don't want to merge or append them.

How should I go about doing it?

Data set Before

Dataset 1 Dataset 2

A B

B G

C A

D E

F C

L

Data set After

Dataset 1 Dataset 2

A B

B A

C C
Tags: None

Joseph Coveney

Join Date: Apr 2014
Posts: 4451

03 Dec 2018, 19:28

Maybe consider something like this:

Code:

use dataset1, clear
contract name, freq(discard)
merge 1:m name using dataset2, keep(match) nogenerate noreport
contract name
drop _freq
preserve
merge 1:m name using dataset1, keep(match) nogenerate noreport
save dataset1Revised
restore
merge 1:m name using dataset2, keep(match) nogenerate noreport
save dataset2Revised

Comment

Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#3

04 Dec 2018, 01:40

Similar solution with Joseph's, just different type of coding.

Code:

use dataset2, clear keep name joinby using dataset1 save dataset1_revised, replace keep name joinby using dataset2 save dataset2_revised, replace
1 like
Comment
Laksh Noori

Join Date: Dec 2018

Posts: 23
#4

09 May 2019, 15:52

Thanks! What if the variables are same, but differ by upper/lower case? For example, A - Celine Peter in Dataset 1 and CELINE PETER in Dataset 2? The code only matches variables which are exactly the same in terms of upper/lower case.
Comment
Laksh Noori

Join Date: Dec 2018

Posts: 23
#5

09 May 2019, 16:01

Got it, thanks! It's gen lname = lower(name) and repeat the above steps!
Comment

Dataset 1	Dataset 2
A	B
B	G
C	A
D	E
F	C
	L

Dataset 1	Dataset 2
A	B
B	A
C	C

Announcement

Comment

Comment

Comment

Comment