Merging datasets without uniquely identified observations

Joe Moore

Join Date: Jan 2020

Posts: 2
#1

Merging datasets without uniquely identified observations

17 Jan 2020, 07:06

Hello,

I found a post on this subject but the solutions posted do not work for me.

My master data set has 276 variables and 41,000 observations. Two of the variables are an MFIID and MFIName, however these are the same for multiple observations but with different fiscal years.
My secondary dataset has 5 variables and 3115 observations. It also has the variables MFIID and MFIName but has a single observation for each one (to show the country and region of the institution).

I am attempting to combine the two datasets so that each observation in my master dataset includes the corresponding data for country and region from my secondary dataset. I have tried merge 1:m, merge 1:1 and merge m:1 but I always receive the same error message: r(459); variables MFIID MFIName do not uniquely identify observations in the using data.

Is there a way to merge datasets without having a unique identifier?

Thank you
Tags: None
FernandoRios

Join Date: Apr 2014

Posts: 2469
#2

17 Jan 2020, 07:40

Hi Joe
for your main question. Is there a way to merge dataset wiithout a unique identifier? Yes, you can use -joinby-.
However, there is a more important question. Are you sure that MIFID and MIFName uniquely identify all observations in your "secondary dataset". Keep in mind that a unique identification should include cases where the are missing values. So, take a better look at your second dataset, perhaps drop observations where MIFID and MIFName are both missing. And try the merge 1:m or m:1 once again.
HTH
Fernando
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

17 Jan 2020, 10:25

Try running

Code:

isid MFIID MFIName

in your secondary dataset to confirm that those two variables uniquely identify observations in that dataset as you assert.

If the isid command succeeds, then what you want is

Code:

use master merge m:1 MFIID MFIName using secondary

If the isid command fails, then you need to learn why your secondary dataset has more than one observation for some combination(s) of MFIID and MFIName. The duplicates command is the tool you need.

Code:

use secondary help duplicates duplicates report MFIID MFIName

will be a good starting point.
Comment

Announcement

Merging datasets without uniquely identified observations

Comment

Comment