Problem with merging datasets - Variables do not uniquely identify

Froeydis Jensen

Join Date: Mar 2019

Posts: 6
#1

Problem with merging datasets - Variables do not uniquely identify

19 Mar 2019, 03:26

Hi,

I'm new to the forum, so I'm just learning the ropes around here. English is not my first language, so do overlook the grammatical mistakes.

I have two datasets that I want to merge, MaternityDB.dta and AllowancesDB.dta. The data sets look like this (only excerpts):

COUNTRY_ISO CODE YEAR ML_WKS

AU 1 1960 0

AU 1 1961 0

AT 2 1960 0

COUNTRY_ISO CODE YEAR TFR

AU 1 1960 3.45

AU 1 1961 3.55

AT 2 1960 2.69

I used this code:
use MaternityDB.dta, clear
merge m:1 COUNTRY_ISO CODE YEAR using AllowancesDB.dta

I think is a bit hard to understand the difference between one-to-one, one-to-many and many-to-one merges, so I tried all of them. However, I got the same red message from Stata: variables COUNTRY_ISO CODE YEAR do not uniquely identify observations in the using data/master data (depending which of the merge alternatives I tried).

I have tried to understand what the message means, but my searches haven't given results. Do any of you know what I do wrong? I would much appreciate the help, and I hope I have given the information needed.

Kind regards,
Frøydis
Tags: None
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

19 Mar 2019, 04:12

Please read the FQ. There you'll find advice about sharing data/output/command.

That said, you may try something as:

Code:

joinby CODE using AllowancesDB.dta

Best regards,

Marcos
Comment
Froeydis Jensen

Join Date: Mar 2019

Posts: 6
#3

19 Mar 2019, 07:40

Thank you for the advice, Marcos. I will make sure to read the FAQ thoroughly before starting another thread or posting another question.

Also, thank you for the recommended code, although it did not work.

Kind regards,

Frøydis
Comment
Nerea Becerra

Join Date: Mar 2019

Posts: 14
#4

19 Mar 2019, 08:46

Hi Froeydis,
this could be because you have duplicates in the variable that you are using to match both databases. For example: COUNTRY_ISO (AU, AU)

Nerea
Comment
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#5

19 Mar 2019, 11:49

The data excerpts you show don't have duplicates on the variables you're using for the merge so it's hard to guess what the actual structure of the data is or what you would like the final merge to look like.

Is the combination of COUNTRY_ISO CODE and YEAR a unique identifier for either dataset? You can check this with the command isid. If not, you need to think through why they are not unique. That is, why do you have multiple observations per year. Are there any additional variables that you could use that do uniquely identify observations?

You also need to think through exactly what you want Stata to do with the duplicates. If you can describe what you want we can probably help you figure out how to get there. Without that information we'll just be guessing at how your data are linked.
Comment

David Benson

Join Date: Oct 2018
Posts: 489

19 Mar 2019, 23:06

If COUNTRY_ISO CODE YEAR should uniquely identify the data (in both datasets), than you will want to use a 1:1 merge. (That's also my guess just looking at your two short samples of data you posted in your original post.

I would use the duplicates command to determine which observations are causing you problems:

Code:

dataex id country_iso code year ml_wks  // ssc install dataex
clear
input float id str2 country_iso byte code int year byte ml_wks
1 "AT" 2 1960 0
2 "AT" 2 1960 0
3 "AT" 2 1960 0
4 "AU" 1 1960 2
5 "AU" 1 1960 0
6 "AU" 1 1961 0
7 "AU" 1 1961 5
8 "AU" 1 1962 5
9 "AU" 1 1963 5
end

------------------ copy up to and including the previous line ------------------

Code:

isid country_iso code year  // try this in both datasets
help duplicates
duplicates tag country_iso code year, gen(dup1)

. list, sepby(country_iso) noobs

  +---------------------------------------------+
  | id   countr~o   code   year   ml_wks   dup1 |
  |---------------------------------------------|
  |  1         AT      2   1960        0      2 |
  |  2         AT      2   1960        0      2 |
  |  3         AT      2   1960        0      2 |
  |---------------------------------------------|
  |  4         AU      1   1960        2      1 |
  |  5         AU      1   1960        0      1 |
  |  6         AU      1   1961        0      1 |
  |  7         AU      1   1961        5      1 |
  |  8         AU      1   1962        5      0 |
  |  9         AU      1   1963        5      0 |
  +---------------------------------------------+

. list, sepby(country_iso year) noobs

  +---------------------------------------------+
  | id   countr~o   code   year   ml_wks   dup1 |
  |---------------------------------------------|
  |  1         AT      2   1960        0      2 |
  |  2         AT      2   1960        0      2 |
  |  3         AT      2   1960        0      2 |
  |---------------------------------------------|
  |  4         AU      1   1960        2      1 |
  |  5         AU      1   1960        0      1 |
  |---------------------------------------------|
  |  6         AU      1   1961        0      1 |
  |  7         AU      1   1961        5      1 |
  |---------------------------------------------|
  |  8         AU      1   1962        5      0 |
  |---------------------------------------------|
  |  9         AU      1   1963        5      0 |
  +---------------------------------------------+

. tabulate dup

       dup1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          2       22.22       22.22
          1 |          4       44.44       66.67
          2 |          3       33.33      100.00
------------+-----------------------------------
      Total |          9      100.00

*** Then you could do a "brow [varlist] if dup1 >=1"

Last edited by David Benson; 19 Mar 2019, 23:14.

COUNTRY_ISO	CODE	YEAR	ML_WKS
AU	1	1960	0
AU	1	1961	0
AT	2	1960	0

COUNTRY_ISO	CODE	YEAR	TFR
AU	1	1960	3.45
AU	1	1961	3.55
AT	2	1960	2.69

Announcement

Problem with merging datasets - Variables do not uniquely identify

Comment

Comment

Comment

Comment

Comment