Merge two data sets - error: no unique identifier

Julia Brunet

Join Date: May 2019

Posts: 8
#1

Merge two data sets - error: no unique identifier

23 Oct 2019, 01:38

Hello everyone,

I have the following problem and would kindly appreciate your help:

I have 2 datasets I'd like to merge.

The first one contains the stock of migrants coming from one country and staying in another in a particular year, and some individual characteristics on this stock (gender and level of education, which can be low, medium or high). It thus looks like that:

destination, origin, code of country of destination, code of country of origin, gender, year (I have data from 1980 to 2010, in 5-year intervals), low (number of migrants with low level of education), medium, high, total (adds up migrants).

The second one contains data on bilateral trade agreements and their provisions. And it looks like: Importer code, exporter code, year, agreement, year of entry in force, type (FTA, PTA, CU,...) and provisions wich are coded with dummies depending on whether the agreement cotanins such a provision or not. I then recode importer and exporter into country of destination and origin.

I'd like to merge the two datasets based on country of origin, country of destination and year but I find that in both cases there's no unique identifier. I understand why that is (in the first case, using these 3 variables, there's one case for males and one for females). In the second database, I again face a similar situation, using ExporterCode, ImporterCode and year I don't have unique observations because there are different agreements that concern the same pair of countries in the same year.

Could anyone suggest me a way to overcome this problem?

Many thanks

Julia

Last edited by Julia Brunet; 23 Oct 2019, 01:55.
Tags: None
Julia Brunet

Join Date: May 2019

Posts: 8
#2

23 Oct 2019, 01:39

I forgot: what I have tried so far is:

//Merge
use iabbd_8010_v1.dta, clear
sort ccode_origin ccode_destination year
save iabbd_8010_v1.dta, replace

use Horizontal_Depth_DB.dta, clear
rename ExporterCode ccode_origin
rename ImporterCode ccode_destination
sort ccode_origin ccode_destination year

merge 1:m ccode_origin ccode_destination year using iabbd_8010_v1.dta
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10186
#3

23 Oct 2019, 03:42

It is difficult to determine how you want the merged data set to look like. You can start by copying and pasting the output from the following. Your explanations may make sense once we see your variables.

Code:

use iabbd_8010_v1.dta, clear sort ccode_origin ccode_destination year dataex in 1/10 use Horizontal_Depth_DB.dta, clear rename ExporterCode ccode_origin rename ImporterCode ccode_destination sort ccode_origin ccode_destination year dataex in 1/10

If you are running an older version of Stata which does not have the dataex command, first

Code:

ssc install dataex
Comment

Julia Brunet

Join Date: May 2019
Posts: 8

23 Oct 2019, 04:05

Hello Andrew,

many thanks.

For the first dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 destination str32 origin str3 ccode_destination str7    ccode_origin    str6    gender    int    year    long(tot    low    med    high)
"Australia" "Unknown" "AUS" "" "Male"   1980   7745   2049  1713    3983
"Australia" "Unknown" "AUS" "" "Female" 1980   7087   2471  1454    3162
"Australia" "Unknown" "AUS" "" "Male"   1985   9291   2346  2035    4910
"Australia" "Unknown" "AUS" "" "Female" 1985   8422   2775  1626    4021
"Australia" "Unknown" "AUS" "" "Female" 1990 106963  94633  6340    5990
"Australia" "Unknown" "AUS" "" "Male"   1990 105539  92874  5924    6741
"Australia" "Unknown" "AUS" "" "Male"   1995   1528    425   297    806
"Australia" "Unknown" "AUS" "" "Female" 1995    853    362   207    284
"Australia" "Unknown" "AUS" "" "Male"   2000 297223 226489 35770    34964
"Australia" "Unknown" "AUS" "" "Female" 2000 304111 233670 35596    34845
end

For the second one:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str3(ccode_destination ccode_origin) int year str91 agreement	int	entry_force	str9	type	byte(wto_x_labourmarketregulation	wto_x_visaandasylum_le)	double(yaerinto	dup)
"BDI" "AGO" 1994 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 1995 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 1996 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 1997 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 1998 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 1999 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 2000 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 2001 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 2002 "COMESA" 1994 "CU" 1 0 -1993 0
"BDI" "AGO" 2003 "COMESA" 1994 "CU" 1 0 -1993 0
end

Even though it can't be seen in the second dataset, for a lot of the country-pair combinations, there are often more than one trade agreement for that year. Eg. BDI and AGO in 1994 sign or are concerned by trade agreement 1 but also by trade agreement 2.

Thanks

Comment

Julia Brunet

Join Date: May 2019
Posts: 8

23 Oct 2019, 04:07

Sorry, I pasted it in the wrong format. Again, dataset 2:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3(ccode_destination ccode_origin) int year str91 agreement    str9    type    byte(wto_x_labourmarketregulation    wto_x_visaandasylum_le)    double    dup
"BDI" "AGO" 1994 "COMESA" "CU" 1 0 0
"BDI" "AGO" 1995 "COMESA" "CU" 1 0 0
"BDI" "AGO" 1996 "COMESA" "CU" 1 0 0
"BDI" "AGO" 1997 "COMESA" "CU" 1 0 0
"BDI" "AGO" 1998 "COMESA" "CU" 1 0 0
"BDI" "AGO" 1999 "COMESA" "CU" 1 0 0
"BDI" "AGO" 2000 "COMESA" "CU" 1 0 0
"BDI" "AGO" 2001 "COMESA" "CU" 1 0 0
"BDI" "AGO" 2002 "COMESA" "CU" 1 0 0
"BDI" "AGO" 2003 "COMESA" "CU" 1 0 0
end

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10186

23 Oct 2019, 04:33

Thanks for the data example. I think you want

Code:

use Horizontal_Depth_DB.dta, clear
rename ExporterCode ccode_origin
rename ImporterCode ccode_destination
sort ccode_origin ccode_destination year
save Horizontal_Depth_DB.dta, replace

use iabbd_8010_v1.dta, clear
sort ccode_origin ccode_destination year
joinby ccode_origin ccode_destination year using Horizontal_Depth_DB.dta

Comment

Julia Brunet

Join Date: May 2019

Posts: 8
#7

23 Oct 2019, 05:25

Hello Andrew,

yes, in the end I figured out myself too that I needed joinby instead of merge to generate all possible pairwise combinations.

Again, thanks for taking the time to help me.

Best,

Julia
Comment

Announcement