Doubts in merging two datasets

Fabio Delisio

Join Date: Aug 2023

Posts: 13
#1


04 Oct 2023, 04:01

Good morning everyone,
I merged two datasets using a 1:m merge. However, some observations didn't match, mostly from the master dataset.

Since I struggle to understand the "merge" logic, I have a few questions:
1) How do you know if the type of merge (1:m or m:1) is correct for your purpose?
2) Why do some observations not merge, even though the variable name is exactly the same?
3) Suppose you do a merge and obtain unmatched observations: what are your suggestions on how to proceed with the analysis?

Thanks.
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#2

04 Oct 2023, 05:29

Originally posted by Fabio Delisio

1) How do you know if the type of merge (1:m or m:1) is correct for your purpose?

The character before the colon refers to the dataset currently in memory (the master data), and the character after the colon to the file specified after using. A 1 means the key variable uniquely identifies the observations in that dataset; an m means the key can appear multiple times. Say you have two datasets. Dataset1 contains survey data collected in different countries, so a row in this dataset is a person. The country is stored in a variable called country, and each country appears multiple times because multiple people from the same country were interviewed. Dataset2 contains the GDP per capita in a given year for a set of countries, so a row now represents a country and each country appears exactly once.

So if dataset1 (the dataset with individuals) is currently open, then you would type:

merge m:1 country using dataset2

because each country can appear multiple times in the dataset in memory but only once in the dataset specified in using.

If dataset2 is currently open, then you would type:

merge 1:m country using dataset1
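A minimal, self-contained sketch of the example above, using made-up toy data (the tempfile name dataset2, the variables id and gdppc, and all values are hypothetical). The isid check confirms that country uniquely identifies the rows of the using data, which is what the 1 in m:1 requires; merge itself would also stop with an error if that were not the case.

```stata
* Hypothetical toy data: dataset2 has one row per country (values made up)
clear
input str12 country gdppc
"Germany"  100
"France"    90
"Vatican"   80
end
isid country               // errors out if country is not unique here
tempfile dataset2
save `dataset2'

* Hypothetical toy data: dataset1 has one row per respondent
clear
input id str12 country
1 "Germany"
2 "Germany"
3 "France"
end

* Many respondents per country in memory (m), one row per country in using (1)
merge m:1 country using `dataset2'
tabulate _merge
```

Both Germany respondents pick up the same gdppc value, while Vatican, which has no respondents, ends up as an extra using-only observation.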

Originally posted by Fabio Delisio

2) Why do some observations not merge, even though the variable name is exactly the same?

Sometimes there just isn't a match. Continuing the example, the survey was collected in a few countries, while the GDP per capita was collected in many countries. If your survey did not take place in Vatican City, but dataset2 does contain GDP per capita data for Vatican City, then no match will be found for Vatican City. This is usually not a problem, and you can simply remove the extra observations that merge adds from the using dataset.
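A sketch of removing those extra observations, again with hypothetical toy data (tempfile name, variables, and values are made up for illustration): after the merge, countries that exist only in the using file are flagged with _merge == 2 and can be dropped.

```stata
* Hypothetical using data: one row per country
clear
input str12 country gdppc
"Germany"  100
"Vatican"   80
end
tempfile dataset2
save `dataset2'

* Hypothetical master data: one row per respondent
clear
input id str12 country
1 "Germany"
2 "Germany"
end

merge m:1 country using `dataset2'
tabulate _merge            // _merge == 2: in using only (Vatican)
drop if _merge == 2        // discard countries with no respondents
```

Specifying the option keep(master match) on the merge command should achieve the same result in one step.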

Sometimes, there should be a match. For example, the variable country is a string with the country name, and dataset1 uses Luxemburg as one of the country names, while dataset2 uses Luxembourg for that same country (or Ivory Coast and Côte d'Ivoire, or ... ). In this case there is a problem, and you need to fix it before running the merge again. The merge command leaves behind a variable called _merge, which records what happened to each observation: 1 if it came only from the master data, 2 if it came only from the using data, and 3 if it was matched. You stare at the values of the problem observations long enough until you figure out what the problem is, and then you fix it. Usually this does not take long, but sometimes it can be tricky. That is not a big problem; you just need to stare at it a little longer.
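A sketch of that diagnose-and-fix loop, with hypothetical toy data containing the Luxemburg/Luxembourg discrepancy (tempfile names, variables, and values are made up): list the observations whose _merge is not 3, correct the spelling in the master data, and merge again.

```stata
* Hypothetical using data spelled "Luxembourg"
clear
input str12 country gdppc
"Luxembourg" 100
"Germany"     90
end
tempfile dataset2
save `dataset2'

* Hypothetical master data spelled "Luxemburg"
clear
input id str12 country
1 "Luxemburg"
2 "Germany"
end
tempfile dataset1
save `dataset1'

* First attempt: the two spellings do not match
merge m:1 country using `dataset2'
list country _merge if _merge != 3     // inspect the problem observations

* Reload the master data, fix the spelling, and merge again
use `dataset1', clear
replace country = "Luxembourg" if country == "Luxemburg"
merge m:1 country using `dataset2'
```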

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#3

04 Oct 2023, 10:46

Going off on a tangent from Maarten Buis' outstanding explanation of -merge-: if you ever find yourself confronting the problem of countries with alternative name spellings that he mentions, there is a fantastic tool available for resolving almost all of these problems. Rafal Raciborski's -kountry-, available from SSC, can reconcile nearly all such differences, and it can also crosswalk the commonly used standardized country coding systems if you are confronting a pair of data sets that use different ones.
Fabio Delisio

Join Date: Aug 2023

Posts: 13
#4

04 Oct 2023, 13:57

Thank you both for the clear explanation and help!