Merging in Stata version 12 without specifying the merge type like 1:m, what happens then with what consequences?

LydiaSmit

Join Date: Jul 2014

Posts: 77
#1

Merging in Stata version 12 without specifying the merge type like 1:m, what happens then with what consequences?

09 Jul 2014, 23:32

Dear Statalists,

Merging in Stata version 12 without specifying the merge type like 1:m, what happens then with what consequences?
When I try to merge a 4th dataset to my previously merged three datasets, I cannot find/use an unique key variable for that merge, however, when I do not specify 1:1, 1:m, m:1 or m:m for that merge, the merge seems to go perfectly but I'm afraid that not specifying the merge type means using a m:m merge type which I shouldn't use.
Tags: None
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#2

10 Jul 2014, 01:34

If you don't specify 1:m, m:1, 1:1, (or m:m) you are using the old syntax for -merge-. If you don't know what that syntax does, you should probably not use it.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#3

10 Jul 2014, 02:29

Lydia,

Merging in Stata is neatly explained here and here. If you want to conducted more sophisticated merging you may have a look at user-written programmes, like xmerge, mmerge and similar. When performing merging it is good to consider uniqueness of your ids and explore newly obtained data, as suggested here. How you decide to merge depends on what you want to achieve with the data.

Kind regards,
Konrad
Version: Stata/IC 13.1
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

10 Jul 2014, 11:00

Both versions of merge work the same way, it's just that the old version will not check your stated assumptions about identifiers in both the master and the using datasets. Your instincts tell you not to use an m:m merge but you want to merge anyway. Using the old syntax to get around your problem is going to lead to a m:m merge with your data. Since that is not illegal, merge will not complain and you will get results.

With merge, it always pays to practice with toy examples to make sure you understand what's happening. Please run the sample code below and carefully study each step. In both datasets, id does not uniquely identify observations. The example tries 1:1, 1:m, and m:1 merge and all yield an error because of the lack of unique identifiers. Then I show that the m:m merge results are the same as those produced with the old syntax of merge.

Since you don't have unique identifiers in both datasets, you have to decide how to combine observations with the same key value. The first with the first, the second with the second. What happens when the master has less observations that the using (or vice versa)? The last two sections of the sample code below show two possible approaches. Usually, the joinby command is what makes sense in these cases.

Code:

clear
input str1 id x
A 1 1
A 2 2
B 2 3
end
bysort id: gen obsno = _n
tempfile fx
qui save "`fx'"
list, sepby(id) noobs

clear
input str1 id y
A 11
A 12
A 13
A 14
B 12
B 13
B 14
end
bysort id: gen obsno = _n
tempfile fy
qui save "`fy'"
list, sepby(id) noobs

* id does not uniquely identify obs in either datasets; use
* capture and noisily to ignore the error but show the error message
use "`fx'", clear
capture noisily merge 1:1 id using "`fy'"
capture noisily merge 1:m id using "`fy'"
capture noisily merge m:1 id using "`fy'"

* Stata will let you do a m:m merge on these data
use "`fx'", clear
merge m:m id using "`fy'"
sort id x y
list, sepby(id) noobs

* the old syntax just did not check for unique identifiers
* so no errors are reported
use "`fx'", clear
merge id using "`fy'"

* the results are the same as a m:m merge using the new syntax
sort id x y
list, sepby(id) noobs

* there's only one observation with "B" in the first dataset so for
* id == "B", this is equivalent to a 1:m merge. So the merged dataset
* contains all pairwise combinations of x and y.
list if id == "B", sepby(id) noobs

* the problem with m:m merge is that x and y are paired in a way
* that can't be explained logically. Why is 1 matched only with 11 and 
* 2 is matched with 12,13,14?
list if id == "A", sepby(id x) noobs

* If you can't find unique identifiers in either dataset, it would
* perhaps make more sense to add obsno to the merge
use "`fx'", clear
merge 1:1 id obsno using "`fy'"
sort id x y
list, sepby(id) noobs

* But usually, what is desired is that every observation of the master
* dataset be considered unique for the purpose of the merge and match
* all observations with the same key value in the using dataset. That's
* exactly what -joinby- does
use "`fx'", clear
joinby id using "`fy'"
sort id x y
list, sepby(id x) noobs

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

10 Jul 2014, 13:36

I don't know if the following example will help you understand what joinby does but it does connect it with merge by showing that you can replicate joinby results using multiple 1:m merge.

Code:

clear
input str1 id x
A 1
A 2
A 3
B 2
end
tempfile fx
qui save "`fx'"
list, sepby(id) noobs

clear
input str1 id y
A 11
A 12
A 13
A 14
B 12
B 13
B 14
end
tempfile fy
qui save "`fy'"
list, sepby(id) noobs

* Form all pairwise combinations within id
use "`fx'", clear
joinby id using "`fy'"
sort id x y
list, sepby(id x) noobs

* Replicate the -joinby- results using repeated 1:m merges.
* merge the first observation of each id in master with using
use "`fx'", clear
bysort id (x): keep if _n == 1
list, sepby(id) noobs
merge 1:m id using "`fy'", keep(master match) nogen
sort id x y
list, sepby(id) noobs
tempfile obs1
qui save "`obs1'"

* merge the second observation of each id in master with using
use "`fx'", clear
bysort id (x): keep if _n == 2
list, sepby(id) noobs
merge 1:m id using "`fy'", keep(master match) nogen
sort id x y
list, sepby(id) noobs
tempfile obs2
qui save "`obs2'"

* merge the third observation of each id in master with using
use "`fx'", clear
bysort id (x): keep if _n == 3
list, sepby(id) noobs
merge 1:m id using "`fy'", keep(master match) nogen
sort id x y
list, sepby(id) noobs

* combine all merges to replicate the -joinby- results
append using "`obs2'"
append using "`obs1'"
sort id x y
list, sepby(id x) noobs

Comment

Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#6

10 Jul 2014, 13:43

Lydia,

The recommendation is to read (and re-read) help and manual entries for the commands, before going to non-official sources. If you run help merge and help joinby, you get links at the top of the help files that open pdf manuals which cover the commands extensively. Practice examples like that provided by Robert (lots of them) are of course very useful.

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#7

10 Jul 2014, 14:11

I would not use risky to describe each task. Using m:m merge is simply reckless. There are many instances where joinby is appropriate. If joinby does not seem appropriate for your particular predicament then you still have to decide what is the correct outcome. We can't see your data so it's hard to make additional suggestions. Perhaps you could post an example where there are multiple observations in both the master and using dataset for a particular value of the key variable and how you would mix these together.
Comment
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#8

10 Jul 2014, 22:53

Originally posted by LydiaSmit

I'm very curious about the pdf manuals at the top of the help files. Where can I find them? Using help merge and help joinby do not show them.

The blue link under Title should take you to the manual entry for merge.

Another way is through the Help > PDF Documentation menu in the main Stata window. But that will not take you to the specific entry; you can search for it though.

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
Comment

Announcement