Joinby Vs 1:m merge

Saeed Sardar

Join Date: Apr 2016

Posts: 55
#1


21 Apr 2016, 01:53

I find joinby and a one-to-many merge similar in function. Can someone help highlight the difference between the two? Which one should be used under what circumstances?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#2

21 Apr 2016, 01:58

Saeed:
admittedly, your question is too broad to get a helpful reply.
Please take a look at the -merge- and -joinby- entries in the Stata .pdf manual and work out the differences yourself.

Kind regards,
Carlo
(Stata 19.0)
eric_a_booth

Join Date: Apr 2014

Posts: 292
#3

21 Apr 2016, 09:21

You can produce the same results with these two commands; for example, -joinby ..., unmatched(both)- can reproduce what -merge 1:m- does. However, I prefer to use merge because it makes me think about duplicates in my data and checks that the type of merge I declare in the command (1:1, 1:m, m:1) actually holds. -joinby- performs the join regardless of whether the key variables are duplicated, which has tripped me up previously. Also, merge's reporting/flagging of the types of conflicts between the datasets when using the update and replace options is very helpful. Finally, on a large-ish dataset I am working on, the -joinby ..., unm(both)- command takes nearly twice as long as the equivalent -merge 1:m- (482 seconds vs. 250 seconds).
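To make the duplicate check concrete, here is a minimal sketch on hypothetical toy data (the variables id, x, y and the tempfile names mfile/ufile are made up for illustration). With the key id duplicated in the master file, merge 1:m should refuse to run, failing with return code 459 ("does not uniquely identify observations in the master data"), whereas joinby goes ahead and forms every pairing of matching observations.

```stata
* Hypothetical master data: id is NOT unique (id 1 appears twice)
clear
input id x
1 10
1 11
2 20
end
tempfile mfile ufile
save `mfile'

* Hypothetical using data: id 2 appears twice
clear
input id y
1 100
2 200
2 201
end
save `ufile'

* merge 1:m checks that id uniquely identifies master observations;
* here it does not, so the command should stop with an error (r(459))
use `mfile', clear
capture noisily merge 1:m id using `ufile'
local rc = _rc
display "merge 1:m return code: `rc'"

* joinby performs no such check and forms all pairings within each id:
* id 1 -> 2 x 1 = 2 obs, id 2 -> 1 x 2 = 2 obs, 4 obs in total
use `mfile', clear
joinby id using `ufile', unmatched(both)
tabulate _merge
list, sepby(id)
```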

Consider this example:

Code:

sysuse auto, clear
save masterdata, replace

sysuse auto2, clear
expand 2  // now there are duplicates for a 1:m merge
replace make = "mismatch" in 1/3
save usingdata, replace

** using joinby
use masterdata, clear
joinby make using usingdata, unmatched(both)
tabulate _merge
tabulate make _merge

** using merge
use masterdata, clear
merge 1:m make using usingdata
tabulate make _merge

* duplicates in master now:
sysuse auto, clear
expand 2
save masterdata, replace
use masterdata, clear
joinby make using usingdata, unmatched(both)  // essentially an ill-advised many-to-many merge
tabulate _merge
tabulate make _merge
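The remark above about merge flagging conflicts under the update and replace options can be sketched on hypothetical toy data (id, x and the tempfile names are illustrative only). With update, merge should code a filled-in missing master value as _merge 4 ("missing updated") and a disagreement between nonmissing values as _merge 5 ("nonmissing conflict"), keeping the master value; adding replace should overwrite the master value with the using value instead.

```stata
* Hypothetical master data: x is missing for id 1
clear
input id x
1 .
2 5
3 7
end
tempfile mfile ufile
save `mfile'

* Hypothetical using data: agrees on id 2, disagrees on id 3
clear
input id x
1 3
2 5
3 9
end
save `ufile'

* update: fill missing master values; conflicts are flagged, master kept
use `mfile', clear
merge 1:1 id using `ufile', update
tabulate _merge
list

* update replace: nonmissing using values overwrite master values
use `mfile', clear
merge 1:1 id using `ufile', update replace
list
```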

As Carlo mentioned, you'll get better feedback/guidance if you provide some details about the datasets you are joining together.

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

21 Apr 2016, 11:50

Adding slightly to Eric's exposition: you can use joinby to produce the same results as a merge 1:m, but the purpose of joinby is to produce what people think merge m:m is supposed to produce but disappointingly does not: all pairwise combinations of observations with matching keys. When the master file has distinct keys, merge 1:m and joinby can produce the same results, with properly selected options as Eric discusses.
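A minimal sketch of that difference, again on hypothetical toy data (id, x, y and the tempfile names are made up): two master and three using observations share id 1. joinby should return all 2 x 3 = 6 combinations, while merge m:m pairs observations sequentially within the key and should return only 3, reusing the last master observation for the extra using observation.

```stata
* Hypothetical master data: two observations with id 1
clear
input id x
1 10
1 11
end
tempfile mfile ufile
save `mfile'

* Hypothetical using data: three observations with id 1
clear
input id y
1 100
1 101
1 102
end
save `ufile'

* merge m:m: observations are paired in order, not crossed
use `mfile', clear
merge m:m id using `ufile'
local n_mm = _N
list

* joinby: every master observation is paired with every using observation
use `mfile', clear
joinby id using `ufile'
local n_jb = _N
list

display "merge m:m: `n_mm' obs;  joinby: `n_jb' obs"
```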

And since I've opened the merge m:m can of worms, let me add the usual caveat, given even by the Stata documentation:

Use of merge m:m is not encouraged.

and on Statalist it is actively discouraged.
Saeed Sardar

Join Date: Apr 2016

Posts: 55
#5

21 Apr 2016, 21:51

Thank you all for your helpful replies.