When is the command -merge m:m- useful?

Adam Sadi

Join Date: Jul 2022

Posts: 68
#1

14 Sep 2022, 04:27

Hello,

I have no particular problem with merging; I am just asking this question for my own knowledge of Stata. I've always been told that merge m:m should NEVER be used, but I never really understood why. If there's no utility to many-to-many merging, why does it exist? Is there one particular case where it would be useful?

Thanks to the expert -merge-rs who will broaden my knowledge!

Maarten Buis

Join Date: Mar 2014

Posts: 3456
#2

14 Sep 2022, 05:31

It was a mistake to include it. No one is perfect, not even Stata.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#3

14 Sep 2022, 06:13

Never.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

14 Sep 2022, 07:37

Since the original Stata merge command did not include the 1:1/1:m/m:1 argument, my guess is that it replicated the behavior of merging two datasets stored on sequential media such as magnetic tape. And when merge was rewritten in Stata 10, m:m was included for symmetry and to allow Stata to duplicate its previous behavior. But this is all a guess.

Stata now apparently regrets that decision, although not enough to change it. The following is copied word-for-word from the documentation of the merge command in the Stata Data Management Reference Manual PDF included in the Stata installation and accessible from Stata's Help menu.

m:m merges

m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.
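The pairing rule quoted above is easy to model outside Stata. The following Python sketch is a toy simulation of the documented rule, not Stata's implementation; the function name mm_pair and the sample values are hypothetical. It handles a single key group that is present in both files and shows that merely reordering the master rows changes which observations get paired.

```python
# Toy model of the m:m pairing rule as the Stata manual describes it.
# NOT Stata's code: it only mimics the documented behaviour for ONE key
# group, using Python lists in their current "sort order".
# Assumes both groups are non-empty (the key value appears in both files).

def mm_pair(master, using):
    """Pair the i-th master row with the i-th using row within one key group.

    If one group is shorter, its last row is reused for the remaining
    rows of the longer group.
    """
    n = max(len(master), len(using))
    pairs = []
    for i in range(n):
        m = master[min(i, len(master) - 1)]  # reuse last master row if short
        u = using[min(i, len(using) - 1)]    # reuse last using row if short
        pairs.append((m, u))
    return pairs


# One key value (say id == 1): two master rows, three using rows.
using = ["u_x", "u_y", "u_z"]

print(mm_pair(["m_a", "m_b"], using))
# [('m_a', 'u_x'), ('m_b', 'u_y'), ('m_b', 'u_z')]

# Same data, master rows in a different order: the pairing changes.
print(mm_pair(["m_b", "m_a"], using))
# [('m_b', 'u_x'), ('m_a', 'u_y'), ('m_a', 'u_z')]
```

Nothing in the data distinguishes the two results; only the row order does, which is exactly the sort-order dependence the manual warns about.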

Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below.

If you are thinking about using merge m:m, it is a near certainty that at least one of the following is true:

1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset that share the same key values. (SQL users take note! I fell for this myself: to Stata, m-to-m does not mean m-by-m.)

4. You actually need to append your datasets rather than merge them.

5. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.
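Points 1 to 3 above can be illustrated with a small Python sketch. It is a rough stand-in, not Stata code: key_is_unique plays the role of Stata's isid check, and joinby_like mimics the "every combination within the key" result that point 3 attributes to joinby, dropping unmatched rows. Both function names and the sample data are hypothetical.

```python
from collections import defaultdict

def key_is_unique(rows, key):
    """Rough analogue of Stata's isid: does `key` identify every row?"""
    keys = [tuple(r[k] for k in key) for r in rows]
    return len(keys) == len(set(keys))

def joinby_like(master, using, key):
    """Every master/using pair sharing the same key values (a cross product
    within each key group). Unmatched rows are dropped. If a non-key column
    appears in both, this sketch keeps the master value."""
    groups = defaultdict(list)
    for u in using:
        groups[tuple(u[k] for k in key)].append(u)
    out = []
    for m in master:
        for u in groups.get(tuple(m[k] for k in key), []):
            out.append({**u, **m})  # master values win on name clashes
    return out

# Hypothetical data: one person with two visits and three drugs.
master = [{"id": 1, "visit": 1}, {"id": 1, "visit": 2}]
using = [{"id": 1, "drug": "A"}, {"id": 1, "drug": "B"}, {"id": 1, "drug": "C"}]

print(key_is_unique(master, ["id"]))           # False: id alone is not a key
print(key_is_unique(master, ["id", "visit"]))  # True: expanded key (point 2)
print(len(joinby_like(master, using, ["id"]))) # 6 = 2 x 3 combinations
```

Under the same data, the m:m pairing rule would return only 3 rows, which is why m:m is not a substitute for the SQL-style many-to-many join.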
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#5

14 Sep 2022, 07:53

Adam:
after stepwise regression, it is the second piece of evidence that the devil does exist!

Kind regards,
Carlo
(Stata 19.0)
daniel klein

Join Date: Mar 2014

Posts: 3849
#6

14 Sep 2022, 09:00

Originally posted by William Lisowski

And when merge was rewritten in Stata 10, m:m was included for symmetry and to allow Stata to duplicate its previous behavior. But this is all a guess.

The symmetry guess seems plausible; perhaps they wanted all possible combinations. On the other points: merge was fundamentally rewritten in Stata 11, and the old behavior is technically replicated by passing the old syntax through to merge_10. See

Code:

viewsource merge.ado

and

Code:

help merge_10