  • When is the command -merge m:m- useful?

    Hello,

    I have no particular problem with merging; I am just asking this question for my own knowledge of Stata. I've always been told that merge m:m should NEVER be used, but I never really understood why. If many-to-many merging has no use, why does it exist? Is there any particular case where it would be useful?

    Thanks to the expert -merge-rs who will broaden my knowledge!

  • #2
    It was a mistake to include it. No one is perfect, not even Stata.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Never.


      • #4
        Since the original Stata merge command did not include the 1:1/1:m/m:1 argument, my guess is that it replicated the behavior of merging two datasets stored on sequential media such as magnetic tape. And when merge was rewritten in Stata 10, m:m was included for symmetry and to allow Stata to duplicate its previous behavior. But this is all a guess.

        Stata now apparently regrets that decision, although not enough to change it. The following is copied word-for-word from the documentation of the merge command in the Stata Data Management Reference Manual PDF included in the Stata installation and accessible from Stata's Help menu.

        m:m merges

        m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.

        Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below.
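
        To make the matching rule in the manual excerpt concrete, here is a minimal sketch on toy data. The values and the variable names m and u are invented for illustration and come from neither the thread nor the manual. The master has three observations with id 1 and the using data has two, so the result has three observations and one using observation is reused. Which master observation ends up paired with which using observation depends on the order of observations within id, so compare the two listings rather than relying on a particular pairing.

```stata
* Toy data, invented for illustration
clear
input id str1 u
1 "a"
1 "b"
end
tempfile using_data
save `using_data'

clear
input id str1 m
1 "x"
1 "y"
1 "z"
end

* Observations are paired one by one within id; the shorter group's
* last observation is reused for the leftover master observation
merge m:m id using `using_data'
list id m u, noobs

* Reorder the master within id and merge again; the pairing may differ
drop u _merge
gsort id -m
merge m:m id using `using_data'
list id m u, noobs
```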

        If you are thinking about using merge m:m, it is a near certainty that at least one of the following is true:

        1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

        2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

        3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset, both having the same key. (SQL users take note! I fell for this myself: to Stata m-to-m does not mean m-by-m.)

        4. You actually need to append your datasets rather than merge them.

        5. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.
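
        The first three cases in the list above can be checked along these lines. This is only a sketch on toy data: the datasets and the variables id, year, income, region and child are hypothetical, chosen for illustration. isid exits with an error when the listed variables do not uniquely identify the observations, duplicates report shows how many observations share a key, and joinby forms every pairwise combination within the key.

```stata
* Toy data, invented for illustration
clear
input id year income
1 2001 10
1 2002 12
2 2001 20
end
tempfile panel
save `panel'

clear
input id str6 region
1 "north"
2 "south"
end
tempfile lookup
save `lookup'

* Case 1: id is unique in the lookup data, so m:1 is enough
isid id
use `panel', clear
merge m:1 id using `lookup'

* Case 2: id alone is not unique in the panel, but id plus year is
duplicates report id
isid id year

* Case 3: every combination within id is wanted, so use joinby
clear
input id str1 child
1 "a"
1 "b"
end
tempfile children
save `children'
use `panel', clear
joinby id using `children'
list, noobs
```

        With these toy data the joinby result has four observations: two panel rows for id 1 times two child rows. The panel row for id 2 has no match and is dropped under joinby's default handling of unmatched observations.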


        • #5
          Adam:
          After stepwise regression, -merge m:m- is the second piece of evidence that the devil does exist!
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Originally posted by William Lisowski
            And when merge was rewritten in Stata 10, m:m was included for symmetry and to allow Stata to duplicate its previous behavior. But this is all a guess.
            The symmetry guess seems plausible; perhaps they wanted all possible combinations. On the other points: merge was fundamentally rewritten in Stata 11; the old behavior is technically replicated by passing old syntax through to merge_10. See

            Code:
            viewsource merge.ado
            and

            Code:
            help merge_10
