  • Joinby Vs 1:m merge

    I find joinby and a one-to-many merge similar in function. Can someone help by highlighting the difference between the two? Which one should be used under what circumstances?

  • #2
    Saeed:
    your question is admittedly too broad to get a helpful reply.
    Please take a look at the -merge- and -joinby- entries in the Stata .pdf manual and work out the differences yourself.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      You can produce the same results with these two commands: for example, -joinby ..., unmatched(both)- can reproduce a -merge 1:m-. However, I prefer merge because it makes me think about duplicates in my data and checks that the type of merge I declare in the command (1:1, 1:m, m:1) actually holds. joinby performs the join regardless of the duplicate situation, which has tripped me up previously. Also, merge's reporting/flagging of the types of conflicts between the datasets when using the update and replace options is very helpful. Finally, on a large-ish dataset I am working on, the -joinby ..., unm(both)- command takes nearly twice as long as the -merge 1:m- equivalent (482 seconds vs. 250 seconds).


      Consider this example:
      Code:
      * master: one row per make
      sysuse auto, clear
      save masterdata, replace

      * using: two rows per make, plus three keys that match nothing in master
      sysuse auto2, clear
          expand 2 // now there are duplicates for a 1:m merge
          replace make = "mismatch" in 1/3
      save usingdata, replace


      ** using joinby
      use masterdata, clear
          joinby make using usingdata, unmatched(both)
          tabulate _merge
          tabulate make _merge

      ** using merge 1:m
      use masterdata, clear
          merge 1:m make using usingdata
          tabulate make _merge


      * duplicates in master now:
      sysuse auto, clear
      expand 2
      save masterdata, replace
      use masterdata, clear
          joinby make using usingdata, unmatched(both) // essentially an ill-advised many-to-many join
          tabulate _merge
          tabulate make _merge
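      To illustrate the point about merge checking the declared merge type, here is a minimal sketch, assuming the shipped auto and auto2 datasets and tempfiles in place of the saved files above (the macro names master and usingfile are only illustrative). With duplicates in the master, -merge 1:m- should stop with an error where -joinby- would go ahead, and -isid- lets you run the same uniqueness check yourself beforehand:
      Code:
      * master with duplicated keys
      sysuse auto, clear
      expand 2
      tempfile master
      save `master'

      * using file, left unchanged
      sysuse auto2, clear
      tempfile usingfile
      save `usingfile'

      use `master', clear
          * merge 1:m requires make to be unique in the master, so this fails
          capture noisily merge 1:m make using `usingfile'
          display "merge 1:m return code: " _rc // nonzero means merge refused to run

          * the same check done explicitly, before attempting any join
          capture noisily isid make
          display "isid return code: " _rc // nonzero means make is not a unique key
      Run this as one do-file so the tempfile macros stay defined.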
      As Carlo mentioned, you'll get better feedback/guidance if you provide some details about the datasets you are joining together.
      Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



      • #4
        Adding slightly to Eric's exposition, you can use joinby to produce the same results as a merge 1:m, but the purpose of joinby is to produce what people think merge m:m is supposed to produce but disappointingly does not - all combinations with matching keys. When the master file has distinct keys, merge 1:m and joinby can produce the same results, with properly selected options as Eric discusses.

        And since I've opened the merge m:m can of worms, let me add the usual caveat given even by the Stata documentation:
        Use of merge m:m is not encouraged.
        and on Statalist it is actively discouraged.
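
        A minimal sketch of the difference described above, using a made-up example (the variables id, a, and b and the tempfile names are hypothetical, not from anyone's data). Two master rows and three using rows share one key; joinby forms every pairing, while merge m:m pairs rows in order within the key and repeats the last row of the shorter side:
        Code:
        * two master rows with the same key
        clear
        input id str1 a
        1 "x"
        1 "y"
        end
        tempfile left
        save `left'

        * three using rows with the same key
        clear
        input id str1 b
        1 "p"
        1 "q"
        1 "r"
        end
        tempfile right
        save `right'

        * joinby: all pairwise combinations within id
        use `left', clear
            joinby id using `right'
            count // 2 x 3 = 6 observations
            list

        * merge m:m: row-by-row pairing within id, not all combinations
        use `left', clear
            merge m:m id using `right'
            count // 3 observations
            list
        The merge m:m result depends on the sort order within each key, which is one reason it is rarely what you want.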



        • #5
          Thank you all for your helpful replies.
