Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merge on one variable and AT LEAST one other variable

    Hi there
    I am using the merge command for matching between two datasets.
    A successful pair needs to match on var1 and at least one other variable (var2, var3 or var4).
    Ideally I would then get a report on which combination of variables allowed the pairs to be successfully matched (e.g. var1+var2, var1+var4, var1+var3+var4, etc).
    Is there a way to do this without breaking my datasets into multiple different pieces and running multiple merge commands?
    Many thanks for any advice.

  • #2
    Can you tell us if var1 uniquely identifies the observations in either of your two datasets?

    Comment


    • #3
      Thank you.
      No, var1 does not uniquely identify the observations in either of my two datasets.
      This first step is to find potential matches. A subsequent step will determine the "best" match based on a scoring system.
      With thanks again, your replies are greatly appreciated!

      Comment


      • #4
        Code:
        use first
        rename (var2 var3 var4) (=_first)
        joinby var1 using second
        Now, suppose there are three observations in first with var1==42 and two observations in second with var1==42. The result of joinby will have 6 observations with var1==42 - all the combinations of one observation of the three in first with one observation of the two in second.

        Now you need to narrow this down to just those with a match on at least one of var2, var3, or var4.
        Code:
        keep if var2_first==var2 | var3_first==var3 | var4_first==var4
        And now you can do your scoring.

        Comment


        • #5
          I see. That's really helpful, thanks very much indeed.

          Comment

          Working...
          X