I'm debugging a many-to-many merge in someone else's code (I swear!), and I noticed that it gives results that are not reproducible from run to run. Here's an example to illustrate:
I want to merge data1 into data2 (data2 is the master) to add the variable type from data1 to data2. There are four observations in data2, but all have the same id (id=2), so id does not uniquely identify the observations.
There are two observations in data1 with id=2: the first has type="y" and the second has type="x". According to the manual:
Matching is performed by combining observations with equal values of varlist; within matching values, the first observation in the master dataset is matched with the first matching observation in the using dataset; the second, with the second; and so on. If there is an unequal number of observations within a group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group.
So I'd expect this m:m merge to match "a" with "y", and "b", "c", and "d" with "x".
And if I run this merge many times, the count of type=="x" should be 3 every time.
However, when I run this code, I find that the count of type=="x" is either 3 or 1, split roughly in half. So half of the time it's matching "a" with "y", and b/c/d with "x"; and the other half it's the reverse.
What's going on?
Code:
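* data1 (using): two observations with id==2, stored as "y" then "x"
clear
input long id str1 type
1 "x"
3 "y"
1 "y"
2 "y"
2 "x"
end
save data1, replace

* data2 (master): four observations, all with id==2
clear
input long id str1 name
2 "a"
2 "b"
2 "c"
2 "d"
end
save data2, replace

* Repeat the m:m merge 500 times and store the count of type=="x" from each run
clear
set seed 1
local N 500
matrix O = J(`N',1,.)
forval i = 1/`N' {
    use data2, clear
    merge m:m id using data1, keepusing(type) keep(1 3) nogen
    count if type=="x"
    matrix O[`i',1] = r(N)
}
svmat O
keep O1
tab O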
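As a sanity check on the expectation above, here is a minimal Python sketch of the matching rule exactly as the manual quote describes it. This is not Stata's implementation; `mm_merge` and `count_type` are hypothetical helpers for illustration, and only the `keep(1 3)` behavior used in the Stata code is mimicked. The rule is applied to data1 twice: once with its id=2 rows in the saved order (y then x), and once with those two rows swapped.

```python
# Illustrative sketch of the documented m:m rule; NOT Stata's implementation.

def mm_merge(master, using):
    """Pair rows key by key following the m:m rule quoted from the manual.

    master, using: lists of (key, value) tuples in their current row order.
    Returns (key, master_value, using_value) triples. Like keep(1 3), it keeps
    master-only keys (using_value None) and drops using-only keys.
    """
    keys = list(dict.fromkeys(k for k, _ in master))  # distinct keys, first-seen order
    merged = []
    for key in keys:
        m_vals = [v for k, v in master if k == key]
        u_vals = [v for k, v in using if k == key]
        if not u_vals:
            merged.extend((key, m, None) for m in m_vals)
            continue
        # i-th row pairs with i-th row; the shorter group repeats its last row
        for i in range(max(len(m_vals), len(u_vals))):
            m = m_vals[min(i, len(m_vals) - 1)]
            u = u_vals[min(i, len(u_vals) - 1)]
            merged.append((key, m, u))
    return merged


def count_type(merged, value):
    """Count merged rows whose using value equals `value` (like count if type=="x")."""
    return sum(1 for _, _, t in merged if t == value)


data2 = [(2, "a"), (2, "b"), (2, "c"), (2, "d")]                     # master: id, name
data1_as_saved = [(1, "x"), (3, "y"), (1, "y"), (2, "y"), (2, "x")]  # using: id, type
data1_swapped = [(1, "x"), (3, "y"), (1, "y"), (2, "x"), (2, "y")]   # id=2 rows reversed

for label, using in [("as saved", data1_as_saved), ("id=2 rows swapped", data1_swapped)]:
    merged = mm_merge(data2, using)
    pairs = [(name, t) for _, name, t in merged]
    print(f"{label}: {pairs}, count of type=='x' = {count_type(merged, 'x')}")
# as saved: [('a', 'y'), ('b', 'x'), ('c', 'x'), ('d', 'x')], count of type=='x' = 3
# id=2 rows swapped: [('a', 'x'), ('b', 'y'), ('c', 'y'), ('d', 'y')], count of type=='x' = 1
```

Under the documented rule, these two orderings of the id=2 rows produce exactly the two counts seen in the simulation (3 and 1). That is consistent with the order of those tied rows not being the same on every run.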