Hello, I often use the assert option of -merge- to verify that there are no unmatched observations or that the unmatched observations come only from a particular dataset. Today I noticed when using assert(match master) keep(match) that there were unmatched observations in the using dataset but that the assertion was not failing. Below is code that reproduces this problem:
The drop in 1 on the second to last line means that the using dataset has one more observation than the master dataset. Therefore, -merge- should return an error, as there is an unmatched observation in the using dataset. Indeed, if you delete the assert and keep options, -merge- reports that there were 9 matched observations and 1 unmatched observation from the using data.
The -merge- help file implies that merge should return an error in this situation when it says, "Using assert(match master) specifies that the merged file is required to include only matched master or using observations and unmatched master observations, and may not include unmatched using observations."
I have noticed that this problem does not seem to appear when the largest id value in the using dataset is greater than the largest id value in the master dataset; in that case, -merge- correctly returns an error. However, when the largest id value in the master dataset is greater than or equal to the largest id value in the using dataset, as in the example code (both are 10), -merge- does not return an error.
I am using Windows 10 and running 64-bit Stata/MP 15.
Do I have a faulty understanding of what -merge- is supposed to do? Or is this a bug?
Code:
clear
set obs 10
gen id = _n
tempfile touse
save `touse', replace
drop in 1
merge 1:1 id using `touse', assert(match master) keep(match)
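For comparison, here is a sketch of the same check done by hand, without the assert and keep options. It assumes the standard coding of _merge (1 = master only, 2 = using only, 3 = matched). In this example I would expect the assert to fail, because there is one using-only observation.

```stata
* Rebuild the example: using has ids 1-10, master has ids 2-10
clear
set obs 10
gen id = _n
tempfile touse
save `touse', replace
drop in 1

* Merge without assert() and keep(), so that _merge can be inspected
merge 1:1 id using `touse'
tabulate _merge

* Manual version of assert(match master): no using-only observations allowed
assert _merge != 2

* Manual version of keep(match); only reached if the assert passes
keep if _merge == 3
```

This is only meant as a cross-check of what I expected the options to do, not as a statement of how -merge- implements them internally.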