strange behaviour of -merge- with option -keep- when option -assert- fails

Hemanshu Kumar

#1

26 Jun 2023, 19:11

Consider this toy situation:

Code:

clear
input byte(id num)
1 10
2 20
3 30
4 40
end
tempfile using
save `using'

clear
input byte(id numnum)
1 1
2 2
5 5
end

Now if I run

Code:

merge 1:1 id using `using', assert(1 3) keep(3)

where the assertion of course fails, the resulting dataset looks like this:

Code:

. list

     +-------------------------------------+
     | id   numnum   num            _merge |
     |-------------------------------------|
  1. |  1        1    10       matched (3) |
  2. |  2        2    20       matched (3) |
  3. |  5        5     .   master only (1) |
     +-------------------------------------+

That is, it has not done what I asked, namely keep only the _merge == 3 observations; instead, it has dropped the _merge == 2 observations while retaining the _merge == 1 observation.

On the other hand, if I had done just

Code:

merge 1:1 id using `using', assert(1 3)

then we get

Code:

. list

     +-------------------------------------+
     | id   numnum   num            _merge |
     |-------------------------------------|
  1. |  1        1    10       matched (3) |
  2. |  2        2    20       matched (3) |
  3. |  5        5     .   master only (1) |
  4. |  3        .    30    using only (2) |
  5. |  4        .    40    using only (2) |
     +-------------------------------------+

i.e. Stata keeps all observations, matched or unmatched in any direction.

This does not make sense to me. Either the presence of the -keep- option should have made Stata keep only the desired result, or Stata should keep all observations (as it does when -keep- is not specified).

I am on Stata 18/MP, but I have also tested this on the latest update of Stata 16/MP and see the same behaviour. I don't recall ever encountering this before, so I don't know whether it is the result of some relatively recent update to -merge-.
Leonardo Guizzetti

#2

26 Jun 2023, 20:23

I don't have a good answer. From the manual, I would agree that assert is evaluated first, and then any filtering by keep is applied.

assert() and keep() are convenience options whose functionality can be duplicated using _merge directly.

. merge ..., assert(match master) keep(match)

is identical to

. merge ...
. assert _merge==1 | _merge==3
. keep if _merge==3

I suppose one way to read Stata's error message is that, while it leaves the merge result in memory, that result is not guaranteed to conform to any of the options requested in the command.
Hemanshu Kumar

#3

26 Jun 2023, 20:36

I would argue that Stata's current behaviour is both unexpected and unhelpful.

Code:

. merge 1:1 id using `using', assert(1 3) keep(3)
after merge, not all observations from master or matched
(merged result left in memory)
r(9);

Notice that the assertion failed because there are some "using only" (_merge == 2) observations. Stata's behaviour is unexpected, because the "merged result" has not actually been left in memory: part of it has disappeared. Indeed, this is the part you are likeliest to want to inspect while troubleshooting, namely the errant _merge == 2 observations. Ergo, unhelpful. Stata should simply retain all observations.

Last edited by Hemanshu Kumar; 26 Jun 2023, 20:42.
Hemanshu Kumar

#4

26 Jun 2023, 20:41

assert() and keep() are convenience options whose functionality can be duplicated using _merge directly.

. merge ..., assert(match master) keep(match)

is identical to

. merge ...
. assert _merge==1 | _merge==3
. keep if _merge==3

Thanks for this quote, Leonardo. Indeed, this raises the further point that the behaviour is inconsistent with the manual. The two sets of commands are not identical. If I follow the second path, the code will break on the assert step, at which point I will have a dataset with all the observations, which will help me troubleshoot the failed assertion.
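A sketch of that second path as a do-file, assuming the toy data from #1 (rebuilt here so the block runs on its own). The local macro name merge_ok is an illustrative choice, not part of -merge-. Rather than relying on assert() and keep(), it merges everything, checks _merge itself, and only drops observations when the check passes, so a failed check leaves the full merge result, including the _merge == 2 rows, in memory.

```stata
* Toy data from #1: the using file has ids 1-4, the master has ids 1, 2, 5
clear
input byte(id num)
1 10
2 20
3 30
4 40
end
tempfile using
save `using'

clear
input byte(id numnum)
1 1
2 2
5 5
end

* Step 1: merge WITHOUT keep(), so nothing can be dropped yet
merge 1:1 id using `using'

* Step 2: check the same match pattern that assert(1 3) would check
capture assert inlist(_merge, 1, 3)
local merge_ok = (_rc == 0)

* Step 3: filter only if the check passed; otherwise keep everything
if !`merge_ok' {
    display as error "unexpected _merge values; all observations left in memory"
    list if _merge == 2
}
else {
    keep if _merge == 3
}
```

In a production do-file one would probably add -error 9- inside the first branch so that execution stops, as a failed -assert- would; the data in memory are the same either way.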

Nils Enevoldsen


27 Jun 2023, 07:57

To add some context, Stata used to evaluate keep() before assert().

Code:

. version
version 14.2

. list

     +---------------------------------+
     | id   numnum   num        _merge |
     |---------------------------------|
  1. |  1        1    10   matched (3) |
  2. |  2        2    20   matched (3) |
     +---------------------------------+

This was fixed in Stata 15.1 (see help whatsnew15).

Code:

    26. merge with options keep() and assert() did not always verify the required match results before keeping the
        requested observations.  This could result in merge not reporting an error when it should have.  This has
        been fixed.

That said, I agree that the current behavior is unexpected and unhelpful, and would escalate this to tech support. They might claim that this is "undefined behavior", but I think you have a good argument that at the very least the error message and the manual are inconsistent with the behavior.


Hemanshu Kumar

#6

27 Jun 2023, 09:23

Thanks for that info, Nils. I have sent off an email to Tech Support, pointing them to this thread.
Nils Enevoldsen

#7

24 Sep 2023, 11:20

To close the loop on this, Stata 17 update 29aug2023 contains:

3. merge with options keep() and assert(), when the results of the merge failed to match option assert(), would not leave unmatched data from the using dataset in memory for inspection after the failed merge. This has been fixed.
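To check whether a given installation includes this fix, one could rerun the example from #1 without letting the error stop the do-file. This is a minimal sketch that assumes the update quoted above is installed. c(born_date) reports the date of the running executable. The expectation that the two _merge == 2 observations survive is my reading of the release note, not verified output.

```stata
* Which executable is running? Compare with the 29aug2023 update date above
display "executable date: " c(born_date)

* Toy data from #1, rebuilt so this block runs on its own
clear
input byte(id num)
1 10
2 20
3 30
4 40
end
tempfile using
save `using'

clear
input byte(id numnum)
1 1
2 2
5 5
end

* Same command as in #1; -capture noisily- shows the r(9) error but carries on
capture noisily merge 1:1 id using `using', assert(1 3) keep(3)
display "merge return code: " _rc

* With the fix, the using-only rows (ids 3 and 4) should still be in memory
count if _merge == 2
```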
