  • Bug in -merge- when using assert(match master) keep(match)?

    Hello, I often use the assert() option of -merge- to verify that there are no unmatched observations or that the unmatched observations come only from a particular dataset. Today I noticed that, when using assert(match master) keep(match), there were unmatched observations in the using dataset but the assertion did not fail. Below is code that reproduces this problem:
    Code:
    clear
    set obs 10
    gen id = _n
    tempfile touse
    save `touse', replace
    drop in 1
    merge 1:1 id using `touse', assert(match master) keep(match)
    The -drop in 1- on the second-to-last line means that the using dataset has one more observation than the master dataset (id 1 exists only in the using data). Therefore, -merge- should return an error, because there is an unmatched observation in the using dataset. Indeed, if you delete the assert() and keep() options, -merge- reports 9 matched observations and 1 unmatched observation from the using data.

    The -merge- help file implies that merge should return an error in this situation when it says, "Using assert(match master) specifies that the merged file is required to include only matched master or using observations and unmatched master observations, and may not include unmatched using observations."

    I have noticed that this problem does not seem to appear when the largest id value in the using dataset is greater than the largest id value in the master dataset; in that case, -merge- correctly returns an error. The problem appears when the largest id value in the master dataset is greater than or equal to the largest id value in the using dataset, as in the example above (both have a largest id of 10).

    I am using Windows 10 and running 64-bit Stata/MP 15.

    Do I have a faulty understanding of what -merge- is supposed to do? Or is this a bug?

  • #2
    I think the documentation is confusing. It does say what you quote. But further down it says:

    assert() and keep() are convenience options whose functionality can be duplicated using _merge directly.

    . merge ..., assert(match master) keep(match)

    is identical to

    . merge ...
    . assert _merge==1 | _merge==3
    . keep if _merge==3
    There it is made clear that the -assert()- check is applied before -keep()-. Stata's behavior is actually consistent with this fuller explanation.

    I think the use of the term "merged result" in the part of the documentation you quote is the source of the confusion. You have interpreted it to mean "result of merge after keep() is applied", whereas StataCorp intends it to mean "result of merge before keep() is applied."

    It's confusing, and I get tripped up by this often. For this reason, I tend not to use the -assert()- and -keep()- options together, preferring to use separate -assert- and -keep- commands in the order that I want them. (But -assert()- without -keep()-, or vice versa, is never problematic.)
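    Applied to the example from the first post, the separate-command approach described above might look like the sketch below. It is a minimal illustration, not StataCorp's implementation of assert()/keep(). The -capture noisily- wrapper and the if/else branch (including the hypothetical error-message text) are illustrative additions, used only so the outcome can be inspected instead of halting the do-file.

    ```stata
    * Example from the first post: the using data has id 1-10,
    * the master data has id 2-10, so id==1 is a using-only observation.
    clear
    set obs 10
    gen id = _n
    tempfile touse
    save `touse', replace
    drop in 1

    * Merge without assert()/keep(), so nothing is dropped yet
    merge 1:1 id using `touse'

    * Step 1: check the full merge result before any subsetting.
    * _merge==2 (using only) is present for id==1, so this assertion is
    * false. -capture noisily- shows the error and stores its return code
    * in _rc instead of stopping the do-file (illustration only).
    capture noisily assert _merge == 1 | _merge == 3

    if _rc {
        * Hypothetical handling: report the problem, drop nothing
        display as error "merge check failed; unmatched using observations remain"
    }
    else {
        * Step 2: subset only once the check has passed
        keep if _merge == 3
    }
    ```

    In a production do-file one would normally omit -capture noisily- and the if/else, so that a plain -assert- stops execution with r(9) before -keep- ever runs.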



    • #3
      On my Mac (Stata 15.1), the example does throw an error. The error message is a bit convoluted, but it indicates that your expectations were not met AND that the requested outcome (taking into account the keep(match)) was left in memory:

      Code:
      . dis c(stata_version)
      15.1
      
      . clear
      
      . set obs 10
      number of observations (_N) was 0, now 10
      
      . gen id = _n
      
      . tempfile touse
      
      . save `touse', replace
      (note: file /var/folders/cp/z8cssshn6935x9p181c71_7m0000gn/T//S_04860.000008 not found)
      file /var/folders/cp/z8cssshn6935x9p181c71_7m0000gn/T//S_04860.000008 saved
      
      . drop in 1
      (1 observation deleted)
      
      . merge 1:1 id using `touse', assert(match master) keep(match)
      merge:  after merge, not all observations from master or matched
              (merged result left in memory)
      r(9);
      
      .



      • #4
        Hi Robert and Clyde, thank you for the responses.

        Clyde, I think we are not understanding each other properly. I agree with you that the -assert()- check is applied before -keep()-. This is what I have understood -merge- to do all along. Therefore, when running the code I posted, I expect -assert(match master)- to first check that all observations either matched or came from the master dataset, and only then keep the matched observations. However, the -assert()- check isn't working properly: even though there is an unmatched observation in the using data before -keep(match)- is applied, Stata does not throw an error.

        Robert, thank you for posting your code. Your machine shows different output from mine. Here is what I see:

        Code:
        . di c(stata_version)
        15
        
        . clear
        
        . set obs 10
        number of observations (_N) was 0, now 10
        
        . gen id = _n
        
        . tempfile touse
        
        . save `touse', replace
        (note: file C:\Users\Peter\AppData\Local\Temp\ST_2bfc_00001g.tmp not found)
        file C:\Users\Peter\AppData\Local\Temp\ST_2bfc_00001g.tmp saved
        
        . drop in 1
        (1 observation deleted)
        
        . merge 1:1 id using `touse', assert(match master) keep(match)
        
            Result                           # of obs.
            -----------------------------------------
            not matched                             0
            matched                                 9  (_merge==3)
            -----------------------------------------
        I could try updating to 15.1 to see if that fixes the problem.



        • #5
          Peter, sorry. Yes, you are right; I misunderstood your original post. The example you show contradicts the full explanation in the manual.



          • #6
            Thanks Clyde. It seems that this bug was fixed in 15.1. I updated, and now I get the same output as Robert. The -help whatsnew- page confirms that this bug was fixed:
            merge with options keep() and assert() did not always verify the required match results before keeping the requested observations. This could result in merge not reporting an error when it should have. This has been fixed.



            • #7
              So this was indeed a bug. Well done for spotting it, and great that it has been fixed.
