
  • merging in using data without missing values, _merge == 3 records end up with missing values

    Hi all,
    This happens on Stata 13.1 MP for Unix (May 2014 build). I am doing an m:1 merge with using data that has no missing values. The match variable is of type long in both datasets. The resulting data has many missing values in the variable that comes from the using data, and I checked that _merge == 3 (matched) cases clearly have missing values some of the time (~25% of cases). The merge raised no errors or warnings. What might I be overlooking here? The variable in question is of type byte. This was an MP-heavy operation (64 cores), but it is hard to imagine that only 25% of threads would have read in corrupted data. An int variable next to the byte one in the using data is merged in uncorrupted.
    Thanks,
    Laszlo
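
    The symptom described above could be checked with a minimal sketch like the one below. It assumes hypothetical file names master.dta and using.dta, a key variable id, and a byte variable flag that comes only from the using data; none of these names appear in the original post. It verifies the using data first, then keeps _merge so that matched and unmatched master records can be told apart. Note that with keep(1 3) and nogen, unmatched master records (_merge == 1) also carry missing values for flag and can no longer be distinguished from matched ones.

    ```stata
    * Hypothetical names throughout: master.dta, using.dta, id, flag
    use using.dta, clear
    isid id                        // errors unless id uniquely identifies using records
    assert !missing(flag)          // errors if flag has any missing value in using

    use master.dta, clear
    merge m:1 id using using.dta, keep(master match)   // same as keep(1 3), but _merge is kept

    tabulate _merge                          // unmatched master (1) vs matched (3)
    count if _merge == 3 & missing(flag)     // should be 0 if the merge is sound
    ```

    A nonzero count on the last line would reproduce the problem in a form that is easy to attach to a bug report.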

  • #2
    Also note that the same does not seem to happen if I merge in the same using data onto other builds of essentially the same data (e.g. just the identifier in a separate dataset), using the same identifiers.

    • #3
      Is the variable that ends up with missing values one that has the same name as an existing variable in the master dataset? If so, you probably want to rename the variable in the master or use the update or replace option.

      • #4
        Thanks, Sarah, this is not that case. The only options I use are keep(1 3) and nogen, thus Stata would (should) fail if the variable were already in master. But it isn't. Thanks!

        • #5
          Since you are not using either the update or the replace option, anything that is missing in the master data will stay missing in the merged data (your comment that nothing is missing in the using data is irrelevant). So, do you have variables with the same name in both datasets (other than the variables you are merging on)?
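
          The name-collision scenario raised here could be ruled out with a sketch like the following. It assumes hypothetical names master.dta, using.dta, and flag for the variable in question. As far as I understand merge's default behavior, when a non-key variable exists in both datasets and neither update nor replace is given, the master's values are kept without an error, so missing values in the master would survive the merge.

          ```stata
          * Hypothetical names: master.dta, using.dta, flag
          describe using using.dta           // list variables in the using file without loading it
          use master.dta, clear
          capture confirm variable flag      // _rc is 0 only if flag already exists in master
          if _rc == 0 {
              display as error "flag exists in master; merge keeps master values unless update or replace is given"
          }
          else {
              display as text "flag is not in master, so a name collision is ruled out"
          }
          ```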

          • #6
            No, the variable exists only in using; sorry I wasn't clear enough.

            Even more strangely, if I change the point at which I merge in this data while building a large dataset, and run it on a single core, the earlier merge correctly results in no missing values from the using data. I'll submit a bug report about this, but for now this is solved. And I don't see why this would occur normally.
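
            To test whether the number of cores matters, the merge could be rerun with the session restricted to one processor. A sketch, assuming Stata/MP (where set processors is available) and hypothetical names master.dta, using.dta, id, and flag:

            ```stata
            * Hypothetical names: master.dta, using.dta, id, flag
            set processors 1                   // Stata/MP only: restrict this session to one core
            display c(processors)              // confirm the setting took effect

            use master.dta, clear
            merge m:1 id using using.dta, keep(master match)
            count if _merge == 3 & missing(flag)
            ```

            If the count differs between runs that change only the processor setting, that difference would be worth including in the bug report.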
