merging in using data without missing values, _merge == 3 records end up with missing values

László Sándor

Join Date: Apr 2014

Posts: 120
#1


31 Aug 2014, 18:06

Hi all,
This happens on Stata 13.1 MP for Unix (May 2014 build). I am doing an m:1 merge, and the using data has no missing values. The match variable is of type long in both datasets. The resulting data has many missing values in a variable that comes from the using data: I checked, and matched observations (_merge == 3) have missing values in roughly 25% of cases. The merge raised neither errors nor warnings. What might I be overlooking here? The variable in question is of type byte. This was an MP-heavy operation (64 cores), but it is hard to imagine that only 25% of threads would have read in corrupted data. An int variable next to the byte one in the using data is merged in uncorrupted.
Thanks,
Laszlo
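
For anyone trying to reproduce this, the checks described above can be sketched roughly as follows. This is only a minimal sketch: the file names master_data and using_data, the key id, and the byte variable flag are hypothetical placeholders, not the actual names from my project.

```stata
* Hypothetical names throughout: master_data.dta, using_data.dta, id, flag
use using_data, clear
* The key must uniquely identify observations in using for an m:1 merge
isid id
* Confirm that the byte variable really has no missing values in using
assert !missing(flag)
* Check the storage types of the key and the variable
describe id flag

use master_data, clear
merge m:1 id using using_data, keep(1 3)
* If using has no missing values, this count should be zero
count if _merge == 3 & missing(flag)
```

A nonzero count on the last line is the symptom reported here.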
László Sándor

Join Date: Apr 2014

Posts: 120
#2

31 Aug 2014, 18:45

Also note that the same does not seem to happen if I merge the same using data onto other builds of essentially the same data (e.g., just the identifier in a separate dataset), using the same identifiers.
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#3

31 Aug 2014, 21:18

Is the variable that ends up with missing values one that has the same name as an existing variable in the master dataset? If so, you probably want to rename the variable in the master or use the update or replace option.
László Sándor

Join Date: Apr 2014

Posts: 120
#4

01 Sep 2014, 06:11

Thanks, Sarah, but that is not the case here. The only options I use are keep(1 3) and nogen, so Stata would (should) fail if the variable were already in the master data. But it isn't. Thanks!
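
To make the "not already in master" check explicit rather than relying on merge to complain, something along these lines could be run just before the merge. It is a sketch only; master_data, using_data, id, and flag are hypothetical placeholder names.

```stata
* Hypothetical names: master_data.dta, using_data.dta, key id, variable flag
use master_data, clear
* confirm new variable exits with an error if flag already exists in memory
confirm new variable flag
* Same options as described in this post
merge m:1 id using using_data, keep(1 3) nogen
```

If the confirm line passes, the variable can only have come from the using data.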
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#5

01 Sep 2014, 06:43

Since you are not using either the update or replace options, anything that is missing in the master data will stay missing in the merged data (your comment that nothing is missing in the using data is irrelevant). So, do you have variables with the same name in each dataset (other than the variables you are merging on)?
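
A rough illustration of the difference, assuming a variable called flag exists in both files (all names here, master_data, using_data, id, and flag, are made up for the example):

```stata
* Hypothetical names: master_data.dta, using_data.dta, key id, shared variable flag
use master_data, clear
* Default: for matched observations the master's value of flag is kept,
* so a missing flag in master stays missing after the merge
merge m:1 id using using_data
count if _merge == 3 & missing(flag)

use master_data, clear
* With update, missing master values of flag are filled in from using
merge m:1 id using using_data, update
* Observations whose missing values were filled in are reported as _merge == 4
tabulate _merge
```

This only applies when the variable name is shared between the two datasets.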
László Sándor

Join Date: Apr 2014

Posts: 120
#6

01 Sep 2014, 07:11

No, the variable exists only in the using data; sorry I wasn't clear enough.

Even more strangely, if I rearrange the point at which I merge in this data while building the large dataset, and run it on a single core, the earlier merge correctly results in no missing values from the using data. I'll submit a bug report about this, but for now this is solved. And I don't see why this would occur normally.
