Merge update, replace

Senor Edward

Join Date: Feb 2016

Posts: 29
#1

Merge update, replace

11 Sep 2017, 10:13

Hi,
I'm performing a merge of two datasets using the update option. I get the warning that there are 100 subjects who have a non missing conflict as described in the manual (https://www.stata.com/manuals13/dmerge.pdf).

merge 1:1 subject_Id using StudyB, update

I would like to somehow know which variables and subjects have the non missing conflicts. How can I perform this?
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

11 Sep 2017, 11:38

when Stata gives you back the result of the merge it is also giving you a new variable (by default name "_merge") and a table; values of this variable equal to 5 the values you want to investigate
Comment
Senor Edward

Join Date: Feb 2016

Posts: 29
#3

11 Sep 2017, 11:56

That is true..
However, my master dataset has close to 100 variables and the using dataset has another 100.
So, you are correct that I will know which subjects fall under this category, but I don't know which variables of that subject cause it.

If it were a small amount of variables, I would easily be able to eyeball it and find the differences. However there are too many variables for me to try to eyeball.
That is what I'm trying to figure out.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#4

11 Sep 2017, 12:13

This kind of problem often arises in my workflow. The way I usually handle it is to go back and -append- the two data sets instead of -merge-ing them. Then I do -duplicates drop-, which eliminates anything that would have been left completely unchanged in the merge. Then I would sort the data on the merge key (subject_id in your case). I would order the variables common to the two data sets first. This way each subject's records in both data sets are lined up right below each other and I can get a grasp of what is going on. To drill down, I then loop over the common variables and check for conflicts. So something like this:

Code:

use dataset1, clear ds local vars1 `r(varlist)' des using dataset2, varlist local vars2 `r(varlist)' local common: list vars1 & vars2 local exclude subject_id local common: list common - exclude gen byte source = 1 append using dataset2 replace source = 2 if missing(source) order subject_id source `common', first foreach c of local common { by subject_id (`c'), sort: egen byte conflict_`c' = max(`c' != `c'[1] & !missing(`c')) }

You can then identify records with conflict on any particular variable v by running -browse if conflict_v-. Note that the code above is tailored to numeric variables. For a string variable conflict_`c' would be defined as max(`c' != `c'[_N] & !missing(`c')).
2 likes
Comment
Maria Boutchkova

Join Date: May 2016

Posts: 103
#5

19 Mar 2019, 12:05

Brilliant! Thank you, Clyde!
Comment

Announcement

Merge update, replace

Comment

Comment

Comment

Comment