Dear Statalist,
Often times I have to perform the merge between a 10% subsample of my master file and the master file. We tend to verify 10% of our master data as a way to check whether our data entry were done accurately. As a result, I often have to merge the 10% sample back to the master file if anything were changed/updated in the 10% sample file. I used the update and replace option for the merge command to do this conveniently. Although Stata will produce a nice output of how many case have missing or non-missing values updated (see example below), I was wondering if there is any way that Stata can tell me which case (this can be done by listing the studyid for those who had _merge==5 or _merge==4, but if there are two changes made to a single case then stata will only tell you that changes made to that case but not how many changes were made to the case) and what value were updated (this is the part I needed the most help). What I did previously was creating two datasets (one with updated data and the other was the master file) and added an extension to all variable names in the subsample dataset and merge them all together. By comparing the same set of variables using the foreach command and list with "if" option, I was able to tell which case has what values updated however this is somehow time consuming. I am wondering if there is any other convenient way of doing this in Stata. Any advice/feedback on this would be greatly appreciated.
Thanks
Sam
Result # of obs.
-----------------------------------------
not matched 294
from master 294 (_merge==1)
from using 0 (_merge==2)
matched 33
not updated 26 (_merge==3)
missing updated 0 (_merge==4)
nonmissing conflict 7 (_merge==5)
-----------------------------------------
Often times I have to perform the merge between a 10% subsample of my master file and the master file. We tend to verify 10% of our master data as a way to check whether our data entry were done accurately. As a result, I often have to merge the 10% sample back to the master file if anything were changed/updated in the 10% sample file. I used the update and replace option for the merge command to do this conveniently. Although Stata will produce a nice output of how many case have missing or non-missing values updated (see example below), I was wondering if there is any way that Stata can tell me which case (this can be done by listing the studyid for those who had _merge==5 or _merge==4, but if there are two changes made to a single case then stata will only tell you that changes made to that case but not how many changes were made to the case) and what value were updated (this is the part I needed the most help). What I did previously was creating two datasets (one with updated data and the other was the master file) and added an extension to all variable names in the subsample dataset and merge them all together. By comparing the same set of variables using the foreach command and list with "if" option, I was able to tell which case has what values updated however this is somehow time consuming. I am wondering if there is any other convenient way of doing this in Stata. Any advice/feedback on this would be greatly appreciated.
Thanks
Sam
Result # of obs.
-----------------------------------------
not matched 294
from master 294 (_merge==1)
from using 0 (_merge==2)
matched 33
not updated 26 (_merge==3)
missing updated 0 (_merge==4)
nonmissing conflict 7 (_merge==5)
-----------------------------------------
Comment