update option to merge command: Why does it reduce the number of matches?

paulvonhippel

Join Date: Apr 2014

Posts: 502
#1

07 Mar 2018, 06:01

I am merging a dataset with ~2.2 million students to a dataset with ~5,900 schools. I am merging on a school ID (called idschool) and the year in which data were collected.

If I run this command, I get 1.9 million matches:
merge m:1 year idschool using schools, nogenerate keep(match)

But if I run the same command with the update option --
merge m:1 year idschool using schools, nogenerate keep(match) update
-- I only get ~1.7 million matches.

I don't get it. Why would the update option affect the number of matches?

Many thanks,
Paul
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

07 Mar 2018, 10:27

The problem is with your -keep(match)- option. When you use -update-, the observations that agree on the merge key variables (idschool and year) are no longer all given _merge == 3. That designation is used only for those that matched with no updating. Those observations that resulted in an update get _merge == 4. When you specify -keep(match)- you are telling Stata to keep only those observations with _merge == 3. The _merge == 4 observations are therefore dropped. That is why you have fewer.

I think you don't really want the -keep(match)- option here. I think you really want -keep(match match_update)- instead.
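
One way to check this on your own data is to run the merge with -update- but without -keep()- or -nogenerate-, and tabulate _merge before dropping anything. The sketch below reuses the variable and file names from the original post and has not been run against those files. It also assumes you want to retain matched observations where both datasets hold different nonmissing values: with -update-, Stata codes those as _merge == 5 (match_conflict), so they too would be lost under -keep(match match_update)-.

```stata
* Merge with -update-, keeping every observation and the _merge variable
merge m:1 year idschool using schools, update

* With -update-, matched observations are split across three codes:
*   3 = match           (matched, nothing updated)
*   4 = match_update    (matched, missing values in master filled from using)
*   5 = match_conflict  (matched, nonmissing values disagree)
tabulate _merge

* Keep all matched observations, whatever their update status
keep if inlist(_merge, 3, 4, 5)
drop _merge
```

If the tabulation behaves as described, codes 3, 4 and 5 together should add up to the ~1.9 million matches obtained without -update-. The one-line equivalent would be -keep(match match_update match_conflict)- together with -nogenerate-.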