
  • update option to merge command: Why does it reduce the number of matches?

    I am merging a dataset with ~2.2 million students to a dataset with ~5,900 schools. I am merging on a school ID (called idschool) and the year in which data were collected.

    If I run this command, I get 1.9 million matches:
    merge m:1 year idschool using schools, nogenerate keep(match)

    But if I run the same command with the update option --
    merge m:1 year idschool using schools, nogenerate keep(match) update
    -- I only get ~1.7 million matches.

    I don't get it. Why would the update option affect the number of matches?

    Many thanks,
    Paul

  • #2
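    The problem is with your -keep(match)- option. When you use -update-, the observations that agree on the merge key variables (idschool and year) are no longer all given _merge == 3. That code is now reserved for matches in which nothing needed updating. Matched observations in which a missing value in the master data was filled in from the using data get _merge == 4 (match_update), and those in which the two datasets hold different nonmissing values for a shared variable get _merge == 5 (match_conflict). When you specify -keep(match)- you are telling Stata to keep only the observations with _merge == 3, so the _merge == 4 and _merge == 5 observations are dropped. That is why you end up with fewer.

    I think you don't really want the -keep(match)- option here. I think you really want -keep(match match_update)- instead, or -keep(match match_update match_conflict)- if some matched observations also have conflicting nonmissing values.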
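
    To see the recoding for yourself, here is a minimal sketch on toy data. Everything except idschool and year is made up for illustration: the variables idstudent and enroll, and all of the values, are hypothetical and not from Paul's datasets. The three students are set up so that one agrees with the school file, one has a missing value that can be filled in, and one holds a conflicting nonmissing value.

```stata
* Run as a single do-file so the tempfile local macro stays in scope.
* idstudent, enroll, and all values below are hypothetical.

* Using data: one row per school-year
clear
input idschool year enroll
1 2010 500
2 2010 250
3 2010 300
end
tempfile schools
save `schools'

* Master data: one row per student; enroll is shared with the using data
clear
input idstudent idschool year enroll
101 1 2010 500
102 2 2010 .
103 3 2010 999
end

* No keep() here, so every matched row survives and _merge shows its category
merge m:1 year idschool using `schools', update
tabulate _merge

* Keep all matched rows, whichever of the three matched codes they received
keep if inlist(_merge, 3, 4, 5)
count
```

    Under the documented behavior of -update-, the three students should land in three different matched categories, and only the first would survive -keep(match)-. In the real merge the same filter can be written as an option: merge m:1 year idschool using schools, update nogenerate keep(match match_update match_conflict).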
