  • Merging two files on multiple criteria and imperfect matching?

    Hi, I have two data files I want to merge. The problem is that the files don't share an ID column. Instead, I want to merge based on other criteria, such as matching email addresses.
    How could I merge using the following rules?
    1. If the email addresses match, the records match.
    2. Otherwise, if both the name and the date of birth match, the records match.

    Also, how could I account for differences in names such as "David" and "Dave"?
    Would the merge command work in this situation?
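One common way to handle "David" versus "Dave" is to map nicknames to a canonical first name before comparing names. The Python sketch below uses a small, hypothetical, hand-built lookup table; a real project would need a much larger nickname list, and ambiguous short forms (for example "Chris") need extra care.

```python
# Hypothetical nickname table for illustration only; real projects need a far larger list.
NICKNAMES = {
    "dave": "david",
    "davey": "david",
    "bill": "william",
    "will": "william",
    "liz": "elizabeth",
    "beth": "elizabeth",
}


def canonical_name(full_name):
    """Lowercase the name, collapse extra spaces, and replace a known nickname
    in the first position with its canonical form."""
    parts = full_name.lower().split()
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


print(canonical_name("Dave Smith") == canonical_name("David  Smith"))  # True
```

Applying a function like this to both files before the name and date-of-birth comparison lets "Dave Smith" and "David Smith" count as an exact match.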

  • #2
    There are various community-contributed programs for this purpose. One that comes to mind is -reclink- (ssc describe reclink).
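Fuzzy-matching tools of this kind generally score how similar two strings are instead of requiring an exact match, and then keep pairs that score above some cutoff. As a rough illustration of that idea only (this is not -reclink-'s scoring method), the Python sketch below uses the standard library's `difflib.SequenceMatcher` ratio as the similarity score; the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """0-1 similarity between two strings after lowercasing and trimming."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_fuzzy_match(query, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar candidate, or None
    if no candidate reaches the (arbitrary, assumed) threshold."""
    if not candidates:
        return None
    score, best = max((similarity(query, c), c) for c in candidates)
    return (best, score) if score >= threshold else None


print(best_fuzzy_match("David Smith", ["Dave Smith", "Donna Small"]))
```

Here "Dave Smith" is returned with a score of about 0.86, while "Donna Small" scores only about 0.45 and would be rejected on its own.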



    • #3
      The -merge- command could be called multiple times, once for each exact-match scenario. Other user-written Stata programs to consider are -reclink2-, -matchit-, and -dtalink-. If the datasets are large (say, over 100,000 observations), I usually prefer the R package -fastLink-, which you can call from Stata. How large are the two datasets? How important to you are missed matches versus false matches?
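The trade-off between missed matches and false matches mostly comes down to where the similarity cutoff is set. The Python sketch below scores a few made-up name pairs with known true/false labels using a simple character-bigram Jaccard score (an assumption for illustration, not necessarily what -matchit- or -reclink2- compute by default) and counts both kinds of error at several thresholds.

```python
def bigrams(text):
    """Set of adjacent character pairs, ignoring case and spaces."""
    s = text.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Share of bigrams the two strings have in common (0-1)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


# Made-up labelled pairs: (name in file 1, name in file 2, truly the same person?)
PAIRS = [
    ("David Smith", "Dave Smith", True),
    ("Jon Smith", "John Smith", True),
    ("Ann Lee", "Anne Lee", True),
    ("Dan Smith", "Don Smith", False),
    ("Mary Jones", "Mark Jones", False),
]


def error_counts(pairs, threshold):
    """Return (false_matches, missed_matches) when pairs scoring >= threshold are accepted."""
    false_matches = missed = 0
    for a, b, truth in pairs:
        accepted = jaccard(a, b) >= threshold
        if accepted and not truth:
            false_matches += 1
        elif truth and not accepted:
            missed += 1
    return false_matches, missed


for t in (0.5, 0.58, 0.7):
    print(t, error_counts(PAIRS, t))
```

With these toy pairs, a cutoff of 0.5 gives 2 false matches and 0 missed, 0.58 gives 1 and 2, and 0.7 gives 0 and 3: raising the cutoff trades false matches for missed ones. Note also that "Dan Smith"/"Don Smith" outscores "David Smith"/"Dave Smith", which is one reason name similarity is often combined with other fields such as date of birth.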



      • #4
        On this topic, I have seen different descriptions comparing -matchit- and -reclink2-, and I saw one presentation on -dtalink-.
        Some of the materials are pretty complex, but could you give even a brief overview of when you would use one of these rather than another, and which is better suited to which kind of task?
        I know an R package was recommended for large datasets, but since I am more comfortable working in Stata, would it still be OK to use one of these programs for similar fuzzy matching on large datasets (~500,000 observations)?

        Thanks!
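If both files have roughly 500,000 observations, comparing every possible pair would mean 500,000 × 500,000 = 2.5 × 10^11 comparisons, so fuzzy matching at that scale typically relies on blocking: only records that agree on some key are compared. A minimal Python sketch of the idea, assuming a hypothetical exact date-of-birth blocking key:

```python
from collections import defaultdict


def candidate_pairs(left, right, block_key):
    """Yield only the (left, right) record pairs that share a blocking key,
    instead of all len(left) * len(right) combinations."""
    index = defaultdict(list)
    for rrec in right:
        index[block_key(rrec)].append(rrec)
    for lrec in left:
        for rrec in index.get(block_key(lrec), []):
            yield lrec, rrec


# Hypothetical records; the blocking key here is exact date of birth (an assumption).
left = [{"id": 1, "dob": "1980-01-02"}, {"id": 2, "dob": "1975-05-06"},
        {"id": 3, "dob": "1990-09-09"}, {"id": 4, "dob": "1980-01-02"}]
right = [{"id": "a", "dob": "1980-01-02"}, {"id": "b", "dob": "1975-05-06"},
         {"id": "c", "dob": "2001-12-12"}, {"id": "d", "dob": "1980-01-02"}]

pairs = list(candidate_pairs(left, right, lambda rec: rec["dob"]))
print(len(pairs), "of", len(left) * len(right))  # 5 of 16
```

The fuzzy name comparison would then run only on these candidate pairs. The catch is that a true match with a typo in the blocking field lands in a different block and is never compared; running several passes with different blocking keys is a common way to reduce that risk.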
