  • matchIt fuzzy match missing observations

    Dear Statalist,

    I am using matchit to do fuzzy string matching, and I have noticed something odd: either matchit does not attempt a fuzzy match for every observation in the master file, or it does not return matches for all of them. Why does this happen, and how can I get matchit to return results for every observation in the master file? (Is it possible that matchit drops observations that have no match?)

    Here is an illustration of what I notice.

    My master file looks like:
    Code:
    clear
    input str22 Name float ID
    "3BridgesRd" 1       
    "64thSt." 2
    "64thStreet13kV" 3
    end
    I ran the match with this command:
    Code:
    matchit ID Name using "FileForMatch.dta", idu(IDMatch) txtu(NameMatch)

    What I got after the match (and keeping the one with highest similarity) looks like:
    Code:
    clear
    input str22 (Name NameMatch) float(ID IDMatch)     
    "64thSt." "64thSt." 2  2
    "64thSt." "64thStreet13kV" 2 3
    "64thStreet13kV" "64thStreet13kV." 3 3 
    "64thStreet13kV" "64thSt."  3 2
    end
    My code doesn't return a match for "3BridgesRd".

    Thanks.

  • #2
    Note that matchit by default omits all potential matches with a similarity score below the cutoff of 0.5 (which can be changed with the threshold() option), so perhaps "3BridgesRd" had no particularly good matches. You could either lower the cutoff from the default, or experiment with a different similarity method (the similmethod() option) or a different scoring function (the score() option).
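
    As a minimal sketch of lowering the cutoff, assuming the master data from #1 are in memory and that FileForMatch.dta contains IDMatch and NameMatch: the 0.3 cutoff is an arbitrary illustrative value, and similscore is the score variable matchit creates by default. Merging the best matches back onto a saved copy of the master shows which names still have no candidate above the cutoff. Run it as a single do-file so the temporary file stays defined.
    Code:
    * keep a copy of the master, since matchit replaces the data in memory
    tempfile master
    save `master'
    * lower the cutoff from the default 0.5 (0.3 is an arbitrary choice)
    matchit ID Name using "FileForMatch.dta", idu(IDMatch) txtu(NameMatch) threshold(0.3)
    * keep the highest-scoring candidate for each master ID
    bysort ID (similscore): keep if _n == _N
    * bring back master observations with no candidate above the cutoff
    merge 1:1 ID using `master'
    list ID Name NameMatch similscore if _merge == 2
    Observations listed with _merge == 2 are master names for which matchit returned nothing at that cutoff.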
    Last edited by William Lisowski; 07 Sep 2022, 13:56.
