I have two datasets containing identifiers of some of the same individuals.
df1 is more complete and has three variables: (1) the name of the individual, (2) a regional registration id, (3) the region code where the id was registered.
df2 only informs (1) and (2) (i.e., name and regional registration id, but not the region code).
Individuals are uniquely identified by region code and regional registration id. Different regions may inform the same id number. Besides, the same individual always has the same regional registration id across the two datasets, but may not have the exact same spelling of their name across these datasets.
I would like to add the region code to df2 by matching each one of its rows to df1 using as key variables the exact regional registration id as well as the closest name between the datasets (the one that has the most similar spelling).
Consider the simple example below:
My desired output is:
+------------------------------------+
| name region~d region~e |
|------------------------------------|
1. | flaviasilva 1 A |
2. | filipesantos 2 B |
+------------------------------------+
df1 is more complete and has three variables: (1) the name of the individual, (2) a regional registration id, (3) the region code where the id was registered.
df2 only informs (1) and (2) (i.e., name and regional registration id, but not the region code).
Individuals are uniquely identified by region code and regional registration id. Different regions may inform the same id number. Besides, the same individual always has the same regional registration id across the two datasets, but may not have the exact same spelling of their name across these datasets.
I would like to add the region code to df2 by matching each one of its rows to df1 using as key variables the exact regional registration id as well as the closest name between the datasets (the one that has the most similar spelling).
Consider the simple example below:
Code:
clear input str30 name regional_registration_id str1 region_code "flaviasillva" 1 "A" "rodrigomaia" 1 "B" "mariaconceicao" 1 , "C" "felipesantos" 2 "B" "mateussauro" 2 "D" "flaviasilva" 3 "F" end save "df1", replace clear input str30 name regional_registration_id str1 region_code "flaviasilva" 1 "A" "filipesantos" 2 "B" end save "df2", replace
+------------------------------------+
| name region~d region~e |
|------------------------------------|
1. | flaviasilva 1 A |
2. | filipesantos 2 B |
+------------------------------------+

Comment