I would like to do a 1:1 merge using two lists of 80 schools. There are about 10 variables to merge on, all numeric, such as number of students, % of students who are black, etc. The merge variables do not match perfectly, so it is a fuzzy merge problem.
One possible solution is to find the 1:1 matching that minimizes, across matched pairs, the sum of the Mahalanobis distances between the merging variables. Is there a Stata command that implements this or something similar?
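Below is a minimal sketch of the objective described above. It is written in Python rather than Stata because I don't know of a Stata command for it. First it computes a Mahalanobis distance for every (list A, list B) pair of schools. Then it solves the resulting square assignment problem with the Hungarian algorithm, so the summed distance over all matched pairs is as small as possible.

The sketch makes several assumptions:
- The covariance matrix is pooled over both lists. The question does not say which covariance to use.
- Both lists contain the same number of schools.
- The function names and toy data are hypothetical.

If SciPy is available, `scipy.optimize.linear_sum_assignment` solves the same assignment step.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fails if m is singular)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]
        a[c] = [x / pv for x in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def mahalanobis(x, y, vi):
    """Mahalanobis distance between x and y, given the inverse covariance vi."""
    d = [p - q for p, q in zip(x, y)]
    k = len(d)
    q = sum(d[i] * vi[i][j] * d[j] for i in range(k) for j in range(k))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative round-off


def hungarian(cost):
    """Minimum-cost 1:1 assignment for a square cost matrix.

    Returns match, where match[i] is the column assigned to row i.
    """
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row/column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j]: row (1-based) matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:  # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while j0:  # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def match_schools(list_a, list_b):
    """1:1 match minimizing the summed Mahalanobis distance (pooled covariance)."""
    vi = invert(covariance(list_a + list_b))
    cost = [[mahalanobis(a, b, vi) for b in list_b] for a in list_a]
    match = hungarian(cost)
    total = sum(cost[i][j] for i, j in enumerate(match))
    return match, total


# Hypothetical toy data: [number of students, % black]; list_b is list_a shuffled, with noise.
list_a = [[500, 40.0], [1200, 10.0], [800, 70.0]]
list_b = [[805, 69.0], [498, 41.0], [1195, 11.0]]
match, total = match_schools(list_a, list_b)
print(match)  # [1, 2, 0]: a0 <-> b1, a1 <-> b2, a2 <-> b0
```

With 80 schools per list, the cost matrix is only 80 × 80, which the O(n³) assignment step handles quickly. The covariance matrix is singular when one merging variable is a linear combination of others (for example, percentages by race that sum to 100). In that case `invert` fails, and dropping one redundant variable is a simple workaround.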
The Stata commands that I know for fuzzy merging are designed for different problems and would not work for mine (I think):
-matchit- and -reclink- merge on strings, but I want to match on numeric variables.
-nearmrg- and -rangejoin- match on a numeric variable, but only on one. I want to match on several numeric variables at once.
Note that close numeric matches are not necessarily close string matches, or vice versa. For example, 59 and 60 are similar numbers but not similar strings.
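The small Python check below makes the point concrete. It uses Levenshtein edit distance as a generic stand-in for string similarity. This is an assumption: -matchit- and -reclink- may use different similarity scores, so the check illustrates the mismatch rather than reproducing those commands.

```python
def levenshtein(s, t):
    """Edit distance: minimum insertions, deletions, substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


for x, y in [(59, 60), (59, 509)]:
    print(x, y, "numeric gap:", abs(x - y), "edit distance:", levenshtein(str(x), str(y)))
# 59 60 numeric gap: 1 edit distance: 2
# 59 509 numeric gap: 450 edit distance: 1
```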
Many thanks if you can alert me to a command that I have not found yet.