Dear all,
I am trying to compare a string variable with several others for similarity:
The goal is to compare the variable investor_name with the company names listed in the variables firm1 to firm3. If investor_name matches one of those names, the investor name is considered correct.
As you can see, one difficulty is that the strings are not always identical, e.g. "Blue Ocean Partners LLC" vs. "Blue Ocean".
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte person_id str32 investor_name str13 firm1 str19 firm2 str6 firm3
1 "Blue Ocean Partners LLC"          "Blue Ocean"    "Goldman Sachs"       ""
2 "Goldman Sachs"                    "Goldman"       "Breakthrough Energy" ""
3 "JP Morgan"                        "Deutsche Bank" ""                    ""
4 "Kleiner Perkins Caufield & Byers" ""              "Kleiner Perkins"     "Google"
end
One approach I thought of was to run matchit three times, once for each firm variable, and then keep the comparison with the highest similarity score. Do you have any other suggestions? Many thanks in advance!
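A minimal sketch of that loop, assuming the user-written -matchit- command from SSC is installed and that its two-column syntax (matchit varname1 varname2) accepts a generate() option naming the score variable; both are worth confirming in -help matchit-. The variable names score1-score3, best_score and best_firm are hypothetical choices for illustration.

```stata
* Sketch only: assumes -matchit- from SSC is installed
* (ssc install matchit; it may also need: ssc install freqindex)
clear
input byte person_id str32 investor_name str13 firm1 str19 firm2 str6 firm3
1 "Blue Ocean Partners LLC"          "Blue Ocean"    "Goldman Sachs"       ""
2 "Goldman Sachs"                    "Goldman"       "Breakthrough Energy" ""
3 "JP Morgan"                        "Deutsche Bank" ""                    ""
4 "Kleiner Perkins Caufield & Byers" ""              "Kleiner Perkins"     "Google"
end

* Two-column mode: score investor_name against each firm variable, row by row
* (score1-score3 are hypothetical names passed to generate())
forvalues j = 1/3 {
    matchit investor_name firm`j', generate(score`j')
}

* Highest score across the three comparisons; rowmax() ignores missing values.
* Stored as double so the equality test below compares exact values.
egen double best_score = rowmax(score1 score2 score3)

* Keep the firm name that produced the best score (first one wins on ties)
gen best_firm = ""
forvalues j = 1/3 {
    replace best_firm = firm`j' if score`j' == best_score & best_firm == ""
}

list person_id investor_name best_firm best_score, noobs
```

The best score on its own does not decide anything: a cut-off still has to be chosen, ideally by checking a sample of hand-coded pairs.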

Plus, you would then still have to decide whether "Blue Ocean Partners LLC" is the same investor as "Blue Ocean".
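A similarity score alone does not settle that question, so one rule-based alternative (a swapped-in technique, not -matchit-) is to normalize the names and flag a match when one normalized name appears as a whole-word substring of the other. This sketch uses only built-in string functions (Stata 14 or later); the suffix list and the variable names norm_* and name_ok are hypothetical and would need tuning to the real data.

```stata
* Sketch only: rule-based check with built-in string functions (Stata 14+)
clear
input byte person_id str32 investor_name str13 firm1 str19 firm2 str6 firm3
1 "Blue Ocean Partners LLC"          "Blue Ocean"    "Goldman Sachs"       ""
2 "Goldman Sachs"                    "Goldman"       "Breakthrough Energy" ""
3 "JP Morgan"                        "Deutsche Bank" ""                    ""
4 "Kleiner Perkins Caufield & Byers" ""              "Kleiner Perkins"     "Google"
end

* Normalize: lower case, punctuation to blanks, drop a hypothetical list of
* legal suffixes, then pad with exactly one blank on each side so that
* strpos() below only matches whole words
foreach v in investor_name firm1 firm2 firm3 {
    gen norm_`v' = " " + ustrlower(`v') + " "
    foreach p in "&" "." "," {
        quietly replace norm_`v' = subinstr(norm_`v', "`p'", " ", .)
    }
    foreach s in llc inc ltd corp lp {
        quietly replace norm_`v' = subinstr(norm_`v', " `s' ", " ", .)
    }
    quietly replace norm_`v' = " " + strtrim(stritrim(norm_`v')) + " "
}

* Flag a match if a non-empty firm name is contained in the investor name,
* or the investor name is contained in the firm name
gen byte name_ok = 0
forvalues j = 1/3 {
    replace name_ok = 1 if strtrim(norm_firm`j') != "" & ///
        (strpos(norm_investor_name, norm_firm`j') > 0 | ///
         strpos(norm_firm`j', norm_investor_name) > 0)
}

list person_id investor_name name_ok, noobs
```

On the example data this flags persons 1, 2 and 4 and leaves JP Morgan (person 3) unmatched. Containment is strict in one way (no typos allowed) and loose in another (a very short or generic firm name can match many investors), so it may work best alongside a fuzzy score rather than instead of it.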