Hi all,
I have a set of 300,000 company names that I would like to standardize to the most common names. I know some of the most common company names and have been able to code a new standardized-name variable for them (using regexm), but I am not sure which other names are common in the set of 300,000.
Is there a command (user-written or not) that examines strings for common sub-strings, similar to what a Word Cloud would do?
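To illustrate what I am after (this is a hypothetical Python sketch, not Stata, and the function name `common_tokens` and the sample names are made up for the example): split each name into words, ignoring case and punctuation, and count how often each word occurs across all names.

```python
from collections import Counter
import re


def common_tokens(names, top_n=10, min_len=2):
    """Return the top_n most frequent word tokens across all names.

    Names are upper-cased and split on any run of characters that is
    not a letter or digit; tokens shorter than min_len are dropped.
    """
    counts = Counter()
    for name in names:
        tokens = re.findall(r"[A-Z0-9]+", name.upper())
        counts.update(t for t in tokens if len(t) >= min_len)
    return counts.most_common(top_n)


# Made-up example names, not taken from the real data set.
names = [
    "Acme Solar Inc",
    "ACME SOLAR INC.",
    "Acme Solar Incorporated",
    "Blue Sky Energy LLC",
    "Blue Sky Energy",
]

for token, n in common_tokens(names, top_n=5):
    print(token, n)
```

The most frequent tokens would then be candidates for new standardized names to code with regexm.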
Thanks, in advance,
Ben Hoen
LBNL