  • How to do case-insensitive matching of strings

    I am using the command reclink to match company names in two different datasets.

    For this purpose, "Apple" and "APPLE" are the same. "Ltd" and "ltd" are the same.

    But reclink considers them different.

    Using the proper() function to turn "APPLE" into "Apple" does not fully solve this, because some company names genuinely contain consecutive capital letters (e.g. ABC). Those should not be turned into Abc.

    Can I force reclink to be case insensitive?

    Or is there any other command that does what reclink does but is case insensitive?

    Here is a do-file that tests the case sensitivity of reclink.

    Code:
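    * reclink is user-written; install once with: ssc install reclink

    * Using dataset: 9 firms, with "MICROSOFT" in upper case
    clear
    set obs 9
    gen company_name = "Apple"
    gen company_number = _n
    replace company_name = "MICROSOFT" if _n==2 | _n==9
    replace company_name = "Facebook" if _n==1 | _n==4
    save "usingdata", replace

    * Master dataset: 10 firms, with "Microsoft" in proper case
    clear
    set obs 10
    gen company_name = "Apple"
    gen idmaster = _n
    replace company_name = "Microsoft" if _n==3 | _n==6

    * Fuzzy match on company_name, to check how differences in case are treated
    reclink company_name using "usingdata", idmaster(idmaster) idusing(company_number) gen(match_score)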

  • #2
    There is nothing you can do to make -reclink- itself case insensitive. What you can do instead is create, in each of the data sets, a new variable that holds company_name in all lower case, and have -reclink- match on those lower-case variables. This gives you case-insensitive linkage between the data sets, while the original variables keep their original capitalization.
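
    Applied to the test do-file from #1, that could look roughly like the sketch below. The names company_name_lc, company_name_using and "usingdata_lc" are made up for this illustration. Renaming company_name in the using data is only a precaution, so that the using spelling does not clash with the master variable of the same name after linking.

    Code:
    * Using data: add a lower-case copy of the name (hypothetical variable names)
    use "usingdata", clear
    gen company_name_lc = lower(company_name)
    rename company_name company_name_using
    save "usingdata_lc", replace

    * Master data: same lower-case key, original spelling left untouched
    clear
    set obs 10
    gen company_name = "Apple"
    gen idmaster = _n
    replace company_name = "Microsoft" if _n==3 | _n==6
    gen company_name_lc = lower(company_name)

    * Link on the lower-case key instead of the original variable
    reclink company_name_lc using "usingdata_lc", idmaster(idmaster) idusing(company_number) gen(match_score)

    With this, "MICROSOFT" and "Microsoft" are both compared as "microsoft", while a name like "ABC" keeps its capitals in company_name. If the names may contain non-ASCII characters, ustrlower() (Stata 14 and later) is probably the safer choice than lower().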
