  • How to Identify common elements (words) between two string variables

    Dear all,

    I have a question about how to identify common elements (words) between two string variables. My variables of interest are firm names, as follows.

    var1
    "AKER CLEAN CARBON AS"
    "AKER CLEAN CARBON AS"
    "AKER KVAERNER SUBSEA"
    "KVAERNER ASA"

    var2
    "AKER CLEAN CARBON"
    "BADE OTTO MORTEN"
    "AKER KVAERNER SUBSEA"
    "KVAERNER ASA, LYSAKER"

    As you can see, var1 and var2 share some words in every row except the second. I'd like to create var3 like this:

    var3
    "AKER CLEAN CARBON"
    " "
    "AKER KVAERNER SUBSEA"
    "KVAERNER ASA"

    In short, var3 should contain the words that var1 and var2 have in common. How can I do this?
    Thank you in advance.

  • #2
    If you store each pair of firm names as locals, you can calculate their intersection.
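
    As a minimal illustration on a single pair (the macro names a, b, and common are just placeholders), the macro list operator & keeps the space-separated words of the first list that also appear in the second, so AS should drop out here:

    Code:
    local a "AKER CLEAN CARBON AS"
    local b "AKER CLEAN CARBON"
    /* Intersection of the two word lists */
    local common : list a & b
    display "`common'"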

    However, punctuation will give you some trouble with this approach (since, for example, "ASA," is different from "ASA"), so I would get rid of any punctuation characters first.

    Code:
    clear
    
    input str20 var1 str30 var2
    "AKER CLEAN CARBON AS" "AKER CLEAN CARBON"
    "AKER CLEAN CARBON AS" "BADE OTTO MORTEN"
    "AKER KVAERNER SUBSEA" "AKER KVAERNER SUBSEA"
    "KVAERNER ASA" "KVAERNER ASA, LYSAKER"
    end
    
    /* Get a list of characters to see what needs to be removed */
    charlist var1
    charlist var2
    
    /* Purge the commas */
    replace var1 = subinstr(var1,",","",.)
    replace var2 = subinstr(var2,",","",.)
    
    gen var3 = ""
    
    /* Row by row: copy each name into a local, then intersect the two word lists */
    local N = _N
    forvalues i = 1/`N' {
        local s1 = var1[`i']
        local s2 = var2[`i']
        local intersection: list s1 & s2
        replace var3 = "`intersection'" in `i'
    }
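
    If the names contain punctuation other than commas, one option is to strip everything that is not a letter, digit, or space before running the loop. This is only a sketch: it assumes Stata 14 or later (for ustrregexra()), and the character class would need adjusting if the names contain non-ASCII letters such as Æ or Ø.

    Code:
    /* Drop every character except ASCII letters, digits, and spaces */
    replace var1 = ustrregexra(var1, "[^A-Za-z0-9 ]", "")
    replace var2 = ustrregexra(var2, "[^A-Za-z0-9 ]", "")

    /* Tidy up any leading, trailing, or repeated blanks left behind */
    replace var1 = stritrim(strtrim(var1))
    replace var2 = stritrim(strtrim(var2))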



    • #3
      Dimitriy's interesting approach requires that you install charlist (SSC) first:

      Code:
      ssc install charlist



      • #4
        Thank you Dimitriy!

        Your suggestion works perfectly.
