  • Calculate the "similscore" of two variables in the same dataset

    Dear All,

    The fuzzy-matching package "matchit" can create a similarity score ("similscore") for two matched string variables. I now have two string variables in the same dataset, and I want to calculate the similscore between them. Is there a function in STATA that does this? For example, I have collected firm names from two different sources, and I want to confirm that the two names refer to the same firm. So I need a similarity measure between the two firm names. Thank you!

  • #2
    matchit is from SSC (FAQ Advice #12). Also see https://www.statalist.org/forums/help#spelling on spelling Stata. You likely need to separate the variables into two datasets and form pairwise combinations of the datasets if the strings are not paired. Something similar at: https://www.statalist.org/forums/for...-same-variable. If you cannot make sense of the example in the linked thread, provide a data example for any code suggestions. You may, e.g., copy and paste the output of

    Code:
    dataex var1 var2
    where you replace "var1" and "var2" with the names of your variables.
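
    A minimal sketch of the unpaired case described above, assuming the two name lists start out in separate datasets. The names are hypothetical values borrowed from this thread, the tempfile name "second" is made up for illustration, and the built-in cross command is used here to form every pairwise combination before scoring. Run it as a single do-file so the temporary file stays in scope.

    Code:
    * Assumes matchit is installed from SSC: ssc install matchit
    * (its help file may also ask for freqindex: ssc install freqindex)

    * Second list of names, saved to a temporary file
    clear
    input str20 Name2
    "AAP Corporation"
    "ABP Corp"
    "Universe Inc."
    end
    tempfile second
    save `second'

    * First list of names
    clear
    input str20 Name1
    "AAP Corp"
    "Univ Inc."
    end

    * Form all pairwise combinations (2 x 3 = 6 rows), then score each pair
    cross using `second'
    matchit Name1 Name2

    Because cross produces one row per combination, the row count grows as the product of the two list lengths, so this is only practical for modest lists.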



    • #3
      In reply to #2 (Andrew Musau):
      Thank you, Andrew! Apologies for misspelling Stata. My data look like the following:

      Name1        Name2
      AAP Corp     AAP Corporation
      AAP Corp     ABP Corp
      Univ Inc.    AAP Corp
      Univ Inc.    Universe Inc.

      I want to calculate a similarity score between the two variables Name1 and Name2. Thanks!



      • #4
        If the entries are paired, isn't it simply

        Code:
        matchit Name1 Name2
        ?



        • #5
          In reply to #4 (Andrew Musau):
          Yes! That totally worked, thank you! I didn't know that I could use matchit to calculate the similscore directly; I thought it was only for fuzzy matching two datasets. Thanks again!
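
          For reference, a minimal sketch of the paired case, assuming the four rows posted in #3 are typed in by hand. The str20 widths are an arbitrary choice, and the score variable is assumed to take matchit's default name, similscore.

          Code:
          * Assumes matchit is installed from SSC: ssc install matchit

          * Reconstruction of the example rows from #3
          clear
          input str20 Name1 str20 Name2
          "AAP Corp"  "AAP Corporation"
          "AAP Corp"  "ABP Corp"
          "Univ Inc." "AAP Corp"
          "Univ Inc." "Universe Inc."
          end

          * Two-column mode: scores each row's pair of strings
          matchit Name1 Name2

          * Inspect the pairs, most similar first
          gsort -similscore
          list Name1 Name2 similscore, noobs

          The scores depend on the similarity method matchit uses, so any cutoff for "same firm" should be checked against a sample of known matches.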

