
  • Merge with different string names

    Dear All,

    I would like to merge two datasets on company name. However, the company names are not consistent across the two datasets.

    For example, a company is called "1-800-FLOWERS.COM" in one dataset and "1-800-FLOWERS.COM INC" in the other. In other cases, the names of the same company differ in more complicated ways.

    I know most of us use numerical variables such as gvkey or id to merge two datasets.

    However, when merging on string variables, is there a way to solve the above-mentioned problem in Stata?

    Thank you so much.

    Best Regards,

    Chaoqun

  • #2
    Michael Blasnik's program -reclink-, available from SSC, may be helpful here.

    Still, there is no way that Stata is going to know that 1-800-FLOWERS.COM is the same firm as 1-800-FLOWERS.COM INC unless you explicitly tell it that. If -reclink- does not accomplish what you need, you may have to simply create a crosswalk between the names that appear in your data sets and a standardized version of the names that you create, merge each data set with the crosswalk, and then use the standardized name as the key to merge the two data sets themselves.
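
    The crosswalk approach could look roughly like the sketch below. All file and variable names are hypothetical: a.dta and b.dta are the two data sets, each holding the company name in rawname, and crosswalk.dta is a file you build by hand with one row per spelling (rawname) and the standardized name you assign to it (stdname). The sketch also assumes one observation per company in each data set; with panel data the final merge would need a different key or merge type.

    ```stata
    * Hypothetical names throughout: a.dta, b.dta, crosswalk.dta, rawname, stdname.
    * crosswalk.dta must be unique on rawname, and its spellings must match
    * the data exactly (including case and any stray spaces).

    * 1. Attach the standardized name to the first data set.
    use a, clear
    merge m:1 rawname using crosswalk, keep(match master) nogenerate
    list rawname if missing(stdname)    // spellings the crosswalk does not cover yet
    drop if missing(stdname)
    rename rawname name_a
    isid stdname                        // stops with an error if stdname is not unique
    save a_std, replace

    * 2. Do the same for the second data set.
    use b, clear
    merge m:1 rawname using crosswalk, keep(match master) nogenerate
    list rawname if missing(stdname)
    drop if missing(stdname)
    rename rawname name_b
    isid stdname
    save b_std, replace

    * 3. Merge the two data sets on the standardized name.
    use a_std, clear
    merge 1:1 stdname using b_std
    tabulate _merge
    ```

    Any spelling listed in steps 1 or 2 needs a row added to the crosswalk before it can be linked; the final tabulate shows how many companies were found in both data sets.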



    • #3
      This kind of question is one of the most frequent here. Searching the archives using keywords such as reclink would turn up similar material.



      • #4
        Thank you so much for all of your answers. They are really helpful.



        • #5
          This might be too late, and it is a little bit of self-advertising, but people facing this kind of problem might be interested in this ado command I wrote:
          Dear all, Let me share with you matchit which is an ado command I have just written. In a nutshell, matchit provides a similarity score between two different text
