  • Similarity Index

    Dear All,

    I have to create a similarity index among the names of products belonging to the same firm. For each product I have a string holding the product's name. Of course, by using "bys firm_name product_name: gen N = _N" I know how many products of a firm have exactly the same name, but I need something more advanced: for instance, something able to group products with similar names, such as "speedy", "super speedy", and "speedy 3000". I thought of the "substr()" function, but I couldn't build anything useful from it.
    Could someone please help me?

    Thanks in advance
    Luca

  • #2
    Luca Gallorini
    You might be interested in a package I put together a while ago for working with string data. It contains several functions for phonetic string encoding and similarity/dissimilarity metrics:
    https://github.com/wbuchanan/StataStringUtilities

    I still need to set up a decent installer for one of the dependencies (https://GitHub.com/wbuchanan/StataJavaUtilities), but if you are interested, I can explain the current installation process, which takes a bit more work.
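
    As a rough illustration of what a name-similarity metric can look like, here is a minimal sketch. It is written in Python purely for illustration; it is not the package's API and not Stata code. The function names (`jaccard`, `group_similar`), the word-overlap (Jaccard) measure, and the 0.3 threshold are all assumptions chosen for this example, and a real application would need to tune them.

```python
def tokens(name):
    """Split a product name into a set of lowercase words."""
    return set(name.lower().split())


def jaccard(a, b):
    """Word-level Jaccard similarity: shared words / all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty names are treated as identical
    return len(ta & tb) / len(ta | tb)


def group_similar(names, threshold=0.3):
    """Greedy single-link grouping of names within one firm.

    A name joins (and merges) every existing group that contains at
    least one member whose similarity to it reaches the threshold.
    The threshold value is an illustrative assumption.
    """
    groups = []
    for name in names:
        matches = [g for g in groups
                   if any(jaccard(name, m) >= threshold for m in g)]
        if matches:
            merged = [m for g in matches for m in g] + [name]
            # Keep the groups that were not matched, then add the merged one.
            groups = [g for g in groups
                      if all(g is not m for m in matches)]
            groups.append(merged)
        else:
            groups.append([name])
    return groups


# Hypothetical product names for a single firm.
products = ["speedy", "super speedy", "speedy 3000", "turbo"]
for group in group_similar(products):
    print(group)
```

    With these example names, "speedy", "super speedy", and "speedy 3000" end up in one group because each shares the word "speedy" with another member, while "turbo" stays on its own. A word-overlap measure like this ignores misspellings; edit-distance or phonetic metrics would be the usual choice for those.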
