  • Find if a string is a substring of another

    Hi everybody,

    I have a string variable called 'Name' that contains firm names; each firm can appear once or several times in the dataset.
    The names were entered manually, so the same firm may be recorded in similar but slightly different ways. What I am trying to do is unify and harmonize all the names that correspond to a single firm.
    It would be extremely helpful to have code that recognizes when one string is a substring of another and, in that case, replaces the longer name with the shorter name nested inside it.

    For example, if I have

    Name
    Zara
    Zara
    Zara
    Zara Espana
    Zara Home

    I would like code that replaces the entries 'Zara Espana' and 'Zara Home' with the shorter version 'Zara'.
    Is there a way to do that?

    Thanks in advance!
    Arianna

  • #2
    Arianna, if Name consistently has the structure "short name" + a blank space + "redundant strings", the code below extracts the short name from the original string.

    Code:
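    split Name, limit(1)   // Name1 holds the text before the first blank space
    replace Name = Name1   // overwrite Name with the short name

    If Name is arbitrarily structured, the rule becomes ambiguous. For example, if the last observation is "Home Zara" and there is another brand called "Home", Stata cannot tell whether "Home Zara" belongs to "Zara" or to "Home".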
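    For names without that fixed structure, the substring rule from the question can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested or official routine. It assumes a string variable `Name`, treats every distinct non-empty name as a candidate short name, and maps each name to the shortest candidate it contains (strpos() > 0). When two equally short candidates match, as in the "Home Zara" case, the name is left unchanged and flagged. The tempfile names and the `ambiguous` flag are illustrative choices. strpos() compares raw characters and is case-sensitive, so a short name can also match inside a longer, unrelated name; review the result before relying on it. Run it as a single do-file so the tempfile macros stay defined.

```stata
* Example data from the question
clear
input str20 Name
"Zara"
"Zara"
"Zara"
"Zara Espana"
"Zara Home"
end
tempfile original candidates crosswalk
save `original'

* 1. Distinct non-empty names serve as candidate short names
keep Name
duplicates drop
drop if Name == ""
rename Name short
save `candidates'

* 2. Pair every distinct name with every candidate (K x K observations)
rename short Name
cross using `candidates'

* 3. Keep candidates that occur inside the name (each name contains itself)
keep if strpos(Name, short) > 0

* 4. Keep the shortest candidate per name; flag ties between the two shortest
generate len = strlen(short)
bysort Name (len short): generate byte ambiguous = len[1] == len[2]
by Name: keep if _n == 1
keep Name short ambiguous
save `crosswalk'

* 5. Apply the crosswalk to the original data, skipping ambiguous names
use `original', clear
merge m:1 Name using `crosswalk', nogenerate
replace Name = short if !missing(short) & !ambiguous
drop short
```

    Pairing K distinct names with each other creates K² observations, so with many distinct names it may help to restrict the candidate list first. Observations with ambiguous == 1 keep their original spelling for manual review.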
    Last edited by Fei Wang; 10 Nov 2021, 09:30.
