Hi everybody,
I have a string variable called 'Name' that contains the names of firms; each firm can appear once or several times in the dataset.
The names were recorded manually, so the same firm may be recorded in ways that are similar but slightly different. What I am trying to do is harmonize all the names that correspond to a single firm.
It would be extremely helpful to have code that can recognize when one string is a substring of another and, in that case, replace the longer name with the shorter one that is nested inside it.
For example, if I have
Name
Zara
Zara
Zara
Zara Espana
Zara Home
I would like code that replaces the entries 'Zara Espana' and 'Zara Home' with the shorter version 'Zara'.
Is there a way to do that?
Thanks in advance!
Arianna
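To make the rule described above concrete, here is a minimal sketch in Python. The post does not say which software is being used, so the language is an assumption, as are the function name `harmonize` and its `whole_words` option, which are purely illustrative. Each name is replaced by the shortest other name in the list that it contains. By default the shorter name must match as a whole word, because a plain substring test would also collapse unrelated firms (for example, 'Ara' is a substring of 'Zara').

```python
import re

def harmonize(names, whole_words=True):
    """Replace each name with the shortest other name it contains.

    Hypothetical helper for illustration only. With whole_words=True the
    shorter name must appear as a whole word inside the longer one; with
    whole_words=False any substring match counts.
    """
    cleaned = [n.strip() for n in names]
    # Distinct non-empty names, shortest first, so the shortest match wins.
    candidates = sorted(set(cleaned) - {""}, key=lambda s: (len(s), s))
    result = []
    for name in cleaned:
        replacement = name
        for cand in candidates:
            if len(cand) >= len(name):
                break  # remaining candidates are at least as long as name
            if whole_words:
                pattern = r"(?<!\w)" + re.escape(cand) + r"(?!\w)"
                found = re.search(pattern, name, flags=re.IGNORECASE) is not None
            else:
                found = cand.lower() in name.lower()
            if found:
                replacement = cand
                break
        result.append(replacement)
    return result

names = ["Zara", "Zara", "Zara", "Zara Espana", "Zara Home"]
print(harmonize(names))
# ['Zara', 'Zara', 'Zara', 'Zara', 'Zara']
```

Note that this rule merges 'Zara Home' into 'Zara' even if the two are in fact different firms, so it is worth reviewing the list of replacements before overwriting the original variable.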