  • Cluster based on string similarity

    Hey Community,

    I'm quite new to Stata and would really appreciate some help. I have a dataset of more than 200 firms with various firm characteristics, including industry affiliation (see the example below). Each firm, however, belongs to several industry groups. My goal is to group the firms into 3 clusters based on the similarity of their industry group affiliations, and to create a new categorical variable recording each firm's cluster. Does anyone have experience with this kind of problem, or advice on how best to approach it? Thank you so much in advance!

    Data:
    firm_id industry_groups
    1 Advertising, Commerce and Shopping, Sales and Marketing
    2 Advertising, Media and Entertainment, Mobile, Sales and Marketing, Software
    3 Energy, Natural Resources, Sustainability
    ... ...
    Last edited by Justus Deters; 26 Sep 2022, 05:06.

  • #2
    Justus, you will first have to decide what "similarity" means for your data and then let Stata do the processing. Stata cannot define "similarity" for you automatically.
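
    To make "deciding what similarity means" concrete, here is a minimal sketch of one possible definition. It is written in Python rather than Stata, purely for illustration. It assumes similarity is the Jaccard index of two firms' sets of industry groups (shared groups divided by all groups either firm has) and that firms are merged by a simple average-linkage rule. Both choices, and the helper names `parse_groups`, `jaccard` and `average_linkage`, are assumptions for this example and not something stated in the thread. Stata's cluster commands also offer a Jaccard measure for 0/1 variables, so the same definition could in principle be carried over once each industry group is turned into an indicator variable.

```python
def parse_groups(text):
    """Split a comma-separated industry_groups string into a set of labels."""
    return {part.strip() for part in text.split(",") if part.strip()}


def jaccard(a, b):
    """Jaccard similarity: shared groups divided by all groups either firm has."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(sets, k):
    """Repeatedly merge the two most similar clusters until only k remain."""
    clusters = [[fid] for fid in sorted(sets)]

    def sim(c1, c2):
        # Average pairwise Jaccard similarity between two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard(sets[i], sets[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        candidates = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        x, y = max(candidates, key=lambda p: sim(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Map each firm_id to a cluster label 1..k.
    return {
        fid: label
        for label, members in enumerate(clusters, start=1)
        for fid in members
    }


# The three example firms from the original post (firm_id -> industry_groups).
firms = {
    1: "Advertising, Commerce and Shopping, Sales and Marketing",
    2: "Advertising, Media and Entertainment, Mobile, Sales and Marketing, Software",
    3: "Energy, Natural Resources, Sustainability",
}
sets = {fid: parse_groups(text) for fid, text in firms.items()}

sim_12 = jaccard(sets[1], sets[2])  # 2 shared groups out of 6 distinct ones
sim_13 = jaccard(sets[1], sets[3])  # no shared groups

# With only three example firms, k=2 is used so the merge step is visible;
# on the full dataset k would be 3, as in the question.
labels = average_linkage(sets, k=2)
print(sim_12, sim_13, labels)
```

    On the three example rows, firms 1 and 2 share "Advertising" and "Sales and Marketing" and end up in the same cluster, while firm 3 stays on its own. A different definition of similarity (for example, weighting rare industry groups more heavily) could give different clusters, which is why that decision has to come first.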
