Dear Stata users,
I am sorry in advance if this question is not directly related to Stata's functionality.
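I have data on companies nested within clusters over several years. Unfortunately, the clustering algorithm assigned cluster IDs arbitrarily in each year, so the same ID does not necessarily refer to the same cluster from one year to the next. I now need to make the cluster IDs consistent over time.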
I have the following data now (this is just an example, the actual dataset is significantly bigger with thousands of firms nested within clusters over a decade):
firm_id cluster_id year
Firm1 1 2001
Firm2 1 2001
Firm3 1 2001
Firm4 2 2001
Firm5 2 2001
Firm6 2 2001
Firm7 2 2001
Firm4 1 2002
Firm5 1 2002
Firm8 1 2002
Firm1 2 2002
Firm2 2 2002
Firm9 2 2002
As can be seen, Firm1 and Firm2 are in Cluster 2 in 2002, which apparently corresponds to Cluster 1 from 2001. The member overlap between Cluster 1 in 2001 and Cluster 2 in 2002 is thus 50% (2 shared firms out of the 4 distinct firms across the two clusters). What I would like to do is rename Cluster 2 in 2002 to Cluster 1 whenever the member overlap reaches a given threshold (say, 50%).
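To make the rule concrete, here is a minimal sketch in Python (not Stata) of what I have in mind, using the example rows above. The function names, the greedy best-match-first order, and giving unmatched clusters a fresh ID are all my own assumptions for illustration, not a settled method.

```python
from collections import defaultdict

# Example rows from the post: (firm_id, cluster_id, year)
rows = [
    ("Firm1", 1, 2001), ("Firm2", 1, 2001), ("Firm3", 1, 2001),
    ("Firm4", 2, 2001), ("Firm5", 2, 2001), ("Firm6", 2, 2001),
    ("Firm7", 2, 2001),
    ("Firm4", 1, 2002), ("Firm5", 1, 2002), ("Firm8", 1, 2002),
    ("Firm1", 2, 2002), ("Firm2", 2, 2002), ("Firm9", 2, 2002),
]


def members_by_cluster(rows, year):
    """Map cluster_id -> set of firm_ids for one year."""
    members = defaultdict(set)
    for firm, cluster, yr in rows:
        if yr == year:
            members[cluster].add(firm)
    return dict(members)


def overlap(a, b):
    """Shared firms divided by all distinct firms in either cluster."""
    return len(a & b) / len(a | b)


def relabel(prev, curr, threshold=0.5):
    """Map each current cluster_id to a previous-year cluster_id.

    Assumption: pairs are matched greedily, highest overlap first, and a
    current cluster with no match at or above the threshold gets a fresh id.
    """
    pairs = sorted(
        ((overlap(pm, cm), p, c) for p, pm in prev.items() for c, cm in curr.items()),
        reverse=True,
    )
    mapping, used_prev = {}, set()
    for score, p, c in pairs:
        if score < threshold:
            break  # all remaining pairs are below the threshold
        if p in used_prev or c in mapping:
            continue  # each cluster is matched at most once
        mapping[c] = p
        used_prev.add(p)
    # Assumption: unmatched clusters are treated as new and numbered onward.
    next_id = max(prev, default=0) + 1
    for c in sorted(curr):
        if c not in mapping:
            mapping[c] = next_id
            next_id += 1
    return mapping


prev = members_by_cluster(rows, 2001)
curr = members_by_cluster(rows, 2002)
mapping = relabel(prev, curr)
print(mapping)  # {2: 1, 1: 3}
```

With the example data, Cluster 2 in 2002 is renamed to Cluster 1. Cluster 1 in 2002 shares only 2 of 5 distinct firms (40%) with Cluster 2 in 2001, so under a 50% threshold it would be treated as a new cluster here. Over a decade, the same step would be repeated year by year, with fresh IDs drawn above every ID used so far.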
I would be grateful for your help.
Best,
Giorgio