Dear all,
I have an advanced question on cluster analysis. I have read the cluster analysis section of the Stata manual very carefully, but I cannot work out how to approach this problem.
My data set looks like the example data shown further below.
The first column, group, identifies teams made up of individuals (in my dataset the teams are somewhat larger). Within each team I would like to identify clusters based on var1-var3. I would like to form the clusters using Ward's method and to choose the optimal number of clusters using the maximum Calinski–Harabasz pseudo-F. Given the number of clusters identified, I would like to calculate the Silhouette Width (briefly described in its Wikipedia entry) of each group. These steps should produce a dataset containing one Silhouette Width value per group, something like the second, smaller table below (columns group and SW).
I imagine that the code I am looking for would be modular and byable, so that it can be run separately for every group.
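To make the target quantity concrete, here is a minimal Python sketch of the average silhouette width as I understand it from the usual definition. It is only an illustration of the calculation, not Stata code. The function name `silhouette_width`, the use of Euclidean distance, and the convention that an observation alone in its cluster scores 0 are assumptions of this sketch. It also assumes numeric inputs, so a categorical variable such as var3 would first have to be recoded (for example as dummies).

```python
from math import sqrt


def silhouette_width(points, labels):
    """Mean silhouette width of numeric points under the given cluster labels.

    For each point i: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). The result is the mean of s(i) over all points.
    """
    def dist(p, q):
        # Euclidean distance (an assumption; other dissimilarities are possible)
        return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    # Map each cluster label to the indices of its members
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy example with two well-separated clusters on a line (made-up values)
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_labels = [1, 1, 2, 2]
toy_sw = silhouette_width(toy_points, toy_labels)
```

In my case this calculation would be applied once per group, after the number of clusters has been chosen, giving one value per group.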
I would be very grateful for any help in solving this problem.
All the best,
Riccardo
Example data:
group | var1 | var2 | var3 |
1 | 0.570149 | 0 | NL |
1 | 0.666821 | 1 | IT |
1 | 0.627552 | 1 | DE |
1 | 0.47595 | 0 | IT |
1 | 0.546017 | 0 | PO |
1 | 0.178791 | 1 | FR |
1 | 0.337399 | 0 | IT |
2 | 0.723914 | 0 | DE |
2 | 0.722514 | 1 | DE |
2 | 0.352195 | 1 | NL |
2 | 0.480027 | 1 | GB |
2 | 0.805926 | 0 | HU |

Desired output:
group | SW |
1 | 0.8447 |
2 | 0.2388 |