Calculating Silhouette Width in Cluster Analysis

Riccardo Valboni

18 Jul 2014, 10:51

Dear all,

I have an advanced question on cluster analysis. I have read the cluster analysis section of the Stata manual very carefully, but I cannot see how to approach this problem.

I have a data set that looks like the following:

group	var1	var2	var3
1	0.570149	0	NL
1	0.666821	1	IT
1	0.627552	1	DE
1	0.47595	0	IT
1	0.546017	0	PO
1	0.178791	1	FR
1	0.337399	0	IT
2	0.723914	0	DE
2	0.722514	1	DE
2	0.352195	1	NL
2	0.480027	1	GB
2	0.805926	0	HU

The first column, group, identifies teams made up of individuals (in my dataset these teams are somewhat larger). Within each team I would like to identify clusters based on var1-var3, using Ward's method, with the optimal number of clusters chosen by the maximum Calinski–Harabasz pseudo-F. Based on the number of clusters identified, I would then like to calculate the Silhouette Width (briefly described in this Wikipedia entry) of each group. These steps should produce a dataset containing one Silhouette Width value per group. Specifically, I would like to obtain something like the following:

group	SW
1	0.8447
2	0.2388
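
To make the last step concrete, here is a minimal sketch of the silhouette calculation in Python (not Stata), under several assumptions: the cluster labels have already been obtained for each group (for example from Ward's linkage), all clustering variables are numeric, and distances are Euclidean. A categorical variable such as var3 would need a dissimilarity measure suited to mixed data instead. The function names and the toy data are hypothetical and are not taken from my dataset. For observation i, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and s(i) = (b - a) / max(a, b); the value for a group is the mean of s(i) over its observations.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def silhouette_values(points, labels):
    """Return s(i) for each observation, given numeric points and cluster labels."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Assumed convention: an observation alone in its cluster gets s(i) = 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def silhouette_width_by_group(rows):
    """rows holds (group, cluster_label, point) tuples; returns {group: mean s(i)}."""
    grouped = {}
    for group, label, point in rows:
        pts, labs = grouped.setdefault(group, ([], []))
        pts.append(point)
        labs.append(label)
    result = {}
    for group, (pts, labs) in grouped.items():
        s = silhouette_values(pts, labs)
        result[group] = sum(s) / len(s)
    return result


# Hypothetical toy data: one group "A" with two well-separated clusters.
toy_rows = [
    ("A", 0, (0.0,)),
    ("A", 0, (1.0,)),
    ("A", 1, (10.0,)),
    ("A", 1, (11.0,)),
]
widths = silhouette_width_by_group(toy_rows)
```

The dictionary returned by silhouette_width_by_group has one entry per group, which corresponds to the two-column group/SW layout shown above.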

I imagine that the code I am looking for would be modular and byable, so that it can be run separately for every group.

I would be very grateful for any help in solving this problem.

All the best,
Riccardo
