  • Calculating Silhouette Width in Cluster Analysis

    Dear all,

    I have an advanced question on cluster analysis. I have carefully read the cluster analysis section of the Stata manual, but I cannot see how to approach this problem.

    I have a dataset that looks like the following:
    group var1 var2 var3
    1 0.570149 0 NL
    1 0.666821 1 IT
    1 0.627552 1 DE
    1 0.47595 0 IT
    1 0.546017 0 PO
    1 0.178791 1 FR
    1 0.337399 0 IT
    2 0.723914 0 DE
    2 0.722514 1 DE
    2 0.352195 1 NL
    2 0.480027 1 GB
    2 0.805926 0 HU
    The first column, group, identifies teams of individuals (in my actual dataset the teams are somewhat larger). Within each team I would like to identify clusters based on var1-var3 using Ward's method, choosing the optimal number of clusters as the one that maximizes the Calinski–Harabasz pseudo-F. Given the resulting clusters, I would then like to calculate the Silhouette Width (briefly described in this Wikipedia entry) for each group. The output should be a dataset containing one Silhouette Width value per group. Specifically, I would like to obtain something like the following:
    group SW
    1 0.8447
    2 0.2388
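    For reference, these are the standard definitions of the two criteria involved. The notation is chosen here, not taken from the post: d(i,j) is the distance between observations i and j in a team of n members, C_I is the cluster containing i, and \bar{x}_q is the centroid of cluster C_q with n_q members. The per-group SW is the average silhouette at the k that maximizes the pseudo-F.

```latex
% Silhouette of observation i in cluster C_I (defined as s(i) = 0 when |C_I| = 1)
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\, j \neq i} d(i,j), \qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i,j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \in [-1, 1], \qquad
\mathrm{SW} = \frac{1}{n} \sum_{i=1}^{n} s(i)

% Calinski–Harabasz pseudo-F for a partition into k clusters
\mathrm{CH}(k) = \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(n-k)}, \quad
\operatorname{tr}(B_k) = \sum_{q=1}^{k} n_q \lVert \bar{x}_q - \bar{x} \rVert^2, \quad
\operatorname{tr}(W_k) = \sum_{q=1}^{k} \sum_{i \in C_q} \lVert x_i - \bar{x}_q \rVert^2

k^{*} = \arg\max_{2 \le k \le n-1} \mathrm{CH}(k)
```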
    I imagine the code should be modular and byable, so that it runs separately for each group.
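    The per-group pipeline (Ward clustering, choosing k by the maximum pseudo-F, then averaging the silhouette) can be sketched as below. It is written in plain Python rather than Stata so that the logic can be checked independently, and it rests on several assumptions not stated in the question. var3 is treated as a nominal country code and one-hot encoded. var1 and var2 are used unscaled. Distances are Euclidean. Ward's criterion is computed directly as the increase in the within-cluster sum of squares. k is searched over 2..n-1, singleton clusters get s(i) = 0, and groups with fewer than 3 members are skipped. All function names are hypothetical. The results need not match Stata's cluster commands, which may use different dissimilarity defaults, and the sketch is not claimed to reproduce the illustrative SW values above.

```python
import math
from collections import defaultdict

# Hypothetical helper names (not Stata commands). Pure Python, roughly O(n^3) per group.

def sqdist(x, y):  # squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(points, idx):
    idx = list(idx)
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(len(points[0]))]

def members_by_label(labels):
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    return groups

def ward_partitions(points):
    """Ward's minimum-variance agglomeration; returns {k: labels} for k = n..1.
    Each merge minimizes the increase n_a*n_b/(n_a+n_b) * ||c_a - c_b||^2."""
    n = len(points)
    clusters = [[i] for i in range(n)]
    def labels_of(cl):
        lab = {i: c for c, mem in enumerate(cl) for i in mem}
        return [lab[i] for i in range(n)]
    partitions = {n: labels_of(clusters)}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sqdist(
                    centroid(points, clusters[a]), centroid(points, clusters[b]))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = merged
        partitions[len(clusters)] = labels_of(clusters)
    return partitions

def calinski_harabasz(points, labels):
    """Pseudo-F = [tr(B)/(k-1)] / [tr(W)/(n-k)]; meaningful for 2 <= k <= n-1."""
    n, groups = len(points), members_by_label(labels)
    k = len(groups)
    overall = centroid(points, range(n))
    between = within = 0.0
    for mem in groups.values():
        c = centroid(points, mem)
        between += len(mem) * sqdist(c, overall)
        within += sum(sqdist(points[i], c) for i in mem)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))

def mean_silhouette(points, labels):
    """Average silhouette width s(i) over all points, using Euclidean distance."""
    groups = members_by_label(labels)
    scores = []
    for i, p in enumerate(points):
        own = groups[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # convention: s(i) = 0 for a singleton cluster
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in mem) / len(mem)
                for lab, mem in groups.items() if lab != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def silhouette_by_group(rows):
    """rows: (group, var1, var2, var3) tuples -> {group: (chosen k, SW)}."""
    # Assumption: var3 is nominal, so it is one-hot encoded; var1/var2 are not rescaled.
    categories = sorted({r[3] for r in rows})
    by_group = defaultdict(list)
    for g, v1, v2, v3 in rows:
        by_group[g].append([float(v1), float(v2)] + [1.0 if v3 == c else 0.0 for c in categories])
    result = {}
    for g, pts in sorted(by_group.items()):
        if len(pts) < 3:
            continue  # the pseudo-F needs 2 <= k <= n-1
        parts = ward_partitions(pts)
        k = max(range(2, len(pts)), key=lambda kk: calinski_harabasz(pts, parts[kk]))
        result[g] = (k, mean_silhouette(pts, parts[k]))
    return result

rows = [
    (1, 0.570149, 0, "NL"), (1, 0.666821, 1, "IT"), (1, 0.627552, 1, "DE"),
    (1, 0.47595, 0, "IT"), (1, 0.546017, 0, "PO"), (1, 0.178791, 1, "FR"),
    (1, 0.337399, 0, "IT"), (2, 0.723914, 0, "DE"), (2, 0.722514, 1, "DE"),
    (2, 0.352195, 1, "NL"), (2, 0.480027, 1, "GB"), (2, 0.805926, 0, "HU"),
]
print("group SW")
for g, (k, sw) in silhouette_by_group(rows).items():
    print(g, round(sw, 4))
```

    Each group is clustered independently, which mirrors the by-group behaviour asked for. The Ward hierarchy is built once per group and every cut is kept, so comparing the pseudo-F across k does not require re-clustering.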

    I would be very grateful for any help in solving this problem.

    All the best,
    Riccardo