  • Stopping rules in cluster analysis on binary data

    Hi everyone,

    I have run an average-linkage hierarchical cluster analysis using the Sneath and Sokal similarity coefficient, because all my variables are binary (present=1, absent=0) and co-absence shouldn't weigh as much as co-presence in the clustering. I have found that the stopping rules for cluster analysis supported by Stata are the Calinski–Harabasz pseudo-F and the Duda–Hart Je(2)/Je(1) index. However, both of these are for continuous data.

    Is there any way I could, for instance, use an adaptation of Goodman and Kruskal's gamma statistic for categorical data, or something like it, in Stata?

    FYI: I have nearly copy-pasted this post https://www.statalist.org/forums/for...on-binary-data because the problem described there is nearly the same as mine; however, no solution was provided there. I am hoping a solution has been found since 2017.
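For readers wondering what a gamma-based stopping rule might look like: one known adaptation of Goodman and Kruskal's gamma to clustering is the Baker-Hubert gamma index, which compares every within-cluster dissimilarity with every between-cluster dissimilarity. The sketch below is plain Python, not Stata, and is only an illustration under stated assumptions: the data are made up, the function names are hypothetical, and Jaccard dissimilarity (which ignores co-absences) is used as a stand-in rather than the Sneath and Sokal coefficient from the post.

```python
from itertools import combinations


def jaccard_dissimilarity(x, y):
    """1 - a/(a+b+c) for two 0/1 vectors; co-absences (0,0) are ignored.

    Stand-in measure chosen for this sketch, not the coefficient used in the post.
    """
    both = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(x, y) if p == 1 or q == 1)
    return 0.0 if either == 0 else 1.0 - both / either


def gamma_index(rows, labels):
    """Return (s_plus - s_minus) / (s_plus + s_minus).

    s_plus counts (within, between) pairs where the within-cluster
    dissimilarity is smaller; s_minus counts pairs where it is larger.
    Ties count toward neither.
    """
    within, between = [], []
    for i, j in combinations(range(len(rows)), 2):
        d = jaccard_dissimilarity(rows[i], rows[j])
        (within if labels[i] == labels[j] else between).append(d)
    s_plus = sum(1 for w in within for b in between if w < b)
    s_minus = sum(1 for w in within for b in between if w > b)
    if s_plus + s_minus == 0:
        return float("nan")
    return (s_plus - s_minus) / (s_plus + s_minus)


# Hypothetical data: 6 observations on 4 binary variables.
rows = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 1, 1, 1),
    (0, 0, 1, 1),
]

# Two candidate partitions: one that follows the obvious grouping, one that does not.
gamma_good = gamma_index(rows, [1, 1, 1, 2, 2, 2])
gamma_bad = gamma_index(rows, [1, 2, 1, 2, 1, 2])
print(gamma_good, gamma_bad)
```

Used as a stopping rule, the index would be computed for each candidate number of groups cut from the dendrogram, with larger values suggesting a better-separated solution. Because it needs only the rank order of the pairwise dissimilarities, it does not assume continuous data.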

  • #2
    How many variables do you have? For a modest number of such indicators, the number of possible classes may be very small and only some of those may occur in practice.
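To make the point in #2 concrete: with k binary indicators there are at most 2^k distinct response patterns, and the data may contain far fewer. A minimal Python sketch (not Stata, and using made-up observations) that counts the patterns actually present:

```python
from collections import Counter

# Hypothetical data: each tuple holds one observation's 0/1 values on k indicators.
observations = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 1, 1, 1),
    (0, 0, 1, 1),
]

k = len(observations[0])
possible_patterns = 2 ** k                 # upper bound on distinct classes
pattern_counts = Counter(observations)     # frequency of each observed pattern
observed_patterns = len(pattern_counts)

print(f"{observed_patterns} of {possible_patterns} possible patterns occur")
for pattern, n in pattern_counts.most_common():
    print(pattern, n)
```

If only a handful of patterns occur, tabulating them directly may be more informative than choosing a cut point with a stopping rule.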
