Cluster analysis with ordinal data

Andrew Wade

#1

30 Oct 2018, 18:31

Hello,
My colleague is doing the above with SPSS, but as far as I can tell the cluster analysis suite within Stata can only handle binary or continuous data.
Can someone point me to an ado that does this?
The data I'm using are on a scale from 0 to 4.
Or maybe StataCorp can introduce this in Stata 16!

I found an old Statalist post that suggested creating a dissimilarity matrix in Mata, and then using the clustermat command.
(https://www.stata.com/statalist/arch.../msg00989.html)
But it didn't suggest a specific technique.
Any suggestions on that point?

Regards,

Andrew
Nick Cox

#2

30 Oct 2018, 19:05

Please enlighten us on what SPSS does that is specific to ordinal data. For the moment I don't think it's fair to say that cluster analysis in Stata assumes binary or continuous data. You can use cluster analysis on discrete (counted or graded) data too, as Stata doesn't have variable types that correspond to measurement scales. So, you can feed graded data to a cluster analysis: the grades, e.g. 1, 2, 3, 4, 5, will just be treated literally, that is, as if the difference between grades were a distance. I don't know what cluster analysis for ordinal data would look like, so literature references would be welcome.
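
To make "treated literally" concrete, here is a minimal sketch, written in Python purely for illustration (it is not Stata syntax and not what any Stata command does internally). The three respondents and their grades are made up, and the function names are hypothetical. It only shows that once grades are read as numbers, the gap between them behaves like a distance, and a pairwise dissimilarity matrix follows directly.

```python
from math import sqrt

def manhattan(a, b):
    """Sum of absolute grade differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of summed squared grade differences (L2 distance)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dissimilarity_matrix(rows, dist):
    """Pairwise dissimilarities between all observations."""
    return [[dist(r, s) for s in rows] for r in rows]

# Hypothetical respondents, each answering three items graded 0 to 4.
rows = [(0, 1, 0), (1, 1, 0), (4, 3, 4)]

# Grades are used as plain numbers: a move from 0 to 1 counts
# exactly as much as a move from 3 to 4.
D = dissimilarity_matrix(rows, manhattan)
for line in D:
    print(line)
```

A matrix like this is the kind of object a matrix-based clustering routine takes as input; whether equal spacing between grades is defensible is a separate question.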
Andrew Wade

#3

30 Oct 2018, 22:37

Hello,
Thanks for your response. I have done some more investigation regarding SPSS. I stand corrected: SPSS assumes each categorical variable has a multinomial distribution. This is equivalent, I believe, to simply using a set of binary variables, even if a variable is ordinal.
With respect to Stata, the documentation on cluster kmeans and kmedians says "Continuous or binary data are allowed with cluster kmeans and cluster kmedians".
And the documentation for cluster linkage indicates that the measure option is designed for either binary or continuous data.
So the questions are:
Should I just consider my ordinal variable continuous?

Is this a robust approach within a cluster analysis?

Based upon my research, I think the answer to both is no.

As an aside, I've since learnt that the distance tool in SAS can handle nominal, ordinal, interval, and ratio data.

I will need to look for some literature examples.

Regards,

Andrew
Nick Cox

#4

31 Oct 2018, 01:46

Thanks for the extra details. Without more information, it seems to me that using several indicator variables ignores the ordinal flavour. Naturally, treating ordinal grades literally is a stronger assumption than might seem defensible, but techniques exist to estimate scores for ordinal variables, and one could always use some kind of sensitivity analysis, e.g. compare results with grades 1 2 3 4 5 and with alternative grades 1 3 4 5 7, etc.
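
Both points can be illustrated with a small sketch, again in Python only for illustration and with hypothetical function names. The first part shows that with indicator (0/1) coding every pair of distinct grades is equally far apart, so the ordering is lost. The second part compares the literal scores 1 2 3 4 5 with the alternative scores 1 3 4 5 7 and asks which grades are nearest to grade 2 under each. A real sensitivity analysis would rerun the clustering under each scoring and compare the resulting groups; this is only the simplest possible check.

```python
def indicators(grade, levels):
    """Code one grade as a list of 0/1 indicator variables, one per level."""
    return [1 if grade == level else 0 for level in levels]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

levels = [1, 2, 3, 4, 5]

# Indicator coding: adjacent grades and extreme grades are equally distant.
near = squared_distance(indicators(1, levels), indicators(2, levels))
far = squared_distance(indicators(1, levels), indicators(5, levels))

# Two scorings of the same five grades, as in the comparison above.
literal = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
alternative = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}

def nearest_grades(grade, scores):
    """Return the other grades closest to `grade` under a given scoring."""
    gaps = {g: abs(scores[g] - scores[grade]) for g in scores if g != grade}
    smallest = min(gaps.values())
    return sorted(g for g, gap in gaps.items() if gap == smallest)

print(near, far)
print(nearest_grades(2, literal))
print(nearest_grades(2, alternative))
```

If conclusions change materially between scorings, that is a sign the results depend on the assumed spacing of the grades.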

There is an alternative approach which often seems neglected but is possible if you have just a few categorical variables. For example, two 5-point variables themselves define 25 possible clusters, and you can see which occur without using cluster analysis to find what clusters can be defined.
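
As a rough sketch of that idea, assuming made-up data for two 5-point variables (Python for illustration only), one can simply count how often each of the 25 possible combinations occurs:

```python
from collections import Counter
from itertools import product

# Hypothetical observations on two variables, each graded 1 to 5.
pairs = [(1, 1), (1, 1), (1, 2), (5, 5), (5, 5), (5, 4), (3, 3)]

# Frequency of each combination that actually occurs.
counts = Counter(pairs)

# All 25 combinations the two variables could define.
possible = list(product(range(1, 6), repeat=2))
observed = [combo for combo in possible if counts[combo] > 0]

for combo in observed:
    print(combo, counts[combo])
```

The occupied cells, and how heavily each is occupied, are themselves a description of the groups in the data. In Stata the same information would come from an ordinary two-way table of the two variables.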

FWIW, I would tend to use correspondence analysis more often than cluster analysis with categorical data. Cluster analysis has long seemed to me to be a bit of a mess with so many ways to do it!