SQ-Ados bundle: How to perform clustermat stop after clustering

Jan-Niklas Bartel

Join Date: Jul 2017
Posts: 2

10 Jul 2017, 09:42

Dear all,

I am using Stata 13.1 on Windows and created a dissimilarity matrix using SQ-Ados, see my code below.

Code:

* Use specified csv as input
import delimited using C:\Users\04BAJ\Documents\Stata\170703_CFO_SQ_v01.csv, delimiters(";")

* Prepare data for SQ analysis
reshape long year, i(id) j(order)
encode year, generate(value)
drop year
sqset value id order, trim

* Input substitution cost matrix
matrix input sub = (0.000,0.274,0.332,0.606,0.394,0.668,0.726,1.000\0.274,0.000,0.606,0.332,0.668,0.394,1.000,0.726\0.332,0.606,0.000,0.274,0.726,1.000,0.394,0.668\0.606,0.332,0.274, 0.000,1.000,0.726,0.668,0.394\0.394,0.668,0.726,1.000,0.000,0.274,0.332,0.606\0.668,0.394,1.000,0.726,0.274,0.000,0.606,0.332\0.726,1.000,0.394,0.668,0.332,0.606,0.000,0.274\ 1.000, 0.726,0.668,0.394,0.606,0.332,0.274,0.000)

* Perform full SQ Analysis with specified substitution and in/del cost
sqom, full indelcost(0.49) subcost(sub)

* Save dissimilarity matrix to file and replace existing file
sqom save SQdist, replace

* Prepare data for clustering
sqclusterdat

* Perform clustering of the dissimilarity matrix using Wards
clustermat wardslinkage SQdist, name(wards) add

* Calculate Calinski stopping rules for cluster 2 to 10 as generated by Wards and name resulting matrix Calinski
clustermat stop, variables(value) rule(calinski) groups(2/10) matrix(calinski)

My goal is to validate the Ward's clustering with clustermat stop. How can I run clustermat stop on the Ward's clusters? Do I need to run "sqclusterdat, return" first and then apply clustermat stop? It is unclear to me what the correct input for variables() in the clustermat stop syntax would be.

Many thanks for your help.

Tags: clustering, sequence analysis, SQ Ados

Jan-Niklas Bartel

Join Date: Jul 2017

Posts: 2
#2

11 Jul 2017, 01:48

Any advice on this please?

Thank you!

Brendan Halpin

Join Date: Mar 2014

Posts: 152
#3

14 Aug 2017, 11:16

A late reply is possibly no better than no reply, but anyway:

1: clustermat stop does not work as you might reasonably expect: it calculates the Calinski-Harabasz (CH) statistic from the squared Euclidean distances between the variables listed in the variables() option, not from the SQ distances. I have written a module, calinski (available on SSC), which calculates this correctly. See http://www.ulsites.ul.ie/sociology/s...p2016-01_0.pdf

2: SQ works on sequences in long format, that is, with multiple observations per case. This is sometimes awkward for analysis (and is what the sqclusterdat command is for). An alternative to sqclusterdat is to reshape wide. Here is an example using the youthemp.dta dataset that comes with SQ.

Code:

// Install calinski
ssc install calinski

Code:

// Use youthemp.dta (comes with SQ, do "net get sq" to install it in current directory)
use youthemp, clear

// rather arbitrary substitution cost matrix
matrix sm1 = (0,1,1,2,3 \ ///
              1,0,1,2,3 \ ///
              1,1,0,2,2 \ ///
              2,2,2,0,1 \ ///
              3,3,2,1,0 )

// Set up and run SQOM: puts distances in SQdist Stata matrix
reshape long st, i(id) j(t)
sqset st id t
sqom, name(td) indelcost(1.5) subcost(sm1) full

// Important: return to wide format
reshape wide

// Sort into the order SQ uses internally
sort st*
gen id2 = _n
sort id2

clustermat wards SQdist, add
cluster gen q8=groups(8)
calinski, dist(SQdist) id(id2)
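To make point 1 concrete, here is a minimal Mata sketch of how a Calinski-Harabasz pseudo-F can be computed directly from a pairwise dissimilarity matrix instead of from variables. The function name ch_from_dist is hypothetical, and squaring the dissimilarities is an assumption that mirrors the Euclidean identity (within-group sum of squares equals the sum of squared pairwise distances divided by group size). It is not taken from the calinski module and may not reproduce its output.

```stata
mata:
// Hypothetical helper, not part of SQ or calinski.
// D: n x n symmetric dissimilarity matrix
// g: n x 1 group membership, in the same order as the rows of D
real scalar ch_from_dist(real matrix D, real colvector g)
{
    real scalar n, k, T, W, j
    real colvector levels, idx

    n = rows(D)
    levels = uniqrows(g)            // distinct group codes
    k = rows(levels)

    // Total sum of squares: every pair appears twice in D, hence 2*n
    T = sum(D:^2) / (2*n)

    // Within-group sum of squares, one group at a time
    W = 0
    for (j = 1; j <= k; j++) {
        idx = selectindex(g :== levels[j])
        W = W + sum(D[idx, idx]:^2) / (2*rows(idx))
    }

    // Pseudo-F: between-group over within-group mean squares
    return(((T - W)/(k - 1)) / (W/(n - k)))
}
end
```

After the example above, something like mata: ch_from_dist(st_matrix("SQdist"), st_data(., "q8")) should give the value for the eight-group solution, provided the observations are in the same order as the rows of SQdist (which is what the sort on st* and id2 is for).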