I have data in which the sequence patterns look like this:
Sequence-Pattern |      Freq.     Percent        Cum.
-----------------+-----------------------------------
           11111 |        209       83.27       83.27
           22222 |         17        6.77       90.04
           21111 |          8        3.19       93.23
           22111 |          5        1.99       95.22
           22211 |          4        1.59       96.81
           11112 |          1        0.40       97.21
           11122 |          1        0.40       97.61
           11222 |          1        0.40       98.01
           12111 |          1        0.40       98.41
           12211 |          1        0.40       98.80
           12222 |          1        0.40       99.20
           21222 |          1        0.40       99.60
           22212 |          1        0.40      100.00
-----------------+-----------------------------------
           Total |        251      100.00
I tried the sqom syntax, but the resulting _SQdist score did not reflect the sequence patterns the way I wanted.
For example, 22111 and 11222 are completely different patterns that I want to place in separate clusters, but they were given the same distance score.
The cluster result should look like this:
1) 11111
2) 22222
3) 21111, 22111, or 22211. The exact sequence doesn't matter here, because I just want one cluster for those who moved from 2 to 1.
4) 11112, 11122, etc. I want one cluster for those who started at 1 and had moved to 2 by the end.
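To make the intended grouping concrete, here is a minimal sketch of the rule behind the four groups above, written in Python rather than Stata purely for illustration. It assumes that only the first and last states decide the group, and it adds a hypothetical fifth group, "other", for patterns such as 12111 or 21222 that start and end in the same state but change in between. The function name `classify` and the group labels are made up for this example; the frequencies are copied from the table.

```python
from collections import Counter

# Frequencies copied from the tabulation above.
FREQ = {
    "11111": 209, "22222": 17, "21111": 8, "22111": 5, "22211": 4,
    "11112": 1, "11122": 1, "11222": 1, "12111": 1, "12211": 1,
    "12222": 1, "21222": 1, "22212": 1,
}

def classify(pattern: str) -> str:
    """Assign a pattern to a group using only its first and last state.

    Assumption: intermediate states are ignored unless the sequence
    starts and ends in the same state, in which case any change in
    between sends it to the catch-all "other" group.
    """
    first, last = pattern[0], pattern[-1]
    if set(pattern) == {first}:          # never changed state
        return f"stable_{first}"
    if first == "2" and last == "1":     # moved from 2 to 1
        return "moved_2_to_1"
    if first == "1" and last == "2":     # moved from 1 to 2
        return "moved_1_to_2"
    return "other"                       # e.g. 12111, 21222

# Sum the observed frequencies within each group.
group_sizes = Counter()
for pattern, n in FREQ.items():
    group_sizes[classify(pattern)] += n

for group, n in sorted(group_sizes.items()):
    print(group, n)
```

Under this rule 22111 and 11222 land in different groups, which is the separation I could not get from the distance score.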
Should I use a different kind of analysis instead of cluster analysis?
If another approach is recommended, please help!