Hello,
I'm trying to perform optimal matching of sequences using the sqom command. However, the distance matrix generated by this procedure seems to ignore about 2/3 of my sequences.
I start with 23,665 sequences.
Code:
sqom, full k(2)
This generates the following matrix:
SQdist[1464,1464]
levels[1,9]
This matrix accounts for only 8,199 sequences. This number can be obtained by summing the number of sequences represented by each sequence in the matrix:
Code:
egen sumcase = sum(_SQn)
tab sumcase
    sumcase |      Freq.     Percent        Cum.
------------+-----------------------------------
       8199 |      1,464      100.00      100.00
------------+-----------------------------------
      Total |      1,464      100.00
For the cases included in the matrix, I can generate cluster groups, and these groups make sense. However, I have no idea what happened to the other sequences or why they were not included in the matrix. Is this normal? Am I missing something?
-Nicolas Bastien