  • sqom ignoring 2/3 of my sequences

    Hello,

    I'm trying to perform optimal matching of sequences using the sqom command. However, the matrix generated by this procedure seems to ignore 2/3 of my sequences.
    I start with 23,665 sequences.

    Code:
    sqom, full k(2)

    This generates the following matrix:

    SQdist[1464,1464]
    levels[1,9]


    This matrix accounts for only 8,199 sequences. This number can be obtained by counting the total number of sequences represented by each sequence in the matrix:


    Code:
    egen sumcase = sum(_SQn)
    tab sumcase

        sumcase |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           8199 |      1,464      100.00      100.00
    ------------+-----------------------------------
          Total |      1,464      100.00



    For the cases included in the matrix, I can generate cluster groups, and these groups make sense. However, I have no idea what happened to the other sequences or why they are not included in the matrix. Is this normal? Am I missing something?

    -Nicolas Bastien