Hello,
I'm very new to sequence analysis, and especially to sequence cluster analysis, and it's difficult to find detailed tutorials online for big datasets. I want to create clusters from sequences made up of the states 1, 2, and 3 (e.g. 2222222... or 22223333322...), with one state for each of the 18 years in the data (so every sequence has length 18). From previous research and theory I know I want 7 clusters. I need to assign each individual to a cluster for a later regression analysis.
I have a big dataset with around 170,000 individuals and about 20,000 distinct sequences. (I can't share the data because of privacy regulations.)
Right now I'm using the following code:
Code:
sqom
matrix dir
sqclusterdat
clustermat wardslinkage SQdist, name(myname) add
cluster generate cluster = groups(7)
sqclusterdat, return keep(cluster myname*)
After running clustermat, I get this error:
"unable to allocate real.... function returned error... r(2900);"
Is it because of the size of the dataset / number of sequences? What would you recommend to deal with this? I tried other options of sqom but it didn't help.
I also tried to run this analysis in R, but encountered errors at a similar step (most likely also due to the size).
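To put rough numbers on the size question, here is a back-of-the-envelope estimate of the memory a full pairwise distance matrix would need. This is only a sketch: it assumes 8 bytes per entry and that the whole square matrix is stored. I have not confirmed how sqom or clustermat actually store it, and the counts are the approximate ones given above.

```stata
* Rough memory estimate for a full n x n distance matrix.
* Assumption: 8-byte doubles, full square storage (not just one triangle).
local n_seq = 20000
local n_ind = 170000
display "distinct sequences: " %9.1f (`n_seq'^2 * 8) / 1e9 " GB"
display "all individuals:    " %9.1f (`n_ind'^2 * 8) / 1e9 " GB"
```

Under those assumptions the matrix would need about 3.2 GB for the distinct sequences and about 231 GB for all individuals, so a matrix over every individual looks infeasible on an ordinary machine.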