Hi everyone!
I am trying to perform a cluster analysis to identify subtypes of a disease. I have more than 50 candidate variables, both continuous and categorical. I am planning to use Ward's linkage method with Gower's dissimilarity coefficient. However, from reading the relevant papers, I learned that some of the candidate variables could be noise variables, and that including them in the cluster analysis will mask the true cluster structure. Variable selection is therefore recommended before clustering.
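For context, here is a minimal sketch of how Gower's dissimilarity handles mixed data. It assumes equal weights for all variables and no missing values, and `gower_matrix` is a hypothetical helper written for illustration, not a Stata or library function. Each continuous variable contributes its absolute difference divided by that variable's range, and each categorical variable contributes 0 for a match and 1 for a mismatch. The overall dissimilarity is the mean of these contributions. Because every variable enters that average with the same weight, a handful of noise variables can dilute the differences carried by the informative ones, which is the masking problem described above.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_matrix(rows: Sequence[Sequence[Value]],
                 is_categorical: Sequence[bool]) -> list[list[float]]:
    """Hypothetical helper: pairwise Gower dissimilarities, equal weights.

    Continuous variable k: |x_ik - x_jk| / R_k, where R_k is the sample range.
    Categorical variable k: 0 if the categories match, 1 otherwise.
    The result is the mean over all p variables (no missing data assumed).
    """
    n = len(rows)
    p = len(is_categorical)

    # Precompute the range of each continuous variable (None for categorical).
    ranges: list[Optional[float]] = []
    for k in range(p):
        if is_categorical[k]:
            ranges.append(None)
        else:
            col = [float(r[k]) for r in rows]
            ranges.append(max(col) - min(col))

    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                if is_categorical[k]:
                    # Simple matching for categorical variables.
                    total += 0.0 if rows[i][k] == rows[j][k] else 1.0
                else:
                    r = ranges[k]
                    # A constant column (zero range) contributes 0.
                    if r:
                        total += abs(float(rows[i][k]) - float(rows[j][k])) / r
            # Equal-weight average over all variables; matrix is symmetric.
            d[i][j] = d[j][i] = total / p
    return d


if __name__ == "__main__":
    # Toy data: one continuous variable (e.g. age) and one categorical one.
    data = [[50.0, "A"], [60.0, "A"], [70.0, "B"]]
    for row in gower_matrix(data, [False, True]):
        print(row)
```

With these three toy patients the age range is 20, so patients 1 and 2 get (10/20 + 0) / 2 = 0.25, and patients 1 and 3 get (20/20 + 1) / 2 = 1.0. A matrix like this is what a hierarchical (e.g. Ward-type) clustering step would then take as input.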
My question is: is there a module in Stata that can perform variable selection for cluster analysis? Any variable selection algorithm is fine with me.