Hi all,
I am doing a cluster analysis and, to decide on the "best" number of clusters, I am calculating the indices proposed in the report by Makles (2012, attached): WSS, log(WSS), η², and PRE. I would now like to plot the results for k clusters for multiple starting values in the same plot, as is done in the attached publication by Liu et al. However, I am having trouble coding this. The code I use to produce the plots for a single starting value is:
* Run k-means for k = 1, ..., 20 with a single random starting value (seed 1)
forvalues k = 1(1)20 {
    cluster kmeans experience_6 heat_experience_6 corona_4 if experience_1 >= 1 & heat_experience_1 >= 1, k(`k') start(random(1)) name(cs`k')
}

* Collect k, WSS, log(WSS), eta-squared and PRE in a 20 x 5 matrix
matrix WSS = J(20,5,.)
matrix colnames WSS = k WSS log(WSS) eta-squared PRE
forvalues k = 1(1)20 {
    * WSS for k clusters: residual sums of squares summed over the clustering variables
    scalar ws`k' = 0
    foreach var of varlist experience_6 heat_experience_6 corona_4 {
        quietly anova `var' cs`k'
        scalar ws`k' = ws`k' + e(rss)
    }
    matrix WSS[`k', 1] = `k'
    matrix WSS[`k', 2] = ws`k'
    matrix WSS[`k', 3] = log(ws`k')
    matrix WSS[`k', 4] = 1 - ws`k' / WSS[1,2]
    matrix WSS[`k', 5] = (WSS[`k' - 1,2] - ws`k') / WSS[`k' - 1,2]
}
matrix list WSS

* Plot each index against k and combine the four panels
_matplot WSS, columns(2 1) connect(l) xlabel(#10) name(plot1, replace) nodraw noname
_matplot WSS, columns(3 1) connect(l) xlabel(#10) name(plot2, replace) nodraw noname
_matplot WSS, columns(4 1) connect(l) xlabel(#10) name(plot3, replace) nodraw noname
_matplot WSS, columns(5 1) connect(l) xlabel(#10) name(plot4, replace) nodraw noname
graph combine plot1 plot2 plot3 plot4, name(plot1to4, replace)
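For reference, this is how I read the four indices from the code above. The notation is my own shorthand, not necessarily the one used by Makles: RSS_v(k) stands for the residual sum of squares from the one-way ANOVA of clustering variable v on the k-cluster solution, and PRE is not defined for k = 1 because there is no previous solution to compare with.

\[
\mathrm{WSS}(k) = \sum_{v} \mathrm{RSS}_v(k), \qquad
\eta^2_k = 1 - \frac{\mathrm{WSS}(k)}{\mathrm{WSS}(1)}, \qquad
\mathrm{PRE}_k = \frac{\mathrm{WSS}(k-1) - \mathrm{WSS}(k)}{\mathrm{WSS}(k-1)}
\]

So each starting value gives its own WSS(k) curve, and the other three indices follow from it.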