  • Plotting multiple solutions for kmeans clustering in same plot

    Hi all,

    I am doing a cluster analysis and, to decide on the "best" number of clusters, I am calculating the indices proposed in the report by Makles (2012, attached): WSS, log(WSS), η², and PRE. I would now like to plot the results for k clusters and multiple starting values in the same plot, as is done in the attached publication by Liu et al. However, I am having trouble coding this. The code I use to produce the plots for a single starting value is:


    forvalues k = 1(1)20 {
    cluster kmeans experience_6 heat_experience_6 corona_4 if experience_1 >= 1 & heat_experience_1 >=1, k(`k') start(random(1)) name(cs`k')
    }

    matrix WSS = J(20,5,.)
    matrix colnames WSS = k WSS log(WSS) eta-squared PRE

    forvalues k = 1(1)20 {
    scalar ws`k' = 0
    foreach var of varlist experience_6 heat_experience_6 corona_4 {
    quietly anova `var' cs`k'
    scalar ws`k' = ws`k' + e(rss)
    }
    matrix WSS[`k', 1] = `k'
    matrix WSS[`k', 2] = ws`k'
    matrix WSS[`k', 3] = log(ws`k')
    matrix WSS[`k', 4] = 1- ws`k' / WSS[1,2]
    matrix WSS[`k', 5] = (WSS[`k'- 1,2] - ws`k') / WSS[`k' - 1,2]
    }
    matrix list WSS


    _matplot WSS, columns(2 1) connect(l) xlabel(#10) name(plot1, replace) nodraw noname
    _matplot WSS, columns(3 1) connect(l) xlabel(#10) name(plot2, replace) nodraw noname
    _matplot WSS, columns(4 1) connect(l) xlabel(#10) name(plot3, replace) nodraw noname
    _matplot WSS, columns(5 1) connect(l) xlabel(#10) name(plot4, replace) nodraw noname
    graph combine plot1 plot2 plot3 plot4, name(plot1to4, replace)
    https://journals.sagepub.com/doi/10.1177/1536867X1201200213

  • #2
    Hi Lisa. Unfortunately, we cannot see the attached Liu et al. publication, so we do not know which graph you are trying to produce. Maybe you can post a screenshot (if copyright allows)?
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com



    • #3
      Hello, thanks for the quick response. I added a screenshot from a paper where I am sure they did this with Stata, as it is a report published in the Stata Journal. They show the values of WSS, log(WSS), etc. depending on the number of clusters (1-20) for 50 different starting values.



      • #4
        They show WSS, log(WSS), etc. depending on the number of clusters (1-20) for 50 different starting values.


        source: https://journals.sagepub.com/doi/pdf...867X1201200213
        Last edited by lisa koehler; 13 May 2024, 14:07.



        • #5
          This should be easier if you populate observations in the dataset instead of a matrix. This should get you started:


          Code:
          sysuse auto, clear
          
          gen k = .
          gen start = .
          gen WSS = .
          gen logWSS = .
          gen etasquared = .
          gen PRE = .
          
          set obs `=5*20'
          
          loc i = 1
          forvalues kk = 1(1)5 {    
              
              forvalues start = 1(1)20 {
                  
                  replace start = `start' in `i'
                  replace k = `kk' in `i'
                  
                  cluster kmeans turn trunk mpg , k(`kk') start(random(`start')) name(cs`kk'_`start')
                  
                  scalar ws`kk' = 0
                  foreach var of varlist turn trunk mpg {
                      quietly anova `var' cs`kk'_`start'
                      scalar ws`kk' = ws`kk' + e(rss)
                  }
                  
                  replace WSS = ws`kk' in `i'
                  replace logWSS = log(ws`kk') in `i'
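                  if `kk'!=1 {
                      * eta-squared: 1 - WSS(k)/WSS(1), using the same start
                      su WSS if k == 1 & start == `start', meanonly
                      replace etasquared = 1 - ws`kk'/r(mean) in `i'
                      * PRE: (WSS(k-1) - WSS(k))/WSS(k-1), using the same start
                      su WSS if k == `=`kk'-1' & start == `start', meanonly
                      replace PRE = (r(mean) - ws`kk')/r(mean) in `i'
                  }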
                  else {
                      replace etasquared = 0 in `i'
                  }
                  
                  loc ++i
                  
              }
          }
          
          preserve
          keep k start WSS logWSS etasquared PRE
          reshape wide WSS logWSS etasquared PRE, i(k) j(start)
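          * name() keeps each graph in memory instead of overwriting the previous one
          line WSS* k, sort legend(off) name(g_WSS, replace)
          line logWSS* k, sort legend(off) name(g_logWSS, replace)
          line etasquared* k, sort legend(off) name(g_etasquared, replace)
          line PRE* k, sort legend(off) name(g_PRE, replace)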
          restore
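
          If you want all four indices in one figure, as in your original graph combine call, here is a possible sketch that works on the long data without reshaping. I have not run it on your data. It assumes the variables k, start, WSS, logWSS, etasquared and PRE from the loop above are still in memory, with 20 starting values; the graph names p_* and all_indices are just placeholders I made up.

          ```stata
          * Assumes long-format data: one row per (k, start) pair,
          * with start running from 1 to 20 as in the loop above.
          local panels
          foreach y of varlist WSS logWSS etasquared PRE {
              * Build one line plot per starting value and overlay them
              local lines
              forvalues s = 1/20 {
                  local lines `lines' (line `y' k if start == `s', sort lcolor(gs8))
              }
              twoway `lines', legend(off) xtitle("k") ytitle("`y'") ///
                  name(p_`y', replace) nodraw
              local panels `panels' p_`y'
          }
          * Put the four panels in a single figure
          graph combine `panels', name(all_indices, replace)
          ```

          Each panel then shows one line per starting value, so the spread between lines gives a rough idea of how sensitive each index is to the random start.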
          Jorge Eduardo Pérez Pérez
          www.jorgeperezperez.com

