How to plot several stacked percent bar charts side-by-side with groups of variables and subgraphs?

Lars Hennsky

Join Date: Mar 2016

Posts: 7
#1

25 Mar 2016, 12:40

Hello,

I did a cluster analysis of categorical variables and want to plot the result in a summary graph. There are three groups of variables, each made up of dummy variables. I can plot one group of these stacked dummy variables, with subgraphs by cluster membership, but I want to add two more groups of variables next to that bar.
This is the first group of variables:

Code:
graph bar a_group1 b_group1 c_group1 d_group1 e_group1 f_group1 x_group1, ///
    by(, legend(off)) xsize(6) ysize(8) aspectratio(1.2) ///
    by(clus_8_ward_gower) stack percent
How do I add x_group2 and x_group3 so that each is displayed as its own stacked bar, side by side, by cluster membership (see sketch, ignoring the 'count' bar)?

Is it possible to add a fourth variable next to the percentage-bars that displays a mean on a second scale (see whole sketch)?
I did an extensive Google search and read the Stata documentation, but I couldn't figure out how to do it.
If you don't know a solution, perhaps you have a better idea of how to visualize clustered categorical data by groups.

Thanks in advance, Lars

2 Photos

Last edited by Lars Hennsky; 25 Mar 2016, 13:25.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#2

28 Mar 2016, 06:51

Cross-posted at http://stats.stackexchange.com/quest...cent-bar-chart (and on hold as off topic, but an interesting answer is visible as I write).

"Lars Vegas": please see

http://www.statalist.org/forums/help#crossposting Explicit policy on cross-posting

http://www.statalist.org/forums/help#stata Advice on posting examples

http://www.statalist.org/forums/help#realnames Please use full real names
Lars Hennsky

Join Date: Mar 2016

Posts: 7
#3

29 Mar 2016, 08:58

Hello Nick,

thank you for your advice.

I contacted the forum administrators to change the name to my real one.

Because I am unable to edit my original post, here is the additional information:

I cross-posted this question to Cross Validated / Stack Exchange (http://stats.stackexchange.com/quest.../204078#204078), where a user has already suggested using the package "combineplot". But I did not manage to stack the variable groups with this package.

In the meantime I made some progress with the following code, but I still cannot stack the groups of variables.

Code:
graph bar (sum) variable_1 variable_2 (sum) variable_3 variable_4 ///
    (sum) variable_5 variable_6 variable_7 variable_8 variable_9 variable_10 variable_11 ///
    (sum) variable_12 variable_13, ///
    nofill percentages showyvars ///
    yvaroptions(relabel(1 group_1 2 group_1 3 group_2 4 group_2 5 group_3 6 group_3 ///
        7 group_3 8 group_3 9 group_3 10 group_3 11 group_3 12 group_4 13 group_4) ///
        label(angle(forty_five) labsize(vsmall))) ///
    by(, legend(off)) name(ward_gower_bar_11, replace) by(clus_11_ward_gower)

Perhaps this clarifies my problem.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#4

29 Mar 2016, 09:08

That helps (thanks), but a sample dataset would help more. See the second link in #2.

Lars Hennsky

Join Date: Mar 2016
Posts: 7

29 Mar 2016, 11:26

Ok thanks for the advice.
I am using Stata 14.1 on Windows 7.
Here is my excerpt:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int ID byte(variable_1 variable_2 variable_3 variable_4 variable_5 variable_6 variable_7 variable_8 variable_9 variable_10 variable_11 variable_12 variable_13 clus_11_ward_gower)
   1 1 0 1 0 0 1 0 0 0 0 0 1 0  1
   2 1 0 0 1 0 1 0 0 0 0 0 1 0  2
   3 1 0 0 1 0 0 0 0 0 1 0 1 0  3
   4 1 0 1 0 0 0 1 0 0 0 0 1 0  3
   5 1 0 1 0 0 0 0 0 0 0 1 1 0  3
   6 1 0 1 0 1 0 0 0 0 0 0 0 1  6
   7 1 0 1 0 0 1 0 0 0 0 0 1 0  1
   8 1 0 0 1 1 0 0 0 0 0 0 1 0  4
   9 1 0 1 0 0 1 0 0 0 0 0 1 0  1
  10 0 1 0 1 0 1 0 0 0 0 0 1 0  9
  11 1 0 1 0 0 0 0 0 0 1 0 1 0  3
  12 1 0 0 1 0 0 0 1 0 0 0 0 1  7
  13 1 0 0 1 0 0 1 0 0 0 0 1 0  3
  14 1 0 1 0 0 0 0 0 1 0 0 1 0  3
  15 1 0 0 1 0 1 0 0 0 0 0 1 0  2
  16 1 0 1 0 0 0 0 1 0 0 0 1 0  3
  17 0 1 1 0 0 1 0 0 0 0 0 1 0 10
  18 0 1 0 1 1 0 0 0 0 0 0 1 0  9
  19 0 1 1 0 0 1 0 0 0 0 0 1 0 10
  20 1 0 1 0 1 0 0 0 0 0 0 1 0  5
  21 0 1 1 0 1 0 0 0 0 0 0 1 0 11
  22 0 1 1 0 0 0 0 1 0 0 0 1 0 11
  23 1 0 1 0 0 1 0 0 0 0 0 1 0  1
  24 1 0 1 0 0 1 0 0 0 0 0 0 1  6
  25 1 0 1 0 0 1 0 0 0 0 0 1 0  1
  26 0 1 0 1 1 0 0 0 0 0 0 0 1  7
  27 1 0 0 1 0 0 0 1 0 0 0 1 0  3
  28 1 0 0 1 0 1 0 0 0 0 0 1 0  2
  29 1 0 0 1 1 0 0 0 0 0 0 0 1  7
  30 0 1 1 0 1 0 0 0 0 0 0 1 0 11
end


Nick Cox

Join Date: Mar 2014
Posts: 35651

29 Mar 2016, 11:52

Thanks for the example. The size of the problem and the names of the variables seem to change from post to post! Here I am guessing at what you most want.

My major advice is that you will find graphics a lot easier if you restructure to fewer variables.

My minor advice is that stacking bars doesn't always help to see structure in data. You can just get a fruit salad display of many colours that has to be decoded.

I used tabplot (SSC). Here's my code and the result.

I'd recommend strongly that you use correspondence analysis to produce a seriation here. Unless the names of the variables have inherent meaning, it is highly likely that the original variables and the clusters can be reshuffled to produce a better order.

Code:

clear
set scheme s1color
input int ID byte(variable_1 variable_2 variable_3 variable_4 variable_5 variable_6 variable_7 variable_8 variable_9 variable_10 variable_11 variable_12 variable_13 clus_11_ward_gower)
   1 1 0 1 0 0 1 0 0 0 0 0 1 0  1
   2 1 0 0 1 0 1 0 0 0 0 0 1 0  2
   3 1 0 0 1 0 0 0 0 0 1 0 1 0  3
   4 1 0 1 0 0 0 1 0 0 0 0 1 0  3
   5 1 0 1 0 0 0 0 0 0 0 1 1 0  3
   6 1 0 1 0 1 0 0 0 0 0 0 0 1  6
   7 1 0 1 0 0 1 0 0 0 0 0 1 0  1
   8 1 0 0 1 1 0 0 0 0 0 0 1 0  4
   9 1 0 1 0 0 1 0 0 0 0 0 1 0  1
  10 0 1 0 1 0 1 0 0 0 0 0 1 0  9
  11 1 0 1 0 0 0 0 0 0 1 0 1 0  3
  12 1 0 0 1 0 0 0 1 0 0 0 0 1  7
  13 1 0 0 1 0 0 1 0 0 0 0 1 0  3
  14 1 0 1 0 0 0 0 0 1 0 0 1 0  3
  15 1 0 0 1 0 1 0 0 0 0 0 1 0  2
  16 1 0 1 0 0 0 0 1 0 0 0 1 0  3
  17 0 1 1 0 0 1 0 0 0 0 0 1 0 10
  18 0 1 0 1 1 0 0 0 0 0 0 1 0  9
  19 0 1 1 0 0 1 0 0 0 0 0 1 0 10
  20 1 0 1 0 1 0 0 0 0 0 0 1 0  5
  21 0 1 1 0 1 0 0 0 0 0 0 1 0 11
  22 0 1 1 0 0 0 0 1 0 0 0 1 0 11
  23 1 0 1 0 0 1 0 0 0 0 0 1 0  1
  24 1 0 1 0 0 1 0 0 0 0 0 0 1  6
  25 1 0 1 0 0 1 0 0 0 0 0 1 0  1
  26 0 1 0 1 1 0 0 0 0 0 0 0 1  7
  27 1 0 0 1 0 0 0 1 0 0 0 1 0  3
  28 1 0 0 1 0 1 0 0 0 0 0 1 0  2
  29 1 0 0 1 1 0 0 0 0 0 0 0 1  7
  30 0 1 1 0 1 0 0 0 0 0 0 1 0 11
end

reshape long variable, i(ID) string
rename variable frequency
destring _j, ignore(_) gen(variable)

* to use -tabplot- you must install it first
* ssc inst tabplot

tabplot clus variable [fw=frequency], bfcolor(none) showval(mlabsize(*0.8) mlabcolor(black))

[Image: clusterbarplot.png]
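
For readers who still want the stacked percent bars asked about in #1, the restructured data make that easier too. The following is a minimal, untested sketch rather than part of the original answer. It assumes the long layout produced by the reshape above, and it introduces a hypothetical helper variable vargroup whose 1-2 / 3-4 / 5-11 / 12-13 split is taken from the relabel() list in #3.

```stata
* Sketch only: assumes the long layout created by the reshape above, i.e.
* one row per ID and variable, with the 0/1 indicator in frequency and the
* variable number (1-13) in variable.

* vargroup is a hypothetical helper; the grouping follows relabel() in #3.
generate byte vargroup = cond(variable <= 2, 1, ///
    cond(variable <= 4, 2, cond(variable <= 11, 3, 4)))
label define vargroup 1 "group_1" 2 "group_2" 3 "group_3" 4 "group_4"
label values vargroup vargroup

* One stacked bar per variable group, one panel per cluster.
* asyvars turns the categories of the first over() into the stacked pieces;
* percentages rescales each stacked bar to sum to 100.
graph bar (sum) frequency, over(variable) over(vargroup) asyvars stack ///
    percentages by(clus_11_ward_gower)
```

The /// line continuations work in a do-file, not when typed into the Command window. The legend is left on here because 13 stacked colours are hard to read without it, which is the decoding problem described above.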

Here's extra code for a seriation. Experts on correspondence analysis might well quibble here, but I think the main idea is sound. Type search labmask in Stata to find a download location.

Code:

ca variable clus [fw=freq]
predict rowscore, row(1)
predict colscore, col(1)
egen new_variable = group(rowscore variable)
label var new_variable "variable"
labmask new_variable, values(variable)
egen new_cluster = group(colscore clus)
label var new_cluster "clus_11_ward_gower"
labmask new_cluster, values(clus)
tabplot new_clus new_variable [fw=frequency], bfcolor(none) showval(mlabsize(*0.8) mlabcolor(black))

[Image: clusterbarplot2.png]
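
The seriation code needs labmask, which is a community-contributed command. As far as I know, it is also distributed in the labutil package on SSC, so a one-time install along these lines should be an alternative to using search:

```stata
* Assumption: labmask is installed as part of the labutil package on SSC.
ssc install labutil

* Confirm that Stata can now find the command.
which labmask
```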

Last edited by Nick Cox; 29 Mar 2016, 12:10.
