Grouped Percentage Bar Graph

Iris Zhao

Join Date: Sep 2020

Posts: 5
#1

05 Sep 2020, 01:43

Hi Everyone,

I am trying to make a stacked bar graph but am having some trouble. I have three groups of observations ("random women", "random men", and "male partners") and several conditions (I kept c1, c2, and c3 in the sample data). Each observation answers "more", "fewer", or "fixed" for each condition.

The main problem is getting the conditions into the mix. I don't have a "condition" variable, so I don't really know how to put this into the graph hbar command.

The ideal is to have a grouped percentage bar graph similar to this, with bars grouped by three rather than two, and conditions on the left being the c1, c2, c3, etc.:

I am using Stata MP 16.0.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(c1 c2 c3 group)
2 3 3 2
3 2 3 2
3 3 3 2
2 2 3 2
3 3 3 2
3 3 3 0
1 3 3 0
2 2 3 0
3 3 3 0
3 3 3 0
3 3 3 0
2 3 3 0
2 2 2 0
2 2 3 0
3 3 3 0
2 3 3 0
2 3 3 0
3 3 1 0
3 3 3 0
2 3 3 1
2 3 3 1
2 3 3 1
2 2 2 1
3 3 3 1
2 2 3 1
2 3 3 1
3 3 3 1
3 3 3 1
3 3 3 1
2 3 3 1
end
label values c1 number
label values c2 number
label values c3 number
label def number 1 "More", modify
label def number 2 "Fewer", modify
label def number 3 "Fixed", modify
label values group groupl
label def groupl 0 "Random Women", modify
label def groupl 1 "Random Men", modify
label def groupl 2 "Male Partner", modify

Thank you in advance!

Last edited by Iris Zhao; 05 Sep 2020, 01:49.
Tags: graph
Nick Cox

Join Date: Mar 2014

Posts: 35726
#2

05 Sep 2020, 03:33

I don't think I understand the mapping needed for your data example but your image suggests a need for a graph with

* three possible outcomes

* one predictor has two categories

* another predictor has several categories.

The problems with your desired design are manifold

* Rare categories mean short bars, fair enough, but it is then hard to show percents readably

* Categories that don't occur have to be inferred from the absence of any bar at all: sometimes you won't care about that, and sometimes you will

* The fact that stacked percents add to 100 is banal as a consequence of their definition; it's not informative about the data. I often suggest plotting bars separately as discussed at more length in https://www.statalist.org/forums/for...updated-on-ssc and https://www.stata-journal.com/articl...article=gr0066 Then it's easier to compare within categories of the outcome.

I created a loosely similar problem using one of Stata's standard datasets.

Warnings:

* You need to install community-contributed commands from SSC for any of this to work. They all are based on official Stata code so in principle there is a way to draw each graph directly. If any reader has installed any of those, you should not need to re-install.

* Although I show final code, your own data will likely oblige different tweaking of the code to create extra space at the edges of the graph and to move the text labels in tabplot.
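For readers who would rather not re-install commands they already have, here is a minimal sketch of a guard around the installs. It assumes only the three SSC packages named in this post and uses official -which- to test whether each command is already on the ado-path. The loop variable pkg is just an illustrative name.

```stata
* Install each community-contributed command only if Stata cannot find it.
* -which- exits with a nonzero return code when the command is not installed.
foreach pkg in mycolours catplot tabplot {
    capture which `pkg'
    if _rc {
        ssc install `pkg'
    }
}
```

With this in place, the plain ssc install lines in the code can be skipped on later runs.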

Code:

webuse nlswork, clear
keep if ind_code <= 7
set scheme s1color

ssc install mycolours
ssc install catplot
ssc install tabplot

mycolours

catplot race msp ind_code, percent(ind_code msp) horizontal asyvars stack ysize(9) ///
    bar(1, lcolor("`ora'"*2) fcolor("`ora'"*0.3)) ///
    bar(3, lcolor("`sky'"*2) fcolor("`sky'"*0.3)) ///
    bar(2, lcolor("`bla'") fcolor("`bla'"*0.3)) ///
    blabel(bar, format(%2.1f) pos(base)) ysc(r(0, 103)) legend(row(1)) ysc(alt) ///
    l1title(industrial code and msp) name(G1, replace)

tabplot msp race , by(ind_code, compact col(1) note("")) percent(ind_code msp) ///
    horizontal ysize(9) separate(race) ///
    bar1(lcolor("`ora'"*2) fcolor("`ora'"*0.3)) ///
    bar3(lcolor("`sky'"*2) fcolor("`sky'"*0.3)) ///
    bar2(lcolor("`bla'") fcolor("`bla'"*0.3)) ///
    showval(offset(0.15) mlabsize(medsmall)) ///
    subtitle(, pos(9) fcolor(none) nobox nobexpand) ///
    ytitle(industrial code and msp) xsc(r(0.8, 3.2)) name(G2, replace)

With a stacked design, there is scope for flexibility about where the legend goes and where the axis labels go (and even for deciding that you don't need axis labels if you show percents). But as said there is some ugliness in showing percents for rare categories, even with one decimal place, not two.

I much prefer this design using tabplot myself. For other data, I would expect to see value labels echoed in the graph.

Last edited by Nick Cox; 05 Sep 2020, 03:38.
Iris Zhao

Join Date: Sep 2020

Posts: 5
#3

06 Sep 2020, 08:37

Thank you Nick! Sorry for the late response. I agree with you that the one using tabplot shows the comparison more clearly (especially the category in the middle).

You wrote: "I don't think I understand the mapping needed for your data example"

The problem I have here is to transform the dataset into something like this

I have tried to generate dummy variables to help this transformation but it doesn't seem to work.

This is the code I am using:

Code:

tab c1, gen(c1_)
tab c2, gen(c2_)
tab c3, gen(c3_)
gen id=_n
expand 3
sort id
foreach x of varlist c1_1 c2_1 c3_1 {
    replace answer=1 if answer==. & `x'==1
}
foreach x of varlist c1_2 c2_2 c3_2 {
    replace answer=1 if answer==. & `x'==1
}
foreach x of varlist c1_3 c2_3 c3_3 {
    replace answer=1 if answer==. & `x'==1
}

I guess I need to specify the condition variable first, but I do not know how to assign a looped number for each id. I tried

Code:

egen condition = group (id)

but it only copies the id variable.
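A note on why that happens, with a small sketch: egen's group() function assigns one integer to each distinct value of its argument, so applied to an id that already runs 1, 2, 3, ... it simply reproduces id. A counter that restarts within each id can instead be built with by: and _n. The toy data below (4 ids, no answers) are purely illustrative and are not the posted dataset.

```stata
* Toy data: 4 people, each expanded to 3 rows (one per condition)
clear
set obs 4
gen long id = _n
expand 3

* group() numbers the distinct values of id, so it just mirrors id
egen byte copy_of_id = group(id)

* _n under by: counts rows within each id, giving 1, 2, 3 for every person
bysort id: gen byte condition = _n

list id copy_of_id condition, sepby(id)
```

This only creates the looped counter; the answers for each condition would still have to be filled in row by row, which is the step that reshape long handles in one go.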
Nick Cox

Join Date: Mar 2014

Posts: 35726
#4

06 Sep 2020, 09:38

Seems to me that the order should be More Fixed Fewer or the reverse. You may find that

Code:

gen long id = _n
reshape long c , i(id) j(which)

gets you closer to where you want to be. If you already have an identifier, you won't need to create one.
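Here is a minimal end-to-end sketch of that suggestion, assuming a six-row subset of the data example in #1 (two rows per group) and catplot from SSC as used in #2. The names answer and condition are illustrative renames of c and which. The catplot call mirrors the stacked design in #2 and may need tweaking for real data.

```stata
* Six rows taken from the data example in #1, for illustration only
clear
input byte(c1 c2 c3 group)
2 3 3 2
3 2 3 2
3 3 3 0
1 3 3 0
2 3 3 1
3 3 3 1
end
label define number 1 "More" 2 "Fewer" 3 "Fixed"
label define groupl 0 "Random Women" 1 "Random Men" 2 "Male Partner"
label values group groupl

* reshape long needs an identifier for each original row
gen long id = _n

* c1 c2 c3 share the stub c; the suffix 1, 2, 3 is stored in which
reshape long c, i(id) j(which)
label values c number

* Illustrative names only: the answer given and the condition it refers to
rename c answer
rename which condition

* catplot is community-contributed (SSC); install it only if it is missing
capture which catplot
if _rc ssc install catplot

* Stacked percent bars: answers within each group, for each condition
catplot answer group condition, percent(condition group) asyvars stack ///
    legend(row(1)) name(G3, replace)
```

After the reshape there is one row per person and condition, so condition can be used like any other categorical variable in catplot or tabplot.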
Iris Zhao

Join Date: Sep 2020

Posts: 5
#5

06 Sep 2020, 22:08

Thank you so much Nick! Exactly what I needed.