Bar Chart comparing sample and subgroup

Tim Krause

Join Date: Jan 2022

Posts: 20
#1


26 Oct 2022, 01:11

Dear Statalisters,

I think this should be easy, but I can't find a way to do it, so I hope you can help me.

I would like to create a bar chart (showing percentages) of a categorical variable (cat_var1) with four categories. Then I would like to compare the whole sample with a subgroup (a matched, i.e. merged, dataset).

Let's say the graph should look like this:

Bar1: category 1 of cat_var1 (subgroup)
Bar2: category 1 of cat_var1 (whole sample)
Bar3: category 2 of cat_var1 (subgroup)
Bar4: category 2 of cat_var1 (whole sample)
Bar5: category 3 of cat_var1 (subgroup)
Bar6: category 3 of cat_var1 (whole sample)
Bar7: category 4 of cat_var1 (subgroup)
Bar8: category 4 of cat_var1 (whole sample)

thanks and best regards!!
Tim K.

Nick Cox

Join Date: Mar 2014
Posts: 35758

26 Oct 2022, 01:44

You don't give a data example, contrary to FAQ Advice #12. See https://www.statalist.org/forums/help#stata

This works for the auto data. Note that with 8 bars (as in your case) the bar labels on graph bar are unlikely to be easy to read, hence graph hbar. The recipe should be similar for your case.

If you only want to do this once, compiling a little dataset with two variables (an indicator for subgroup versus total, and the frequencies) may be as simple as is needed.

Code:

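* white-background colour scheme
set scheme s1color 

sysuse auto, clear

* one observation per foreign x rep78 combination, with its count in _freq
contract foreign rep78 

* total count in each rep78 category, over both values of foreign
egen _total = total(_freq), by(rep78)

list, sepby(foreign)

* foreign == 1 plays the subgroup; the foreign == 0 rows are reused to carry the totals
gen frequency = cond(foreign == 1, _freq, cond(foreign == 0, _total, .))

* so value 0 of foreign (value label origin) is relabelled as Total
label def origin 0 Total, modify 

graph hbar (asis) frequency, over(foreign, descending) over(rep78) ysc(alt) ytitle(Frequency) name(G1, replace)

* split frequency into frequency0 and frequency1 so that each group gets its own colour
separate frequency, by(foreign) veryshortlabel 

list, sepby(foreign) 

graph hbar (asis) frequency?, nofill over(foreign, descending) over(rep78) ysc(alt) ytitle(Frequency) name(G2, replace) legend(off)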

[Figure: krause_G1.png]

[Figure: krause_G2.png]
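The "little dataset" route mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the counts are the rep78 by foreign tabulation of the auto data with missing rep78 left out, typed in by hand, and the variable names group and frequency as well as the graph name G1b are made up for the example.

```stata
* Sketch: type the counts in directly instead of using contract.
* group == 0 holds the total for each rep78 category,
* group == 1 holds the count for the subgroup (here: foreign cars).
clear
input rep78 group frequency
1 0  2
1 1  0
2 0  8
2 1  0
3 0 30
3 1  3
4 0 18
4 1  9
5 0 11
5 1  9
end

* direct labels for the two groups
label define group 0 "Total" 1 "Foreign"
label values group group

graph hbar (asis) frequency, over(group, descending) over(rep78) ///
    ysc(alt) ytitle(Frequency) name(G1b, replace)
```

For a one-off graph this avoids contract and egen altogether; the price is that the counts have to be retyped if the data change.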


Tim Krause

Join Date: Jan 2022

Posts: 20
#3

26 Oct 2022, 02:19

Thanks a lot Nick!!! This works fine, but is there no way to do this with percentages?
Nick Cox

Join Date: Mar 2014

Posts: 35758
#4

26 Oct 2022, 02:41

You did say percentages in #1 but that didn't register, so my fault, yet at the same time I have difficulty seeing that as a good idea. Are you asking for subgroup bars that are less than or equal to 100%, paired with total bars that are all 100%???

That can be done: just calculate the percents from the frequencies, but I am reluctant to post code for a poor design.
Tim Krause

Join Date: Jan 2022

Posts: 20
#5

26 Oct 2022, 02:53

Sorry, I think I did not explain it well.

Here is my cat_var (whole dataset):

Cat_var
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   79    |       2412      23.36      23.36      23.36
        88    |       3702      35.86      35.86      59.22
        91    |       3008      29.14      29.14      88.36
        96    |       1202      11.64      11.64     100.00
        Total |      10324     100.00     100.00
-----------------------------------------------------------

And here is my cat_var for the subgroup:

cat_var if subgroup == 1
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   79    |        336      10.55      10.55      10.55
        88    |       2015      63.25      63.25      73.79
        91    |        730      22.91      22.91      96.70
        96    |        105       3.30       3.30     100.00
        Total |       3186     100.00     100.00
-----------------------------------------------------------

So I want to compare them like this:

Cat_var
-----------------------------------------------------------------------
                              |      Freq.    Percent      Valid       Cum.
------------------------------+----------------------------------------
Valid 79 (whole dataset) bar1 |       2412      23.36      23.36      23.36
      79 (subgroup==1)   bar2 |        336      10.55      10.55      10.55
and so on....

What I want to say is something like: in the total sample category 79 is 23.36 percent, in the subgroup only 10.55 percent, and so on.

Nick Cox

Join Date: Mar 2014
Posts: 35758

26 Oct 2022, 03:22

Thanks for the extra detail. I think I understand better. Still no data example, but consider this recipe:

Code:

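set scheme s1color 

sysuse auto, clear

* zero keeps foreign x rep78 combinations that do not occur, with _freq == 0
contract foreign rep78, zero 

* percent of the whole sample in each rep78 category
egen _pc = total(_freq), by(rep78)
su _freq, meanonly 
replace _pc = 100 * _pc / r(sum)

* percent of the subgroup (foreign == 1) in each rep78 category
egen _subpc = total(_freq * (foreign == 1)), by(rep78)
su _freq if foreign == 1, meanonly 
replace _subpc = 100 * _subpc / r(sum)

list, sepby(foreign)

* subgroup percents go on the foreign == 1 rows, whole-sample percents on the foreign == 0 rows
gen percent = cond(foreign == 1, _subpc, cond(foreign == 0, _pc, .))

* so value 0 of foreign (value label origin) is relabelled as Total
label def origin 0 Total, modify 

graph hbar (asis) percent, over(foreign, descending) over(rep78) ysc(alt) ytitle(Percent) name(G3, replace)

* split percent into percent0 and percent1 so that each group gets its own colour
separate percent, by(foreign) veryshortlabel 

graph hbar (asis) percent?, nofill over(foreign, descending) over(rep78) ysc(alt) ytitle(Percent) name(G4, replace) legend(off)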
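The same idea can be tried directly on the counts posted in #5, without the full dataset. This is a sketch only: the frequencies are copied from the two tables in #5, while the variable names group, freq, grouptotal and percent, the value labels and the graph name G5 are assumptions made for the example.

```stata
* Counts from #5: group == 0 is the whole dataset, group == 1 the subgroup
clear
input cat_var group freq
79 0 2412
79 1  336
88 0 3702
88 1 2015
91 0 3008
91 1  730
96 0 1202
96 1  105
end

label define group 0 "Whole sample" 1 "Subgroup"
label values group group

* percent within each group: 100 * count / group total
egen grouptotal = total(freq), by(group)
gen percent = 100 * freq / grouptotal

list, sepby(cat_var) noobs

* descending puts the subgroup bar before the whole-sample bar in each category
graph hbar (asis) percent, over(group, descending) over(cat_var) ///
    ysc(alt) ytitle(Percent) blabel(bar, format(%4.1f)) name(G5, replace)
```

The listed percents should agree with the Percent columns in #5, for example 100 * 336 / 3186 = 10.55 for category 79 in the subgroup.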


Tim Krause

Join Date: Jan 2022

Posts: 20
#7

26 Oct 2022, 04:07

Thanks so much Nick! It works perfectly. The only problem I can't find a solution for is how to remove the value labels (0, 1)...
Nick Cox

Join Date: Mar 2014

Posts: 35758
#8

26 Oct 2022, 06:42

Why would you want to do that? I think a data example is now utterly essential for me to suggest any further code.
Tim Krause

Join Date: Jan 2022

Posts: 20
#9

26 Oct 2022, 07:41

Here's my graph. I just want to remove the 0 and 1 on the left side, because you can already see them in the legend. I think it looks better that way.

Code:
graph hbar (asis) percent?, nofill over(merge, descending) over(LB19_Fördersatz) ///
    ysc(alt) name(G4, replace) ///
    blabel(bar, pos(outside) size(2.5) color(black) format(%2.0f)) ytitle("") ylabel(0 "0%" 20 "20%" 40 "40%" 60 "60%" 80 "80%") ///
    title("") legend(pos(bottom) cols(5)) xsize(7)
Nick Cox

Join Date: Mar 2014

Posts: 35758
#10

26 Oct 2022, 07:58

I think that is a terrible idea. You should improve the value labels and lose the legend instead. Sorry, but my personal rule is that I won't suggest code for something that I think is a terrible idea.

(Still no data example, as requested and explained in #2: that would be for anyone else who has a different view on this. Anyone answering doesn't need the full dataset, just the table counts.)

Last edited by Nick Cox; 26 Oct 2022, 08:01.
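The suggestion to improve the value labels and lose the legend can be sketched on the auto data as below. It reuses the recipe from earlier in the thread; the label texts "All cars" and "Foreign cars only" and the graph name G6 are assumptions made for the example, not anything from the poster's data.

```stata
sysuse auto, clear
contract foreign rep78, zero

* whole-sample percent per rep78 category
egen _pc = total(_freq), by(rep78)
su _freq, meanonly
replace _pc = 100 * _pc / r(sum)

* subgroup percent on the foreign == 1 rows, whole-sample percent otherwise
su _freq if foreign == 1, meanonly
gen percent = cond(foreign == 1, 100 * _freq / r(sum), _pc)

* informative value labels shown next to the bars, in place of 0 and 1
label define origin 0 "All cars" 1 "Foreign cars only", modify

separate percent, by(foreign) veryshortlabel

* the direct labels make the legend redundant, so it is switched off
graph hbar (asis) percent?, nofill over(foreign, descending) over(rep78) ///
    ysc(alt) ytitle(Percent) legend(off) name(G6, replace)
```

With the poster's own data, the equivalent step would be to define and attach value labels for the 0/1 group variable before drawing the graph.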
Tim Krause

Join Date: Jan 2022

Posts: 20
#11

27 Oct 2022, 00:49

Hey Nick, it's me again. Thanks a lot for your help! The table with counts should be in #5?

Maybe it will help if I post the graph as produced by the code and the one I made with the Graph Editor (this is the one I would like to produce by code).

Here is the one from the code (as you said, without legend):

But I think this one (with legend) is better:
Nick Cox

Join Date: Mar 2014

Posts: 35758
#12

27 Oct 2022, 01:24

Sorry to disappoint, but I haven't changed my mind. The use of a legend obliges the reader to memorise some arbitrary colour distinction, or else to keep referring to it. To paraphrase Penny in The Big Bang Theory, I speak on behalf of all readers -- we had a meeting -- please don't do that. Use direct labels.

Anyone who disagrees is welcome to provide code for you.
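For readers who nevertheless want the design asked for in #7 (legend kept, category labels of the inner over() variable suppressed), one possibility is the nolabels suboption inside label() of over(). This is a hedged sketch on the auto data, not a recommendation from the thread: it compares domestic with foreign cars rather than total with subgroup to keep the setup short, and the names grouptotal, percent and G7 are made up for the example.

```stata
sysuse auto, clear
contract foreign rep78, zero

* percent of each rep78 category within domestic and within foreign cars
egen grouptotal = total(_freq), by(foreign)
gen percent = 100 * _freq / grouptotal

* separate variables give each group its own colour and legend entry
separate percent, by(foreign) veryshortlabel

* label(nolabels) drops the text next to each bar; the legend then identifies the groups
graph hbar (asis) percent?, nofill ///
    over(foreign, descending label(nolabels)) over(rep78) ///
    ysc(alt) ytitle(Percent) legend(pos(6) cols(2)) name(G7, replace)
```

The trade-off is the one described in #12: without direct labels the reader has to match bar colours against the legend.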