bar charts with percentages and standard errors

mathieu nacher

Join Date: Jan 2019
Posts: 41

10 Jun 2025, 11:28

Hello, I have mental health survey data and I would like to graph the prevalence (as a percentage of persons affected) of PTSD and psychosis by age category (agecat, coded 1 to 6). I would also like error bars or 95% CIs.
I have been looking at the graph bar command, which returns bars that are all 100%. I presume these are the non-missing observations, which is not what I want.
Can someone assist?
Thanks

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(PTSD Psychosis agecat)
0 0 5
0 0 2
0 0 3
0 0 2
1 0 2
0 0 5
0 0 5
0 0 2
0 0 3
0 0 4
0 0 1
0 0 1
0 0 4
1 0 1
1 0 2
0 0 3
0 0 5
1 0 1
0 0 6
0 0 6
0 0 4
0 0 3
0 0 3
0 0 2
0 0 2
0 0 6
0 0 5
1 0 5
1 0 1
0 0 5
0 0 6
0 0 2
0 1 1
0 0 2
0 0 1
0 0 2
0 0 2
0 0 3
0 0 1
0 0 6
0 0 2
0 0 1
0 0 6
0 0 2
0 0 4
0 0 2
0 0 1
0 0 6
0 0 1
0 0 3
0 0 4
0 0 2
0 0 1
0 0 2
0 0 1
0 0 4
0 0 2
0 0 1
0 0 3
0 0 4
0 0 1
0 0 1
0 0 5
0 0 1
1 0 5
0 0 6
0 0 6
0 0 6
0 0 4
0 0 1
0 0 2
0 0 2
0 0 4
0 0 1
0 0 1
0 0 3
0 0 2
1 0 3
0 0 4
0 0 5
0 0 2
0 0 3
0 0 3
0 0 4
0 0 1
1 0 2
0 0 2
1 0 2
0 0 4
0 0 4
0 0 1
0 0 1
0 0 5
0 0 3
0 0 2
0 0 3
0 0 2
0 0 1
0 0 1
1 0 5
end

Nick Cox

Join Date: Mar 2014
Posts: 35637

10 Jun 2025, 12:00

I can't comment easily on the code you used, as you didn't show it. But graph bar is no use for showing confidence intervals too, if that is what you tried.

There are many alternatives using twoway, which have been discussed in many threads here, but for flexibility you need to calculate the means and confidence intervals before you try to plot them.

I use here cisets from SSC for the first calculation, as discussed at https://www.statalist.org/forums/for...-interval-sets. Once you have results, graphics is relatively easy.

But, but, but: confidence interval calculation is far from obvious with data like yours. I use the jeffreys option, which is one of several good choices.

There are many other things to change. For serious work, showing subset sizes would be essential. How to do that is discussed in the thread just linked.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(PTSD Psychosis agecat)
0 0 5
0 0 2
0 0 3
0 0 2
1 0 2
0 0 5
0 0 5
0 0 2
0 0 3
0 0 4
0 0 1
0 0 1
0 0 4
1 0 1
1 0 2
0 0 3
0 0 5
1 0 1
0 0 6
0 0 6
0 0 4
0 0 3
0 0 3
0 0 2
0 0 2
0 0 6
0 0 5
1 0 5
1 0 1
0 0 5
0 0 6
0 0 2
0 1 1
0 0 2
0 0 1
0 0 2
0 0 2
0 0 3
0 0 1
0 0 6
0 0 2
0 0 1
0 0 6
0 0 2
0 0 4
0 0 2
0 0 1
0 0 6
0 0 1
0 0 3
0 0 4
0 0 2
0 0 1
0 0 2
0 0 1
0 0 4
0 0 2
0 0 1
0 0 3
0 0 4
0 0 1
0 0 1
0 0 5
0 0 1
1 0 5
0 0 6
0 0 6
0 0 6
0 0 4
0 0 1
0 0 2
0 0 2
0 0 4
0 0 1
0 0 1
0 0 3
0 0 2
1 0 3
0 0 4
0 0 5
0 0 2
0 0 3
0 0 3
0 0 4
0 0 1
1 0 2
0 0 2
1 0 2
0 0 4
0 0 4
0 0 1
0 0 1
0 0 5
0 0 3
0 0 2
0 0 3
0 0 2
0 0 1
0 0 1
1 0 5
end

cisets proportions PTSD, over(agecat) jeffreys saving(PTSD)

cisets proportions Psychosis, over(agecat) jeffreys saving(Psychosis) 

use Psychosis, clear 

append using PTSD 

scatter point origgvar, mc(stc1) xtitle(Age category) ///
|| rspike ub lb origgvar, lc(stc1) by(varname, legend(off) ///
note("means and 95% confidence intervals" "Jeffreys procedure")) ///
xla(1/6) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50" .6 "60") ytitle(% prevalence)

[Image: PTSD.png]
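
A note for readers without cisets installed: as a minimal sketch, the same kind of interval for a single group can be obtained from official Stata, assuming a version that has cii proportions with the jeffreys option (Stata 14 or later, as far as I know). The counts below (3 events among 25 people) are made up for illustration and are not taken from the data example. The invibeta() lines show the beta-quantile form usually given for the Jeffreys interval; whether cisets computes exactly this internally is an assumption.

```stata
* Illustrative counts only: 3 events among 25 people (hypothetical)
scalar n = 25
scalar k = 3

* Official immediate command: number of observations first, then number of events
cii proportions 25 3, jeffreys
scalar lb_cii = r(lb)
scalar ub_cii = r(ub)

* Same idea by hand: quantiles of a Beta(k + 0.5, n - k + 0.5) distribution
scalar lb = invibeta(k + 0.5, n - k + 0.5, 0.025)
scalar ub = invibeta(k + 0.5, n - k + 0.5, 0.975)

display "point estimate: " %5.3f k/n
display "Jeffreys limits by hand:  " %5.3f lb "  " %5.3f ub
display "Jeffreys limits from cii: " %5.3f lb_cii "  " %5.3f ub_cii
```

With the data example in memory, something like by agecat, sort: ci proportions PTSD, jeffreys should give the by-group equivalents, although the results are then displayed rather than saved as a dataset ready for plotting.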

mathieu nacher

Join Date: Jan 2019

Posts: 41
#3

10 Jun 2025, 13:12

Thank you very much. If I just want a simple bar graph of prevalence (in percent) by age category, what would the syntax be?

Nick Cox

Join Date: Mar 2014

Posts: 35637
#4

10 Jun 2025, 15:22

If I understand you correctly, you're asking for a bar plus error bar concoction, often known as a dynamite, detonator or plunger plot, and now widely deprecated, as witness, for example:

https://onlinelibrary.wiley.com/doi/10.1111/aab.12734

https://simplystatistics.org/posts/2...lots-must-die/

If you want that, you can get it using twoway bar for the bars, but I recommend against it.
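
For completeness only, and not as a recommendation, here is a minimal sketch of the twoway bar route. The small results dataset is entirely made up to stand in for saved estimates and limits (the variable names point, lb and ub are borrowed from the code in #2, and only three age categories are shown), so none of the numbers come from the data example.

```stata
* Hypothetical results: made-up proportions and limits for three age categories
clear
input byte agecat float(point lb ub)
1 .12 .04 .27
2 .15 .06 .31
3 .05 .01 .20
end

* Rescale to percent so the axis reads directly as % prevalence
foreach v in point lb ub {
    replace `v' = 100 * `v'
}

* Bars for the point estimates, capped spikes for the interval limits
twoway bar point agecat, barwidth(0.6) ///
    || rcap ub lb agecat, lc(black) ///
    legend(off) xla(1/3) xtitle(Age category) ytitle(% prevalence)
```

Rescaling to percent here is just one choice; the code in #2 instead keeps proportions and relabels the axis with yla().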
mathieu nacher

Join Date: Jan 2019

Posts: 41
#5

10 Jun 2025, 21:15

OK, thank you for the references. I will refrain from using dynamite plots.

Nick Cox

Join Date: Mar 2014
Posts: 35637

Yesterday, 09:57

It's often worth working harder to show sample sizes explicitly. Same data example, revised code.

Code:

cisets proportions PTSD, over(agecat) jeffreys saving(PTSD, replace)

cisets proportions Psychosis, over(agecat) jeffreys saving(Psychosis, replace) 

use Psychosis, clear 

append using PTSD 

su ub, meanonly 
gen where = -r(max)/20 
gen toshow = "{it: n = }" + strofreal(n)

scatter point origgvar, mc(stc1) xtitle(Age category) ///
|| scatter where origgvar, ms(none) mla(toshow) mlabc(black) mlabpos(0) mlabsize(medium) ///
|| rspike ub lb origgvar, lc(stc1) by(varname, legend(off) ///
note("means and 95% confidence intervals" "Jeffreys procedure")) ///
xla(1/6) yli(0, lc(gs8) lp(solid)) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50" .6 "60") xsc(r(0.5 6.5)) ytitle(% prevalence)

[Image: ptsd2.png]

Nick Cox

Join Date: Mar 2014

Posts: 35637
#7

Today, 03:04

An extra specific point with these data is that some prevalence values are zeros. Presumably the data example of 100 observations is just an example, but even in a full and larger dataset some values seem, on this evidence, likely to be small.

The graphical principle is very simple. Bars of zero height are difficult to spot and bars with very small heights are not much better. That all adds weight to the idea that point estimates are better shown by prominent point or marker symbols.
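
To make the point about zeros concrete, here is a small sketch with made-up counts (0 events among 20 people, not taken from the data example), assuming an official Stata version in which cii proportions accepts the jeffreys option. The point estimate is exactly zero, so a bar would have no height at all, while the upper limit is still positive, so a marker plus spike remains visible.

```stata
* Hypothetical subgroup: 0 events among 20 people
cii proportions 20 0, jeffreys

* r() results from cii are still available here
display "point = " r(proportion) "   upper limit = " %5.3f r(ub)
```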

Last edited by Nick Cox; Today, 04:00.