  • bar charts with percentages and standard errors

    Hello, I have mental health survey data and would like to graph the prevalence (percent of persons affected) of PTSD and psychosis by age category (agecat, 1 to 6). I would also like error bars or 95% CIs.
    I have been looking at the graph bar command, but it returns bars that are all 100%. I presume these are percentages of non-missing values, which is not what I want.
    Can someone assist?
    Thanks.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(PTSD Psychosis agecat)
    0 0 5
    0 0 2
    0 0 3
    0 0 2
    1 0 2
    0 0 5
    0 0 5
    0 0 2
    0 0 3
    0 0 4
    0 0 1
    0 0 1
    0 0 4
    1 0 1
    1 0 2
    0 0 3
    0 0 5
    1 0 1
    0 0 6
    0 0 6
    0 0 4
    0 0 3
    0 0 3
    0 0 2
    0 0 2
    0 0 6
    0 0 5
    1 0 5
    1 0 1
    0 0 5
    0 0 6
    0 0 2
    0 1 1
    0 0 2
    0 0 1
    0 0 2
    0 0 2
    0 0 3
    0 0 1
    0 0 6
    0 0 2
    0 0 1
    0 0 6
    0 0 2
    0 0 4
    0 0 2
    0 0 1
    0 0 6
    0 0 1
    0 0 3
    0 0 4
    0 0 2
    0 0 1
    0 0 2
    0 0 1
    0 0 4
    0 0 2
    0 0 1
    0 0 3
    0 0 4
    0 0 1
    0 0 1
    0 0 5
    0 0 1
    1 0 5
    0 0 6
    0 0 6
    0 0 6
    0 0 4
    0 0 1
    0 0 2
    0 0 2
    0 0 4
    0 0 1
    0 0 1
    0 0 3
    0 0 2
    1 0 3
    0 0 4
    0 0 5
    0 0 2
    0 0 3
    0 0 3
    0 0 4
    0 0 1
    1 0 2
    0 0 2
    1 0 2
    0 0 4
    0 0 4
    0 0 1
    0 0 1
    0 0 5
    0 0 3
    0 0 2
    0 0 3
    0 0 2
    0 0 1
    0 0 1
    1 0 5
    end

  • #2
    I can't comment easily on the code you used, as you didn't show it. But graph bar is no use if you want to show confidence intervals too, if that is what you tried.

    There are many alternatives using twoway, which have been discussed in many threads here, but for flexibility you need to calculate the means and confidence intervals before you try to plot them.

    I use here cisets from SSC for the first calculation, as discussed at https://www.statalist.org/forums/for...-interval-sets. Once you have results, graphics is relatively easy.

    But, but, but: confidence interval calculation is far from obvious with data like yours. I use the jeffreys option, which is one of several good choices.
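    As a rough illustration of what a Jeffreys interval is, the limits can be sketched by hand as quantiles of a Beta(x + 0.5, n - x + 0.5) distribution, where x is the number affected and n the number of non-missing observations. The sketch below assumes the data example from #1 is in memory and looks only at PTSD in age category 1; the local macro names are just for illustration. The official ci proportions command (Stata 14 or later) is shown for comparison, and its results should be close to the hand calculation, although its handling of boundary cases such as x = 0 may differ.

    ```stata
    * Sketch only: assumes the -dataex- example from #1 is in memory
    * n = non-missing observations, x = number affected, in age category 1
    count if agecat == 1 & !missing(PTSD)
    local n = r(N)
    count if agecat == 1 & PTSD == 1
    local x = r(N)

    * Jeffreys limits as 2.5% and 97.5% quantiles of Beta(x + 0.5, n - x + 0.5)
    display "lower: " invibeta(`x' + 0.5, `n' - `x' + 0.5, 0.025)
    display "upper: " invibeta(`x' + 0.5, `n' - `x' + 0.5, 0.975)

    * Official command for comparison
    ci proportions PTSD if agecat == 1, jeffreys
    ```

    One consequence worth noting: even when x is 0, the upper limit from the Beta quantile is above 0, so the interval still says something about how large the prevalence could plausibly be.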

    There are many other things to change. For serious work, showing subset sizes would be essential. How to do that is discussed in the thread just linked.

    Code:
    * Data example exactly as in #1 (clear, input ..., end); run that first
    
    cisets proportions PTSD, over(agecat) jeffreys saving(PTSD)
    
    cisets proportions Psychosis, over(agecat) jeffreys saving(Psychosis) 
    
    use Psychosis, clear 
    
    append using PTSD 
    
    scatter point origgvar, mc(stc1) xtitle(Age category) ///
    || rspike ub lb origgvar, lc(stc1) by(varname, legend(off) ///
    note("means and 95% confidence intervals" "Jeffreys procedure")) ///
    xla(1/6) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50" .6 "60") ytitle(% prevalence)
    [Figure: PTSD.png]



    • #3
      Thank you very much. If I just want a simple bar graph of prevalence (in percent) by age category, what would the syntax be?



      • #4
        If I understand you correctly, you're asking for a bar plus error bar concoction, often known as a dynamite, detonator or plunger plot, and now widely deprecated. See for example:

        https://onlinelibrary.wiley.com/doi/10.1111/aab.12734

        https://simplystatistics.org/posts/2...lots-must-die/

        If you want that, you can get it using twoway bar for the bars, but I recommend against it.
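
        For completeness, a minimal sketch of that twoway bar route follows. It assumes the appended cisets results from #2 are in memory, with the variable names used there (point, ub, lb, origgvar, varname); the bar width and axis titles are arbitrary choices. It is a sketch of the syntax, not a recommendation.

        ```stata
        * Sketch only: assumes the appended cisets results from #2 are in memory
        * Bars for the point estimates, capped spikes for the interval limits
        twoway bar point origgvar, barwidth(0.6) ///
        || rcap ub lb origgvar, by(varname, legend(off)) ///
        xla(1/6) xtitle(Age category) ytitle("Proportion affected")
        ```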



        • #5
          OK, thank you for the references. I will refrain from using dynamite plots.



          • #6
            It's often worth working harder to show sample sizes explicitly. Same data example, revised code.

            Code:
            cisets proportions PTSD, over(agecat) jeffreys saving(PTSD, replace)
            
            cisets proportions Psychosis, over(agecat) jeffreys saving(Psychosis, replace) 
            
            use Psychosis, clear 
            
            append using PTSD 
            
            su ub, meanonly 
            gen where = -r(max)/20 
            gen toshow = "{it: n = }" + strofreal(n)
            
            scatter point origgvar, mc(stc1) xtitle(Age category) ///
            || scatter where origgvar, ms(none) mla(toshow) mlabc(black) mlabpos(0) mlabsize(medium) ///
            || rspike ub lb origgvar, lc(stc1) by(varname, legend(off) ///
            note("means and 95% confidence intervals" "Jeffreys procedure")) ///
            xla(1/6) yli(0, lc(gs8) lp(solid)) yla(0 .1 "10" .2 "20" .3 "30" .4 "40" .5 "50" .6 "60") xsc(r(0.5 6.5)) ytitle(% prevalence)
            [Figure: ptsd2.png]



            • #7
              An extra specific point with these data is that some prevalence values are zero. Presumably the data example of 100 observations is just an example, but even in a full and larger dataset some values seem, on this evidence, likely to be small.

              The graphical principle is very simple. Bars of zero height are difficult to spot and bars with very small heights are not much better. That all adds weight to the idea that point estimates are better shown by prominent point or marker symbols.
              Last edited by Nick Cox; Today, 04:00.
