Display values (with error bars) for several percentiles in a bar graph

Chris Boulis

Join Date: Feb 2019
Posts: 368

04 Aug 2021, 04:27

I would like to create a bar graph (with error bars) that displays the values of finr on the y-axis for two categories of a multi-category variable (vtype), with separate bars for the 0/1 values of a dummy (at3), at each of these percentiles (5, 10, 25, 50, 75, 90, 95) displayed along the x-axis. I can get close to what I want with:

Code:

graph bar (p5) finr (p10) finr (p25) finr (p50) finr (p75) finr (p90) finr (p95) finr ///
if inlist(vtype, 2, 3), over(at3) over(vtype) yti("") yla(0(500000)1500000) ti(Financial assets, size(small)) ///
legend(lab(1 "5th") lab(2 "10th") lab(3 "25th") lab(4 "50th") lab(5 "75th") lab(6 "90th") lab(7 "95th") col(7))

though I still need to add the error bars. I also tried to replace the (busy) legend with x-axis labels but Stata responded with "xlabels(1 5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") not allowed, " invalid name (r(198));". I came across -cibar- (ssc install cibar), and while it provides confidence intervals, I could not find a way to add percentiles to the x-axis.

Finally, I found -statsby-, mentioned by Nick Cox in https://www.statalist.org/forums/for...way-bar-graphs and explained in https://www.stata-journal.com/sjpdf....iclenum=gr0045. Please refer to my initial explanation and the -graph bar- code above to understand my intent. The following is an initial attempt:

Code:

statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) upper=r(ub) mean=r(mean) lower=r(lb) if inlist(vtype, 2, 3), ///
by(vtype at3) saving(finr_ci, replace) 
twoway rcap ub lb finr || scatter mean finr, yti() xti("percentiles") xla(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") legend(off) ///
subtitle(95% confidence intervals for mean, place(w))

Sample data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float finr byte(at3 vtype)
117951.12 0 2
 235799.2 1 3
 208530.8 1 3
305687.22 0 3
253412.14 0 3
 413083.6 0 3
280980.72 0 3
28540.285 0 3
  323.646 0 3
223375.38 1 3
147438.67 1 3
 379603.6 0 3
 715984.1 0 3
 515115.4 0 3
581753.56 0 3
 458876.3 0 3
   534214 1 3
 367631.5 1 3
 490251.7 1 3
111402.04 1 3
118890.35 1 3
 95747.05 1 3
 611670.6 1 3
   772532 1 2
1085729.4 1 2
2158285.5 1 2
 82337.68 0 2
105099.09 0 2
 118543.2 0 2
239144.95 1 2
   253273 1 2
 208199.4 1 2
 125484.6 1 2
 365277.2 1 2
 64402.76 1 2
 11397.52 1 2
 204755.6 1 2
 194343.6 1 2
   179749 1 3
end


ericmelse

Join Date: May 2014
Posts: 434

04 Aug 2021, 08:31

Dear Chris,

The community-contributed package pshare by Ben Jann may be useful for your purpose.
You can read about it in his Stata Journal paper, which can be downloaded from here.
Here is some rather crude code, written to give you an example:

Code:

* Preliminaries
ssc install pshare , replace
which pshare
*! version 1.2.8  14jun2018  Ben Jann
h pshare

* Continue running the code below after data input as in #1
tokenize vtype_2_at3_1 vtype_2_at3_2 vtype_3_at3_1 vtype_3_at3_2
forvalue i = 2/3 {
    forvalue j = 1/2 {
    pshare estimate finr if vtype==`i' & at3==`j' , percentiles(5 10 25 50 75 90) density vce(bootstrap)
    pshare histogram, yline(1) aspect(1) ysize(5) xsize(5) graphr(margin(t-1 b-3 l-3 r-2)) ylab(, angle(none)) xtitle(population percentage (`1') , margin(t+1 l-3 r+3)) ytitle(, margin(r+1))legend(off) name(perc`i'`j')
    macro shift    
    }
}
graph combine perc21 perc22 perc31 perc32 , ysize(10) xsize(10) colf rows(2) imargin(t-1 b-1 l-4 r-2) iscale(*.8)

This results in the following figure:

[Figure: several_percentiles_in_a_bar_graph.png]

More refinements are possible but that is for you to explore further, should this be what you are looking for.

http://publicationslist.org/eric.melse


Chris Boulis

Join Date: Feb 2019

Posts: 368
#3

04 Aug 2021, 17:58

Thank you ericmelse. Is there a way to have the bars for vtype 2 and vtype 3 in one graph when at3==1 (and when at3==2)? They would then share the same scale, which would make it easier to compare the two groups.

ericmelse

Join Date: May 2014
Posts: 434

04 Aug 2021, 22:06

Dear Chris,

I have no quick solution for combining the bars into a single plot, but it should be possible with more coding.
Another solution is to control the y-axis scale, ticks and labels explicitly, like this:

Code:

tokenize vtype_2_at3_1 vtype_2_at3_2 vtype_3_at3_1 vtype_3_at3_2
forvalue i = 2/3 {
    if `i'==2 {
        local setYaxis "yscale(range(-.2 6)) ylab(0(1)6, angle(horizontal)) ymtick(.1(.1)5.9) "
    }
    else {
        local setYaxis "yscale(range(-.23 3)) ylab(0(1)3, angle(0)) ymtick(.1(.1)2.9) "
    }
    forvalue j = 1/2 {
    pshare estimate finr if vtype==`i' & at3==`j' , percentiles(5 10 25 50 75 90) density vce(bootstrap)
    pshare histogram, yline(1) aspect(1) ysize(5) xsize(5) graphr(margin(t-1 b-3 l-3 r-2)) `setYaxis' xtitle(population percentage (`1') , margin(t+1 l-3 r+3)) ytitle(, margin(r+1))legend(off) name(perc`i'`j')
    macro shift    
    }
}
graph combine perc21 perc22 perc31 perc32 , ysize(10) xsize(10) colf rows(2) imargin(t-1 b-1 l-4 r-2) iscale(*.8)

This produces:

[Figure: several_percentiles_in_a_bar_graph_scaled.png]

http://publicationslist.org/eric.melse


Chris Boulis

Join Date: Feb 2019

Posts: 368
#5

05 Aug 2021, 05:15

Hi ericmelse. Thank you for your suggestions. I appreciate your time responding. I hope to achieve an outcome similar to that using -graph bar- in #1.

Chris Boulis

Join Date: Feb 2019
Posts: 368

05 Aug 2021, 06:01

Note: I have updated my code in #1 using -statsby- to address a couple of silly errors:

Code:

statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) ub=r(ub) mean=r(mean) lb=r(lb) if inlist(vtype, 2, 3), by(at3 vtype) subsets total: summarize finr , detail 
twoway rcap ub lb finr if inlist(vtype, 2, 3) || bar mean finr if inlist(vtype, 2, 3) yti(mean) xti("percentiles") ///
xla(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") legend(off) ///
subtitle(95% confidence intervals for mean, place(w)) saving(finr_ci, replace)

However, after running the first full line, Stata responds with:

Code:

. statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) ub=r(ub) mean=r(mean) lb=r(lb), by(at3 vtype) subsets total: summarize finr , detail 
no; data in memory would be lost
r(4)

Any suggestions on how this can be amended to reproduce the graph from the -graph bar- code in #1? Nick Cox, if you have time, could you please share some of your wisdom?

Stata v.15.1. Using panel data.
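
A note on the r(4) message above: unless saving() is given, -statsby- replaces the data in memory with the collected results, and it stops with r(4) when the current data have unsaved changes and clear was not specified. The following is only a minimal sketch, assuming the example data from #1 (finr, at3, vtype) are in memory. The preserve/restore pair is an addition for illustration so that the original data come back afterwards. Note that -summarize, detail- returns the percentiles and the mean but no confidence limits, so ub and lb are left out here.

```stata
* Assumes finr, at3 and vtype from the example in #1 are in memory.
preserve

* clear tells statsby it may replace the data in memory with the results
statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) ///
        p90=r(p90) p95=r(p95) mean=r(mean) n=r(N), ///
        by(vtype at3) clear: summarize finr if inlist(vtype, 2, 3), detail

* One row per vtype/at3 group, one column per statistic
list, sepby(vtype) noobs

* Bring back the original data
restore
```

With the example data this should give one row for each of the four vtype/at3 combinations.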


Chris Boulis

Join Date: Feb 2019
Posts: 368

05 Aug 2021, 20:23

Update: It appears I was partially successful in running the first part of the code using the sample data set provided in #1 - Stata output:

Code:

. statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) ub=r(ub) m
> ean=r(mean) lb=r(lb) ///
> if inlist(vtype, 2, 3) , by(at3 vtype) clear: ci mean finr
(running ci on estimation sample)

      command:  ci mean finr if inlist(vtype, 2, 3)
           p5:  r(p5)
          p10:  r(p10)
          p25:  r(p25)
          p50:  r(p50)
          p75:  r(p75)
          p90:  r(p90)
          p95:  r(p95)
           ub:  r(ub)
         mean:  r(mean)
           lb:  r(lb)
           by:  at3 vtype

Statsby groups
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
....

end of do-file

However, while I have values for ub, lb and mean, there are no values for the percentiles:

Code:

. list
     +---------------------------------------------------------------------------------------+
     | at3   vtype   p5   p10   p25   p50   p75   p90   p95         ub       mean         lb |
     |---------------------------------------------------------------------------------------|
  1. |   0       2    .     .     .     .     .     .     .   132937.9   105982.8   79027.67 |
  2. |   0       3    .     .     .     .     .     .     .   503634.8   357578.3   211521.6 |
  3. |   1       2    .     .     .     .     .     .     .   863378.8   473568.8    83758.8 |
  4. |   1       3    .     .     .     .     .     .     .   390818.8   277058.3   163297.9 |
     +---------------------------------------------------------------------------------------+

When I run the first piece of code to graph using twoway:

Code:

twoway rcap ub lb finr if inlist(vtype, 2, 3), by(at3 vtype)

Stata responds with "variable finr not found", although I included data for finr in the sample data set in #1. It seems that -graph bar- is my friend, as I am unable to get -statsby- to do what I need, at least not without more time (which I am short of).
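
Two things appear to explain the output above. -ci mean- returns r(mean), r(lb) and r(ub) but no percentiles, so p5 to p95 come out missing. And after -statsby ..., clear- the data in memory are the collected results, so finr no longer exists. One possible route to percentiles with error bars is -centile-, which returns each requested centile together with confidence limits in r(c_#), r(lb_#) and r(ub_#). What follows is a rough sketch, not a tested solution for the real data: it assumes the example data from #1 are in memory, the names pct, est, lo and hi are made up for illustration, and it uses by() panels rather than the single-graph layout of the -graph bar- call in #1. With groups as small as those in the example, the intervals will be very wide.

```stata
* Assumes finr, at3 and vtype from the example in #1 are in memory.
tempname h
tempfile pctiles
postfile `h' vtype at3 pct est lo hi using `pctiles'

foreach v in 2 3 {
    foreach a in 0 1 {
        * centile leaves r(c_k), r(lb_k), r(ub_k) for the k-th requested centile
        quietly centile finr if vtype == `v' & at3 == `a', ///
            centile(5 10 25 50 75 90 95)
        forvalues k = 1/7 {
            post `h' (`v') (`a') (`k') (r(c_`k')) (r(lb_`k')) (r(ub_`k'))
        }
    }
}
postclose `h'

preserve
use `pctiles', clear

* Bars for the point estimates, capped spikes for the confidence limits;
* pct runs 1..7, so xlabel() can attach the percentile names
twoway bar est pct, barwidth(0.8) || rcap hi lo pct, ///
    by(vtype at3, note("") legend(off)) ///
    xlabel(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") ///
    xtitle("percentiles") ytitle("finr")

restore
```

Because -twoway- has a numeric x-axis, xlabel() is allowed here, which is not the case for the categorical axis of -graph bar-.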
