  • Display values (with error bars) for several percentiles in a bar graph

    I would like to create a bar graph (with error bars) that displays the values of finr on the y-axis for two categories of a multi-category variable (vtype), split by the 0/1 values of a dummy (at3), at each of these percentiles (5, 10, 25, 50, 75, 90, 95) displayed along the x-axis. I can get close to what I want with:
    Code:
    graph bar (p5) finr (p10) finr (p25) finr (p50) finr (p75) finr (p90) finr (p95) finr ///
    if inlist(vtype, 2, 3), over(at3) over(vtype) yti("") yla(0(500000)1500000) ti(Financial assets, size(small)) ///
    legend(lab(1 "5th") lab(2 "10th") lab(3 "25th") lab(4 "50th") lab(5 "75th") lab(6 "90th") lab(7 "95th") col(7))
    though I still need to add the error bars. I also tried to replace the (busy) legend with x-axis labels but Stata responded with "xlabels(1 5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") not allowed, " invalid name (r(198));". I came across -cibar- (ssc install cibar), and while it provides confidence intervals, I could not find a way to add percentiles to the x-axis.
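
-graph bar- rejects xlabel() because its x axis is categorical rather than numeric. One possible workaround, sketched here, is the ascategory option, which treats the seven statistics as a category axis so they can be relabelled through yvaroptions(relabel()) instead of a legend. The simulated data and the graph name pctbars are hypothetical stand-ins; with the real data only the -graph bar- call is needed. The sketch still has no error bars.

```stata
* Hypothetical stand-in data; replace with the real finr, vtype and at3
clear
set seed 2021
set obs 200
generate byte vtype   = cond(runiform() < 0.5, 2, 3)
generate byte at3     = runiform() < 0.5
generate double finr  = exp(rnormal(12, 1))

* ascategory turns the seven statistics into axis categories;
* yvaroptions(relabel()) names them, so no legend is needed
graph bar (p5) finr (p10) finr (p25) finr (p50) finr (p75) finr (p90) finr (p95) finr ///
    if inlist(vtype, 2, 3), ascategory ///
    yvaroptions(relabel(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") ///
                label(labsize(vsmall))) ///
    over(at3) over(vtype) ytitle("") title("Financial assets", size(small)) ///
    name(pctbars, replace)
```

With 28 bars the category labels get crowded, hence the small label size; plotting vtype 2 and vtype 3 as separate graphs would be another option.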

    Finally, I found -statsby-, mentioned by Nick Cox in https://www.statalist.org/forums/for...way-bar-graphs and explained in https://www.stata-journal.com/sjpdf....iclenum=gr0045. Please refer to my initial explanation and code using -graph bar- to understand my intent; the following is an initial attempt:
    Code:
    statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) upper=r(ub) mean=r(mean) lower=r(lb) if inlist(vtype, 2, 3), ///
    by(vtype at3) saving(finr_ci, replace) 
    twoway rcap ub lb finr || scatter mean finr, yti() xti("percentiles") xla(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") legend(off) ///
    subtitle(95% confidence intervals for mean, place(w))
    Sample data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float finr byte(at3 vtype)
    117951.12 0 2
     235799.2 1 3
     208530.8 1 3
    305687.22 0 3
    253412.14 0 3
     413083.6 0 3
    280980.72 0 3
    28540.285 0 3
      323.646 0 3
    223375.38 1 3
    147438.67 1 3
     379603.6 0 3
     715984.1 0 3
     515115.4 0 3
    581753.56 0 3
     458876.3 0 3
       534214 1 3
     367631.5 1 3
     490251.7 1 3
    111402.04 1 3
    118890.35 1 3
     95747.05 1 3
     611670.6 1 3
       772532 1 2
    1085729.4 1 2
    2158285.5 1 2
     82337.68 0 2
    105099.09 0 2
     118543.2 0 2
    239144.95 1 2
       253273 1 2
     208199.4 1 2
     125484.6 1 2
     365277.2 1 2
     64402.76 1 2
     11397.52 1 2
     204755.6 1 2
     194343.6 1 2
       179749 1 3
    end

  • #2
    Dear Chris,

    Possibly useful for your purpose is the community contributed package pshare of Ben Jann.
    You can read about it in his The Stata Journal paper that can be downloaded from here.
    Here is my rather crude code written to provide you with an example:
    Code:
    * Preliminaries
    ssc install pshare , replace
    which pshare
    *! version 1.2.8  14jun2018  Ben Jann
    h pshare
    
    * Continue running the code below after data input as in #1
    tokenize vtype_2_at3_1 vtype_2_at3_2 vtype_3_at3_1 vtype_3_at3_2
    forvalue i = 2/3 {
        forvalue j = 1/2 {
            pshare estimate finr if vtype==`i' & at3==`j' , percentiles(5 10 25 50 75 90) density vce(bootstrap)
            pshare histogram, yline(1) aspect(1) ysize(5) xsize(5) graphr(margin(t-1 b-3 l-3 r-2)) ylab(, angle(none)) xtitle(population percentage (`1') , margin(t+1 l-3 r+3)) ytitle(, margin(r+1)) legend(off) name(perc`i'`j')
            macro shift
        }
    }
    graph combine perc21 perc22 perc31 perc32 , ysize(10) xsize(10) colf rows(2) imargin(t-1 b-1 l-4 r-2) iscale(*.8)
    This results in the following figure:

    [Figure: several_percentiles_in_a_bar_graph.png]

    More refinements are possible but that is for you to explore further, should this be what you are looking for.
    http://publicationslist.org/eric.melse

    • #3
      Thank you ericmelse. Is there a way to have the bars for vtype 2 and vtype 3 in one graph when at3==1 (and in another when at3==2), so that they share the same scale and the two groups are easier to compare?

      • #4
        Dear Chris,

        I have no quick solution to combine the bars into a single plot but it should be possible to do it with more coding.
        Another solution is to set the y-axis scale, ticks and labels explicitly, like:
        Code:
        tokenize vtype_2_at3_1 vtype_2_at3_2 vtype_3_at3_1 vtype_3_at3_2
        forvalue i = 2/3 {
            if `i'==2 {
                local setYaxis "yscale(range(-.2 6)) ylab(0(1)6, angle(horizontal)) ymtick(.1(.1)5.9) "
            }
            else {
                local setYaxis "yscale(range(-.23 3)) ylab(0(1)3, angle(0)) ymtick(.1(.1)2.9) "
            }
            forvalue j = 1/2 {
                pshare estimate finr if vtype==`i' & at3==`j' , percentiles(5 10 25 50 75 90) density vce(bootstrap)
                pshare histogram, yline(1) aspect(1) ysize(5) xsize(5) graphr(margin(t-1 b-3 l-3 r-2)) `setYaxis' xtitle(population percentage (`1') , margin(t+1 l-3 r+3)) ytitle(, margin(r+1)) legend(off) name(perc`i'`j')
                macro shift
            }
        }
        graph combine perc21 perc22 perc31 perc32 , ysize(10) xsize(10) colf rows(2) imargin(t-1 b-1 l-4 r-2) iscale(*.8)
        This produces:
        [Figure: several_percentiles_in_a_bar_graph_scaled.png]
        http://publicationslist.org/eric.melse

        • #5
          Hi ericmelse. Thank you for your suggestions. I appreciate your time responding. I hope to achieve an outcome similar to that using -graph bar- in #1.

          • #6
            Note: I have updated my code in #1 using -statsby- to address a couple of silly errors:
            Code:
            statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) ub=r(ub) mean=r(mean) lb=r(lb) if inlist(vtype, 2, 3), by(at3 vtype) subsets total: summarize finr , detail 
            twoway rcap ub lb finr if inlist(vtype, 2, 3) || bar mean finr if inlist(vtype, 2, 3) yti(mean) xti("percentiles") ///
            xla(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") legend(off) ///
            subtitle(95% confidence intervals for mean, place(w)) saving(finr_ci, replace)
            However, after running the first full line, Stata responds with:
            Code:
            . statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) ub=r(ub) mean=r(mean) lb=r(lb), by(at3 vtype) subsets total: summarize finr , detail 
            no; data in memory would be lost
            r(4)
            Any suggestions on how this can be amended to reproduce the graph from the -graph bar- code in #1? Nick Cox, if you have time, could you please share some of your wisdom?

            Stata v.15.1. Using panel data.
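
The r(4) message arises because -statsby- replaces the data in memory with its results and refuses to do so unless clear or saving() is specified. A minimal sketch of one way around it follows, assuming simulated stand-in data (hypothetical, not the panel data). Since -summarize, detail- leaves the percentiles and r(mean), r(sd) and r(N) behind but no r(lb) or r(ub), the 95% confidence limits for the mean are rebuilt by hand here.

```stata
* Hypothetical stand-in data; replace with the real finr, vtype and at3
clear
set seed 2021
set obs 200
generate byte vtype   = cond(runiform() < 0.5, 2, 3)
generate byte at3     = runiform() < 0.5
generate double finr  = exp(rnormal(12, 1))

* -clear- allows statsby to overwrite the data in memory
* (use saving(filename, replace) instead to keep the original data loaded)
statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) ///
    mean=r(mean) sd=r(sd) n=r(N), by(at3 vtype) clear: summarize finr, detail

* summarize returns no r(lb)/r(ub): compute the t-based 95% CI for the mean
generate double se = sd / sqrt(n)
generate double lb = mean - invttail(n - 1, 0.025) * se
generate double ub = mean + invttail(n - 1, 0.025) * se

list at3 vtype p5 p50 p95 lb mean ub, noobs
```

Wrapping the -statsby- call and any graph commands in -preserve- and -restore- would bring the original data back afterwards.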

            • #7
              Update: It appears I was partially successful in running the first part of the code using the sample data set provided in #1 - Stata output:
              Code:
              . statsby p5=r(p5) p10=r(p10) p25=r(p25) p50=r(p50) p75=r(p75) p90=r(p90) p95=r(p95) ub=r(ub) m
              > ean=r(mean) lb=r(lb) ///
              > if inlist(vtype, 2, 3) , by(at3 vtype) clear: ci mean finr
              (running ci on estimation sample)
              
                    command:  ci mean finr if inlist(vtype, 2, 3)
                         p5:  r(p5)
                        p10:  r(p10)
                        p25:  r(p25)
                        p50:  r(p50)
                        p75:  r(p75)
                        p90:  r(p90)
                        p95:  r(p95)
                         ub:  r(ub)
                       mean:  r(mean)
                         lb:  r(lb)
                         by:  at3 vtype
              
              Statsby groups
              ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
              ....
              
              end of do-file
              However, while I have values for ub, lb and mean, there are no values for the percentiles:
              Code:
              . list
                   +---------------------------------------------------------------------------------------+
                   | at3   vtype   p5   p10   p25   p50   p75   p90   p95         ub       mean         lb |
                   |---------------------------------------------------------------------------------------|
                1. |   0       2    .     .     .     .     .     .     .   132937.9   105982.8   79027.67 |
                2. |   0       3    .     .     .     .     .     .     .   503634.8   357578.3   211521.6 |
                3. |   1       2    .     .     .     .     .     .     .   863378.8   473568.8    83758.8 |
                4. |   1       3    .     .     .     .     .     .     .   390818.8   277058.3   163297.9 |
                   +---------------------------------------------------------------------------------------+
              When I run the first piece of code to graph using twoway:
              Code:
              twoway rcap ub lb finr if inlist(vtype, 2, 3), by(at3 vtype)
              Stata displays "variable finr not found", although I included data for finr in the sample data set in #1. It seems that -graph bar- is my friend, as I am unable to get -statsby- to do what I need, at least not without more time (which I am short of).
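
Both symptoms above have a likely explanation: -ci mean- returns r(mean), r(lb) and r(ub) but no percentiles, so p5 to p95 come back missing, and after -statsby, clear- the data in memory hold one row per group, so finr no longer exists. A possible alternative, sketched here under stated assumptions, is to run -centile- under -statsby-, which leaves each requested percentile in r(c_#) with its confidence limits in r(lb_#) and r(ub_#). Note that these are intervals for the percentiles, not for the mean. The simulated data and the names stats, rank and x are hypothetical.

```stata
* Hypothetical stand-in data; replace with the real finr, vtype and at3
clear
set seed 2021
set obs 200
generate byte vtype   = cond(runiform() < 0.5, 2, 3)
generate byte at3     = runiform() < 0.5
generate double finr  = exp(rnormal(12, 1))

* Build "p5=r(c_1) lb5=r(lb_1) ub5=r(ub_1) ..." for statsby
local pcts 5 10 25 50 75 90 95
local stats
local k 0
foreach p of local pcts {
    local ++k
    local stats `stats' p`p'=r(c_`k') lb`p'=r(lb_`k') ub`p'=r(ub_`k')
}
statsby `stats', by(vtype at3) clear: centile finr, centile(`pcts')

* One row per group and percentile
reshape long p lb ub, i(vtype at3) j(pct)

* Evenly spaced x positions 1..7, with the at3 bars side by side
egen rank = group(pct)
generate x = rank + cond(at3 == 1, 0.2, -0.2)

twoway (bar p x if at3 == 0, barwidth(0.4)) ///
       (bar p x if at3 == 1, barwidth(0.4)) ///
       (rcap ub lb x), ///
       by(vtype, note("")) ///
       xlabel(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") ///
       xtitle("percentiles") ytitle("finr") ///
       legend(order(1 "at3 = 0" 2 "at3 = 1" 3 "95% CI"))
```

With very small groups, as in the sample data in #1, -centile- holds the limits for extreme percentiles at the sample minimum or maximum, so those error bars should be read with care.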
