
  • Grouped Percentage Bar Graph

    Hi Everyone,

    I am trying to make a stacked bar graph but am having some trouble. I have three groups of observations ("random women", "random men", and "male partners") and several conditions (I kept c1, c2, and c3 in the sample data). Each observation answers "more", "fewer", or "fixed" for each condition.

    The main problem is getting the conditions into the mix. I don't have a "condition" variable, so I don't really know how to put this into the graph hbar command.

    The ideal is to have a grouped percentage bar graph similar to this, with bars grouped by three rather than two, and conditions on the left being the c1, c2, c3, etc.:

    Fig1.png

    I am using Stata MP 16.0.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(c1 c2 c3 group)
    2 3 3 2
    3 2 3 2
    3 3 3 2
    2 2 3 2
    3 3 3 2
    3 3 3 0
    1 3 3 0
    2 2 3 0
    3 3 3 0
    3 3 3 0
    3 3 3 0
    2 3 3 0
    2 2 2 0
    2 2 3 0
    3 3 3 0
    2 3 3 0
    2 3 3 0
    3 3 1 0
    3 3 3 0
    2 3 3 1
    2 3 3 1
    2 3 3 1
    2 2 2 1
    3 3 3 1
    2 2 3 1
    2 3 3 1
    3 3 3 1
    3 3 3 1
    3 3 3 1
    2 3 3 1
    end
    label values c1 number
    label values c2 number
    label values c3 number
    label def number 1 "More", modify
    label def number 2 "Fewer", modify
    label def number 3 "Fixed", modify
    label values group groupl
    label def groupl 0 "Random Women", modify
    label def groupl 1 "Random Men", modify
    label def groupl 2 "Male Partner", modify
    Thank you in advance!
    Last edited by Iris Zhao; 05 Sep 2020, 01:49.

  • #2
    I don't think I understand the mapping needed for your data example but your image suggests a need for a graph with

    * three possible outcomes

    * one predictor has two categories

    * another predictor has several categories.

    The problems with your desired design are manifold:

    * Rare categories mean short bars, fair enough, but it is then hard to show percents readably

    * Categories that don't occur have to be inferred from the absence of any bar at all: sometimes you won't care about that, and sometimes you will

    * The fact that stacked percents add to 100 is banal as a consequence of their definition; it's not informative about the data. I often suggest plotting bars separately, as discussed at more length in https://www.statalist.org/forums/for...updated-on-ssc and https://www.stata-journal.com/articl...article=gr0066 Then it's easier to compare within categories of the outcome.

    I created a loosely similar problem using one of Stata's standard datasets.

    Warnings:

    * You need to install community-contributed commands from SSC for any of this to work. They all are based on official Stata code so in principle there is a way to draw each graph directly. If any reader has installed any of those, you should not need to re-install.

    * Although I show final code, your own data will likely oblige different tweaking of the code to create extra space at the edges of the graph and to move the text labels in tabplot.
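
    On the first warning, here is a minimal sketch that installs from SSC only what is not already found on your adopath. It assumes each package provides a command of the same name, as the calls to mycolours, catplot and tabplot in this thread suggest.

```stata
* Sketch only: install each community-contributed package if -which- cannot find it
foreach pkg in mycolours catplot tabplot {
    capture which `pkg'
    if _rc ssc install `pkg'
}
```

    Running this before the code in this post should make the three ssc install lines there redundant, but harmless.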

    Code:
    webuse nlswork, clear
    keep if ind_code <= 7
    
    set scheme s1color
    
    ssc install mycolours
    ssc install catplot
    ssc install tabplot
    
    mycolours
    
    catplot race msp ind_code, percent(ind_code msp) horizontal asyvars stack ysize(9) bar(1, lcolor("`ora'"*2) fcolor("`ora'"*0.3))   bar(3, lcolor("`sky'"*2) fcolor("`sky'"*0.3)) bar(2, lcolor("`bla'") fcolor("`bla'"*0.3)) blabel(bar, format(%2.1f) pos(base)) ysc(r(0, 103)) legend(row(1)) ysc(alt) l1title(industrial code and msp) name(G1, replace)
    
    tabplot msp race , by(ind_code, compact col(1) note("")) percent(ind_code msp) horizontal ysize(9) separate(race) bar1(lcolor("`ora'"*2) fcolor("`ora'"*0.3))   bar3(lcolor("`sky'"*2) fcolor("`sky'"*0.3)) bar2(lcolor("`bla'") fcolor("`bla'"*0.3)) showval(offset(0.15) mlabsize(medsmall)) subtitle(, pos(9) fcolor(none) nobox nobexpand) ytitle(industrial code and msp) xsc(r(0.8, 3.2)) name(G2, replace)
    With a stacked design, there is scope for flexibility about where the legend goes and where the axis labels go (and even for deciding that you don't need axis labels if you show percents). But as said there is some ugliness in showing percents for rare categories, even with one decimal place, not two.
    stackortab_g1.png



    I much prefer this design using tabplot myself. For other data, I would expect to see value labels echoed in the graph.
    stackortab_g2.png

    Last edited by Nick Cox; 05 Sep 2020, 03:38.



    • #3
      Thank you Nick! Sorry for the late response. I agree with you that the one using tabplot shows the comparison more clearly (especially the category in the middle).

      "I don't think I understand the mapping needed for your data example"
      The problem I have here is to transform the dataset into something like this:

      WeChat Image_20200906223037.png

      I have tried to generate dummy variables to help this transformation but it doesn't seem to work.

      This is the code I am using:
      Code:
      tab c1, gen(c1_)
      tab c2, gen(c2_)
      tab c3, gen(c3_)
      
      gen id=_n 
      expand 3
      sort id
      
      foreach x of varlist c1_1 c2_1 c3_1 {
          replace answer=1 if answer==. & `x'==1
      }
      
      foreach x of varlist c1_2 c2_2 c3_2 {
          replace answer=1 if answer==. & `x'==1
      }
      
      foreach x of varlist c1_3 c2_3 c3_3 {
          replace answer=1 if answer==. & `x'==1
      }
      I guess I need to specify the condition variable first, but I do not know how to assign a repeating number (1, 2, 3) within each id. I tried
      Code:
      egen condition = group (id)
      but it only copies the id variable.
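
      For the repeating counter itself, here is a minimal sketch on a made-up four-row dataset (not the data in #1). After expand, by id: with _n numbers the copies 1, 2, 3 within each id. The variable name condition follows the attempt above.

```stata
* Sketch only: toy data to show a within-id counter
clear
set obs 4
gen long id = _n

* three copies of each observation, one per condition
expand 3

* _n under by: restarts at 1 within each id, so condition runs 1, 2, 3
bysort id: gen byte condition = _n

list, sepby(id)
```

      egen's group() gives one number per distinct value of id, which is why it only reproduced id.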



      • #4
        Seems to me that the order should be More, Fixed, Fewer or the reverse. You may find that


        Code:
        gen long id = _n
        reshape long c, i(id) j(which)
        gets you closer to where you want to be. If you already have an identifier, you won't need to create one.
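
        A sketch of how the reshaped data might then feed a stacked percent bar graph, using only official commands (contract and graph hbar) rather than catplot or tabplot. The six input rows are a subset copied from the example in #1; the names answer, n, total and pct are made up for illustration.

```stata
* Sketch only: a few rows in the same layout as the dataex example in #1
clear
input byte(c1 c2 c3 group)
2 3 3 2
3 2 3 2
3 3 3 0
1 3 3 0
2 3 3 1
2 2 2 1
end
label define number 1 "More" 2 "Fewer" 3 "Fixed"
label define groupl 0 "Random Women" 1 "Random Men" 2 "Male Partner"
label values group groupl

* one row per observation and condition: c1-c3 become c, indexed by which
gen long id = _n
reshape long c, i(id) j(which)
rename c answer
label values answer number
label define which 1 "c1" 2 "c2" 3 "c3"
label values which which

* frequencies, then percents within each condition-by-group cell
contract which group answer, freq(n)
bysort which group: egen total = total(n)
gen pct = 100 * n / total

* stacked bars: answers as segments, groups within conditions
graph hbar (asis) pct, over(answer) over(group) over(which) asyvars stack ytitle("percent")
```

        With the full data, answers that never occur in a cell simply have no segment, which is the absent-bar issue raised in #2.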



        • #5
          Thank you so much Nick! Exactly what I needed.
