  • 'graph bar' with percentages

    Dear all,

    I'm trying to get a bar chart with an overview of the gender proportions within different age groups giving care.

    Doing:
    Code:
    tab        age_cat2        if    r_caregiver==1
        tab    age_cat2        if    r_caregiver==1 & female_r==0
        tab    age_cat2        if    r_caregiver==1 & female_r==1
    (Note: r_caregiver is a sample definition, female_r the sex variable, and age_cat2 the categorical age variable.)

    The results are:
    Code:
    . tab             age_cat2                if      r_caregiver==1
    
          Age of |
      respondent |
     (caregiver) |
           (cat) |      Freq.     Percent        Cum.
    -------------+-----------------------------------
           40-49 |         18        1.46        1.46
           50-59 |        423       34.39       35.85
           60-69 |        502       40.81       76.67
           70-79 |        244       19.84       96.50
    80 und älter |         43        3.50      100.00
    -------------+-----------------------------------
           Total |      1,230      100.00
    
    .         tab     age_cat2                if      r_caregiver==1 & female_r==0
    
          Age of |
      respondent |
     (caregiver) |
           (cat) |      Freq.     Percent        Cum.
    -------------+-----------------------------------
           40-49 |          4        0.67        0.67
           50-59 |        169       28.36       29.03
           60-69 |        236       39.60       68.62
           70-79 |        158       26.51       95.13
    80 und älter |         29        4.87      100.00
    -------------+-----------------------------------
           Total |        596      100.00
    
    .         tab     age_cat2                if      r_caregiver==1 & female_r==1
    
          Age of |
      respondent |
     (caregiver) |
           (cat) |      Freq.     Percent        Cum.
    -------------+-----------------------------------
           40-49 |         14        2.21        2.21
           50-59 |        254       40.06       42.27
           60-69 |        266       41.96       84.23
           70-79 |         86       13.56       97.79
    80 und älter |         14        2.21      100.00
    -------------+-----------------------------------
           Total |        634      100.00
    I want to display this in a bar graph. The command so far is:
    Code:
    graph bar (percent) if r_caregiver==1, over(female_r) over(age_cat2) asyvars    ///
        bar(1, color(dknavy)) bar(2, color(sand))        ///
        ytitle("") ylabel(0(5)25, angle(horizontal) labsize(medsmall))    ///
        blabel(total)        ///
        legend(rows(1) ring(6) position(6) label(1 "Männer") label(2 "Frauen"))
    It results in:
    v1_age#gender.png
    However, the bar heights and the numbers in the bar labels are not the percentages from the tables above; I guess they are means. Why is that? I've read that with the over() option Stata automatically uses percentages. Apart from that, I specified (percent) in the command, so what did I do wrong?

    Thanks for any hint or idea!

  • #2
    Provide a data example with some explanation of what proportions you need. You could use:

    Code:
    sysuse auto, clear


    • #3
      Andrew, thank you for your reply!

      I want the graph to display the proportions of men and women in the respective age groups giving care. As far as I understand, the percentages given in the conditional tab outputs (see my post above) should carry over to the graph, but they don't. What am I misunderstanding, or what did I do incorrectly in my commands?

      Here is a data example, with the proportions of domestic and foreign cars in different price groups:

      Code:
      sysuse auto, clear
      
      * Variables:
      
      tab        price, m        // age in my analysis
      gen        price_cat = 0    if    price>=3000 & price<4000
      replace price_cat = 1    if    price>=4000 & price<5000
      replace price_cat = 2    if    price>=5000 & price<6000
      replace price_cat = 3    if    price>=6000 & price<7000
      replace price_cat = 4    if    price>=7000 & price<10000
      replace price_cat = 5     if    price>=10000
      lab def    pricecat    0 "3000-3999" 1 "4000-4999" 2 "5000-5999" 3 "6000-6999" 4 "7000-9999" 5 "10000 and more"
      lab val    price_cat pricecat
      tab        price price_cat, m
      tab     price_cat, m    // 'age_cat2' in my analysis
      
      tab     foreign, m        // 'female_r' in my analysis
      
      
      * Distribution:
      
      tab        price_cat
      tab        price_cat     if     foreign==0        // 42.3 % of domestic cars are in price category '4000-4999', whereas ...
      tab        price_cat    if    foreign==1        // ... only 18.2 % of foreign cars are in this price category.
      
      
      * Bar graph:
      
      graph bar, over(foreign) over(price_cat) asyvars    ///
          bar(1, color(dknavy)) bar(2, color(sand))        ///
          ytitle("") ylabel(0(5)30, angle(horizontal) labsize(medsmall))    ///
          blabel(total)        ///
          legend(rows(1) ring(6) position(6) label(1 "Domestic") label(2 "Foreign"))
      This is the output of the tab-commands:

      Code:
      . tab             price_cat
      
           price_cat |      Freq.     Percent        Cum.
      ---------------+-----------------------------------
           3000-3999 |         11       14.86       14.86
           4000-4999 |         26       35.14       50.00
           5000-5999 |         14       18.92       68.92
           6000-6999 |          7        9.46       78.38
           7000-9999 |          6        8.11       86.49
      10000 and more |         10       13.51      100.00
      ---------------+-----------------------------------
               Total |         74      100.00
      
      . tab             price_cat       if      foreign==0             
      
           price_cat |      Freq.     Percent        Cum.
      ---------------+-----------------------------------
           3000-3999 |          7       13.46       13.46
           4000-4999 |         22       42.31       55.77
           5000-5999 |          9       17.31       73.08
           6000-6999 |          4        7.69       80.77
           7000-9999 |          2        3.85       84.62
      10000 and more |          8       15.38      100.00
      ---------------+-----------------------------------
               Total |         52      100.00
      
      . tab             price_cat       if      foreign==1
      
           price_cat |      Freq.     Percent        Cum.
      ---------------+-----------------------------------
           3000-3999 |          4       18.18       18.18
           4000-4999 |          4       18.18       36.36
           5000-5999 |          5       22.73       59.09
           6000-6999 |          3       13.64       72.73
           7000-9999 |          4       18.18       90.91
      10000 and more |          2        9.09      100.00
      ---------------+-----------------------------------
               Total |         22      100.00
      The graph does not show 42.3 percent of domestic cars and 18.2 percent of foreign cars in price category '4000-4999', but 29.7 and 5.4. What are these values, and where do they come from?

      statalist_bsp.png

      I'm sure I'm on the wrong track, but I just can't see where my reasoning goes wrong ...

      Any help would be great! Thanks in advance!
      Last edited by Ariane Arbol; 27 Jul 2022, 06:30.


      • #4
        Generate the percentage variable yourself and plot it. Adjust for missing values below if necessary.

        Code:
        sysuse auto, clear
        
        * Variables:
        
        tab        price, m        // age in my analysis
        gen        price_cat = 0    if    price>=3000 & price<4000
        replace price_cat = 1    if    price>=4000 & price<5000
        replace price_cat = 2    if    price>=5000 & price<6000
        replace price_cat = 3    if    price>=6000 & price<7000
        replace price_cat = 4    if    price>=7000 & price<10000
        replace price_cat = 5     if    price>=10000
        lab def    pricecat    0 "3000-3999" 1 "4000-4999" 2 "5000-5999" 3 "6000-6999" 4 "7000-9999" 5 "10000 and more"
        lab val    price_cat pricecat
        tab        price price_cat, m
        tab     price_cat, m    // 'age_cat2' in my analysis
        
        tab     foreign, m        // 'female_r' in my analysis
        
        
        * Distribution:
        
        tab        price_cat
        tab        price_cat     if     foreign==0        // 42.3 % of domestic cars are in price category '4000-4999', whereas ...
        tab        price_cat    if    foreign==1        // ... only 18.2 % of foreign cars are in this price category.
        
        bys foreign price_cat: egen pct= count(price_cat)
        by foreign: replace pct= (pct/_N)*100
        
        * Bar graph:
        set scheme s1color
        graph bar pct, over(foreign) over(price_cat, label(labsize(small))) asyvars    ///
            bar(1, color(dknavy)) bar(2, color(sand))        ///
            ytitle("Percent") ylabel(0(5)45, angle(horizontal) labsize(medsmall))    ///
            blabel(total, format("%3.2f"))       ///
            legend(rows(1) ring(6) position(6) label(1 "Domestic") label(2 "Foreign"))
        Result:

        Graph.png

        Last edited by Andrew Musau; 27 Jul 2022, 06:52.


        • #5
          Andrew Musau gave a good solution. Here is another, using catplot from SSC. See the first graph below (naturally, the axis labels need some more work, but that's not Ariane's real problem anyway).


          And why didn't the syntax in #3 do what you want? You want % within each value of foreign, but with that syntax you get % over the cross-combinations of the two variables, that is, each cell's share of all observations. The second graph makes the point by using a really simple example to make it easy to compare what you got with what you want.
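
          To make the arithmetic concrete, here is a minimal sketch (not part of the code posted in this thread) that recomputes both kinds of percentage for the '4000-4999' category by hand. The local macro names are illustrative choices only, and irecode() is used merely as a compact stand-in for the gen/replace steps in #3.

          ```stata
          sysuse auto, clear

          * Cell counts for price category 4000-4999 (price is an integer in auto)
          count if inrange(price, 4000, 4999) & foreign == 0
          local n_dom = r(N)          // domestic cars in the category
          count if inrange(price, 4000, 4999) & foreign == 1
          local n_for = r(N)          // foreign cars in the category

          count if foreign == 0
          local N_dom = r(N)          // all domestic cars
          count if foreign == 1
          local N_for = r(N)          // all foreign cars
          count
          local N_all = r(N)          // all cars

          * Percent of ALL cars: the values plotted by the syntax in #3
          display %5.2f 100 * `n_dom' / `N_all'
          display %5.2f 100 * `n_for' / `N_all'

          * Percent WITHIN each value of foreign: the values in the conditional tabs
          display %5.2f 100 * `n_dom' / `N_dom'
          display %5.2f 100 * `n_for' / `N_for'

          * Both sets of percentages in a single two-way table
          generate price_cat = irecode(price, 3999, 4999, 5999, 6999, 9999)
          tabulate price_cat foreign, cell column nofreq
          ```

          With the counts shown in #3 (22 domestic and 4 foreign cars in that category, out of 52, 22 and 74 cars), the first pair corresponds to the 29.7 and 5.4 in the graph, and the second pair to the 42.3 and 18.2 in the tables.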


          Code:
          sysuse auto, clear
          
          * Variables:
          
          tab        price, m        // age in my analysis
          gen        price_cat = 0    if    price>=3000 & price<4000
          replace price_cat = 1    if    price>=4000 & price<5000
          replace price_cat = 2    if    price>=5000 & price<6000
          replace price_cat = 3    if    price>=6000 & price<7000
          replace price_cat = 4    if    price>=7000 & price<10000
          replace price_cat = 5     if    price>=10000
          lab def    pricecat    0 "3000-3999" 1 "4000-4999" 2 "5000-5999" 3 "6000-6999" 4 "7000-9999" 5 "10000 and more"
          lab val    price_cat pricecat
          tab        price price_cat, m
          tab     price_cat, m    // 'age_cat2' in my analysis
          
          tab     foreign, m        // 'female_r' in my analysis
          
          
          * Distribution:
          
          tab        price_cat
          tab        price_cat     if     foreign==0        // 42.3 % of domestic cars are in price category '4000-4999', whereas ...
          tab        price_cat    if    foreign==1        // ... only 18.2 % of foreign cars are in this price category.
          
          * NJC
          
          set scheme s1color
          
          * Bar graph:
          
          * original code: G1 is the original graph from #3 (modulo using the default scheme)
          graph bar, over(foreign) over(price_cat) asyvars    ///
              bar(1, color(dknavy)) bar(2, color(sand))        ///
              ytitle("") ylabel(0(5)30, angle(horizontal) labsize(medsmall))    ///
              blabel(total)        ///
              legend(rows(1) ring(6) position(6) label(1 "Domestic") label(2 "Foreign")) name(G1, replace)
              
          
          catplot foreign price_cat, percent(foreign) asyvars   ///
              bar(1, color(dknavy)) bar(2, color(sand))        ///
              ytitle("") ylabel(0(5)45, angle(horizontal) labsize(medsmall))    ///
              recast(bar) blabel(total, format(%3.2f))    ///
              legend(rows(1) ring(6) position(6) label(1 "Domestic") label(2 "Foreign")) name(G2, replace)
              
              
          clear
          
          set obs 10
          egen price_cat = seq(), to(5)
          gen foreign = _n > 5
          
          graph bar, over(foreign) over(price_cat) asyvars    ///
              bar(1, color(dknavy)) bar(2, color(sand))        ///
              ytitle("") ylabel(0(5)30, angle(horizontal) labsize(medsmall))    ///
              blabel(total)        ///
              legend(rows(1) ring(6) position(6) label(1 "Domestic") label(2 "Foreign")) name(G3, replace)
          arbolG2.png

          arbolG3.png

          Last edited by Nick Cox; 27 Jul 2022, 09:06.


          • #6
            Andrew and Nick, thank you so much for your reply and explanation!

            I was able to reproduce both solutions with the 'auto' dataset, but unfortunately only the 'catplot' command worked with my real dataset. I guess I made a mistake somewhere in the 'graph bar' command Andrew suggested, but as I'm under a little time pressure and 'catplot' worked fine, I will sort that out later.

            Thanks again for the help!
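
            One possible reason (an assumption, since the real data are not shown in this thread) is that the percentage variable was computed over the whole dataset rather than over the analysis sample, i.e. without the counterpart of r_caregiver==1. A minimal sketch with the auto data, where insample is a hypothetical stand-in for that sample definition and irecode() replaces the gen/replace steps from #3:

            ```stata
            sysuse auto, clear
            generate price_cat = irecode(price, 3999, 4999, 5999, 6999, 9999)

            * Hypothetical sample flag, standing in for r_caregiver==1
            generate byte insample = !missing(rep78)

            * Count cells and groups within the sample only
            bysort foreign price_cat: egen n_cell  = total(insample)
            bysort foreign:           egen n_group = total(insample)
            generate pct = 100 * n_cell / n_group if insample

            * pct is constant within each cell, so its mean reproduces it
            graph bar (mean) pct if insample, over(foreign) over(price_cat) asyvars ///
                blabel(total, format(%3.2f)) ytitle("Percent")
            ```

            If the category variable has missing values in the real data, the flag would also need to exclude them, for example by adding & !missing(price_cat) to its definition.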


            • #7
              A (Cleveland) dot chart can be competitive. (In informal polls with students, I find that they always prefer bar charts to dot charts, but never give a reason beyond familiarity with bar charts.)

              Here's a sample. If you find the text labels too busy, as I think I do, then chop them out. On the other hand, I strongly recommend linetype(line) lines(lc(gs12) lw(vthin)), as the default dotted lines, in my experience, degrade on export.

              See also https://www.stata-journal.com/articl...article=gr0034 and the original Cleveland books and papers. (Detail: there were examples avant la lettre in editions of Snedecor's text from 1937 to 1956 until removed by Cochran!)

              Code:
              sysuse auto, clear
              
              * Variables:
              
              gen        price_cat = 0    if    price>=3000 & price<4000
              replace price_cat = 1    if    price>=4000 & price<5000
              replace price_cat = 2    if    price>=5000 & price<6000
              replace price_cat = 3    if    price>=6000 & price<7000
              replace price_cat = 4    if    price>=7000 & price<10000
              replace price_cat = 5     if    price>=10000
              lab def    pricecat    0 "3000-3999" 1 "4000-4999" 2 "5000-5999" 3 "6000-6999" 4 "7000-9999" 5 "10000 and more"
              lab val    price_cat pricecat
              
              set scheme s1color
              
              
              catplot foreign price_cat, percent(foreign) asyvars   ///
                  marker(1, ms(Oh) mcolor(blue))  marker(2, ms(+)  mcolor(red))        ///
                  ytitle("") ylabel(0(5)45, angle(horizontal) labsize(medsmall))    ///
                  recast(dot) blabel(total, format(%3.2f))  linetype(line) lines(lc(gs12) lw(vthin))  ///
                  legend(rows(1) ring(6) position(6) label(1 "Domestic") label(2 "Foreign")) l1title(Something sensible here)

              ariane_dot.png


              • #8
                Nick, thanks for this great suggestion and alternative to bar charts! I like it a lot and for some of my descriptions it really is a better solution!

                Off the top of your head, do you know how to rearrange the chart so that the percentages are on the y-axis and the groups are on the 'x-axis'? It's not the 'recast' option, as I've already figured out.


                • #9
                  There is an undocumented vertical option to graph dot, if I recall correctly. If you adopt it, you then have the problem of accommodating the category labels comfortably.


                  • #10
                    Here is some code modifying #7

                    Code:
                    catplot foreign price_cat, percent(foreign) asyvars var2opts(label(labsize(small))) ///
                        marker(1, ms(Oh) mcolor(blue))  marker(2, ms(+)  mcolor(red))        ///
                        ytitle("") ylabel(0(5)45, angle(horizontal) labsize(medsmall))    ///
                        recast(dot) blabel(total, format(%3.2f))  linetype(line) lines(lc(gs12) lw(vthin))  ///
                        legend(col(1) ring(0) position(1) label(1 "Domestic") label(2 "Foreign")) ///
                        b2title("explain here") l1title(Something sensible here) vertical


                    • #11
                      Nick, thanks so much for your help, I really appreciate your time and effort! Your suggestions helped me a lot and I was able to produce some good graphs. :-)

                      (Also, sorry for the late reply. I was a bit overwhelmed with work and failed to close the thread in a timely manner .. won't happen again!)
