
  • Bar Chart comparing sample and subgroup

    Dear Statalisters,

    I think this should be easy, but I can't find a way to do it, so I hope you can help me.

    I would like to create a bar chart (showing percentages) of a categorical variable (cat_var1) with four categories. Then I would like to compare the whole sample with a subgroup (a matched, merged dataset).

    Let's say the graph should look like this:

    Bar1: category 1 of cat_var1 (subgroup)
    Bar2: category 1 of cat_var1 (whole sample)
    Bar3: category 2 of cat_var1 (subgroup)
    Bar4: category 2 of cat_var1 (whole sample)
    Bar5: category 3 of cat_var1 (subgroup)
    Bar6: category 3 of cat_var1 (whole sample)
    Bar7: category 4 of cat_var1 (subgroup)
    Bar8: category 4 of cat_var1 (whole sample)


    Thanks and best regards!
    Tim K.





  • #2
    You don't give a data example, contrary to FAQ Advice #12. See https://www.statalist.org/forums/help#stata

    This works for the auto data, noting that with 8 bars (as in your case) it is less likely that the bar labels on graph bar would be easy to read. The recipe should be similar for your case.

    If you only want to do this once, compiling a little dataset with two variables (subgroup versus total, and frequencies) may be as simple as is needed.

    Code:
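    set scheme s1color

    sysuse auto, clear

    * one observation per foreign x rep78 combination, with its count in _freq
    contract foreign rep78

    * count for each rep78 category over both groups
    egen _total = total(_freq), by(rep78)

    list, sepby(foreign)

    * foreign == 1 plays the subgroup; the foreign == 0 rows carry the totals
    gen frequency = cond(foreign == 1, _freq, cond(foreign == 0, _total, .))

    * relabel value 0 (Domestic in the auto data) to match
    label def origin 0 Total, modify

    graph hbar (asis) frequency, over(foreign, descending) over(rep78) ysc(alt) ytitle(Frequency) name(G1, replace)

    * one variable per group, so that each group gets its own bar colour
    separate frequency, by(foreign) veryshortlabel

    list, sepby(foreign)

    graph hbar (asis) frequency?, nofill over(foreign, descending) over(rep78) ysc(alt) ytitle(Frequency) name(G2, replace) legend(off)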
    [Image: krause_G1.png]

    [Image: krause_G2.png]



    • #3
      Thanks a lot Nick!!! This works fine, but is there no way to do this with percentages?



      • #4
        You did say percentages in #1 but that didn't register, so my fault, yet at the same time I have difficulty seeing that as a good idea. Are you asking for subgroup bars that are less than or equal to 100%, paired with total bars that are all 100%???

        That can be done: just calculate the percents from the frequencies, but I am reluctant to post code for a poor design.



        • #5
          Sorry, I think I did not explain it well.

          Here is my cat_var (whole dataset):

          Cat_var
          -----------------------------------------------------------
                        |      Freq.    Percent      Valid       Cum.
          --------------+--------------------------------------------
          Valid   79    |       2412      23.36      23.36      23.36
                  88    |       3702      35.86      35.86      59.22
                  91    |       3008      29.14      29.14      88.36
                  96    |       1202      11.64      11.64     100.00
                  Total |      10324     100.00     100.00
          -----------------------------------------------------------

          And here's my cat_var for the subgroup:

          cat_var if subgroup == 1
          -----------------------------------------------------------
                        |      Freq.    Percent      Valid       Cum.
          --------------+--------------------------------------------
          Valid   79    |        336      10.55      10.55      10.55
                  88    |       2015      63.25      63.25      73.79
                  91    |        730      22.91      22.91      96.70
                  96    |        105       3.30       3.30     100.00
                  Total |       3186     100.00     100.00
          -----------------------------------------------------------

          So I want to compare them like this:

          Cat_var
          -----------------------------------------------------------------------------
                                          |      Freq.    Percent      Valid       Cum.
          --------------------------------+--------------------------------------------
          Valid   79 (whole dataset) bar1 |       2412      23.36      23.36      23.36
                  79 (subgroup==1)   bar2 |        336      10.55      10.55      10.55
          and so on....

          What I want to say is something like: in the total sample, category 79 accounts for 23.36 percent, in the subgroup only 10.55 percent, and so on.
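
As a minimal sketch of the percentages described in #5, the counts from the two tables can be typed into a small dataset and converted to percent within each group. The variable names group, freq, group_total and percent are assumptions for illustration, not names from the original data.

```stata
clear

* Counts copied from the two tables in #5
* group 0 = whole dataset, group 1 = subgroup (coding is an assumption)
input byte group int cat_var long freq
0 79 2412
0 88 3702
0 91 3008
0 96 1202
1 79  336
1 88 2015
1 91  730
1 96  105
end

* Total count within each group (10324 and 3186)
egen long group_total = total(freq), by(group)

* Percent of each category within its own group
gen percent = 100 * freq / group_total
format percent %5.2f

list, sepby(group)
```

Each group's percentages sum to 100, so the bars for the whole dataset and for the subgroup are directly comparable category by category.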



          • #6
            Thanks for the extra detail. I think I understand better. Still no data example, but consider this recipe:

            Code:
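            set scheme s1color

            sysuse auto, clear

            * zero keeps combinations of foreign and rep78 that have no observations
            contract foreign rep78, zero

            * percent of the whole sample in each rep78 category
            egen _pc = total(_freq), by(rep78)
            su _freq, meanonly
            replace _pc = 100 * _pc / r(sum)

            * percent of the subgroup (foreign == 1) in each rep78 category
            egen _subpc = total(_freq * (foreign == 1)), by(rep78)
            su _freq if foreign == 1, meanonly
            replace _subpc = 100 * _subpc / r(sum)

            list, sepby(foreign)

            * foreign == 1 rows carry the subgroup percent; foreign == 0 rows carry the whole-sample percent
            gen percent = cond(foreign == 1, _subpc, cond(foreign == 0, _pc, .))

            label def origin 0 Total, modify

            graph hbar (asis) percent, over(foreign, descending) over(rep78) ysc(alt) ytitle(Percent) name(G3, replace)

            * one variable per group, so that each group gets its own bar colour
            separate percent, by(foreign) veryshortlabel

            graph hbar (asis) percent?, nofill over(foreign, descending) over(rep78) ysc(alt) ytitle(Percent) name(G4, replace) legend(off)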



            • #7
              Thanks so much Nick! It works perfectly. The only problem I can't find a solution for is how to remove the value labels (0, 1)...



              • #8
                Why would you want to do that? I think a data example is now utterly essential for me to suggest any further code.



                • #9






                  [Image: G4.png]



                  Here's my graph. I just want to remove the 0 and 1 on the left side, because you can already see them in the legend. I think it's more aesthetic.

                  Code:
                  graph hbar (asis) percent?, nofill over(merge, descending) over(LB19_Fördersatz) ///
                  ysc(alt) ytitle(Anteil Einrichtungen) name(G4, replace) ///
                  blabel(bar, pos(outside) size(2.5) color(black) format(%2.0f)) ytitle("") ylabel( 0 "0%" 20 "20%" 40 "40%" 60 "60%" 80 "80%") ///
                  title("") legend(pos(bottom) cols(5)) xsize(7)







                  • #10
                    I think that is a terrible idea. You should improve the value labels and lose the legend instead. Sorry, but my personal rule is that I won't suggest code for something that I think is a terrible idea.

                    (Still no data example, as requested and explained in #2: that would be for anyone else who has a different view on this. Anyone answering doesn't need the full dataset, just the table counts.)
                    Last edited by Nick Cox; 26 Oct 2022, 08:01.



                    • #11
                      Hey Nick, it's me again. Thanks a lot for your help! The table with counts should be in #5?

                      Maybe it will help if I post the graph as produced by code and the one I made with the Graph Editor (which is the one I would like to produce by code).

                      Here is the one from code (as you said, without a legend):

                      [Image: G4_unbearbeitet.png]



                      But I think this one (with a legend) is better:

                      [Image: G4_bearbeitet.png]



                      • #12
                        Sorry to disappoint, but I haven't changed my mind. The use of a legend obliges the reader to memorise some arbitrary colour distinction, or else to keep referring to it. To paraphrase Penny in The Big Bang Theory, I speak on behalf of all readers -- we had a meeting -- please don't do that. Use direct labels.

                        Anyone who disagrees is welcome to provide code for you.
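
A minimal sketch of the direct-label approach suggested in #10 and #12, following the recipe in #6 but with the percentages from the tables in #5 typed in by hand. The variable name group, the label name grp, its wording, and the graph name G5 are assumptions for illustration; the actual data use other names.

```stata
clear

* Percentages copied from the tables in #5
* group 0 = whole dataset, group 1 = subgroup (coding is an assumption)
input byte group int cat_var float percent
0 79 23.36
0 88 35.86
0 91 29.14
0 96 11.64
1 79 10.55
1 88 63.25
1 91 22.91
1 96  3.30
end

* Descriptive value labels replace the bare 0 and 1 on the category axis
* (label name and wording are assumptions)
label define grp 0 "Whole sample" 1 "Subgroup"
label values group grp

* One variable per group so that the two groups get different bar colours, as in #6
separate percent, by(group) veryshortlabel

* Bars are labelled directly by over(group), so the legend is switched off
graph hbar (asis) percent?, nofill over(group, descending) over(cat_var) ///
    ysc(alt) ytitle(Percent) legend(off) name(G5, replace)
```

With the value labels attached, each bar is identified on the axis itself, which is the point made in #12 about not relying on a colour key.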

