
  • Bar chart with frequencies of one variable stacked up to 100%

    Hello,

    I have a variable with 10,000 observations. The observations can take on 5 different values (1 - 5). I would like to make a bar chart that shows the frequencies of responses stacked up to 100%. How do I do that?

    Thanks.
    Florian

  • #2
    You mean something like

    Code:
    sysuse auto, clear
    graph bar (count), over(rep78) percent stack asyvars
    ???
    Last edited by Nick Cox; 22 Jun 2017, 04:22.



    • #3
      Yes, that helps. How do I do that when I have multiple variables and want to show them in a bar plot? I was able to collapse the data for one variable, but how can I collapse 10+ variables to show the frequencies of the individual responses?



      • #4
        Concrete example please.



        • #5
          I want to analyse 16 variables. There are about 10,000 observations and I want to create a stacked bar graph for each variable that shows the frequency of each response.
          Here's part of the data:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double(_GERMINATION_CROP_STAND _HEIGHT _STALK_THICKNESS _NUMBER_OF_COBS _COB_SIZE)
          5 4 3 5 3
          4 5 5 4 5
          5 4 3 4 3
          5 4 4 4 4
          5 5 5 4 4
          4 5 4 4 4
          5 5 5 4 5
          4 4 3 2 2
          4 5 3 4 3
          5 3 3 3 4
          4 4 4 4 3
          4 4 3 4 3
          4 4 3 3 2
          4 4 3 4 4
          4 3 4 3 4
          4 5 5 4 4
          5 5 5 4 5
          5 5 3 4 4
          5 4 4 5 5
          3 4 4 4 4
          4 3 3 3 3
          5 4 3 3 3
          5 3 3 4 3
          4 3 2 2 2
          5 3 2 4 3
          5 4 4 5 5
          3 4 4 5 5
          4 4 3 4 3
          3 2 2 2 2
          5 4 3 5 5
          5 2 1 1 2
          5 4 5 4 2
          5 2 4 4 4
          5 5 5 5 5
          3 3 3 4 5
          4 3 4 4 5
          5 5 4 5 5
          5 5 5 5 4
          5 4 4 4 5
          5 4 5 4 4
          5 4 3 5 5
          5 5 5 4 5
          5 5 5 4 5
          5 3 4 3 5
          5 3 4 4 5
          5 3 5 5 5
          5 3 4 5 4
          5 5 4 5 4
          5 4 4 3 5
          4 4 4 5 5
          5 4 3 4 4
          5 5 5 5 5
          5 5 4 4 5
          4 5 3 3 4
          5 4 3 4 4
          4 4 5 3 3
          5 5 5 4 3
          5 5 4 5 5
          4 3 4 4 4
          5 5 5 4 5
          5 5 5 4 5
          5 5 5 4 5
          4 5 5 3 4
          5 5 5 4 5
          5 4 3 2 2
          5 4 5 5 5
          5 5 5 4 2
          5 1 5 1 5
          4 5 4 5 5
          3 5 3 5 5
          4 3 3 5 5
          5 3 2 3 4
          4 3 5 5 5
          4 4 5 4 3
          5 3 4 3 4
          4 3 4 5 5
          5 4 4 5 5
          5 5 4 5 5
          3 5 5 5 4
          4 4 4 4 4
          5 4 5 4 5
          5 4 4 5 4
          4 4 4 5 5
          5 5 5 5 5
          5 3 5 1 4
          5 4 5 5 5
          4 5 3 5 5
          5 4 4 5 5
          5 5 4 3 5
          5 5 4 3 3
          5 5 4 3 4
          4 5 5 3 3
          5 5 4 4 4
          5 3 4 4 4
          5 4 5 3 5
          5 5 2 5 5
          5 5 5 5 5
          5 5 4 5 3
          5 4 4 5 4
          5 5 4 5 5
          end
          label values _GERMINATION_CROP_STAND GERMINATION_CROP_STAND
          label def GERMINATION_CROP_STAND 3 "C", modify
          label def GERMINATION_CROP_STAND 4 "B", modify
          label def GERMINATION_CROP_STAND 5 "A", modify
          label values _HEIGHT HEIGHT
          label def HEIGHT 1 "E", modify
          label def HEIGHT 2 "D", modify
          label def HEIGHT 3 "C", modify
          label def HEIGHT 4 "B", modify
          label def HEIGHT 5 "A", modify
          label values _STALK_THICKNESS STALK_THICKNESS
          label def STALK_THICKNESS 1 "E", modify
          label def STALK_THICKNESS 2 "D", modify
          label def STALK_THICKNESS 3 "C", modify
          label def STALK_THICKNESS 4 "B", modify
          label def STALK_THICKNESS 5 "A", modify
          label values _NUMBER_OF_COBS NUMBER_OF_COBS
          label def NUMBER_OF_COBS 1 "E", modify
          label def NUMBER_OF_COBS 2 "D", modify
          label def NUMBER_OF_COBS 3 "C", modify
          label def NUMBER_OF_COBS 4 "B", modify
          label def NUMBER_OF_COBS 5 "A", modify
          label values _COB_SIZE COB_SIZE
          label def COB_SIZE 2 "D", modify
          label def COB_SIZE 3 "C", modify
          label def COB_SIZE 4 "B", modify
          label def COB_SIZE 5 "A", modify



          • #6
            At first sight the principles here were all covered in your previous thread. https://www.statalist.org/forums/for...bles-on-x-axis



            • #7
              Somehow I cannot get the collapse command to work for count with multiple variables. I've tried seemingly forever. I managed to do it with one variable, when I created a duplicate that contains 1s and used that as the clist variable in the collapse command:

              Code:
              gen xgermination = _GERMINATION_CROP_STAND
              replace  xgermination = 1
              collapse (count) xgermination, by(_GERMINATION_CROP_STAND)
              In order to create the graph I would need the collapsed data of all 16 variables, though. I somehow cannot get it done.



              • #8
                Originally posted by Florian Neubauer:
                Somehow I cannot get the collapse command to work for count with multiple variables.
                Please show the commands that you tried.



                • #9
                  Code:
                  foreach var of varlist _GERMINATION_CROP_STAND-_COB_SIZE{
                   gen x`var' =1
                  }
                  
                  collapse (count) x_GERMINATION_CROP_STAND-x_COB_SIZE, by(_GERMINATION_CROP_STAND-_COB_SIZE)
                  I tried it with a loop as well but that was even worse. It seems I cannot manage to pair each variable with its specific duplicate.



                  • #10
                    I would propose that you change the structure of your data and use graph bar, over() by(), building on Nick's example in post #2. The variable names in the example below are not optimal but you can change them to something more meaningful.
                    Code:
                    ren _GERMINATION_CROP_STAND var1
                    ren _HEIGHT var2
                    ren _STALK_THICKNESS var3
                    ren _NUMBER_OF_COBS var4
                    
                    stack var1 var2 var3 var4, into(var) clear
                    lab def var 1 "E" 2 "D" 3 "C" 4 "B" 5 "A"
                    lab val var var
                    lab def stack 1 "GERMINATION_CROP_STAND" 2 "HEIGHT" 3 "STALK_THICKNESS" 4 "NUMBER_OF_COBS"
                    lab val _stack stack
                    
                    graph bar (count), over(var) percent stack asyvars by(_stack, note("")) legend(row(1))
                    [Figure: graph.png]
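
For anyone who wants the collapsed frequencies themselves (the goal in #7 and #9) rather than only the graph, the following is a minimal sketch of one way to get them after stacking. It is not part of the posted solution: the data are simulated, and the names n, item_total and pct are made up for illustration. It uses contract in place of collapse.

```stata
* Simulated stand-in data: two hypothetical 1-5 graded items
clear
set seed 12345
set obs 200
generate var1 = ceil(5 * runiform())
generate var2 = ceil(5 * runiform())

* Same restructuring as above: one row per (item, response)
stack var1 var2, into(var) clear

* Reduce to one row per item-response pair with its frequency
contract _stack var, freq(n)

* Convert frequencies to within-item percentages
bysort _stack: egen item_total = total(n)
generate pct = 100 * n / item_total

list _stack var n pct, sepby(_stack) noobs
```

Because every item ends up in the same pair of columns after stack, a single contract call covers all items at once, which is what pairing each variable with its own counter in collapse was trying to achieve.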



                    • #11
                      Thanks a lot! This is very helpful.



                      • #12
                        Friedrich did the hard work of showing you how to get what you asked for, but that still leaves scope for comment.

                        The stacked design is popular -- but is it effective? For the same data, I suggest as one alternative a multiple bar chart in table layout.

                        Below is the complete code for anyone's convenience. You need to install tabplot, which was published in the Stata Journal. See the 2016 paper if you have access

                        http://www.stata-journal.com/article...article=gr0066

                        and otherwise

                        https://www.statalist.org/forums/for...updated-on-ssc

                        and if you need more examples search this forum for mentions.

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input double(_GERMINATION_CROP_STAND _HEIGHT _STALK_THICKNESS _NUMBER_OF_COBS _COB_SIZE)
                        5 4 3 5 3
                        4 5 5 4 5
                        5 4 3 4 3
                        5 4 4 4 4
                        5 5 5 4 4
                        4 5 4 4 4
                        5 5 5 4 5
                        4 4 3 2 2
                        4 5 3 4 3
                        5 3 3 3 4
                        4 4 4 4 3
                        4 4 3 4 3
                        4 4 3 3 2
                        4 4 3 4 4
                        4 3 4 3 4
                        4 5 5 4 4
                        5 5 5 4 5
                        5 5 3 4 4
                        5 4 4 5 5
                        3 4 4 4 4
                        4 3 3 3 3
                        5 4 3 3 3
                        5 3 3 4 3
                        4 3 2 2 2
                        5 3 2 4 3
                        5 4 4 5 5
                        3 4 4 5 5
                        4 4 3 4 3
                        3 2 2 2 2
                        5 4 3 5 5
                        5 2 1 1 2
                        5 4 5 4 2
                        5 2 4 4 4
                        5 5 5 5 5
                        3 3 3 4 5
                        4 3 4 4 5
                        5 5 4 5 5
                        5 5 5 5 4
                        5 4 4 4 5
                        5 4 5 4 4
                        5 4 3 5 5
                        5 5 5 4 5
                        5 5 5 4 5
                        5 3 4 3 5
                        5 3 4 4 5
                        5 3 5 5 5
                        5 3 4 5 4
                        5 5 4 5 4
                        5 4 4 3 5
                        4 4 4 5 5
                        5 4 3 4 4
                        5 5 5 5 5
                        5 5 4 4 5
                        4 5 3 3 4
                        5 4 3 4 4
                        4 4 5 3 3
                        5 5 5 4 3
                        5 5 4 5 5
                        4 3 4 4 4
                        5 5 5 4 5
                        5 5 5 4 5
                        5 5 5 4 5
                        4 5 5 3 4
                        5 5 5 4 5
                        5 4 3 2 2
                        5 4 5 5 5
                        5 5 5 4 2
                        5 1 5 1 5
                        4 5 4 5 5
                        3 5 3 5 5
                        4 3 3 5 5
                        5 3 2 3 4
                        4 3 5 5 5
                        4 4 5 4 3
                        5 3 4 3 4
                        4 3 4 5 5
                        5 4 4 5 5
                        5 5 4 5 5
                        3 5 5 5 4
                        4 4 4 4 4
                        5 4 5 4 5
                        5 4 4 5 4
                        4 4 4 5 5
                        5 5 5 5 5
                        5 3 5 1 4
                        5 4 5 5 5
                        4 5 3 5 5
                        5 4 4 5 5
                        5 5 4 3 5
                        5 5 4 3 3
                        5 5 4 3 4
                        4 5 5 3 3
                        5 5 4 4 4
                        5 3 4 4 4
                        5 4 5 3 5
                        5 5 2 5 5
                        5 5 5 5 5
                        5 5 4 5 3
                        5 4 4 5 4
                        5 5 4 5 5
                        end
                        
                        label values _GERMINATION_CROP_STAND GERMINATION_CROP_STAND
                        label def GERMINATION_CROP_STAND 3 "C", modify
                        label def GERMINATION_CROP_STAND 4 "B", modify
                        label def GERMINATION_CROP_STAND 5 "A", modify
                        label values _HEIGHT HEIGHT
                        label def HEIGHT 1 "E", modify
                        label def HEIGHT 2 "D", modify
                        label def HEIGHT 3 "C", modify
                        label def HEIGHT 4 "B", modify
                        label def HEIGHT 5 "A", modify
                        label values _STALK_THICKNESS STALK_THICKNESS
                        label def STALK_THICKNESS 1 "E", modify
                        label def STALK_THICKNESS 2 "D", modify
                        label def STALK_THICKNESS 3 "C", modify
                        label def STALK_THICKNESS 4 "B", modify
                        label def STALK_THICKNESS 5 "A", modify
                        label values _NUMBER_OF_COBS NUMBER_OF_COBS
                        label def NUMBER_OF_COBS 1 "E", modify
                        label def NUMBER_OF_COBS 2 "D", modify
                        label def NUMBER_OF_COBS 3 "C", modify
                        label def NUMBER_OF_COBS 4 "B", modify
                        label def NUMBER_OF_COBS 5 "A", modify
                        label values _COB_SIZE COB_SIZE
                        label def COB_SIZE 2 "D", modify
                        label def COB_SIZE 3 "C", modify
                        label def COB_SIZE 4 "B", modify
                        label def COB_SIZE 5 "A", modify
                        
                        ren (_GERMINATION_CROP_STAND _HEIGHT _STALK_THICKNESS _NUMBER_OF_COBS) (var#), addnumber 
                        stack var?, into(var) clear
                        lab def var 1 "E" 2 "D" 3 "C" 4 "B" 5 "A"
                        lab val var var
                        lab def stack 1 "germination crop stand"  2 "height" 3 "stalk thickness" 4 "number of cobs"
                        lab val _stack stack
                        
                        tabplot _stack var , percent(_stack) xtitle("") ytitle("") bfcolor(green*0.2) showval subtitle(%)
                        There are many tweaks possible, e.g. xsc(reverse) and horizontal bars, not to mention by() if there is another layer of structure in the data.

                        [Figure: maize2.png]


                        With this plot I think you get a more readable display with the scope to look at fine structure (e.g. the low numbers of D and E gradings) and without the distractions of a legend and arbitrary colours.



                        • #13
                          Originally posted by Nick Cox:
                          With this plot I think you get a more readable display with the scope to look at fine structure (e.g. the low numbers of D and E gradings) and without the distractions of a legend and arbitrary colours.
                          I agree with Nick. This should also work with your 16 variables if you increase the height of the graph.



                          • #14
                            Thanks for this option. This looks much better indeed. However, only the first 9 variables appear in the graph. How can I display all 16?



                            • #15
                              Code:
                              stack var*, into(var) clear
                              plus other changes to match.
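
The reason is the wildcard: in a varlist, ? matches exactly one character, so var? stops at var9, while * matches any number of characters. As a rough sketch of what the "other changes" might look like for 16 variables, the example below uses simulated data and hypothetical names (item1-item16); it builds the _stack value label from the original variable names instead of typing 16 labels by hand.

```stata
* Simulated stand-in data: sixteen hypothetical 1-5 graded items
clear
set seed 2017
set obs 100
forvalues j = 1/16 {
    generate item`j' = ceil(5 * runiform())
}

* Remember the original names, then rename them var1-var16
unab items : item1-item16
local k : word count `items'
rename (`items') (var#), addnumber

* var? would match var1-var9 only; var* also picks up var10-var16
stack var*, into(var) clear

* Label each stacked group with its original variable name
capture label drop stack
forvalues j = 1/`k' {
    local name : word `j' of `items'
    label define stack `j' "`name'", modify
}
label values _stack stack

graph bar (count), over(var) percent stack asyvars by(_stack, note("")) legend(row(1))
```

With the real data, replace the simulated block and the item1-item16 range with the actual first and last variable names. The final command is the graph bar call from #10; the tabplot call from #12 can be used instead once tabplot is installed.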
