  • how to do a bar chart excluding the most frequent one?


    Hello,
    I am trying to show in a bar chart the percentage of people in various ethnic groups. I have 12 categories of ethnic group; however, one group, white, dominates (more than 80%), so I get tiny bars for the rest, which is not very informative. Instead, I would like a bar chart that excludes the most frequent category so that all the remaining bars are clearly visible. That is, the percentages of the minority ethnic groups should stay as they actually are in the sample, but their bars should be taller because the white category has been set aside.

    I am trying this at the moment:

    Code:
    graph hbar, over(ethnicity)

    Hiding the white category in the Graph Editor does not solve the issue; I still get tiny bars.


  • #2
    If you are graphing the proportions directly, then excluding a category does not change the remaining proportions.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float share str2 name
    10 "A"
    80 "B"
     5 "C"
     4 "D"
     1 "E"
    end
    
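    * all categories: B dominates the axis
    gr bar share, over(name) blab(total) ytitle(percent)
    * B excluded: the other shares are unchanged because they are stored in the variable
    gr bar share if name!="B", over(name) blab(total) ytitle(percent)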



    • #3

      The problem is that you can omit the most frequent category but then the percents will be wrong, as they will be percents of what you show. So you need to calculate them in advance. This shows some technique:


      Code:
      sysuse nlsw88, clear
      
      preserve 
      
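      * reduce the data to one observation per occupation, with its percent of the sample in pc
      contract occupation, percent(pc)
      list 
      * plot the stored percents as they are, leaving out the dominant category
      graph hbar (asis) pc if pc < 30 , over(occupation, sort(1) descending) blabel(bar, format(%3.2f))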
      
      restore
      Here the list gives me a quick rule for ignoring the highest. The same rule would work in your case too: if the highest percent is over 80, the others sum to less than 20, so none of them can reach 30.

      More general code to omit the highest would be to sort the reduced dataset on percent and ignore the last observation.
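
A minimal sketch of that more general approach, reusing the nlsw88 example from above. It assumes a single category holds the highest percent; if two categories tied for the maximum, only one of them would be dropped.

```stata
sysuse nlsw88, clear

preserve

* one observation per occupation, with its percent of the sample in pc
contract occupation, percent(pc)

* sort ascending so that the largest percent ends up in the last observation
sort pc

* _n < _N keeps every observation except the last one
graph hbar (asis) pc if _n < _N, over(occupation, sort(1) descending) blabel(bar, format(%3.2f))

restore
```

This avoids hard-coding a cutoff such as 30, so the same lines should carry over to a variable like ethnicity without first inspecting the listing.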



      • #4
        Note that #2 and #3 are really the same idea. You need what you want to show in a separate variable.
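
As an illustration of the separate-variable idea, here is a sketch that stores the percents in the original dataset instead of reducing it with contract. The variable names pc and tag and the local macro total are introduced here for illustration only; they are not from the posts above.

```stata
sysuse nlsw88, clear

* total number of observations, used as the denominator
local total = _N

* percent of the whole sample in each occupation, repeated on every observation
bysort occupation: gen double pc = 100 * _N / `total'

* flag one observation per occupation so that each bar is drawn from a single value
egen byte tag = tag(occupation)

* find the largest percent, then plot every category below it
summarize pc, meanonly
graph hbar (asis) pc if tag & pc < r(max), over(occupation, sort(1) descending) blabel(bar, format(%3.2f))
```

Because the percents are computed before any category is excluded, the bar labels remain percents of the full sample, which is the point made in #2 and #3.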



        • #5
          Andrew Musau Nick Cox

          It worked perfectly well! Many thanks.
          Last edited by Danah Abdul; 13 Apr 2021, 14:30.



          • #6
            2 decimal places.
