  • Percentages shown in bar graph do not match results of proportion command

    Hi,

    I have a categorical variable for English-speaking ability (ED3) and a categorical group variable (groups). I want to know the percentage at each level of English-speaking ability within each group.

    Code:
    graph bar (percent), over(ED3) over(groups) asyvars blabel(bar, format(%9.1f))
    The proportion command for the group taking the value 1 shows:

    Proportion estimation             Number of obs   =     42,939

    --------------------------------------------------------------
                 |                                   Logit
                 | Proportion   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    ED3          |
            none |   .6077226   .0023563      .6030948    .6123311
        Little 1 |   .2936724   .0021979       .289383    .2979987
        Fluent 2 |    .098605   .0014387      .0958207    .1014611
    --------------------------------------------------------------

    The bar graph shows percentages for the first group as follows:

    1. 12.9% for none
    2. 6.2% for little
    3. 2.1% for fluent

    Below is an example of the data set, produced with dataex.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(ED3 groups)
    0 4
    1 4
    1 4
    0 4
    0 4
    0 4
    0 4
    0 4
    1 4
    0 4
    0 4
    1 4
    2 4
    0 4
    0 4
    2 4
    2 4
    0 4
    1 4
    0 4
    0 4
    0 4
    0 4
    1 4
    0 4
    0 4
    1 4
    1 4
    0 4
    0 4
    0 4
    0 4
    0 4
    0 4
    1 4
    0 4
    . 4
    1 4
    0 4
    1 4
    0 4
    0 4
    0 4
    0 4
    0 4
    1 4
    1 4
    1 4
    0 4
    2 4
    1 4
    0 4
    0 4
    0 4
    0 4
    0 4
    1 4
    2 4
    1 3
    1 4
    0 4
    1 4
    0 4
    1 4
    1 4
    0 4
    1 4
    0 4
    0 4
    0 4
    0 4
    0 4
    1 4
    0 4
    0 4
    1 4
    0 4
    1 4
    0 4
    0 4
    0 4
    0 4
    0 4
    0 4
    1 4
    1 4
    0 4
    1 4
    0 4
    0 4
    2 4
    0 4
    0 4
    0 4
    1 4
    0 4
    1 4
    0 4
    0 4
    1 4
    end
    label values ED3 ED3
    label def ED3 0 "none", modify
    label def ED3 1 "Little 1", modify
    label def ED3 2 "Fluent 2", modify
    label values groups groups
    label def groups 3 "SC/STs", modify
    label def groups 4 "Muslims", modify

  • #2
    See this recent thread: https://www.statalist.org/forums/for...th-percentages



    • #3
      graph bar (percent) shows the percentages of the data for each cross-combination of the variables specified, namely their joint distribution, whereas what interests you are the conditional distributions of ED3 given groups.

      Otherwise put, in your graph command, Stata has no way of knowing that you regard the categorical variables as predictor and response, but must treat them symmetrically. The order of over() options affects the ordering of bars, not what is shown.
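
      One way to see the difference on your own data is to compare cell percentages with column percentages. This is only a minimal sketch using the variable names from #1. The group share of 0.212 in the display lines is not reported anywhere in this thread: it is inferred from the ratio of 12.9 to 60.8 and should be treated as an assumption.

      ```stata
      * Joint distribution: each cell as a percent of all observations.
      * This corresponds to what graph bar (percent) plots with two over() options.
      tabulate ED3 groups, cell nofreq

      * Conditional distribution: percents of ED3 within each group.
      * This corresponds to what proportion reports for a single group.
      tabulate ED3 groups, column nofreq

      * Joint percent = conditional proportion * group share * 100.
      * 0.212 is an inferred group share, not a figure given in the thread.
      display %4.1f 100 * 0.6077226 * 0.212
      display %4.1f 100 * 0.2936724 * 0.212
      display %4.1f 100 * 0.0986050 * 0.212
      ```

      Under that assumed share, the three products come out close to the 12.9, 6.2 and 2.1 reported from the graph in #1, which is consistent with the bars showing joint rather than conditional percentages.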

      A starker illustration of how these are different beasts comes from running this code:

      Code:
      sysuse auto, clear
      graph bar (percent) , over(rep78) over(foreign) scheme(s1color) name(G1, replace) blabel(bar, format(%2.1f))
      gen foreign2 = foreign * 100
      graph bar (mean) foreign2 , over(rep78)  scheme(s1color) name(G2, replace) blabel(bar, format(%2.1f)) ytitle(% foreign) ysc(r(. 85))
      graph combine G1 G2
      If you are still puzzled, please show the results of

      Code:
      preserve 
      
      contract ED3 groups 
      
      dataex 
      
      restore
      where what is most useful to people answering is the dataex output; the preserve and restore ensure that your data are changed only temporarily.
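
      The same contracted data can also be used to work out both kinds of percentage by hand. The following is a sketch only; the names group_total, grand_total, pct_in_group and pct_overall are made up for illustration.

      ```stata
      preserve

      * One row per observed ED3 x groups combination, with the count in _freq.
      * nomiss drops combinations in which either variable is missing.
      contract ED3 groups, nomiss

      * Percent within each group (conditional on groups).
      bysort groups: egen group_total = total(_freq)
      generate pct_in_group = 100 * _freq / group_total

      * Percent of all observations (joint), for comparison.
      egen grand_total = total(_freq)
      generate pct_overall = 100 * _freq / grand_total

      list, sepby(groups) noobs

      restore
      ```

      If pct_overall matches the bar labels and pct_in_group matches the proportion results, the discrepancy is explained.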



      • #4
        #2 and #3 stop short of direct advice for your set-up, as the code in #3 hinges on there being 2 distinct values of foreign, which does not correspond to your outcome variable.

        Evidently there are 3 distinct values of ED3 and let's say 5 distinct values of groups, although the approach doesn't hinge on the latter.

        The graph bar (percent) syntax is relatively recent and I am not as fluent in it as I am with other commands I've been using longer.

        Here is one such command, tabplot from the Stata Journal. I suggest that this form has various advantages, including:

        1. Showing zero or very small amounts clearly. (That is a frequent problem with stacked bars.)

        2. Allowing a hybrid of graph and table form in which numbers are shown too for those who want to read them off, which should be anyone interested.

        3. Avoiding a legend or key. A legend is at best a necessary evil and obliges mental back and forth: which category is shown by blue bars? and so on. Lose the legend! Kill the key! if you can.

        Naturally no graph in this territory has all ideal features. In this display, comparisons across rows are easier than those within columns. Accordingly, choose axes to set up comparisons of most interest as those that are easy to discern.

        More could be done. In your real data, value labels for groups would show up on the x axis, although long labels could be a problem, which might lead to swapping axes. Colours can be changed, naturally.


        Code:
        clear 
        set obs 100 
        set seed 2803 
        gen ED3 = runiformint(0, 2)
        label def ED3 0 none 1 little 2 fluent 
        label val ED3 ED3 
        
        gen groups = ceil(_n/20)
        
        set scheme s1color 
        
        tabplot ED3 groups, percent(groups) subtitle(% by group) showval yreverse name(G1, replace) separate(ED3)
        [Image: williams.png]


        More resources are shown by

        Code:
        search tabplot, sj
        which will show the most recent update for the code, which as I write is



        Code:
        SJ-22-2 gr0066_3  . . . . . . . . . . . . . . . .  Software update for tabplot
                (help tabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                Q2/22   SJ 22(2):467
                bug fixed; help file updated to include further references
        and the original article, which was

        Code:
        SJ-16-2 gr0066  . . . . . .  Speaking Stata: Multiple bar charts in table form
                (help tabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                Q2/16   SJ 16(2):491--510
                provides multiple bar charts in table form representing
                contingency tables for one, two, or three categorical variables
        noting that some options have been added since 2016.

        https://www.statalist.org/forums/for...updated-on-ssc gives a quick overview.
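
        For anyone who would rather stay with graph bar, the following is a hedged sketch rather than tested advice for this data set. It assumes that combining the asyvars and percentages options rescales the bars to percentages within each category of groups; that assumption is worth checking against the column percentages from tabulate before relying on the graph.

        ```stata
        * Assumption: with asyvars, the first over() variable (ED3) is treated
        * as the yvars, and percentages then scales bars within each groups category.
        graph bar (count), over(ED3) over(groups) asyvars percentages ///
            blabel(bar, format(%9.1f)) ytitle("% within group")

        * Check: these column percentages should match the bar labels.
        tabulate ED3 groups, column nofreq
        ```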

