  • Adding percentages to bars in a histogram

    Dear Statalisters

    I want to create a histogram showing the percentages for each category of a three-category variable, by country. The name of the categorical variable is rel_ed in the code below.

    I want the bars stacked and each bar accompanied by its corresponding percentage.

    I have tried with catplot and graph bar. See next

    Code:
    catplot rel_ed country3, percent(country3) asyvars stack recast(bar)
    Code:
    graph bar, over(rel_ed) over(country3) percentage asyvars stack
    The result is the graph below....


    [Attached graph: Graph.png]


    But I am not able to include an option that would allow me to attach a percentage to each corresponding bar. This is precisely what I would like to do.

    Do you happen to know if there is any option with catplot or graph bar that would allow me to do so?

    Many thanks for your attention

    Luis Ortiz


  • #2
    Spellings there should be hypergamy and hypogamy.

    Naturally there is an option to add the numbers, but where are the labels to go, to be readable, on a stacked design?

    Code:
    help blabel_option

    Otherwise see this thread today https://www.statalist.org/forums/for...with-by-option

    and any others mentioning tabplot (Stata Journal).



    • #3
      Many thanks for this, Nick....

      And my apologies for the misspelling.

      Thanks for guiding me to blabel_option. It worked.

      Here is the code, in case someone else finds themselves in the same situation, followed by the graph.

      Code:
      graph bar if hisced3==4 & country3!=51, over(rel_ed) over(country3) percentage asyvars stack blabel(bar, position(inside) format(%9.1f))
      [Attached graph: Graph2.png]


      Again, many thanks for your attention and your help

      Best

      Luis Ortiz



      • #4
        I should be pleased you're pleased, but are you happy with that graph? It will be improved if you rotate it using graph hbar, change the aspect ratio and make the fill colours much lighter, so that the numbers can be read. Whether that's enough to make it readable I don't know.

        There is also scope to change the sort order. Sort on one of the categories, not country name in English.
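
        A minimal sketch of the suggestions above (horizontal bars, a taller aspect, lighter fills), reusing the command from #3. It has not been run against the original data; the intensity, label size and graph dimensions are illustrative guesses rather than recommended values, and it does not address the sort order.

        ```stata
        * Sketch only: horizontal stacked bars with lighter fills and centred labels.
        * intensity(40), size(vsmall), ysize(8) and xsize(5) are illustrative guesses.
        graph hbar if hisced3==4 & country3!=51, ///
            over(rel_ed) over(country3, label(labsize(small))) ///
            percentage asyvars stack intensity(40) ///
            blabel(bar, position(center) format(%9.0f) size(vsmall)) ///
            ysize(8) xsize(5)
        ```

        Lowering intensity() fades the default fill colours so that dark label text stays legible; setting individual bar colours is an alternative.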



        • #5
          No, I do not particularly like the graph, Nick.

          But at least I managed (following your suggestion) to attach percentages to bars.

          I very much appreciate your further suggestion (change bar colors and sort) to improve the readability of the graph.

          Many thanks again

          Best

          Luis



          • #6
            OK, but I need a data example to test ideas seriously. All you need to do is

            Code:
            contract country3 rel_ed if hisced3==4 & country3!=51
            dataex
            and show us the results by copying and pasting between code delimiters.
            Last edited by Nick Cox; 02 Apr 2019, 11:53.



            • #7
              Many thanks, Nick

              Here it goes

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float rel_ed double country3 long _freq
              1  3 2788
              2  3 1964
              3  3 1353
              .  3  131
              1  4  779
              2  4  459
              3  4  720
              .  4   28
              1  5 2146
              2  5 1131
              3  5 1100
              .  5   89
              1  8 5364
              2  8 3150
              3  8 2067
              .  8  115
              1  9  798
              2  9  437
              3  9  804
              .  9   29
              1 10 1534
              2 10  497
              3 10  811
              . 10   57
              1 13  834
              2 13  561
              3 13  661
              . 13   53
              1 15 1911
              2 15 1413
              3 15  710
              . 15   64
              1 18 1317
              2 18  760
              3 18  591
              . 18   23
              1 19 1056
              2 19  968
              3 19  371
              . 19   47
              1 20 1816
              2 20 1088
              3 20  579
              . 20   27
              1 21 1015
              2 21  602
              3 21  690
              . 21   35
              1 24 1170
              2 24  671
              3 24  589
              . 24    5
              1 27 1459
              2 27  653
              3 27  405
              . 27   28
              1 29 1088
              2 29  746
              3 29  608
              . 29   35
              1 30 1026
              2 30  742
              3 30  275
              . 30   25
              1 32 1621
              2 32 1185
              3 32  916
              . 32   42
              1 34 1289
              2 34  341
              3 34 1371
              . 34   80
              1 35  165
              2 35  108
              3 35  328
              1 39  987
              2 39  404
              3 39  567
              . 39   56
              1 40  747
              2 40  901
              3 40  338
              . 40   44
              1 43  637
              2 43  454
              3 43  658
              . 43   37
              1 47  384
              2 47  276
              3 47  586
              . 47    6
              1 48  748
              2 48  559
              3 48  716
              . 48   29
              1 52  657
              2 52  594
              3 52  333
              . 52   21
              1 64  877
              end
              label values rel_ed rel_ed
              label def rel_ed 1 "Homogamy", modify
              label def rel_ed 2 "Hipogamy", modify
              label def rel_ed 3 "Hipergamy", modify
              label values country3 cnt
              label def cnt 3 "Australia", modify
              label def cnt 4 "Austria", modify
              label def cnt 5 "Belgium", modify
              label def cnt 8 "Canada", modify
              label def cnt 9 "Switzerland", modify
              label def cnt 10 "Chile", modify
              label def cnt 13 "Czech Republic", modify
              label def cnt 15 "Denmark", modify
              label def cnt 18 "Spain", modify
              label def cnt 19 "Estonia", modify
              label def cnt 20 "Finland", modify
              label def cnt 21 "France", modify
              label def cnt 24 "Greece", modify
              label def cnt 27 "Hungary", modify
              label def cnt 29 "Ireland", modify
              label def cnt 30 "Iceland", modify
              label def cnt 32 "Italy", modify
              label def cnt 34 "Japan", modify
              label def cnt 35 "Korea", modify
              label def cnt 39 "Luxembourg", modify
              label def cnt 40 "Latvia", modify
              label def cnt 43 "Mexico", modify
              label def cnt 47 "Netherlands", modify
              label def cnt 48 "Norway", modify
              label def cnt 52 "Portugal", modify
              label def cnt 64 "Slovenia", modify
              I hope I'm doing it correctly

              Best

              Luis Ortiz



              • #8
                Thanks. From #1 and #3 it seems that you have 29 countries. Using the dataex default of 100 observations means that you lose a few and Slovenia is truncated, but 25 is enough for me to play.

                I added two-letter country codes to the data for a reason you'll see shortly. It seems to me that hypo, homo, hyper is an ordered scale and once again I corrected the spellings. In producing a bar chart -- apart from ensuring readability -- the most important detail in my view is getting countries and response in a sensible order. As mentioned in #2 I used tabplot from the Stata Journal.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float rel_ed double country3 long _freq str2 code
                1  3 2788 "AU"
                2  3 1964 "AU"
                3  3 1353 "AU"
                .  3  131 "AU"
                1  4  779 "AT"
                2  4  459 "AT"
                3  4  720 "AT"
                .  4   28 "AT"
                1  5 2146 "BE"
                2  5 1131 "BE"
                3  5 1100 "BE"
                .  5   89 "BE"
                1  8 5364 "CA"
                2  8 3150 "CA"
                3  8 2067 "CA"
                .  8  115 "CA"
                1  9  798 "CH"
                2  9  437 "CH"
                3  9  804 "CH"
                .  9   29 "CH"
                1 10 1534 "CL"
                2 10  497 "CL"
                3 10  811 "CL"
                . 10   57 "CL"
                1 13  834 "CZ"
                2 13  561 "CZ"
                3 13  661 "CZ"
                . 13   53 "CZ"
                1 15 1911 "DK"
                2 15 1413 "DK"
                3 15  710 "DK"
                . 15   64 "DK"
                1 18 1317 "ES"
                2 18  760 "ES"
                3 18  591 "ES"
                . 18   23 "ES"
                1 19 1056 "EE"
                2 19  968 "EE"
                3 19  371 "EE"
                . 19   47 "EE"
                1 20 1816 "FI"
                2 20 1088 "FI"
                3 20  579 "FI"
                . 20   27 "FI"
                1 21 1015 "FR"
                2 21  602 "FR"
                3 21  690 "FR"
                . 21   35 "FR"
                1 24 1170 "GR"
                2 24  671 "GR"
                3 24  589 "GR"
                . 24    5 "GR"
                1 27 1459 "HU"
                2 27  653 "HU"
                3 27  405 "HU"
                . 27   28 "HU"
                1 29 1088 "IE"
                2 29  746 "IE"
                3 29  608 "IE"
                . 29   35 "IE"
                1 30 1026 "IS"
                2 30  742 "IS"
                3 30  275 "IS"
                . 30   25 "IS"
                1 32 1621 "IT"
                2 32 1185 "IT"
                3 32  916 "IT"
                . 32   42 "IT"
                1 34 1289 "JP"
                2 34  341 "JP"
                3 34 1371 "JP"
                . 34   80 "JP"
                1 35  165 "KR"
                2 35  108 "KR"
                3 35  328 "KR"
                1 39  987 "LU"
                2 39  404 "LU"
                3 39  567 "LU"
                . 39   56 "LU"
                1 40  747 "LV"
                2 40  901 "LV"
                3 40  338 "LV"
                . 40   44 "LV"
                1 43  637 "MX"
                2 43  454 "MX"
                3 43  658 "MX"
                . 43   37 "MX"
                1 47  384 "NL"
                2 47  276 "NL"
                3 47  586 "NL"
                . 47    6 "NL"
                1 48  748 "NO"
                2 48  559 "NO"
                3 48  716 "NO"
                . 48   29 "NO"
                1 52  657 "PT"
                2 52  594 "PT"
                3 52  333 "PT"
                . 52   21 "PT"
                1 64  877 "SI"
                end
                label values rel_ed rel_ed
                label def rel_ed 1 "Homogamy", modify
                label def rel_ed 2 "Hypogamy", modify
                label def rel_ed 3 "Hypergamy", modify
                label values country3 cnt
                label def cnt 3 "Australia", modify
                label def cnt 4 "Austria", modify
                label def cnt 5 "Belgium", modify
                label def cnt 8 "Canada", modify
                label def cnt 9 "Switzerland", modify
                label def cnt 10 "Chile", modify
                label def cnt 13 "Czech Republic", modify
                label def cnt 15 "Denmark", modify
                label def cnt 18 "Spain", modify
                label def cnt 19 "Estonia", modify
                label def cnt 20 "Finland", modify
                label def cnt 21 "France", modify
                label def cnt 24 "Greece", modify
                label def cnt 27 "Hungary", modify
                label def cnt 29 "Ireland", modify
                label def cnt 30 "Iceland", modify
                label def cnt 32 "Italy", modify
                label def cnt 34 "Japan", modify
                label def cnt 35 "Korea", modify
                label def cnt 39 "Luxembourg", modify
                label def cnt 40 "Latvia", modify
                label def cnt 43 "Mexico", modify
                label def cnt 47 "Netherlands", modify
                label def cnt 48 "Norway", modify
                label def cnt 52 "Portugal", modify
                label def cnt 64 "Slovenia", modify
                
                * drop what's useless 
                drop if missing(rel_ed)
                drop in L
                
                * percents, and then rank by homogamy (arbitrary choice)
                egen percent = pc(_freq) , by(country)
                egen rank = rank(percent) if rel_ed == 1
                bysort country (rank) : replace rank = rank[1] 
                
                * countries in rank order as a variable to use as one axis 
                egen group = group(rank country)
                * -labmask- is from the Stata Journal 
                labmask group, values(country) decode 
                
                * get marriage categories into order 
                recode rel_ed 1=2 2=1 3=3, gen(which)
                label def which 1 Hypogamy 2 Homogamy 3 Hypergamy 
                label val which which 
                
                * drop what we no longer need 
                drop rel_ed _freq 
                
                tabplot group which [iw=percent] , barw(0.8) yla(, labsize(small)) ///
                showval(offset(0.7) format(%2.0f)) horiz ytitle("") xtitle("") bfcolor(eltgreen*0.5) name(G1, replace)

                [Attached graph: ortiz_G1.png]


                Perhaps better, but we still need to add four more countries!

                Naturally, the identity that the fractions in the three categories add to 100% = proportion 1 allows a triangular (trilinear, etc.) plot, but as often happens it doesn't help much as only some of the space is used. Hence as suggested in

                Cox, N.J. 2008. Trilinear plots and some alternatives. https://www.stata.com/meeting/uk08/abstracts.html

                an alternative to a plot of %x %y %z is a scatter plot of %z - %x versus %y. (That also preserves the information, as a little algebra shows.)
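
                The "little algebra" can be sketched as follows, assuming x, y and z are the percentages of hypogamy, homogamy and hypergamy in one country and writing d for the plotted difference (d is a name introduced here for illustration):

                ```latex
                \begin{align*}
                x + y + z &= 100, \qquad d = z - x \\
                z + x &= 100 - y \\
                z &= \tfrac{1}{2}\,(100 - y + d), \qquad x = \tfrac{1}{2}\,(100 - y - d) \\
                x \ge 0,\; z \ge 0 \;&\Longrightarrow\; |d| \le 100 - y
                \end{align*}
                ```

                So the pair (y, d) recovers all three percentages, and points with |d| > 100 - y cannot occur: the attainable region is a triangle with vertices (y, d) = (100, 0), (0, 100) and (0, -100).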

                Code:
                reshape wide percent , i(code) j(which)
                gen diff = (percent3 - percent1)
                scatter diff percent2, ms(none) mla(code) mlabpos(0) mlabc(blue) mlabsize(medsmall) ///
                xtitle(% homogamy) ytitle(% hypergamy {&minus} % hypogamy)                               ///
                yli(0, lc(gs12) lw(thin)) xla(20(10)60) yla(-40(10)40, ang(h)) aspect(1)          ///
                text(35 55 "hyper > hypo", color(red)) text(-35 55 "hypo > hyper", color(red)) name(G2, replace)
                [Attached graph: ortiz_G2.png]


                So, presumably you're the sociologist, anthropologist, epidemiologist, whatever here -- or talking to some because you're the statistics/computing person -- does either help?

                Thinking about which parts of the graph above can't be reached could be an important detail.




                • #9
                  That's a great help, Nick. Many thanks

                  Either of the two alternative graphs you generated is far more readable, informative and appealing than the one I first showed in this thread. Although the second one is definitely better (clearer, more informative), the first one possibly suits me better, because the graph responds to a reviewer's request for descriptive statistics on a key independent variable, which is precisely this one, classifying individuals according to the relative education of their parents: hypogamy, homogamy, hypergamy.

                  I have just one question, though.

                  It is disturbing to know that 4 countries were lost. I did not understand this. Is it because 'dataex' sampled data in my dataset but no observation of these four countries happened to appear in the resulting sample?

                  Again, many thanks for this wonderful session, which has been really instructive for me and, I hope, for others too

                  Best

                  Luis

                  PS: Yes, I am a Sociologist/Demographer



                  • #10
                    Not to worry. As said, the only thing biting is the default of dataex, as explained in the help:

                    count(#) specifies a limit to the number of observations listed. The default is count(100).
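
                    A minimal sketch of raising that limit, reusing the commands from #6. The value 200 is an arbitrary choice, assumed only to exceed the number of country-by-category rows that contract produces here.

                    ```stata
                    * Sketch only: 200 is an arbitrary limit, not a recommended value.
                    contract country3 rel_ed if hisced3==4 & country3!=51
                    dataex, count(200)
                    ```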



                    • #11
                      Much relieved. Many thanks again

                      LO

