Adding percentages to bars in a histogram

Luis Ortiz

Join Date: Dec 2014

Posts: 97
#1

02 Apr 2019, 09:08

Dear Statalisters

I want to create a histogram showing, for each country, the percentage of observations in each category of a three-category variable. The categorical variable is rel_ed in the code below.

I want the bars stacked and each bar to be accompanied by its corresponding percentage.

I have tried catplot and graph bar, as follows:

Code:

catplot rel_ed country3, percent(country3) asyvars stack recast(bar)

Code:

graph bar, over(rel_ed) over(country3) percentage asyvars stack

The result is the graph below.

But I am not able to include an option that would allow me to attach a percentage to each corresponding bar. This is precisely what I would like to do.

Do you happen to know whether there is an option with catplot or graph bar that would allow me to do so?

Many thanks for your attention

Luis Ortiz
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

02 Apr 2019, 09:26

Spellings there should be hypergamy and hypogamy.

Naturally there is an option to add the numbers, but where are the labels to go, to be readable, on a stacked design?

Code:

help blabel_option

Otherwise see this thread from today: https://www.statalist.org/forums/for...with-by-option

and any others mentioning tabplot (Stata Journal).
Luis Ortiz

#3

02 Apr 2019, 09:54

Many thanks for this, Nick....

And my apologies for the misspelling.

Thanks for guiding me to blabel_option. It worked.

Below is the code, in case anyone else finds themselves in the same situation, followed by the graph.

Code:

graph bar if hisced3==4 & country3!=51, over(rel_ed) over(country3) percentage asyvars stack blabel(bar, position(inside) format(%9.1f))

Again, many thanks for your attention and your help

Best

Luis Ortiz
Nick Cox

#4

02 Apr 2019, 10:02

I should be pleased you're pleased, but are you happy with that graph? It will be improved if you rotate it using graph hbar, change the aspect ratio and change the fill colours to much lighter ones, so that the numbers can be read. Whether that's enough to make it readable I don't know.

There is also scope to change the sort order. Sort on one of the categories, not country name in English.
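A minimal sketch of those suggestions using only official Stata commands. It uses counts for four countries copied from the data example posted later in this thread, expanded back to one observation per person so that graph hbar can compute the percentages itself. The lighter fills, the aspect ratio, the label format and the graph name G0 are illustrative choices, not values recommended in the thread, and the helper variable pc_hom is hypothetical.

```stata
* Counts copied from the data example in this thread (four countries only)
clear
input float rel_ed double country3 long _freq
1  3 2788
2  3 1964
3  3 1353
1 34 1289
2 34  341
3 34 1371
1 40  747
2 40  901
3 40  338
1 35  165
2 35  108
3 35  328
end
label define rel_ed 1 "Homogamy" 2 "Hypogamy" 3 "Hypergamy"
label values rel_ed rel_ed
label define cnt 3 "Australia" 34 "Japan" 35 "Korea" 40 "Latvia"
label values country3 cnt

* Back to one observation per person
expand _freq

* Hypothetical helper: % homogamy in each country, used only to order the bars
bysort country3: egen pc_hom = mean(100 * (rel_ed == 1))

* Horizontal 100% stacked bars, light fills, labels centred in each segment,
* countries ordered by % homogamy rather than by country code
graph hbar, over(rel_ed) over(country3, sort(pc_hom) descending)    ///
    percentage asyvars stack                                         ///
    bar(1, fcolor(gs15)) bar(2, fcolor(gs14)) bar(3, fcolor(gs13))   ///
    blabel(bar, position(center) format(%3.0f))                      ///
    aspect(1) name(G0, replace)
```

Sorting on a variable that is constant within each country is one way to follow the advice to sort on a category rather than on country name; any of the three categories could be used instead.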
Luis Ortiz

#5

02 Apr 2019, 10:29

No, I do not particularly like the graph, Nick.

But at least I managed (following your suggestion) to attach percentages to the bars.

I very much appreciate your further suggestion (change bar colors and sort) to improve the readability of the graph.

Many thanks again

Best

Luis
Nick Cox

#6

02 Apr 2019, 11:42

OK, but I need a data example to test ideas seriously. All you need to do is

Code:

contract country3 rel_ed if hisced3==4 & country3!=51
dataex

and show us the results by copying and pasting between code delimiters.
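Note that -contract- replaces the dataset in memory with one observation per combination of the variables, plus the frequency variable _freq. A small sketch of wrapping it in preserve and restore so that the original data come back afterwards; it uses Stata's shipped auto dataset as a stand-in, so foreign and rep78 are assumptions standing in for country3, rel_ed and the if condition above.

```stata
* Stand-in data shipped with Stata; substitute the thread's own variables
sysuse auto, clear

preserve
* One observation per foreign x rep78 combination, with counts in _freq
contract foreign rep78
list, sepby(foreign)
* -dataex- would go here, to copy the contracted data for a forum post
restore

* The original 74 observations are back in memory
describe, short
```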

Last edited by Nick Cox; 02 Apr 2019, 11:53.

Luis Ortiz

#7

03 Apr 2019, 03:31

Many thanks, Nick

Here it is

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float rel_ed double country3 long _freq
1  3 2788
2  3 1964
3  3 1353
.  3  131
1  4  779
2  4  459
3  4  720
.  4   28
1  5 2146
2  5 1131
3  5 1100
.  5   89
1  8 5364
2  8 3150
3  8 2067
.  8  115
1  9  798
2  9  437
3  9  804
.  9   29
1 10 1534
2 10  497
3 10  811
. 10   57
1 13  834
2 13  561
3 13  661
. 13   53
1 15 1911
2 15 1413
3 15  710
. 15   64
1 18 1317
2 18  760
3 18  591
. 18   23
1 19 1056
2 19  968
3 19  371
. 19   47
1 20 1816
2 20 1088
3 20  579
. 20   27
1 21 1015
2 21  602
3 21  690
. 21   35
1 24 1170
2 24  671
3 24  589
. 24    5
1 27 1459
2 27  653
3 27  405
. 27   28
1 29 1088
2 29  746
3 29  608
. 29   35
1 30 1026
2 30  742
3 30  275
. 30   25
1 32 1621
2 32 1185
3 32  916
. 32   42
1 34 1289
2 34  341
3 34 1371
. 34   80
1 35  165
2 35  108
3 35  328
1 39  987
2 39  404
3 39  567
. 39   56
1 40  747
2 40  901
3 40  338
. 40   44
1 43  637
2 43  454
3 43  658
. 43   37
1 47  384
2 47  276
3 47  586
. 47    6
1 48  748
2 48  559
3 48  716
. 48   29
1 52  657
2 52  594
3 52  333
. 52   21
1 64  877
end
label values rel_ed rel_ed
label def rel_ed 1 "Homogamy", modify
label def rel_ed 2 "Hipogamy", modify
label def rel_ed 3 "Hipergamy", modify
label values country3 cnt
label def cnt 3 "Australia", modify
label def cnt 4 "Austria", modify
label def cnt 5 "Belgium", modify
label def cnt 8 "Canada", modify
label def cnt 9 "Switzerland", modify
label def cnt 10 "Chile", modify
label def cnt 13 "Czech Republic", modify
label def cnt 15 "Denmark", modify
label def cnt 18 "Spain", modify
label def cnt 19 "Estonia", modify
label def cnt 20 "Finland", modify
label def cnt 21 "France", modify
label def cnt 24 "Greece", modify
label def cnt 27 "Hungary", modify
label def cnt 29 "Ireland", modify
label def cnt 30 "Iceland", modify
label def cnt 32 "Italy", modify
label def cnt 34 "Japan", modify
label def cnt 35 "Korea", modify
label def cnt 39 "Luxembourg", modify
label def cnt 40 "Latvia", modify
label def cnt 43 "Mexico", modify
label def cnt 47 "Netherlands", modify
label def cnt 48 "Norway", modify
label def cnt 52 "Portugal", modify
label def cnt 64 "Slovenia", modify

I hope I'm doing it correctly

Best

Luis Ortiz

Nick Cox

#8

03 Apr 2019, 05:41

Thanks. From #1 and #3 it seems that you have 29 countries. Using the dataex default of 100 observations means that you lose a few and Slovenia is truncated, but 25 are enough for me to play with.

I added two-letter country codes to the data for a reason you'll see shortly. It seems to me that hypo, homo, hyper is an ordered scale and once again I corrected the spellings. In producing a bar chart -- apart from ensuring readability -- the most important detail in my view is getting countries and response in a sensible order. As mentioned in #2 I used tabplot from the Stata Journal.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float rel_ed double country3 long _freq str2 code
1  3 2788 "AU"
2  3 1964 "AU"
3  3 1353 "AU"
.  3  131 "AU"
1  4  779 "AT"
2  4  459 "AT"
3  4  720 "AT"
.  4   28 "AT"
1  5 2146 "BE"
2  5 1131 "BE"
3  5 1100 "BE"
.  5   89 "BE"
1  8 5364 "CA"
2  8 3150 "CA"
3  8 2067 "CA"
.  8  115 "CA"
1  9  798 "CH"
2  9  437 "CH"
3  9  804 "CH"
.  9   29 "CH"
1 10 1534 "CL"
2 10  497 "CL"
3 10  811 "CL"
. 10   57 "CL"
1 13  834 "CZ"
2 13  561 "CZ"
3 13  661 "CZ"
. 13   53 "CZ"
1 15 1911 "DK"
2 15 1413 "DK"
3 15  710 "DK"
. 15   64 "DK"
1 18 1317 "ES"
2 18  760 "ES"
3 18  591 "ES"
. 18   23 "ES"
1 19 1056 "EE"
2 19  968 "EE"
3 19  371 "EE"
. 19   47 "EE"
1 20 1816 "FI"
2 20 1088 "FI"
3 20  579 "FI"
. 20   27 "FI"
1 21 1015 "FR"
2 21  602 "FR"
3 21  690 "FR"
. 21   35 "FR"
1 24 1170 "GR"
2 24  671 "GR"
3 24  589 "GR"
. 24    5 "GR"
1 27 1459 "HU"
2 27  653 "HU"
3 27  405 "HU"
. 27   28 "HU"
1 29 1088 "IE"
2 29  746 "IE"
3 29  608 "IE"
. 29   35 "IE"
1 30 1026 "IS"
2 30  742 "IS"
3 30  275 "IS"
. 30   25 "IS"
1 32 1621 "IT"
2 32 1185 "IT"
3 32  916 "IT"
. 32   42 "IT"
1 34 1289 "JP"
2 34  341 "JP"
3 34 1371 "JP"
. 34   80 "JP"
1 35  165 "KR"
2 35  108 "KR"
3 35  328 "KR"
1 39  987 "LU"
2 39  404 "LU"
3 39  567 "LU"
. 39   56 "LU"
1 40  747 "LV"
2 40  901 "LV"
3 40  338 "LV"
. 40   44 "LV"
1 43  637 "MX"
2 43  454 "MX"
3 43  658 "MX"
. 43   37 "MX"
1 47  384 "NL"
2 47  276 "NL"
3 47  586 "NL"
. 47    6 "NL"
1 48  748 "NO"
2 48  559 "NO"
3 48  716 "NO"
. 48   29 "NO"
1 52  657 "PT"
2 52  594 "PT"
3 52  333 "PT"
. 52   21 "PT"
1 64  877 "SI"
end
label values rel_ed rel_ed
label def rel_ed 1 "Homogamy", modify
label def rel_ed 2 "Hypogamy", modify
label def rel_ed 3 "Hypergamy", modify
label values country3 cnt
label def cnt 3 "Australia", modify
label def cnt 4 "Austria", modify
label def cnt 5 "Belgium", modify
label def cnt 8 "Canada", modify
label def cnt 9 "Switzerland", modify
label def cnt 10 "Chile", modify
label def cnt 13 "Czech Republic", modify
label def cnt 15 "Denmark", modify
label def cnt 18 "Spain", modify
label def cnt 19 "Estonia", modify
label def cnt 20 "Finland", modify
label def cnt 21 "France", modify
label def cnt 24 "Greece", modify
label def cnt 27 "Hungary", modify
label def cnt 29 "Ireland", modify
label def cnt 30 "Iceland", modify
label def cnt 32 "Italy", modify
label def cnt 34 "Japan", modify
label def cnt 35 "Korea", modify
label def cnt 39 "Luxembourg", modify
label def cnt 40 "Latvia", modify
label def cnt 43 "Mexico", modify
label def cnt 47 "Netherlands", modify
label def cnt 48 "Norway", modify
label def cnt 52 "Portugal", modify
label def cnt 64 "Slovenia", modify

* drop what's useless: missing categories and the truncated last country (Slovenia)
drop if missing(rel_ed)
drop in L

* percents, and then rank by homogamy (arbitrary choice)
egen percent = pc(_freq) , by(country)
egen rank = rank(percent) if rel_ed == 1
* copy each country's homogamy rank to all its rows (missing ranks sort last)
bysort country (rank) : replace rank = rank[1] 

* countries in rank order as a variable to use as one axis 
egen group = group(rank country)
* -labmask- is from the Stata Journal 
labmask group, values(country) decode 

* get marriage categories into order 
recode rel_ed 1=2 2=1 3=3, gen(which)
label def which 1 Hypogamy 2 Homogamy 3 Hypergamy 
label val which which 

* drop what we no longer need 
drop rel_ed _freq 

tabplot group which [iw=percent] , barw(0.8) yla(, labsize(small)) ///
showval(offset(0.7) format(%2.0f)) horiz ytitle("") xtitle("") bfcolor(eltgreen*0.5) name(G1, replace)

[Graph: ortiz_G1.png]

Perhaps better, but we still need to add four more countries!

Naturally, the identity that the fractions in the three categories add to 100% = proportion 1 allows a triangular (trilinear, etc.) plot, but as often happens it doesn't help much as only some of the space is used. Hence as suggested in

Cox, N.J. 2008. Trilinear plots and some alternatives. https://www.stata.com/meeting/uk08/abstracts.html

an alternative to a plot of %x %y %z is a scatter plot of %z - %x versus %y. (That also preserves the information, as a little algebra shows.)
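The "little algebra" can be spelled out. Write p1, p2 and p3 for the percents of hypogamy, homogamy and hypergamy (percent1, percent2 and percent3 after the reshape below) and d for the difference plotted on the y axis. Then the point (p2, d) gives back all three percents:

```latex
\begin{aligned}
p_1 + p_2 + p_3 &= 100, & d &= p_3 - p_1,\\
p_1 &= \frac{100 - p_2 - d}{2}, & p_3 &= \frac{100 - p_2 + d}{2}.
\end{aligned}
```

Adding and subtracting the two equations in the first line gives the second line, so no information is lost in moving from three percents to two coordinates.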

Code:

reshape wide percent , i(code) j(which)
gen diff = (percent3 - percent1)
scatter diff percent2, ms(none) mla(code) mlabpos(0) mlabc(blue) mlabsize(medsmall) ///
xtitle(% homogamy) ytitle(% hypergamy {&minus} % hypogamy)                               ///
yli(0, lc(gs12) lw(thin)) xla(20(10)60) yla(-40(10)40, ang(h)) aspect(1)          ///
text(35 55 "hyper > hypo", color(red)) text(-35 55 "hypo > hyper", color(red)) name(G2, replace)

[Graph: ortiz_G2.png]

So, presumably you're the sociologist, anthropologist, epidemiologist, whatever here -- or talking to some because you're the statistics/computing person -- does either help?

Thinking about which parts of the graph above can't be reached could be an important detail.
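A sketch of one answer: because the hypogamy and hypergamy percents cannot be negative, the plotted difference must satisfy |diff| <= 100 - % homogamy, so every country falls inside a triangle with corners at (0, 100), (0, -100) and (100, 0). The code below draws the two sloping edges of that triangle around four countries whose counts are copied from the data example above; the choice of countries, the colours and the graph name G3 are arbitrary.

```stata
* Counts copied from the data example in this thread (four countries only)
clear
input str2 code hypo homo hyper
"AU" 1964 2788 1353
"JP"  341 1289 1371
"LV"  901  747  338
"KR"  108  165  328
end

* Percents in each category, numbered as after the reshape above
generate double total = hypo + homo + hyper
generate double percent1 = 100 * hypo  / total
generate double percent2 = 100 * homo  / total
generate double percent3 = 100 * hyper / total
generate double diff = percent3 - percent1

* Points can only lie where |diff| <= 100 - percent2, since percent1, percent3 >= 0
twoway function y = 100 - x, range(0 100) lc(gs12)               ///
    || function y = x - 100, range(0 100) lc(gs12)                ///
    || scatter diff percent2, ms(none) mla(code) mlabpos(0)       ///
       xtitle(% homogamy) ytitle(% hypergamy {&minus} % hypogamy)  ///
       legend(off) aspect(1) name(G3, replace)
```

With all countries plotted, the empty corners of the triangle show which combinations of the three categories no country comes near.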

Luis Ortiz

#9

03 Apr 2019, 06:36

That's a great help, Nick. Many thanks

Either of the two alternative graphs you have generated is far more readable, informative and appealing than the one I first showed in this thread. Although the second one is definitely better (clearer, more informative), the first one possibly suits me better, because the graph answers a reviewer's request for descriptive statistics on a key independent variable, which is precisely this one: it classifies individuals according to the relative education of their parents (hypogamy, homogamy, hypergamy).

I have just one question, though.

It is worrying that 4 countries were lost, and I did not understand why. Is it because 'dataex' sampled my dataset and no observations from these four countries happened to appear in the resulting sample?

Again, many thanks for this wonderful session, which has been really instructive for me, and I hope for others too.

Best

Luis

PS: Yes, I am a sociologist/demographer.
Nick Cox

#10

03 Apr 2019, 09:46

Not to worry. As said, the only thing biting is the default of dataex, as explained in the help:

count(#) specifies a limit to the number of observations listed. The default is count(100).
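In other words, -dataex- lists the first observations in the current sort order, not a random sample, which is why the countries at the end of the data were the ones lost. A minimal sketch with synthetic stand-in data; the 116 observations are an assumption, roughly 29 countries times 4 categories.

```stata
* Synthetic stand-in data: 116 observations
clear
set obs 116
generate id = _n

* With the default count(100) only observations 1 to 100 would be listed;
* count() raises the limit so that the whole dataset is shown
dataex id, count(116)
```

In this thread the equivalent would be -dataex, count(200)- after the -contract- command in #6, where 200 is simply a bound comfortably above the number of contracted observations.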
Luis Ortiz

#11

03 Apr 2019, 10:28

Much relieved. Many thanks again

LO