Bar Graph: Categorical variables

Fernando Bastidas

Join Date: Nov 2020

Posts: 28
#1

22 May 2023, 20:43

Hi!

I want to make a bar graph comparing educational level between males and females in a specific dataset. Educational level is a categorical variable that takes values from 0 to 6, and sex is a dummy variable that takes the values 0 or 1. I want the x axis to show the educational level for each sex, and the y axis to show the values or the % in each category of educational level.

I would really appreciate your help.
Tags: bar, categorical, graph

Nick Cox

Join Date: Mar 2014
Posts: 35589

23 May 2023, 00:55

"values" I guess means counts or frequencies. There is no data example here, but this sandbox dataset shows some technique to be adapted. I use catplot from SSC, which is a wrapper for graph hbar, here recast to graph bar.

Copy the script to your do-file editor, run it to see some possibilities, and then decide what to do differently, such as specifying 6 colours.

Code:

sysuse auto, clear 
rename rep78 Education
rename foreign Female 
label def female 0 Male 1 Female 
label val Female female 

* omit if installed 
ssc install catplot 

catplot Education Female , recast(bar) name(G1, replace)

catplot Education Female , percent(Female) recast(bar) name(G2, replace)

local opts bar(1, color(red*0.6)) bar(2, col(red*0.2)) bar(3, col(blue*0.2)) bar(4, col(blue*0.6))  bar(5, col(blue))

catplot Education Female , percent(Female) recast(bar) asyvars `opts' name(G3, replace)

catplot Female Education, percent(Female) recast(bar) asyvars name(G4, replace)


Marc Kaulisch

Join Date: Jan 2016

Posts: 182
#3

23 May 2023, 00:57

I highly recommend looking into -tabplot- from Stata Journal in your case.
Depending on the message you would like to send, I would also look at a -waffle- chart (search in Stata and have a look at https://medium.com/the-stata-guide/s...s-32afc7d6f6dd) or mosaic plots (search marimekko or look at https://medium.com/the-stata-guide/s...s-49caa27c5554).

Nick Cox

Join Date: Mar 2014
Posts: 35589

23 May 2023, 02:54

If you're tempted by waffle charts, first read https://www.perceptualedge.com/artic...e_for_kids.pdf and see if it changes your mind.

tabplot is a familiar command which I too am happy to endorse. Here are some examples. See also the yreverse option.

Code:

sysuse auto, clear 
rename rep78 Education
rename foreign Female 
label def female 0 Male 1 Female 
label val Female female 

* remove the variable labels inherited from rep78 and foreign
label var Education 
label var Female 

tabplot Education Female , showval horizontal xtitle("") name(G5, replace)

tabplot Education Female , percent(Female) horizontal showval xtitle("") name(G6, replace)


Marc Kaulisch

Join Date: Jan 2016

Posts: 182
#5

23 May 2023, 08:49

Originally posted by Nick Cox

If you're tempted by waffle charts, first read https://www.perceptualedge.com/artic...e_for_kids.pdf and see if it changes your mind.

Indeed, the author has convincing arguments against unit charts / waffle plots. Nonetheless, I think that when displaying percentages of a binary variable (female/male) by a multi-categorical variable, it may be an illustrative way to show larger differences between the categories. More detailed information (like percentages and N) needs to be placed within the plot.

Code:

ssc install waffle_plot
waffle_plot Female, by(Education) name(waffle_1, replace)

I took a closer look at my second recommendation. It would most likely be achieved by -spineplot- from Stata Journal (authored by Nick Cox)

Code:

spineplot Education Female, percent name(spineplot, replace)

In my opinion this gives the best overview of the whole sample and its composition - but is less convincing when the graph is used to compare the categories.
Nick Cox

Join Date: Mar 2014

Posts: 35589
#6

23 May 2023, 08:56

My guess is that being female is a predictor here, not an outcome. So, the main focus would be composition of education levels by female.

We can't easily further discuss what works well, or best, with Fernando Bastidas's data without seeing those data. The 2 x 6 table of counts would naturally be enough to calculate percentages too.
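
To illustrate what posting such a table could look like, here is a minimal sketch using the sandbox dataset from earlier in the thread. The renamed variables Education and Female are stand-ins for the real ones, which is an assumption, as the actual data have not been shown. tabulate with the column option reports each count and, beneath it, the percentage within each sex.

```stata
* sandbox data only: rep78 and foreign stand in for the real variables
sysuse auto, clear
rename rep78 Education
rename foreign Female
label define female 0 "Male" 1 "Female"
label values Female female

* two-way table of counts, with column percentages (within each sex)
tabulate Education Female, column
```

The resulting table can be pasted as plain text into a reply, which is enough for others to reproduce both the counts and the percentages.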