Coloring box plot according to value of another variable

Jonathan Seiden

Join Date: Aug 2017

Posts: 28
#1

31 Aug 2017, 09:46

Hello,

Apologies if this is answered elsewhere; I have read through the help files and done a bit of googling, but cannot find the answer.

I have a fairly simple dataset of student scores (idelapct) nested in ECCD Centers (centerid) nested in districts (districtid). After looking through the data, it's clear to me that the ICC in one district (which contains a lot of minority language speakers) is dramatically higher than in the other three districts. I would like to represent this graphically through a series of boxplots showing the scores within each center.

This is easy enough to do with

graph box idelapct, over(centerid)

But I would also like to differentiate the boxplots and color them according to their districtid to highlight this disparity. In R what I am trying to do would be achieved by:

boxplot(mydata$idelapct ~ mydata$centerid, col = mydata$centerid)

Is there an equivalent expression in Stata that I could use?

Thanks!

Jonathan

eric_a_booth

Join Date: Apr 2014

Posts: 292
#2

31 Aug 2017, 09:53

You can use -separate- to split the variable of interest into one variable per value of another variable, and then plot those new variables together with an 'over()' option in your boxplot command to get what you need. See page 234 here for an example of what you're describing: http://www.stata-journal.com/article...article=gr0058
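
A minimal sketch of the -separate- approach, assuming Stata's bundled auto dataset as a stand-in (mpg, foreign, and rep78 play the roles of the score, the district, and the center; none of these names come from the original question):

```stata
* Illustration only: Stata's bundled auto data, not the poster's dataset
sysuse auto, clear

* Create one variable per level of foreign: mpg0 (Domestic) and mpg1 (Foreign)
separate mpg, by(foreign)

* Each plotted variable gets its own box color; over() arranges boxes by category
graph box mpg0 mpg1, over(rep78)
```

Because each new variable is missing wherever the grouping variable takes a different value, each observation contributes to only one of the colored boxes.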

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
Jonathan Seiden

Join Date: Aug 2017

Posts: 28
#3

31 Aug 2017, 11:59

Thank you! This is exactly what I was looking for. This is what I ended up doing:

preserve

separate idelapct, by(district)

label variable idelapct1 "Pakokku Township"
label variable idelapct2 "Min Bu Township"
label variable idelapct3 "Saw Township"
label variable idelapct4 "Hpruso Township"

graph box idelapct?, over(centerid, gap(*-2) label(labsize(vsmall) alt)) ///
name(IDELA_by_Center_District, replace) xsize(8) b1title("ECCD Centers") ///
ylab( 0 "0" .2 "20%" .4 "40%" .6 "60%" .8 "80%" 1 "100%", ///
angle(horizontal)) title("Total IDELA Scores by District") ///
ytitle("IDELA Score Range")

restore, preserve

Which gave me this beautiful result:

Nick Cox

Join Date: Mar 2014
Posts: 35699

31 Aug 2017, 12:15

Consider the further by() option here.

Code:

graph box idelapct?, over(centerid, gap(*-2) label(labsize(vsmall) alt)) ///
name(IDELA_by_Center_District, replace) xsize(8) b1title("ECCD Centers") ///
ylab( 0 "0" .2 "20%" .4 "40%" .6 "60%" .8 "80%" 1 "100%", ///
angle(horizontal)) title("Total IDELA Scores by District") ///
ytitle("IDELA Score Range") by(district, compact row(1) xrescale)

and using scheme(s1color) to get rid of the blue backdrop (unless you like it!).
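
As a rough sketch of how by() and scheme(s1color) combine, again assuming the bundled auto dataset as a stand-in for the thread's data (variable names here are illustrative, not the poster's):

```stata
* Illustration only: Stata's bundled auto data, not the poster's dataset
sysuse auto, clear
separate mpg, by(foreign)

* by() draws one panel per group in a single row;
* scheme(s1color) replaces the default blue-tinted background with a plain one
graph box mpg0 mpg1, over(rep78) by(foreign, compact row(1)) scheme(s1color)
```

To apply the scheme to every later graph in the session rather than to one command, `set scheme s1color` can be run once beforehand.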
