
  • Coloring box plot according to value of another variable

    Hello,

    Apologies if this has been answered elsewhere. I have read through the help files and done a bit of googling, but I cannot find the answer.

    I have a fairly simple dataset of student scores (idelapct) nested in ECCD centers (centerid), which are in turn nested in districts (districtid). After looking through the data, it's clear to me that the ICC (intraclass correlation) in one district, which contains many minority-language speakers, is dramatically higher than in the other three districts. I would like to show this graphically with a series of box plots of the scores within each center.

    This is easy enough to do with

    graph box idelapct, over(centerid)

    [Attached image: GraphStata.png]

    But I would also like to distinguish the box plots by coloring them according to their districtid, to highlight this disparity. In R, what I am trying to do would be achieved by:

    boxplot(idelapct ~ centerid, data = mydata,
            col = tapply(mydata$districtid, mydata$centerid, function(d) d[1]))


    [Attached image: Rplot.png]
    Is there an equivalent expression in Stata that I could use?

    Thanks!

    Jonathan

  • #2
    You can use -separate- to split the variable of interest into one new variable per group (here, per district) and then use the over() option of -graph box- to get what you need. See page 234 of this Stata Journal article for an example of what you're describing: http://www.stata-journal.com/article...article=gr0058
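A minimal, reproducible sketch of the -separate- approach, using the auto dataset that ships with Stata as a stand-in. The mapping is an assumption for illustration only: mpg plays the role of idelapct, foreign the role of districtid, and rep78 the role of centerid.

```stata
* Load a dataset that ships with Stata (stand-in for the scores data)
sysuse auto, clear

* One new variable per group: mpg0 holds mpg for domestic cars and is
* missing otherwise; mpg1 holds mpg for foreign cars
separate mpg, by(foreign)

* Each yvar gets its own colour; over() puts the categories on the axis
graph box mpg0 mpg1, over(rep78)
```

Unlike the scores data, where each center belongs to a single district, most rep78 categories here contain both groups, so two differently coloured boxes appear side by side within those categories.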
    Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



    • #3
      Thank you! This is exactly what I was looking for. Here is what I ended up doing:

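      * keep a copy of the data so the new variables can be discarded afterwards
      preserve

      * one copy of idelapct per district: idelapct1 ... idelapct4
      separate idelapct, by(district)

      * these labels appear in the legend of the box plot
      label variable idelapct1 "Pakokku Township"
      label variable idelapct2 "Min Bu Township"
      label variable idelapct3 "Saw Township"
      label variable idelapct4 "Hpruso Township"

      * idelapct? matches idelapct1-idelapct4; each gets its own colour
      graph box idelapct?, over(centerid, gap(*-2) label(labsize(vsmall) alt)) ///
      name(IDELA_by_Center_District, replace) xsize(8) b1title("ECCD Centers") ///
      ylab( 0 "0" .2 "20%" .4 "40%" .6 "60%" .8 "80%" 1 "100%", ///
      angle(horizontal)) title("Total IDELA Scores by District") ///
      ytitle("IDELA Score Range")

      * bring back the original data; the preserve option keeps the snapshot
      restore, preserve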


      Which gave me this beautiful result:

      [Attached image: IDELA_by_Center_District.png]
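An alternative that should give a similar picture without creating new variables (so without the preserve/restore pair) is to give -graph box- the grouping variable as the first over() group and add asyvars, which treats that first group as if it were separate yvars, so each group gets its own colour and a legend entry. nofill asks Stata to omit empty category combinations. This is a sketch on the auto data (foreign standing in for district, rep78 for centerid, mpg for idelapct), not tested against the original scores data.

```stata
sysuse auto, clear

* First over() group (foreign) is treated as yvars: one colour per group
* Second over() group (rep78) forms the categorical axis
* nofill omits empty combinations of the two over() groups
graph box mpg, over(foreign) over(rep78) asyvars nofill ///
    name(box_asyvars, replace)
```

With the forum data this would presumably read graph box idelapct, over(districtid) over(centerid) asyvars nofill, plus the labelling options used above.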



      • #4
        Consider also adding a by() option here:

        Code:
        graph box idelapct?, over(centerid, gap(*-2) label(labsize(vsmall) alt)) ///
        name(IDELA_by_Center_District, replace) xsize(8) b1title("ECCD Centers") ///
        ylab( 0 "0" .2 "20%" .4 "40%" .6 "60%" .8 "80%" 1 "100%", ///
        angle(horizontal)) title("Total IDELA Scores by District") ///
        ytitle("IDELA Score Range") by(district, compact row(1) xrescale)
        You could also add scheme(s1color) to get rid of the blue backdrop (unless you like it!).
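A self-contained variant of the by() suggestion, again on the auto data as an assumed stand-in (mpg for idelapct, rep78 for centerid, foreign for district). Because by() already splits the graph into one panel per group, a single yvar is enough here and -separate- is arguably not needed, although every panel then uses the same box colour.

```stata
sysuse auto, clear

* One panel per group, all panels in a single row;
* scheme(s1color) draws a white background instead of the bluish one of s2color
graph box mpg, over(rep78) by(foreign, compact rows(1)) ///
    scheme(s1color) name(box_by, replace)
```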

