Changing location of labels in box plot graphs

Jonas Boehlke

Join Date: May 2019

Posts: 22
#1

22 May 2019, 04:08

Hello,

I'm trying to create box plots displaying changes in occupational shares for occupations 1-9, separated by the indicator variable "advanced economies", which is 1 for advanced economies and 0 otherwise. The graph in its original form looks like this.

The code I used is

Code:

graph hbox mean_d_occ, over(o_id, relabel(1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9")) ///
    ytitle(, alignment(middle)) nooutsides by(ae)

Now I wanted to change the colors of certain occupations to red, for which I changed the code as follows:

Code:

graph hbox mean_d_occ, over(o_id, relabel(1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9")) ///
    asyvar box(1, color(navy)) box(2, color(navy)) box(3, color(navy)) box(5, color(navy)) box(4, color(maroon)) box(6, color(maroon)) box(7, color(maroon)) box(8, color(navy)) box(9, color(maroon)) ///
    ytitle(, alignment(middle)) nooutsides by(ae)

But by adding the color line (beginning with asyvar) to my code, the occupation labels moved to the legend box, which looks like this:

Now I would like to get the labels back to the vertical axis. Any ideas on this?

Also I would like to move the advanced economies to the left and name them "advanced economies" instead of 1 and the others "emerging and developing economies" instead of 0.
Lastly, is there a way to get rid of the labels "Graphs by Advanced Economies" and "exclude outside values"?

Thanks in advance for your help! :-)

Regards,
Jonas

Nick Cox

Join Date: Mar 2014

Posts: 35727
#2

22 May 2019, 05:47

No data example here, but you want

horizontal box plots omitting outside values (!!!)

with one categorical variable defining rows, another categorical variable defining panels and yet another colouring differently within panels.

This shows some technique.

Code:

clear
set obs `=9 * 2 * 5'
set seed 2803
gen y = rnormal()
egen x1 = seq(), to(9) block(5)
egen x2 = seq(), to(2) block(45)
separate y, by(inlist(x1, 4, 6, 7, 9)) veryshortlabel
graph hbox y?, nooutsides over(x1) by(x2, note("") legend(off)) nofill note("")

Here's the graph.

I'd advise that omitting outside values should be explained somewhere. Otherwise the rest is just value labels, variable labels, preferred colours, and so forth.


Jonas Boehlke

Join Date: May 2019
Posts: 22
#3

23 May 2019, 01:48

Thanks for your reply Nick! The graphs you've created look exactly as I want.
However, I'm having some trouble adapting the code you proposed to my case, as you seem to use a bit of a different setup than the STATA embedded box plot function.
Would it be possible to set up the code with my data with a data example? That would help me to place my variable names in the code.
The main variable on the horizontal dimension is mean_d_occ (mean percentage point change in occupational share), which I group over o_id (occupation ID), by the categorical variable ae (advanced economy).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str33 country byte o_id str52 occupation float mean_d_occ byte ae
"Brazil"        1 "1. Legislators, senior officials and managers"        -.12857142 0
"Brazil"        2 "2. Professionals"                                       .1642857 0
"Brazil"        3 "3. Technicians and associate professionals"             .3142857 0
"Brazil"        4 "4. Clerks"                                             .14285715 0
"Brazil"        5 "5. Service workers and shop and market sales workers" -.47857144 0
"Brazil"        6 "6. Skilled agricultural and fishery workers"          -.27857143 0
"Brazil"        7 "7. Craft and related trades workers"                   -.6285715 0
"Brazil"        8 "8. Plant and machine operators and assemblers"                .4 0
"Brazil"        9 "9. Elementary occupations"                              .6928571 0
"United States" 1 "1. Legislators, senior officials and managers"          -.074684 1
"United States" 2 "2. Professionals"                                        .108574 1
"United States" 3 "3. Technicians and associate professionals"                .0265 1
"United States" 4 "4. Clerks"                                              -.117526 1
"United States" 5 "5. Service workers and shop and market sales workers"    .101038 1
"United States" 6 "6. Skilled agricultural and fishery workers"             .012752 1
"United States" 7 "7. Craft and related trades workers"                    -.068306 1
"United States" 8 "8. Plant and machine operators and assemblers"           -.14748 1
"United States" 9 "9. Elementary occupations"                               .159132 1
end


Nick Cox

Join Date: Mar 2014

Posts: 35727
#4

23 May 2019, 02:11

I am away from computers for a while but hasten to confirm that my Stata code is utterly standard. Someone else may be able to help.
Jonas Boehlke

Join Date: May 2019

Posts: 22
#5

23 May 2019, 05:48

Okay, would anyone else be able to help?
Nick Cox

Join Date: Mar 2014

Posts: 35727
#6

26 May 2019, 00:23

I am looking at this again. Your data example doesn't lead to a box plot, as single values don't result in a box plot being drawn.

In principle, you should just need to change the variable names in my example.

In practice I note that your very long occupation names will cause severe space problems.
Jonas Boehlke

Join Date: May 2019

Posts: 22
#7

28 May 2019, 02:05

Thanks for your response, Nick!
Well this was just a data excerpt for two, out of a sample of 90 countries.
Each country has a value for "mean_d_occ" in each of the 9 occupations.
For each of these 9 occupations I want to create a boxplot, based on around 90 datapoints, as I already did in my example.
On the left side for advanced economies (ae==1), on the right side for emerging and developing economies (ae==0).
With regard to the occupation names, the short titles ("Occ1" to "Occ9") or even the numbers 1-9 are sufficient.
I just included the complete names in the data example to make it more understandable.

My main problem is still the coloring of boxes for occupations 4,6,7 and 9, which led to the unwanted legend, instead of vertical titles, next to the plots, as can be seen in the screenshots.
(By the way: sorry about posting the same picture about 5 times in my original post. I wasn't really familiar with the upload function.)
Nick Cox

Join Date: Mar 2014

Posts: 35727
#8

28 May 2019, 02:18

Sorry, but I am still at a loss to know what your precise problem is. In #3 you reported

"I'm having some trouble adapting the code you proposed to my case,"

"Some trouble" is not a precise problem report.

"as you seem to use a bit of a different setup than the [Stata] embedded box plot [command]"

Not so, as confirmed in #4.

"Would it be possible to set up the code with my data with a data example?"

Yes, in principle the code in #2 could be translated for any equivalent data example, but the data example in #3 yields no useful box plots, and otherwise doesn't show any problem with the code in #2.

Now in #7 you say

"My main problem is still the coloring of boxes for occupations 4,6,7 and 9, which led to the unwanted legend"

but already in #2 my code showed how to remove the legend.

Most of the code in #2 is just setting up a sandbox dataset. The key lines are just

Code:

separate y, by(inlist(x1, 4, 6, 7, 9)) veryshortlabel
graph hbox y?, nooutsides over(x1) by(x2, note("") legend(off)) nofill note("")

which you need to translate to your data. y x1 x2 appear to correspond to your mean_d_occ o_id ae.
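As a rough illustration of why these two lines keep the category labels on the axis while still allowing two colours, here is a sketch on the same kind of simulated sandbox data as in #2. The box() colour choices (navy and maroon, taken from #1) are an assumption about the intended look, not part of the code in #2.

```stata
* Simulated sandbox data, as in #2; the numbers are not real
clear
set obs 90
set seed 2803
generate y = rnormal()
egen x1 = seq(), to(9) block(5)
egen x2 = seq(), to(2) block(45)

* separate creates y0 (x1 not in 4, 6, 7, 9) and y1 (x1 in 4, 6, 7, 9);
* each new variable is missing wherever the other one is defined
separate y, by(inlist(x1, 4, 6, 7, 9)) veryshortlabel

* With two yvars, box(1, ...) styles y0 and box(2, ...) styles y1,
* while over(x1) keeps the category labels on the axis
graph hbox y0 y1, nooutsides over(x1) ///
    box(1, color(navy)) box(2, color(maroon)) ///
    by(x2, note("") legend(off)) nofill note("")
```

The point of the design is that colour is attached to the variable rather than to the category, so asyvars is not needed and the over() labels never move into a legend.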
Jonas Boehlke

Join Date: May 2019

Posts: 22
#9

28 May 2019, 04:04

"which you need to translate to your data. y x1 x2 appear to correspond to your mean_d_occ o_id ae."

Thanks Nick, that was the hint I needed. Now it looks perfect!

One final question on this: Can I change the order of the two sets of graphs, i.e. have advanced economies on the left and emerging and developing on the right?

My code is the following:

Code:

separate mean_d_occ, by(inlist(o_id, 4, 6, 7, 9)) veryshortlabel
label define newlab 0 "Emerging and Developing Economies" 1 "Advanced Economies"
label values ae newlab
graph hbox mean_d_occ?, nooutsides over(o_id, relabel(1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9")) by(ae, note("") legend(off)) nofill note("")

And my graph looks like this:

Thanks,
Jonas
Nick Cox

Join Date: Mar 2014

Posts: 35727
#10

28 May 2019, 04:46

Indeed. Just recode ae to say 1 0 rather than 0 and 1.
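One way this recoding might look is sketched below. It runs on simulated data that borrows the variable names from #3 and #9 (the values are not real), and `ae_order` and `ae_order_lbl` are hypothetical names introduced only for illustration. Since by() arranges panels in ascending order of the by-variable, giving advanced economies the lower code should place them on the left.

```stata
* Simulated stand-in for the real data: 90 countries x 9 occupations (values are not real)
clear
set seed 2803
set obs 810
egen o_id = seq(), to(9)
generate byte ae = _n <= 405
generate mean_d_occ = rnormal()

* Split the outcome so occupations 4, 6, 7, 9 get their own variable (and colour)
separate mean_d_occ, by(inlist(o_id, 4, 6, 7, 9)) veryshortlabel

* Hypothetical new variable: advanced economies get code 0 so they sort first
generate byte ae_order = cond(ae == 1, 0, 1) if !missing(ae)
label define ae_order_lbl 0 "Advanced Economies" 1 "Emerging and Developing Economies"
label values ae_order ae_order_lbl

graph hbox mean_d_occ?, nooutsides over(o_id) ///
    by(ae_order, note("") legend(off)) nofill note("")
```

On the real dataset only the commands from separate onwards would be needed, and separate itself should be skipped if mean_d_occ0 and mean_d_occ1 already exist.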
Jonas Boehlke

Join Date: May 2019

Posts: 22
#11

28 May 2019, 05:43

As simple as that! ;-)
Thanks Nick, you were of great help!