
  • Changing location of labels in box plot graphs

    Hello,

    I'm trying to create box plots displaying changes in occupational shares for occupations 1-9, separated by the indicator variable "advanced economies" (ae), which is 1 for advanced economies and 0 otherwise. The graph in its original form looks like this:
    [Image: Graph_without colors.png]

    The code I used is
    Code:
    graph hbox mean_d_occ, over(o_id, relabel(1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9")) ///
    ytitle(, alignment(middle)) nooutsides  by(ae)
    Next I wanted to change the color of certain occupations to red, so I changed the code as follows:
    Code:
    graph hbox mean_d_occ, over(o_id, relabel(1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9")) ///
    asyvar box(1, color(navy)) box(2, color(navy)) box(3, color(navy)) box(5, color(navy)) box(4, color(maroon)) box(6, color(maroon)) box(7, color(maroon)) box(8, color(navy)) box(9, color(maroon)) ///
    ytitle(, alignment(middle)) nooutsides  by(ae)
    But after I added the color line (beginning with asyvar), the occupation labels moved from the vertical axis into the legend, which looks like this:

    [Image: Graph_with colors.png]


    Now I would like to get the labels back to the vertical axis. Any ideas on this?

    Also I would like to move the advanced economies to the left and name them "advanced economies" instead of 1 and the others "emerging and developing economies" instead of 0.
    Lastly, is there a way to get rid of the notes "Graphs by Advanced Economies" and "excludes outside values"?

    Thanks in advance for your help! :-)

    Regards,
    Jonas


  • #2
    No data example here, but you want

    horizontal box plots omitting outside values (!!!)

    with one categorical variable defining rows, another categorical variable defining panels and yet another colouring differently within panels.

    This shows some technique.


    Code:
    clear 
    set obs `=9 * 2 * 5' 
    set seed 2803 
    gen y = rnormal()
    egen x1 = seq(), to(9) block(5) 
    egen x2 = seq(), to(2) block(45) 
    
    separate y, by(inlist(x1, 4, 6, 7, 9)) veryshortlabel
    graph hbox y?, nooutsides over(x1) by(x2, note("") legend(off)) nofill note("")
    Here's the graph.


    [Image: boxnooutsides.png]


    I'd advise that omitting outside values should be explained somewhere. Otherwise the rest is just value labels, variable labels, preferred colours, and so forth.
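    For illustration, here is a sketch of the sandbox above with those finishing touches added: value labels on the row and panel variables, and explicit colours for the two separated variables. The label names x1lab and x2lab and the panel names "Group A" and "Group B" are hypothetical placeholders, and navy/maroon are simply the colours used earlier in the thread. It assumes the code is run as a do-file (because of the /// continuation).

```stata
* Sandbox as above, extended with value labels and explicit box colours
clear
set obs `=9 * 2 * 5'
set seed 2803
gen y = rnormal()
egen x1 = seq(), to(9) block(5)
egen x2 = seq(), to(2) block(45)

* Hypothetical value labels: over() shows x1's labels on the axis,
* by() shows x2's labels as panel titles
label define x1lab 1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9"
label values x1 x1lab
label define x2lab 1 "Group A" 2 "Group B"
label values x2 x2lab

* y0 holds y where x1 is not 4, 6, 7 or 9; y1 holds y where it is
separate y, by(inlist(x1, 4, 6, 7, 9)) veryshortlabel

* box(1, ...) styles the first variable matched by y? (y0), box(2, ...) the second (y1)
graph hbox y?, nooutsides over(x1) by(x2, note("") legend(off)) nofill note("") ///
    box(1, color(navy)) box(2, color(maroon))
```

Because the colouring comes from two separate y variables rather than from asyvars, the rows keep their labels on the vertical axis.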





    • #3
      Thanks for your reply Nick! The graphs you've created look exactly as I want.
      However, I'm having some trouble adapting the code you proposed to my case, as you seem to use a bit of a different setup than the Stata embedded box plot function.
      Would it be possible to set up the code with my data with a data example? That would help me to place my variable names in the code.
      The main variable on the horizontal dimension is mean_d_occ (mean percentage point change in occupational share), which I group over o_id (occupation ID), by the categorical variable ae (advanced economy).


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str33 country byte o_id str52 occupation float mean_d_occ byte ae
      "Brazil"        1 "1. Legislators, senior officials and managers"        -.12857142 0
      "Brazil"        2 "2. Professionals"                                       .1642857 0
      "Brazil"        3 "3. Technicians and associate professionals"             .3142857 0
      "Brazil"        4 "4. Clerks"                                             .14285715 0
      "Brazil"        5 "5. Service workers and shop and market sales workers" -.47857144 0
      "Brazil"        6 "6. Skilled agricultural and fishery workers"          -.27857143 0
      "Brazil"        7 "7. Craft and related trades workers"                   -.6285715 0
      "Brazil"        8 "8. Plant and machine operators and assemblers"                .4 0
      "Brazil"        9 "9. Elementary occupations"                              .6928571 0
      "United States" 1 "1. Legislators, senior officials and managers"          -.074684 1
      "United States" 2 "2. Professionals"                                        .108574 1
      "United States" 3 "3. Technicians and associate professionals"                .0265 1
      "United States" 4 "4. Clerks"                                              -.117526 1
      "United States" 5 "5. Service workers and shop and market sales workers"    .101038 1
      "United States" 6 "6. Skilled agricultural and fishery workers"             .012752 1
      "United States" 7 "7. Craft and related trades workers"                    -.068306 1
      "United States" 8 "8. Plant and machine operators and assemblers"           -.14748 1
      "United States" 9 "9. Elementary occupations"                               .159132 1
      end



      • #4
        I am away from computers for a while but hasten to confirm that my Stata code is utterly standard. Someone else may be able to help.



        • #5
          Okay, would anyone else be able to help?



          • #6
            I am looking at this again. Your data example doesn't lead to a box plot, as single values don't result in a box being drawn.

            In principle, you should just need to change the variable names in my example.

            In practice I note that your very long occupation names will cause severe space problems.



            • #7
              Thanks for your response, Nick!
              Well, this was just a data excerpt for two out of a sample of 90 countries.
              Each country has a value for "mean_d_occ" in each of the 9 occupations.
              For each of these 9 occupations I want to create a box plot based on around 90 data points, as in the graphs in my first post.
              On the left side I want advanced economies (ae==1), on the right side emerging and developing economies (ae==0).
              As for the labels, the titles (1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9") or even the numbers 1-9 are sufficient.
              I just included the complete names in the data example to make it more understandable.

              My main problem is still the coloring of boxes for occupations 4,6,7 and 9, which led to the unwanted legend next to the plots instead of labels on the vertical axis, as can be seen in the screenshots.
              (By the way: sorry about posting the same picture about 5 times in my original post. I wasn't really familiar with the upload function.)





              • #8
                Sorry, but I am still at a loss to know what your precise problem is. In #3 you reported

                I'm having some trouble adapting the code you proposed to my case,
                "some trouble" is not a precise problem report

                as you seem to use a bit of a different setup than the [Stata] embedded box plot [command]
                not so, as confirmed in #4

                Would it be possible to set up the code with my data with a data example?
                Yes, in principle the code in #2 could be translated for any equivalent data example, but the data example in #3 yields no useful box plots, and otherwise doesn't show any problem with the code in #2.

                Now in #7 you say

                My main problem is still the coloring of boxes for occupations 4,6,7 and 9, which led to the unwanted legend
                but already in #2 my code showed how to remove the legend.

                Most of the code in #2 is just setting up a sandbox dataset. The key lines are just

                Code:
                separate y, by(inlist(x1, 4, 6, 7, 9)) veryshortlabel  
                graph hbox y?, nooutsides over(x1) by(x2, note("") legend(off)) nofill note("")
                which you need to translate to your data. y x1 x2 appear to correspond to your mean_d_occ o_id ae.



                • #9
                  which you need to translate to your data. y x1 x2 appear to correspond to your mean_d_occ o_id ae.
                  Thanks Nick, that was the hint I needed. Now it looks perfect!

                  One final question on this: Can I change the order of the two sets of graphs i.e. have advanced economies on the left and
                  emerging and developing on the right?

                  My code is the following:
                  Code:
                  separate mean_d_occ, by(inlist(o_id, 4, 6, 7, 9)) veryshortlabel
                  label define newlab 0 "Emerging and Developing Economies" 1 "Advanced Economies"
                  label values ae newlab  
                  graph hbox mean_d_occ?, nooutsides over(o_id, relabel(1 "Occ1" 2 "Occ2" 3 "Occ3" 4 "Occ4" 5 "Occ5" 6 "Occ6" 7 "Occ7" 8 "Occ8" 9 "Occ9")) by(ae, note("") legend(off)) nofill note("")

                  And my graph looks like this:

                  [Image: Graph.png]


                  Thanks,
                  Jonas



                  • #10
                    Indeed. Just recode ae to say 1 0 rather than 0 and 1.
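                    A minimal sketch of that recode, assuming panels under by() are drawn in ascending order of the by variable, so that whichever group is coded 0 appears first (on the left). To keep it self-contained it simulates data with the variable names from #9; the new variable ae2 and the label name ae2lab are hypothetical. Recoding into a new variable rather than overwriting ae is just one choice; it leaves the original coding intact for other analyses.

```stata
* Hypothetical sandbox with the variable names from #9: 9 occupations x 2 groups x 10 countries
clear
set obs `=9 * 2 * 10'
set seed 2803
gen mean_d_occ = rnormal()
egen o_id = seq(), to(9) block(10)
egen ae = seq(), from(0) to(1) block(90)

* mean_d_occ0: occupations other than 4, 6, 7, 9; mean_d_occ1: those four
separate mean_d_occ, by(inlist(o_id, 4, 6, 7, 9)) veryshortlabel

* Swap the codes: advanced economies (ae == 1) become 0, so their panel comes first
recode ae (1 = 0) (0 = 1), generate(ae2)
label define ae2lab 0 "Advanced Economies" 1 "Emerging and Developing Economies"
label values ae2 ae2lab

graph hbox mean_d_occ?, nooutsides over(o_id) by(ae2, note("") legend(off)) nofill note("")
```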



                    • #11
                      As simple as that! ;-)
                      Thanks Nick, you were of great help!

                      Comment

                      Working...
                      X