  • Boxplot (?) and scatterplot in the same figure?

    Dear all, I found this question (how to draw a figure like below) somewhere and wish to see if anyone can have a suggestion.
    [Attached image: scatterplot and box.png]
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    sysuse nlsw88, clear
    
    gen minus = "-"
    gen pipe = "|"
    gen zero = 0
    gen hoursj = hours + .4*runiform()
    twoway scatter wage zero,                    ///
               msymbol(i) mlab(minus) mlabpos(9) ///
               mlabcolor(%10)                 || ///
           scatter zero hoursj, msymbol(i)       ///
               mlab(pipe) mlabpos(0)             ///
               mlabcolor(%10) mlabsize(*.35)  || ///
           scatter wage hours, msymbol(oh)       ///
               legend(off) xscale(range(-1 80))  ///
               mcolor(%50)
    [Attached image: Graph.png]
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      From Stata 8 on, the command in #1 requires graph7, not graph.


      • #4
        Dear Maarten, Thanks for the example. The graph is not exactly what I had in mind: I expected a boxplot at the top and another on the right. Thanks anyway.
        Ho-Chuan (River) Huang
        Stata 19.0, MP(4)


        • #5
          Dear Nick, Thanks for the tip. It works. However, I wonder if there is a modern equivalent command to do the job.
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)


          • #6
            Hi Nick, By the way, how can I change the background color (say, from black to white)?
            Code:
            sysuse auto, clear
            
            graph7 price mpg, two box ylab(4000(4000)16000) xlab(10(10)40)
            
            graph export "E:\graph7-box-scatterplot.png", as(png) replace
            [Attached image: graph7-box-scatterplot.png]

            Ho-Chuan (River) Huang
            Stata 19.0, MP(4)


            • #7
              I might have known that in 2002, but one forgets things one never uses. I don't think you can change the background colour.

              Best short over-simplified advice

              1. graph7 is there but a distraction to anyone who never used it in Stata before Stata 8.

              2. None of the new graphics should be assumed compatible with graph7.


              • #8
                Hi, Nick: I see, and thanks again for your advice.
                Ho-Chuan (River) Huang
                Stata 19.0, MP(4)


                • #9
                  By the way, you are finding examples elsewhere and often just saying "somewhere". I'd recommend giving the original site as a URL or another reference. Many readers here read Chinese (I guess that's what it is; forgive my ignorance) and might find the original of interest. Those who don't (like me) can just ignore the references. You can please both groups of readers by saying something like:

                  I found this at <reference> (original in Chinese).


                  • #10
                    A do-it-yourself solution follows below. It is flexible, but it requires some 'heavy duty computing'. It is based on:
                    "How can I best get box plots on logarithmic scales?"
                    and
                    Cox, N. J. 2009. Speaking Stata: Creating and varying box plots. Stata Journal 9: 478–496
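
                    As a quick orientation before the full code below, here is a minimal sketch of the numbers a box plot needs, computed with summarize instead of egen. It assumes the usual rule that whiskers end at the most extreme data values within 1.5 IQR of the quartiles, which is the rule the code below applies. The scalar names (q1_wage and so on) are made up for this illustration.

                    Code:
                    sysuse nlsw88, clear
                    
                    * Quartiles and median from the detailed summary
                    quietly summarize wage, detail
                    scalar q1_wage  = r(p25)
                    scalar med_wage = r(p50)
                    scalar q3_wage  = r(p75)
                    scalar iqr_wage = scalar(q3_wage) - scalar(q1_wage)
                    
                    * Whiskers end at the most extreme observed values inside the fences
                    quietly summarize wage if wage <= scalar(q3_wage) + 1.5 * scalar(iqr_wage)
                    scalar hi_wage = r(max)
                    quietly summarize wage if wage >= scalar(q1_wage) - 1.5 * scalar(iqr_wage) & !missing(wage)
                    scalar lo_wage = r(min)
                    
                    display "quartiles: " scalar(q1_wage) "  " scalar(med_wage) "  " scalar(q3_wage)
                    display "whiskers:  " scalar(lo_wage) "  " scalar(hi_wage)

                    Scalars like these are handy for checking; for plotting with twoway, the same quantities are needed as variables, which is why the code below uses egen.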

                    Code:
                    * Load example file:
                    sysuse nlsw88, clear
                    
                    * The distribution underlying the box plot for inspection:
                    histogram wage, width(0.1) start(0) lc(navy%50) horizontal yla(, ang(h))
                    histogram hours, width(0.1) start(0) lc(navy%50) vertical
                    
                    * Graph the box plots for inspection:
                    graph box wage, marker(1, ms(oh) mc(navy%50)) yla(, ang(h))
                    graph box hours, marker(1, ms(oh) mc(navy%50)) yla(, ang(h)) hor
                    
                    * Compute required data for Boxplot of Wage:
                    gen maxX = 84    // boxplot x-scale position
                    * The median and quartiles are easiest:
                    egen med_w = median(wage) //
                    egen upq_w = pctile(wage), p(75) //
                    egen loq_w = pctile(wage), p(25) //
                    egen mean_w = mean(wage) //
                    * We could now get the IQR by subtraction, upq - loq, which would be more efficient,
                    * but we will mention that it has its own egen function.
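                    egen iqr_w = iqr(wage)
                    * min() and max() ignore missing arguments, so restrict to nonmissing values;
                    * otherwise a missing value would put the whisker end at the fence itself.
                    * For upper limits of whiskers:
                    egen upper_w = max(min(wage, upq_w + 1.5 * iqr_w)) if !missing(wage)
                    * For lower limits of whiskers:
                    egen lower_w = min(max(wage, loq_w - 1.5 * iqr_w)) if !missing(wage)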
                    
                    * Compute required data for Boxplot of Hours:
                    gen maxY = 43 // boxplot y-scale position
                    * The median and quartiles are easiest:
                    egen med_h = median(hours) //
                    egen upq_h = pctile(hours), p(75) //
                    egen loq_h = pctile(hours), p(25) //
                    egen mean_h = mean(hours) //
                    * We could now get the IQR by subtraction, upq - loq, which would be more efficient,
                    * but we will mention that it has its own egen function.
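                    egen iqr_h = iqr(hours)
                    * min() and max() ignore missing arguments, so restrict to nonmissing values;
                    * otherwise a missing value would put the whisker end at the fence itself.
                    * For upper limits of whiskers:
                    egen upper_h = max(min(hours, upq_h + 1.5 * iqr_h)) if !missing(hours)
                    * For lower limits of whiskers:
                    egen lower_h = min(max(hours, loq_h - 1.5 * iqr_h)) if !missing(hours)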
                    
                    * Boxplots and scatterplot in the same figure:
                    twoway /// Box plot of Wage
                    (rbar med_w upq_w maxX, blc(navy) bfc(white) barw(1.7) ) ///
                    (rbar med_w loq_w maxX, blc(navy) bfc(white) barw(1.7) ) ///
                    (rspike upq_w upper_w maxX , blc(navy) ) ///
                    (rspike loq_w lower_w maxX , blc(navy) ) ///
                    (rcap upper_w upper_w maxX , blc(navy) msize(*1) ) ///
                    (rcap lower_w lower_w maxX , blc(navy) msize(*1) ) ///
                    (scatter mean_w maxX, pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(gs9) ) /// Diamond for mean
                    (scatter wage maxX if !inrange(wage, lower_w, upper_w), ms(oh) mc(navy%20) legend(off)) /// Outliers
                    (rbar med_h upq_h maxY, blc(navy) bfc(white) barw(1.3) hor) /// Box plot of Hours
                    (rbar med_h loq_h maxY, blc(navy) bfc(white) barw(1.3) hor) ///
                    (rspike upq_h upper_h maxY , blc(navy) hor) ///
                    (rspike loq_h lower_h maxY , blc(navy) hor) ///
                    (rcap upper_h upper_h maxY , blc(navy) msize(*1) hor) ///
                    (rcap lower_h lower_h maxY , blc(navy) msize(*1) hor) ///
                    (scatter maxY mean_h , pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(gs9) ) /// Diamond for mean
                    (scatter maxY hours if !inrange(hours, lower_h, upper_h), ms(oh) mc(navy%20) legend(off)) /// Outliers
                    (scatter wage hours, msymbol(oh) /// Scatter plot of Wage & Hours
                     legend(off) xscale(range(-1 80)) ///
                     mcolor(navy%50)) ///
                     , ytitle(Hourly wage) /// General plot controls
                     xtitle(Usual hours worked)
                    
                    * The above final code can be edited to modify the plot using the regular twoway options etc.
                    The plot should look like this:

                    [Attached image: Boxplot_ScatterPlot_Wages_Hours_2018_03_01.png]
                    Last edited by ericmelse; 01 Mar 2018, 06:42.
                    http://publicationslist.org/eric.melse
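
                    The two computation blocks above differ only in the variable and the suffix, so they could be folded into a loop. A minimal sketch, assuming the same suffix convention (_w for wage, _h for hours). The macro names v and s are arbitrary. The if !missing() qualifier is an added safeguard so that whisker ends are actual data values even when a variable has missing values, because min() and max() ignore missing arguments.

                    Code:
                    sysuse nlsw88, clear
                    
                    foreach v in wage hours {
                        local s = substr("`v'", 1, 1)   // "w" for wage, "h" for hours
                        egen med_`s'  = median(`v')
                        egen upq_`s'  = pctile(`v'), p(75)
                        egen loq_`s'  = pctile(`v'), p(25)
                        egen mean_`s' = mean(`v')
                        gen  iqr_`s'  = upq_`s' - loq_`s'   // IQR by subtraction
                        * Whisker ends: most extreme data values within 1.5 IQR of the quartiles
                        egen upper_`s' = max(min(`v', upq_`s' + 1.5 * iqr_`s')) if !missing(`v')
                        egen lower_`s' = min(max(`v', loq_`s' - 1.5 * iqr_`s')) if !missing(`v')
                    }
                    
                    * Positions of the two box plots on the axes, as in the code above
                    gen maxX = 84
                    gen maxY = 43

                    The twoway call can then be used unchanged, since the variable names match.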


                    • #11
                      Eric: Thanks for the reference, to which I am duty bound to add the 2013 correction:

                      http://www.stata-journal.com/sjpdf.h...lenum=gr0039_1


                      • #12
                        Dear Eric, The figure is great. Thank you so much.

                        Ho-Chuan (River) Huang
                        Stata 19.0, MP(4)


                        • #13
                          Dear Eric, One additional question: which part of the code should I modify to give the boxplots different colors (say, red for the vertical one and blue for the horizontal one)? Thanks.

                          Ho-Chuan (River) Huang
                          Stata 19.0, MP(4)


                          • #14

                            Dear Ho-Chuan,

                            Note that the plot is built with Stata's twoway command, so every aspect of the visualization is controlled through options, as documented in the help files.
                            A great resource for learning about such options is A Visual Guide to Stata Graphics.

                            Here is the example as requested. Note that I also made these changes:
                            • box plot color set through the outline color option, e.g.: blc(blue)
                            • box plot widths adjusted for (visually) comparable sizes, e.g.: (rbar med_h upq_h maxY, blc(red) bfc(white) barw(1.5) hor) /// Box plot of Hours
                            • graph set to a square format: xsize(9) ysize(9)
                            • graph region margins adjusted: graphregion(margin(1 -3 1 -3))
                            • axis extension disabled, which works well with the box plots: yscale(noextend) xscale(noextend)
                            • axis title margins changed for improved readability, e.g.: ytitle(Hourly wage, margin(0 2 0 0))
                            Code:
                            * Boxplots and scatterplot in the same figure, colors adjusted
                            twoway /// Box plot of Wage (vertical)
                            (rbar med_w upq_w maxX, blc(blue) bfc(white) barw(3) ) /// 
                            (rbar med_w loq_w maxX, blc(blue) bfc(white) barw(3) ) ///
                            (rspike upq_w upper_w maxX , blc(blue) ) ///
                            (rspike loq_w lower_w maxX , blc(blue) ) ///
                            (rcap upper_w upper_w maxX , blc(blue) msize(*1) ) ///
                            (rcap lower_w lower_w maxX , blc(blue) msize(*1) ) ///
                            (scatter mean_w maxX, pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(navy%20) ) /// Diamond for mean
                            (scatter wage maxX if !inrange(wage, lower_w, upper_w), ms(oh) mc(blue%20) legend(off)) /// Outliers
                            (rbar med_h upq_h maxY, blc(red) bfc(white) barw(1.5) hor) /// Box plot of Hours (horizontal)
                            (rbar med_h loq_h maxY, blc(red) bfc(white) barw(1.5) hor) ///
                            (rspike upq_h upper_h maxY , blc(red) hor) ///
                            (rspike loq_h lower_h maxY , blc(red) hor) ///
                            (rcap upper_h upper_h maxY , blc(red) msize(*1) hor) ///
                            (rcap lower_h lower_h maxY , blc(red) msize(*1) hor) ///
                            (scatter maxY mean_h , pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(red%20) ) /// Diamond for mean
                            (scatter maxY hours if !inrange(hours, lower_h, upper_h), ms(oh) mc(red%20) legend(off)) /// Outliers
                            (scatter wage hours, msymbol(oh) /// Scatter plot of Wage & Hours
                                       legend(off) xscale(range(-1 80)) ///
                                       mcolor(navy%50)) ///
                             , xsize(9) ysize(9) graphregion(margin(1 -3 1 -3)) /// General plot controls from here
                             yscale(noextend) xscale(noextend) /// Disable axis extension
                             ytitle(Hourly wage, margin(0 2 0 0)) /// specified margins; left, right, bottom, top
                             xtitle(Usual hours worked, margin(0 0 0 2)) // last code line
                            [Attached image: Boxplot_ScatterPlot_Wages_Hours_2018_03_01redBlue.png]
                            http://publicationslist.org/eric.melse
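
                            To avoid editing the color in every plot layer, the colors could be kept in local macros and referenced from the twoway call. A minimal sketch with only the boxes and the scatter, assuming the whole block is run in one go (local macros vanish once the do-file or selection finishes). The macro names c_wage and c_hours are made up here, and the documented long option names lcolor(), fcolor(), barwidth() and horizontal are used.

                            Code:
                            sysuse nlsw88, clear
                            
                            * Change the colors in one place (hypothetical macro names)
                            local c_wage  "blue"
                            local c_hours "red"
                            
                            * Quartiles and medians as variables, plus the box positions
                            egen med_w = median(wage)
                            egen upq_w = pctile(wage), p(75)
                            egen loq_w = pctile(wage), p(25)
                            egen med_h = median(hours)
                            egen upq_h = pctile(hours), p(75)
                            egen loq_h = pctile(hours), p(25)
                            gen maxX = 84
                            gen maxY = 43
                            
                            twoway ///
                            (rbar med_w upq_w maxX, lcolor(`c_wage') fcolor(white) barwidth(3)) ///
                            (rbar med_w loq_w maxX, lcolor(`c_wage') fcolor(white) barwidth(3)) ///
                            (rbar med_h upq_h maxY, lcolor(`c_hours') fcolor(white) barwidth(1.5) horizontal) ///
                            (rbar med_h loq_h maxY, lcolor(`c_hours') fcolor(white) barwidth(1.5) horizontal) ///
                            (scatter wage hours, msymbol(oh) mcolor(navy%50)) ///
                            , legend(off) ytitle(Hourly wage) xtitle(Usual hours worked)

                            The whiskers, means and outliers from the full code can be added back in the same way, each layer taking its color from the matching macro.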


                            • #15
                              Dear Eric, Thank you so much for the reply. It is wonderful.

                              Ho-Chuan (River) Huang
                              Stata 19.0, MP(4)
