Dear all, I came across this question somewhere (how to draw a figure like the one below) and would like to know whether anyone has a suggestion.
sysuse nlsw88, clear
gen minus = "-"                      // marker label for the rug along the y axis
gen pipe = "|"                       // marker label for the rug along the x axis
gen zero = 0
gen hoursj = hours + .4*runiform()   // jitter hours so tied integer values spread out
twoway scatter wage zero, ///
    msymbol(i) mlab(minus) mlabpos(9) ///
    mlabcolor(%10) || ///
    scatter zero hoursj, msymbol(i) ///
    mlab(pipe) mlabpos(0) ///
    mlabcolor(%10) mlabsize(*.35) || ///
    scatter wage hours, msymbol(oh) ///
    legend(off) xscale(range(-1 80)) ///
    mcolor(%50)
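The rug for hours relies on random jitter, so the picture changes slightly each time the code is run. A minimal sketch, assuming Stata 14 or later for `runiform(a, b)`: setting a seed (12345 is an arbitrary choice) makes the jitter reproducible, and centring the noise on zero keeps every tick within 0.2 of its true value. The centred jitter and the names `hoursj1` and `hoursj2` are a swapped-in illustration, not what the code above does.

```stata
* Reproducible, centred jitter for the hours rug (illustrative sketch)
sysuse nlsw88, clear

* Same seed and same sort order, so both jittered copies should be identical
set seed 12345
gen hoursj1 = hours + runiform(-.2, .2)   // noise uniform on (-.2, .2)
set seed 12345
gen hoursj2 = hours + runiform(-.2, .2)

* Check reproducibility and the size of the jitter (small float tolerance)
assert hoursj1 == hoursj2
assert abs(hoursj1 - hours) <= .2001 if !missing(hours)
```

Using such a variable in place of `hoursj` in the `scatter zero hoursj` layer should draw the same rug on every run.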
* Load example file:
sysuse nlsw88, clear

* The distributions underlying the box plots, for inspection:
histogram wage, width(0.1) start(0) lc(navy%50) horizontal yla(, ang(h))
histogram hours, width(0.1) start(0) lc(navy%50) vertical

* Graph the box plots for inspection (hbox draws the horizontal version):
graph box wage, marker(1, ms(oh) mc(navy%50)) yla(, ang(h))
graph hbox hours, marker(1, ms(oh) mc(navy%50)) yla(, ang(h))

* Compute required data for the box plot of wage:
gen maxX = 84   // x position of the vertical wage box plot

* The median and quartiles are easiest:
egen med_w = median(wage)
egen upq_w = pctile(wage), p(75)
egen loq_w = pctile(wage), p(25)
egen mean_w = mean(wage)

* We could get the IQR by subtraction, upq_w - loq_w, which would be more efficient,
* but note that egen has its own iqr() function.
egen iqr_w = iqr(wage)

* Whiskers end at the adjacent values, as in graph box: the most extreme data
* values within 1.5 IQR of the nearer quartile. Values beyond the fence become
* missing inside cond() and are ignored by max() and min().
egen upper_w = max(cond(wage <= upq_w + 1.5 * iqr_w, wage, .))
egen lower_w = min(cond(wage >= loq_w - 1.5 * iqr_w, wage, .))

* Compute required data for the box plot of hours (same steps):
gen maxY = 43   // y position of the horizontal hours box plot
egen med_h = median(hours)
egen upq_h = pctile(hours), p(75)
egen loq_h = pctile(hours), p(25)
egen mean_h = mean(hours)
egen iqr_h = iqr(hours)
egen upper_h = max(cond(hours <= upq_h + 1.5 * iqr_h, hours, .))
egen lower_h = min(cond(hours >= loq_h - 1.5 * iqr_h, hours, .))

* Box plots and scatter plot in the same figure:
twoway /// Box plot of wage
    (rbar med_w upq_w maxX, blc(navy) bfc(white) barw(1.7)) ///
    (rbar med_w loq_w maxX, blc(navy) bfc(white) barw(1.7)) ///
    (rspike upq_w upper_w maxX, blc(navy)) ///
    (rspike loq_w lower_w maxX, blc(navy)) ///
    (rcap upper_w upper_w maxX, blc(navy) msize(*1)) ///
    (rcap lower_w lower_w maxX, blc(navy) msize(*1)) ///
    (scatter mean_w maxX, pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(gs9)) /// Diamond for mean
    (scatter wage maxX if !inrange(wage, lower_w, upper_w), ms(oh) mc(navy%20) legend(off)) /// Outliers
    (rbar med_h upq_h maxY, blc(navy) bfc(white) barw(1.3) hor) /// Box plot of hours
    (rbar med_h loq_h maxY, blc(navy) bfc(white) barw(1.3) hor) ///
    (rspike upq_h upper_h maxY, blc(navy) hor) ///
    (rspike loq_h lower_h maxY, blc(navy) hor) ///
    (rcap upper_h upper_h maxY, blc(navy) msize(*1) hor) ///
    (rcap lower_h lower_h maxY, blc(navy) msize(*1) hor) ///
    (scatter maxY mean_h, pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(gs9)) /// Diamond for mean
    (scatter maxY hours if !inrange(hours, lower_h, upper_h), ms(oh) mc(navy%20) legend(off)) /// Outliers
    (scatter wage hours, msymbol(oh) /// Scatter plot of wage and hours
        legend(off) xscale(range(-1 80)) ///
        mcolor(navy%50)) ///
    , ytitle(Hourly wage) /// General plot controls
    xtitle(Usual hours worked)

* The code above can be edited to modify the plot using the regular twoway options.
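Whisker ends can be computed with egen in two ways that differ whenever some value lies beyond the 1.5 IQR fence. `graph box` draws whiskers to the adjacent values (the most extreme observations inside the fence), whereas a capping form such as `max(min(x, upq + 1.5*iqr))` returns the fence itself as soon as any value exceeds it. A minimal sketch on hypothetical toy data (nine values 1 to 9 plus one outlier, 100; the names `x`, `upper_cap` and `upper_adj` are illustrative only), assuming Stata's default percentile definition:

```stata
* Toy data: values 1-9 plus one far outlier (hypothetical example)
clear
input x
1
2
3
4
5
6
7
8
9
100
end

egen upq = pctile(x), p(75)   // 8 under Stata's default percentile rule
egen loq = pctile(x), p(25)   // 3
egen iqr = iqr(x)             // 5, so the upper fence is 8 + 7.5 = 15.5

* Capping form: the outlier is replaced by the fence, so the maximum is 15.5
egen upper_cap = max(min(x, upq + 1.5 * iqr))

* Adjacent-value form: values beyond the fence become missing and are ignored
egen upper_adj = max(cond(x <= upq + 1.5 * iqr, x, .))
egen lower_adj = min(cond(x >= loq - 1.5 * iqr, x, .))

list upq loq iqr upper_cap upper_adj lower_adj in 1
```

Here the capping form gives 15.5, a value that does not occur in the data, while the adjacent-value form gives 9 (and 1 at the lower end), which matches where `graph box` would stop the whiskers.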
* Box plots and scatter plot in the same figure, colours adjusted:
twoway /// Box plot of wage (vertical)
    (rbar med_w upq_w maxX, blc(blue) bfc(white) barw(3)) ///
    (rbar med_w loq_w maxX, blc(blue) bfc(white) barw(3)) ///
    (rspike upq_w upper_w maxX, blc(blue)) ///
    (rspike loq_w lower_w maxX, blc(blue)) ///
    (rcap upper_w upper_w maxX, blc(blue) msize(*1)) ///
    (rcap lower_w lower_w maxX, blc(blue) msize(*1)) ///
    (scatter mean_w maxX, pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(navy%20)) /// Diamond for mean
    (scatter wage maxX if !inrange(wage, lower_w, upper_w), ms(oh) mc(blue%20) legend(off)) /// Outliers
    (rbar med_h upq_h maxY, blc(red) bfc(white) barw(1.5) hor) /// Box plot of hours (horizontal)
    (rbar med_h loq_h maxY, blc(red) bfc(white) barw(1.5) hor) ///
    (rspike upq_h upper_h maxY, blc(red) hor) ///
    (rspike loq_h lower_h maxY, blc(red) hor) ///
    (rcap upper_h upper_h maxY, blc(red) msize(*1) hor) ///
    (rcap lower_h lower_h maxY, blc(red) msize(*1) hor) ///
    (scatter maxY mean_h, pstyle(p1) ms(dh) msize(*1) mls(*.8) mlc(red%20)) /// Diamond for mean
    (scatter maxY hours if !inrange(hours, lower_h, upper_h), ms(oh) mc(red%20) legend(off)) /// Outliers
    (scatter wage hours, msymbol(oh) /// Scatter plot of wage and hours
        legend(off) xscale(range(-1 80)) ///
        mcolor(navy%50)) ///
    , xsize(9) ysize(9) graphregion(margin(1 -3 1 -3)) /// General plot controls from here
    yscale(noextend) xscale(noextend) /// Disable axis extension
    ytitle(Hourly wage, margin(0 2 0 0)) /// Title margins: left, right, bottom, top
    xtitle(Usual hours worked, margin(0 0 0 2)) // last code line
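Most options in the colour-adjusted version repeat the same colour many times. One way to keep them in sync is to store the colours once in local macros and reuse them in every layer. A minimal sketch for the wage box only, assuming Stata 15 or later for the `%20` opacity syntax; the locals `wcol` and `dotcol` are hypothetical names, and the whiskers use the adjacent-value rule of `graph box`:

```stata
* Sketch: define colours once in locals and reuse them in every layer
sysuse nlsw88, clear

* Wage box-plot statistics
egen med_w = median(wage)
egen upq_w = pctile(wage), p(75)
egen loq_w = pctile(wage), p(25)
egen iqr_w = iqr(wage)
egen upper_w = max(cond(wage <= upq_w + 1.5 * iqr_w, wage, .))
egen lower_w = min(cond(wage >= loq_w - 1.5 * iqr_w, wage, .))
gen maxX = 84                  // x position of the wage box

local wcol blue                // box colour (assumed choice)
local dotcol navy%50           // colour of the scatter points

twoway (rbar med_w upq_w maxX, lcolor(`wcol') fcolor(white) barwidth(3)) ///
    (rbar med_w loq_w maxX, lcolor(`wcol') fcolor(white) barwidth(3)) ///
    (rspike upq_w upper_w maxX, lcolor(`wcol')) ///
    (rspike loq_w lower_w maxX, lcolor(`wcol')) ///
    (scatter wage maxX if !inrange(wage, lower_w, upper_w), ///
        msymbol(oh) mcolor(`wcol'%20)) ///
    (scatter wage hours, msymbol(oh) mcolor(`dotcol')), ///
    legend(off) ytitle(Hourly wage) xtitle(Usual hours worked)
```

Changing `local wcol blue` to another colour then recolours the box, whiskers and outliers together; the hours box could get its own local in the same way.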