I would like to display data on medicine samples that fall outside specific testing limits (continuous variable dev_all_gr), by medicine type (categorical variable molnum). The data cluster close to the limits, and I would like to show that. The data look like this (sorry, you need quite a bit of it to recreate the problem):
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte molnum float dev_all_gr
4 -5.26
5 -7.37
3 -7.75
2 18.57
5 13.97
3 -11.09
5 -10.95
3 -9.05
5 -9.2
1 -54
3 -5.77
3 -5.06
4 -5.52
3 -9.35
4 9.85
1 -53.54
5 -5.32
3 -6.23
1 -5.08
5 -6.3
3 -8.9
4 -10.57
3 -5.3
4 -7.41
5 -8.3
3 -5.34
1 -7.6
3 -5.53
5 -5.02
4 15.4
5 -9.55
5 -6.18
3 -32.96
3 -7.34
5 -6.75
3 -12.05
5 -6.13
1 -5.02
4 -10.8
1 11.53
5 -8
1 -9.7
5 5.83
4 -8.83
4 7.39
4 5.19
3 -6.08
1 -7.54
2 5.336169
5 -8.58
5 -10.17
3 -8.5
2 7.64
5 -6.21
1 -7.47
1 -17.32
1 -44.07
4 -11.15
4 -7.92
4 10.78
3 -12.85
4 -7.13
5 -6.54
1 85.61
1 87.26
1 40.13
3 -5.02
3 -8.91
3 -13.23
4 -7.72
3 -5.8
2 18.48
3 -15.32
5 -12.64
5 -13.71
5 -10.39
3 -7.92
3 -5.29
3 -8.08
3 -12.03
1 -5.56
3 -11.57
5 -11.99
3 -5.64
4 7.97
5 -5.52
5 -8.99
1 8.53
5 -7.56
5 -10.4
3 -8.52
5 -7.7
5 -22.6
1 -6.62
1 -49.75
5 -9.52
5 -11.71
5 -14.92
5 -13.58
5 -9.58
end
label values molnum molnum
label values dev_all_gr graph_scale
Code:
twoway scatter dev_all_gr molnum, jitter(3) jitterseed(2) msymbol(diamond) msize(tiny) ///
    yline(-5 5, lpattern(dash) lcolor(orange))
Then I explored the stripplot command, using the very helpful do-file examples posted by Nick Cox and colleagues in the help files.
I am using stripplot, newly installed from SSC, on Stata/MP 18.0 (revision 13/07/2023) for Mac (Apple Silicon).
Code:
stripplot dev_all_gr, over(molnum) stack height(0.8) ms(Sh) vertical bar(level(95)) yla(, ang(h)) ///
    yline(-5 5, lpattern(dash) lcolor(orange))
Though I specify the stack option, the data points do not stack. I tried the same command on the auto dataset, and it works just fine:
Code:
sysuse auto, clear
stripplot mpg, over(foreign) stack height(0.8) ms(Sh) vertical bar(level(95)) yla(, ang(h))
From this I conclude that it must be something to do with the dev_all_gr variable itself, but I am at a loss to guess what.
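One difference that may matter: mpg in the auto data is integer-valued, so many observations share exactly the same value, whereas dev_all_gr is recorded to two decimals and may have few or no exact ties within a medicine type. A minimal sketch of how that could be checked, assuming the example data above are in memory and that stacking relies on tied values. The bin width of 0.5 passed to stripplot's width() option is an arbitrary choice for illustration, not a recommendation:

```stata
* Assumes the dataex example above is in memory and stripplot is installed from SSC.
* Report how many (molnum, dev_all_gr) combinations occur more than once.
duplicates report molnum dev_all_gr

* Assumption: with few exact ties there is nothing to stack, so bin the values first.
* width(0.5) rounds values into classes 0.5 wide before stacking (arbitrary width).
stripplot dev_all_gr, over(molnum) stack width(0.5) height(0.8) ms(Sh) vertical ///
    bar(level(95)) yla(, ang(h)) yline(-5 5, lpattern(dash) lcolor(orange))
```

Binning moves the plotted positions slightly, so a value such as -5.02 could be drawn on the limit line itself; that would need checking against what the graph is meant to show.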
Is it possible to restrict the jitter() option to jittering horizontally only (so that data points do not get jittered vertically into the acceptability zone)? I guess I could come up with some other kludge involving drawing the limit lines (which are in any case simply a representation) in a different place. But that would be harder to do for other graph formats where I am having similar problems, for example this: (again, the data points should not overlap the rbar).
(The code for the above, but not the data, is below FYI:)
Code:
twoway rbar p25 med inn_graph if inn==1, barw(.35) color(purple) horizontal || ///
    rbar med p75 inn_graph if inn==1, barw(.35) color(purple*0.5) horizontal || ///
    scatter inn_graph price_ratio_med if inn==1 & (price_ratio_med < p25 | price_ratio_med > p75), jitter(5) jitterseed(3) msize(tiny) mcolor(purple) msymbol(circle) || ///
    rbar p25 med inn_graph if inn==0, barw(.35) color(orange) horizontal || ///
    rbar med p75 inn_graph if inn==0, barw(.35) color(orange*0.5) horizontal || ///
    scatter inn_graph price_ratio_med if inn==0 & (price_ratio_med < p25 | price_ratio_med > p75) & price!=0, jitter(5) jitterseed(3) msize(tiny) mcolor(orange) msymbol(circle) ///
    ytitle("") ylabel(1 2, valuelabel labsize(small) angle(0)) yscale(r(0.5 2.5)) ///
    legend(off)
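On the question of jittering along one axis only: jitter() perturbs both coordinates, so one possible workaround is to add the noise by hand to the categorical variable and leave the measured variable untouched. A minimal sketch on the auto data, where foreign stands in for inn_graph and price for price_ratio_med; the variable name foreign_j and the spread of 0.3 are hypothetical choices for illustration:

```stata
* Sketch only: hand-made jitter on the categorical axis, none on the value axis.
sysuse auto, clear
set seed 3

* Uniform noise in (-0.15, 0.15) around each category position (0 or 1).
generate double foreign_j = foreign + 0.3 * (runiform() - 0.5)

* price is plotted unchanged, so no point can drift across a limit on that axis.
twoway scatter foreign_j price, msize(tiny) msymbol(circle) ///
    ylabel(0 "Domestic" 1 "Foreign", angle(0)) yscale(range(-0.5 1.5)) ytitle("")
```

The same idea would apply to the first graph by jittering molnum instead and plotting dev_all_gr as it is.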
For the above twoway rbar/scatter, I spent a while trying to find a way to avoid the mess of overlapping data points completely by overlaying a kernel density estimate on each of the values of inn (branded/unbranded) without drawing separate graphs (for other purposes, we want to do this over a variable with 6 categories). However, I was defeated.
All suggestions gratefully received.