Display number of observations per category in -graph box-

Max Piper

Join Date: Dec 2015

Posts: 61
#1

04 Apr 2017, 06:33

Given the following code:

Code:

sysuse auto, clear
set graphics on
graph box mpg, over(foreign) ytitle(fuel economy) noout

how can I display the number of observations that make up each of the two plots? i.e. the numbers 52 and 22 as obtained from

Code:

tab foreign
Nick Cox

Join Date: Mar 2014

Posts: 35736
#2

04 Apr 2017, 07:11

No outliers; just say No to that!

You can probably do it with graph box using repeated calls to the text() option and perhaps otherwise. I think it's a little easier with stripplot from SSC.

Code:

sysuse auto, clear
bysort foreign: egen count = count(mpg)
gen y = 8
stripplot mpg, pctile(0) box boffset(0) ms(none) vertical over(foreign) ///
    addplot(scatter y foreign, ms(none) mlabsize(medium) mla(count) mlabpos(0)) xla(, tlc(none)) aspect(1)
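A possible sketch of the "perhaps otherwise" route with graph box itself. It assumes the counts go into the category labels through relabel() rather than being placed with text(); the local macro names n0 and n1 and the label wording are illustrative choices only.

```stata
sysuse auto, clear
* count nonmissing mpg in each category (foreign is coded 0/1 in auto.dta)
quietly count if foreign == 0 & !missing(mpg)
local n0 = r(N)
quietly count if foreign == 1 & !missing(mpg)
local n1 = r(N)
* relabel() refers to categories by position: 1 = first box, 2 = second box
graph box mpg, over(foreign, relabel(1 "Domestic (n = `n0')" 2 "Foreign (n = `n1')")) ///
    ytitle(fuel economy)
```

This keeps everything inside graph box, at the cost of showing the counts in the axis labels rather than as free-standing text.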
Max Piper

Join Date: Dec 2015

Posts: 61
#3

04 Apr 2017, 07:59

Originally posted by Nick Cox

No outliers; just say No to that!

So you mean "No" to "no outliers"?

Originally posted by Nick Cox

You can probably do it with graph box using repeated calls to the text() option and perhaps otherwise. I think it's a little easier with stripplot from SSC.

Code:

sysuse auto, clear
bysort foreign: egen count = count(mpg)
gen y = 8
stripplot mpg, pctile(0) box boffset(0) ms(none) vertical over(foreign) ///
    addplot(scatter y foreign, ms(none) mlabsize(medium) mla(count) mlabpos(0)) xla(, tlc(none)) aspect(1)

Where exactly can I add the options that I previously passed to graph box? I used

Code:

graphregion(color(white)) plotregion(color(white)) marker(1, mcolor(black)) box(1, color(black)) intensity(0)

but get the result

variable black not found

Last edited by Max Piper; 04 Apr 2017, 08:12.
Nick Cox

Join Date: Mar 2014

Posts: 35736
#4

04 Apr 2017, 08:17

I was expressing concern at the practice of omitting outliers.

You can work out the answer to your other question quite easily. The stripplot code has an addplot() option, which adds an extra plot:

Code:

scatter y foreign, ms(none) mlabsize(medium) mla(count) mlabpos(0)

That's a scatter plot of a new variable y against foreign. y has been generated as a constant 8 and looking at the graph in #2 shows that to be the vertical coordinate controlling where the counts are shown as text. The horizontal coordinate is given by foreign, as the options over(foreign) vertical imply that foreign is the x axis variable. The small trick is to suppress the markers that come with a scatter plot but to show marker labels.

8 is empirical. A more general recipe would be

minimum of response variable MINUS fraction of range of response

where the fraction has to be chosen according to what looks good and fits on the graph too. Nothing stops you putting it at the top instead; putting it anywhere else would be legal too. You didn't express a preference for where it should go.
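The recipe can be sketched as follows. The fraction 0.1 is an arbitrary value picked for illustration; everything else is carried over from the code in #2.

```stata
sysuse auto, clear
bysort foreign: egen count = count(mpg)
* minimum of response MINUS fraction of range of response
* (0.1 is an assumed fraction; adjust to what looks good)
quietly summarize mpg
gen y = r(min) - 0.1 * (r(max) - r(min))
stripplot mpg, pctile(0) box boffset(0) ms(none) vertical over(foreign) ///
    addplot(scatter y foreign, ms(none) mlabsize(medium) mla(count) mlabpos(0)) ///
    xla(, tlc(none)) aspect(1)
```

To put the counts at the top instead, the same idea would use r(max) plus a fraction of the range.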

The extra option

Code:

xsc(alt)

is worthy of experiment.
Max Piper

Join Date: Dec 2015

Posts: 61
#5

06 Apr 2017, 16:54

Thank you for your very helpful answer.

With stripplot, I have a few more issues.

I am still facing a "problem" with extreme data values. I think it is reasonable for me to omit an extreme value from the visualization only, since including it would make the whole box plot unreadable. I tried the options yscale(range(...)) and yla(...) in a vertical stripplot, but to no avail (when the range specified is narrower than the range of the data). Since the mean, median, etc. are computed before these axis specifications are applied, this should have no detrimental effect on the quantitative results of my work, such as difference-in-means tests.

Next, I am trying to vary the size of the axis labels (both tick labels and axis titles). In the above, I figured out how to make the marker labels bigger by a factor of 1.4, but not the labels on the axes (1, 2, 3, return on equity, and the unreadable numbers on the left).

Third, I am wondering whether it is possible to add the adjacent lines at the top and bottom of the whiskers (the 1.5*IQR lines).

Those are things that I could implement in graph box, but it seems that stripplot has equivalents for some, but not all, of its options.

Given the MWE as follows, can these be implemented at all, and if so, how?

Code:

sysuse auto, clear
stripplot mpg, over(foreign) iqr box(barwidth(0.6)) vertical yscale(range(0 20)) yla(0(5)20)

Last edited by Max Piper; 06 Apr 2017, 16:59.
Nick Cox

Join Date: Mar 2014

Posts: 35736
#6

06 Apr 2017, 19:02

stripplot is mine. I guess you know that, but the point is important for anyone reading this to understand what follows. Its implementation is entirely in terms of graph twoway and nothing at all to do with graph box (or graph hbox; I will mention the sibling just once). I haven't tried to mimic any features of the latter except insofar as I want them on other grounds.

Naturally I can see your main difficulty. In most such cases I just use a transformed scale, but you may be unable or unwilling to do that. With distributions such as those I often wouldn't use means anyway unless to show their limitations, but again you may be unable or unwilling to do that.

I don't make it easy or even possible to omit data points with stripplot except by using if or in to select data, and then the box (or bar; again I mention that just once) calculations are based entirely on what is selected, which is not what you want here. On a variety of grounds, not least the way I use the program in teaching, I wouldn't want to complicate the program by letting the box and everything else be inconsistently based, even if users request that specifically. In short, I don't think it is a good idea.

Beyond that it is standard Stata graph philosophy that you can never omit data as a side-effect of changing axis scale or axis label options. In that respect and others, whatever I program cannot subvert whatever graph twoway doesn't allow.

But the code is there and anyone can (a) clone and modify the program to do what they want (ideally with whatever acknowledgment is appropriate) or (b) write another program to do what they prefer. So you could plot a box and whatever else you want as well, and for that a better starting point would be code in http://www.stata-journal.com/sjpdf.h...iclenum=gr0039 (noting corrections in http://www.stata-journal.com/sjpdf.h...lenum=gr0039_1).

Note that there is a whiskers() option.
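A possible sketch for the second and third questions in #5. It assumes (this is not confirmed in the thread) that whiskers() passes its contents to the underlying twoway plot, so that recast(rcap) would draw capped whiskers. The labsize() and size() suboptions are standard graph twoway axis options, and the sizes chosen here are arbitrary.

```stata
sysuse auto, clear
* stripplot is from SSC: ssc install stripplot
stripplot mpg, over(foreign) iqr box(barwidth(0.6)) vertical ///
    whiskers(recast(rcap))                        /// assumed: caps on the whiskers
    yla(, labsize(large)) xla(, labsize(large))   /// bigger tick labels
    ytitle(, size(large)) xtitle(, size(large))   //  bigger axis titles
```

If whiskers(recast(rcap)) is not accepted, the help for stripplot is the place to check what whiskers() allows.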