Help: caterpillar plot or ranked box plot using Stata 14

Lynn Larose

Join Date: Feb 2017

Posts: 15
#1

28 Feb 2017, 08:16

Dear all,

I would like to plot mean serum concentrations and confidence intervals of a metabolite for 10 different cohorts. All data needs to appear on the same graph.

My colleagues have used SAS or R, both of which have dedicated commands for "caterpillar plots".
The caterpillar plots give a nice ranking to the data and automatically adjust the display of the cohorts (or grouping variable) from lowest metabolite mean to highest metabolite mean (for example).

I do not wish to change software, so any help with how to accomplish this using Stata would be most appreciated.

I tried a horizontal box plot, which gave me mean metabolite concentration and confidence intervals for all cohorts on the same graph (cohort is my grouping variable 1). This gave me an output similar to a "caterpillar plot", except that the cohorts were not ranked from lowest metabolite mean to highest metabolite mean.

I then tried to edit the graph hoping I could drag and drop the cohorts to change the order that they appear on the graph.

I must need to "rank" the data somehow, but it is not obvious to me how to do this.

Please advise.
Thank-you!
Tags: caterpillar plot, graphics, ranked box plot, several cohorts
Nick Cox

Join Date: Mar 2014

Posts: 35713
#2

28 Feb 2017, 08:22

What's a caterpillar plot precisely? Such names are often bestowed on different plots (I can think of three quite distinct "waterfall plots"), so a precise reference would help.

More generally, I see here no example data and no code, so we have to work hard to imagine exactly what you did.
Lynn Larose

Join Date: Feb 2017

Posts: 15
#3

28 Feb 2017, 08:32

Thanks Nick - I appreciate your help and quick reply.

The SAS website says "The caterpillar plot is a side-by-side bar plot of 95% intervals for multiple parameters"

The code I used in Stata is:
graph hbox log2cot, over(cohort)

where log2cot refers to log base 2 distribution of my metabolite
and cohort is my grouping variable

In the output, all of my cohorts are correctly listed along the y axis, but the boxes are not ordered by metabolite concentration. I would like the cohort nearest to where the y and x axes meet to have the lowest metabolite concentration, with concentration increasing for each cohort further up the y axis.

Does that make sense? It seems I have to tell Stata to organize or rank my cohorts based on their metabolite concentration in order to have them appear on the graph in an orderly way.
Nick Cox

Join Date: Mar 2014

Posts: 35713
#4

28 Feb 2017, 08:39

Sorry to be, or to seem, awkward, but that word definition doesn't pin it down for me and I don't want to have to search for a precise reference. URL or literature reference, please.
Lynn Larose

Join Date: Feb 2017

Posts: 15
#5

28 Feb 2017, 08:47

Of course! Your time is valuable and I appreciate your effort and patience. Here are two URLs for images that show how I would like to display my data.
Both of these are referred to as "caterpillar graph" but they simply look like box and whisker plots with grouped data that has somehow been ranked or ordered for a nicer presentation.

https://www.researchgate.net/figure/...ach-regression
https://www.researchgate.net/figure/...f-brackets-The
Nick Cox

Join Date: Mar 2014

Posts: 35713
#6

28 Feb 2017, 10:03

That helps greatly, but note that a box plot does not give mean concentrations and confidence intervals. It shows median and quartiles and usually other stuff too.

The main issue here is sorting groups on their means. Some commands do that for you, but it's not difficult to do it yourself. Calculate the means, map the categories to groups 1 upward (taking care to break ties) and then copy across values or value labels and variable labels of the original variable. labmask (search for download location) makes copying the values (in the example below the value labels) a one-line task.

Code:

sysuse auto, clear
egen mean = mean(weight), by(foreign)
egen group = group(mean foreign)
labmask group, values(foreign) decode
label var group "`: var label foreign'"
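As one illustration of a command that does the sorting for you, graph hbox accepts a sort() suboption inside over(). This is only a sketch and assumes the auto data are still in memory. For box plots, sort(1) orders the boxes on the median of the first variable, not the mean, so the ordering need not match that of the group variable.

```stata
* Sketch: assumes the auto data loaded above are still in memory
* sort(1) orders the boxes by the median of the first y variable (weight)
graph hbox weight, over(foreign, sort(1))
```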

Now in terms of the graph my prejudice is that showing the data too is a really good idea, for example:

Code:

* -ssc inst stripplot- to install
stripplot weight, bar vertical over(group)

If you want just means and confidence intervals, I recommend "the statsby strategy".
http://www.stata-journal.com/sjpdf.h...iclenum=gr0045 is a free read.

Code:

local ytitle "`: var label weight'"
statsby upper=r(ub) lower=r(lb) mean=r(mean) , by(group) clear : ci means weight
twoway scatter mean group || rspike upper lower group, ///
xla(1 2, valuelabel) legend(off) xsc(r(0.5 2.5)) ytitle("`ytitle'")

Naturally your ci command might need to differ.
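Adapted to the variables named in #3, the same steps might look like the sketch below. It assumes that log2cot and cohort exist as described there, that cohort is numeric with value labels (for a string variable, drop the decode option), that there are 10 cohorts as stated in #1, and that labmask is installed. The axis title is purely illustrative. Swapping the roles of the two axes and adding the horizontal option to rspike puts the cohorts on the y axis.

```stata
* Sketch only: log2cot and cohort are the names used in #3
* Assumes cohort is numeric with value labels and labmask is installed
egen mean = mean(log2cot), by(cohort)
egen group = group(mean cohort)
labmask group, values(cohort) decode

* Reduce to one mean and confidence interval per cohort
statsby upper=r(ub) lower=r(lb) mean=r(mean), by(group) clear : ci means log2cot

* Cohorts on the y axis, ordered from lowest to highest mean
twoway scatter group mean || rspike upper lower group, horizontal ///
    yla(1/10, valuelabel angle(horizontal)) legend(off) ///
    ytitle("") xtitle("Mean log2cot with 95% confidence interval")
```

Because group is built from the sorted means, the cohort with the lowest mean is plotted at 1, nearest the x axis.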
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#7

28 Feb 2017, 12:06

FWIW, caterpillar plots are frequently used to visualize the random effect estimates following estimation of a multi-level model. From what I have seen, the horizontal and vertical dimensions are typically the reverse of what is shown in the links at #5, but that is a trivial detail.
Nick Cox

Join Date: Mar 2014

Posts: 35713
#8

28 Feb 2017, 12:11

Clyde: If the text labels can be shown legibly that's fine by me.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#9

28 Feb 2017, 12:18

Good point. In what I have typically seen, the text labels can't be shown legibly and there is a tendency to use rank number rather than a text label on that axis. Evidently, if one is interested in the specific random effects for particular units of analysis this is inconvenient (at best). But often one is only interested in looking at the contour of the plot as a whole, or in seeing where particular subsets of the units of analysis fall along the curve. I think these are more typical of the applications I have seen. Ms. Larose's situation may well differ.
Nick Cox

Join Date: Mar 2014

Posts: 35713
#10

28 Feb 2017, 12:26

Indeed. In practice once there are more than about 30 groups one usually only notices the collective pattern and the extremes.
Lynn Larose

Join Date: Feb 2017

Posts: 15
#11

01 Mar 2017, 02:00

Hello Nick and Clyde - thank you!!! I will try this today and come back if I have any issues. Best, Lynn
Lynn Larose

Join Date: Feb 2017

Posts: 15
#12

01 Mar 2017, 03:59

Nick: This is very nice. Thank-you.

Re: stripplot log2cot, bar vertical over(cohort)

When I insert "horizontal" for "vertical" I get the following error message:

option horizontal not allowed
r(198);

Is there a way I can change the orientation of the graph so that my cohorts are on the y axis and my means are on the x axis?

Or is the stripplot command not flexible in this way?
Best, Lynn
Last edited by Lynn Larose; 01 Mar 2017, 04:14.
Nick Cox

Join Date: Mar 2014

Posts: 35713
#13

01 Mar 2017, 04:26

Horizontal is just the default. Omit vertical.
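With the command quoted in #12, that would be something like the following sketch (log2cot and cohort are the names used there; over(group) could be used instead if a reordered group variable has been created as in #6):

```stata
* Horizontal is stripplot's default: cohorts on the y axis, values on the x axis
stripplot log2cot, bar over(cohort)
```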
Lynn Larose

Join Date: Feb 2017

Posts: 15
#14

01 Mar 2017, 05:43

Thank-you, Nick.
Nick Cox

Join Date: Mar 2014

Posts: 35713
#15

01 Mar 2017, 05:46

Several cohorts seem to have bimodal distributions. What the mean is usefully estimating might then be moot.