subsetplot available on SSC

Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#16

30 Sep 2014, 10:14

Nick,

I don't know if they have this expression on your side of the pond, but I would classify -subsetplot- as "the best invention since sliced bread." I've already used it a half dozen times--it's the program I've been waiting for for years and didn't even realize I needed.

Thanks.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#17

30 Sep 2014, 10:21

Thanks very much. That expression is idiomatic in Britain too.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#18

03 Oct 2014, 02:53

Thanks to Kit Baum, a fixed version is now on SSC.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#19

04 Sep 2016, 05:03

This bumps the thread to add a cross-reference to http://stats.stackexchange.com/quest...es-in-one-plot That thread includes more Stata examples and discussions about R implementations. I add old and new references as I discover them.

This must be old hat in many fields; I just don't know the references.

A well-known graphics expert hinted to me that the approach is trivial for those using fully interactive graphics systems, as the idea of brushing (i.e., highlighting) a subset is basic in such systems. I can believe that.

Last edited by Nick Cox; 04 Sep 2016, 05:25.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#20

01 Dec 2016, 14:39

To recap: the idea here is

1. plot each subset of the data separately

2. but with the rest of the data as backdrop.

Some references are

Cairo, A. 2016. The Truthful Art: Data, Charts, and Maps for Communication. San Francisco, CA: New Riders. p.211

Camões, J. 2016. Data at Work: Best Practices for Creating Effective Charts and Information Graphics in Microsoft Excel. San Francisco, CA: New Riders. p.354

Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

Knaflic, C.N. 2015. Storytelling with Data: A Data Visualization Guide for Business Professionals. Hoboken, NJ: Wiley.

Koenker, R. 2005. Quantile Regression. Cambridge: Cambridge University Press. See pp.12-13.

Rougier, N.P., Droettboom, M. and Bourne, P.E. 2014. Ten simple rules for better figures. PLOS Computational Biology 10(9): e1003833. doi:10.1371/journal.pcbi.1003833 http://journals.plos.org/ploscompbio...l.pcbi.1003833

Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.

Unwin, A. 2015. Graphical Data Analysis with R. Boca Raton, FL: CRC Press.

Wallgren, A., B. Wallgren, R. Persson, U. Jorner, and J.-A. Haaland. 1996. Graphing Statistics and Data: Creating Better Charts. Newbury Park, CA: Sage.

-- and there may be examples in Schwabish's 2016 book, not yet to my hand.

Returning to this: some while after realising that there was another way to do it, I got around to working out the exact strategy.

This way produces nicer graphs, just using by() alone and without the awkwardness of graph combine, but the price is that you have to take control of the decisions. I don't envisage writing this up as a self-contained command because the approach is extensible so long as you make sure that the data you want for any graphical extras are all still in view.

Here is a step-by-step guide:

Code:

* #1 read in your data
sysuse auto, clear

* #2 identify the grouping variable
* will use -rep78-

* #3 if dataset is small, or memory tight,
* keep only the observations and variables you need
keep if rep78 < .
keep mpg weight rep78

* #4 we need as many copies of the dataset as
* there are groups
* PLUS 1 if a graph is required for all the data
* here 5 (+ 1) groups
* the other groups will be split, into
* subset highlighted PLUS others as backdrop
gen long id = _n
expand 6
bysort id : gen group = _n

* #5 separate data for each group into two
separate mpg, by(group == rep78)

* #6 optionally fix a label for all the data
* and fix the values so that all are shown
label def group 6 "all cars"
label val group group
replace mpg1 = mpg if group == 6
replace mpg0 = . if group == 6

* #7 now we can draw graphs!
local note "subsets highlighted in turn"
scatter mpg? weight, by(group, note("`note'") legend(off)) ///
ms(+ Oh) mc(gs12 blue) subtitle(, fcolor(ltblue*0.5)) ///
ytitle("`: var label mpg'") yla(, ang(h))

more

scatter mpg? weight, by(group, note("`note'") legend(off)) ///
ms(+ none) mc(gs12 blue) ///
mlabel(. rep78) mlabcolor(. blue) mlabpos(0 0) mlabsize(. medium) ///
subtitle(, fcolor(ltblue*0.5)) ///
ytitle("`: var label mpg'") yla(, ang(h))

and here are example graphs.
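
The restructuring in steps #4 and #5 can be checked directly before any graph is drawn. What follows is a minimal sketch, not part of the original post, that repeats those steps on the auto data and asserts what the layout should be: every car appears once in each of the 6 panels, and in each row exactly one of mpg0 (backdrop) and mpg1 (highlight) is nonmissing. The local macro name N is an arbitrary choice for illustration.

```stata
* sketch: inspect what -expand- and -separate- produce (steps #4 and #5)
sysuse auto, clear
keep if rep78 < .
keep mpg weight rep78
local N = _N                       // number of cars with known rep78

gen long id = _n
expand 6
bysort id : gen group = _n
separate mpg, by(group == rep78)

* each car now appears once in every panel
assert _N == 6 * `N'

* in every row exactly one of mpg0 (backdrop) and mpg1 (highlight) is filled
assert missing(mpg0) != missing(mpg1)

* each car is highlighted in exactly one panel, the one matching its rep78
count if mpg1 < .
assert r(N) == `N'

* highlighted counts per panel should match a tabulation of rep78
tabulate group if mpg1 < .
```

Running the checks before step #6 matters, because step #6 deliberately moves the values for panel 6 from mpg0 to mpg1 so that all cars are shown highlighted there.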

Last edited by Nick Cox; 01 Dec 2016, 14:42.
Chris Vecchio

Join Date: Apr 2014

Posts: 25
#21

02 Dec 2016, 06:33

Thanks for the additional references. I've used subsetplot charts many times and have received positive feedback from colleagues. However there are still those in my organization who (strongly) prefer spaghetti plots to subsetplot style charts. Having these additional references will hopefully make it easier to argue my case against spaghetti plots.

Also, I love your latest example. I already see 2 or 3 of my current graphs I can apply the technique to. Thanks for sharing!
Nick Cox

Join Date: Mar 2014

Posts: 35779
#22

02 Dec 2016, 06:53

Chris: Thanks for your interest and encouragement. Producing these graphs depends on being able to differentiate what is highlighted and what is not (e.g. using greyscale or very thin lines for the backdrop), and it's only fairly recently that that has become easy for most of us. Having said that, I may now stumble across an ancient example. At present the Swedish book in 1996 was almost there and Koenker in 2005 definitely was, but there must be other examples -- just far, far fewer in number than spaghetti-type graphs.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#23

21 Dec 2016, 12:24

See now also

Schwabish, J. 2017. Better Presentations: A Guide for Scholars, Researchers, and Wonks. New York: Columbia University Press. p.98.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#24

13 Apr 2017, 02:24

Another one:

Carr, D.B. and Pickle, L.W. 2010. Visualizing Data Patterns with Micromaps. Boca Raton, FL: CRC Press. p.85.
Chris Vecchio

Join Date: Apr 2014

Posts: 25
#25

14 Apr 2017, 06:32

Thanks for the additional recommendation! I somehow missed your post about Schwabish but have just purchased it and Carr/Pickle.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#26

18 Apr 2017, 10:06

Chris Vecchio I posted a review of Schwabish's book at https://www.amazon.com/Better-Presen.../dp/0231175213

Stata users interested in this book should note the one comment in my review on a Stata example.
Chris Vecchio

Join Date: Apr 2014

Posts: 25
#27

21 Apr 2017, 08:19

Nick Cox Excellent and fair review. Sadly page 101 is hidden in the preview so I'll have to wait to see what the author claimed was a Stata default graph. To your point on complete sentences, I can say I've greatly benefited from this when referencing your presentations.

I'm happy to report that our library has ordered several visualization technique books, including Better Presentations. These will add to our current collection containing a single (seemingly barely used) book, The Elements of Graphing Data.
Nick Cox

Join Date: Mar 2014

Posts: 35779
#28

21 Apr 2017, 08:45

William S. Cleveland's book The Elements of Graphing Data is in my view the best single book a library could have on statistical graphics.

Ideally you have the second edition from Hobart Press.

I say this while continuing to admire the work of Edward Tufte, particularly the first of his self-published books.

A graph like Schwabish's is easy to emulate.

Code:

sysuse lifeexp.dta, clear
scatter lexp gnppc, mla(country)

and yes, it's a mess. But Stata isn't telling you to use it in a presentation.

Nick Cox

Join Date: Mar 2014
Posts: 35779

#29

09 Jun 2018, 09:11

As a sequel particularly to #20 this further experimentation arose out of a thread on Cross Validated: https://stats.stackexchange.com/ques...rends-properly The problem concerns death rates for 9 European countries and 11 years 1927 to 1937, hardly an enormous dataset, but challenging enough!

Here in one place are the code and graph for one of the suggestions there:

Code:

clear
input int year double(de fr be nl den ch aut cz pl)
1927 10.9 16.5   13 10.2 11.6 12.4   15   16 17.3
1928 11.2 16.4 12.8  9.6   11   12 14.5 15.1 16.4
1929 11.4 17.9 14.4 10.7 11.2 12.5 14.6 15.5 16.7
1930 10.4 15.6 12.8  9.1 10.8 11.6 13.5 14.2 15.6
1931 10.4 16.2 12.7  9.6 11.4 12.1   14 14.4 15.5
1932 10.2 15.8 12.7    9   11 12.2 13.9 14.1   15
1933 10.8 15.8 12.7  8.8 10.6 11.4 13.2 13.7 14.2
1934 10.6 15.1 11.7  8.4 10.4 11.3 12.7 13.2 14.4
1935 11.4 15.7 12.3  8.7 11.1 12.1 13.7 13.5   14
1936 11.7 15.3 12.2  8.7   11 11.4 13.2 13.3 14.2
1937 11.5   15 12.5  8.8 10.8 11.3 13.3 13.3   14
end

rename (de-pl) (death=)
reshape long death, i(year) j(country) string

egen where = group(country), label
gen long id = _n
expand 9
bysort id : gen group = _n
label val group where

separate death, by(group == where)

local note "countries highlighted in turn"
set scheme s1color
sort group country year
twoway  line death0 year, lc(gs12) by(group, compact note("`note'") legend(off)) ///
subtitle(, fcolor(ltblue*0.5)) c(L) xtitle("") ///
ytitle("death rate, yearly deaths per 1000") yla(8(2)18, ang(h)) xla(1927(5)1937, format(%tyY)) ///
|| connected death1 year, lc(blue) mc(blue) ms(oh)

[Graph: deaths3.png. Death rates by year, one panel per country, with each country highlighted in turn against the other countries as backdrop.]
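
The code above hard-codes the number of copies (expand 9). As a hedged sketch, not taken from the original post, the number of groups can instead be read off the data, so that the same steps carry over to other datasets. This example assumes the Grunfeld data as loaded by webuse (also used later in this thread); the local macro name G and the note text are arbitrary choices for illustration.

```stata
* sketch: let the data decide how many copies are needed
webuse grunfeld, clear

egen where = group(company)
summarize where, meanonly
local G = r(max)                   // number of distinct groups

gen long id = _n
expand `G'
bysort id : gen group = _n
separate invest, by(group == where)

* with c(L) the backdrop lines break wherever year starts again,
* so sort by series and then year within each panel
sort group where year

twoway line invest0 year, lc(gs12) c(L) ///
    by(group, compact note("companies highlighted in turn") legend(off)) ///
    || line invest1 year, lc(blue) lw(medthick) ///
    ysc(log) yla(1 10 100 1000, ang(h)) xtitle("")
```

If a further panel showing all the data is wanted, as in #20, the count passed to expand would need to be one more than the number of groups.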


Nick Cox

Join Date: Mar 2014

Posts: 35779
#30

14 Jun 2018, 14:09

Thanks to Kit Baum, a new program fabplot is downloadable from SSC. Stata 9 is required.

The name may seem alarming, as if groovyplot were next along, and as if I were all too vividly recalling 1960s slang or had been overdosing on Austin Powers movies.

Consider the plight of the poor Stata programmer choosing the name of a new program. The name has to follow syntactic rules, it should be easy to spell and pronounce, and ideally it should match some standard name for whatever you're doing. I don't know any standard name for this procedure.

Anyway, the idea is that fabplot can be decoded as front and back plot, or foreground and backdrop plot.

That aside, how does fabplot differ from subsetplot?

In essence fabplot follows the approach of #20 and #29 in rearranging data so that the graph is done using a by() option. That means less repetition of stuff along the axes.

The following examples follow (and slightly correct) the help for fabplot.

Code:

set scheme s1color
sysuse auto, clear
fabplot scatter mpg weight, by(rep78) name(G1, replace)

more

fabplot scatter mpg weight, frontopts(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78) name(G2, replace)

more

webuse grunfeld
fabplot line invest year, by(company) xtitle("") ysc(log) yla(1 10 100 1000) name(G3, replace)

more

fabplot line invest year, by(company) xtitle("") ysc(log) yla(1 10 100 1000) front(connect) frontopts(mc(blue) lc(blue)) name(G4, replace)

