  • Nick Cox
    started a topic subsetplot available on SSC

    subsetplot available on SSC

    Thanks to Kit Baum as usual, a new program subsetplot is now available to download from SSC. Stata 8.2 is required.

    subsetplot produces an array of scatter or other twoway plots for yvarlist versus xvar according to a further variable byvar. There is one plot for each distinct subset of observations defined by byvar. In each plot the data for that subset are highlighted and the rest of the data are shown as backdrop. Graphs are drawn individually and then combined with graph combine.

    That's a little abstract, but some examples should help. We all know that if we want to compare relationships graphically between groups of observations, we can superimpose different groups in a single plot, or juxtapose different groups in several plots. This is a hybrid approach combining elements of those two strategies. Consider this code:

    Code:
    set scheme s1color
    sysuse auto, clear
    subsetplot scatter mpg weight, subset(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78)

    Each subset is shown in turn with the rest of the data as backdrop. In the case of ordered categories such as repair record, each value could serve as its own symbol:
    [Image: subsetplot_2.png]
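    To make the mechanism concrete, here is a minimal hand-rolled sketch of the same idea for the auto data. It is not the code inside subsetplot, only an illustration under stated assumptions. It loops over the distinct values of rep78 returned by levelsof, draws each graph with all the data in grey and the current subset in blue, suppresses display with nodraw, and then puts the stored graphs together with graph combine. The graph names sub1, sub2, ... are arbitrary, hypothetical choices. Run it from a do-file, because of the /// continuation lines.

```stata
* Hypothetical hand-rolled version of the subset-plus-backdrop idea
* (an illustration only, not the code inside subsetplot)
sysuse auto, clear

* distinct non-missing values of the grouping variable
levelsof rep78, local(levels)

local graphs
foreach l of local levels {
    * backdrop: all observations in grey
    * foreground: the current subset in blue
    twoway scatter mpg weight, mc(gs12) ///
        || scatter mpg weight if rep78 == `l', mc(blue) ///
        legend(off) subtitle("rep78 = `l'") name(sub`l', replace) nodraw
    * remember the name of each stored graph
    local graphs `graphs' sub`l'
}

* put the individual graphs together
graph combine `graphs'
```

    Because each graph is drawn separately, axis titles and labels are repeated in every panel. That repetition is the price of the graph combine approach.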

    Here's one more example. With panel data in particular, the problem of spaghetti plots is pervasive across several fields. In principle, plotting several time series in one plot shows all the information. In practice, it can be hard to see the trees for the wood, to change the metaphor.

    Code:
     
    webuse grunfeld
    subsetplot line invest year, by(company) ysc(log) yla(1 10 100 1000)
    [Image: subsetplot_3.png]

    This approach was discussed in Cox (2010). See also Schwabish (2014) for an example. Readers who know of interesting or useful examples or discussions, especially early ones or ones comprehensive in detail, are welcome to email the author. It's hard to believe that this simple idea doesn't go way back, but at present I lack the references.

    Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

    Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.


  • Nick Cox
    replied
    Then fabplot is irrelevant to you. Use xtline.
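    A minimal sketch of that suggestion for the Grunfeld data, assuming the panel identifier is company and the time variable is year (as in the examples in this thread). By default xtline draws a separate graph for each panel, each showing only that panel's series, so there is no grey backdrop.

```stata
* Sketch: a separate line plot for each company, with no backdrop of the others
webuse grunfeld, clear

* declare the panel structure that xtline relies on
xtset company year

* default xtline: one graph per panel
xtline invest
```

    Adding the overlay option would instead superimpose all companies in one plot, which is the spaghetti plot this thread is trying to avoid.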



  • ali farrukh
    replied
    What if I don't want the grey lines to appear in the Grunfeld data? Suppose I want only the orange line to appear.



  • Nick Cox
    replied
    Thanks to Kit Baum, a new program fabplot is downloadable from SSC. Stata 9 is required.

    The name may seem alarming, as if groovyplot were next along, and as if I were all too vividly recalling 1960s slang or had been overdosing on Austin Powers movies.

    Consider the plight of the poor Stata programmer choosing the name of a new program. The name has to follow syntactic rules, it should be easy to spell and pronounce, and ideally it should match some standard name for whatever you're doing. I don't know any standard name for this procedure.

    Anyway, the idea is that fabplot can be decoded as front and back plot, or foreground and backdrop plot.

    That aside, how does fabplot differ from subsetplot?

    In essence, fabplot follows the approach of #20 and #29 in rearranging the data so that the graph is drawn with a single by() option. That means less repetition of detail along the axes.

    The following examples follow (and slightly correct) the help for fabplot.

    Code:
    set scheme s1color
    sysuse auto, clear
    fabplot scatter mpg weight, by(rep78) name(G1, replace)
    more
    fabplot scatter mpg weight, frontopts(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78) name(G2, replace)
    more

    webuse grunfeld, clear
    fabplot line invest year, by(company) xtitle("") ysc(log) yla(1 10 100 1000) name(G3, replace)
    more
    fabplot line invest year, by(company) xtitle("") ysc(log) yla(1 10 100 1000) front(connect) frontopts(mc(blue) lc(blue)) name(G4, replace)



    [Image: fabplot1.png]
    [Image: fabplot2.png]
    [Image: fabplot3.png]
    [Image: fabplot4.png]




  • Nick Cox
    replied
    As a sequel, particularly to #20, this further experimentation arose out of a thread on Cross Validated (https://stats.stackexchange.com/ques...rends-properly). The problem concerns death rates for 9 European countries over the 11 years 1927 to 1937: hardly an enormous dataset, but challenging enough!

    Here, in one place, are the code and graph for one of the suggestions there:

    Code:
    clear
    input int year double(de fr be nl den ch aut cz pl)
    1927 10.9 16.5   13 10.2 11.6 12.4   15   16 17.3
    1928 11.2 16.4 12.8  9.6   11   12 14.5 15.1 16.4
    1929 11.4 17.9 14.4 10.7 11.2 12.5 14.6 15.5 16.7
    1930 10.4 15.6 12.8  9.1 10.8 11.6 13.5 14.2 15.6
    1931 10.4 16.2 12.7  9.6 11.4 12.1   14 14.4 15.5
    1932 10.2 15.8 12.7    9   11 12.2 13.9 14.1   15
    1933 10.8 15.8 12.7  8.8 10.6 11.4 13.2 13.7 14.2
    1934 10.6 15.1 11.7  8.4 10.4 11.3 12.7 13.2 14.4
    1935 11.4 15.7 12.3  8.7 11.1 12.1 13.7 13.5   14
    1936 11.7 15.3 12.2  8.7   11 11.4 13.2 13.3 14.2
    1937 11.5   15 12.5  8.8 10.8 11.3 13.3 13.3   14
    end
    
    rename (de-pl) (death=)
    reshape long death, i(year) j(country) string
    
    egen where = group(country), label
    gen long id = _n
    expand 9
    bysort id : gen group = _n
    label val group where
    
    separate death, by(group == where)
    
    local note "countries highlighted in turn"
    set scheme s1color
    sort group country year
    twoway  line death0 year, lc(gs12) by(group, compact note("`note'") legend(off)) ///
    subtitle(, fcolor(ltblue*0.5)) c(L) xtitle("") ///
    ytitle("death rate, yearly deaths per 1000") yla(8(2)18, ang(h)) xla(1927(5)1937, format(%tyY)) ///
    || connected death1 year, lc(blue) mc(blue) ms(oh)
    [Image: deaths3.png]



  • Nick Cox
    replied
    William S. Cleveland's book The Elements of Graphing Data is, in my view, the best single book a library could have on statistical graphics.

    Ideally, you should have the second edition from Hobart Press.

    I say this while continuing to admire the work of Edward Tufte, particularly the first of his self-published books.

    A graph like Schwabish's is easy to emulate.

    Code:
    sysuse lifeexp.dta, clear
    scatter lexp gnppc , mla(country)
    and yes, it's a mess. But Stata isn't telling you to use it in a presentation.



  • Chris Vecchio
    replied
    Nick Cox Excellent and fair review. Sadly, page 101 is hidden in the preview, so I'll have to wait to see what the author claimed was a Stata default graph. To your point on complete sentences, I can say I've greatly benefited from this when referencing your presentations.

    I'm happy to report that our library has ordered several visualization technique books, including Better Presentations. These will add to our current collection containing a single (seemingly barely used) book, The Elements of Graphing Data.



  • Nick Cox
    replied
    Chris Vecchio I posted a review of Schwabish's book at https://www.amazon.com/Better-Presen.../dp/0231175213

    Stata users interested in this book should note the one comment in my review on a Stata example.



  • Chris Vecchio
    replied
    Thanks for the additional recommendation! I somehow missed your post about Schwabish but have just purchased it and Carr/Pickle.



  • Nick Cox
    replied
    Another one:

    Carr, D.B. and Pickle, L.W. 2010. Visualizing Data Patterns with Micromaps. Boca Raton, FL: CRC Press. p.85.



  • Nick Cox
    replied
    See now also

    Schwabish, J. 2017. Better Presentations: A Guide for Scholars, Researchers, and Wonks. New York: Columbia University Press. p.98.



  • Nick Cox
    replied
    Chris: Thanks for your interest and encouragement. Producing these graphs depends on being able to differentiate what is highlighted from what is not (e.g. using greyscale or very thin lines for the backdrop), and only fairly recently has that become easy for most of us. That said, I may yet stumble across an ancient example. At present, the Swedish book of 1996 was almost there and Koenker in 2005 definitely was, but there must be other examples, just far, far fewer in number than spaghetti-type graphs.



  • Chris Vecchio
    replied
    Thanks for the additional references. I've used subsetplot charts many times and have received positive feedback from colleagues. However, there are still those in my organization who (strongly) prefer spaghetti plots to subsetplot-style charts. Having these additional references will hopefully make it easier to argue my case against spaghetti plots.

    Also, I love your latest example. I already see 2 or 3 of my current graphs I can apply the technique to. Thanks for sharing!



  • Nick Cox
    replied
    To recap: the idea here is

    1. plot each subset of the data separately

    2. but with the rest of the data as backdrop.

    Some references are

    Cairo, A. 2016. The Truthful Art: Data, Charts, and Maps for Communication. San Francisco, CA: New Riders. p.211.

    Camões, J. 2016. Data at Work: Best Practices for Creating Effective Charts and Information Graphics in Microsoft Excel. San Francisco, CA: New Riders. p.354.

    Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

    Knaflic, C.N. 2015. Storytelling with Data: A Data Visualization Guide for Business Professionals. Hoboken, NJ: Wiley.

    Koenker, R. 2005. Quantile Regression. Cambridge: Cambridge University Press. See pp.12-13.

    Rougier, N.P., Droettboom, M. and Bourne, P.E. 2014. Ten simple rules for better figures. PLOS Computational Biology 10(9): e1003833. doi:10.1371/journal.pcbi.1003833 http://journals.plos.org/ploscompbio...l.pcbi.1003833

    Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.

    Unwin, A. 2015. Graphical Data Analysis with R. Boca Raton, FL: CRC Press.

    Wallgren, A., Wallgren, B., Persson, R., Jorner, U. and Haaland, J.-A. 1996. Graphing Statistics and Data: Creating Better Charts. Newbury Park, CA: Sage.

    There may also be examples in Schwabish's 2016 book, not yet to my hand.

    Returning to this, I finally got around to working out the exact strategy, some while after realising that there was another way to do it.

    This way produces nicer graphs, using by() alone and without the awkwardness of graph combine, but the price is that you have to take control of the decisions yourself. I don't envisage writing this up as a self-contained command, because the approach is open-ended: it can be extended in many ways, so long as you make sure that the data you want for any graphical extras are all still in view.

    Here is a step-by-step guide:

    Code:
    * #1 read in your data
    sysuse auto, clear
    
    * #2 identify the grouping variable
    * will use -rep78-
    
    * #3 if dataset is small, or memory tight,
    * keep only the observations and variables you need
    
    keep if rep78 < .
    keep mpg weight rep78
    
    * #4 we need as many copies of the dataset as
    * there are groups
    * PLUS 1 if a graph is required for all the data
    * here 5 (+ 1) groups
    
    * the other groups will be split, into
    * subset highlighted PLUS others as backdrop
    
    gen long id = _n
    expand 6
    bysort id : gen group = _n
    
    * #5 separate data for each group into two
    
    separate mpg, by(group == rep78)
    
    * #6 optionally fix a label for all the data
    * and fix the values so that all are shown
    
    label def group 6 "all cars"
    label val group group
    replace mpg1 = mpg if group == 6
    replace mpg0 = . if group == 6
    
    * #7 now we can draw graphs!
    
    local note "subsets highlighted in turn"
    
    scatter mpg? weight, by(group, note("`note'") legend(off)) ///
    ms(+ Oh) mc(gs12 blue) subtitle(, fcolor(ltblue*0.5)) ///
    ytitle("`: var label mpg'") yla(, ang(h))
    
    more
    
    scatter mpg? weight, by(group, note("`note'") legend(off)) ///
    ms(+ none) mc(gs12 blue) ///
    mlabel(. rep78) mlabcolor(. blue) mlabpos(0 0) mlabsize(. medium) ///
    subtitle(, fcolor(ltblue*0.5)) ///
    ytitle("`: var label mpg'") yla(, ang(h))
    and here are example graphs.
    [Image: subset1.png]
    [Image: subset2.png]


    Last edited by Nick Cox; 01 Dec 2016, 14:42.



  • Nick Cox
    replied
    This bumps the thread to add a cross-reference to http://stats.stackexchange.com/quest...es-in-one-plot, a thread that includes more Stata examples and discussion of R implementations. I add old and new references as I discover them.

    This must be old hat in many fields; I just don't know the references.

    A well-known graphics expert hinted to me that the approach is trivial for those using fully interactive graphics systems, as the idea of brushing (i.e. highlighting) a subset is basic in such systems. I can believe that.
    Last edited by Nick Cox; 04 Sep 2016, 05:25.

