  • Chris Vecchio
    replied
    Thanks for the additional references. I've used subsetplot charts many times and have received positive feedback from colleagues. However, there are still those in my organization who (strongly) prefer spaghetti plots to subsetplot-style charts. Having these additional references will hopefully make it easier to argue my case against spaghetti plots.

    Also, I love your latest example. I already see 2 or 3 of my current graphs I can apply the technique to. Thanks for sharing!


  • Nick Cox
    replied
    To recap: the idea here is

    1. plot each subset of the data separately

    2. but with the rest of the data as backdrop.
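
As a minimal sketch of the idea, the subsetplot command discussed in this thread (available from SSC) can be called as below. The call form is the one that appears in the bug reports in this thread; no options beyond by() are assumed here.

```stata
* install once from SSC (needs an internet connection)
ssc install subsetplot

sysuse auto, clear

* one panel per category of foreign; in each panel that category is
* highlighted and the remaining cars are drawn as backdrop
subsetplot scatter price mpg, by(foreign)
```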

    Some references are

    Cairo, A. 2016. The Truthful Art: Data, Charts, and Maps for Communication. San Francisco, CA: New Riders. p.211

    Camões, J. 2016. Data at Work: Best Practices for Creating Effective Charts and Information Graphics in Microsoft Excel. San Francisco, CA: New Riders. p.354

    Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

    Knaflic, C.N. 2015. Storytelling with Data: A Data Visualization Guide for Business Professionals. Hoboken, NJ: Wiley.

    Koenker, R. 2005. Quantile Regression. Cambridge: Cambridge University Press. See pp.12-13.

    Rougier, N.P., Droettboom, M. and Bourne, P.E. 2014. Ten simple rules for better figures. PLOS Computational Biology 10(9): e1003833. doi:10.1371/journal.pcbi.1003833 http://journals.plos.org/ploscompbio...l.pcbi.1003833

    Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.

    Unwin, A. 2015. Graphical Data Analysis with R. Boca Raton, FL: CRC Press.

    Wallgren, A., B. Wallgren, R. Persson, U. Jorner, and J.-A. Haaland. 1996. Graphing Statistics and Data: Creating Better Charts. Newbury Park, CA: Sage.

    -- and there may be examples in Schwabish's 2016 book, not yet to my hand.

    Returning to this: some while after realising that there was another way to do it, I got around to working out the exact strategy.

    This way produces nicer graphs, just using by() alone and without the awkwardness of graph combine, but the price is that you have to take control of the decisions. I don't envisage writing this up as a self-contained command because the approach is extensible so long as you make sure that the data you want for any graphical extras are all still in view.

    Here is a step-by-step guide,

    Code:
    * #1 read in your data
    sysuse auto, clear
    
    * #2 identify the grouping variable
    * will use -rep78-
    
    * #3 if dataset is small, or memory tight,
    * keep only the observations and variables you need
    
    keep if rep78 < .
    keep mpg weight rep78
    
    * #4 we need as many copies of the dataset as
    * there are groups
    * PLUS 1 if a graph is required for all the data
    * here 5 (+ 1) groups
    
    * the other groups will be split, into
    * subset highlighted PLUS others as backdrop
    
    gen long id = _n
    expand 6
    bysort id : gen group = _n
    
    * #5 separate data for each group into two
    
    separate mpg, by(group == rep78)
    
    * #6 optionally fix a label for all the data
    * and fix the values so that all are shown
    
    label def group 6 "all cars"
    label val group group
    replace mpg1 = mpg if group == 6
    replace mpg0 = . if group == 6
    
    * #7 now we can draw graphs!
    
    local note "subsets highlighted in turn"
    
    scatter mpg? weight, by(group, note("`note'") legend(off)) ///
    ms(+ Oh) mc(gs12 blue) subtitle(, fcolor(ltblue*0.5)) ///
    ytitle("`: var label mpg'") yla(, ang(h))
    
    more
    
    scatter mpg? weight, by(group, note("`note'") legend(off)) ///
    ms(+ none) mc(gs12 blue) ///
    mlabel(. rep78) mlabcolor(. blue) mlabpos(0 0) mlabsize(. medium) ///
    subtitle(, fcolor(ltblue*0.5)) ///
    ytitle("`: var label mpg'") yla(, ang(h))
    and here are example graphs.
    [Figure: subset1.png]

    [Figure: subset2.png]
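
The step-by-step guide hard-codes six copies of the data (5 groups + 1). The following is a sketch, not part of the original post, of how the number of copies might be computed instead, with two assert checks on what separate produces. The names gid, panel, and k are hypothetical, and mapping the grouping variable to 1, ..., k with egen's group() function is an assumption made so that the comparison with the copy number works for any coding of the groups.

```stata
sysuse auto, clear
keep if rep78 < .
keep mpg weight rep78

* map the grouping variable to integers 1, ..., k (hypothetical name gid)
egen gid = group(rep78)
summarize gid, meanonly
local k = r(max)

* k copies for the groups PLUS 1 for all the data
gen long id = _n
expand `k' + 1
bysort id : gen panel = _n

* mpg1 = highlighted subset, mpg0 = backdrop
separate mpg, by(panel == gid)

* checks: within each panel every car appears in exactly one of the two
assert missing(mpg0) if panel == gid
assert missing(mpg1) if panel != gid

* extra panel: show all the data as highlighted
replace mpg1 = mpg if panel == `k' + 1
replace mpg0 = .   if panel == `k' + 1

scatter mpg? weight, by(panel, legend(off)) ms(+ Oh) mc(gs12 blue)
```

Run this as a whole from a do-file, since the local macro k does not survive between separately executed selections.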


    Last edited by Nick Cox; 01 Dec 2016, 14:42.


  • Nick Cox
    replied
    This bumps the thread to add a cross-reference to http://stats.stackexchange.com/quest...es-in-one-plot That thread includes more Stata examples and discussions about R implementations. I add old and new references as I discover them.

    This must be old hat in many fields; I just don't know the references.

    A well-known graphics expert hinted to me that the approach is trivial for those using fully interactive graphics systems, as the idea of brushing (i.e. highlighting) a subset is basic in such systems. I can believe that.
    Last edited by Nick Cox; 04 Sep 2016, 05:25.


  • Nick Cox
    replied
    Thanks to Kit Baum, a fixed version is now on SSC.


  • Nick Cox
    replied
    Thanks very much. That expression is idiomatic in Britain too.


  • Clyde Schechter
    replied
    Nick,

    I don't know if they have this expression on your side of the pond, but I would classify -subsetplot- as "the best invention since sliced bread." I've already used it a half dozen times--it's the program I've been waiting for for years and didn't even realize I needed.

    Thanks.


  • Nick Cox
    replied
    Daniel: I agree with your diagnosis that the subtitle() option needs a fix. I am not necessarily going to fix it in exactly the same way!


  • Stefan Gawrich
    replied
    Thanks Daniel,

    it works.
    I should have looked into the code before posting.

    Thanks again, Nick. I especially like -subsetplot- with line graphs. Very nice.


    Stefan Gawrich
    Dillenburg
    Germany




  • daniel klein
    replied
    Guess both are caused by line 84 of subsetplot.ado which calls the subtitle option as

    Code:
    ... subtitle(`which')
    This should be an easy fix, and I would suggest

    Code:
    ... subtitle(`"`macval(which)'"')
    because macval() is a trick to also deal with single (unmatched) left quotes in labels, something that cannot be achieved with compound quotes alone.
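
To illustrate the quoting issue outside subsetplot.ado, here is a minimal standalone sketch. It is not the actual code of the command: the local macro which is set by hand to the comma-containing label from the report in this thread, and plain scatter stands in for the internal graph call.

```stata
sysuse auto, clear

* a label-like string containing a comma, as in the report in this thread
local which "Detroit, Michigan"

* unquoted: everything after the comma is parsed as suboptions,
* so this call is expected to fail with r(198)
capture noisily scatter price mpg, subtitle(`which')

* compound double quotes pass the text through as a single string;
* macval() additionally stops further macro expansion of the contents
scatter price mpg, subtitle(`"`macval(which)'"')
```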


    By the way, very nice program, Nick. I am always happy to read your code for graphics commands and to learn from the ideas and techniques behind it.

    Best
    Daniel


  • Nick Cox
    replied
    Stefan: Thanks for your interest. I can reproduce problem 2 but not (yet?) problem 1. You have unearthed a small bug. I will flag when a fixed version is posted on SSC.


  • Stefan Gawrich
    replied
    The forum -itrim-s text so the first example of my last post worked.

    Here's an altered example:

    sysuse auto, clear
    label define foreignlabel3 0 "1________10________20__(manufacturer)" 1 "foreign", replace
    label values foreign foreignlabel3
    subsetplot scatter price mpg,by(foreign)

    parentheses do not balance
    r(198);



    Best wishes

    Stefan Gawrich
    Dillenburg
    Germany


  • Stefan Gawrich
    replied
    Thanks Nick,

    this is a very nice new graph!

    Unfortunately I encounter some problems with value labels of the by() var. (Stata 13.1 MP on Win7)


    1) A left parenthesis within the first 32 characters of a value label, without a matching right parenthesis within those 32 characters, leads to an error.

    Example:

    sysuse auto, clear
    label define foreignlabel3 0 "1 10 20 (manufacturer)" 1 "foreign"
    label values foreign foreignlabel3
    subsetplot scatter price mpg,by(foreign)

    parentheses do not balance
    r(198);


    2) Also the use of a comma seems to be misinterpreted.

    sysuse auto, clear
    label define foreignlabel3 0 "Detroit, Michigan" 1 "foreign"
    label values foreign foreignlabel3
    subsetplot scatter price mpg,by(foreign)

    option Michigan not allowed
    r(198);


    Best wishes

    Stefan Gawrich
    Dillenburg
    Germany






  • Nick Cox
    replied
    Thanks.


  • Roman Mostazir
    replied
    Actually, this is cleverer than I thought. Brilliant !!


  • Nick Cox
    replied
    The entire rationale of subsetplot is to include the rest of the data as backdrop, in this case as a set of grey lines!!! If you don't want that, just use some appropriate official command, e.g. line with a by() option, as documented.
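
For comparison, a minimal sketch of the official alternative mentioned here, with no backdrop. The dataset xtline1 and its variables person, day, and calories are an assumption for illustration (an example dataset shipped with Stata); the question that prompted this reply is not shown in the thread.

```stata
* example panel data shipped with Stata (assumed for illustration)
sysuse xtline1, clear

* line connects points in data order, so sort within panels first
sort person day

* one panel per person, each showing only that person's series
line calories day, by(person)
```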
