
  • #16

    I don't know if they have this expression on your side of the pond, but I would classify -subsetplot- as "the best invention since sliced bread." I've already used it a half dozen times--it's the program I've been waiting for for years and didn't even realize I needed.



    • #17
      Thanks very much. That expression is idiomatic in Britain too.


      • #18
        Thanks to Kit Baum, a fixed version is now on SSC.


        • #19
          This bumps the thread to add a cross-reference to another thread. That thread includes more Stata examples and discussion of R implementations. I add old and new references as I discover them.

          This must be old hat in many fields; I just don't know the references.

          A well-known graphics expert hinted to me that the approach is trivial for those using fully interactive graphics systems, as the idea of brushing (i.e. highlighting) a subset is basic in such systems. I can believe that.
          Last edited by Nick Cox; 04 Sep 2016, 05:25.


          • #20
            To recap: the idea here is

            1. plot each subset of the data separately

            2. but with the rest of the data as backdrop.

            Some references are:

            Cairo, A. 2016. The Truthful Art: Data, Charts, and Maps for Communication. San Francisco, CA: New Riders. See p. 211.

            Camões, J. 2016. Data at Work: Best Practices for Creating Effective Charts and Information Graphics in Microsoft Excel. San Francisco, CA: New Riders. See p. 354.

            Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

            Knaflic, C.N. 2015. Storytelling with Data: A Data Visualization Guide for Business Professionals. Hoboken, NJ: Wiley.

            Koenker, R. 2005. Quantile Regression. Cambridge: Cambridge University Press. See pp. 12-13.

            Rougier, N.P., M. Droettboom, and P.E. Bourne. 2014. Ten simple rules for better figures. PLOS Computational Biology 10(9): e1003833. doi:10.1371/journal.pcbi.1003833.

            Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.

            Unwin, A. 2015. Graphical Data Analysis with R. Boca Raton, FL: CRC Press.

            Wallgren, A., B. Wallgren, R. Persson, U. Jorner, and J.-A. Haaland. 1996. Graphing Statistics and Data: Creating Better Charts. Newbury Park, CA: Sage.

            There may also be examples in Schwabish's 2016 book, not yet to my hand.

            Returning to this, I have now worked out the exact strategy for another way to do it, some while after realising that such a way existed.

            This way produces nicer graphs, using by() alone and without the awkwardness of graph combine, but the price is that you have to take control of the decisions yourself. I don't envisage writing this up as a self-contained command, because the approach is open-ended: it can be extended so long as you make sure that the data you want for any graphical extras are all still in view.

            Here is a step-by-step guide:

            * #1 read in your data
            sysuse auto, clear
            * #2 identify the grouping variable
            * will use -rep78-
            * #3 to keep the expanded dataset small, or if memory is tight,
            * keep only the observations and variables you need
            keep if rep78 < .
            keep mpg weight rep78
            * #4 we need as many copies of the dataset as
            * there are groups
            * PLUS 1 if a graph is required for all the data
            * here 5 (+ 1) groups
            * the other groups will be split, into
            * subset highlighted PLUS others as backdrop
            gen long id = _n
            expand 6
            bysort id : gen group = _n
            * #5 separate data for each group into two
            separate mpg, by(group == rep78)
            * #6 optionally fix a label for all the data
            * and fix the values so that all are shown
            label def group 6 "all cars"
            label val group group
            replace mpg1 = mpg if group == 6
            replace mpg0 = . if group == 6
            * #7 now we can draw graphs!
            local note "subsets highlighted in turn"
            scatter mpg? weight, by(group, note("`note'") legend(off)) ///
            ms(+ Oh) mc(gs12 blue) subtitle(, fcolor(ltblue*0.5)) ///
            ytitle("`: var label mpg'") yla(, ang(h))
            scatter mpg? weight, by(group, note("`note'") legend(off)) ///
            ms(+ none) mc(gs12 blue) ///
            mlabel(. rep78) mlabcolor(. blue) mlabpos(0 0) mlabsize(. medium) ///
            subtitle(, fcolor(ltblue*0.5)) ///
            ytitle("`: var label mpg'") yla(, ang(h))
            And here are example graphs.
            [Graph: subset1.png]
            [Graph: subset2.png]

            Last edited by Nick Cox; 01 Dec 2016, 14:42.
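            As a minimal sketch of the extensibility point (my own illustration, not part of the recipe above and not checked against the graphs shown), here the recipe is trimmed to omit the "all cars" panel and a linear fit for the highlighted subset is overlaid with -lfit-. It assumes, as before, that -rep78- takes the values 1 to 5 once missing values are dropped, so 5 copies of the data suffice.

```stata
* sketch: recipe without the "all cars" panel, plus a fitted line
* assumes rep78 takes values 1 to 5 after dropping missings
sysuse auto, clear
keep if rep78 < .
keep mpg weight rep78
gen long id = _n
* one copy of the data per group
expand 5
bysort id : gen group = _n
* mpg1 = subset highlighted in this panel; mpg0 = backdrop
separate mpg, by(group == rep78)
* with by(), -lfit- is estimated separately within each panel,
* here from the highlighted subset (mpg1) only
scatter mpg0 weight, ms(+) mc(gs12)                          ///
    by(group, note("subsets highlighted in turn") legend(off)) ///
    || scatter mpg1 weight, ms(Oh) mc(blue)                   ///
    || lfit mpg1 weight, lc(blue)                              ///
    ytitle("`: var label mpg'") yla(, ang(h))
```

            The extra layer works only because -mpg1- is still in the dataset for every panel, which is the sense in which the data for any graphical extras must remain in view. For rep78 == 1 there are only two cars, so that fitted line just joins two points.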


            • #21
              Thanks for the additional references. I've used subsetplot charts many times and have received positive feedback from colleagues. However, there are still those in my organization who (strongly) prefer spaghetti plots to subsetplot-style charts. Having these additional references will hopefully make it easier to argue my case against spaghetti plots.

              Also, I love your latest example. I already see 2 or 3 of my current graphs I can apply the technique to. Thanks for sharing!


              • #22
                Chris: Thanks for your interest and encouragement. Producing these graphs depends on being able to differentiate what is highlighted from what is not (e.g. using greyscale or very thin lines for the backdrop), and it's only fairly recently that this has become easy for most of us. Having said that, I may yet stumble across an ancient example. So far, the Swedish book of 1996 (Wallgren et al.) comes close and Koenker (2005) definitely qualifies, but there must be other examples, just far, far fewer in number than spaghetti-type graphs.


                • #23
                  See now also

                  Schwabish, J. 2017. Better Presentations: A Guide for Scholars, Researchers, and Wonks. New York: Columbia University Press. See p. 98.


                  • #24
                    Another one:

                    Carr, D.B., and L.W. Pickle. 2010. Visualizing Data Patterns with Micromaps. Boca Raton, FL: CRC Press. See p. 85.


                    • #25
                      Thanks for the additional recommendation! I somehow missed your post about Schwabish but have just purchased it and Carr/Pickle.


                      • #26
                        Chris Vecchio: I posted a review of Schwabish's book at

                        Stata users interested in this book should note the one comment in my review on a Stata example.


                        • #27
                          Nick Cox: Excellent and fair review. Sadly, page 101 is hidden in the preview, so I'll have to wait to see what the author claimed was a Stata default graph. On your point about complete sentences, I can say I've benefited greatly from this when referring to your presentations.

                          I'm happy to report that our library has ordered several visualization technique books, including Better Presentations. These will add to our current collection containing a single (seemingly barely used) book, The Elements of Graphing Data.


                          • #28
                            William S. Cleveland's book The Elements of Graphing Data is in my view the best single book a library could have on statistical graphics.

                            Ideally you have the second edition from Hobart Press.

                            I say this while continuing to admire the work of Edward Tufte, particularly the first of his self-published books.

                            A graph like Schwabish's is easy to emulate.

                            sysuse lifeexp.dta, clear
                            scatter lexp gnppc , mla(country)
                            And yes, it's a mess. But Stata isn't telling you to use it in a presentation.


                            • #29
                              As a sequel, particularly to #20, this further experimentation arose out of a thread on Cross Validated. The problem concerns death rates for 9 European countries over the 11 years 1927 to 1937: hardly an enormous dataset, but challenging enough!

                              Here are the code and graph for one of the suggestions there:

                              clear
                              input int year double(de fr be nl den ch aut cz pl)
                              1927 10.9 16.5   13 10.2 11.6 12.4   15   16 17.3
                              1928 11.2 16.4 12.8  9.6   11   12 14.5 15.1 16.4
                              1929 11.4 17.9 14.4 10.7 11.2 12.5 14.6 15.5 16.7
                              1930 10.4 15.6 12.8  9.1 10.8 11.6 13.5 14.2 15.6
                              1931 10.4 16.2 12.7  9.6 11.4 12.1   14 14.4 15.5
                              1932 10.2 15.8 12.7    9   11 12.2 13.9 14.1   15
                              1933 10.8 15.8 12.7  8.8 10.6 11.4 13.2 13.7 14.2
                              1934 10.6 15.1 11.7  8.4 10.4 11.3 12.7 13.2 14.4
                              1935 11.4 15.7 12.3  8.7 11.1 12.1 13.7 13.5   14
                              1936 11.7 15.3 12.2  8.7   11 11.4 13.2 13.3 14.2
                              1937 11.5   15 12.5  8.8 10.8 11.3 13.3 13.3   14
                              end
                              rename (de-pl) (death=)
                              reshape long death, i(year) j(country) string
                              egen where = group(country), label
                              gen long id = _n
                              expand 9
                              bysort id : gen group = _n
                              label val group where
                              separate death, by(group == where)
                              local note "countries highlighted in turn"
                              set scheme s1color
                              sort group country year
                              twoway  line death0 year, lc(gs12) by(group, compact note("`note'") legend(off)) ///
                              subtitle(, fcolor(ltblue*0.5)) c(L) xtitle("") ///
                              ytitle("death rate, yearly deaths per 1000") yla(8(2)18, ang(h)) xla(1927(5)1937, format(%tyY)) ///
                              || connected death1 year, lc(blue) mc(blue) ms(oh)
                              [Graph: deaths3.png]


                              • #30
                                Thanks to Kit Baum, a new program fabplot is downloadable from SSC. Stata 9 is required.

                                The name may seem alarming, as if groovyplot were next along and I were all too vividly recalling 1960s slang or had been overdosing on Austin Powers movies.

                                Consider the plight of the poor Stata programmer choosing the name of a new program. The name has to follow syntactic rules, it should be easy to spell and pronounce, and ideally it should match some standard name for whatever you're doing. I don't know any standard name for this procedure.

                                Anyway, the idea is that fabplot can be decoded as front and back plot, or foreground and backdrop plot.

                                That aside, how does fabplot differ from subsetplot?

                                In essence, fabplot follows the approach of #20 and #29 in rearranging the data so that the graph is drawn using a by() option. That means less repetition of axis titles and labels than with graph combine.

                                The following examples follow (and slightly correct) the help for fabplot.

                                set scheme s1color
                                sysuse auto, clear
                                fabplot scatter mpg weight, by(rep78) name(G1, replace)
                                fabplot scatter mpg weight, frontopts(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78) name(G2, replace)
                                webuse grunfeld, clear
                                fabplot line invest year, by(company) xtitle("") ysc(log) yla(1 10 100 1000) name(G3, replace)
                                fabplot line invest year, by(company) xtitle("") ysc(log) yla(1 10 100 1000) front(connect) frontopts(mc(blue) lc(blue)) name(G4, replace)

                                [Graph: fabplot1.png]
                                [Graph: fabplot2.png]
                                [Graph: fabplot3.png]
                                [Graph: fabplot4.png]
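                                For comparison, and only as a rough sketch rather than fabplot's actual code, the rearrangement that fabplot automates can be done by hand for the grunfeld example in the same way as #20 and #29. This assumes -company- is coded 1 to 10, so 10 copies of the data are needed; the panels show company codes rather than names.

```stata
* sketch of the by-hand rearrangement for the grunfeld example
* assumes company is coded 1 to 10
webuse grunfeld, clear
keep invest year company
gen long id = _n
expand 10
bysort id : gen group = _n
* invest1 = company highlighted in this panel; invest0 = backdrop
separate invest, by(group == company)
* sort so that c(L) breaks the backdrop line between companies
sort group company year
twoway line invest0 year, lc(gs12) c(L)                           ///
    by(group, note("companies highlighted in turn") legend(off))  ///
    ysc(log) yla(1 10 100 1000) xtitle("")                        ///
    || line invest1 year, lc(blue)
```

                                With c(L), points are joined only while year keeps increasing, so each backdrop company gets its own line instead of one zigzag through the whole panel. This is the same device used in #29.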
