subsetplot available on SSC

Nick Cox started a topic subsetplot available on SSC

29 Sep 2014, 05:30
Thanks to Kit Baum as usual, a new program subsetplot is now available to download from SSC. Stata 8.2 is required.

subsetplot produces an array of scatter or other twoway plots for yvarlist versus xvar according to a further variable byvar. There is one plot for each distinct subset of byvar, in which the data for that subset are highlighted and the rest of the data are shown as backdrop. Graphs are drawn individually and then combined with graph combine.

That's a little abstract, but some examples should help. We all know that if we want to compare relationships graphically between groups of observations, we can superimpose different groups in a single plot or juxtapose different groups in several plots. subsetplot is a hybrid approach combining elements of those two strategies. Consider this code:

Code:

set scheme s1color
sysuse auto, clear
subsetplot scatter mpg weight, subset(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78)

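Each subset is shown in turn with the rest of the data as backdrop. In the case of ordered categories such as repair record, each value can serve as its own symbol, as here: ms(none) suppresses the marker and mla(rep78) with mlabpos(0) shows the value itself at each data point.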
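To make the mechanism concrete, here is a rough sketch of the same idea built by hand from twoway and graph combine. This is not subsetplot's own code; it is a minimal illustration only. The graph names g1 to g5 and the local macro graphs are hypothetical, and leaving observations with missing rep78 out of the backdrop is an assumption, not necessarily what subsetplot does.

```stata
* Hand-rolled sketch of the subsetplot idea (not subsetplot's own code)
sysuse auto, clear
levelsof rep78, local(levels)
local graphs
foreach l of local levels {
    * backdrop: all other nonmissing subsets in grey (assumption: missing rep78 omitted)
    * front: the current subset highlighted in blue
    twoway (scatter mpg weight if rep78 != `l' & !missing(rep78), mcolor(gs12)) ///
           (scatter mpg weight if rep78 == `l', mcolor(blue)),                  ///
           legend(off) title("rep78 = `l'") name(g`l', replace) nodraw
    * remember the name of each stored graph
    local graphs `graphs' g`l'
}
* one panel per distinct value of rep78
graph combine `graphs'
```

The command does all this bookkeeping for you, with options such as subset() controlling how the highlighted data are drawn.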

Here's one more example. With panel data in particular, the problem of spaghetti plots is pervasive across several fields. In principle, plotting several time series in one plot shows all the information. In practice, it can be hard to see the trees for the wood, to change the metaphor.

Code:

webuse grunfeld, clear
subsetplot line invest year, by(company) ysc(log) yla(1 10 100 1000)

This approach was discussed in Cox (2010). See also Schwabish (2014) for an example. Readers who know of interesting or useful examples or discussions, especially early in date or comprehensive in detail, are welcome to email the author. It's hard to believe that this simple idea doesn't go way back, but at present I lack the references.

Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.

Nick Cox replied

19 Jul 2022, 04:12
subsetplot from SSC (all versions 2014) is, in my view, superseded by fabplot (Stata Journal).

Either program ignores missing values, so, as far as the final plot is concerned, there are no gaps where values are missing.

Here is a technique that may help. If you use connect for the front plot, the markers make clear where data are present.

Code:

webuse grunfeld, clear
replace invest = . if year == 1944
fabplot line invest year, by(company) ysc(log) front(connect) scheme(s1color)
yumeng Liu replied

19 Jul 2022, 03:53
I have to say that the program is sooo fancy and efficient. One question about missing data: I found that subsetplot only connects the non-missing observations. I'm wondering whether there is any option for leaving the missing values as they are, so that we get broken lines.
Nick Cox replied

02 Jul 2021, 02:05
A paper on this problem and the fabplot command is now published at https://www.stata-journal.com/articl...article=gr0087 and is accessible regardless of whether you or your workplace subscribe to the Journal.
Nick Cox replied

13 May 2021, 03:16
Thanks to Kit Baum as ever, fabplot has been updated on SSC. The immediate prompt for updating was to fix a bug that I noticed (evidently before anyone else), but the help file has also been updated in various details.

If fabplot is of interest, please update your version.

Nick Cox replied

01 May 2021, 23:49

Your original syntax was asking for regression lines and data points for each subset in front, and regression lines and data points for the other subsets in back.

There are many variants on this. Here's one with extra twists to match the example. rangestat and mylabels are from SSC, as is fabplot.

Code:

webuse grunfeld, clear
* regress log investment on year, separately within each company
gen ln_invest = ln(invest)
rangestat (reg) ln_invest year, int(company 0 0)
* fitted values on the log scale from the company-specific coefficients
gen predicted = b_cons + b_year * year
label var predicted "predicted investment (linear fit on log scale)"
* labels show investment values but are placed at their logarithms
mylabels 1 3 10 30 100 300 1000, myscale(ln(@)) local(yla)
fabplot line predicted year, by(company) select(company <= 4) frontopts(lw(thick)) yla(`yla') xtitle("") xla(1935 " 1935" 1955 "1955 " 1940(5)1950, format(%tyYY))

[Graph: fabplot_regline.png]

Last edited by Nick Cox; 02 May 2021, 00:06.
