
  • Nick Cox
    started a topic subsetplot available on SSC

    subsetplot available on SSC

    Thanks to Kit Baum as usual, a new program subsetplot is now available to download from SSC. Stata 8.2 is required.

    subsetplot produces an array of scatter or other twoway plots for yvarlist versus xvar according to a further variable byvar. There is one plot for each distinct subset of byvar, in which data for that subset are highlighted and the rest of the data are shown as backdrop. Graphs are drawn individually and then combined with graph combine.

    That's a little abstract, but some examples should help. We all know that if we want to compare relationships graphically between groups of observations, we can superimpose different groups in a single plot, or juxtapose different groups in several plots. This is a hybrid approach combining elements of those two strategies. Consider this code:

    Code:
    set scheme s1color
    sysuse auto, clear
    subsetplot scatter mpg weight, subset(ms(none) mla(rep78) mlabsize(*1.5) mlabpos(0) mlabcolor(blue)) by(rep78)

    Each subset is shown in turn with the rest of the data as backdrop. In the case of ordered categories such as repair record, each value could serve as its own symbol:
    [Figure: subsetplot_2.png]

    Here's one more example. With panel data in particular, the problem of spaghetti plots is pervasive across several fields. In principle, plotting several time series in one plot is showing all the information. In practice, it can be hard to see the trees for the wood, to change the metaphor.

    Code:
     
    webuse grunfeld, clear
    subsetplot line invest year, by(company) ysc(log) yla(1 10 100 1000)
    [Figure: subsetplot_3.png]

    This approach was discussed in Cox (2010). See also Schwabish (2014) for an example. Readers knowing interesting or useful examples or discussions, especially early in date or comprehensive in detail, are welcome to email the author. It's hard to believe that this simple idea doesn't go way back, but at present I lack the references.

    Cox, N.J. 2010. Graphing subsets. Stata Journal 10: 670-681.

    Schwabish, J.A. 2014. An economist's guide to visualizing data. Journal of Economic Perspectives 28: 209-234.


  • Nick Cox
    replied
    subsetplot from SSC (all versions 2014) is, in my view, superseded by fabplot (Stata Journal).

    Either program ignores missing values, so as far as the final plot is concerned, there are no gaps with missing values to take account of.

    Here is a technique that may help. If you use connect for the front plot, the markers make clear where data are present.

    Code:
    webuse grunfeld, clear
    replace invest = . if year == 1944
    fabplot line invest year, by(company) ysc(log) front(connect) scheme(s1color)



  • yumeng Liu
    replied
    I have to say that the program is very fancy and efficient. One question about missing data: I found that subsetplot only connects the non-missing observations. Is there an option for leaving gaps where values are missing, so that we can get non-continuous lines?



  • Nick Cox
    replied
    A paper on this problem and the fabplot command is now published at https://www.stata-journal.com/articl...article=gr0087 and is accessible regardless of whether you or your workplace subscribe to the Journal.



  • Nick Cox
    replied
    Thanks to Kit Baum as ever, fabplot has been updated on SSC. The immediate prompt for updating was to fix a bug that I noticed (evidently before anyone else) but the help file has also been updated in various details.

    If fabplot is of interest, please update your version.



  • Nick Cox
    replied
    Your original syntax was asking for

    regression lines and data points for each subset in front

    regression lines and data points for other subsets in back

    There are many variants on this. Here's one with extra twists to match the example. rangestat and mylabels are from SSC, as is fabplot.


    Code:
    webuse grunfeld, clear
    gen ln_invest = ln(invest)
    rangestat (reg) ln_invest year, int(company 0 0)
    gen predicted = b_cons + b_year * year
    label var predicted "predicted investment (linear fit on log scale)"
    mylabels 1 3 10 30 100 300 1000, myscale(ln(@)) local(yla)
    fabplot line predicted year, by(company) select(company <= 4) frontopts(lw(thick)) yla(`yla') xtitle("") xla(1935 " 1935" 1955 "1955 " 1940(5)1950, format(%tyYY))
    [Figure: fabplot_regline.png]

    Last edited by Nick Cox; 02 May 2021, 00:06.



  • Chris Boulis
    replied
    I will do as you suggested. Thank you for your advice Nick Cox.



  • Nick Cox
    replied
    Sorry, but that goes way beyond what the syntax allows. You can fit the regressions in advance and then save the predictions to a new variable; and then use fabplot line.



  • Chris Boulis
    replied
    Thanks for clarifying - that makes sense.
    "You don't show the code that doesn't work"
    My apologies. I attempted:
    Code:
    fabplot (scatter totasset_ecpc agegrp) (lfit totasset_ecpc agegrp), by(wave)
    and
    Code:
    fabplot scatter totasset_ecpc agegrp || lfit totasset_ecpc agegrp, by(wave)
    Yes, you are correct: the code in #42 does "work", but not as I'd hoped, which is to run a line of best fit through the annual data in each graph. Thanks anyway.
    Last edited by Chris Boulis; 01 May 2021, 08:47.



  • Nick Cox
    replied
    Actually, no. subsetplot has a by() option but that just indicates the grouping variable. The juxtaposition of graphs is done by graph combine. So, I am less surprised that the commands behave quite differently in this respect because they were written differently, although the help file doesn't explain that because it doesn't need to.

    But I can't explain why ixlabel doesn't seem to work as I expect it to with fabplot. I just don't see that it is my bug.

    You don't show the code that doesn't work. Something like
    Code:
    webuse grunfeld
    fabplot lfit invest year, by(company)  frontopts(lw(thick)) select(company <= 4) xla(, grid)
    "works" but it doesn't do what you probably hope.

    The help for fabplot, at least on my machine, has a weaselly disclaimer:

    fabplot does not attempt to trap calls to twoway that are legal with two numeric variables, but will not be helpful with its design. It is most obviously useful with calls to scatter, line and connected and written with those subcommands in mind.
    Last edited by Nick Cox; 01 May 2021, 05:12.



  • Chris Boulis
    replied
    Nick Cox. Thank you for the code and suggestions.

    I don't understand either as subsetplot also uses the by() option, but x-axis labels are given. Anyway, fabplot looks nicer - particularly the removal of repetitive y-axis labels. I agree, using xla(, grid) largely offsets this if there are an even number of graphs. I have five graphs in my case, so I see an inconsistency. I will look to add a summary graph to fill the void.

    Your suggestion to use the twoway options yline() and xline() was perfect - it does just as I wanted - thanks.

    Is lfit() compatible with fabplot? I tried including it, but Stata output read "varlist not allowed r(101);"
    Last edited by Chris Boulis; 01 May 2021, 04:49.



  • Nick Cox
    replied
    @Chris Boulis

    fabplot is using the by() option of twoway, so in principle you can add its suboptions. In practice twoway seems to ignore ixlabel. I don't know why, but I don't think that's any side-effect of my code. I have noted that in other instances.

    In my view adding xla(, grid) is a much better way to help interpretation. There is no data example in #30 but this should give the flavour:

    Code:
    webuse grunfeld, clear
    fabplot line invest year, by(company) ysc(log) yla(1 10 100 1000) frontopts(lw(thick)) select(company <= 4) xla(, grid)



    "add a line at a specific point" is hard to decode geometrically but, again, twoway options such as xline() and yline() are available as extras.
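
    A minimal sketch of the reference-line idea, assuming fabplot is installed from SSC and passes yline() through to twoway as described above. The value 100 is an arbitrary illustration and does not come from this thread:

    Code:
    webuse grunfeld, clear
    * yline(100) is a standard twoway option; here it adds the same horizontal reference line to each panel
    fabplot line invest year, by(company) ysc(log) yla(1 10 100 1000) xla(, grid) yline(100)

    Using xline() in the same way would give a vertical reference line at a chosen year.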



  • Chris Boulis
    replied
    Hi Nick Cox. In #30 you state that fabplot reduces "repetition of stuff along the axes."
    In some cases, the distance of say three levels without x-axis values (such as in #33) may impact interpretation. Is there a way in fabplot to add x-axis values for graphs like in #33?

    Also, could you kindly show in #33 how one may add a line at a specific point in the graph as a way of delineating between values above and below that point?



  • Nick Cox
    replied
    Another example at https://www.theguardian.com/business...big-to-stomach

    As always, I suspect that this method has been re-invented many times, but it's just not part of everyone's toolkit yet. The help file for fabplot on SSC has several references and I've found more since that was posted in August 2020.



  • Andrew Musau
    replied
    I hope that you got your cut Nick!

