  • Fighting spaghetti: some small devices using linkplot

    A common kind of question on Statalist concerns plotting multiple time series. Spaghetti -- tangled lines that can hardly be distinguished -- is an ever-present graphical danger.

    At worst one can have

    1. one or more responses

    2. one or more panels or groups

    3. several times of observation.

    Even reduced versions of this problem (one response only OR one group only) can be frustrating. Here I focus mainly on one response and several groups. The ambiguity of the term panel is a little disconcerting: does it mean a group of observations or a part of a graph? I will use panel in the graphical sense and group otherwise.

    Most of the solutions hinge, directly or indirectly, on the command twoway line (which can be just line), but even the special extra commands tsline and xtline can be disappointing, at least in my experience.

    I doubt that there is a single solution. At least, I have tried several, including

    sparkline (SSC, 2013) https://www.stata.com/statalist/arch.../msg00922.html Examples at https://www.statalist.org/forums/for...le-time-series https://www.statalist.org/forums/for...hart-correctly

    multiline (SSC, 2017) https://www.statalist.org/forums/for...ailable-on-ssc

    fabplot (SSC, 2018) https://www.statalist.org/forums/for...ailable-on-ssc

    Recently I was playing with some examples for which none of these was quite right and was pondering writing a different command. Then I realised that linkplot (SSC) was sufficiently general to help. linkplot isn't specifically geared to time series data at all, but that doesn't bite.

    linkplot was posted on SSC in 2003 and announced at https://www.stata.com/statalist/arch.../msg00194.html but the email-based server then didn't allow graphical illustrations and that post just hints at time series applications.

    Yesterday I realised that although I had updated the command in 2007 I hadn't updated the version on SSC, but Kit Baum kindly and promptly updated the files. Thus even if you previously installed a copy of linkplot an update is still in order for the syntax below to work.

    The Grunfeld data are a sandbox for problems of this kind. Here's the good news: if a graphical method won't work well with the Grunfeld data, with just 10 companies and 20 years, then you're probably doomed if your data are more complicated.

    Let's look at the investment variable in the Grunfeld data. We'll flag key points, some of which apply much more generally.

    #1: Always consider logarithmic scale for a response. That's really old news for some, but goodness knows how many people don't seem to realise how helpful that can be.

    #1': If zeros are present too, or even negative values, and some kind of transformation is called for, just possibly you could use square roots, cube roots, sign(y) * log(1 + abs(y)), asinh(y), etc.
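
    As a minimal sketch of #1', the transformations just listed could be generated as follows. The new variable names are hypothetical, and invest in the Grunfeld data is strictly positive, so this only illustrates syntax rather than a case where such transformations are needed.

    ```stata
    webuse grunfeld, clear

    * hypothetical variable names, chosen for illustration only
    * square root: returns missing for negative values
    gen root_invest = sqrt(invest)

    * cube root that preserves sign
    gen curt_invest = sign(invest) * abs(invest)^(1/3)

    * signed log: defined for zero and negative values too
    gen slog_invest = sign(invest) * log(1 + abs(invest))

    * inverse hyperbolic sine: also defined for zero and negative values
    gen asinh_invest = asinh(invest)
    ```

    The transformed variable would then be plotted on an ordinary (not logarithmic) scale, with axis labels edited to show values on the original scale if desired.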

    Code:
    webuse grunfeld, clear
    set scheme s1color
    
    label var invest "investment"
    
    xtline invest, ysc(log)
    
    xtline invest, overlay ysc(log)

    [Graph grunfeld1.png: xtline invest, ysc(log)]

    [Graph grunfeld2.png: xtline invest, overlay ysc(log)]




    #2: Stata's default axis labels for logarithmic scales aren't terribly smart, as the graphs above show. We could just reach in and tell Stata what we want. For other techniques, see https://www.stata-journal.com/articl...article=gr0072 and/or niceloglabels (Stata Journal).
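
    Reaching in might look like this for the overlay graph: a small sketch that spells out labels at powers of 10, checking first that the response is strictly positive (which a logarithmic scale requires).

    ```stata
    webuse grunfeld, clear

    * a log scale needs strictly positive values, so check first
    summarize invest, meanonly
    assert r(min) > 0

    * same overlay graph as before, but with labels at powers of 10
    xtline invest, overlay ysc(log) yla(1 10 100 1000, ang(h))
    ```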

    #3: Even with about 10 groups, graphs with a group in each panel may not work well. In principle, the data are shown clearly, but effective comparison is difficult.

    #4: Even with about 10 groups, a superimposed graph may not work well either. A large fraction of the total graph area becomes legend and the mental "back and forth" required to relate graph to legend and legend to graph is often too much like hard work.

    How to do better? The default linkplot doesn't at first look especially promising, even with a logarithmic scale. Note that linkplot knows nothing about any tsset or xtset specification, but here the group identifier is fed to the link() option, which tells the program what should be connected. Minor trickery within the code ensures that incomplete panels won't be connected spuriously.

    Code:
    linkplot invest year, link(company)  ysc(log)

    [Graph grunfeld3.png: default linkplot of invest against year, log scale]



    The default of twoway connect could be a good idea for short series (not least panels of length 2, say before and after, start and end, and so forth), but it just contributes noise here.
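
    To see the kind of short series for which the default might suit, here is a sketch that keeps the connected style but restricts the Grunfeld data to the first and last years only. The choice of 1935 and 1954 as "start and end" is just for illustration, and linkplot must be installed from SSC.

    ```stata
    * ssc install linkplot
    webuse grunfeld, clear

    * keep the default connected style, but plot first and last years only,
    * so that each company contributes just two linked points
    linkplot invest year if inlist(year, 1935, 1954), link(company) ysc(log) xla(1935 1954)
    ```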

    There are several small and large tweaks that can be made to improve the plot. Let me give the remaining code all at once, show the resulting graphs and then draw the morals.

    Code:
    local endlabels addplot(scatter invest year if year == 1954, ms(none) mla(company) mlabc(blue))
    
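    * despite its name, odd is 1 for even-numbered companies and 0 for odd-numbered ones;
    * only the two-way split matters here, as the by() subtitles are suppressed
    gen odd = mod(company, 2) == 0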
    
    linkplot invest year, recast(line) link(company) `endlabels' ysc(log) by(odd, legend(off) note("") compact) xla(, labsize(small)) subtitle("", fcolor(none) nobox nobexpand) yla(1000 100 10 1, ang(h)) xsc(r(1935 1956)) xtitle("")  
    
    linkplot invest year, recast(line) link(company) `endlabels' ysc(log) by(odd, legend(off) note("") compact) xla(, labsize(small)) subtitle("", fcolor(none) nobox nobexpand) yla(1000 100 10 1, ang(h)) xsc(r(1935 1956)) asyvars  ytitle(investment)  xtitle("")

    [Graph grunfeld4.png: linkplot by odd and even identifiers, with end labels]

    [Graph grunfeld5.png: as grunfeld4.png, with asyvars]




    #5: Trailing end labels. If you care which panel is which, you need to show identifiers, but a legend may be dispensable. One of my slogans is: Lose the legend! Kill the key! (if you can). The Grunfeld data are especially easy (identifiers 1 to 10), but don't sniff at easy answers when available. To show those end labels, you just need an additional scatter plot fed to addplot() with no marker symbol, but a marker label. The default marker label position of 3 o'clock is exactly right. You might need to stretch the x axis a little.

    #5': A related technique is to show start labels, as well or alternatively. Usually the most recent value seems to offer the best place for the identifier.
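
    A sketch of start labels, under the assumption that the same addplot() device is simply mirrored: the scatter is restricted to the first year, the marker label is moved to 9 o'clock, and the x axis is stretched to the left instead of the right. The macro name startlabels and the axis range are illustrative choices.

    ```stata
    * ssc install linkplot
    webuse grunfeld, clear

    * start labels: marker labels placed at 9 o'clock of the first observation
    local startlabels addplot(scatter invest year if year == 1935, ms(none) mla(company) mlabpos(9))

    * stretch the x axis to the left to leave room for the labels
    linkplot invest year, recast(line) link(company) `startlabels' ysc(log) legend(off) yla(1000 100 10 1, ang(h)) xsc(r(1933 1954)) xtitle("")
    ```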

    #6: A compromise between one graph showing all and one panel for each group is to bundle groups together. Sometimes there is a natural or convenient way to do that (e.g. US states might be grouped by region). Here, as the identifiers run from 1 (large company) to 10 (small company), dividing identifiers into odd and even reduces the overlap between series. 3 panels side by side could just about work if the series aren't long. 2 x 2 = 4 panels loses some comparability as some panels are on different rows.
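
    For the three-panel variant, one possible sketch interleaves the identifiers in the same spirit as the odd and even split. The variable name bundle and the particular split are assumptions made for illustration; any grouping that separates neighbouring series would do.

    ```stata
    * ssc install linkplot
    webuse grunfeld, clear

    * hypothetical three-way split: companies 1 4 7 10, then 2 5 8, then 3 6 9
    gen bundle = mod(company - 1, 3)

    local endlabels addplot(scatter invest year if year == 1954, ms(none) mla(company))

    * rows(1) keeps the three panels side by side on a common y axis
    linkplot invest year, recast(line) link(company) `endlabels' ysc(log) by(bundle, rows(1) legend(off) note("") compact) subtitle("", fcolor(none) nobox nobexpand) yla(1000 100 10 1, ang(h)) xla(, labsize(small)) xsc(r(1935 1956)) xtitle("")
    ```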

    #7: Sometimes the grouping doesn't need to be explained. Note that while we are using by() we can suppress the note() that appears by default and even the subtitles.

    #8: Suppress x axis titles like "year". Really, who needs them?

    #9: We just reach in and ask for y axis labels 1 10 100 1000 ourselves. 1 3 10 30 100 300 1000 is a good alternative if you want more labels. Much more discussion in the paper cited in #2.

    #10: linkplot has an asyvars option that automatically colours companies separately. Here that might seem a complication too far, but liking the idea doesn't imply that we need a legend too. That's what the end labels do. Conversely, some might want to colour the end labels to match. (No; there isn't an automated way to do that in linkplot, at least not yet.)
    Last edited by Nick Cox; 19 Feb 2019, 09:35.