  • Combined plots

    Hi
    I recently came across a graph in which, instead of making bar graphs, the authors had combined three plots (namely scatter, violin and bar graphs) to show the distribution of a variable over different categories. I am wondering whether this can be done in Stata.
    The graph looks like this:
    [Image: 1616770918371.png]



    I am using Stata 15 and my data are a diversity index from 1980 to 2015 for different agro-climatic zones.
    Thanks





  • #2
    A source for that example would be appreciated. Where published? I guess it uses R. The dreary grey backdrop colour and white grid lines are reminiscent of ggplot2.

    What you call bar graphs would more usually be called box plots.

    The variable being plotted manifestly takes integer values from 0 to 15 so that the use of jittering and kernel density estimation is obscuring, even obfuscating, what would more plainly and more informatively be shown by histograms with discrete bins for each integer in the sample. The grid lines at 2.5, 7.5 and 12.5 serve no useful purpose. The median and quartiles are all in practice integers although half-integers are possible (and perhaps even other values depending on the precise calculation rule): if any summary measure is to be added I would choose means for such variables; their sensitivity to what is going on in the tails is precisely what is needed and helpful in the absence of marked outliers.
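
    A minimal sketch of that alternative, since the data behind the example are not available: rep78 in the auto dataset stands in for an integer-valued variable and foreign for the categories. Both are my substitutions, not anything from the original graph.

    Code:
    sysuse auto, clear
    * one bar per integer value, one panel per group
    histogram rep78, discrete frequency by(foreign, cols(1))
    * group means as the added summary
    tabstat rep78, by(foreign) statistics(mean n)

    Stacking the panels in one column keeps the bins aligned so that the distributions can be compared by eye.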

    Still, an example is just an example. And your data are different.

    Diversity index could mean any of several definitions -- some even discrete -- but I would guess at some continuous measure such as Shannon entropy or Gini-Turing-Simpson-Hirschman-Herfindahl-Blau-whoever quadratic entropy, repeat rate or match probability.
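
    For concreteness, here is a sketch of two such measures computed from category shares. The toy shares and the variable names are invented for illustration, and I assume the common convention of reporting the Simpson measure as 1 minus the sum of squared shares; other conventions report the sum itself (the repeat rate).

    Code:
    clear
    input zone share
    1 .5
    1 .3
    1 .2
    2 .25
    2 .25
    2 .25
    2 .25
    end
    * repeat rate: sum of squared shares within each zone
    egen repeat = total(share^2), by(zone)
    * Simpson diversity as the complement of the repeat rate
    gen simpson = 1 - repeat
    * Shannon entropy: minus the sum of share times log share
    egen shannon = total(-share * ln(share)), by(zone)
    tabstat repeat simpson shannon, by(zone) statistics(mean)

    With four equal shares the repeat rate is 0.25, so the Simpson measure is 0.75 and the Shannon entropy is ln 4.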

    To get exactly that or close to that plot is certainly possible in Stata but I don't think there is one command that does it all. So you would need to write your own program.

    Code:
    search violin
    in Stata points to vioplot on SSC by Nick Winter and Austin Nichols and an older command by Thomas Steichen.
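
    A minimal sketch of vioplot in use, assuming it installs from SSC as usual; the auto dataset and its variables are just a convenient stand-in.

    Code:
    ssc install vioplot
    sysuse auto, clear
    * one violin for each category of foreign
    vioplot mpg, over(foreign)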

    In official Stata there is naturally dotplot which lets you show medians and quartiles as well as the dot plot (pointillist histogram) that gives the command its name.
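
    A minimal sketch, again with the auto data as a stand-in: median adds a marker line at each median, and bar, used together with median, adds lines at the quartiles.

    Code:
    sysuse auto, clear
    dotplot mpg, over(foreign) median bar center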

    Although not originally written to do that, stripplot on SSC has become pretty much a superset of dotplot able to mimic most of what it does and do rather more.
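
    A minimal sketch of a dotplot-like display with stripplot, assuming it installs from SSC as usual. The particular values for height(), barw() and boffset() are guesses that would need tuning to taste.

    Code:
    ssc install stripplot
    sysuse auto, clear
    * stacked points for each group, with a narrow box offset to the left
    stripplot mpg, over(foreign) stack height(0.4) vertical box(barw(0.1)) boffset(-0.15)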

    There are many commands, both official and community-contributed, that will show kernel density estimates.
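
    One official route is to overlay twoway kdensity calls, one for each group. A minimal sketch with the auto data as a stand-in; kernel and bandwidth are left at their defaults here, which is itself a choice.

    Code:
    sysuse auto, clear
    twoway (kdensity mpg if foreign == 0) (kdensity mpg if foreign == 1), legend(order(1 "Domestic" 2 "Foreign")) ytitle("Density") xtitle("Mileage (mpg)")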

    The spirit of your example I take to be trying to be respectful of the detail in each distribution while also representing its broad features by helpful summaries. For that there is no universal design that works well independent of your data (including working well for a range of sample sizes) and your purpose and your readers. Increasingly my own starting point is some kind of quantile-box hybrid. A quantile plot shows all the detail that could be of interest without requiring arbitrary decisions about bin origin or bin width or kernel type or kernel bandwidth. An arbitrary decision could well be just to accept a program's defaults, even if they arise from some rule of thumb that is data-dependent.
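
    A quantile plot needs nothing beyond official commands. A minimal sketch, with the auto data as a stand-in and (i - 0.5)/n as one common choice of plotting position; the variable name pp is mine.

    Code:
    sysuse auto, clear
    * plotting position within each group, after sorting on the variable
    bysort foreign (mpg): gen pp = (_n - 0.5) / _N
    twoway (connected mpg pp if foreign == 0, sort) (connected mpg pp if foreign == 1, sort), legend(order(1 "Domestic" 2 "Foreign")) xtitle("Fraction of the data") ytitle("Mileage (mpg)")

    Every data point appears, and no bin or bandwidth choices arise.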


    If you show the data in detail a box plot need not be too complicated. The most common default -- beyond showing a box with median and quartiles -- is to extend whiskers to the furthest points within 1.5 IQR of the nearer quartile. Although that rule can be encapsulated in a sentence -- I just did that -- it is rarely explained to readers in papers, often mangled in books even by statisticians, and most importantly has lost its original rationale (from some 50 years ago) of being a rule of thumb for deciding which data points should be plotted by hand when using pen and paper [NB]. An older idea than Tukey's whisker rule, which was just one of several he played with, is just to show selected paired quantiles (percentiles, if you like). Back in 1933 Crowe showed octiles.
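
    To make the whisker rule concrete, here is a sketch that computes the whisker ends by hand and then lists octiles as the older alternative. The auto data are a stand-in and the variable names are mine.

    Code:
    sysuse auto, clear
    egen q1 = pctile(mpg), p(25) by(foreign)
    egen q3 = pctile(mpg), p(75) by(foreign)
    gen iqr = q3 - q1
    * whisker ends: furthest data points within 1.5 IQR of the nearer quartile
    egen upper = max(cond(mpg <= q3 + 1.5 * iqr, mpg, .)), by(foreign)
    egen lower = min(cond(mpg >= q1 - 1.5 * iqr, mpg, .)), by(foreign)
    tabstat lower q1 q3 upper, by(foreign) statistics(mean)
    * selected paired quantiles instead: octiles
    centile mpg, centile(12.5(12.5)87.5)

    Any point beyond lower or upper is what a conventional box plot would show individually.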

    Here's one of many examples of what you can do. I show box plots side by side with quantile plots. The box plots are rendered Tufte-style, as explained in another thread (https://www.statalist.org/forums/for...-without-boxes). Since writing that, I have (re-)discovered that the term midgap plot was used by my late colleague at Durham, Allan Seheult.

    To show geometric means using this syntax you need to install egenmore from SSC (or write your own egen function gmean() instead).

    Again, I use geometric means here as appropriate for these data. Other datasets could easily lead to other choices as better.


    Code:
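    * stripplot and egenmore (for gmean()) are both on SSC
    webuse nlsw88, clear

    set scheme s1color

    * median wage for each group, to be marked beside each quantile plot
    egen median = median(wage), by(race)

    * horizontal position of the median marker, matching boffset(-0.12) below
    gen where = race - 0.12

    * quantile plots with box outline suppressed; reference lines at geometric means
    stripplot wage, over(race) cumul cumprob box(blcolor(none)) boffset(-0.12) pctile(1) vertical addplot(scatter median where, ms(Dh)) refline reflevel(gmean) xla(, noticks) yla(, ang(h))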


    [Image: qbox.png]
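
    If installing egenmore is not convenient, a geometric mean is just the exponential of the mean of the logarithms, which official egen can compute. A minimal sketch; gm_wage and am_wage are my names, and this yields a variable for inspection rather than something reflevel() can call.

    Code:
    webuse nlsw88, clear
    * mean of log wage within each group, then back-transformed
    egen gm_wage = mean(ln(wage)), by(race)
    replace gm_wage = exp(gm_wage)
    * arithmetic mean for comparison
    egen am_wage = mean(wage), by(race)
    tabstat gm_wage am_wage, by(race) statistics(mean)

    The geometric mean cannot exceed the arithmetic mean, which gives a quick check on the calculation.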



    Naturally what works best for your own data can only be established by experiment but posting a sample dataset would allow some playing around.




    • #3
      Thank you very much for your reply, and apologies for not providing the reference for the paper. You are right: the diversity index is the Simpson diversity index. It is continuous, ranging from 0 to 1. I should have mentioned that.


      "The spirit of your example I take to be trying to be respectful of the detail in each distribution while also representing its broad features by helpful summaries."
      Yes, you are absolutely right. That is the broad idea.

      "Increasingly my own starting point is some kind of quantile-box hybrid. A quantile plot shows all the detail that could be of interest without requiring arbitrary decisions about bin origin or bin width or kernel type or kernel bandwidth."
      Thank you for this suggestion, Nick. I will start exploring, starting with the plot you generated.
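
      A first pass at that might look like the sketch below, which reuses the stripplot call from #2. The values are simulated: zone, year and simpson are hypothetical names, four zones is an arbitrary choice, and the beta draws only mimic an index bounded by 0 and 1. The reference lines are left at the stripplot default (means), so egenmore is not needed here.

      Code:
      clear
      set seed 2803
      set obs 144
      * hypothetical layout: 4 zones x 36 years (1980 to 2015)
      gen zone = ceil(_n / 36)
      gen year = 1979 + _n - 36 * (zone - 1)
      * simulated index values in (0, 1), not real data
      gen simpson = rbeta(4 + zone, 3)
      egen median = median(simpson), by(zone)
      gen where = zone - 0.12
      stripplot simpson, over(zone) cumul cumprob box(blcolor(none)) boffset(-0.12) pctile(1) vertical addplot(scatter median where, ms(Dh)) refline xla(, noticks) yla(, ang(h))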














