
  • Why line, bar, and pie plots dominate publications

    Dear Stata users,

    There are so many plot types in the statistical world: line, bar, pie, box, histogram, violin, radar, mosaic, and diagnostic plots. And more and more new types are being created: heat plots, joy plots, alluvial plots, upset plots, just to name a few. However, line, bar, and pie plots still dominate publications, whether academic papers or popular research reports, especially in social science. Why?

  • #2
    Uhhh, it really depends on the paper/field.... but to put it simply, operating from momentum. People get their degrees without needing to learn violins or others, so they typically just stick with pie, bar, and lines. Not that lines and bars are useless!!!! Very useful, they are indeed.

    But I would appreciate more violins, heats, and others, especially since they can provide lots of info compactly. Perhaps others, say Nick Cox or Clyde Schechter, can offer their views, but that's the (perhaps oversimplified) take I believe in.



    • #3
      Like Jared Greathouse I think a great deal depends on who you are following in what fields, including whether you include all kinds of statistical use (e.g. graphs for business or management) or are most interested in what is aimed at (good or excellent) research publications.

      I'd back up slightly and see most graphs as static and two-dimensional and based (inevitably) on some or all of point, line and area elements. Some people keep telling me that interactive and dynamic graphics are the next wave, or whatever, but to the extent that almost everyone -- at least in forums like Statalist -- is focused on graphics for reports, dissertations, journals, that still doesn't seem to be happening much in what I see.

      Chen Samulsion clearly wasn't trying to mention everything, but a striking gap in #1 is scatter plots -- in a wide sense including dot or strip plots, quantile plots, and so forth. More than 40 years ago John Chambers called scatter plots the workhorses of statistical graphics, and that is still true for many of us.

      I see just about the entire range of possible behaviour. Sometimes when a junior researcher wants to do something that strikes me as ill-advised, it turns out that a senior researcher off-stage is insisting or instructing on it being drawn. Sticking to what is tried and tested and expected in a particular field can make sense and lead to fewer puzzled or hostile reactions from reviewers, or it may just seem unthinking conservatism. Other way round, willingness to try out new ideas is good, but pursuit of novelty or unorthodoxy or sheer wackiness for its own sake is not so good, and sometimes even juvenile.

      People not following the history of ideas closely can get bizarre ideas about what is new. Smoother density curves from say kernel estimation were thought about long ago, and became easier to compute by (say) about 1980, so why any fuss? Default kernel density calculations can do a terrible job if a distribution is very skew and/or bounded at either or both extremes. Joyplots have an attractive name and appeal for partly dubious reasons -- with their evocation of topographic or even anatomical profiles -- but their characteristic overlap is usually a response to the need to compare several distributions rather than an intrinsic feature. Heat plots and mosaic plots have long histories too. An upsetplot is just yet another bar chart; the effort goes into calculating the frequencies or percents to be shown.
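The point above about kernel density estimates and bounded, skewed distributions can be checked with a small sketch. This is an illustration only, not code from the thread: it is written in Python rather than Stata, it assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth as a stand-in for a "default" calculation, and the function names are hypothetical. The data are deterministic quantiles of a unit exponential distribution, which is strongly skewed and bounded below at zero with a true density of 1 at zero.

```python
import math
import statistics


def exponential_quantile_sample(n):
    """Deterministic 'sample': the (i - 0.5)/n quantiles of a unit exponential."""
    return [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.349) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.349)
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(data, h, x):
    """Gaussian kernel density estimate at the point x with bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (len(data) * h * math.sqrt(2.0 * math.pi))


def kde_mass_below(data, h, bound):
    """Probability mass that the Gaussian KDE places below `bound`."""
    total = sum(
        0.5 * (1.0 + math.erf((bound - xi) / (h * math.sqrt(2.0))))
        for xi in data
    )
    return total / len(data)


data = exponential_quantile_sample(200)
h = silverman_bandwidth(data)
density_at_zero = gaussian_kde(data, h, 0.0)    # true density at 0 is 1
mass_below_zero = kde_mass_below(data, h, 0.0)  # true mass below 0 is 0

print(f"bandwidth: {h:.3f}")
print(f"estimated density at 0: {density_at_zero:.3f} (true value 1)")
print(f"estimated mass below 0: {mass_below_zero:.3f} (true value 0)")
```

Under these assumptions the estimate at zero comes out well below the true density of 1, and a noticeable share of the estimated mass falls below zero, where no data can occur. Transforming the data first (for example, estimating on a log scale) or using a boundary-aware estimator are the usual responses.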

      I am puzzled often at why people want to plot non-circular data in circular spaces. Even when there is some element of periodicity, choosing a circular or spiral representation does not usually make such data easier to think about. Strongly seasonal data are often best plotted against a conventional horizontal time axis, and that can work as well as or better than anything else, but having several years and/or several series is always a challenge.

      William S. Cleveland in his wonderful books on statistical graphics rightly emphasised that each graph is encoded data to be decoded by a reader or readers. (Encoding and decoding as terms seem to come from communications engineering and information theory, and before that from cryptography.) Researchers often struggle with the encoding. Even with some experience in Stata graphics, and when I know what I want, a graph can need prior calculations and numerous small iterations and variations before it is halfway presentable. But the job is not done until the decoding has been thought about. Quite a few of the more novel designs fail the encoding-decoding test, in that it is easy to understand in principle what is being plotted -- say, magnitude is represented by the thickness of this curved or wavy line -- but the effectiveness of the graph in conveying overall patterns or even specific detail is much more in doubt.

      P.S. I didn't discuss pie charts.



      • #4
        Jared Greathouse, Nick Cox, I'm sorry for the late reply, and thank you very much for your valuable answers. I just read a related article written by Hannah Fry:
        https://www.newyorker.com/magazine/2...life-and-death (for those who cannot access the New Yorker website, here is a substitute: https://disqusrefugees.squarespace.c...life-and-death). William Playfair introduced line, bar, and pie plots in 1786/1801, and since then their dominance in data visualization has gradually been established. And maybe my doubt about that dominance is just "aesthetic fatigue". In fact, line, bar, pie and scatter plots were revolutionary creations in the world of statistical graphics. Notwithstanding the great success of traditional statistical graphs, I think editors and readers ought to encourage wider use of varied and useful graphs in publications. As Michael Friendly and Howard Wainer said in A History of Data Visualization and Graphic Communication:
        ... visualization of data is an alchemist that can make good scientists great and transform great scientists into giants.
        And Hannah Fry's comments hit the nail on the head:
        Data visualization has progressed from a means of making things tractable and comprehensible on the page to an automated hunt for clusters and connections, with trained machines that do the searching. Patterns still emerge and drive our understanding of the world forward, even if they are no longer visible to the human eye. But these modern innovations exist only because of the original insight that it was possible to think of numbers visually. The invention of graphs and charts was a much quieter affair than that of the telescope, but these tools have done just as much to change how and what we see.



        • #5
          There were line and bar plots before Playfair. But these questions of priority are always vexed. Often most credit goes not to the person who was first, but to the person who makes enough fuss about how useful an idea is, whether by example or exhortation, to change collective minds. That is not so outrageous.

          But memes and myths abound. It's often said that Nightingale invented rose diagrams, but she herself acknowledged that the key idea came from William Farr, and others were there earlier. There are even more absurd statements about box plots before Tukey. Box plot variants under different names became a geographical standard in the 1930s, although only geographers seem to know that.



          • #6
            Dear Nick Cox, why don't you write a book on these issues? The history of statistical graphics, the consensus and controversies over their application, not to mention the aesthetic questions debated again and again in this forum, would be full of fun!



            • #7
              OP is raising a very interesting question, and there are books written on visualisation and graphics in Statistics/Econometrics.

              But I think OP is bundling up two very different questions, which have very different answers.

              1) Why are people fascinated by useless plots such as the pie chart and the bar plot? There are articles written on how misleading the pie chart is, and I personally cannot see any value in the bar chart over just reporting the numbers in a table or even just as a sequence on a row. (The bar chart is popular in the finance literature; I never saw the reason why.)
              I guess the answer to 1) is that there is an elementary thinker in each of us that is fascinated by shiny, bright and colourful objects, hence the popularity of useless plots.

              2) There are the useful and popular scatter plot, line plot in time series, and histogram/kernel density. Here the answer is that they are popular because they are useful.
              I know this because I use these plots for my own pleasure and understanding, rather than, say, to impress my readers and referees.



              • #8
                I find it easier to write papers than to write books, like most people. There are many such comments sprinkled around my Speaking Stata Graphics.

