
  • #16
    My 2 cents. I've seen box plots in the biomedical literature that seem to use Tukey's convention for whiskers. One example from the Lancet:

    https://www.thelancet.com/journals/l...478-9/fulltext

    My (much shorter) experience says that, most of the time, if the authors used Stata for their statistical analysis and created box plots, odds are they used Tukey's convention for whiskers. From discussions I have had with peers on this particular topic, not everyone realizes that there are different ways to define the whiskers. In my opinion, there are two issues here.

    The first concerns Stata's whisker options: I do feel Stata would benefit from allowing different definitions of whiskers. Although not all researchers pay attention to this, some do, and I've had 2 or 3 situations in which the definition of whiskers was discussed for some hours while a paper was being written. Being able to choose between options would make these discussions easier.

    The second point is not related to Stata. Even big journals (such as the Lancet) simply don't require you to describe exactly what you are plotting in a box plot, not even in the figure legend, and this brings some confusion to the table. See this other paper from the Lancet:

    https://www.thelancet.com/journals/l...549-1/fulltext

    The box plot used here does not follow Tukey's convention, yet nothing is said in the paper about this. Within the Lancet (which is perhaps one of the most influential journals in the biomedical literature; that is not an argument in itself, but it does demonstrate that this happens in non-obscure journals), there is not a lot of consistency in how the whiskers of a box plot are drawn. My opinion is that journals should accommodate different definitions of whiskers as long as they are clearly stated, which is not what I encounter in the papers I read.



    • #17
      Without wanting to prolong this too much by repeating points already made:

      Igor #16 I too have often encountered box plots presented without explanation of exactly what conventions were used. I agree that this is very poor practice. It may well arise out of ignorance rather than carelessness. Let's just say that sometimes collaborative work is based on awkward compromises, to say nothing of petty power plays. In my own work I have been overruled by the other authors even when they recognised that I was the expert on the point in question! (That case I will leave vague, but relations remain amicable.)

      Piotr #15 I stand by my impression that the Tukey rule is the most commonly used. That's an impression across several literatures, including mainstream statistics. I can't see it as an exaggeration to report the mode, which is all that "most common" means. I doubt that any other convention comes close, and there are many of them. I've seen cases where the whiskers extend to the extremes and many variants in which particular percentiles are used. Further, many people omit whiskers and show data points instead. In very early work (around 1970) Tukey experimented with 1 and 2 as multipliers before settling on 1.5.
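
      To make the competing conventions concrete, here is a minimal Python sketch of three whisker rules mentioned above: the Tukey rule with multiplier 1.5, whiskers running to the extremes, and whiskers at chosen percentiles. The function names, the sample data, and the 5th/95th percentile choice are illustrative assumptions, not anything taken from Stata or from the papers discussed. Quartile definitions also differ between packages, so the exact endpoints may not match what a given program draws.

```python
from statistics import quantiles


def tukey_whiskers(data, k=1.5):
    """Most extreme data points within k * IQR of the quartiles."""
    # Assumption: "inclusive" interpolated quartiles; other software may
    # use a different quartile definition and give slightly different fences.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return min(inside), max(inside)


def range_whiskers(data):
    """Whiskers extend to the minimum and maximum of the data."""
    return min(data), max(data)


def percentile_whiskers(data, lower=5, upper=95):
    """Whiskers at chosen percentiles (5 and 95 are illustrative only)."""
    cuts = quantiles(data, n=100, method="inclusive")  # 99 cut points
    return cuts[lower - 1], cuts[upper - 1]


def points_beyond(data, whiskers):
    """Data points that would be plotted individually beyond the whiskers."""
    low, high = whiskers
    return [x for x in data if x < low or x > high]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(tukey_whiskers(sample))        # (1, 9)
print(range_whiskers(sample))        # (1, 30)
print(percentile_whiskers(sample))
print(points_beyond(sample, tukey_whiskers(sample)))  # [30]
```

      With these data the same box gets an upper whisker ending at 9 under the Tukey rule and at 30 under the extremes rule, which is why a legend that does not name the rule leaves the reader guessing.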

      Stata is really strong in medical statistics and epidemiology. For many years now the two main markets for Stata have been economics, political science and sociology on the one hand and medical statistics and epidemiology on the other. If we were to start talking about other medical sciences, perhaps less so.

      SPSS I don't know much about. I did use it a couple of times in the previous millennium. Although the original market was centred on sociology, I have an impression that it remains really popular in psychology and some branches of biology. A little tongue in cheek, I would say that people who think of statistics as statistical testing are more likely to use SPSS than those who think of it as statistical modelling.

      I'd be happy if StataCorp added more flexibility to graph box and graph hbox. My crystal ball is really cloudy right now, but for reasons given I doubt it's a high priority for them. That's not based on anything said directly by developers.
      Last edited by Nick Cox; 06 Jun 2019, 08:22.



      • #18
        Thank you, Igor. Excellent post! I fully agree with most of your comments.

        In the first Lancet paper you cite, the legend to fig. 2 says: "Box and whiskers plots indicate median with IQR (boxes) and range (whiskers)". I am not sure if the Tukey criterion is meant here. At least it is not explicitly mentioned, is it? And the legend is confusing inasmuch as "range", to my understanding, should also include outliers; however, they are shown separately, above the upper whisker (i.e. above the "range"). But I looked at the paper only briefly, so maybe I am wrong.

        Best regards,
        P. Lewczuk



        • #19
          Thank you, Nick!
          Indeed, let's stop at this stage. Have a good day.
