  • Manipulating box plots

    Good morning,
    Is it possible to redefine box plots, for example to have something more meaningful than mysterious "adjacent values" ("the most extreme values within 1.5 iqr of the nearer quartile" - ???) as whiskers, for example 10 - 90 percentiles?
    Thank you in advance for answering.
    Piotr Lewczuk

  • #2
    The criterion you mention is meaningful and is the most common still in use and explained in many places.

    But you can get something different by just calculating what you want to show and then plotting it. egen is a convenient handle.
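
    A minimal sketch of that egen route, assuming the auto data used elsewhere in this thread. The variable names (median, loq, upq, p10, p90) and the styling options are illustrative choices, not anything prescribed in this post; the /// line continuations assume the code is run from a do-file.

    ```stata
    sysuse auto, clear

    * Summary statistics by group, stored as new variables
    egen median = median(mpg), by(foreign)
    egen loq    = pctile(mpg), p(25) by(foreign)
    egen upq    = pctile(mpg), p(75) by(foreign)
    egen p10    = pctile(mpg), p(10) by(foreign)
    egen p90    = pctile(mpg), p(90) by(foreign)

    * Boxes from the quartiles (split at the median),
    * whiskers to the 10th and 90th percentiles
    twoway (rbar loq median foreign, barwidth(0.35) fcolor(none) lcolor(black)) ///
           (rbar median upq foreign, barwidth(0.35) fcolor(none) lcolor(black)) ///
           (rspike loq p10 foreign, lcolor(black))                             ///
           (rspike upq p90 foreign, lcolor(black)),                            ///
           legend(off) xlabel(0 1, valuelabel) xscale(range(-0.5 1.5))         ///
           ytitle("Mileage (mpg)") xtitle("")
    ```

    Changing the two p() values is all it takes to move the whiskers to other percentiles.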

    Alternatively, stripplot (SSC) allows many variants on this idea, say

    Code:
    sysuse auto, clear
    stripplot mpg, over(foreign) ms(Sh) vertical stack yla(, ang(h)) height(0.3) box(barw(0.08)) pctile(10) boffset(-0.16)
    It's a weakness of stripplot that it can't read your mind on precisely what you want, so you may have to fiddle with your options to get choices that match the data, what you want to show and how you want to show it.
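
    As a small illustration of that fiddling, the sketch below reuses the command above and changes only the percentile, which should move the whiskers to the 5th and 95th percentiles. It assumes stripplot has not been installed yet, hence the first line; every other option is copied unchanged from the example.

    ```stata
    * One-time installation of the community-contributed command from SSC
    ssc install stripplot

    sysuse auto, clear
    * Same call as above, with pctile(5) instead of pctile(10)
    stripplot mpg, over(foreign) ms(Sh) vertical stack yla(, ang(h)) height(0.3) box(barw(0.08)) pctile(5) boffset(-0.16)
    ```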

    [Figure: boxpctile.png]


    • #3
      Nick,
      Thank you very much, but kill me if I have ever seen any single research paper, at least in biomedical sciences, presenting a figure with "adjacent values" for 25+ years in this business. Not to say it is "the most common"...
      Or is it perhaps known under any other names?
      Regards,
      Piotr Lewczuk


      • #4
        Stata uses the original Tukey convention that points are plotted individually if and only if they fall outside [lower quartile - 1.5 IQR, upper quartile + 1.5 IQR]. This is explained in many textbooks across many sciences. I picked two medical statistics texts off my shelves, Martin Bland and van Belle/Fisher/Heagerty/Lumley, and found that they both explain this convention, so I stopped there. It's not universal, as documented in the help for stripplot, but it's very, very common.
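
        To make the rule concrete, here is a minimal sketch that computes the fences and the adjacent values for mpg in the auto data. The scalar names (iqr, lofence, upfence) are hypothetical, and the quartiles are those reported by summarize, detail, which is assumed here to agree with what graph box uses.

        ```stata
        sysuse auto, clear

        * Quartiles and the 1.5 IQR fences
        quietly summarize mpg, detail
        scalar iqr     = r(p75) - r(p25)
        scalar lofence = r(p25) - 1.5 * iqr
        scalar upfence = r(p75) + 1.5 * iqr

        * Adjacent values: the most extreme data points still inside the fences
        quietly summarize mpg if inrange(mpg, scalar(lofence), scalar(upfence))
        display "lower adjacent value = " r(min) "   upper adjacent value = " r(max)

        * Observations that the Tukey convention would plot individually
        list make mpg if !inrange(mpg, scalar(lofence), scalar(upfence))
        ```

        The whiskers end at the adjacent values, that is, at actual data points, and not at the fences themselves.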

        I know the literature on this rather well!


        • #5
          No doubt you know the literature; trust me, I know it too. I also don't dispute that some older textbooks explain it; nevertheless, in 25 years of my academic career I have never seen a paper using it. As a permanent reviewer and associate editor for biomedical journals, my first reaction, if I saw it, would be: redo it as something simpler. Again, I don't dispute that it might be different in other fields, but I think it would make sense to give users the opportunity to redefine what box and whiskers (and many other graphics in Stata) look like. Just in case they send a paper to a stubborn editor/reviewer like me.

          By the way, I came across YOUR excellent paper explaining how to manipulate graphs - that's exactly the starting point for what I need. Thank you!

          https://journals.sagepub.com/doi/pdf...867X0900900309


          • #6
            Look out for the correction note in 2013 or so. In fact there is a further small error in the 2009 paper that I need to correct, but it won't bite you if you are using a different criterion. My own view is that whiskers to stated percentiles is a better convention than Tukey's, but then again I typically show all the data somehow, as in #2.


            • #7
              Thank you, Nick!
              You say: "My own view is that whiskers to stated percentiles is a better convention than Tukey's...", and that's exactly what I was trying to point at! I'm glad we seem to be of one mind. It would be nice to be able to do it in an easier way than by calculating everything with egen the hard way and then writing ten lines of code. Please at least consider this in development.
              Regards,
              P. Lewczuk


              • #8
                I don't understand what you are asking for as stripplot exists to do this and much else, which was the main point of #2 -- unless you are saying that you want the official commands to be more flexible, which is not my decision to make.


                • #9
                  In other packages (Statistica, Graphpad, MedCalc) you double-click on the whiskers (to stick to this example), a window pops up, and you choose what they should represent. The same functionality would be - IMHO - nice to have in Stata. Of course you could leave Tukey's rule as the default, if you really want to, but other options should be available in an easier way.
                  I understand that it is not your decision, but I do hope this forum is also read by those who make development decisions.
                  Consider it a suggestion for Stata optimization.


                  • #10
                    Thanks for the clarification.

                    If StataCorp change anything it's likely to be by adding options which would then appear also within the dialog box.

                    FWIW, I have implemented percentile-based whiskers because I wanted them myself -- and I have often recommended them here and on Cross Validated -- but I don't detect much spontaneous dissatisfaction with the Tukey rule among users.


                    • #11
                      Thanks from my side, too.
                      Should you come across any paper in the biomedical literature using the Tukey rule for data presentation, please let me know - I would really like to see it used, not only explained in textbooks.


                      • #12
                        I am not a biostatistician, epidemiologist, medic or biological scientist, so I don't read biomedical papers directly: sometimes I come across such references for other reasons.

                        I have to see this the other way round: if it's in textbooks that are widely used, then that is enough to refute the claim, which you do appear to make, that the rule is not used in your field(s). Here are two more textbook references that explain the Tukey rule: Woodward, Epidemiology; Armitage, Berry and Matthews, Statistical Methods in Medical Research.


                        • #13
                          A very interesting point of view. But try to see it from the other side of the barricade: if a scientist, a reviewer, or an editor wants to have a different way of data presentation, they should at least have this option. With years (if not decades) of experience in all three roles, I dare objectively say I know how this works: after hearing your argumentation, some users will simply start considering migration to other statistical packages, where they are offered more flexibility. No good for StataCorp., right?

                          Of course you can try to convince the whole field that your approach is the only possible and correct one because it is in textbooks. Of course you can try to convince everybody to always ask a professional biostatistician for advice. Of course you can argue that writing ten lines of code to make a simple figure is not that laborious after all, but... well... good luck...

                          My first reaction was to copy and paste my data into another package, which I fortunately still keep. That is the silliest thing one can imagine (data analysis in one package and making simple figures in another), but then I started searching and - interestingly enough - among other things I came across 15-year-old posts from people obviously wishing for the same (https://www.stata.com/statalist/arch.../msg00205.html).


                          • #14
                            My research experience started in 1973, so waving lengthy experience at me isn't going to work!

                            More importantly, and more seriously, you're misreading me.

                            To repeat, I personally have come to avoid the Tukey rule and have implemented alternatives. You missed the irony in #2: I was pointing out that you could do this with egen, but I wasn't advising that people do that. I was hinting and showing that stripplot lets you do this directly. I have encountered people who will not use community-contributed commands, period: their line is that they don't know how good a community-contributed command is and they do know that the company don't and won't support it.

                            Also, if you study the help for stripplot you will find enormous, not to say obsessive, documentation of all sorts of variants in the literature. If you can find me saying anywhere ever that the Tukey rule is the only possible and correct rule, then I would be amazed, but I can agree that if I did say that it was a silly thing to say. Nor naturally is it "my approach": it's Tukey's original approach.

                            But as far as users are concerned, this alternative in stripplot is available, unless exceptionally users are behind a firewall so strong that community-contributed commands are inaccessible.
                            People can make a choice.

                            Some of my commands have been folded into official Stata, and many have not been, so I have some experience of what gets adopted and what doesn't.

                            There are essentially two main sides to StataCorp other than the sides that are needed for any company its size: development and marketing, those who focus on developing Stata and those selling it, with technical services betwixt and between.

                            In choosing what to implement, the company works on various criteria, which include (1) what developers are minded to implement themselves; (2) what marketing are telling development repeatedly that users are asking for; (3) whether community-contributed commands are available.

                            On what users are asking for, I favour my guess rather than yours, unsurprisingly. I've been attending users' meetings and following Statalist for just about 25 years and I don't think this request comes up often. Indeed, I have to lobby people to try something different from plain box plots in many cases. You cite a post from 2004 asking for it. You don't cite my post in that thread https://www.stata.com/statalist/arch.../msg00205.html but you are aware of my 2009 paper, which was a delayed response to that thread (and many other stimuli). I don't recall when box plot options were added to stripplot, but it was several years ago.

                            The scenario of people migrating to other software because they can't get this variant of box plot in Stata seems far-fetched to me, but if you know real examples, then that's the kind of story StataCorp wants to know about.


                            • #15
                              Nowhere have I attempted to wave my length of experience in science *against* yours! That was by no means my intention. I was only trying to say that my experience in *biomedicine and neuroscience* (but NOT in statistics) justifies my opinion on what researchers in these fields use (or don't use) or expect. I repeat: saying that the criterion we are discussing "is the most common still in use" (again - in biomedical research!) is a huge exaggeration. Or was it meant ironically, too?

                              I read the help of -stripplot- and your paper - great and very helpful things, let me repeat it! Let me also repeat: my point at this stage of our small conversation is only that it would be good to have it implemented in a user-friendly way in the program's interface.

                              On a different note: it would be interesting to see which statistical packages are the most popular in each research field. I guess Stata is not that common in biomedicine. My spontaneous guess would be SPSS and R - do you happen to know? Could that explain why you don't hear so many requests from researchers in these fields?
