  • Multiple variables grouped by one variable, on one bar graph showing median and IQR

    Hi,

    I am trying to create a graph similar to the one below in STATA, displaying different independent variables by two groups, with data expressed as median and IQR. Can you help me with the commands, please?

  • #2
    This is going to be one of those annoying answers, that answers the question (I think) you should have asked rather than the question you did ask. All I can say in my defense is that I mean well...

    A bar chart can have value because it forces you to include 0 on the y-axis, and thus prevents you from suggesting a pattern by creatively cropping the y-axis. However, in the graph you show, the y-axis is on a log scale, so it cannot include the value 0. So in this case bars aren't that useful. Instead it would make much more sense to use a box plot (help graph box).
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Bars on log scale that start arbitrarily at 0.1 are in my view obnoxious statistically and graphically.

      More crucially, there is widespread objection to what are variously called detonator, dynamite or plunger plots. http://biostat.mc.vanderbilt.edu/wik...de/Poster3.pdf covers the main points in one page.

      All that aside, plotting median and iqr on logarithmic scale is reminiscent of box plots on logarithmic scale, as usually to a good approximation log of median = median of log and similarly for quartiles. But if tempted to use graph box or graph hbox for this purpose, check out https://www.stata.com/support/faqs/g...ithmic-scales/ for a crucial warning.

      You don't give any data example. Please note https://www.statalist.org/forums/help#stata and in passing https://www.statalist.org/forums/help#spelling

      Presumably the * is some significance test flag, which I won't address.

      Here is some technique with data all Stata users can access. I use stripplot (SSC) and show the data too, often regarded as a good idea in statistics.

      Code:
      * stripplot is user-written: install once with "ssc install stripplot"
      sysuse auto, clear
      set scheme s1color
      stripplot price, box vertical over(foreign) ///
          by(rep78, note("Repair record 1978", size(medsmall) pos(12)) row(1)) ///
          ysc(log) ytitle(Price (USD)) subtitle(, fcolor(eltgreen*0.2))
      [Figure: median_iqr.png]


      Clearly some of the cosmetic choices there are no more than personal caprice.

      Otherwise, if you're determined to mimic the design you show, then you need twoway bar and twoway rcap.
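
      As a rough sketch of that twoway bar plus twoway rcap route, and not a recommendation of the design: the code below again uses the auto data, with price and foreign standing in for one outcome and the two groups. The new variable names (med, q1, q3), the bar width and the bar base of 1000 are all assumptions for illustration. On a log scale the base has to be some positive number, which is exactly the arbitrariness criticised above.

      ```stata
      * Sketch only: median and IQR of price by origin as bars with capped spikes
      sysuse auto, clear

      * reduce to one observation per group holding the median and quartiles
      collapse (median) med=price (p25) q1=price (p75) q3=price, by(foreign)

      * bars show medians; capped spikes run from lower to upper quartile
      * base(1000) is an arbitrary positive starting point, as log scale excludes 0
      twoway bar med foreign, barwidth(0.6) base(1000) ///
          || rcap q1 q3 foreign, ///
          yscale(log) ylabel(1000 2000 5000 10000) ///
          xlabel(0 1, valuelabel) xtitle("") ///
          ytitle("Price (USD): median and IQR") legend(off)
      ```

      With several outcome variables, one possibility would be to repeat the collapse for each variable and offset the x positions by hand, but that detail is not worked out here.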



      • #4
        Thank you for that. I tried using box plots instead, but the outliers are shown on the graph, which is why I resorted to a bar graph.



        • #5
          As said, box plots are consistent with logarithmic scales, just not the way that graph box draws them by default. If you take logarithms first, the problem disappears, apart from fixing the axis labels.
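
          A minimal sketch of taking logarithms first, assuming the auto data and base-10 logs; the variable name log10price and the particular label values are illustrative choices only. The axis labels are placed at the logged positions but show the original units.

          ```stata
          sysuse auto, clear

          * quartiles and whiskers are then calculated from the logged values
          gen log10price = log10(price)

          * `=exp' evaluates each label position on the fly; the text is the raw value
          graph box log10price, over(foreign) ///
              ylabel(`=log10(3000)' "3000" `=log10(5000)' "5000" ///
                     `=log10(10000)' "10000" `=log10(15000)' "15000") ///
              ytitle("Price (USD, log scale)")
          ```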

          And as a writer and as a reader I always want outliers to be visible, never suppressed.
          Last edited by Nick Cox; 04 Apr 2019, 07:45.



          • #6
            Thank you, Nick, for your valuable feedback. Apologies, as it's my first time posting on the forum. I haven't logged my data, although they are not normally distributed. I am trying to look at the differences in gene expression levels between preterm and term birth groups. Given that the data aren't normally distributed, I have to do a Mann-Whitney U test, and I was trying to find the best graphical representation for that. I will take your expert advice on doing box plots, with visible outliers. I have so many different genes, and can't figure out how to put all the box plots on one graph.



            • #7
              Not being normally distributed really doesn't mean that Mann-Whitney tests are your only alternative. Who is telling you this? It's just not true, or minimally vastly over-simplified. Further, non-normality is always a matter of degree.



              • #8
                The statistics course I attended at my university stated that non-parametric tests should be used for non-normally distributed data. I had to repeat all my results, as I had previously run independent t-tests and ANOVA for most analyses. I would appreciate any help on this.



                • #9
                  It's hard to summarize a large chunk of statistics, but -- without trying to be comprehensive --

                  1. The t test often works well if assumptions (often better thought of as "ideal conditions") are only roughly satisfied. Rupert G. Miller's 1986 book Beyond ANOVA (London: Chapman and Hall) has not, as far as I know, been bettered in carefully discussing which assumptions are important and which less so.

                  2. Transformations often help mightily. Although not, it seems, showing your data, the graph in #1 is a typical example of skewed data that are approximately symmetric on log scale. (Yet another reason for considering the design appalling is that the bars might appear to suggest a tail down to low values while there is no information on the other tail. Yet if you look at median and quartiles, the only real information in the graph, they indicate approximate symmetry.)

                  3. There is plenty of machinery for non-normal distributions, e.g. gamma and Poisson, much of it under headings like generalised linear models. This point overlaps with #2, as using an appropriate link function for the outcome can remove the need to transform it.

                  4. Confidence intervals for differences are often more interesting and useful than significance tests for whether there "really" is a difference, and bootstrapping often allows such intervals to be data-driven and not resting shakily on unrealistic premises.
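
                  Points 2 to 4 could be tried out along these lines. This is only a sketch with the auto data, where price and foreign stand in for an expression level and the group indicator. The gamma family, the program name meddiff, the seed and the number of replications are assumptions for illustration, not advice for any particular data set.

                  ```stata
                  sysuse auto, clear

                  * Point 2: t test after log transformation
                  gen lnprice = ln(price)
                  ttest lnprice, by(foreign)

                  * Point 3: generalised linear model with log link, outcome left as is
                  glm price i.foreign, family(gamma) link(log)

                  * Point 4: bootstrap interval for the difference between group medians
                  capture program drop meddiff
                  program define meddiff, rclass
                      * median of group 1 minus median of group 0
                      quietly summarize price if foreign == 1, detail
                      local m1 = r(p50)
                      quietly summarize price if foreign == 0, detail
                      return scalar diff = `m1' - r(p50)
                  end

                  * resample within each group so that group sizes stay fixed
                  bootstrap diff = r(diff), reps(1000) seed(2019) strata(foreign): meddiff
                  estat bootstrap, percentile
                  ```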

                  First courses in statistics can't be second courses in statistics but they often convey myths and misunderstandings that have to be unlearned by researchers.
