  • Remove whiskers from box plot

    Hello,

    Apologies if this has already been resolved in a previous post, but I could not find one. Is it possible to remove the whiskers from graph box? I just want to show the median and IQR. There is an additional category (hospitals) with much longer whiskers that I want to add, which lengthens the axis and makes the other groups (health centre etc.) much harder to read.

    stripplot does not seem to be an option because it does not accept weights.

    Second, the axis(range...) option does not seem to have any effect on the axis of the graph. I would appreciate any tips for getting the code to work.

    Many thanks for your help,
    Francesca

    Code for graph:
    #delimit ;
    graph hbox n_anydr n_midw n_nurse n_sba if sample==1 & factype>1 [pweight=sample_wgt],
    by(factype, cols(1) note("")) over(ftype, axis(range(0(2)40)))
    noout note("")
    box(1, color(navy) )
    box(2, color(ebblue))
    box(3, color(ebblue*0.4))
    box(4, color(dkgreen))
    plotregion(color(white) ifcolor(white))
    legend(order( 1 "Doctors" 2 "Midwives" 3 "Nurses" 4 "Total SBA") rows(1) size(small))
    medtype(cline)
    xsize(1) ysize(1.2)
    ;
    #delimit cr


    Last edited by Francesca Cavallaro; 04 Apr 2019, 09:56.

  • #2
    A broadly similar question was asked just about 3 hours earlier than yours.

    https://www.statalist.org/forums/for...median-and-iqr

    Please note (FAQ Advice #12).

    1. We ask that you use .png to show graphs. Many readers will be able to view .tif, but a separate window will open when they do and that makes cross-reference to your question more difficult.

    2. We ask that you explain community-contributed commands you refer to. In this case stripplot is from SSC. You are correct that it does not support weights. That does not make it completely irrelevant.

    As in the linked thread, wanting to show only median and IQR is a little puzzling as the rest of the information is usually interesting and helpful to readers. In your case, as in the cited thread, logarithmic scale would seem a natural way to tame the range. I recommend that approach. Just possibly you have some zeros: even if so there are work-arounds.
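
    A minimal sketch of the logarithmic scale idea, using the auto data as a stand-in because the original data are not available. The variables and tick values here are illustrative assumptions only, and any zeros would need one of the work-arounds first.

    Code:
    sysuse auto, clear
    * yscale(log) applies to the numerical axis, which graph hbox draws horizontally
    graph hbox price, over(foreign) yscale(log) ylabel(3000 5000 10000 15000)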

    The approach I would suggest -- given what you want to do -- is to reduce your data to median and quartiles for each group of interest. You can use collapse with pweights to get median, p25 and p75.

    Given a sample of 3 values, graph box, graph hbox or stripplot with its box option will show those three values as a box, and only a box. Clearly the middle of 3 values is the median.

    Stata's rules further imply that the lowest of 3 values is echoed as the lower quartile and the highest as the upper quartile. No whiskers are possible because no values can lie beyond the plotted points.

    If any two or three of median and quartiles coincide then you will get a truncated box or even a single line, but that is always true.
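
    The reasoning about three values can be checked directly with a toy dataset. This is a sketch with made-up numbers, and it assumes that graph box follows the same percentile rule as summarize, which is what the argument above implies.

    Code:
    clear
    input y
    1
    5
    20
    end
    summarize y, detail
    * with 3 values the summarize rule returns lowest, middle, highest: 1 5 20
    display r(p25) " " r(p50) " " r(p75)
    * the box should run from 1 to 20 with the median line at 5 and no whiskers
    graph box y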

    Here is a silly example to make the point. Clearly I can't use your data, but you can run this yourself to see that it works.

    Code:
    sysuse auto, clear
    local what : var label mpg
    collapse (p25) mpg25 = mpg (p50) mpg50 = mpg (p75) mpg75 = mpg, by(foreign)
    list
    reshape long mpg, i(foreign) j(percent)
    list
    graph box mpg, over(foreign) ytitle("`what'")
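
    The same steps might be adapted to the data in #1 roughly as follows. This is an untested sketch: the variable names, the if condition and the weight are copied from the code in #1, the new names with suffixes 25, 50 and 75 are my own invention, and the /// continuation lines assume the code is run from a do-file.

    Code:
    * sketch only: collapse replaces the data in memory, so preserve first
    preserve
    collapse (p25) n_anydr25 = n_anydr n_midw25 = n_midw n_nurse25 = n_nurse n_sba25 = n_sba ///
        (p50) n_anydr50 = n_anydr n_midw50 = n_midw n_nurse50 = n_nurse n_sba50 = n_sba ///
        (p75) n_anydr75 = n_anydr n_midw75 = n_midw n_nurse75 = n_nurse n_sba75 = n_sba ///
        if sample == 1 & factype > 1 [pweight = sample_wgt], by(factype ftype)
    * three observations (25, 50, 75) per group and variable
    reshape long n_anydr n_midw n_nurse n_sba, i(factype ftype) j(percent)
    graph hbox n_anydr n_midw n_nurse n_sba, over(ftype) by(factype, cols(1) note(""))
    restore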



    • #3
      Just to show that stripplot remains usable once the summary statistics have been calculated:

      Code:
      sysuse auto, clear
      local what : var label mpg
      collapse (p25) mpg25 = mpg (p50) mpg50 = mpg (p75) mpg75 = mpg, by(foreign)
      list
      reshape long mpg, i(foreign) j(percent)
      list
      graph box mpg, over(foreign) ytitle("`what'")
      
      stripplot mpg, over(foreign) ytitle("`what'") box ms(none) vertical
