  • Box plot when you've already got percentiles, not the raw data

    I'd like to make some box-and-whisker plots, but for disclosure/privacy reasons I only have the percentiles, not the underlying data. The percentiles are exactly the locations where the box and whisker lines would go, so it should be possible to draw the graphs from them.
    Can anyone think of a way to either (1) draw a box plot directly from the percentiles, or (2) more roundabout, generate a column of data with the distribution implied by the percentiles, so that I could then simply use the graph box command?

    The data are currently organized as below: each variable I'd like to plot is a row of a CSV, with the 5th, 25th, 50th, 75th, and 95th percentiles for control (suffix 0) and treatment (suffix 1). Ideally treatment and control would go side by side, as if I had used the 'over' option.
    name p050 p250 p500 p750 p950 p051 p251 p501 p751 p951
    zinc_avg 0.47 2.0 4.76 7.265 11.26 3.734 7.393 10.967 13.63 25.12

  • #2
    This code works with what you show and also if there are really multiple observations with the same flavour.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 name float p050 byte p250 float(p500 p750 p950 p051 p251 p501 p751 p951)
    "zinc_avg" .47 2 4.76 7.265 11.26 3.734 7.393 10.967 13.63 25.12
    end
    
    local N = _N
    
    gen x0 = 0 
    gen x1 = 1 
    
    local c0 = cond(c(version) >= 18, "stc1", "blue")
    local c1 = cond(c(version) >= 18, "stc2", "red")
    
    forval i = 1/`N' { 
        twoway rbar p250 p500 x0, barw(0.4) lcolor(`c0') fcolor(none) ///
        || rbar p500 p750 x0, barw(0.4) lcolor(`c0') fcolor(none) ///
        || rspike p750 p950 x0, lcolor(`c0') ///
        || rspike p250 p050 x0, lcolor(`c0') /// 
        || rbar p251 p501 x1, barw(0.4) lcolor(`c1') fcolor(none) ///
        || rbar p501 p751 x1, barw(0.4) lcolor(`c1') fcolor(none) ///
        || rspike p751 p951 x1, lcolor(`c1') ///
        || rspike p251 p051 x1, lcolor(`c1') /// 
        xla(0 "control" 1 "treatment", tlc(none)) xsc(r(-0.3 1.3)) /// 
        ytitle("`=subinstr(name[`i'], "_", " ", .)'") aspect(1) legend(off) ///
        note(whiskers to 5 and 95 percentiles) name(G`i', replace)
    }
    [Image: zinc.png]


    • #3
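      The code in #2 wouldn't work as intended with more than one observation in memory: each pass through the loop would plot every observation, not just the current one. Adding in `i' to each plot restricts each graph to a single observation.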

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str8 name float p050 byte p250 float(p500 p750 p950 p051 p251 p501 p751 p951)
      "zinc_avg" .47 2 4.76 7.265 11.26 3.734 7.393 10.967 13.63 25.12
      end
      
      local N = _N
      
      gen x0 = 0 
      gen x1 = 1 
      
      local c0 = cond(c(version) >= 18, "stc1", "blue")
      local c1 = cond(c(version) >= 18, "stc2", "red")
      
      forval i = 1/`N' {    
          
          twoway rbar p250 p500 x0 in `i', barw(0.4) lcolor(`c0') fcolor(none) ///
          || rbar p500 p750 x0 in `i', barw(0.4) lcolor(`c0') fcolor(none) ///
          || rspike p750 p950 x0 in `i', lcolor(`c0') ///
          || rspike p250 p050 x0 in `i', lcolor(`c0') /// 
          || rbar p251 p501 x1 in `i', barw(0.4) lcolor(`c1') fcolor(none) ///
          || rbar p501 p751 x1 in `i', barw(0.4) lcolor(`c1') fcolor(none) ///
          || rspike p751 p951 x1 in `i', lcolor(`c1') ///
          || rspike p251 p051 x1 in `i', lcolor(`c1') /// 
          xla(0 "control" 1 "treatment", tlc(none)) xsc(r(-0.3 1.3)) /// 
          ytitle("`=subinstr(name[`i'], "_", " ", .)'") aspect(1) legend(off) ///
          note(whiskers to 5 and 95 percentiles) name(G`i', replace)
          
      }
      The results in #1 suggest that a logarithmic scale might help.
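
      A minimal sketch of that suggestion, assuming all percentiles are strictly positive (as they are in the example from #1): the same loop with ysc(log) added. The axis labels in yla(), the fixed colours, and the graph names Glog`i' are illustrative choices, not part of the code above.

      Code:
      clear
      input str8 name float(p050 p250 p500 p750 p950 p051 p251 p501 p751 p951)
      "zinc_avg" .47 2 4.76 7.265 11.26 3.734 7.393 10.967 13.63 25.12
      end

      gen x0 = 0
      gen x1 = 1

      * one graph per observation, as in the loop above, but with a log y scale
      * (assumption: every percentile is > 0, otherwise a log scale is not defined)
      forval i = 1/`=_N' {
          twoway rbar p250 p500 x0 in `i', barw(0.4) lcolor(blue) fcolor(none) ///
          || rbar p500 p750 x0 in `i', barw(0.4) lcolor(blue) fcolor(none) ///
          || rspike p750 p950 x0 in `i', lcolor(blue) ///
          || rspike p250 p050 x0 in `i', lcolor(blue) ///
          || rbar p251 p501 x1 in `i', barw(0.4) lcolor(red) fcolor(none) ///
          || rbar p501 p751 x1 in `i', barw(0.4) lcolor(red) fcolor(none) ///
          || rspike p751 p951 x1 in `i', lcolor(red) ///
          || rspike p251 p051 x1 in `i', lcolor(red) ///
          ysc(log) yla(0.5 1 2 5 10 20) ///
          xla(0 "control" 1 "treatment", tlc(none)) xsc(r(-0.3 1.3)) ///
          ytitle("`=subinstr(name[`i'], "_", " ", .)'") aspect(1) legend(off) ///
          note("whiskers to 5 and 95 percentiles, log scale") name(Glog`i', replace)
      }

      The labels in yla() would need adjusting for variables whose percentiles span a different range.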


      • #4
        Excellent, the second answer works like a charm for the whole dataset. Thank you, Nick.
