  • align histogram and boxplot after combine

    I want to align a histogram and boxplot after combining them in a single graph.

    The code below produces what I want, but the alignment is artificial: I had to manipulate the left axis labels of the histogram [ylabel(,labgap(-5) tposition(inside))] and the margins of the combined graph [imargin(b=0 l+5 t=0)].

    I would like to align the x axes of both graphs without manually introducing ad hoc values.

    #delimit ;

    use https://rodrigotaborda.com/ad/data/e...02510_old.dta;

    histogram icfes
    ,
    title("")
    ytitle("")
    xtitle("")
    xsca(alt)
    percent
    xscale(range(280 570))
    xlabel(300(50)550)
    yscale(range(0 20))
    ylabel(5(5)18)
    ylabel(,labgap(-5) tposition(inside))
    name(hist_icfes, replace)
    ;

    graph hbox icfes
    ,
    ytitle("")
    fysize(25)
    yscale(range(280 570))
    ylabel(300(50)550)
    name(box_icfes, replace)
    ;

    graph combine
    hist_icfes
    box_icfes
    ,
    title(ICFES)
    cols(1)
    imargin(b=0 l+5 t=0)
    note(Nota: Histograma muestra el porcentaje.)
    ;

    graph drop
    hist_icfes
    box_icfes
    ;

    [Image: distribucion.jpg]

  • #2
    graph hbox is not a twoway command, so I am not surprised at the combination being awkward.

    One way to make progress is to add your own box plot elements piece by piece as twoway calls. Infer from this example that I find the 1.5 IQR rule greatly oversold. Few authors and fewer readers (proportionally) could explain correctly what it does and why it was ever suggested. If you're showing detail beyond the box plot, as you are, you don't need the complication. If you want to see full detail in the tails, use a quantile plot, or something equivalent.

    Code:
    sysuse auto, clear 
    
    su mpg, detail 
    
    foreach v in min max p25 p75 p50 { 
        gen `v' = r(`v') in 1 
    }
    
    twoway__histogram_gen mpg, gen(h x) start(10) width(2) freq
    
    su h, meanonly 
    
    gen where = -r(max) * 0.05 in 1 
    local base = 2 * where[1]
    
    histogram mpg, width(2) start(10) freq ///
    addplot(scatter where p50, ms(Dh) mc(black) ||   ///
    rbar p25 p75 where, horizontal fcolor(none) lc(black) ||      /// 
    rspike min p25 where, horizontal lc(black) || ///
    rspike max p75 where, horizontal lc(black)) ysc(r(`base' .)) legend(off)

    For tighter control, I would use nicelabels from the Stata Journal and then calculate the space for the box as a fraction of the maximum label value.
    [Image: hist_box.png]



    • #3
      Great, thanks for your code; this is helpful. I am interested in ways to convey a thorough message in a single graph, and your code is quite close to what I want. Regards



      • #4
        For myself I now often favour a combination of quantile plot, means, and box plot. See e.g. https://www.statalist.org/forums/for...ercentile-sets
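
        As a rough illustration of that idea (not necessarily the design used in the linked thread), a quantile plot can carry the box plot summaries as reference lines. This sketch assumes the auto data and the official quantile command; the local names q1, med, q3 and mean are only illustrative.

        ```stata
        * Sketch only: quantile plot of mpg with quartiles and mean as reference lines
        sysuse auto, clear

        su mpg, detail
        local q1   = r(p25)
        local med  = r(p50)
        local q3   = r(p75)
        local mean = r(mean)

        * solid grey lines: quartiles and median; dashed black line: mean
        quantile mpg, yline(`q1' `med' `q3', lc(gs8)) yline(`mean', lp(dash) lc(black))
        ```

        Every data point stays visible, so the tails need no 1.5 IQR convention.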



        • #5
          Here is a tighter script. Naturally much of the point here is to work out your own rules and wire them into your code.

          The point of running twoway__histogram_gen first and then nicelabels is to be able to work out the height of the y axis and add space below for the box plot. If we wanted density or percent or whatever else, then the numbers would be quite different.

          On the other hand, this is some way from a truly general script, as it hard-wires a variable name and a histogram start and width tailored to that variable.


          Code:
          sysuse auto, clear 
          
          capture set scheme stcolor
          
          if _rc == 0 { 
              local mycolors lcolor(stc1) fcolor(stc1*0.2)
          }
          
          su mpg, detail 
          
          foreach v in min max p25 p75 p50 { 
              gen `v' = r(`v') in 1 
          }
          
          twoway__histogram_gen mpg, gen(h x) start(10) width(2) freq
          
          * nicelabels is from Stata Journal 
          nicelabels h, local(yla)
          local max = word("`yla'", -1)
          
          gen where = -`max' * 0.05 in 1 
          local base = 2 * where[1]
          
          histogram mpg, width(2) start(10) freq `mycolors' ///
          addplot(scatter where p50, ms(Dh) mc(black) ||   ///
          rbar p25 p75 where, horizontal fcolor(none) lc(black) ||      /// 
          rspike min p25 where, horizontal lc(black) || ///
          rspike max p75 where, horizontal lc(black)) yla(0 `yla') ysc(r(`base' .)) legend(off)
          
          drop h x
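
          One small step towards generality, offered only as a sketch: the same steps with the variable name and bin settings held in locals, so they are set in one place. The local names v, start and width are illustrative and not part of the script above; the start and width values still have to be chosen to suit each variable, and nicelabels must be installed from the Stata Journal.

          ```stata
          * Sketch only: same logic as above, with the wired-in choices moved to locals
          sysuse auto, clear

          local v mpg        // variable to plot (illustrative local name)
          local start 10     // histogram start, chosen for this variable
          local width 2      // bin width, chosen for this variable

          su `v', detail
          foreach s in min max p25 p75 p50 {
              gen `s' = r(`s') in 1
          }

          twoway__histogram_gen `v', gen(h x) start(`start') width(`width') freq

          * nicelabels is from Stata Journal
          nicelabels h, local(yla)
          local max = word("`yla'", -1)

          * put the box 5% of the top label below zero and leave twice that as space
          gen where = -`max' * 0.05 in 1
          local base = 2 * where[1]

          histogram `v', width(`width') start(`start') freq ///
          addplot(scatter where p50, ms(Dh) mc(black) || ///
          rbar p25 p75 where, horizontal fcolor(none) lc(black) || ///
          rspike min p25 where, horizontal lc(black) || ///
          rspike max p75 where, horizontal lc(black)) ///
          yla(0 `yla') ysc(r(`base' .)) legend(off)

          * clean up every helper variable, not just h and x
          drop h x where min max p25 p75 p50
          ```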



          • #6
            Thanks, Nick, this is excellent.
