
  • Histogram for groups - Overlaying median line

    I have a histogram comparing the distribution of a continuous variable in two groups. I wish to place a median line specific to each group on this histogram.

    I have tried xline, but the value I use applies to only one group, while the line is drawn for both groups.

    Any suggestions?

    Thank you.

  • #2
    It is hard to comment on your code because you don't show any. There is no problem with specifying two or more arguments to xline(), but it is usually better to do something that makes clear which line is which.
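    For superimposed histograms, here is a minimal sketch of the xline() route. It assumes Stata 18 or later, because the stc1 and stc2 colors used throughout this thread come with that version. Compute each group's median with summarize, detail, store it in a local macro, and give each xline() the color of its own group so that it is clear which line is which. The local names m0 and m1 and the graph name H2 are hypothetical choices for illustration.

```stata
capture set scheme stcolor
sysuse auto, clear

* group medians from -summarize, detail-, stored in hypothetical locals m0 and m1
summarize mpg if foreign == 0, detail
local m0 = r(p50)
summarize mpg if foreign == 1, detail
local m1 = r(p50)

* one panel with two superimposed histograms; xline() is given once per group,
* each line drawn in the same color as that group's bars
twoway histogram mpg if foreign == 0, percent start(10) width(2) lcolor(stc1) fcolor(stc1%10) ///
    || histogram mpg if foreign == 1, percent start(10) width(2) lcolor(stc2) fcolor(stc2%10) ///
    xline(`m0', lcolor(stc1)) xline(`m1', lcolor(stc2)) ///
    legend(order(1 "Domestic" 2 "Foreign") rows(1) pos(12)) ///
    ytitle(Percent) xtitle(Miles per gallon) name(H2, replace)
```

    Color alone has to carry the identification here, which is one reason to prefer a labeled design such as the spikes in the code below.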

    There are several solutions for adding lines, and I doubt that https://journals.sagepub.com/doi/pdf...6867X241276116 (a 2024 paper) exhausts the possibilities.

    The legend of the histogram below would be better laid out horizontally (in one row).

    Code:
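    capture set scheme stcolor 
    
    sysuse auto, clear 
    
    * each observation gets the median of mpg for its own group
    egen median = median(mpg), by(foreign)
    
    * height of the median spikes, chosen by eye for these data
    gen where = 30
    
    * x position of the "medians" label: midway between the two group medians
    su median, meanonly 
    local pos = (r(min) + r(max))/2 
    
    * superimposed histograms, each with a spike at its own group median
    twoway ///
    histogram mpg if foreign == 0, percent start(10) width(2) lcolor(stc1) fcolor(stc1%10) ///
    || spike where median if foreign == 0, color(stc1) ///
    || histogram mpg if foreign == 1, percent start(10) width(2) lcolor(stc2) fcolor(stc2%10) ///
    || spike where median if foreign == 1, color(stc2) ///
    legend(order(1 "Domestic" 3 "Foreign" ) pos(12)) text(30 `pos' "medians") ///
    ytitle(Percent) xtitle(Miles per gallon) xla(10(5)40) name(H, replace)
    
    * x is 0 for the lowest and 1 for the highest mpg in each group, missing otherwise,
    * so -line median x- joins (0, median) and (1, median) for each group
    bysort foreign (mpg) : gen x = cond(_n == 1, 0, cond(_n == _N, 1, .)) 
    
    * qplot is from the Stata Journal and must be installed first (see -search qplot-)
    qplot mpg, by(foreign, legend(off) note(medians are horizontals)) ms(O) ///
    xla(0 1 .25 "0.25" .5 "0.5" .75 "0.75") addplot(line median x) ///
    xtitle(Fraction of data) name(Q, replace)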
    [Image: Daniel_H.png]

    [Image: Daniel_Q.png]



    You may not be superimposing histograms, but the technique above carries across to other designs, including any juxtaposition of panels.
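    If the histograms are juxtaposed in separate panels rather than superimposed, one possible sketch keeps the spike plot and moves the group variable into by(). It makes the same Stata 18 assumption about the stc colors. Each panel then shows only its own group's observations, so each spike marks only that group's median. By contrast, an xline() option is repeated in every panel. The block recreates the median and where variables so that it runs on its own. The graph name J is arbitrary, and the height of 30 may again need adjusting.

```stata
capture set scheme stcolor
sysuse auto, clear

* group medians and an empirical spike height, as in the code above
egen median = median(mpg), by(foreign)
gen where = 30

* by() splits both plots by group, so each panel's spike marks its own median
twoway histogram mpg, percent start(10) width(2) lcolor(stc1) fcolor(stc1%10) ///
    || spike where median, color(stc2) ///
    by(foreign, legend(off) note("spikes show group medians")) ///
    ytitle(Percent) xtitle(Miles per gallon) xla(10(5)40) name(J, replace)
```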

    Clearly the spike height of 30 is empirical, depending on the variable and the histogram choices concerned: you may need to iterate.
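    One way to cut down the iteration is to compute the tallest bar before drawing. This is a rough sketch that assumes bins starting at 10 with width 2, each including its lower limit. How Stata treats values falling exactly on a bin boundary is not checked here, so the result should still be checked against the graph. The code computes the percent in each bin within each group and places the spikes a little above the tallest bar. The variable names bin, ngroup and pct and the 10% headroom are illustrative choices, not part of the original code.

```stata
sysuse auto, clear
egen median = median(mpg), by(foreign)

* hypothetical bin lower limits mimicking start(10) width(2)
gen bin = 10 + 2 * floor((mpg - 10) / 2)

* percent of each group falling in each bin
bysort foreign: gen ngroup = _N
bysort foreign bin: gen pct = 100 * _N / ngroup

* spike height: about 10% above the tallest bar in either group
summarize pct, meanonly
local top = ceil(1.1 * r(max))
gen where = `top'
display "spike height: `top'"
```

    The local top can then replace the hard-coded 30 both in gen where and in the y coordinate of text().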

    Histograms are easy in principle, but often hard in practice to do really well.

    In contrast, quantile plots don't need so many choices. qplot is from the Stata Journal.
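    If qplot is not installed, a rough equivalent can be drawn with official commands only. This is not qplot's own code. It plots each group's sorted values against plotting positions (i - 0.5)/n, a common convention, and adds the group median as a horizontal line. Whether these positions match qplot's defaults is not asserted here. The variable name pp and graph name Q2 are hypothetical, and the stc2 color again assumes Stata 18 or later.

```stata
capture set scheme stcolor
sysuse auto, clear
egen median = median(mpg), by(foreign)

* plotting positions (i - 0.5)/n within each group, i = rank of mpg in the group
bysort foreign (mpg): gen pp = (_n - 0.5) / _N

* quantile plot per group, with that group's median as a horizontal line
twoway scatter mpg pp, ms(O) ///
    || line median pp, lcolor(stc2) ///
    by(foreign, legend(off) note("medians are horizontals")) ///
    xla(0 1 .25 "0.25" .5 "0.5" .75 "0.75") ///
    xtitle(Fraction of data) ytitle(Miles per gallon) name(Q2, replace)
```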
