Plot (kernel) density estimates as areas

This is a brief puff for an idea that has become standard in some quarters, but seems to deserve a bigger push until everyone who might care knows about it. Here is a reproducible example, which as always is indicative, not definitive.

Code:

sysuse auto, clear

* grid of 45 evaluation points, 5 to 49, wider than the observed range of mpg
gen where = _n + 4 in 1/45

* same kernel, bandwidth and grid for both groups
local choices kernel(biweight) bw(5) at(where)

kdensity mpg if foreign, `choices' gen(x1 d1)

kdensity mpg if !foreign, `choices' gen(x0 d0)

* constant heights just below zero, one per group, for the rug plots
gen rug1 = -0.004
gen rug0 = -0.008

* semi-transparent areas for the densities, pipe markers for the rugs
twoway area d1 d0 where, xtitle("`: var label mpg'") color(orange%40 blue%40) ///
|| scatter rug1 mpg if foreign, ms(|) mc(orange) msize(medlarge) ///
|| scatter rug0 mpg if !foreign, ms(|) mc(blue) msize(medlarge) ///
legend(order(1 "Foreign" 2 "Domestic") pos(1) ring(0) col(1)) ///
ytitle(Probability density) yla(, ang(h)) xla(10(10)40)

[Figure: kdensity.png]

Kernel density estimates are plotted by default in Stata as lines, meaning curves. It is elementary (meaning, fundamental) that the area under the curve between any two values has an interpretation as the probability of falling in that interval.
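
As a rough numerical check of that interpretation, here is a minimal sketch that integrates one estimated density over the same grid as in the example. It assumes the official command integ with its trapezoid option; the interval from 20 to 30 is an arbitrary choice for illustration only.

Code:

sysuse auto, clear
gen where = _n + 4 in 1/45
* nograph suppresses the default plot; only the generated variables are wanted
kdensity mpg if foreign, kernel(biweight) bw(5) at(where) gen(x1 d1) nograph
* area under the whole curve by the trapezoid rule
integ d1 where, trapezoid
* area over an arbitrary interval, read as an estimated probability
integ d1 where if inrange(where, 20, 30), trapezoid

The first integral should be close to 1, because the grid extends beyond the observed values by more than the bandwidth. The second is the estimated probability that mpg for a foreign car lies between 20 and 30.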

Often area-based graphs say in a complicated way what could be said much more simply. Bad examples include bars with arbitrary bases that could just be replaced by point symbols for the values in question, or bars that start at zero when the fact that values are not zero is banal or irrelevant.

However, area graphs can be helpful when comparing two or more distributions. (Histograms work that way.) But then transparency becomes vital to see overlap clearly.

You can do something like this directly with kdensity or twoway kdensity with the option recast(area). There is no special rationale for coding as above, although the default of truncating the density at the observed extremes can be unfortunate, so I typically work a little harder at setting up a wider grid on which to calculate estimates.
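
For comparison, here is a minimal sketch of the direct route, not a definitive recipe. It assumes that twoway kdensity passes color() through once the plot is recast as an area, and it uses range(5 49) to mimic the wider grid of the longer code, so that the density is not cut off at the observed extremes.

Code:

sysuse auto, clear
* same kernel and bandwidth for both groups; range() widens the evaluation grid
twoway kdensity mpg if foreign, kernel(biweight) bw(5) range(5 49) recast(area) color(orange%40) ///
|| kdensity mpg if !foreign, kernel(biweight) bw(5) range(5 49) recast(area) color(blue%40) ///
legend(order(1 "Foreign" 2 "Domestic") pos(1) ring(0) col(1)) ///
xtitle("`: var label mpg'") ytitle(Probability density) yla(, ang(h))

This is shorter, but it leaves out the rug plots, and each density is computed inside the graph command rather than saved as a variable for later use.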

The immediate inspiration for this came from an excellent book by Claus Wilke. This is a link to a review I wrote with several detailed comments: https://www.amazon.com/gp/customer-reviews/R22MWD7RJ6QAFP
