Kernel density plot outside the minimum allowed value

Tunga Kantarci

Join Date: Oct 2015

Posts: 90
#1

03 Nov 2020, 15:51

This question is not about treatment effects but only about a kernel density produced by a treatment effects command of Stata.

Code:

tebalance density variable

plots the kernel densities shown in the attached picture. The minimum value of the variable is 0, yet the density suggests there are values below 0. This somewhat odd behavior must be due to the kernel smoothing, but I would still expect Stata to start the plot at 0, and it does not. Is there a way to force Stata to start the plot at 0?

Nick Cox

Join Date: Mar 2014

Posts: 35466
#2

03 Nov 2020, 19:27

Your graph tells the story. As implemented here, kernel density estimation knows nothing, and does nothing, about what is and is not feasible. How would Stata look at monthlywageday and understand that, so far as you are concerned, zero is a limit for such a variable?

There are methods for producing density estimates over a limited range only. They depend on the method being told, directly or indirectly, about the support. For example, you can estimate the density on a log scale (and then back-transform appropriately if you want to see density estimates on the original scale). Then again, there are methods that reflect probability mass back from forbidden intervals into the allowed space. I am not aware of any implementation of such methods in official Stata.
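A do-it-yourself sketch of the reflection idea with official kdensity, assuming the support is [0, infinity). It is not an official Stata command, and the simulated exponential data, the seed, the grid spacing, and the names x, id, grid and fgrid are all hypothetical. Each observation is mirrored about 0, the density of the stacked sample is estimated only at points from 0 upward (using the bandwidth chosen for the original data), and the result is doubled, so the mass the kernel would smear below 0 is folded back into the allowed range.

```stata
* Reflection about a known lower bound of 0 (assumes support is [0, infinity))
clear
set seed 2803
set obs 500
* hypothetical data: exponential with mean 1
generate double x = -ln(runiform())

* bandwidth chosen for the original (unreflected) data
quietly kdensity x, nograph
local bw = r(bwidth)

* stack each observation with its mirror image -x
generate long id = _n
expand 2
bysort id: replace x = -x if _n == 2

* evaluation grid on [0, 6] only, in the first 301 observations
generate double grid = (_n - 1) * 0.02 if _n <= 301

* density of the stacked sample at the grid points, then doubled:
* for x >= 0 this equals the reflection estimator for the original sample
kdensity x, at(grid) generate(fgrid) bwidth(`bw') nograph
replace fgrid = 2 * fgrid

line fgrid grid, sort ytitle("Density") xtitle("x")
```

Fixing the bandwidth from the original data matters because the reflected sample has a larger spread, which would otherwise inflate the default bandwidth.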

As I understand it, you would need to work on a log scale to respect the positive support of your variable. However, your graph implies a bimodal distribution.
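A minimal sketch of the log-scale route, assuming the variable is strictly positive: zeros have no logarithm, so observations at 0 simply drop out here. If Y = ln X has density f_Y, then X has density f_X(x) = f_Y(ln x) / x, which is the back-transformation applied below. The simulated lognormal data, the seed, n(200), and the names x, lnx, t, fy, xpt and fx are hypothetical.

```stata
* Density estimation on a log scale, back-transformed to the original scale
clear
set seed 2803
set obs 500
* hypothetical strictly positive data (lognormal)
generate double x = exp(rnormal(0, 0.5))

* zeros (or negatives) have no logarithm and are excluded
generate double lnx = ln(x) if x > 0

* kernel density estimate of ln(x) at 200 points, stored in t and fy
kdensity lnx, generate(t fy) n(200) nograph

* change of variables: f_X(x) = f_Y(ln x) / x
generate double xpt = exp(t)
generate double fx = fy / xpt

line fx xpt if xpt <= 5, sort ytitle("Density") xtitle("x")
```

Every back-transformed evaluation point is positive by construction, so the estimate never extends below 0.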
Tunga Kantarci

Join Date: Oct 2015

Posts: 90
#3

05 Nov 2020, 04:34

Thanks a lot. I understand that the kernel, for whatever reason, can lead to displaying values outside the support. I will have to live with that.
Nick Cox

Join Date: Mar 2014

Posts: 35466
#4

05 Nov 2020, 06:05

Again, your graph shows -- let's emphasise this -- very strongly bimodal distributions. Look at the zero estimates for probability density in the middle of each distribution. There is, I guess, a simple story here, say a mix of part-time and full-time workers, but it seems unlikely that any kind of kernel estimates will help at all, so whether the estimates smear probability into impossible regions is a side-issue.

Yet further, the graphs seem consistent with just two distinct values in each case. I think I see the shape of the default Epanechnikov kernel echoed in the graph as two data spikes smeared out.
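One quick check of that reading is to count the distinct values directly. The sketch below uses a hypothetical two-valued stand-in named y (simulated, with made-up values 10 and 20); with real data, only the tabulate and kdensity lines would be applied to monthlywageday. Spelling out kernel(epanechnikov) just makes the default kernel explicit.

```stata
* hypothetical stand-in: a variable that takes only two distinct values
clear
set seed 2020
set obs 200
generate y = cond(runiform() < 0.4, 10, 20)

* a one-way tabulate leaves the number of distinct values in r(r)
quietly tabulate y
display "distinct values of y: " r(r)

* the kernel estimate is then a mixture of two kernels centred on 10 and 20,
* weighted by the share of observations at each value
kdensity y, kernel(epanechnikov) xline(10 20)
```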

With the auto data I can do this,

Code:

. sysuse auto, clear
(1978 Automobile Data)

. kdensity foreign, xli(0 1)

What's noticeable is that

1. Stata doesn't squawk that I asked it to do something silly. (Some years ago, a former colleague protested that statistical packages should be programmed to forbid silly requests. I asked for an example, and was told that regressions should be forbidden if the variables concerned were not normally distributed. I had an uphill struggle explaining that such regressions aren't wrong!)

2. There is nothing for the kernel estimate to discover. The variable is binary and there is nothing else that the estimate can tell you legitimately. It's unlike the standard case in which a sample distribution shows spikes and gaps as a matter of sampling and measurement quirks and the estimate should clean that up to some degree.

Now, that's an extreme case and your variable isn't defined as binary, but the serious question is what you expect the density estimates to tell you.

Last edited by Nick Cox; 05 Nov 2020, 06:16.
