  • Kernel density plot outside the minimum allowed value

    This question is not about treatment effects but only about a kernel density produced by a treatment effects command of Stata.
    Code:
    tebalance density variable
    plots the kernel densities shown in the attached picture file. The minimum value of the variable for which the plot is produced is 0, yet the density curve extends below 0. I assume this somewhat odd behavior is due to the kernel smoothing, but I would still expect Stata to start the plot at 0, and it does not. Is there a way to force Stata to start the plot at 0?



    Attached Files

  • #2
    Your graph tells the story. As implemented here, kernel density estimation neither knows nor does anything about what is and is not feasible. How would Stata see monthlywageday and understand that zero is a limit so far as you are concerned for such a variable?

    There are methods for producing density estimates over a limited range only. They depend on a method being told directly or indirectly about the support. For example, you can estimate density on log scale (and then back-transform appropriately if you want to see density estimates for the original scale). Then again, there are methods that reflect probability mass back from forbidden intervals into the allowed space. I am not aware of any implementation of such methods in official Stata.

    As I understand it you would need to work on log scale to respect positive support for your variable. However, the implication of your graph is of a bimodal distribution.
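
    The log-scale approach described above can be sketched as follows. This is a minimal illustration, not a tested recipe for the poster's data: it uses the auto data and price as stand-ins, and the variable names lnprice, lnx, dens_ln, x and dens_x are made up for the example. Note that the log is undefined at 0, so any observations exactly equal to 0 would need separate handling.

```stata
* Sketch only: auto data and price stand in for the poster's data
sysuse auto, clear

* Estimate the density of ln(price); generate() saves the grid points
* and the density estimates, nograph suppresses the default plot
generate double lnprice = ln(price)
kdensity lnprice, generate(lnx dens_ln) nograph

* Back-transform: if y = ln(x), then f_X(x) = f_Y(ln x) / x
generate double x = exp(lnx)
generate double dens_x = dens_ln / x

* The back-transformed estimate exists only for x > 0
line dens_x x, ytitle("Density") xtitle("Price")
```

    Dividing by x is the Jacobian of the transformation; plotting dens_ln against exp(lnx) without it would not give a density on the original scale.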



    • #3
      Thanks a lot. I understand that the kernel, for whatever reason, can lead to displaying values outside the support. I will have to live with that.



      • #4
        Again, your graph shows -- let's emphasise this -- very strongly bimodal distributions. Look at the zero estimates for probability density in the middle of each distribution. There is, I guess, a simple story here, say a mix of part-time and full-time workers, but it seems unlikely that any kind of kernel estimates will help at all, so whether the estimates smear probability into impossible regions is a side-issue.

        Yet further, the graphs seem consistent with just two distinct values in each case. I think I see the shape of the default Epanechnikov kernel echoed in the graph as two data spikes smeared out.

        With the auto data I can do this,

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . kdensity foreign, xli(0 1)
        [Image: density.png, kernel density estimate of foreign with reference lines at 0 and 1]


        What's noticeable is that

        1. Stata doesn't squawk that I asked it to do something silly. (Some years ago, a former colleague protested that statistical packages should be programmed to forbid silly requests. I asked for an example, which was that regressions should be forbidden if the variables concerned were not normally distributed. I had an uphill struggle explaining that such regressions aren't wrong!)

        2. There is nothing for the kernel estimate to discover. The variable is binary and there is nothing else that the estimate can tell you legitimately. It's unlike the standard case in which a sample distribution shows spikes and gaps as a matter of sampling and measurement quirks and the estimate should clean that up to some degree.

        Now that's an extreme case and your variable isn't defined as binary, but the serious question is what you expect the density estimates to tell you.
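
        Whether a variable takes only a handful of distinct values can be checked directly before reading anything into a kernel estimate. A minimal sketch, again with foreign from the auto data standing in for the poster's variable, which is not available here:

```stata
* Sketch only: foreign stands in for the poster's variable
sysuse auto, clear

* tabulate leaves the number of distinct values in r(r)
quietly tabulate foreign
display "distinct values: " r(r)

* With so few distinct values, show the proportions directly, unsmoothed
histogram foreign, discrete fraction xlabel(0 1)
```

        If the count is very small, a table or a discrete histogram arguably says everything the data can say, and no kernel smoothing is needed.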
        Last edited by Nick Cox; 05 Nov 2020, 06:16.

