Getting the value of kernel density function estimate at specific coordinate

Tom Storwitz

Join Date: Nov 2019

Posts: 11
#1


17 Nov 2021, 01:48

Dear all,

I estimate the kernel density of a variable var1, and I would like to draw a vertical dashed line at the mean of var1, from the horizontal axis up to the curve. I know how to draw lines in twoway graphs in Stata. What I would like to ask is whether there is a general way to obtain the value of the estimated kernel density at a specific (horizontal) coordinate, in my case the mean of var1.

Thanks in advance and best regards
Tom

Nick Cox

Join Date: Mar 2014
Posts: 35697

17 Nov 2021, 02:55

With a little bit of work you can arrange that the mean is one of the points at which the density is estimated. As standard, I'd underline that there are many possible estimates, depending on the kernel type, the bandwidth, whether the estimation pays attention to boundary conditions, and so forth.

This isn't general code; the point is that I think you need a grid of evaluation points that covers the support of the variable (and perhaps a little more).

Code:

. sysuse auto, clear
(1978 automobile data)

. su mpg, meanonly

. gen grid = cond(_n <= 20, r(mean) - _n, cond(_n == 21, r(mean), cond(_n < 41, r(mean) + _n - 20, .)))
(34 missing values generated)

. kdensity mpg, at(grid) gen(wanted)

. sort grid

. l grid wanted if wanted < .

     +----------------------+
     |     grid      wanted |
     |----------------------|
  1. | 1.297297           0 |
  2. | 2.297297           0 |
  3. | 3.297297           0 |
  4. | 4.297297           0 |
  5. | 5.297297           0 |
     |----------------------|
  6. | 6.297297           0 |
  7. | 7.297297           0 |
  8. | 8.297297   .00136238 |
  9. | 9.297297   .00287074 |
 10. |  10.2973   .00799528 |
     |----------------------|
 11. |  11.2973   .01444916 |
 12. |  12.2973   .02188993 |
 13. |  13.2973   .02999259 |
 14. |  14.2973   .04122117 |
 15. |  15.2973    .0543175 |
     |----------------------|
 16. |  16.2973   .06368328 |
 17. |  17.2973   .07044922 |
 18. |  18.2973   .07275096 |
 19. |  19.2973   .07385776 |
 20. |  20.2973   .07123286 |
     |----------------------|
 21. |  21.2973   .06692756 |
 22. |  23.2973   .05275207 |
 23. |  24.2973   .04833418 |
 24. |  25.2973   .04221761 |
 25. |  26.2973   .03634904 |
     |----------------------|
 26. |  27.2973   .03130446 |
 27. |  28.2973   .02501904 |
 28. |  29.2973   .01865703 |
 29. |  30.2973   .01497124 |
 30. |  31.2973   .01399729 |
     |----------------------|
 31. |  32.2973   .01164883 |
 32. |  33.2973   .00997121 |
 33. |  34.2973   .00801714 |
 34. |  35.2973   .00678842 |
 35. |  36.2973   .00586856 |
     |----------------------|
 36. |  37.2973   .00504458 |
 37. |  38.2973   .00358713 |
 38. |  39.2973   .00219631 |
 39. |  40.2973   .00223728 |
 40. |  41.2973   .00228501 |
     +----------------------+
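To connect this back to the original question, here is a minimal sketch of how the value at the mean could be picked out and used for the dashed line. It assumes the default kernel and bandwidth, uses an evenly spaced grid (the log above steps from 21.2973 straight to 23.2973, because the last cond() branch adds _n - 20 rather than _n - 21, which is harmless for the purpose), and the local macro names m and fmean are illustrative only.

```stata
sysuse auto, clear
summarize mpg, meanonly
local m = r(mean)

* 41 evenly spaced points from mean - 20 to mean + 20; point 21 is the mean itself
generate double grid = `m' + (_n - 21) in 1/41

* evaluate the density on that grid (default kernel and bandwidth)
kdensity mpg, at(grid) generate(wanted) nograph

* the estimate at the mean sits in observation 21 (the data are not re-sorted)
local fmean = wanted[21]
display "density at mean (" %7.4f `m' ") = " %9.7f `fmean'

* dashed vertical line from the horizontal axis up to the curve
twoway (kdensity mpg) (pci 0 `m' `fmean' `m', lpattern(dash)), legend(off)
```

The line should end on the curve only if the plotted kdensity uses the same kernel and bandwidth as the call that produced wanted; if you change either in one place, change it in the other too.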


Tom Storwitz

Join Date: Nov 2019

Posts: 11
#3

17 Nov 2021, 03:20

Dear Nick,

Thank you very much for your quick and helpful response. Just to make sure I understand correctly: So there is no other way to access the function values of the estimated kernel density than explicitly defining a grid (that includes the mean, as in your example) at whose points the kernel density will be estimated, and then getting the function values at those points? I thought, since Stata does smooth the curve between the points it selects for the kernel density estimates and thus determines function values at each coordinate, one might perhaps be able to access the function value at arbitrary points (regardless of the underlying grid)...

Best
Tom
Nick Cox

Join Date: Mar 2014

Posts: 35697
#4

17 Nov 2021, 03:40

kdensity lets you generate() a variable, which is what we are doing here. If you are wondering whether the density estimates are also saved as r() or e() results, the answer is no.

In general, estimated parameters that are important will be individually accessible as saved results, but when what is being estimated is in effect a function, as here, the way Stata lets you get at the results is through a new variable.

It's a good point because Stata does regard parameters being estimated as results you might want to access directly.

As I guess you know, kernel density estimation pays no attention to whether any point at which density is being estimated is also something else, such as being (or being close to) a mean or even a mode. A variant on your question I've seen more often is how to access 'the' mode.

I am reminded of a nice remark that nonparametric problems are really problems with an infinite number of parameters!
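
As a rough illustration of both points, the sketch below shows what kdensity leaves behind in r() and how an approximate mode could be read off the generated variables. The variable names x and d are arbitrary, and the answer depends on the kernel, the bandwidth, and how fine the evaluation grid is, so treat it as one possible estimate rather than 'the' mode.

```stata
sysuse auto, clear

* let kdensity choose its own grid; keep the points (x) and the estimates (d)
kdensity mpg, generate(x d) nograph

* r() holds scalars such as the bandwidth, not the curve itself
return list

* approximate mode: the grid point at which the estimated density is largest
summarize d, meanonly
list x d if d == r(max)
```

A finer grid, for example via the n() option, would locate the peak more precisely.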

FernandoRios

Join Date: Apr 2014
Posts: 2469

17 Nov 2021, 05:34

In addition to Nick's helpful advice:
While kdensity does not return a function (there is none to return), it does leave behind everything you need to compute the estimate at any point yourself, notably the bandwidth in r(bwidth). You just need to understand how kdensity constructs the curve it draws.
For example:

Code:

webuse nlswork
** using gaussian kernel because it is the easiest one to replicate
kdensity ln_wage, kernel(gaussian)
return list
global bw = r(bwidth)
** tracing a line at the mean, the 10th and 90th pctile
sum ln_wage,d
global p10= r(p10)
global p90= r(p90)
global mean= r(mean)
** the Gaussian kernel density at x is the mean over observations of normalden(ln_wage, x, bw)
gen aux1 = normalden(ln_wage,$p10,$bw)
gen aux2 = normalden(ln_wage,$p90,$bw)
gen aux3 = normalden(ln_wage,$mean,$bw)
sum aux1
global k10 = r(mean)
sum aux2
global k90 = r(mean)
sum aux3
global kmean = r(mean)

two (kdensity ln_wage, kernel(gaussian) xline($p10 $p90 $mean) yline($k10 $k90 $kmean) ) ///
       (scatteri $k10 $p10 "q10" $k90 $p90 "q90" $kmean $mean "mean")
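
The averages of normalden() in the code above follow from the usual (unweighted) kernel density estimator, written here with h for the bandwidth and K for the kernel:

```latex
\hat{f}_h(x) \;=\; \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)
% Gaussian kernel: K(z) = \phi(z), the standard normal density, so
\hat{f}_h(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\,\phi\!\left(\frac{X_i - x}{h}\right)
```

Each term (1/h) φ((X_i − x)/h) is what normalden(X_i, x, h) returns, so the mean of that expression over the sample is the density estimate at x.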

For other kernels, one just needs to construct the corresponding kernel function.
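
As a hedged sketch of what that could look like, the code below repeats the calculation for the kernel Stata calls epan2, assuming its documented form K(z) = 0.75(1 − z²) for |z| < 1 and zero otherwise, and compares the hand-computed value with what kdensity reports at the same point. The variable names x0, z, k and d_kd are illustrative.

```stata
sysuse auto, clear
summarize mpg, meanonly
local m = r(mean)

* a one-point "grid" holding just the mean
generate double x0 = `m' in 1

* let kdensity evaluate the epan2 estimate at that point and keep its bandwidth
kdensity mpg, kernel(epan2) at(x0) generate(d_kd) nograph
local h = r(bwidth)

* by hand: average of K((x - X_i)/h)/h over the sample
generate double z = (mpg - `m') / `h'
generate double k = cond(abs(z) < 1, 0.75 * (1 - z^2), 0) / `h'
summarize k, meanonly

* the two numbers should agree up to storage precision
display "by hand: " r(mean) "   kdensity: " d_kd[1]
```

Note that the default kernel(epanechnikov) is a rescaled variant of this kernel, so the cond() expression would need adjusting to reproduce the default estimate.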
HTH


Tom Storwitz

Join Date: Nov 2019

Posts: 11
#6

17 Nov 2021, 08:42

Many thanks, Nick and Fernando!
