  • Getting the value of kernel density function estimate at specific coordinate

    Dear all,

    I estimate the kernel density of variable var1 and would like to draw a vertical dashed line at the mean of var1, from the horizontal axis up to the curve. I know how to draw a line in Stata twoway graphs; my question is whether there is a general way to obtain the value of the estimated kernel density at a specific (horizontal) coordinate, in my case the mean of var1.

    Thanks in advance and best regards
    Tom

  • #2
    With a little bit of work you can arrange that the mean is one of the points for which the density is estimated. As standard, I'd underline that there are many estimates depending on the kernel type, bandwidth, whether the estimation pays attention to boundary conditions, and so forth.

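    This isn't general code; the point is that you need a grid that covers the support of the data (and perhaps a little more) and that includes the mean. In the listing below, the density at the mean is the value of wanted in row 21, where grid equals the mean of mpg (21.2973). Note that the grid as written runs from mean - 20 to mean + 20 in unit steps but skips mean + 1.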

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . su mpg, meanonly
    
    . gen grid = cond(_n <= 20, r(mean) - _n, cond(_n == 21, r(mean), cond(_n < 41, r(mean) + _n - 20, .)))
    (34 missing values generated)
    
    . kdensity mpg, at(grid) gen(wanted)
    
    . sort grid
    
    . list grid wanted if wanted < .
    
         +----------------------+
         |     grid      wanted |
         |----------------------|
      1. | 1.297297           0 |
      2. | 2.297297           0 |
      3. | 3.297297           0 |
      4. | 4.297297           0 |
      5. | 5.297297           0 |
         |----------------------|
      6. | 6.297297           0 |
      7. | 7.297297           0 |
      8. | 8.297297   .00136238 |
      9. | 9.297297   .00287074 |
     10. |  10.2973   .00799528 |
         |----------------------|
     11. |  11.2973   .01444916 |
     12. |  12.2973   .02188993 |
     13. |  13.2973   .02999259 |
     14. |  14.2973   .04122117 |
     15. |  15.2973    .0543175 |
         |----------------------|
     16. |  16.2973   .06368328 |
     17. |  17.2973   .07044922 |
     18. |  18.2973   .07275096 |
     19. |  19.2973   .07385776 |
     20. |  20.2973   .07123286 |
         |----------------------|
     21. |  21.2973   .06692756 |
     22. |  23.2973   .05275207 |
     23. |  24.2973   .04833418 |
     24. |  25.2973   .04221761 |
     25. |  26.2973   .03634904 |
         |----------------------|
     26. |  27.2973   .03130446 |
     27. |  28.2973   .02501904 |
     28. |  29.2973   .01865703 |
     29. |  30.2973   .01497124 |
     30. |  31.2973   .01399729 |
         |----------------------|
     31. |  32.2973   .01164883 |
     32. |  33.2973   .00997121 |
     33. |  34.2973   .00801714 |
     34. |  35.2973   .00678842 |
     35. |  36.2973   .00586856 |
         |----------------------|
     36. |  37.2973   .00504458 |
     37. |  38.2973   .00358713 |
     38. |  39.2973   .00219631 |
     39. |  40.2973   .00223728 |
     40. |  41.2973   .00228501 |
         +----------------------+
    .
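
    If only the value at the mean is needed, a one-point grid should be enough. A minimal sketch, assuming the default kernel and bandwidth are acceptable; the names at_mean and d_mean are made up for illustration:

    ```stata
    sysuse auto, clear
    summarize mpg, meanonly
    local m = r(mean)

    * evaluation "grid" with a single point: the mean, stored in observation 1
    generate double at_mean = `m' in 1

    * density estimate at that point only; nograph suppresses the plot
    kdensity mpg, at(at_mean) generate(d_mean) nograph
    local d = d_mean[1]

    * density curve plus a dashed vertical segment from (mean, 0) to (mean, density)
    twoway (kdensity mpg) ///
           (scatteri 0 `m' `d' `m', recast(line) lpattern(dash)), ///
           legend(off)
    ```

    As far as I know, twoway kdensity uses the same default kernel and bandwidth rule as kdensity but its own evaluation grid, so the top of the segment should meet the drawn curve up to interpolation between grid points. If you set kernel() or bwidth() in one command, set the same in the other.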



    • #3
      Dear Nick,

      Thank you very much for your quick and helpful response. Just to make sure I understand correctly: So there is no other way to access the function values of the estimated kernel density than explicitly defining a grid (that includes the mean, as in your example) at whose points the kernel density will be estimated, and then getting the function values at those points? I thought, since Stata does smooth the curve between the points it selects for the kernel density estimates and thus determines function values at each coordinate, one might perhaps be able to access the function value at arbitrary points (regardless of the underlying grid)...

      Best
      Tom



      • #4
        kdensity lets you generate() a variable which is what we are doing here. If you are wondering whether the density estimates are also saved as r() or e() results, the answer is no.

        In general, estimated parameters that are important will be individually accessible as saved results, but when what is being estimated is in effect a function, as here, the way Stata lets you get at the results is through a new variable.

        It's a good point because Stata does regard parameters being estimated as results you might want to access directly.

        As I guess you know, kernel density estimation pays no attention to whether any point at which density is being estimated is also something else, such as being (or being close to) a mean or even a mode. A variant on your question I've seen more often is how to access 'the' mode.

        I am reminded of a nice remark that nonparametric problems are really problems with an infinite number of parameters!
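
        To see concretely what kdensity leaves behind, here is a small sketch (x and d are made-up variable names):

        ```stata
        sysuse auto, clear

        * store evaluation points in x and density estimates in d; no plot
        kdensity mpg, generate(x d) nograph

        * r() holds settings such as the bandwidth and kernel name, not the estimates
        return list

        * the estimates themselves live in the new variables
        list x d in 1/5
        ```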



        • #5
          In addition to Nick's helpful advice:
          While kdensity does not create a function (because there is none), it does provide all the information you need to compute the density estimate at any particular point yourself. You just need to understand how kdensity constructs the curve it draws.
          For example:

          Code:
          webuse nlswork
          ** using gaussian kernel because it is the easiest one to replicate
          kdensity ln_wage, kernel(gaussian)
          return list
          global bw = r(bwidth)
          ** tracing a line at the mean, the 10th and 90th pctile
          sum ln_wage,d
          global p10= r(p10)
          global p90= r(p90)
          global mean= r(mean)
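          ** getting the kdensity: the estimate at a point is the average, over all
          ** observations, of a normal density centered at that point with sd = bandwidth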
          gen aux1 = normalden(ln_wage,$p10,$bw)
          gen aux2 = normalden(ln_wage,$p90,$bw)
          gen aux3 = normalden(ln_wage,$mean,$bw)
          sum aux1
          global k10 = r(mean)
          sum aux2
          global k90 = r(mean)
          sum aux3
          global kmean = r(mean)
          
          two (kdensity ln_wage, kernel(gaussian) xline($p10 $p90 $mean) yline($k10 $k90 $kmean) ) ///
                 (scatteri $k10 $p10 "q10" $k90 $p90 "q90" $kmean $mean "mean")

          For other kernels, one just needs to construct the corresponding kernel function.
          HTH
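
          The averages of normalden() work because, with a Gaussian kernel and bandwidth h, the estimate at a point x0 is the mean over observations of (1/h) * phi((x_i - x0)/h), and normalden(x, x0, h) is exactly that term. The same recipe with another kernel might look as follows. This is a sketch only: it uses kernel(epan2), for which K(z) = 0.75 * (1 - z^2) if |z| < 1 and 0 otherwise, it assumes no weights, and z and k are made-up names.

          ```stata
          sysuse auto, clear

          * bandwidth that kdensity picks for this kernel
          kdensity mpg, kernel(epan2) nograph
          local h = r(bwidth)

          summarize mpg, meanonly
          local m = r(mean)

          * scaled distance of each observation from the evaluation point
          generate double z = (mpg - `m') / `h' if !missing(mpg)

          * epan2 kernel value divided by the bandwidth; zero outside |z| < 1
          generate double k = cond(abs(z) < 1, 0.75 * (1 - z^2), 0) / `h' if !missing(z)

          * the density estimate at the mean is the average of those terms
          summarize k, meanonly
          display "by-hand density at the mean: " r(mean)
          ```

          Comparing the result with kdensity, kernel(epan2) at() generate() evaluated at the same point is a sensible check before relying on it.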



          • #6
            Many thanks, Nick and Fernando!
