
  • Measuring the Area Under a Kernel Density Curve

    Dear Statalist,
    I am studying ethnic disparities in conspicuous consumption at the household level. Following previous papers, I plot kernel densities by ethnicity, with total household expenditure on conspicuous goods on the x-axis. My professor suggested measuring the area under the kernel density curve for each ethnicity to obtain propensities to consume conspicuous goods, and using this as evidence supporting my empirical findings.

    So far, I have tried to get kernel density estimates with this command:
    kdensity consp if ethnicity=="A", gen(density)
    but I get an "invalid syntax" message.

    What code would let me quantify the area under the kernel density curve, and what would this value measure?

  • #2
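    The immediate syntax error is presumably that the generate() option needs two new variable names, one for the points at which the density is estimated and one for the density estimates themselves, even if you don't care equally about them.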
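    A minimal sketch of a corrected call, assuming (as in #1) that consp is numeric and ethnicity is a string variable. The names x_A and d_A and the cut-offs 1000 and 5000 are hypothetical. The official integ command then approximates the area under the estimated density over that band with the trapezoidal rule.

    ```stata
    * kdensity's generate() takes two new variable names: the grid of
    * evaluation points (x_A) and the density estimates there (d_A).
    kdensity consp if ethnicity == "A", generate(x_A d_A) nograph

    * Approximate the area under the estimated density between two
    * hypothetical cut-offs (1000 and 5000) using the trapezoidal rule.
    integ d_A x_A if inrange(x_A, 1000, 5000), trapezoid
    display "Estimated area between 1000 and 5000: " r(integral)
    ```

    Because the grid points rarely coincide exactly with the cut-offs, and the estimate inherits the kernel's smoothing, this number will generally differ somewhat from the share of households actually observed in the band.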

    Otherwise, I am at a loss to see how kernel density estimation is either easier or better to work with here than the cumulative distribution function, which you can calculate directly. Indeed, kernel density estimation can smear probability mass into impossible regions (for spending, negative values), which is often reckoned a minor issue if it helps visualization and qualitative understanding, but is possibly a problem if you want to read off quantitative estimates.
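    In symbols (a standard identity, shown for reference): if f is the density and F the cumulative distribution function of spending within a group, the area under f over an interval (a, b] is a difference of two values of F, and the empirical F can be computed directly from the n observations x_1, ..., x_n, with no kernel or bandwidth to choose.

    ```latex
    \int_a^b f(x)\,dx = F(b) - F(a),
    \qquad
    \widehat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},
    \qquad
    \widehat{F}_n(b) - \widehat{F}_n(a) = \frac{1}{n}\,\#\{\, i : a < x_i \le b \,\}
    ```

    So the empirical "area" over a band is simply the share of households whose spending falls in that band.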

    • #3
      Dear Nick, thank you for such a fast response.

      I got the point about the -gen- function. Would you mind suggesting corrected code (if applicable), or a way to calculate the KDF?

      Also, do you think that, at this early stage of the research and as part of the data summary, visualizing densities by ethnicity can provide some insight into disparities (without other controls, of course)?
      Last edited by Dina Mamadjanova; 28 Apr 2023, 07:12.

      • #4

        cumul is the longstanding official command for calculating cumulative distribution functions. You want to measure areas under two or more density functions. As each density has total area 1 (equivalently, each cumulative distribution function rises to 1), you presumably want to calculate areas over particular intervals, which are differences between values of the cumulative distribution function, but I can't suggest code without a more precise question.
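        One way to follow this advice, as a sketch: the data layout follows #1, while F_consp, in_band and the band (1000, 5000] are hypothetical choices to be replaced with the intervals of real interest. It assumes cumul accepts the by prefix (see help cumul). The share of each group falling in the band equals F(b) - F(a), so it can be read off without any kernel smoothing.

        ```stata
        * Empirical cumulative distribution function of conspicuous spending,
        * computed separately within each ethnic group.
        bysort ethnicity: cumul consp, generate(F_consp)

        * Compare the groups graphically: one step-function panel per group.
        twoway line F_consp consp, sort connect(J) by(ethnicity)

        * Area under each group's density over the hypothetical band (1000, 5000]
        * equals F(5000) - F(1000), i.e. the share of households in that band.
        generate byte in_band = consp > 1000 & consp <= 5000 if !missing(consp)
        tabstat in_band, by(ethnicity) statistic(mean n)
        ```

        The mean of in_band within each group is the group's proportion in the band; the n column shows how many households that proportion rests on.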

        generate() is an option, not a function.

        I can't predict how successful that will be for your data.
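        For the descriptive first look asked about in #3, a sketch that overlays kernel density estimates for two groups. The codes "A" and "B" are placeholders for the actual ethnicity values, and the /// line continuation works only in a do-file.

        ```stata
        * Overlay the estimated densities of conspicuous spending for two groups.
        * "A" and "B" are placeholder codes; add one kdensity layer per group.
        twoway (kdensity consp if ethnicity == "A") ///
               (kdensity consp if ethnicity == "B"), ///
               legend(order(1 "A" 2 "B")) ///
               xtitle("Household conspicuous expenditure") ytitle("Density")
        ```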

        • #5
          I'll take it from here. Thank you for the advice!
