  • Labeling a density curve

    Sorry, I'm too stupid to figure this out myself.

    I want to make a density curve, so I'll do this:

    Code:
    sysuse auto, clear
    
    kdensity mpg, gen(x y) at(mpg) nograph
    
    // Density curve
    twoway (line y x, sort)
    I then want to label a point on the density curve:

    Code:
    // Label density curve
    qui sum mpg, detail
    local median `r(p50)'
    
    twoway (line y x, sort) ///
           (scatteri .015 `median' "`median' is the median")  ///
           , legend(off)
    Here, .015 is of course just a stupid guess to illustrate my point. I want the label to sit on the curve instead (as illustrated in MS Paint):

    [Image: Graph.jpg]


    Is there a way to do that without MS Paint?

    Bonus question: I want to add a vertical line to the plot:

    Code:
    twoway (line y x, sort) ///
           (scatteri .015 `median' "`median' is the median")  ///
           , legend(off) xline(`median', lcolor(gs6))
    Is there a way I can make the vertical line start on the x-axis and end at the density curve?

    Thanks so much
    Go

  • #2
    Code:
    sysuse auto, clear
    
    kdensity mpg, gen(x y) at(mpg) nograph
    
    // Label density curve
    qui sum mpg, detail
    local median `r(p50)'
    
    // Density at the median; finds a match only if the median occurs in the data
    levelsof y if x==`median', local(ymed)
    
    twoway (line y x, sort) ///
           (scatteri `ymed' `median' "`median' is the median", msym(circle)) ///
           (scatteri `ymed' `median' 0 `median', lcolor(gs6) recast(line)) ///
           , legend(off)



    • #3
      @Ali Atia's trick works because, and only because, the median is also a value occurring in the data, which need not be the case.

      In general, the number of values is either even or odd.

      If the number is odd, the median should also occur in the data.

      In the auto data, we have 74 values of mpg, an even number, so by the usual convention the median is taken to be half-way between the 37th and 38th ranked values. Here they are both 20, so the median is also a value that occurs in the data.

      However, that coincidence is not guaranteed. For example, the method falls over with price in the auto data, for which there are 74 distinct values, so that the median doesn't also occur as a value in the data.
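
      As a rough sketch of that point (an illustration added here, assuming only the auto data shipped with Stata), one can count how many observations are exactly equal to the median of each variable. A count of zero means that a lookup of the form "if x == median" has nothing to match. The loop and the display text are illustrative choices.

      Code:
      sysuse auto, clear

      foreach v in mpg price {
          quietly summarize `v', detail
          local med = r(p50)               // median as reported by summarize
          quietly count if `v' == `med'    // observations exactly equal to the median
          display "`v': median = `med', observations equal to the median = " r(N)
      }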

      The method below seems to work well enough in such cases, although conversely I suspect that it can fall over given extensive ties in the data. The good news, I think, is that if this method doesn't work then Ali's method should.


      Code:
      sysuse auto, clear
      
      local outcome price 
      
      kdensity `outcome', gen(x y) at(`outcome') nograph
      
      // Label density curve
      qui sum `outcome', detail
      local median `r(p50)'
      // Number of non-missing values
      count if `outcome' < .
      local N = r(N)
      // Average the density over the middle-ranked observation(s)
      egen rank = rank(`outcome')
      su y if inrange(rank, `N'/2, `N'/2 + 1), meanonly
      local ymed = r(mean)
      
      twoway (line y x, sort) ///
             (scatteri `ymed' `median' "`median' is the median", msym(circle)) ///
             (scatteri `ymed' `median' 0 `median', lcolor(gs6) recast(line)) ///
             , legend(off)



      • #4
        Just a word of caution here: while this doesn't happen in this data, you can end up with a median that is not one of the values that the variable takes. This code would then fail. It might be simpler to just mark a vertical line at the median value, and not worry about positioning its label and marker at the corresponding density value.
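
        A minimal sketch of that simpler alternative, using price from the auto data as the example. The use of twoway kdensity and the note() text are illustrative assumptions, not taken from the posts above.

        Code:
        sysuse auto, clear
        quietly summarize price, detail
        local median = r(p50)

        // Density curve with a full-height vertical line at the median;
        // the median value is reported in a note rather than on the curve
        twoway (kdensity price), ///
               xline(`median', lcolor(gs6)) ///
               note("Vertical line marks the median (`median')")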

        Edit: This got cross-posted with #3. I meant to refer to Ali's code in #2.
        Last edited by Hemanshu Kumar; 05 Sep 2022, 23:49.



        • #5
          Here is an approach which should be robust to the issues mentioned in #3 and #4.

          Code:
          sysuse auto, clear
          local outcome gear_ratio
          kdensity `outcome', gen(x y) at(`outcome') nograph
          
          // Label density curve
          qui sum `outcome', detail
          local median `r(p50)'
          
          // Add one observation at the median, then interpolate the density there
          insobs 1
          replace x = `median' in `=_N'
          ipolate y x, gen(z)
          local ymed = z[_N]
          
          twoway (line y x, sort) ///
                 (scatteri `ymed' `median' "`median' is the median", msym(circle)) ///
                 (scatteri `ymed' `median' 0 `median', lcolor(gs6) recast(line)) ///
                 , legend(off)
          Below is the output for all numeric variables in the auto dataset:

          [Image: 5.png]
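
          The loop behind that figure is not shown in the post. A sketch of how it might look, under assumptions added here: missing values are dropped inside preserve/restore before each run, the graph names g_varname are hypothetical, and graph combine is one possible way to assemble the panels.

          Code:
          sysuse auto, clear
          quietly ds, has(type numeric)
          local vars `r(varlist)'
          local graphs

          foreach v of local vars {
              preserve
              quietly drop if missing(`v')     // assumption: ignore missing values
              kdensity `v', gen(x y) at(`v') nograph
              quietly summarize `v', detail
              local median `r(p50)'

              // Extra observation at the median; density filled in by interpolation
              quietly insobs 1
              quietly replace x = `median' in `=_N'
              quietly ipolate y x, gen(z)
              local ymed = z[_N]

              twoway (line y x, sort) ///
                     (scatteri `ymed' `median' "`median' is the median", msym(circle)) ///
                     (scatteri `ymed' `median' 0 `median', lcolor(gs6) recast(line)) ///
                     , legend(off) title("`v'") name(g_`v', replace) nodraw
              local graphs `graphs' g_`v'
              restore
          }

          // One panel per numeric variable
          graph combine `graphs'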
          Last edited by Ali Atia; 06 Sep 2022, 09:58.

