Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Superimposing median line/point in a twoway kdensity plot

    Hi there,

    I am having issues with a twoway kdensity plot on which I would like to superimpose either a line or a point representing the median (by country).

    I am working with Stata 18 and my dataset is a subset of the ESS (European Social Survey).

    Following what has been said in other posts on the topic I coded the following:

    Code:
     egen median_Redistribution_norm=median(Redistribution_norm), by(cntry) /// to create a variable containing the median value for my variable "Redistribution_norm" for each country
    Then, I am stuck with the following twoway kdensity code where I do not know to add the abovementioned value (let's say in the form of a line):

    Code:
     twoway kdensity Red, bw(0.2) title(Redistribution Policy) range(0 1) xlabel(0 "Left" 1 "Right") by(cntry)
    I previously tried to use the xline() graph option, yet it appears to me it is only worth it when you insert the exact value.

    Thanks in advance

    Mattia

  • #2
    You don't show a data example or phrase your question using an accessible dataset, but here is some technique with a standard dataset that may help.

    This requires two passes, the first to note a suitable upper limit for probability density on the graph and a second to add the medians.

    Code:
    sysuse auto, clear
    egen median = median(mpg), by(foreign)
    
    twoway kdensity mpg, by(foreign)
    * this shows that the highest density is close to 0.1; your density maximum may easily be quite different
    gen y1 = 0
    gen y2 = 0.1
    
    twoway kdensity mpg, by(foreign, legend(off) note("spikes are medians")) || rspike y1 y2 median, ytitle(Probability density) xtitle("`: var label mpg'")
    As you imply, xline() would be of limited use here. One reason is that you may want a different line for each graph. Another, which doesn't bite so hard here, but can bite with other graph types, is that Stata wants to put added lines before it plots the data.
    Click image for larger version

Name:	density_median.png
Views:	1
Size:	35.3 KB
ID:	1731862
    EEDIT Compare the problem of recession shading as discussed e.g. in https://journals.sagepub.com/doi/pdf...867X1601600315
    Last edited by Nick Cox; 28 Oct 2023, 09:10.

    Comment


    • #3
      Dear Nick,

      sorry for the late reply. Moreover, I apologize for the lack of dataset example. Here's one:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte essround str2 cntry float Redistribution_norm
      9 "AT"   0
      9 "AT" .25
      9 "AT"   0
      9 "AT"   0
      9 "AT" .25
      9 "AT"   0
      9 "AT"   0
      9 "AT" .25
      9 "AT" .25
      9 "AT" .75
      9 "AT" .25
      9 "AT" .25
      9 "AT" .75
      9 "AT" .25
      9 "AT" .25
      9 "AT"   0
      9 "AT"  .5
      9 "AT"   0
      9 "AT" .25
      9 "AT" .75
      9 "AT" .25
      9 "AT" .25
      9 "AT"   0
      9 "AT"   0
      9 "AT"   0
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT"   0
      9 "AT" .25
      9 "AT" .25
      9 "AT"   1
      9 "AT"   0
      9 "AT"   0
      9 "AT" .25
      9 "AT" .75
      9 "AT" .25
      9 "AT"   0
      9 "AT" .75
      9 "AT"  .5
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT"   0
      9 "AT" .25
      9 "AT"   0
      9 "AT"  .5
      9 "AT" .25
      9 "AT" .25
      9 "AT"   0
      9 "AT" .75
      9 "AT" .25
      9 "AT"   0
      9 "AT"   0
      9 "AT"   0
      9 "AT"   .
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT"   0
      9 "AT" .25
      9 "AT"   0
      9 "AT"   1
      9 "AT"   1
      9 "AT"   0
      9 "AT"   .
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT"   0
      9 "AT" .25
      9 "AT" .75
      9 "AT" .25
      9 "AT"   0
      9 "AT" .25
      9 "AT"   0
      9 "AT"   0
      9 "AT" .25
      9 "AT"   0
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT"  .5
      9 "AT" .25
      9 "AT" .25
      9 "AT"  .5
      9 "AT" .25
      9 "AT" .25
      9 "AT" .25
      9 "AT" .75
      end
      label values Redistribution_norm Redistribution_std1
      label def Redistribution_std1 0 "0. Left", modify
      label def Redistribution_std1 1 "1. Right", modify
      Your suggestion worked perfectly. I would be curious to know whether the same is possible by showing a dot for the median rather than a line.

      Sincerely
      Mattia

      Comment


      • #4
        Dear Nick,

        sorry to bother you again.

        While the passes you suggestion worked perfectly for generating different graphs for each country, I encountered some issues when trying to generate single country's graphs. Here's the variables I used, followed by the codes and graphs:

        Dataset

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str2 cntry float(Redistribution_norm median_Redistribution_norm lowerlimit_Red_median upperlimit_Red_median)
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .75 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .75 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"  .5 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .75 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   1 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .75 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .75 .25 0 1.7
        "AT"  .5 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"  .5 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .75 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   . .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   1 .25 0 1.7
        "AT"   1 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   . .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .75 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"   0 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"  .5 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT"  .5 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .25 .25 0 1.7
        "AT" .75 .25 0 1.7
        end
        label values Redistribution_norm Redistribution_std1
        label def Redistribution_std1 0 "0. Left", modify
        label def Redistribution_std1 1 "1. Right", modify
        Code:

        Code:
        // add median lines
        gen lowerlimit_Red_median= 0 //lower limit
        gen upperlimit_Red_median = 1.7 //upperlimit
        
        // generating a kdensity plot for Austria & then Portugal
        
        twoway kdensity Red if cntry=="AT", bw(0.2) title(Redistribution Policy) range(0 1) xlabel(0(0.25)1) || rspike lowerlimit_Red_median upperlimit_Red_median median_Redistribution_norm, ytitle(Probability density) xtitle("`: var label Red'")
        
        twoway kdensity Red if cntry=="PL", bw(0.2) title(Redistribution Policy) range(0 1) xlabel(0(0.25)1)|| rspike lowerlimit_Red_median upperlimit_Red_median median_Redistribution_norm, ytitle(Probability density) xtitle("`: var label Red'")
        Graphs (here things get "weird"):

        Click image for larger version

Name:	Graph AT ESS.png
Views:	1
Size:	37.3 KB
ID:	1731917

        Click image for larger version

Name:	Graph PL ESS.png
Views:	1
Size:	32.5 KB
ID:	1731918



        I can't understand if it relates to an issue in the code or not.

        Sincerely
        Mattia
        Last edited by Mattia Gatti; 29 Oct 2023, 05:08.

        Comment


        • #5
          Surely, as you just need to change the command to plot the median. Let's suppose that we plot the median with a vertical position of 0.

          Your data example shows one country, but the code should be good for all. But for that country the observed values are just 0(0.25)1 and the median can only be one of those or halfway between two of those (i.e. one of 0.125, 0.375, ...).

          Whether that is typical and kdensity is a good idea is over to you.

          Note that whenever there is data for the extremes 0 and 1, half the contribution of each of those values to the density estimate is not plotted. (I am guessing that values outside [0, 1] don't make sense, but the kernel density estimation isn't adapted to do anything special at the extremes of the data.)

          I added a rug plot, meaning a marginal plot showing the distinct observed values.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte essround str2 cntry float Redistribution_norm
          9 "AT"   0
          9 "AT" .25
          9 "AT"   0
          9 "AT"   0
          9 "AT" .25
          9 "AT"   0
          9 "AT"   0
          9 "AT" .25
          9 "AT" .25
          9 "AT" .75
          9 "AT" .25
          9 "AT" .25
          9 "AT" .75
          9 "AT" .25
          9 "AT" .25
          9 "AT"   0
          9 "AT"  .5
          9 "AT"   0
          9 "AT" .25
          9 "AT" .75
          9 "AT" .25
          9 "AT" .25
          9 "AT"   0
          9 "AT"   0
          9 "AT"   0
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT"   0
          9 "AT" .25
          9 "AT" .25
          9 "AT"   1
          9 "AT"   0
          9 "AT"   0
          9 "AT" .25
          9 "AT" .75
          9 "AT" .25
          9 "AT"   0
          9 "AT" .75
          9 "AT"  .5
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT"   0
          9 "AT" .25
          9 "AT"   0
          9 "AT"  .5
          9 "AT" .25
          9 "AT" .25
          9 "AT"   0
          9 "AT" .75
          9 "AT" .25
          9 "AT"   0
          9 "AT"   0
          9 "AT"   0
          9 "AT"   .
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT"   0
          9 "AT" .25
          9 "AT"   0
          9 "AT"   1
          9 "AT"   1
          9 "AT"   0
          9 "AT"   .
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT"   0
          9 "AT" .25
          9 "AT" .75
          9 "AT" .25
          9 "AT"   0
          9 "AT" .25
          9 "AT"   0
          9 "AT"   0
          9 "AT" .25
          9 "AT"   0
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT"  .5
          9 "AT" .25
          9 "AT" .25
          9 "AT"  .5
          9 "AT" .25
          9 "AT" .25
          9 "AT" .25
          9 "AT" .75
          end
          label values Redistribution_norm Redistribution_std1
          label def Redistribution_std1 0 "0. Left", modify
          label def Redistribution_std1 1 "1. Right", modify
          
          egen median = median(Redistribution_norm), by(cntry)
          
          gen y1 = 0
          gen y2 = -0.1
          
          twoway kdensity Redistribution_norm, by(cntry, legend(off) note("dots are medians")) || scatter y1 median, ytitle(Probability density) xtitle("some better text needed here") || scatter y2 Redistribution_norm, ms(|) msize(*2)
          Click image for larger version

Name:	density_median2.png
Views:	1
Size:	29.1 KB
ID:	1731916

          Last edited by Nick Cox; 29 Oct 2023, 05:11.

          Comment


          • #6
            Dear Nick,

            the trick works perfectly, thanks a lot. I can now use both the line and the dot.

            Unfortunately (as it happened for the line) I still get a very similar issue when I try to generate the graph for single countries. In this case, not two lines, but 2 dots.

            CODE:
            Code:
            twoway kdensity Redistribution_norm if cntry=="AT", bw(0.2) || scatter y1  median_Redistribution_nor, ytitle(Probability density) xtitle("Redistribution")
            Click image for larger version

Name:	Graph AT ESS scatter median.png
Views:	1
Size:	34.8 KB
ID:	1731939


            Sincerely
            Mattia

            PS. The height of the dot for the median was purposively changed to y1=1.

            Comment


            • #7
              You need the by() option to avoid multiple dots, I think.

              Comment


              • #8
                Reverting to #2 it's a little simpler to use spike rather than rspike.

                Code:
                sysuse auto, clear
                egen median = median(mpg), by(foreign)
                
                twoway kdensity mpg, by(foreign)
                * this shows that the highest density is close to 0.1; your density maximum may easily be quite different
                
                gen y2 = 0.1
                
                twoway kdensity mpg, by(foreign, legend(off) note("spikes are medians")) || spike y2 median, ytitle(Probability density) xtitle("`: var label mpg'")

                Comment


                • #9
                  Dear Nick,

                  thanks a lot for the insights on the spike function. As for the issue of the two dots/spikes, I see no solution rather than generating multiple graphs by(cntry).

                  Sincerely

                  Mattia

                  Comment


                  • #10
                    I thought that was what you wanted to do! But if you want just one country, the if condition must appear throughout.

                    Code:
                     
                     twoway kdensity Redistribution_norm if cntry=="AT", bw(0.2) || scatter y1  median_Redistribution_nor if cntry == "AT", ytitle(Probability density) xtitle("Redistribution")

                    Comment


                    • #11
                      Thanks a lot, amazing! Now it works also on single countries.

                      Best
                      Mattia

                      Comment

                      Working...
                      X