Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • twoway: colored areas under kdensity-plot - but different colors by interval on x-axis

    Hello Listers,

    I hope this is not redundant but I've encountered a problem for which I could not find a solution, yet.
    Here is what I want to do:
    I want a kdensity plot and split it up by vertical lines that mark the quartiles of the distribution - so far so good.

    But here comes where I get stuck:
    I would like the area under the density plot to be shaded in different colors for each quartile, i.e. a different color for different intervals on the x-axis.
    I thought about using multiple kdensity plots with if-conditions for the xaxis, but of course then Stata will plot the density within the quartiles rather just a subinterval of the overall distribution (see Second try in code) -- this is not what I am after.

    The desired end result should look something like this:

    Click image for larger version

Name:	example.png
Views:	1
Size:	17.7 KB
ID:	1682080


    I made a little toy example here:
    Code:
    sysuse auto, clear
    
    sum weight, d
    local min = r(min)
    local p25 = r(p25)
    local p50 = r(p50)
    local p75 = r(p75)
    local max = r(max)
    
    local g1 = `min' + (`p25'-`min')/2
    local g2 = `p25' + (`p50'-`p25')/2
    local g3 = `p50' + (`p75'-`p50')/2
    local g4 = `p75' + (`max'-`p75')/2
    
    * First try:
    #delimit ;
    twoway (kdensity weight, recast(area) col(%50))
           (scatteri 0.00015 `g1' "G1", msym(none) mlabpos(12))
           (scatteri 0.00015 `g2' "G2", msym(none) mlabpos(12))
           (scatteri 0.00015 `g3' "G3", msym(none) mlabpos(12))
           (scatteri 0.00015 4000 "G4", msym(none) mlabpos(12))
           ,
           xline(`p25' `p50' `p75')
           legend(off)
           name(first_try,replace)
           ;
    #delimit ct
    
    * Second try:
    #delimit ;
    twoway (kdensity weight, recast(area) col(%50))
           (kdensity weight if inrange(weight, `min',`p25'), recast(area) col(%50))
           (kdensity weight if inrange(weight, `p25',`p50'), recast(area) col(%50))
           (scatteri 0.00015 `g1' "G1", msym(none) mlabpos(12))
           (scatteri 0.00015 `g2' "G2", msym(none) mlabpos(12))
           (scatteri 0.00015 `g3' "G3", msym(none) mlabpos(12))
           (scatteri 0.00015 4000 "G4", msym(none) mlabpos(12))
           ,
           xline(`p25' `p50' `p75')
           legend(off)
           name(second_try,replace)
           ;
    #delimit ct

  • #2
    In principle you need to calculate the density -- using kdensity not twoway kdensity -- and generate the density as a new variable -- and then, and only then, show segments of the density trace using different colours.

    In practice I had a couple of goes at this and couldn't avoid awkward overlaps or offlaps, presumably because the quartiles aren't necessarily values in the data. I stopped there. Interpolating extra values for graphical purposes might be needed.
    Last edited by Nick Cox; 15 Sep 2022, 11:19.

    Comment


    • #3
      Hi Prof Nick

      I am trying to estimate the following model.
      Y1= aX1 + bX2+ E.
      Y1 is a binary variable.
      X2 is also a binary variable.
      X2 is endogenous.
      I also have a sample selection problem.
      Y1 is the employment status in 2011 for all those who are already employed in 2005. Y1 is a binary variable, it takes value 1 if a person is employed in 2011 and takes value 0 if a person is not employed in 2011.

      My query:
      I am not able to figure out how to build this model in stata.
      The commands that I have come across so far:
      ivregress: I cannot use it as my dependent variable is binary.
      ivprobit: I cannot use it as my endogenous variable is binary, IVPROBIT assumes that the endogenous variable is continuous.
      If I use the heckprobit command, It would take care of the sample selection problem. But how do I correct for endogenity with sample selection.
      Is there any other command that I am not aware of in this discussion.


      Thank you
      PS: Sorry for diverting the discussion.

      Comment


      • #4
        Here is something that may work for you

        Code:
         webuse nlswork, clear
         bysort ln_wage:gen n=_n
         gen lnwage2=ln_wage if n==1
        * Extra but not needed
        ssc install palettes
        ssc install color_style
        set scheme white2
        color_style bay
        sum ln_wage,d
        local q25 = r(p25)
        local q50 = r(p50)
        local q75 = r(p75)
        two (area fx lnwage2 if lnwage2<`q25')  ///
            (area fx lnwage2 if lnwage2>=`q25' & lnwage2<`q50')  ///    
            (area fx lnwage2 if lnwage2>=`q50' & lnwage2<`q75')  ///    
            (area fx lnwage2 if lnwage2>=`q75' )  , legend(off)
        Last edited by FernandoRios; 15 Sep 2022, 11:30.

        Comment


        • #5
          Using Ben Jann's -colorpalette- ( Jann, B. (2018). Color palettes for Stata graphics. The Stata Journal 18(4): 765-785)

          Code:
          sysuse auto, clear
          sum weight, d
          local min = r(min)
          local p25 = r(p25)
          local p50 = r(p50)
          local p75 = r(p75)
          local max = r(max)
          
          set obs 10000
          kdensity weight, gen( x d) nograph n(10000)
          
          gen g = 1 if x <= `p25'
          replace g = 2 if x > `p25' & x < `p50'
          replace g = 3 if x > `p50' & x < `p75'
          replace g = 4 if x > `p75' 
          
          colorpalette viridis, n(4) nograph
           
          line d x if g == 1 , recast(area) color("`=r(p1)'")  /// 
              || line d x if g == 2 , recast(area) color("`=r(p2)'") /// 
              || line d x if g == 3 , recast(area) color("`=r(p3)'") /// 
              || line d x if g == 4 , recast(area) color("`=r(p4)'")  /// 
              || , legend(off) xlabel(2000(1000)5000) xscale(range(1400 5200))
          Click image for larger version

Name:	Graph.png
Views:	1
Size:	44.5 KB
ID:	1682101

          Comment


          • #6
            Originally posted by FernandoRios View Post
            Here is something that may work for you

            Code:
            webuse nlswork, clear
            bysort ln_wage:gen n=_n
            gen lnwage2=ln_wage if n==1
            * Extra but not needed
            ssc install palettes
            ssc install color_style
            set scheme white2
            color_style bay
            sum ln_wage,d
            local q25 = r(p25)
            local q50 = r(p50)
            local q75 = r(p75)
            two (area fx lnwage2 if lnwage2<`q25') ///
            (area fx lnwage2 if lnwage2>=`q25' & lnwage2<`q50') ///
            (area fx lnwage2 if lnwage2>=`q50' & lnwage2<`q75') ///
            (area fx lnwage2 if lnwage2>=`q75' ) , legend(off)
            Hello Fernando,
            thank you for your reply! I must admit that I did not really study your code, but it did not run through because the variable fx that you use in the twoway command is not in the data.
            Did you maybe forget to include the respective gen command in your code?
            Thanks!
            Best
            Boris

            Comment


            • #7
              Originally posted by Nick Cox View Post
              In principle you need to calculate the density -- using kdensity not twoway kdensity -- and generate the density as a new variable -- and then, and only then, show segments of the density trace using different colours.

              In practice I had a couple of goes at this and couldn't avoid awkward overlaps or offlaps, presumably because the quartiles aren't necessarily values in the data. I stopped there. Interpolating extra values for graphical purposes might be needed.
              Thank you, Nick!
              I thought about that, too. But then I did not know the kdensity command, so thank you for pointing this out!
              Best
              Boris

              Comment


              • #8
                Originally posted by Scott Merryman View Post
                Using Ben Jann's -colorpalette- ( Jann, B. (2018). Color palettes for Stata graphics. The Stata Journal 18(4): 765-785)

                Code:
                sysuse auto, clear
                sum weight, d
                local min = r(min)
                local p25 = r(p25)
                local p50 = r(p50)
                local p75 = r(p75)
                local max = r(max)
                
                set obs 10000
                kdensity weight, gen( x d) nograph n(10000)
                
                gen g = 1 if x <= `p25'
                replace g = 2 if x > `p25' & x < `p50'
                replace g = 3 if x > `p50' & x < `p75'
                replace g = 4 if x > `p75'
                
                colorpalette viridis, n(4) nograph
                
                line d x if g == 1 , recast(area) color("`=r(p1)'") ///
                || line d x if g == 2 , recast(area) color("`=r(p2)'") ///
                || line d x if g == 3 , recast(area) color("`=r(p3)'") ///
                || line d x if g == 4 , recast(area) color("`=r(p4)'") ///
                || , legend(off) xlabel(2000(1000)5000) xscale(range(1400 5200))
                [ATTACH=CONFIG]n1682101[/ATTACH]
                Hello Scott,
                I guess this is the practical application of what Nick also suggested and it works out fine in this case - even the color scheme you chose is the one I will be using
                Thank you!
                Best
                Boris

                Comment


                • #9
                  Hi Boris, absolutely Right, the code should have been

                  Code:
                  webuse nlswork, clear
                  bysort ln_wage:gen n=_n
                  gen lnwage2=ln_wage if n==1
                  * Extra but not needed
                  ssc install palettes
                  ssc install color_style
                  set scheme white2
                  color_style bay
                  sum ln_wage,d
                  local q25 = r(p25)
                  local q50 = r(p50)
                  local q75 = r(p75)
                  ** _> This line was missing
                  kdensity lnwage , gen(fx) at(lnwage2) nograph
                  two (area fx lnwage2 if lnwage2<`q25') ///
                  (area fx lnwage2 if lnwage2>=`q25' & lnwage2<`q50') ///
                  (area fx lnwage2 if lnwage2>=`q50' & lnwage2<`q75') ///
                  (area fx lnwage2 if lnwage2>=`q75' ) , legend(off)

                  Comment


                  • #10
                    Hi Fernando,
                    thanks for your reply!
                    Now I see, its a similar approach as taken by Scott.
                    Only Scott added artificial observation to smooth the kernel distribution and avoid gaps between the quartile-values (if I understood correctly), whereas you ranked the existing x-values and estimate and plot the density-value only for the "first" data point of each x-value.

                    Thank you for your help and pointing out the different options on how to apply the kdensity command for this purpose!

                    Best
                    Boris

                    Oh, btw there was a minor typo in the code line Fernando added (underscore added):

                    Code:
                    webuse nlswork, clear
                    bysort ln_wage:gen n=_n
                    gen lnwage2=ln_wage if n==1
                    * Extra but not needed
                    ssc install palettes
                    ssc install color_style
                    set scheme white2
                    color_style bay
                    sum ln_wage,d
                    local q25 = r(p25)
                    local q50 = r(p50)
                    local q75 = r(p75)
                    ** _> This line was missing
                    kdensity ln_wage , gen(fx) at(lnwage2) nograph
                    two (area fx lnwage2 if lnwage2<`q25') ///
                    (area fx lnwage2 if lnwage2>=`q25' & lnwage2<`q50') ///
                    (area fx lnwage2 if lnwage2>=`q50' & lnwage2<`q75') ///
                    (area fx lnwage2 if lnwage2>=`q75' ) , legend(off)

                    Comment

                    Working...
                    X