  • Creating overlapped area plot

    Hello folks,

    I have four variables A, B, C and D. I am trying to create an area plot.

    gen total= rowtotal( A B C),m
    gen A_share=A/total
    gen B_share=B/total
    gen C_share=B/total

    D is a continuous variable ranging from 0 to 100

    I am trying to create an area plot showing how the shares of the three variables evolve with D.

    I could not figure out the best way to do this. The closest I have come is to create cumulative variables as follows and then perhaps overlay the curves:

    gen B_share_cuml= A_share+B_share
    gen C_share_cuml= Bshare_cuml+C_share

    Note C_share_cuml is 1 throughout.

    Then, I created line charts using:
    twoway lowess A_share D
    twoway lowess B_share_cum D
    twoway lowess C_share_cuml D

    But I cannot figure out how to combine these three lines in the same chart.


    I also tried twoway area, but the chart did not come out anywhere close to a smooth area plot.

    twoway area A_share D


    Any suggestions would be hugely appreciated. Thanks in advance.



  • #2
    A stacked area chart is a design that is popular but in my view over-rated. If you scale to total 1, then the graph echoes the structure you imposed. More importantly, it isn't always easy -- or even ever easy -- to look at or for fine structure, especially for categories not at the top or bottom of the chart, where visual subtraction of two irregular curves is needed to assess variation.

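    That prejudice stated, there are small typos in your code that may distract rapid readers: C_share is computed from B rather than C, and the names Bshare_cuml and B_share_cum do not match B_share_cuml. Also, rowtotal() requires egen, not generate, and will fail otherwise.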

    For a cumulative area plot there are indeed -- so far as I know -- no real short-cuts. I would have written a wrapper for this long ago if I thought it was a good idea, but I don't, sorry, and in any case it seems that no-one else has either.

    The first area to be plotted starts at 0 and the last area to be plotted ends at 1. The easiest recipe I know is exemplified below. In the case of three categories you only need to calculate fraction of A + fraction of B.

    I also show two simple alternatives. I didn't go as far as smoothing. My guess is that even if

    A + B + C is scaled to total 1

    it doesn't exactly follow that

    smooth(A) + smooth(B) + smooth(C)

    will total 1, although that may be roughly true.
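
    One way to check that guess empirically is sketched below. This is only an illustration on simulated data set up like the example in this post; the variable names sA, sB, sC and ssum are made up here, and the default lowess bandwidth is an arbitrary choice.

    ```stata
    * Illustrative check on simulated data (not the poster's data)
    clear
    set obs 101
    set seed 314159265

    gen D = _n - 50
    gen A = 3 + 0.01 * D + rnormal(0, 0.1)
    gen B = 4 + rnormal(0, 0.1)
    gen C = 3 - 0.01 * D + rnormal(0, 0.1)

    foreach v in A B C {
        * proportions that total 1 by construction
        gen p`v' = `v' / (A + B + C)
        * generate() stores the smooth; nograph suppresses the plot
        lowess p`v' D, generate(s`v') nograph
    }

    * total of the three smoothed proportions, to compare with 1
    gen ssum = sA + sB + sC
    summarize ssum
    ```

    If the total drifts from 1 in your own data, dividing each smooth by ssum restores the constraint before plotting.
    
    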


    Code:
    clear
    set scheme s1color
    
    set obs 101
    set seed 314159265
    
    gen D = _n - 50
    gen A = 3 + 0.01 * D + rnormal(0, 0.1)
    gen B = 4 + rnormal(0, 0.1)
    gen C = 3 - 0.01 * D + rnormal(0, 0.1)
    
    foreach v in A B C {
        gen p`v' = `v' / (A + B + C)
    }
    
    gen pAB = pA + pB
    gen one = 1
    
    tokenize `" "230 159 0" "86 180 233" "0 158 115" "'
    
    twoway area pA D, base(0) color(`"`1'"') ///
        || rarea pA pAB D, color(`"`2'"') ///
        || rarea pAB one D, color(`"`3'"') ///
        legend(order(3 "C" 2 "B" 1 "A") pos(3) col(1)) name(G1, replace) xla(-50(10)50)
    
    * ssc install multiline
    multiline p? D , name(G2, replace) xla(-50(10)50)
    
    line p? D, lc("`1'" "`2'" "`3'") legend(order(2 "B" 3 "C" 1 "A") pos(3) col(1)) name(G3, replace)
    [Figure: stack1.png]


    This one uses multiline which must be installed from SSC before you can use it.
    [Figure: stack2.png]


    [Figure: stack3.png]



    • #3
      Thanks Nick for your reply.

      I tried to replicate your code as follows but the results were quite messy.

      gen hosp_cuml=drug_share1+hosp_share1
      gen other_cuml=hosp_cuml+ other_share1
      gen tot_cuml=1

      tokenize `" "230 159 0" "86 180 233" "0 158 115" "'

      twoway area drug_share1 he_share_income1 if he_share_income1<1, base(0) color(`"`1'"') || rarea drug_share1 hosp_cuml he_share_income1 if he_share_income1<1, color(`"`2'"') || rarea hosp_cuml tot_cuml he_share_income1 if he_share_income1<1, color(`"`3'"') legend(order(3 "Other costs" 2 "Hospital charges" 1 "Drug cost") pos(3) col(1)) name(G1, replace) xla(0(.1)1)

      I have no clue where I am going wrong. Also, is there any way to smooth the area plot, similar to what the lowess command does?
      [Figure: G1.png]



      • #4
        Try sorting the data by he_share_income1 before plotting?
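
        A minimal sketch of that suggestion, using simulated data like the example in #2 rather than the poster's variables. The shuffle step and the graph name G1sorted are made up for illustration. Either sort the data on the x variable first, or add the sort option to each area and rarea plot; the sketch shows both, although one is enough.

        ```stata
        * Simulated data (not the poster's data)
        clear
        set obs 101
        set seed 314159265

        gen D = _n - 50
        gen A = 3 + 0.01 * D + rnormal(0, 0.1)
        gen B = 4 + rnormal(0, 0.1)
        gen C = 3 - 0.01 * D + rnormal(0, 0.1)

        gen pA  = A / (A + B + C)
        gen pAB = (A + B) / (A + B + C)
        gen one = 1

        * Shuffle the rows to mimic data that are not in order of D;
        * plotting now would join points in this random order
        gen double u = runiform()
        sort u

        * Option 1: sort the data on the x variable before plotting
        sort D

        * Option 2: the sort option makes each plot sort on D itself
        twoway area pA D, sort base(0) ///
            || rarea pA pAB D, sort ///
            || rarea pAB one D, sort ///
            legend(order(3 "C" 2 "B" 1 "A") pos(3) col(1)) name(G1sorted, replace)
        ```
        
        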



        • #5
          Thanks Himansu. This is huge progress. Any idea how I can smooth the area plot, similar to what the lowess command does?
          [Figure: G1.png]



          • #6
            As you suggested in #1, you can smooth the proportions first with lowess or any other smoothing method of your choice. But even before you do that, it is possible that you have ties on your predictor, and if so you can take averages first.

            Code:
            foreach v in  drug hosp other { 
                egen `v'_share_mean = mean(`v'_share1), by(he_share_income1) 
            } 
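
            Putting the pieces of this thread together, the whole sequence might look roughly like the sketch below: average over ties, smooth each share with lowess, rescale so the smooths total 1, then stack. The data are simulated stand-ins that borrow the thread's variable names; the names tag, *_sm, sm_total, drug_top, hosp_top, one and G1smooth are made up here, and the default lowess bandwidth is an arbitrary choice. Note that lowess on the averaged data treats every distinct x value equally, regardless of how many observations went into each mean.

            ```stata
            * Simulated stand-in data (not the poster's data)
            clear
            set obs 500
            set seed 2803

            * round x to two decimals so that ties occur
            gen he_share_income1 = round(runiform(), 0.01)
            gen drug  = exp(rnormal(0, 0.3)) * (1 + he_share_income1)
            gen hosp  = exp(rnormal(0, 0.3))
            gen other = exp(rnormal(0, 0.3)) * (2 - he_share_income1)

            foreach v in drug hosp other {
                gen `v'_share1 = `v' / (drug + hosp + other)
            }

            * Step 1 (as in #6): average each share over ties in the predictor
            foreach v in drug hosp other {
                egen `v'_share_mean = mean(`v'_share1), by(he_share_income1)
            }

            * keep one observation per distinct x value
            egen byte tag = tag(he_share_income1)
            keep if tag

            * Step 2: store a lowess smooth of each mean share without graphing
            foreach v in drug hosp other {
                lowess `v'_share_mean he_share_income1, generate(`v'_sm) nograph
            }

            * Step 3: rescale so the smoothed shares total 1, then cumulate
            gen double sm_total = drug_sm + hosp_sm + other_sm
            gen double drug_top = drug_sm / sm_total
            gen double hosp_top = (drug_sm + hosp_sm) / sm_total
            gen byte one = 1

            * Step 4: stacked areas, sorted on x
            twoway area drug_top he_share_income1, sort base(0) ///
                || rarea drug_top hosp_top he_share_income1, sort ///
                || rarea hosp_top one he_share_income1, sort ///
                legend(order(3 "Other costs" 2 "Hospital charges" 1 "Drug cost") pos(3) col(1)) ///
                name(G1smooth, replace)
            ```
            
            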
