  • Creating a stacked area chart

    Hello,
    Being fairly new to Stata (version 17), I would appreciate your help.
    I have a dataset with British total and "specific" (customs, stamp duties...) receipts from 1700 to 1900; here is a sample with the first 25 observations:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int anno float(perc_customs_introiti perc_accise_introiti perc_stamp_duty_introiti perc_tasse_fondiarie_introiti perc_altre_tasse_introiti perc_tasse_redd_ricc_introiti)
    1700 36.616867  24.26046 2.1486816 32.379025 4.5949645 .
    1701  38.43966  26.91642 2.3244312  29.33358  2.985905 .
    1702  29.66581  29.44268 1.8395478 36.047203   3.00476 .
    1703  28.45495 31.285954  1.635186 35.222176   3.40173 .
    1704 26.879017  31.56375 1.7043867 36.355595 3.4972525 .
    1705 21.743856 33.568054 1.5973535  39.01701  4.073724 .
    1706 24.288326  31.84824 1.6179712  38.53585  3.709609 .
    1707  24.56408 31.992044  1.637297   37.3942  4.412377 .
    1708 23.734997 31.766684 1.7906865  38.43015 4.2774844 .
    1709 25.174925  30.00575 1.8211445  39.49966 3.4985144 .
    1710 24.485016 30.191654  1.868757  39.91779  3.536778 .
    1711  22.58868 32.060135  2.283298  40.13155   2.93634 .
    1712  25.49948  32.59208 3.0055594 36.223072 2.6798124 .
    1713  25.91516  36.66358 1.9294304 30.575745  4.916083 .
    1714  29.96764   39.1632 2.2792418  23.09755  5.492372 .
    1715  29.51447  41.76304  2.501912  21.39675  4.823831 .
    1716   27.2313  40.71797 2.2914875  26.17369 3.5855546 .
    1717  27.90652 36.977215   2.21598 29.537296  3.362984 .
    1718 27.597136  39.89545  2.276095 26.531116 3.7001975 .
    1719 27.068563  40.32622 2.3933446 25.835007 4.3768697 .
    1720 25.939896  39.80259  2.680255  24.81242  6.764836 .
    1721  24.28369   42.2039 2.3404963 26.211895   4.96002 .
    1722  24.88647  44.21716  2.331956 24.338257 4.2261586 .
    1723 27.096666  46.19822   2.45453  20.54143  3.709161 .
    1724 28.278706  45.70643  2.611796 20.486275  2.916792 .
    end
    All variables except 'anno' are percentages of total receipts, so they sum to 100. The last variable (perc_tasse_redd_ricc_introiti) is missing because there was no income tax in England before 1798. I would like to create a stacked area chart that shows the percentage composition of total receipts.
    I tried the following code:
    Code:
    twoway (area perc_customs_introiti perc_accise_introiti perc_stamp_duty_introiti perc_tasse_fondiarie_introiti perc_tasse_redd_ricc_introiti anno)
    but the areas overlap and they don't sum to 100, as you can see in the attached picture.

    [Attached image: forum.png]

    Can someone help me?

    Thank you.
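
    The question states that the shares sum to 100, and that can be checked before any plotting. The following is a minimal sketch using only the first three rows of the -dataex- listing above; the variable name total is an assumption introduced here for illustration. Note also that the twoway call quoted above lists five of the six share variables (perc_altre_tasse_introiti does not appear in it).

```stata
* first three observations copied from the -dataex- listing in the question
clear
input int anno float(perc_customs_introiti perc_accise_introiti perc_stamp_duty_introiti perc_tasse_fondiarie_introiti perc_altre_tasse_introiti perc_tasse_redd_ricc_introiti)
1700 36.616867  24.26046 2.1486816 32.379025 4.5949645 .
1701  38.43966  26.91642 2.3244312  29.33358  2.985905 .
1702  29.66581  29.44268 1.8395478 36.047203   3.00476 .
end

* rowtotal() treats missing values as 0, so the missing income tax share is ignored
* (total is a hypothetical variable name used only for this check)
egen double total = rowtotal(perc_*_introiti)
summarize total

* allow for rounding error in the stored float values
assert abs(total - 100) < 0.01
```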



  • #2
    Here's one way to do it:

    Code:
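    * components to stack on top of customs, in bottom-to-top order
    local stubs accise stamp_duty tasse_fondiarie altre_tasse tasse_redd_ricc
    
    * running total of the components stacked so far
    gen prev_var = perc_customs_introiti
     
    foreach stub of local stubs {
        * rowtotal() treats missing as 0, so a missing share adds a band of zero height
        egen cum_`stub' = rowtotal(prev_var perc_`stub'_introiti)
        replace prev_var = cum_`stub'
    }
    drop prev_var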
    
    #delimit ;
    twoway area perc_customs_introiti anno, base(0) ||
            rarea perc_customs_introiti cum_accise anno ||
            rarea cum_accise cum_stamp_duty anno ||
            rarea cum_stamp_duty cum_tasse_fondiarie anno ||
            rarea cum_tasse_fondiarie cum_altre_tasse anno ||
            rarea cum_altre_tasse cum_tasse_redd_ricc anno,
            legend(label(1 "customs") label(2 "accise") label(3 "stamp duty")
                label(4 "tasse fondiarie") label(5 "altre tasse") label(6 "tasse redd ricc") )
            scheme(s2color)
            ;
    #delimit cr
    which produces:
    [Attached image: Screenshot 2022-11-20 at 1.41.30 AM.png]

    Last edited by Hemanshu Kumar; 19 Nov 2022, 13:13.
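
    The same running-total idea can also be used to assemble the rarea calls in a loop, so that adding or reordering components does not require retyping the graph command. This is only a sketch under stated assumptions, not a replacement for the code above: the names lower0 and upper1 to upper6 and the local macros plots and prev are hypothetical, and just three rows of the example data are used.

```stata
* first three observations copied from the -dataex- listing in the question
clear
input int anno float(perc_customs_introiti perc_accise_introiti perc_stamp_duty_introiti perc_tasse_fondiarie_introiti perc_altre_tasse_introiti perc_tasse_redd_ricc_introiti)
1700 36.616867  24.26046 2.1486816 32.379025 4.5949645 .
1701  38.43966  26.91642 2.3244312  29.33358  2.985905 .
1702  29.66581  29.44268 1.8395478 36.047203   3.00476 .
end

* the lower edge of the first band is zero (lower0 is a hypothetical name)
gen double lower0 = 0
local plots
local prev lower0
local i = 0

foreach stub in customs accise stamp_duty tasse_fondiarie altre_tasse tasse_redd_ricc {
    local ++i
    * upper edge = previous edge + this share; a missing share counts as 0
    gen double upper`i' = `prev' + cond(missing(perc_`stub'_introiti), 0, perc_`stub'_introiti)
    * each band is drawn between the previous edge and the new one
    local plots `plots' (rarea `prev' upper`i' anno)
    local prev upper`i'
}

* the top edge of the last band should be 100, apart from rounding
assert abs(upper6 - 100) < 0.01

twoway `plots', ///
    legend(order(1 "customs" 2 "accise" 3 "stamp duty" 4 "tasse fondiarie" 5 "altre tasse" 6 "tasse redd ricc")) ///
    ytitle("% of total receipts") xtitle("")
```

    The /// line continuations work in a do-file or the Do-file Editor, not when typed line by line in the Command window.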



    • #3

      Hemanshu Kumar's helpful answer leaves a key point implicit: your multiple calls to twoway area are just superimposing areas; it's obvious to you but not at all to Stata that you want the areas stacked.

      Although this design is easy to understand in principle, and it is quite popular too, it is very often ineffective. As the category values are calculated as percents, any graph that echoes that constraint is in that sense just telling you what you know already and reduces the scope to see fine structure beyond that. The legend that springs into existence is a nuisance, yet with this design it is essential for showing which variable is which.

      It is especially important to have a design that will work well for the larger dataset you have, reaching nearer the present.

      I often find that it is obvious on subject-matter grounds that different components have different levels, spreads, etc. That being so, sacrificing showing all values on the same scale is often no sacrifice at all.

      Conversely, if keeping comparability is important, a line graph with logarithmic or similar scale could be a better idea.
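
      As a rough illustration of the line graph with logarithmic scale mentioned above, here is a minimal sketch on the first three rows of the example data. The axis labels and titles are assumptions chosen to suit this small sample, and a log scale requires strictly positive values, so zero shares in the full dataset would need separate handling.

```stata
* first three observations copied from the -dataex- listing in the question
clear
input int anno float(perc_customs_introiti perc_accise_introiti perc_stamp_duty_introiti perc_tasse_fondiarie_introiti perc_altre_tasse_introiti perc_tasse_redd_ricc_introiti)
1700 36.616867  24.26046 2.1486816 32.379025 4.5949645 .
1701  38.43966  26.91642 2.3244312  29.33358  2.985905 .
1702  29.66581  29.44268 1.8395478 36.047203   3.00476 .
end

* the income tax share is entirely missing in this sample, so it is left out
twoway line perc_customs_introiti perc_accise_introiti perc_stamp_duty_introiti ///
        perc_tasse_fondiarie_introiti perc_altre_tasse_introiti anno, ///
    yscale(log) ylabel(2 5 10 20 50, angle(horizontal)) ///
    ytitle("% of total receipts (log scale)") xtitle("") ///
    legend(order(1 "customs" 2 "accise" 3 "stamp duty" 4 "tasse fondiarie" 5 "altre tasse"))
```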

      Here I show your data example using multiline from SSC. As it's only a small fraction of your entire dataset, I haven't tried to improve on axis labels or to reproduce the variable labels that appear on your graph.

      I didn't do it here, but a reader who needs to be told that your variable is anno is out of their depth anyway. When the horizontal axis variable is clearly a time or date, the variable name or label is redundant.

      Code:
      drop perc_tasse_redd_ricc_introiti
      set scheme s1color
      ssc desc multiline
      * ssc install multiline
      multiline per* anno

      [Attached image: poles.png]

      Last edited by Nick Cox; 20 Nov 2022, 06:04.
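
      For readers who cannot install from SSC, a broadly similar display (one panel per component, each with its own vertical scale) can be approximated with official commands only. This is a sketch and not the multiline command itself; the names component, comp and share are hypothetical, and only three rows of the example data are used.

```stata
* first three observations copied from the -dataex- listing in the question
clear
input int anno float(perc_customs_introiti perc_accise_introiti perc_stamp_duty_introiti perc_tasse_fondiarie_introiti perc_altre_tasse_introiti perc_tasse_redd_ricc_introiti)
1700 36.616867  24.26046 2.1486816 32.379025 4.5949645 .
1701  38.43966  26.91642 2.3244312  29.33358  2.985905 .
1702  29.66581  29.44268 1.8395478 36.047203   3.00476 .
end

* the income tax share is entirely missing in this sample
drop perc_tasse_redd_ricc_introiti

* one row per year and component; @ marks where the component name sits
reshape long perc_@_introiti, i(anno) j(component) string
rename perc__introiti share

* numeric version of the component name, with value labels, for by()
encode component, gen(comp)

* one panel per component, stacked vertically, each with its own y scale
twoway line share anno, by(comp, cols(1) yrescale note("")) xtitle("") ytitle("")
```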
