  • Creating a histogram that is both grouped and stacked, with another variable that is represented by a line

    Hello Stata people;

    I'm using Stata version 13.1 while working with this dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int anne float(impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen) long totaldesrecettesfiscalesenmillio
    2022   4   5  8.2 10.2 15.5 28.7 28.4 35448
    2023 2.5 4.8  9.4 10.2 16.8 27.7 28.6 39200
    2024   3   5   10  9.5 15.9 26.9 29.6 41755
    2025 1.5   5 12.6  9.6 15.6   27 28.7 44523
    2026 1.3   5 12.9  9.6 15.4   27 28.5 47773
    end
    format %ty anne
    This data shows the evolution and structure of tax revenues for a given economy. The first 7 variables are expressed in %, the 8th is expressed in monetary units. The observations are annual.

    The goal is to draw a histogram that is both grouped by year and stacked for the first 7 variables, while the last variable (expressed in monetary units) is drawn as a line, all in the same graph.

    Is there a solution to this please?

    With many thanks!

  • #2
    Here is your requested graph, but there are better ways to visualize these data. You could, for example, place the line graph in a separate panel below the bar graphs. Additionally, the stacked design may not be ideal with so many categories, as readers would need to move back and forth between the legend and the graph. See tabplot (from the Stata Journal) for a design that is more readable and visually appealing. Finally, you should work on shortening your variable names.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int anne float(impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen) long totaldesrecettesfiscalesenmillio
    2022   4   5  8.2 10.2 15.5 28.7 28.4 35448
    2023 2.5 4.8  9.4 10.2 16.8 27.7 28.6 39200
    2024   3   5   10  9.5 15.9 26.9 29.6 41755
    2025 1.5   5 12.6  9.6 15.6   27 28.7 44523
    2026 1.3   5 12.9  9.6 15.4   27 28.5 47773
    end
    format %ty anne
    
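    * Go long: one observation per year and tax category; the shares end up in -en-
    reshape long @en, i(anne) j(which) string
    * Running total of the shares within each year gives the top of each stacked segment
    bys anne (which): gen cumul= sum(en)
    * One variable per category (cumul1-cumul7), so each can be drawn as its own bar
    separate cumul, by(which) veryshortlabel
    * Tallest running total is drawn first so shorter bars overlay it; the total goes on a second y axis
    tw (bar cumul7 cumul6 cumul5 cumul4 cumul3 cumul2 cumul1 anne) ///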
       (line total anne, yaxis(2)), xtitle("") ytitle(Percent) ///
       ysc(r(0, 100))  xlab(, noticks) ylab(0(20)100) ///
       plotregion(margin(zero))
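
    The 0 to 100 axis range relies on the seven shares adding up to 100 in every year. A minimal check, assuming the code above has just been run (so the data are in long layout and cumul exists); the 0.01 tolerance is an arbitrary allowance for float rounding:

```stata
* Sketch: assumes the reshape/cumul code above has just been run.
* Show how the bar heights are built up for one year.
list anne which en cumul if anne == 2022, noobs

* The last (largest) running total within each year should be 100,
* up to float rounding; the 0.01 tolerance is an arbitrary choice.
bysort anne (which): assert abs(cumul[_N] - 100) < 0.01
```

    If the assert fails for some year, the tallest bar in that year will not reach the top of the percent axis.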
    [Image: Graph.png]

    • #3
      Andrew Musau Thanks for the help. Could the graph be improved further, though? For example, is it possible to show the percentages of each of the first seven categories inside each bar, and then draw the last variable (the one expressed in monetary units) as a line on top of all the bars, with the y axis measuring that variable rather than the percentages of the other seven (since those values would already be shown inside each bar)? I think that would improve the graph even more.

      I hope my idea is well expressed...

      With many thanks!

      • #4
        Andrew Musau I more or less got the first part of the solution with this command: "graph bar impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen , over( anne ) stack". That gives me the stacked bar graph for the first 7 variables, but I still want to add the last variable as a line on the same graph.
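
        For the percentages inside each bar, graph bar has a blabel() option. A minimal sketch, assuming the dataex example is in memory in its original wide layout; the %4.1f format, the text sizes and the legend layout are arbitrary choices:

```stata
* Sketch: assumes the wide dataex example is in memory.
* (asis) plots the stored values rather than means; with one observation
* per year the result is the same, but the intent is clearer.
graph bar (asis) impotssurlessocitsptroliresen droitsdedouaneen       ///
    impotssurlessocitsnonptroliresen droitsdeconsommationen           ///
    diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen,  ///
    over(anne) stack                                                  ///
    blabel(bar, position(center) format(%4.1f) size(vsmall))          ///
    ytitle("Percent") legend(size(vsmall) cols(2))
```

        This does not add the line: graph bar is not a twoway plot type and cannot be overlaid with twoway line, which is why #2 builds the stacked bars with twoway bar instead.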

        • #5
          Unsurprisingly I agree with Andrew Musau. I am at a loss to know why you didn't try what he suggested.

          Here is an example of that.

          It is hard to imagine your graph being acceptable to any reader (unless you're the entire readership) without work on the category titles and without some consideration of the best ordering of categories. Perhaps the existing order makes full sense, in which case go ahead with it. Here I sorted categories by the median percent across years.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int anne float(impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen) long totaldesrecettesfiscalesenmillio
          2022   4   5  8.2 10.2 15.5 28.7 28.4 35448
          2023 2.5 4.8  9.4 10.2 16.8 27.7 28.6 39200
          2024   3   5   10  9.5 15.9 26.9 29.6 41755
          2025 1.5   5 12.6  9.6 15.6   27 28.7 44523
          2026 1.3   5 12.9  9.6 15.4   27 28.5 47773
          end
          format %ty anne
          
          reshape long @en, i(anne) j(which) string
          
          myaxis whichd=which, sort(median en) descending
          
          tabplot whichd anne [iw=en], separate(whichd) showval xtitle("") ytitle("") subtitle("pour cent") name(Gt, replace)
          [Image: aziz5.png]

          tabplot and myaxis are from the Stata Journal. I can't test them with Stata 13, but I am optimistic that you can get them to work.

          I agree also that the total is just better plotted separately. What you ask is however programmable.

          • #6
            Nick Cox Thanks for the suggested code. I did try Andrew Musau's code, and it did work for me, but I wanted a representation that would make the graph easier to read. It's not that Andrew Musau's suggestion was bad; your suggestion, Nick Cox, makes the presentation even better. Yet I still have the problem of drawing the totaldesrecettesfiscalesenmillio variable as a line within the same graph as the other 7 variables, which are represented by bars (or perhaps the totaldesrecettesfiscalesenmillio line could be presented in a separate sub-graph or something like that).

            I'm open to your suggestions, Nick Cox.

            With thanks

            • #7
              As said, I agree with Andrew Musau on the line graph of total amounts. See #2:

              "You could, for example, place the line graph in a separate panel below the bar graphs."
              Sorry, but I won't suggest code for what I think are bad ideas. That applies whether superimposing a line is your idea or you're under instruction to do this. More positively, you already know how to use twoway line, so there you go.
              Last edited by Nick Cox; 17 May 2026, 05:17.

              • #8
                [Image: aziz6.png]
                This may help a little.

                The original variable names need to be edited down -- and punctuated. I have made a guess at some shorter names.

                Combining these graphs vertically will just produce a mess as the y axes and their text won't align nicely.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int anne float(impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen) long totaldesrecettesfiscalesenmillio
                2022   4   5  8.2 10.2 15.5 28.7 28.4 35448
                2023 2.5 4.8  9.4 10.2 16.8 27.7 28.6 39200
                2024   3   5   10  9.5 15.9 26.9 29.6 41755
                2025 1.5   5 12.6  9.6 15.6   27 28.7 44523
                2026 1.3   5 12.9  9.6 15.4   27 28.5 47773
                end
                format %ty anne 
                
                foreach v in impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen { 
                    local V : subinstr local v "impotssur" "", all
                    local V : subinstr local V "impotsur" "", all 
                    local V : subinstr local V "droitsde" "" , all 
                    local V : subinstr local V "impotsetdroits" "", all 
                    
                    rename `v' `V'
                }
                
                reshape long @en, i(anne) j(which) string
                
                myaxis whichd=which, sort(median en) descending
                
                tabplot whichd anne [iw=en], barw(0.8) separate(whichd) showval xtitle("") ytitle("") subtitle("pour cent") name(Gt, replace) 
                
                twoway connected total anne, name(Gl, replace) yla(36000(4000)48000) xtitle("")
                
                graph combine Gt Gl

                • #9
                  Originally posted by Nick Cox:
                  Combining these graphs vertically will just produce a mess as the y axes and their text won't align nicely.
                  In such situations, I introduce a left margin to the graph region of the graph with the smallest margin. Here, a value of 20 works well. The x-axis of the line graph also needs some extension on both sides. The first combined graph shows the alignment, while the second removes the x-axis labels from the graph in the top panel.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input int anne float(impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen) long totaldesrecettesfiscalesenmillio
                  2022   4   5  8.2 10.2 15.5 28.7 28.4 35448
                  2023 2.5 4.8  9.4 10.2 16.8 27.7 28.6 39200
                  2024   3   5   10  9.5 15.9 26.9 29.6 41755
                  2025 1.5   5 12.6  9.6 15.6   27 28.7 44523
                  2026 1.3   5 12.9  9.6 15.4   27 28.5 47773
                  end
                  format %ty anne
                  
                  foreach v in impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen {
                      local V : subinstr local v "impotssur" "", all
                      local V : subinstr local V "impotsur" "", all
                      local V : subinstr local V "droitsde" "" , all
                      local V : subinstr local V "impotsetdroits" "", all
                      
                      rename `v' `V'
                  }
                  
                  reshape long @en, i(anne) j(which) string
                  
                  myaxis whichd=which, sort(median en) descending
                  
                  tabplot whichd anne [iw=en], barw(0.8) separate(whichd) showval xtitle("") ytitle("") subtitle("pour cent") name(Gt, replace)
                  
                  twoway connected total anne, name(Gl, replace) yla(36000(4000)48000) xsc(r(2021.5 2026.5)) xtitle("") graphregion(margin(l=20))
                  
                  graph combine Gt Gl, col(1)
                  
                  tabplot whichd anne [iw=en], barw(0.8) separate(whichd) showval xtitle("") ytitle("") xlab("") subtitle("pour cent") name(Gt, replace)
                  
                  graph combine Gt Gl, col(1)
                  [Image: Graph.png]

                  [Image: Graph2.png]

                  Last edited by Andrew Musau; 18 May 2026, 06:12.

                  • #10
                    Andrew Musau We're both showing techniques rather than claiming to have produced the best graph. To that end, I will comment for Aziz Essouaied that the two graphs don't have to be equal size.

                    • #11
                      Correct. In #9, e.g., one could make the top panel twice as tall as the bottom one using the option -fysize()-.

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input int anne float(impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen) long totaldesrecettesfiscalesenmillio
                      2022   4   5  8.2 10.2 15.5 28.7 28.4 35448
                      2023 2.5 4.8  9.4 10.2 16.8 27.7 28.6 39200
                      2024   3   5   10  9.5 15.9 26.9 29.6 41755
                      2025 1.5   5 12.6  9.6 15.6   27 28.7 44523
                      2026 1.3   5 12.9  9.6 15.4   27 28.5 47773
                      end
                      format %ty anne 
                      
                      foreach v in impotssurlessocitsptroliresen droitsdedouaneen impotssurlessocitsnonptroliresen droitsdeconsommationen diversimpotsetdroitsen taxesurvaleurajouteen impotsurlerevenuen { 
                          local V : subinstr local v "impotssur" "", all
                          local V : subinstr local V "impotsur" "", all 
                          local V : subinstr local V "droitsde" "" , all 
                          local V : subinstr local V "impotsetdroits" "", all 
                          
                          rename `v' `V'
                      }
                      
                      reshape long @en, i(anne) j(which) string
                      
                      myaxis whichd=which, sort(median en) descending
                      
                      twoway connected total anne, name(Gl, replace) yla(36000(4000)48000) xsc(r(2021.5 2026.5)) xtitle("") graphregion(margin(l=9)) fysize(50) ytitle(Whatever, orientation(horiz))
                      
                      tabplot whichd anne [iw=en], barw(0.75) separate(whichd) showval xtitle("") ytitle("") xlab("") subtitle("pour cent") name(Gt, replace) fysize(100) 
                      
                      
                      graph combine Gt Gl, col(1)
                      [Image: Graph.png]
