  • Barplots with weighted bar widths

    I am visualizing the distribution of quality reports across sites using bar plots. In this example, each factory reports the proportions of its product that are of 'bad', 'ok', and 'good' quality, as well as the total amount produced. I want to show how this distribution varies across factories and also show the variability in factory size. This was simple with uniform bar widths, but varying the bar widths by the total amount produced turned out to be much more complicated. I have working code, but I wonder whether there is an easier or cleaner way to accomplish this. I'd appreciate any advice on how to clean this up.

    Here is a simple example dataset
    Code:
    clear
    input id bad ok good n
    1 .3 .5 .2 100
    2 .2 .7 .1 200
    3 .4 .1 .5 150
    4 .1 .8 .1 400
    5 .15 .55 .3 50
    end
    Here is the unweighted graph:
    Code:
    graph bar bad ok good, over(id, sort(bad) descending) stack ///
        legend(label(1 bad) label(2 ok) label(3 good)) ytitle("Proportion of Product") title("Product Quality by Factory")


    Here is the variable bar width code. I plot three overlaid bar graphs, so I must first convert the category proportions into cumulative proportions. Then I create an x index based on the cumulative relative size of each factory. Why do I need to add an extra line of data with a final x position to prevent the omission of the last group?

    Code:
    * transform y variables into cumulative variables
    egen c_bad = rowtotal(bad)
    egen c_ok = rowtotal(c_bad ok)
    egen c_good = rowtotal(c_ok good)
    
    * sort the data by the focus variable
    gsort -c_bad
    
    * create width weights and a cumulative x-axis index
    sum n
    gen w = n/(`r(mean)')
    capture drop x   // no error if x does not exist yet
    gen x = 0 if [_n]==1
    replace x = w[_n-1] + x[_n-1] if [_n]!=1
    
    * add in an extra row as the last x value doesn't get shown
    set obs `=_N + 1'
    replace x = x[_n-1] + w[_n-1] if [_n] == [_N]
    
    graph twoway ///
        bar c_good x, bartype(spanning) || ///
        bar c_ok x, bartype(spanning) || ///
        bar c_bad x, bartype(spanning)||, ///
        legend(label(3 bad) label(2 ok) label(1 good)) ytitle("Proportion of Product") title("Product Quality Distribution") xtitle("factory size weighted index")
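    On the extra row: as far as I understand bartype(spanning), each bar is drawn from its own x value to the next one, so five bars need six x values, and the added observation supplies the right-hand edge of the last bar. Below is a slightly shorter sketch of the same construction, using a running sum for the left edges. The data and variable names are those of the example above; summarize, meanonly and the "in L" qualifier are substitutions for illustration, not part of the original code.

```stata
clear
input id bad ok good n
1 .3 .5 .2 100
2 .2 .7 .1 200
3 .4 .1 .5 150
4 .1 .8 .1 400
5 .15 .55 .3 50
end

* cumulative heights, so that the overlaid bars look stacked
gen c_bad  = bad
gen c_ok   = bad + ok
gen c_good = bad + ok + good

* order factories by the focus variable
gsort -bad

* widths relative to the mean factory size;
* left edge of each bar = running sum of the earlier widths
summarize n, meanonly
gen w = n / r(mean)
gen x = sum(w) - w

* one extra observation holds the right edge of the last bar
set obs `=_N + 1'
replace x = x[_n-1] + w[_n-1] in L

twoway bar c_good x, bartype(spanning) ///
    || bar c_ok x, bartype(spanning) ///
    || bar c_bad x, bartype(spanning) ///
    ||, legend(order(3 "bad" 2 "ok" 1 "good")) ///
    ytitle("Proportion of Product") xtitle("factory size weighted index")
```

    Because the widths are scaled by the mean of n, they sum to the number of factories, so the x-axis here runs from 0 to 5.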
    [Attached images: unweight.png (unweighted graph), weight.png (variable bar width graph)]
    Last edited by Lief Esbenshade; 27 Apr 2018, 02:05.

  • #2
    You can also look at spineplot, see ssc desc spineplot.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thank you Maarten. From what I can figure out it looks like splineplot expects data in long form. Easy enough to reshape the data into that format, and I can use aweight to adjust the graph along the y-axis. However, I can't seem to figure out the appropriate option to adjust the x-axis. Does spineplot just work best with unsummarized data?

      Code:
      clear
      input id bad ok good n
      1 .3 .5 .2 100
      2 .2 .7 .1 200
      3 .4 .1 .5 150
      4 .1 .8 .1 400
      5 .15 .55 .3 50
      end
      
      * axis() is an egen function from the egenmore package (ssc install egenmore)
      egen id_order = axis(bad), reverse
      
      rename (bad ok good) pct#, addnumber
      reshape long pct, i(id) j(rating)
      
      label define ratlab 1 bad 2 ok 3 good
      label values rating ratlab
      
      spineplot rating id_order [aweight=pct]
      * this is the equivalent of my first graph bar, aweight is adjusting the y-axis heights of the bars
      * but how do I create the horizontal weights?
      I can get it working by expanding out the dataset. Is there an option I am missing to create the horizontal weights?

      Code:
      * solution, expand the dataset
      gen n2 = n * pct
      expand n2
      
      spineplot rating id_order



      • #4
        spineplot (not splineplot) is from the Stata Journal too.

        SJ-16-2 gr0031_1 . . . . . . . . . . . . . . . Software update for spineplot
        (help spineplot if installed) . . . . . . . . . . . . . . . N. J. Cox
        Q2/16 SJ 16(2):521--522
        x-axis labels have been improved; references added to help file

        SJ-8-1 gr0031 . . . . . . . . . . . Speaking Stata: Spineplots and their kin
        (help spineplot if installed) . . . . . . . . . . . . . . . N. J. Cox
        Q1/08 SJ 8(1):105--121

        I'd approach your example this way. As the help explains, spineplot expects to be fed two categorical variables. I'm assuming that your real data have (or should have) integer frequencies, which have to be forced out of your example using round().

        Code:
        clear 
        
        input id bad ok good n
        1 .3 .5 .2 100
        2 .2 .7 .1 200
        3 .4 .1 .5 150
        4 .1 .8 .1 400
        5 .15 .55 .3 50
        end
        
        rename (bad ok good) prop= 
        
        reshape long prop, i(id) j(Grade) string 
        
        gen freq = round(prop * n) 
        
        label def Grade 1 bad 2 ok 3 good 
        encode Grade, gen(grade) label(Grade) 
        
        spineplot grade id [fw=freq]
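        One thing worth checking with this approach: round() is applied cell by cell, so the rounded frequencies need not add up to n for every factory. A minimal check, assuming the variables created above (ftotal is a name made up for illustration):

```stata
clear
input id bad ok good n
1 .3 .5 .2 100
2 .2 .7 .1 200
3 .4 .1 .5 150
4 .1 .8 .1 400
5 .15 .55 .3 50
end

rename (bad ok good) prop=
reshape long prop, i(id) j(Grade) string
gen freq = round(prop * n)

* total of the rounded frequencies within each factory
bysort id: egen ftotal = total(freq)

* list factories whose rounded cells no longer sum to n
list id Grade prop n freq ftotal if ftotal != n, sepby(id)
```

        In the toy data this should flag factory 5, where 7.5 and 27.5 both round up and the cells sum to 51 rather than 50. With real integer frequencies the issue should not arise.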



        • #5
          Thank you Nick, this is a very elegant solution. You are correct that the real data has integer frequencies. I apologize for not responding sooner, and have adjusted my email notification settings so as not to miss responses like this in the future.

          As a final adjustment I wanted to apply my own colors and make the bars blend seamlessly into each other. Interestingly, it appears as though Stata is automatically adjusting blcolors to be more intense than the bcolor or bfcolor. I couldn't find any documentation on this behavior, though I certainly understand why it is a nice default setting. By a process of trial and error I found that adjusting the intensity of the blcolor by 0.8 was sufficient to make the bar outlines blend into the bars. Is there a better option for me to make the line and fill colors match? I'd love to see the documentation that explains what is going on.

          Code:
          clear
          
          input id bad ok good n
          1 .3 .5 .2 100
          2 .2 .7 .1 200
          3 .4 .1 .5 150
          4 .1 .8 .1 400
          5 .15 .55 .3 50
          end
          
          rename (bad ok good) prop=
          
          
          reshape long prop, i(id) j(Grade) string
          
          gen freq = round(prop * n)
          
          label def Grade 1 bad 2 ok 3 good
          encode Grade, gen(grade) label(Grade)
          
          * here we see that the lines are darker than the bars, even though we are explicitly defining the same color
          spineplot grade id [fw=freq], ///
              bar1(bcolor(blue) blcolor(blue))  bar2(bcolor(green) blcolor(green))  bar3(bcolor(red) blcolor(red))
          
          * this adjusts the intensity of the blcolor by 0.8, the blcolor is inheriting the specified bcolor
          spineplot grade id [fw=freq], ///
              bar1(bcolor(blue)) bar2(bcolor(green)) bar3(bcolor(red)) ///
              barall(blcolor(*.8))
          [Attached images: image_10931.png, image_10930.png]

          edit: picture formatting
          Last edited by Lief Esbenshade; 24 May 2018, 18:55.



          • #6
            Hi Nick,

            Is it possible to produce this graph for a data set that contains negative cell frequencies? An example is shown below, where the original data set is edited so that the first observation has a negative value. This can arise in studies where we want to plot the percentage contribution of different factors to an outcome, by a third variable, and a factor can often contribute negatively to an outcome. It looks like spineplot cannot do this, because it rejects negative weights. The original code provided by the OP seems fine (it uses twoway bar, which can handle negative y values), but it has an issue that I cannot solve myself: it does not position the bars correctly.
            Thank you!

            Code:
            clear 
            
            input id bad ok good n
            1 -.3 .5 .8 100
            2 .2 .7 .1 200
            3 .4 .1 .5 150
            4 .1 .8 .1 400
            5 .15 .55 .3 50
            end
            rename (bad ok good) prop= 
            
            reshape long prop, i(id) j(Grade) string 
            
            gen freq = round(prop * n) 
            
            label def Grade 1 bad 2 ok 3 good 
            encode Grade, gen(grade) label(Grade) 
            
            spineplot grade id [fw=freq]
            negative weights encountered
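            The message follows from the weight type: frequency weights stand for counts of observations, so they cannot be negative. A quick way to see which cells trigger it, assuming the variables created above:

```stata
clear
input id bad ok good n
1 -.3 .5 .8 100
2 .2 .7 .1 200
3 .4 .1 .5 150
4 .1 .8 .1 400
5 .15 .55 .3 50
end

rename (bad ok good) prop=
reshape long prop, i(id) j(Grade) string
gen freq = round(prop * n)

* cells with negative frequencies cannot be passed as fweights
list id Grade prop n freq if freq < 0
count if freq < 0
```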



            • #7
              spineplot is a dead end for you here, I think. twoway bar is fine with negative values.



              • #8
                Thank you Nick for the quick response. By any chance, do you know how to "correct" the position of the bars shown in the graph below (and make it match the xlabel)? For example, the bar with negative values corresponds to id=1 in the data, but is shown at the last position in the graph which corresponds only roughly to id=5.

                Code:
                clear 
                
                input id bad ok good n
                1 -.3 .5 .8 100
                2 .2 .7 .1 200
                3 .4 .1 .5 150
                4 .1 .8 .1 400
                5 .15 .55 .3 50
                end
                
                * transform y variables into cumulative variables
                egen c_bad = rowtotal(bad)
                egen c_ok = rowtotal(c_bad ok)
                egen c_good = rowtotal(c_ok good)
                
                * sort the data by the focus variable
                gsort -c_bad
                
                * create width weights and a cumulative x-axis index
                sum n
                gen w = n/(`r(mean)')
                gen x = 0 if [_n]==1
                replace x = w[_n-1] + x[_n-1] if [_n]!=1
                
                * add in an extra row as the last x value doesn't get shown
                set obs `=_N + 1'
                replace x = x[_n-1] + w[_n-1] if [_n] == [_N]
                
                graph twoway ///
                    bar c_good x, bartype(spanning) || ///
                    bar c_ok x, bartype(spanning) || ///
                    bar c_bad x, bartype(spanning)||, ///
                    legend(label(3 bad) label(2 ok) label(1 good)) ytitle("Proportion of Product") title("Product Quality Distribution") xtitle("factory size weighted index")
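                A possible explanation, offered as a sketch rather than a tested answer: gsort -c_bad orders the bars from the largest to the smallest value of bad, so the factory with the negative value is drawn last, and the x-axis shows the cumulative size index, not id. If the bars should follow id, one option is to sort on id and place the axis labels at the bar midpoints. The variable mid and the local macro xlab are names introduced here for illustration.

```stata
clear
input id bad ok good n
1 -.3 .5 .8 100
2 .2 .7 .1 200
3 .4 .1 .5 150
4 .1 .8 .1 400
5 .15 .55 .3 50
end

gen c_bad  = bad
gen c_ok   = bad + ok
gen c_good = bad + ok + good

* keep the bars in id order instead of sorting on c_bad
sort id

summarize n, meanonly
gen w = n / r(mean)
gen x = sum(w) - w          // left edge of each bar
gen mid = x + w / 2         // midpoint, used for the axis label

* build xlabel() entries of the form: position "id"
local xlab
forvalues i = 1/`=_N' {
    local xlab `xlab' `=mid[`i']' "`=id[`i']'"
}

* extra observation for the right edge of the last bar
set obs `=_N + 1'
replace x = x[_n-1] + w[_n-1] in L

twoway bar c_good x, bartype(spanning) ///
    || bar c_ok x, bartype(spanning) ///
    || bar c_bad x, bartype(spanning) ///
    ||, xlabel(`xlab') xtitle("factory id (bar width proportional to n)") ///
    legend(order(3 "bad" 2 "ok" 1 "good")) ytitle("Proportion of Product")
```

                Note that with a negative value the overlaid cumulative bars no longer read as a clean stack: for id 1 the ok segment is visible only from 0 to 0.2 rather than with its full height of 0.5, so negative and positive parts may need to be accumulated separately.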
                [Attached image: Graph.png]

