
  • twoway scatter plot with bar overlay — opacity issues

    Hi everyone,

    I'm trying to create a bar plot using the twoway command with scatterplots "behind" the bar plots. I have tried to use the opacity option in the bar plots so that the scatter can be seen behind the bars, but even at very low opacity settings the data points are largely occluded. I am using Stata version 16.

    To get an idea of what I mean, the code below sets opacity at a 25% fill, and produces the attached graph. As you can see, the data points are still not visible behind the bar plots. When I use the opacity option for other graphs, I haven't had an issue — it seems to be specific to graphs produced in twoway.

    I realize I could rearrange the overlay order so that the data points fall "on top" of the bar chart, but would prefer not to do that for aesthetic reasons. Any suggestions would be much appreciated!


    Code:
    sysuse auto, clear
    
    // generating means and standard errors
    sum price if foreign == 0
    gen mean = r(mean) if foreign == 0
    gen ub = r(mean) + r(sd)/sqrt(r(N)) if foreign == 0
    gen lb = r(mean) - r(sd)/sqrt(r(N)) if foreign == 0
    sum price if foreign == 1
    replace mean = r(mean) if foreign == 1
    replace ub = r(mean) + r(sd)/sqrt(r(N)) if foreign == 1
    replace lb = r(mean) - r(sd)/sqrt(r(N)) if foreign == 1
    
    // graph
    twoway (scatter price foreign if foreign == 0, mcolor(orange) jitter(5) msize(tiny)) ///
             (scatter price foreign if foreign == 1, mcolor(red) jitter(5) msize(tiny)) ///
            (bar mean foreign if foreign == 0, bcolor(orange%25) barwidth(.8)) ///
            (bar mean foreign if foreign == 1, bcolor(red%25) barwidth(.8)) /// 
            (rcap ub lb foreign if foreign == 0, lcolor(black) msize(0) lwidth(medium)) ///
            (rcap ub lb foreign if foreign == 1, lcolor(black) msize(0) lwidth(medium)), ///
             legend(off) xlabel(0 "Domestic" 1 "Foreign", labsize(medlarge) labcolor(black)) graphregion(color(white) lwidth(medium)) ytitle("Price") xsize(3) ysize(4) ylab(, nogrid)
    [Attached image: Graph.jpg]

  • #2
    You might try reducing the opacity to 5%. Also, -twoway bar- has an -fintensity()- option, for example:
    Code:
    ...bcolor(orange%5) fintensity(inten60) ... ///
    ... bcolor(red%5) fintensity(inten60)...
    [Attached image: Graph.png]



    • #3
      Thanks for the suggestion, Scott. I guess part of my question is why opacity works so differently for twoway than it does for other graphing commands. Even at 5% opacity, it is still not entirely easy to see the orange data points behind the bar graph. Contrast this with 5% opacity when constructing a histogram, where the same colors are much more transparent.

      Also, from what I can tell, fintensity() does not affect the transparency of the object; it just lightens the fill color.

      Code:
      sysuse auto, clear
      twoway (histogram price if foreign, width(500) start(2000) color(red%5) disc freq) ///
             (histogram price if !foreign, width(500) start(2000) color(orange%5) disc freq), legend(order(1 "Foreign" 2 "Domestic")) graphregion(color(white) lwidth(medium))
      [Attached image: Graph.jpg]



      • #4
        There are several issues here, ranging from aesthetic to technical. I don't understand the point about histograms, as whether you use histogram or twoway histogram you are still using the same opacity or transparency code as you would with twoway bar. My line is that if you plot on top of data points that are shown using tiny marker sizes, they may well be hard to spot.

        There is a negative literature on the main design here -- bars for means starting at zero, more generally estimates, plus error bars for confidence intervals, more generally uncertainty indications -- under headings such as dynamite, detonator or plunger plots.

        As many people face broadly the same issue -- plotting two or more groups of data with summaries, etc. -- I thought I'd widen the thread by showing one of various possibilities with stripplot (community-contributed from SSC). Although in principle you can get there with one command line, in practice I guess that like me you may need to experiment with several alternatives before choosing, which is certainly what I did here.

        Code:
        sysuse auto, clear
        set scheme s1color
        stripplot mpg , by(foreign, note(""))  bar(level(68)) boffset(-0.05) vertical cumul height(0.5) cumprob aspect(1) yla(, ang(h))


        Without wanting to make David's choices for him, I make two wider points:

        1. Bars starting at zero imply that the main deal is how far this group is not zero. That could well be an issue with David's real data, but here and often the main deal is comparing groups with each other, not with zero.

        2. Jittering is widely used and often better than overplotting, but here (and often) there are direct alternatives that allow checking for other details in the data. Although the syntax here asks for a cumulative display, what you are seeing is a quantile plot of the data for each group.

        There is always scope for tinkering. For example, although I tend to use s1color as a personal default scheme, I don't like its choice for subtitle fill colour, and tend to use something more cheerful.
        [Attached image: plunger.png]

        Last edited by Nick Cox; 13 Jul 2020, 02:12.



        • #5
          Many thanks, Nick. I wasn't aware of stripplot, which is a very nice alternative. And I agree with your points about why bar plots are problematic as a visual display of information.

          I'm still confused, though, about why the opacity option behaves so differently when generating a bar plot than when generating a histogram.



          • #6
            I don't see that it is different at all. You just don't mind opacity in histogram bars where there is nothing being obscured.



            • #7
              I have the same problem.

              The "response" of a bar's observed color opacity to a "unit-change" in opacity in a twoway bar plot added to a previously generated graph is much weaker than in a standalone twoway bar graph.

              For the same level of opacity, the end result is just not the same.
              Last edited by Matteo Pinna Pintor; 16 Mar 2023, 12:37.
              I'm using StataNow/MP 18.5



              • #8
                #7 needs more specificity please.



                • #9
                  I have the same issue with the twoway line command: no graph appears, and instead I get the messages "(note: default scheme s2color not found, ignored)" and "tsline is not a twoway plot type".



                  • #10
                    Solved for me, and possibly for the original question too. The intuition is that if a color appears (and in fact is) less transparent than the specified opacity setting would suggest, one possibility is that the colored object has been drawn more than once. In my case, I was plotting dummies on a secondary axis to create shaded areas in a graph, along these lines. The problem was that, with panel data, the dummy took the value one for every panel unit in the specified periods, and hence it was (I guess) plotted multiple times. Restricting the dummy to a single panel unit when creating it solves the problem.
                    I'm using StataNow/MP 18.5
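
                    A rough way to see why repeated drawing matters, offered as a sketch rather than a statement about Stata's renderer: assuming layers combine by ordinary alpha compositing, n identical layers at opacity a have an effective opacity of 1 - (1 - a)^n. The formula and the choice of the domestic cars in the auto data as the example group are assumptions made for illustration.

                    ```stata
                    * Assumption: standard alpha compositing, so n stacked copies of a
                    * shape at opacity a look like one shape at opacity 1 - (1 - a)^n.
                    sysuse auto, clear

                    * One bar is drawn per observation kept by the -if- condition,
                    * so the number of domestic cars is the number of stacked bars.
                    count if foreign == 0
                    local n = r(N)

                    * Effective opacity at the two settings tried in this thread
                    display "25% opacity, `n' layers: " 1 - (1 - 0.25)^`n'
                    display " 5% opacity, `n' layers: " 1 - (1 - 0.05)^`n'
                    ```

                    With a few dozen layers, even the 5% setting works out to an effective opacity above 0.9 under this assumption, which would be consistent with points staying hidden behind the bars.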



                    • #11
                      #10 That sounds plausible. The tag() function of egen was written with problems like this in mind.
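
                      A minimal sketch of that idea applied to the graph in #1, assuming the bars there were indeed drawn once per observation (which is a guess suggested by #10, not something confirmed in the thread). The variable names mean and tag are illustrative choices.

                      ```stata
                      sysuse auto, clear

                      * Group mean of price, stored on every observation in the group
                      egen mean = mean(price), by(foreign)

                      * tag() marks exactly one observation per group with 1, the rest with 0
                      egen tag = tag(foreign)

                      * Draw each bar from the tagged observation only, so it is drawn once
                      twoway (scatter price foreign if foreign == 0, mcolor(orange) jitter(5) msize(tiny)) ///
                             (scatter price foreign if foreign == 1, mcolor(red) jitter(5) msize(tiny)) ///
                             (bar mean foreign if tag & foreign == 0, bcolor(orange%25) barwidth(.8)) ///
                             (bar mean foreign if tag & foreign == 1, bcolor(red%25) barwidth(.8)), ///
                             legend(off) xlabel(0 "Domestic" 1 "Foreign") ytitle("Price")
                      ```

                      The only change from the bar calls in #1 is the added tag condition, so any difference in how visible the points are can be attributed to the bars no longer being stacked.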

                      #9 sounds like a completely different problem. Please start a new thread and follow advice at https://www.statalist.org/forums/help#stata to give a data example and the full command you're using. A title like Problem with tsline should be used.
                      Last edited by Nick Cox; 17 Mar 2023, 02:56.
