  • New command -colorscatter- available from SSC

    Hi all,

    Thanks to Kit Baum, a new programme, -colorscatter-, is available for download from SSC. -colorscatter- draws (twoway) scatterplots in which the marker color varies with a third variable.

    The package can be installed using:
    Code:
    ssc install colorscatter
    The program works by merging many normal twoway scatter plots with varying parameters.

    The following code is an example application using all available options:

    Code:
    set obs 1000
    gen x = rnormal()
    gen y = rnormal()
    gen c = min(abs(x),abs(y))
    
    colorscatter x y c,
      scatter_options(msymb(Oh))    /// This is passed to twoway scatter to draw circles as markers
      rgb_low(255 0 0) rgb_high(0 255 0) /// This specifies the colors for low and for high values of c
      cmax(1.5) cmin(0.5) /// This specifies the lowest and highest values for the color gradient. Lower and higher values of c will all yield the same color.
      keeplegend               /// By default colorscatter creates a custom legend; if users want to specify their own legend, this needs to be specified
      legend(order(2 "c = lowest " 150 "c = highest") pos(2) col(1)) /// This draws a new legend
      title("Twowaytitle")       /// Any option which colorscatter does not know is simply passed on to twoway. Hence twoway options can be specified as usual
    [Image: Graph.png]
    Last edited by Simon Heß; 13 Mar 2017, 07:49.
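The announcement says colorscatter works by merging many ordinary twoway scatter plots. The sketch below illustrates that idea in plain Stata. It is not colorscatter's actual code: the bin count, the linear RGB interpolation, the clamping rule, and the local macro names (nbins, cmin, cmax, r0 to b1, layers) are assumptions, chosen only to mirror the example's cmin(0.5), cmax(1.5), rgb_low(255 0 0) and rgb_high(0 255 0).

```stata
* Minimal sketch of the overlay idea; NOT colorscatter's actual source code.
* Assumptions: 10 color bins, linear RGB interpolation, clamping at cmin/cmax.
clear
set seed 12345
set obs 1000
gen x = rnormal()
gen y = rnormal()
gen c = min(abs(x), abs(y))

local nbins = 10               // assumed number of color steps
local cmin  = 0.5              // c below cmin gets the "low" color
local cmax  = 1.5              // c above cmax gets the "high" color
local r0 = 255                 // low color: red (255 0 0)
local g0 = 0
local b0 = 0
local r1 = 0                   // high color: green (0 255 0)
local g1 = 255
local b1 = 0

* Clamp c to [cmin, cmax], then map it to a bin number 1..nbins
gen double cc = min(max(c, `cmin'), `cmax')
gen int bin = 1 + floor((cc - `cmin') / (`cmax' - `cmin') * `nbins')
replace bin = `nbins' if bin > `nbins'

* One scatter layer per bin, colored by interpolating between the two colors
local layers
forvalues k = 1/`nbins' {
    local w = (`k' - 1) / (`nbins' - 1)
    local r = round(`r0' + `w' * (`r1' - `r0'))
    local g = round(`g0' + `w' * (`g1' - `g0'))
    local b = round(`b0' + `w' * (`b1' - `b0'))
    local layers `layers' (scatter y x if bin == `k', msymbol(Oh) mcolor("`r' `g' `b'"))
}
twoway `layers', legend(off)
```

In an overlay built this way, legend keys refer to layer numbers, so a hand-written legend(order()) has to name specific layers. That may be why the keeplegend example refers to keys 2 and 150, but this is an inference about colorscatter's internals, not something the package documents.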

  • #2
    Hi, thanks so much! In the past I've always gone to MATLAB to make such plots; this will really save me some time. Is there some way to pass through the "lowest" and "highest" color values (in your example, 2 and 150)? At the moment I have to try out random values to find them if I want to customize the legend. It would be great to be able to specify something like legend(order($cmin "Var 3 at Lowest" $cmax "Var 3 at Highest")), if the low/high color values were passed through as globals. Or is this already possible in some way that I'm missing? Thank you!

    • #3
      Simon, there is a problem with your code: "///" is missing after "colorscatter x y c,". Without "///" all options are ignored.
      Code:
      colorscatter x y c, ///
        scatter_options(msymb(Oh)) /// This is passed to twoway scatter to draw circles as markers
        rgb_low(255 0 0) rgb_high(0 255 0) /// This specifies the colors for low and for high values of c
        cmax(1.5) cmin(0.5) /// This specifies the lowest and highest values for the color gradient. Lower and higher values of c will all yield the same color.
        keeplegend /// By default colorscatter creates a custom legend; if users want to specify their own legend, this needs to be specified
        legend(order(2 "c = lowest " 150 "c = highest") pos(2) col(1)) /// This draws a new legend
        title("Twowaytitle") /// Any option which colorscatter does not know is simply passed on to twoway. Hence twoway options can be specified as usual

      • #4
        About the example code, note that there should be only two forward slashes in the last line of the code, before the comment, like:
        Code:
        title("Twowaytitle") // Any option which colorscatter does not know is simply passed on to twoway. Hence twoway options can be specified as usual
        http://publicationslist.org/eric.melse

        • #5
          It would be nice to be able to control opacity in colorscatter, as is possible with regular scatterplots:
          Code:
          twoway scatter x y , msymb(Oh) mlcolor(red)           // Marker line color: red, fully opaque
          twoway scatter x y , msymb(Oh) mlcolor(red%50)        // Marker line color: red, 50% opacity
          twoway scatter x y , msymb(Oh) mlcolor("255 0 0")     // Marker line color: red, fully opaque
          twoway scatter x y , msymb(Oh) mlcolor("255 0 0%50")  // Marker line color: red, 50% opacity
          The same approach could be used in colorscatter, for example (this does not run at present):
          Code:
          colorscatter x y c, scatter_options(msymb(Oh)) /// 
            rgb_low("255 0 0%50") rgb_high("0 255 0%50") /// This specifies the colors with 50% opacity.
            cmax(1.4) cmin(0.005) keeplegend legend(order(2 "c = lowest " 150 "c = highest") pos(2) col(1)) ///
            title("Twowaytitle") // This syntax example will not run now!
          http://publicationslist.org/eric.melse
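The twoway lines in this post rely on Stata's opacity suffix (color%#, available in Stata 15 and later). Since colorscatter does not accept that suffix in rgb_low() and rgb_high(), a rough workaround is a plain twoway overlay with semi-transparent colors. This is a sketch, not a colorscatter feature; the split into two groups at c = 1 is an arbitrary assumption.

```stata
* Workaround sketch, not a colorscatter feature: two overlaid scatter layers
* drawn at 50% opacity; the cutoff c = 1 is an arbitrary assumption (Stata 15+)
clear
set seed 2017
set obs 1000
gen x = rnormal()
gen y = rnormal()
gen c = min(abs(x), abs(y))
twoway (scatter y x if c <  1, msymbol(Oh) mlcolor("255 0 0%50")) ///
       (scatter y x if c >= 1, msymbol(Oh) mlcolor("0 255 0%50")), ///
       legend(order(1 "c < 1" 2 "c >= 1"))
```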

          • #6
            Originally posted by Leah Bevis
            Is there some way to pass the "lowest" and "highest" color values (in your example 2 and 150) through? [...]
            Dear Leah, I am not sure I get what you're asking. Can you tell me what text you want to see in the legend of the graph?

            • #7
              Dear Simon,
              Thank you for this program.
              I believe there is a small bug in the legend generation. The bug only occurs when I color by a binomially distributed integer.
              This code recreates the problem:
              Code:
              set obs 1000
              gen x = rnormal()
              gen y = rnormal()
              gen c =  rbinomial(1, 0.5)
              
              colorscatter y x c
              Please note the legend:
              [Image: bug.png]
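For a color variable with only two values, the legend problem can be sidestepped without colorscatter. The sketch below uses Stata's built-in separate command, which splits y into one variable per value of c (here y0 and y1), so each group gets its own color and legend key. The colors are simply the ones from the first post; nothing here comes from colorscatter itself.

```stata
* Workaround sketch for a 0/1 color variable, using built-in commands only
clear
set seed 2018
set obs 1000
gen x = rnormal()
gen y = rnormal()
gen c = rbinomial(1, 0.5)

* separate creates y0 (y where c == 0) and y1 (y where c == 1), missing elsewhere
separate y, by(c)
scatter y0 y1 x, msymbol(Oh Oh) mcolor("255 0 0" "0 255 0")
```

The default legend labels come from the variable labels that separate assigns; adding legend(order(1 "c = 0" 2 "c = 1")) replaces them with custom text.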

              • #8
                Simon, sorry for not seeing your response right away. Basically, I think this would be much better if, instead of "high" and "low" values, you allowed for color spectrum legends as shown on the last page here: https://www.nceas.ucsb.edu/~frazier/...Cheatsheet.pdf. Both MATLAB and R offer the capability that you are building here in Stata (thanks again!), and they both use spectrum legends rather than point legends.

                With respect to your code in particular, I guess the root problems are: (a) I'm not really sure why you chose those values of cmax and cmin, since they are not the max/min of the variable c; (b) if I move cmax and cmin to the true max and min, the color spectrum becomes less clear, perhaps because I'm not sure which colors mean which values (i.e., is brown past red, or between green and red?); and (c) it's not clear to me why you chose (or how one would know to choose) 2 and 150 as the lowest and highest. Normally, legend(order(1 "blah" 2 "blah")) is used for 2 graphs, so presumably there is some internal process with 150 graphs, where you are ignoring all graphs between 1 and 150? But how would I know, as a user, that 1 and 150 are the numbers to choose? And do they really represent the "max" and "min" c values, or your specified max and min values, which are not the true max and min? Clearly, I'm a little confused here, and would likely understand better if I looked at your internal code... but take this as a naive user's perspective.

                Basically, I love this functionality, but I think it should come with a color spectrum bar rather than max/min points, where the default automates the max and min (the values represented by the top and bottom colors of the color bar), but you can choose to alter those max and min values if you wish.

                • #9
                  Originally posted by Leah Bevis
                  Basically, I love this functionality, but I think it should come with a color spectrum bar rather than max/min points [...]
                  I'll second that. I can infer that the program creates and overlays a lot of individual scatterplots. The problem is, how do we know what numbers to use in the legend? Is it always that 2 is the lowest and 150 the highest? Or do we need to count the number of unique values in the Z-axis? Some guidance would be helpful.

                  Another issue: the graph in the first post says that green represents the lowest values of c, and red represents the highest. However, the RGB codes in the command seem to imply that they are reversed:

                  Code:
                  rgb_low(255 0 0)
                  rgb_high(0 255 0)
                  The option -rgb_low- is asking for red = 255, green = 0, and blue = 0. The help for color styles (help colorstyle) appears to confirm that this would be red. The option -rgb_high- is the code for green. So, when we consider how the legend is ordered, does the lowest number in the legend represent the highest value of c?
                  Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                  When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                  • #10
                    Weiwen Ng - I am getting the same problem/result - reversed color gradients.
                    Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

                    • #11
                      Weiwen Ng, eric_a_booth and Leah Bevis

                      Thanks for your posts. Maybe a little disclaimer is necessary here: colorscatter is not something I am actively maintaining. It's a pretty hacky bit of code that performs badly, and I only posted it here because there are no better options for Stata. I guess in most cases it would be better to switch to R or the like for these things.

                      Regarding the issues you raised: (1) I have now switched rgb_high and rgb_low; (2) you can always adapt your legend by specifying the -keeplegend- option and then any usual twoway-style legend, e.g. -legend(order(1 "abc" 2 "abcd"))-.

                      For your convenience, I added a gradient-style legend to the code (you can get the latest version from my GitHub: https://github.com/simonheb/colorscatter).

                      The graphs look like this.
                      [Image: Unbenannt.png]

                      You are also welcome to edit the code of colorscatter yourself if you want to change something.

                      • #12
                        Neat! Thanks so much Simon. I will take a quick look at your code, and may indeed end up trying to edit my own version at some point, if I want a similar but different type of plot. I love the gradient you've added here! I've long wished that Stata had more options for color-as-third-dimension visualizations, so this is great to have as a start!

                        • #13
                          Simon Heß

                          Thank you for sharing this! Is there a way to connect the markers? With the usual scatter command I can use the connect option, but unfortunately it doesn't work with colorscatter.

                          Thanks!

                          • #14
                            Dear farah omran,

                            Sorry for the late reply. I am not actively monitoring this thread. You can however always reach me via my Github.

                            Regarding your question: no, it can't be done directly. But I updated the code so that you can now add any arbitrary graph below or above the plot. So what you want can easily be achieved with something like this:

                            Code:
                            colorscatter x1 x2 y, tw_pre(line x1 x2, lc(gray))

                            This will produce a plot like this:
                            [Image: Untitled.png]

                            To be able to use the tw_pre and tw_post options, you need to update your copy of colorscatter. You can do so by running:

                            Code:
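                            net describe colorscatter, from(https://raw.githubusercontent.com/simonheb/colorscatter/master/)
                            net install colorscatter, from(https://raw.githubusercontent.com/simonheb/colorscatter/master/) replace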

                            • #15
                              Simon Heß
                              Thanks for sharing this really useful code, which I came across just now.

                              Unfortunately, it took me quite some time to get the colors right, which I think is because the color format in the Stata command is grb (green red blue) instead of rgb (red green blue). At least for me, it worked with this revised order.
