  • Scatterplot utilizing weights which control color of the points rather than size of them

    Dear Stata experts!
    I have a question regarding scatterplots. I searched the forum (as well as other internet resources) but found nothing that helps.
    So, here is the problem:
    I have a cohort of workers and want to make a particular scatterplot. On the x-axis I have the worker's age, and on the y-axis the worker's duration of follow-up, both as continuous variables. So each point on the plot represents one worker, and the location of the point is defined by the values of these two variables. At the same time I have another variable, representing the accumulated risk of, say, lung cancer death related to exposure at work. This variable is continuous and ranges over [0, 1]. What I want is a scatterplot where the color of each point represents the value of the risk. I can easily categorize the risk variable and make a twoway plot of several categories in several colors, but using smoothly changing colors seems a more elegant solution. I read a more or less similar question here, but there was no solution, and I am not sure one even exists. So any suggestions would be very much appreciated.

  • #2
    Dear Mik,

    Sadly, Stata does not yet provide control of color in the way you describe.
    I have suggested this to technical support myself, along with the same kind of control over opacity, which would be even more useful.
    However, Simon Hess, of Goethe University Frankfurt, recently published a package that can produce the sort of scatterplot you are looking for.
    You did not provide any data, so here is Simon's example, somewhat edited:
    Code:
    * Get Package
    ssc install colorscatter
    help colorscatter
    
    * Example
    clear all
    set obs 1000
    gen x = runiform()
    gen y = runiform()
    gen color = min(x,y)
    
    * Create plot
    colorscatter y x color , scatter_options(msymb(o) msize(*1.4)) rgb_low("255 0 0") rgb_high("0 0 255")    ///
        graphregion(lcolor(white) fcolor(white) margin(l-2 r-1 b-3 t-2)) xsize(6) ysize(6)    ///
        legend(region(lcolor(white) fcolor(white))) xscale(noextend) yscale(noextend) ylab(, glcolor(gs14))
    I suppose you can get the same result using your risk variable in place of 'color', particularly since it ranges between 0 and 1.
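    As a minimal sketch of that substitution, the block below simulates a cohort and passes the risk variable as the third argument. The variable names age, followup and risk, and the simulated values, are assumptions for illustration only; the colorscatter options are limited to those already used in the example above.

    ```stata
    * Hypothetical stand-in data: replace with the real cohort
    clear
    set seed 12345
    set obs 500
    gen age      = 20 + 45*runiform()   // assumed age range, 20 to 65
    gen followup = 40*runiform()        // assumed follow-up, 0 to 40 years
    gen risk     = runiform()           // accumulated risk in [0, 1]
    label variable age      "Age (years)"
    label variable followup "Duration of follow-up (years)"

    * Third variable drives the color: yellow = low risk, red = high risk
    colorscatter followup age risk, scatter_options(msymb(o))    ///
        rgb_low("255 255 0") rgb_high("255 0 0")
    ```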
    It would be nice if you upload your graph below, once you get this done.
    Last edited by ericmelse; 11 Nov 2017, 06:29.
    http://publicationslist.org/eric.melse

    • #3
      Dear Eric (I assume your name is Eric, right?), thank you very much - the answer you provided is exactly the solution I was looking for. It is not 100% complete, but you've said so yourself - adding opacity would be good.
      I am trying to make the plot as nice as I can. I will need some time to get it right, and also to familiarize myself with the forum's rules on uploading images, in particular which graphics file formats it accepts. Thank you very much again!
      Sincerely,
      Mik Sokol

      • #4
        I don't follow all of this but part of the additions to Stata 15 was opacity; see highlight 14 in
        Code:
        help whatsnew14to15
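
        For illustration, a minimal sketch of that opacity syntax, assuming Stata 15 or later: an opacity percentage is appended to a color with %. The simulated data and the choice of navy at 30% are arbitrary.

        ```stata
        * Requires Stata 15 or later (color%# opacity syntax)
        clear
        set seed 2017
        set obs 1000
        gen x = runiform()
        gen y = runiform()

        * navy%30 draws each marker at 30% opacity, so overlapping points look darker
        twoway scatter y x, msymbol(O) mcolor(navy%30)
        ```

        Note that this sets one opacity for the whole plot; it does not by itself map a third variable to color.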

        • #5
          There is also a routine within the brewscheme package (brewterpolate) that can be used to generate the list of RGB values interpolated between two colors in several different color spaces. The challenge with mapping continuous-valued data (e.g., accumulated risk) to a color is twofold and doesn’t seem likely to be resolved in the near future.

          The first issue is that color perception isn’t strictly linear in any of the color spaces. Even using grayscale/monochromatic colors doesn’t completely solve the problem of mapping continuous-valued data to equally spaced perceptible differences in hue, luminance, chroma, reds, greens, blues, cyans, yellows, etc. This drives the second issue, which is essentially defining a color mapping that is granular enough for your needs but still easily perceptible to end users.

          If you also consider that some end users are likely to have color vision impairments, you’re again generally restricted to a monochromatic color mapping where you vary a single property (e.g., in HCL space you might vary the luminance, or in RGBA space you might vary the opacity). But you still need to define some type of value mapping that is more or less categorical. You could theoretically let a computer determine this mapping for you, but I wouldn’t feel completely comfortable letting a computer make decisions about color mappings when a human needs to be able to perceive the differences.
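
          A rough sketch of such a categorical mapping, using only official twoway commands: the continuous variable is cut into k equal-width bins and each bin gets a color linearly interpolated in RGB between red and blue. This does not use brewterpolate, and plain RGB interpolation is not perceptually uniform, for the reasons given above. The data, the variable name risk, and k = 10 are assumptions for illustration.

          ```stata
          clear
          set seed 2017
          set obs 1000
          gen x = runiform()
          gen y = runiform()
          gen risk = min(x, y)            // stand-in for a variable in [0, 1]

          local k = 10                    // number of color bins (arbitrary choice)
          gen byte bin = min(floor(risk * `k') + 1, `k')

          * Build one scatter layer per bin, from red (255 0 0) to blue (0 0 255)
          local plots
          forvalues i = 1/`k' {
              local t = (`i' - 1) / (`k' - 1)
              local r = round(255 * (1 - `t'))
              local b = round(255 * `t')
              local plots `plots' (scatter y x if bin == `i', msymbol(o) mcolor("`r' 0 `b'"))
          }
          twoway `plots', legend(off)
          ```

          Raising k makes the gradient look smoother, at the cost of neighboring bins becoming harder to tell apart.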
