  • Colour Coding Markers In a Weighted Scatter Plot

    I am trying to create a scatter plot of disease mortality (CD_Mortality) against income (GDP), weighted by health expenditure (Health_Exp). I have been able to produce this using the following code:

    twoway (scatter CD_Mortality GDP [aweight = Health_Exp], mcolor(navy8) msymbol(smcircle_hollow))

    [Attachment: Screen Shot 2019-07-12 at 09.43.31.png]

    I want to colour code each country by its WHO region indicator (WHO_IND). However, if I separate CD_Mortality by region and plot the regions with twoway scatter, the markers are weighted within each category individually (see attachment). Is there a way for the marker colour to represent the WHO region while the marker size is weighted across all observations, not just those in its WHO region? Thank you for your time. (Using Stata 16)
    [Attachment: Screen Shot 2019-07-12 at 10.10.03.png]

  • #2
    I saw this post a couple of days ago, but I did not attempt a solution because you did not provide a data example. Please review the FAQ on how to pose questions to increase your chances of getting a helpful reply. Here, the issue is that Stata scales weighted markers within each group, and your weights are not consistent across groups. There is an easy fix; the idea can be found in this thread.

    Reproducible example:

    Code:
    sysuse auto, clear
    *SCATTER MPG WEIGHT WEIGHTED BY REP78
    twoway (scatter mpg weight [aweight = rep78], mcolor(navy8) ///
    msymbol(smcircle_hollow)), scheme(s1color)
    gr save gr1.gph, replace
    
    *DIFFERENTIATE GROUPS BY FOREIGN
    twoway (scatter mpg weight [aweight = rep78] if foreign, mcolor(navy8) ///
    msymbol(smcircle_hollow))(scatter mpg weight [aweight = rep78] if ///
    !foreign, mcolor(blue) msymbol(smcircle_hollow)), scheme(s1color)
    gr save gr2.gph, replace
    gr combine gr1.gph gr2.gph, scheme(s1color)
    [Attachment: gr1.png]


    Here, you can see that with inconsistent weights, the markers have different sizes across groups (see the red arrows in the attached figure). You can make the weights consistent across groups by duplicating the observations and switching the group category in the duplicated set. Then replace the plotted variables with missing values in the duplicated set, so that the duplicates carry their weights into the other group but are not themselves drawn.

    Code:
    expand 2, gen(set)
    recode foreign (0=1) (1=0) if set
    foreach var in mpg weight{
    replace `var'=. if set
    }
    twoway (scatter mpg weight [aweight = rep78] if foreign, mcolor(navy8) ///
    msymbol(smcircle_hollow))(scatter mpg weight [aweight = rep78] if ///
    !foreign, mcolor(blue) msymbol(smcircle_hollow)), scheme(s1color)
    gr save gr3.gph, replace
    gr combine gr1.gph gr2.gph gr3.gph, scheme(s1color)
    [Attachment: Graph.png]



    • #3
      Hello, any suggestions on how to do this when there are more than two groups? For example, based on the previous examples, the following code works well for 2 groups:

      Code:
      clear all
      input x y weight group
      1 1 1 1
      2 1 2 1
      1 2 4 2
      2 2 8 2
      end

      expand 2, gen(set)
      recode group (2=1) (1=2) if set
      foreach var in y x{
      replace `var'=. if set
      }
      twoway (scatter y x if group==1 [w=weight]) (scatter y x if group==2 [w=weight]), legend(off)

      But what if there are four groups, for example with the data below?

      Code:
      clear all
      input x y weight group
      1 1 1 1
      2 1 2 1
      1 2 4 2
      2 2 8 2
      1 3 10 3
      2 3 12 3
      1 4 14 4
      2 4 18 4
      end



      • #4
        I have subsequently worked out a general solution to this problem. Based on my example in #2, it is:

        Code:
        expand 2, gen(set)
        replace foreign=-1 if set
        fillin foreign rep78
        drop if set==1
        provided that

        Code:
        tab foreign
        does not include a category with the value -1.
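
        As a rough sketch, the same steps might be applied to the four-group data from #3. This assumes the approach carries over directly with group in place of foreign and weight in place of rep78; it has not been checked against the figures in this thread, and -1 must not already be a value of group.

        ```stata
        clear all
        input x y weight group
        1 1 1 1
        2 1 2 1
        1 2 4 2
        2 2 8 2
        1 3 10 3
        2 3 12 3
        1 4 14 4
        2 4 18 4
        end

        * Steps from #4 (assumption: group replaces foreign, weight replaces rep78)
        expand 2, gen(set)          // duplicate every observation; set==1 marks the copies
        replace group = -1 if set   // move the copies into a temporary category
        fillin group weight         // add empty observations so each group has every weight value
        drop if set == 1            // remove the temporary category; filled-in rows have set missing

        * One scatter per group; x and y are missing in the filled-in rows
        twoway (scatter y x if group == 1 [aweight = weight]) ///
               (scatter y x if group == 2 [aweight = weight]) ///
               (scatter y x if group == 3 [aweight = weight]) ///
               (scatter y x if group == 4 [aweight = weight]), legend(off)
        ```

        After these steps each of the four groups holds one observation for every weight value in the data, so the set of weights is the same in every group.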
