Scatter graph by VAR: how to weight properly

Giorgio Piccitto

Join Date: Oct 2016
Posts: 238


06 Oct 2021, 09:55

Dear all, I am struggling with a graphical bubble representation of my data.

I would like to plot the predicted probabilities for six groups (my x var) by country with a scatterplot, which at the same time should take into account the 'size' of each group (by country) I am considering.

Here is the command I use:

Code:

la val  country country
la def  country ///
1"DE" ///
2"FR" ///
3"SE" ///
4"DK" ///
5"NO" ///
6"UK" ///
7"IE" ///
8"NL" ///
9"BE" ///
10"IT" ///
11"ES" ///
12"GR"  , modify
fre country



la val  or or
la def  or ///
15 "A" ///
14 "b"  ///
13 "C"  ///
12 "D"  ///
11 "E"  ///
10 "F"  , modify
fre id

twoway (scatter margins or [fw=peso ] ,  msymbol(circle_hollow)) ///
  (line margins or if or == 10 | or == 12 | or == 14, lcolor (black) )   ///
    (line margins or if or == 11 | or == 13 | or == 15, lcolor (gs13) )   ///
 , by(country, col(4) note("")) scheme(s1mono)     ///
  graphregion(margin(zero)) ///
 /*yline(0, lc(black) lwidth(vthin) lpattern(solid))*/ ///
 /* xline(13.5 11.5, lc(black) lwidth(vthin) lpattern(dash))*/ ///
 ylab( , nogrid) ///
  ylabel(/*-.6(0.3)0.6*/,angle(0)labsize(small)grid) ///
    xlabel(10(1)15, valuelabel grid labsize(small)) ///
 legend(rows(1)) ///
name(a, replace)
 graph save "a", replace

The graph I obtain (attached) is quite close to what I want, but there is one thing that looks odd to me. In different countries, the same weight value produces markers of very different sizes. See for instance the rightmost marker for 'DE', whose weight is 39.8, and the last one for 'DK', whose weight is 40.1: they are clearly of different size, although their weights are practically the same.
It seems to me that Stata uses the dispersion of the weights within each country to size the markers: in a country where the groups considered (A, B, C, D, E, F) have very different weights, the markers are in general bigger; in a country where the weights are similar across groups, all the markers look smaller. I find this way of visualising the data quite misleading for my purposes.

Do you have any clue on how to solve this issue?

Thanks a lot in advance, best, G.P.

Here are the data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte or float(pr margins lower upper) byte(country id) float peso str8 var9
10  .6 .5976923 .5635103 .6318744 1  1  5.31 "low-NAT"
11 .51 .5053219 .4365443 .5740996 1  2 24.64 "low-MIG"
12 .74 .7360662 .7261686 .7459637 1  3 50.44 "mid-NAT"
13 .66 .6583158 .6055973 .7110343 1  4 35.03 "mid-MIG"
14 .82 .8172152 .8072185  .827212 1  5 44.15 "high-NAT"
15 .65 .6470902 .5953653  .698815 1  6 39.78 "high-MIG"
10  .6  .601023 .5743551 .6276909 2  7 18.05 "low-NAT"
11 .55 .5517676 .4858773 .6176578 2  8 29.54 "low-MIG"
12 .71 .7145914 .7002927 .7288902 2  9 46.35 "mid-NAT"
13 .61  .613452 .5540551 .6728489 2 10 35.16 "mid-MIG"
14 .79 .7895741 .7733445 .8058037 2 11 35.52 "high-NAT"
15 .64 .6425119 .5816921 .7033318 2 12 35.16 "high-MIG"
10 .75  .748827 .7214509 .7762031 3 13 14.56 "low-NAT"
11 .62 .6180023 .5140577 .7219469 3 14 16.82 "low-MIG"
12 .83 .8290432 .8157801 .8423064 3 15  43.6 "mid-NAT"
13 .69 .6892186  .624153 .7542842 3 16 40.48 "mid-MIG"
14 .91 .9108879 .9007325 .9210434 3 17 41.82 "high-NAT"
15 .81 .8097132 .7571398 .8622867 3 18 41.96 "high-MIG"
10  .7 .7049431 .6761626 .7337236 4 19 14.28 "low-NAT"
11 .34 .3425309 .2333281 .4517336 4 20 30.71 "low-MIG"
12 .79 .7886393 .7726654 .8046131 4 21  37.9 "mid-NAT"
13  .6 .6036703 .4872831 .7200575 4 22 28.63 "mid-MIG"
14 .87 .8668346 .8546618 .8790075 4 23 47.78 "high-NAT"
15 .63 .6278278 .5331815 .7224741 4 24 40.25 "high-MIG"
10 .73 .7337023 .7036204 .7637843 5 25  9.98 "low-NAT"
11 .48 .4844293 .3362609 .6325976 5 26 16.98 "low-MIG"
12 .84 .8406835 .8277858 .8535813 5 27 37.34 "mid-NAT"
13  .7 .6986722 .6015205 .7958239 5 28 35.47 "mid-MIG"
14 .91 .9074479 .8985351 .9163608 5 29 52.59 "high-NAT"
15 .78  .777859 .7008136 .8549044 5 30 47.55 "high-MIG"
end
label values or or
label def or 10 "F", modify
label def or 11 "E", modify
label def or 12 "D", modify
label def or 13 "C", modify
label def or 14 "b", modify
label def or 15 "A", modify
label values country country
label def country 1 "DE", modify
label def country 2 "FR", modify
label def country 3 "SE", modify
label def country 4 "DK", modify
label def country 5 "NO", modify

bubble.gph


Andrew Musau

Join Date: Oct 2014

Posts: 10190
#2

06 Oct 2021, 10:10

You may want to take a look at https://journals.sagepub.com/doi/ful...36867X20931008. I cannot run your code as it uses the community-contributed command fre which I do not want to install. Otherwise, the solutions proposed in the linked note make use of either fillin or separate.
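
One way to sidestep the per-panel scaling of weighted markers altogether is to set each marker's size explicitly. What follows is only a minimal sketch: it is not taken from the linked note and uses neither fillin nor separate. It assumes the example data posted in #1 (variables margins, or, country, peso and id, with id identifying each row uniquely and no missing peso), and the base multiplier 4 is an arbitrary choice. Each observation gets its own scatter plot, with msize() proportional to the square root of peso over the overall maximum, so that marker area follows the weight on one common scale in every panel.

Code:

* Largest weight across all countries, used as the common reference
summarize peso, meanonly
local wmax = r(max)

* Build one scatter plot per observation, each with an explicit relative size
local plots
levelsof id, local(ids)
foreach i of local ids {
    * With a unique id, r(mean) is simply that observation's peso
    summarize peso if id == `i', meanonly
    * Size multiplier: area proportional to peso; 4 is an arbitrary base
    local s = 4 * sqrt(r(mean) / `wmax')
    local plots `plots' (scatter margins or if id == `i', msymbol(circle_hollow) mcolor(black) msize(*`s'))
}

* No weights are used, so marker sizes do not depend on the by() panel
twoway `plots', by(country, col(4) note("") legend(off)) scheme(s1mono) ///
    xlabel(10(1)15, valuelabel labsize(small)) ylabel(, angle(0) labsize(small))

Run this from a do-file, since it uses /// line continuation. The line plots from the original command can be appended to the twoway call unchanged, because they carry no weights.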
Giorgio Piccitto

Join Date: Oct 2016

Posts: 238
#3

26 May 2025, 06:27

Solved, thanks!

Last edited by Giorgio Piccitto; 26 May 2025, 06:44.
Chen Samulsion

Join Date: Jan 2018

Posts: 914
#4

26 May 2025, 06:47

You run two separate scatter plots at once, and there's no variability in weight for each plot. In your first plot, step==0, all weights are set to 800, and in your second plot, step==1, all weights are set to 200. Try:

Code:

twoway scatter val group [fw=group_weight], msymbol(circle_hollow)
Chen Samulsion

Join Date: Jan 2018

Posts: 914
#5

26 May 2025, 07:02

From the Statalist FAQ:

Please don't mangle your own posts, even if you solved your problem yourself or realised that the question was silly. Explain the solution, even if it was trivial. Often someone else will have the same problem.