
  • Scatter graph by VAR: how to weight properly

    Dear all, I am struggling with a bubble-plot representation of my data.

    I would like to plot the predicted probabilities for six groups (my x variable) by country in a scatterplot that also reflects the 'size' of each group within each country.

    Here is the command I use:

    Code:
    la val  country country
    la def  country ///
    1"DE" ///
    2"FR" ///
    3"SE" ///
    4"DK" ///
    5"NO" ///
    6"UK" ///
    7"IE" ///
    8"NL" ///
    9"BE" ///
    10"IT" ///
    11"ES" ///
    12"GR"  , modify
    fre country
    
    
    
    la val  or or
    la def  or ///
    15 "A" ///
    14 "b"  ///
    13 "C"  ///
    12 "D"  ///
    11 "E"  ///
    10 "F"  , modify
    fre id
    
    twoway (scatter margins or [fw=peso ] ,  msymbol(circle_hollow)) ///
      (line margins or if or == 10 | or == 12 | or == 14, lcolor (black) )   ///
        (line margins or if or == 11 | or == 13 | or == 15, lcolor (gs13) )   ///
     , by(country, col(4) note("")) scheme(s1mono)     ///
      graphregion(margin(zero)) ///
     /*yline(0, lc(black) lwidth(vthin) lpattern(solid))*/ ///
     /* xline(13.5 11.5, lc(black) lwidth(vthin) lpattern(dash))*/ ///
     ylab( , nogrid) ///
      ylabel(/*-.6(0.3)0.6*/,angle(0)labsize(small)grid) ///
        xlabel(10(1)15, valuelabel grid labsize(small)) ///
     legend(rows(1)) ///
    name(a, replace)
     graph save "a", replace

    The graph I obtain (attached) is quite close to what I want, but one thing looks odd to me: across countries, the same weight value produces balls of very different sizes. See, for instance, the rightmost ball for 'DE', whose weight is 39.8, and the last ball for 'DK', whose weight is 40.1: they are clearly of different sizes, although their weights are practically the same.
    It seems to me that Stata uses the dispersion of the weights within each country to size the balls: in a country where the groups (A, B, C, D, E, F) have very different weights, the balls are generally bigger; in a country where the weights are similar across groups, all the balls look smaller. I find this way of visualising the data quite misleading for my purposes.
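
    A quick way to probe this hypothesis is to compare the weight range within each country. This is only a minimal sketch, assuming the variables country, or and peso from the dataex listing in this post; peso_max and peso_rel are hypothetical helper names introduced here for illustration, and the check only describes the data, it does not change the graph.

    ```stata
    * Weight range within each country: if marker sizes are rescaled
    * separately in each by() panel (the hypothesis above), countries with
    * different maxima would draw the same weight at different sizes.
    tabstat peso, by(country) statistics(min max range) nototal

    * Hypothetical helper variables: each weight relative to its country maximum.
    bysort country: egen double peso_max = max(peso)
    generate double peso_rel = peso / peso_max

    * Compare group A (or == 15) in DE (country 1) and DK (country 4).
    list country or peso peso_max peso_rel if or == 15 & inlist(country, 1, 4), noobs
    ```

    If the relative weights differ between the two countries while the raw weights are nearly equal, that would be consistent with per-panel scaling.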

    Do you have any clue on how to solve this issue?

    Thanks a lot in advance, best, G.P.

    Here are the data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte or float(pr margins lower upper) byte(country id) float peso str8 var9
    10  .6 .5976923 .5635103 .6318744 1  1  5.31 "low-NAT"
    11 .51 .5053219 .4365443 .5740996 1  2 24.64 "low-MIG"
    12 .74 .7360662 .7261686 .7459637 1  3 50.44 "mid-NAT"
    13 .66 .6583158 .6055973 .7110343 1  4 35.03 "mid-MIG"
    14 .82 .8172152 .8072185  .827212 1  5 44.15 "high-NAT"
    15 .65 .6470902 .5953653  .698815 1  6 39.78 "high-MIG"
    10  .6  .601023 .5743551 .6276909 2  7 18.05 "low-NAT"
    11 .55 .5517676 .4858773 .6176578 2  8 29.54 "low-MIG"
    12 .71 .7145914 .7002927 .7288902 2  9 46.35 "mid-NAT"
    13 .61  .613452 .5540551 .6728489 2 10 35.16 "mid-MIG"
    14 .79 .7895741 .7733445 .8058037 2 11 35.52 "high-NAT"
    15 .64 .6425119 .5816921 .7033318 2 12 35.16 "high-MIG"
    10 .75  .748827 .7214509 .7762031 3 13 14.56 "low-NAT"
    11 .62 .6180023 .5140577 .7219469 3 14 16.82 "low-MIG"
    12 .83 .8290432 .8157801 .8423064 3 15  43.6 "mid-NAT"
    13 .69 .6892186  .624153 .7542842 3 16 40.48 "mid-MIG"
    14 .91 .9108879 .9007325 .9210434 3 17 41.82 "high-NAT"
    15 .81 .8097132 .7571398 .8622867 3 18 41.96 "high-MIG"
    10  .7 .7049431 .6761626 .7337236 4 19 14.28 "low-NAT"
    11 .34 .3425309 .2333281 .4517336 4 20 30.71 "low-MIG"
    12 .79 .7886393 .7726654 .8046131 4 21  37.9 "mid-NAT"
    13  .6 .6036703 .4872831 .7200575 4 22 28.63 "mid-MIG"
    14 .87 .8668346 .8546618 .8790075 4 23 47.78 "high-NAT"
    15 .63 .6278278 .5331815 .7224741 4 24 40.25 "high-MIG"
    10 .73 .7337023 .7036204 .7637843 5 25  9.98 "low-NAT"
    11 .48 .4844293 .3362609 .6325976 5 26 16.98 "low-MIG"
    12 .84 .8406835 .8277858 .8535813 5 27 37.34 "mid-NAT"
    13  .7 .6986722 .6015205 .7958239 5 28 35.47 "mid-MIG"
    14 .91 .9074479 .8985351 .9163608 5 29 52.59 "high-NAT"
    15 .78  .777859 .7008136 .8549044 5 30 47.55 "high-MIG"
    end
    label values or or
    label def or 10 "F", modify
    label def or 11 "E", modify
    label def or 12 "D", modify
    label def or 13 "C", modify
    label def or 14 "b", modify
    label def or 15 "A", modify
    label values country country
    label def country 1 "DE", modify
    label def country 2 "FR", modify
    label def country 3 "SE", modify
    label def country 4 "DK", modify
    label def country 5 "NO", modify
    bubble.gph

  • #2
    You may want to take a look at https://journals.sagepub.com/doi/ful...36867X20931008. I cannot run your code because it uses the community-contributed command fre, which I do not want to install. In any case, the solutions proposed in the linked note use either fillin or separate.



  • #3
    Solved, thanks!
    Last edited by Giorgio Piccitto; 26 May 2025, 06:44.



  • #4
    You run two separate scatter plots at once, and there's no variability in weight for each plot. In your first plot, step==0, all weights are set to 800, and in your second plot, step==1, all weights are set to 200. Try:
    Code:
    twoway scatter val group [fw=group_weight],  msymbol(circle_hollow)



  • #5
    Cited Statalist FAQ:
    Please don't mangle your own posts, even if you solved your problem yourself or realised that the question was silly. Explain the solution, even if it was trivial. Often someone else will have the same problem.

