Help needed: Hide some marker labels and shade some marker labels red in a scatterplot

Wee Yang Ng

16 Nov 2022, 21:43

Hi all,

I need help with my scatterplot. Below is my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(Trust_safety FixedCapital) str12 Country int Year
62.1                   . "China"        2018
   .                   . "Hong Kong"    2018
65.1                   . "India"        2018
65.9                   . "Indonesia"    2018
47.9                   . "Japan"        2018
56.5                   . "Malaysia"     2018
56.7                   . "Philippines"  2018
49.8                   . "Singapore"    2018
47.7                   . "ROK"          2018
35.1                   . "Thailand"     2018
67.4                   . "Vietnam"      2018
73.2  2.8343838429868544 "China"        2019
   .  1.2972031932934354 "Hong Kong"    2019
70.7 -1.1228850019365764 "India"        2019
69.5 -1.2269742825688812 "Indonesia"    2019
44.1   .5939266551266914 "Japan"        2019
63.6 -.07421756851016817 "Malaysia"     2019
  56 -1.1756676833233706 "Philippines"  2019
52.6  .44717951149067064 "Singapore"    2019
46.2   .6093922000991148 "ROK"          2019
57.3   -.956631552055974 "Thailand"     2019
66.5 -.39595896084634924 "Vietnam"      2019
55.7 -.06068384945470433 "Saudi Arabia" 2019
53.2   .4518415311414805 "Australia"    2019
59.8   .4508400249329308 "Canada"       2019
60.6 -.34168072936252025 "Chile"        2019
61.3  -.5915731420517907 "Mexico"       2019
59.2  .33182234205797173 "New Zealand"  2019
  60 -1.0703165310188145 "Peru"         2019
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
   .                   . ""                .
end

And below is my scatterplot:

[Image: Graph_forum.png]

I would like to display marker labels for only 10 countries, and to color those labels red: China, India, Indonesia, Japan, Malaysia, Philippines, Singapore, ROK, Thailand, Vietnam. The marker labels for all other countries should be hidden.

This is my code:

Code:


correlate Trust_safety FixedCapital  if Year == 2019 & Country != "Hong Kong" & Country!= "China" // Removing China improves fit
local r(rho): display %5.4f e(r(rho))
twoway scatter Trust_safety FixedCapital if Year == 2019 & Country != "Hong Kong" & Country!= "China" , mlabel(Country) || lfit Trust_safety FixedCapital if Year == 2019 & Country != "Hong Kong" & Country!= "China"  , note(correlation coefficient = `r(rho)') ytitle(Trust & Safety score) range(-2 1.5)    // with labels and linear fit, need to adjust the x-axis so that it can fit, y-axis label name too

Is it also possible to display the correlation coefficient to two decimal places in the note section of the scatterplot?

Thanks!

Hemanshu Kumar

17 Nov 2022, 00:39

Consider this:

Code:

correlate Trust_safety FixedCapital  if Year == 2019 & !inlist(Country,"Hong Kong","China")
local rho: display %3.2f r(rho)
sum FixedCapital if Year == 2019
local min = r(min)
local max = r(max)

#delimit ;
twoway             (scatter Trust_safety FixedCapital if Year == 2019 & Country!= "Hong Kong") ||
                (scatter Trust_safety FixedCapital if Year == 2019 & inlist(Country,"China", "India", "Indonesia", "Japan", "Malaysia", "Philippines")
                    | inlist(Country, "Singapore", "ROK", "Thailand", "Vietnam"), msymbol(i) mlabcolor(red) mlabel(Country))||
                (lfit Trust_safety FixedCapital if Year == 2019 & !inlist(Country,"Hong Kong","China"), range(`min' `max'))
                , note(Note: Line of best fit and correlation coefficient exclude China and Hong Kong)
                text(43 2.5 "{&rho} = `rho'")
                ytitle(Trust & Safety score)
                xtitle(Fixed Capital)
                legend(off)
                scheme(s2color)
                ;
#delimit cr

which produces:

[Image: Screenshot 2022-11-17 at 1.13.17 PM.png]
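
On the two-decimal question: the extended macro function display applies a format while the result is being stored, so the local holds formatted text rather than the full-precision number. A minimal sketch, assuming the auto dataset shipped with Stata as a stand-in (price and weight are not the thread's variables):

```stata
sysuse auto, clear
correlate price weight

* r(rho) holds the full-precision coefficient left behind by correlate;
* the display format rounds it to two decimal places as text
local rho : display %4.2f r(rho)
display "rho = `rho'"

* The formatted local can then be used in note() or text()
twoway scatter price weight, note("correlation coefficient = `rho'")
```

The local has to be filled immediately after correlate, before another r-class command overwrites r(rho).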

Last edited by Hemanshu Kumar; 17 Nov 2022, 00:44.

Nick Cox

18 Nov 2022, 02:10

Various techniques can be competitive here, including

1. Not showing a marker at all, but just showing a marker label at the position the marker would have occupied.

2. Abbreviations that are one, two or three letters long can be informative and not too cryptic. Chemists were way ahead of most disciplines in standardizing on Na, Cu and Zn and all the rest for elements. In this case there are standard abbreviations of country names.

More, although not a complete analysis, at https://www.stata-journal.com/articl...article=gr0023
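
Both suggestions can be combined in one graph. A minimal sketch, assuming a handful of 2019 rows from the example data (values rounded) and a hypothetical variable code that holds ISO 3166-1 alpha-3 abbreviations:

```stata
clear
input double(Trust_safety FixedCapital) str12 Country str3 code
73.2  2.83 "China"     "CHN"
70.7 -1.12 "India"     "IND"
44.1  0.59 "Japan"     "JPN"
53.2  0.45 "Australia" "AUS"
59.8  0.45 "Canada"    "CAN"
end

* msymbol(none) suppresses the marker itself; mlabposition(0) centres
* the label on the point the marker would have occupied
twoway scatter Trust_safety FixedCapital,          ///
    msymbol(none) mlabel(code) mlabposition(0)     ///
    ytitle("Trust & Safety score") xtitle("Fixed Capital")
```

Short codes reduce the chance that labels overlap each other or run off the edge of the plot region.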

Wee Yang Ng

22 Nov 2022, 05:02

Hi,

Thanks for the help! I have another problem:

[Image: Graph_forum.png]

Some of the countries seem to have two marker labels. I used the following code:

Code:

correlate Trust_safety SubscriptionDensityInternetU  if Year == 2019 & !inlist(Country,"Hong Kong")
local rho: display %3.2f r(rho)
sum SubscriptionDensityInternetU if Year == 2019
local min = r(min)
local max = r(max)

#delimit ;
twoway             (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & Country!= "Hong Kong") ||
                (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & inlist(Country,"China", "India", "Indonesia", "Japan", "Malaysia", "Philippines")
                    | inlist(Country, "Singapore", "ROK", "Thailand", "Vietnam"), msymbol(i) mlabcolor(red) mlabel(Country))||
                (lfit Trust_safety SubscriptionDensityInternetU if Year == 2019 & !inlist(Country,"Hong Kong"), range(`min' `max'))
                , note(Note: Line of best fit and correlation coefficient exclude Hong Kong )
                text(50 50 "{&rho} = `rho'")
                ytitle(Trust & Safety score)
                xtitle(Subscriptions density)
                legend(off)
                scheme(s2color)
                ;
#delimit cr

Below is the data I used for this chart:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(Trust_safety SubscriptionDensityInternetU) int Year str12 Country
62.1        59.2 2018 "China"       
   . 90.50739533 2018 "Hong Kong"   
65.1 20.08130004 2018 "India"       
65.9 39.90463864 2018 "Indonesia"   
47.9       91.28 2018 "Japan"       
56.5 81.20104862 2018 "Malaysia"    
56.7       46.88 2018 "Philippines" 
49.8 88.16563603 2018 "Singapore"   
47.7 96.02285958 2018 "ROK"         
35.1 56.81748093 2018 "Thailand"    
67.4 69.84792868 2018 "Vietnam"     
73.2 64.56912253 2019 "China"       
   . 91.74340039 2019 "Hong Kong"   
70.7          41 2019 "India"       
69.5 47.69064898 2019 "Indonesia"   
44.1 92.73039781 2019 "Japan"       
63.6 84.18714501 2019 "Malaysia"    
  56       46.88 2019 "Philippines" 
52.6 88.94925269 2019 "Singapore"   
46.2 96.15757918 2019 "ROK"         
57.3 66.65241946 2019 "Thailand"    
66.5 68.66158021 2019 "Vietnam"     
55.7    95.72474 2019 "Saudi Arabia"
53.2        88.6 2019 "Australia"   
59.8        96.5 2019 "Canada"      
60.6        86.1 2019 "Chile"       
61.3 70.06991047 2019 "Mexico"      
59.2        90.2 2019 "New Zealand" 
  60 59.95050106 2019 "Peru"        
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
   .           .    . ""            
end

I tried the code with other variables, and those graphs do not seem to have this problem. Do I need to modify the code?

Thanks!

Last edited by Wee Yang Ng; 22 Nov 2022, 05:20.

Nick Cox

22 Nov 2022, 06:38

Every country named positively also qualifies as not being Hong Kong.
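
One way to act on this is to make the two scatter calls disjoint, so that no country is drawn by both. A minimal sketch, not the poster's code: focus is a hypothetical indicator variable, the country list is shortened, and the values are rounded from the example data.

```stata
clear
input double(Trust_safety SubscriptionDensityInternetU) int Year str12 Country
73.2 64.57 2019 "China"
44.1 92.73 2019 "Japan"
53.2 88.6  2019 "Australia"
59.8 96.5  2019 "Canada"
end

* focus = 1 for the countries that should carry a red label
generate byte focus = inlist(Country, "China", "Japan")

* Each observation now belongs to exactly one of the two scatter plots
twoway (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & !focus) ///
       (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & focus,  ///
            mcolor(red) mlabel(Country) mlabcolor(red)),                            ///
       legend(off)
```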

Hemanshu Kumar

22 Nov 2022, 06:49

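The problem is actually that the code does not group the AND and OR conditions the way you intend. Stata evaluates & before |, so without parentheses the Year == 2019 restriction applies only to the first inlist(); the second inlist() then also picks up those countries' values from your 2018 data. It did not show up with FixedCapital because that variable is missing for 2018.

Just do this (the only difference is the extra pair of parentheses around the two inlist() conditions):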

Code:

correlate Trust_safety SubscriptionDensityInternetU  if Year == 2019 & !inlist(Country,"Hong Kong")
local rho: display %3.2f r(rho)
sum SubscriptionDensityInternetU if Year == 2019
local min = r(min)
local max = r(max)

#delimit ;
twoway  (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & Country!= "Hong Kong") ||
        (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 &
        (inlist(Country,"China", "India", "Indonesia", "Japan", "Malaysia", "Philippines") | inlist(Country, "Singapore", "ROK", "Thailand", "Vietnam"))
            , msymbol(i) mlabcolor(red) mlabel(Country)) ||
                (lfit Trust_safety SubscriptionDensityInternetU if Year == 2019 & !inlist(Country,"Hong Kong"), range(`min' `max'))
                , note(Note: Line of best fit and correlation coefficient exclude Hong Kong )
                text(50 50 "{&rho} = `rho'")
                ytitle(Trust & Safety score)
                xtitle(Subscriptions density)
                legend(off)
                scheme(s2color)
                ;
#delimit cr
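
The precedence rule can be checked on a toy dataset. This is only an illustrative sketch with made-up rows and single-country lists, not the thread's data:

```stata
clear
input int Year str12 Country
2018 "China"
2018 "Singapore"
2019 "China"
2019 "Singapore"
2019 "Canada"
end

* & binds more tightly than |, so this condition is read as
* (Year == 2019 & inlist(Country, "China")) | inlist(Country, "Singapore")
count if Year == 2019 & inlist(Country, "China") | inlist(Country, "Singapore")
* counts 3: China 2019, plus Singapore in both years

* Parentheses make the year restriction apply to both lists
count if Year == 2019 & (inlist(Country, "China") | inlist(Country, "Singapore"))
* counts 2: China 2019 and Singapore 2019
```

The extra observation in the first count is the 2018 Singapore row, which is the same mechanism that produced the second set of labels in the graph.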
