  • Help needed: Hide some marker labels and shade some marker labels red in a scatterplot

    Hi all,

    I need help with my scatterplot. Below is my data:



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(Trust_safety FixedCapital) str12 Country int Year
    62.1                   . "China"        2018
       .                   . "Hong Kong"    2018
    65.1                   . "India"        2018
    65.9                   . "Indonesia"    2018
    47.9                   . "Japan"        2018
    56.5                   . "Malaysia"     2018
    56.7                   . "Philippines"  2018
    49.8                   . "Singapore"    2018
    47.7                   . "ROK"          2018
    35.1                   . "Thailand"     2018
    67.4                   . "Vietnam"      2018
    73.2  2.8343838429868544 "China"        2019
       .  1.2972031932934354 "Hong Kong"    2019
    70.7 -1.1228850019365764 "India"        2019
    69.5 -1.2269742825688812 "Indonesia"    2019
    44.1   .5939266551266914 "Japan"        2019
    63.6 -.07421756851016817 "Malaysia"     2019
      56 -1.1756676833233706 "Philippines"  2019
    52.6  .44717951149067064 "Singapore"    2019
    46.2   .6093922000991148 "ROK"          2019
    57.3   -.956631552055974 "Thailand"     2019
    66.5 -.39595896084634924 "Vietnam"      2019
    55.7 -.06068384945470433 "Saudi Arabia" 2019
    53.2   .4518415311414805 "Australia"    2019
    59.8   .4508400249329308 "Canada"       2019
    60.6 -.34168072936252025 "Chile"        2019
    61.3  -.5915731420517907 "Mexico"       2019
    59.2  .33182234205797173 "New Zealand"  2019
      60 -1.0703165310188145 "Peru"         2019
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
       .                   . ""                .
    end
    And below is my scatterplot:

    [Image: Graph_forum.png]



    I would like to only display the marker labels for 10 countries and shade them in red: China, India, Indonesia, Japan, Malaysia, Philippines, Singapore, ROK, Thailand, Vietnam. I would like to hide the marker labels for those countries not included in the list.


    This is my code:

    Code:
    
    correlate Trust_safety FixedCapital  if Year == 2019 & Country != "Hong Kong" & Country!= "China" // Removing China improves fit
    local r(rho): display %5.4f e(r(rho))
    twoway scatter Trust_safety FixedCapital if Year == 2019 & Country != "Hong Kong" & Country!= "China" , mlabel(Country) || lfit Trust_safety FixedCapital if Year == 2019 & Country != "Hong Kong" & Country!= "China"  , note(correlation coefficient = `r(rho)') ytitle(Trust & Safety score) range(-2 1.5)    // with labels and linear fit, need to adjust the x-axis so that it can fit, y-axis label name too

    Is it also possible to display the correlation coefficient to two decimal places in the note section of the scatterplot?


    Thanks!

  • #2
    Consider this:
    Code:
    correlate Trust_safety FixedCapital  if Year == 2019 & !inlist(Country,"Hong Kong","China")
    local rho: display %3.2f r(rho)
    sum FixedCapital if Year == 2019
    local min = r(min)
    local max = r(max)
    
    #delimit ;
    twoway             (scatter Trust_safety FixedCapital if Year == 2019 & Country!= "Hong Kong") ||
                    (scatter Trust_safety FixedCapital if Year == 2019 & (inlist(Country,"China", "India", "Indonesia", "Japan", "Malaysia", "Philippines")
                        | inlist(Country, "Singapore", "ROK", "Thailand", "Vietnam")), msymbol(i) mlabcolor(red) mlabel(Country))||
                    (lfit Trust_safety FixedCapital if Year == 2019 & !inlist(Country,"Hong Kong","China"), range(`min' `max'))
                    , note(Note: Line of best fit and correlation coefficient exclude China and Hong Kong)
                    text(43 2.5 "{&rho} = `rho'")
                    ytitle(Trust & Safety score)
                    xtitle(Fixed Capital)
                    legend(off)
                    scheme(s2color)
                    ;
    #delimit cr
    which produces:
    [Image: Screenshot 2022-11-17 at 1.13.17 PM.png]
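
    On the two-decimal question, a minimal sketch of the formatting step on its own may help. It uses the auto dataset that ships with Stata, with price and mpg as stand-ins for the variables in this thread (an assumption for illustration only). The line in #1, local r(rho): display %5.4f e(r(rho)), cannot work for two reasons: r(rho) is not a valid macro name, and correlate leaves its results in r(), not e().

    Code:
    * auto.dta ships with Stata; price and mpg stand in for the thread's variables
    sysuse auto, clear
    correlate price mpg
    * correlate leaves the coefficient in r(rho); store it formatted to two decimals
    local rho : display %4.2f r(rho)
    * summarize leaves r(min) and r(max), used here to bound the fitted line
    summarize mpg
    local min = r(min)
    local max = r(max)
    twoway (scatter price mpg) (lfit price mpg, range(`min' `max')), note("correlation coefficient = `rho'")

    The local must be filled before summarize runs, because each r-class command replaces the results left by the previous one.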

    Last edited by Hemanshu Kumar; 17 Nov 2022, 00:44.



    • #3
      Various techniques can be competitive here, including

      1. Not showing a marker at all, but just showing a marker label at the position the marker would have occupied.

      2. Abbreviations that are one, two or three letters long can be informative and not too cryptic. Chemists were way ahead of most disciplines in standardizing on Na, Cu and Zn and all the rest for elements. In this case there are standard abbreviations of country names.


      More, although not a complete analysis, at https://www.stata-journal.com/articl...article=gr0023
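
      Both ideas can be combined in one call. The sketch below is illustrative only: it uses the auto dataset shipped with Stata, and abbr is a hypothetical variable holding a crude three-letter abbreviation. For the countries in this thread, standard three-letter codes such as CHN or IND would play the same role.

      Code:
      sysuse auto, clear
      * abbr is a hypothetical three-letter abbreviation built from the make
      generate str3 abbr = substr(make, 1, 3)
      * msymbol(none) hides the marker; mlabposition(0) centres the label on the point
      twoway scatter price mpg in 1/20, msymbol(none) mlabel(abbr) mlabposition(0)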



      • #4
        Hi,

        Thanks for the help! I have another problem:
        [Image: Graph_forum.png]



        There seem to be duplicate marker labels for some of the countries. I used the following code:

        Code:
        correlate Trust_safety SubscriptionDensityInternetU  if Year == 2019 & !inlist(Country,"Hong Kong")
        local rho: display %3.2f r(rho)
        sum SubscriptionDensityInternetU if Year == 2019
        local min = r(min)
        local max = r(max)
        
        #delimit ;
        twoway             (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & Country!= "Hong Kong") ||
                        (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & inlist(Country,"China", "India", "Indonesia", "Japan", "Malaysia", "Philippines")
                            | inlist(Country, "Singapore", "ROK", "Thailand", "Vietnam"), msymbol(i) mlabcolor(red) mlabel(Country))||
                        (lfit Trust_safety SubscriptionDensityInternetU if Year == 2019 & !inlist(Country,"Hong Kong"), range(`min' `max'))
                        , note(Note: Line of best fit and correlation coefficient exclude Hong Kong )
                        text(50 50 "{&rho} = `rho'")
                        ytitle(Trust & Safety score)
                        xtitle(Subscriptions density)
                        legend(off)
                        scheme(s2color)
                        ;
        #delimit cr
        Below is the data I used for this chart:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double(Trust_safety SubscriptionDensityInternetU) int Year str12 Country
        62.1        59.2 2018 "China"       
           . 90.50739533 2018 "Hong Kong"   
        65.1 20.08130004 2018 "India"       
        65.9 39.90463864 2018 "Indonesia"   
        47.9       91.28 2018 "Japan"       
        56.5 81.20104862 2018 "Malaysia"    
        56.7       46.88 2018 "Philippines" 
        49.8 88.16563603 2018 "Singapore"   
        47.7 96.02285958 2018 "ROK"         
        35.1 56.81748093 2018 "Thailand"    
        67.4 69.84792868 2018 "Vietnam"     
        73.2 64.56912253 2019 "China"       
           . 91.74340039 2019 "Hong Kong"   
        70.7          41 2019 "India"       
        69.5 47.69064898 2019 "Indonesia"   
        44.1 92.73039781 2019 "Japan"       
        63.6 84.18714501 2019 "Malaysia"    
          56       46.88 2019 "Philippines" 
        52.6 88.94925269 2019 "Singapore"   
        46.2 96.15757918 2019 "ROK"         
        57.3 66.65241946 2019 "Thailand"    
        66.5 68.66158021 2019 "Vietnam"     
        55.7    95.72474 2019 "Saudi Arabia"
        53.2        88.6 2019 "Australia"   
        59.8        96.5 2019 "Canada"      
        60.6        86.1 2019 "Chile"       
        61.3 70.06991047 2019 "Mexico"      
        59.2        90.2 2019 "New Zealand" 
          60 59.95050106 2019 "Peru"        
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
           .           .    . ""            
        end

        I tried the code with other variables, and none of them seems to have this problem. Do I need to modify the code?

        Thanks!
        Last edited by Wee Yang Ng; 22 Nov 2022, 05:20.



        • #5
          Every country named positively also qualifies as not being Hong Kong.
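
          Because the two conditions overlap, one alternative is a single scatter call with a label variable that is empty for the countries to be left unlabelled. This is offered as a sketch, not as the approach used elsewhere in this thread. The tiny dataset, the variables y and x, and the helper variable showlab are all hypothetical.

          Code:
          clear
          input double(y x) str12 Country int Year
          73.2 64.6 "China"     2019
          52.6 88.9 "Singapore" 2019
          49.8 88.2 "Singapore" 2018
            60 60.0 "Peru"      2019
          end
          * showlab holds the country name for listed countries in 2019 and is empty otherwise
          generate str12 showlab = Country if Year == 2019 & inlist(Country, "China", "Singapore")
          * every 2019 point gets a marker; only non-empty labels are drawn, in red
          twoway scatter y x if Year == 2019, mlabel(showlab) mlabcolor(red)

          With one scatter call there is only one if condition to get right, and unlisted countries such as Peru still get a marker but no label.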



          • #6
            The problem is actually that the code does not tell Stata how to group the OR and AND conditions. Because & is evaluated before |, the year restriction applies only to the first inlist(), so the second inlist() picks up some values from your 2018 data.

            Just do this (the only difference is the extra pair of parentheses around the two inlist() calls):

            Code:
            correlate Trust_safety SubscriptionDensityInternetU  if Year == 2019 & !inlist(Country,"Hong Kong")
            local rho: display %3.2f r(rho)
            sum SubscriptionDensityInternetU if Year == 2019
            local min = r(min)
            local max = r(max)
            
            #delimit ;
            twoway  (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 & Country!= "Hong Kong") ||
                    (scatter Trust_safety SubscriptionDensityInternetU if Year == 2019 &
                    (inlist(Country,"China", "India", "Indonesia", "Japan", "Malaysia", "Philippines") | inlist(Country, "Singapore", "ROK", "Thailand", "Vietnam"))
                        , msymbol(i) mlabcolor(red) mlabel(Country)) ||
                            (lfit Trust_safety SubscriptionDensityInternetU if Year == 2019 & !inlist(Country,"Hong Kong"), range(`min' `max'))
                            , note(Note: Line of best fit and correlation coefficient exclude Hong Kong )
                            text(50 50 "{&rho} = `rho'")
                            ytitle(Trust & Safety score)
                            xtitle(Subscriptions density)
                            legend(off)
                            scheme(s2color)
                            ;
            #delimit cr
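
            The effect of the parentheses can be checked without drawing anything. The sketch below uses a made-up five-row dataset (an assumption for illustration) and simply counts the observations each condition selects.

            Code:
            clear
            input int Year str12 Country
            2018 "Singapore"
            2019 "Singapore"
            2018 "China"
            2019 "China"
            2019 "Peru"
            end
            * & binds more tightly than |, so this is read as
            * (Year == 2019 & inlist(Country, "China")) | inlist(Country, "Singapore")
            * and counts 3: China 2019 plus Singapore in both years
            count if Year == 2019 & inlist(Country, "China") | inlist(Country, "Singapore")
            * with parentheses the year restriction applies to both lists: counts 2
            count if Year == 2019 & (inlist(Country, "China") | inlist(Country, "Singapore"))

            The extra observation in the first count is the 2018 Singapore row, which is the kind of point that showed up as a second label in the graph.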
