  • Obtaining a legend on a scatterplot

    Hello all,

    I'm trying to obtain a legend for a scatterplot to make the graph neater.

    Following this link (http://statadaily.com/2010/10/04/drawing-scatter-plots/), I attempted to obtain the legend, but it does not work (I followed the code for the third graph).

    Is it perhaps because I'm trying to plot three separate graphs on a "two-way" graph?

    Code:
    tw (scatter agr_share_emp lnGDP_pc if (Year==1965 | Year==1970 | Year==1975 | Year ==1980 | Year==1985| Year==1990 | Year==1995 | Year==2000| Year==2005 | Year==2010) & Country=="ARG", mlabel(Year))
    tw (scatter agr_share_emp lnGDP_pc if (Year==1965 | Year==1970 | Year==1975 | Year ==1980 | Year==1985| Year==1990 | Year==1995 | Year==2000| Year==2005 | Year==2010) & Country=="ZMB", mlabel(Year))
    tw (scatter agr_share_emp lnGDP_pc if (Year==1965 | Year==1970 | Year==1975 | Year ==1980 | Year==1985| Year==1990 | Year==1995 | Year==2000| Year==2005 | Year==2010) & Country=="MYS", mlabel(Year)),
    legend(label(1 “Argentina”) label(2 “Zambia”) label(3 "Malaysia"))

  • #2
    Interesting question. Let's first trim down your syntax with some small tricks. We can put the years selection into a local macro.

    Code:
    local myyears (mod(Year, 5) == 0) & inrange(Year, 1965, 2010)
    The rule appears to be "every 5 years from 1965 to 2010". If you don't have intervening years such as 1966 and 2009, the inrange() condition alone will suffice.

    If we now consider separate plots, that could be

    Code:
    scatter agr_share_emp lnGDP_pc if `myyears' & inlist(Country, "ARG", "ZMB", "MYS"), mlabel(Year) by(Country)
    If you want a superimposed plot that could be

    Code:
    ssc install sepscatter
    sepscatter agr_share_emp lnGDP_pc if `myyears' & inlist(Country, "ARG", "ZMB", "MYS"), mlabel(Year ..) mlabcolor(red blue black) sep(Country)
    The legend is then automatic.
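
    For comparison, plain twoway can also produce the legend asked for in #1. The sketch below assumes the likely problems in the original code were that the three scatters were issued as three separate tw commands and that two of the legend labels used curly quotes. It overlays the three plots in one twoway command, continued across lines with ///, and uses straight double quotes. The six input rows are copied from the data posted in #5, only so that the block runs on its own.

    ```stata
    clear
    input agr_share_emp lnGDP_pc str3 Country Year
    0.4344794 8.333673 ARG 1965
    0.1439953 8.89476  ARG 2010
    0.5033736 7.232221 MYS 1965
    0.2264241 8.751317 MYS 2005
    0.6415154 6.974977 ZMB 1965
    0.4968146 6.863708 ZMB 2010
    end

    * same year rule as above, repeated so that this block is self-contained
    local myyears (mod(Year, 5) == 0) & inrange(Year, 1965, 2010)

    * one twoway command, one scatter per country; legend keys follow plot order
    twoway (scatter agr_share_emp lnGDP_pc if `myyears' & Country == "ARG", mlabel(Year)) ///
           (scatter agr_share_emp lnGDP_pc if `myyears' & Country == "ZMB", mlabel(Year)) ///
           (scatter agr_share_emp lnGDP_pc if `myyears' & Country == "MYS", mlabel(Year)), ///
           legend(label(1 "Argentina") label(2 "Zambia") label(3 "Malaysia"))
    ```

    The numbers in label() refer to the order of the plots in the command, so each label has to match the country in the corresponding if condition.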

    By the way, I would suggest two-digit labels "65" ... "95" "00" ... "10", which lose no information and will reduce the clutter on the graph. You can put such labels in a variable

    Code:
    gen mylabel = string(mod(Year, 100), "%02.0f")
    Posting the data would help people make other suggestions. See dataex from SSC.

    For the original announcement of sepscatter, see http://www.statalist.org/forums/foru...lable-from-ssc

    By the way, I would use GDP pc and then xsc(log). Numbers like 200 500 1000 2000 5000 10000 20000 on your axis will mean much, much more even to economists than natural logarithms such as 5 to 9. (I am guessing that the units are USD.) Using multiples of 2, 5 and 10 is of course an old device for getting "nice numbers" with approximately equal spacing on a log scale.
    Last edited by Nick Cox; 14 Aug 2015, 03:49.

    • #3
      Hi Nick,

      Thanks for the comprehensive response.

      Just one question - how does one get the years to appear for all countries, not just ARG?

      I am referring to this piece of code:

      Code:
       
       sepscatter agr_share_emp lnGDP_pc if `myyears' & inlist(Country, "ARG", "ZMB", "MYS"), mlabel(Year ..) mlabcolor(red blue black) sep(Country)
      [Image: Graph.png]

      • #4
        Please post some data as suggested to make it easier for me (us)....

        (It's detail, but you missed my suggestions on logarithmic scale and two-digit labels, added in edits.)

        • #5
          Here is some data, as requested:

          Code:
          agr_share_emp        lnGDP_pc              Country   Year
          0.4344794            8.333673              ARG       1965
          0.3894863            8.455118              ARG       1970
          0.3389854            8.523973              ARG       1975
          0.2930621            8.586847              ARG       1980
          0.2876115            8.382429              ARG       1985
          0.2514319            8.286225              ARG       1990
          0.2241983            8.538069              ARG       1995
          0.1911284            8.607544              ARG       2000
          0.1750841            8.660036              ARG       2005
          0.1439953            8.89476               ARG       2010
          0.5033736            7.232221              MYS       1965
          0.4466213            7.455417              MYS       1970
          0.4001099            7.748571              MYS       1975
          0.3682338            7.866854              MYS       1980
          0.3356007            8.054242              MYS       1985
          0.2927735            8.377438              MYS       1990
          0.2724524            8.489184              MYS       1995
          0.2521025            8.622272              MYS       2000
          0.2264241            8.751317              MYS       2005
          0.6415154            6.974977              ZMB       1965
          0.6768489            6.891357              ZMB       1970
          0.6426947            6.839091              ZMB       1975
          0.5749               6.693619              ZMB       1980
          0.5954052            6.561144              ZMB       1985
          0.5962523            6.503304              ZMB       1990
          0.5899882            6.369982              ZMB       1995
          0.5658078            6.414368              ZMB       2000
          0.5340025            6.588106              ZMB       2005
          0.4968146            6.863708              ZMB       2010
          I will look at your suggestions now. Thanks.
          Last edited by Chris Rooney; 14 Aug 2015, 08:20.

          • #6
            Thanks. Here are some suggestions:

            Code:
            clear
            input agr_share_emp  lnGDP_pc  str3  Country  Year
            0.4344794  8.333673  ARG 1965
            0.3894863  8.455118  ARG 1970
            0.3389854  8.523973  ARG 1975
            0.2930621  8.586847  ARG 1980
            0.2876115  8.382429  ARG 1985
            0.2514319  8.286225  ARG 1990
            0.2241983  8.538069  ARG 1995
            0.1911284  8.607544  ARG 2000
            0.1750841  8.660036  ARG 2005
            0.1439953  8.89476   ARG 2010
            0.5033736  7.232221  MYS 1965
            0.4466213  7.455417  MYS 1970
            0.4001099  7.748571  MYS 1975
            0.3682338  7.866854  MYS 1980
            0.3356007  8.054242  MYS 1985
            0.2927735  8.377438  MYS 1990
            0.2724524  8.489184  MYS 1995
            0.2521025  8.622272  MYS 2000
            0.2264241  8.751317  MYS 2005
            0.6415154  6.974977  ZMB 1965
            0.6768489  6.891357  ZMB 1970
            0.6426947  6.839091  ZMB 1975
            0.5749     6.693619  ZMB 1980
            0.5954052  6.561144  ZMB 1985
            0.5962523  6.503304  ZMB 1990
            0.5899882  6.369982  ZMB 1995
            0.5658078  6.414368  ZMB 2000
            0.5340025  6.588106  ZMB 2005
            0.4968146  6.863708  ZMB 2010
            end
            * back-transform to GDP per capita and build two-digit year labels
            gen GDP_pc = exp(lnGDP_pc)
            gen mylabel = string(mod(Year, 100), "%02.0f")

            * marker labels in place of markers, log x axis, text in place of a legend
            sepscatter agr_share_emp GDP_pc, separate(Country) ///
                mlabel(mylabel mylabel mylabel) mlabpos(0 ..) mlabcolor(blue black orange) ms(none ..) ///
                xsc(log titlegap(*5)) xla(500 1000 2000 5000 10000) legend(off) ///
                text(.1 5000 "ARG", color(blue)) text(.5 2000 "MYS", color(black)) text(.7 750 "ZMB", color(orange))
            [Image: rooney.png]

            Notes:

            0. Two-digit marker labels and a logarithmic x axis are done the way I suggested.

            1. You need to spell out the marker label variable name every time you use the variable. That was wrong in my first reply, but is explicit in the documentation of marker label options. Under the hood, and temporarily, sepscatter is using as many y variables as there are separate groups.

            2. I don't see that you need markers as well as marker labels.

            3. I don't see that you need a legend when you can place text informatively.

            Of course, if this is just a sandbox and the real graph shows, say, 30 countries rather than 3, this won't work.

            (I am using scheme s1color.)
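
            The remark in note 1 about what happens under the hood can be illustrated with the official separate command. This is only a sketch of the idea and not sepscatter's actual code, which was not checked here. It assumes that separate accepts the string variable Country in by() and numbers the new variables in sorted order (ARG, MYS, ZMB). The six rows are a subset of the data above, repeated so that the block runs on its own.

            ```stata
            clear
            input agr_share_emp lnGDP_pc str3 Country Year
            0.4344794 8.333673 ARG 1965
            0.1439953 8.89476  ARG 2010
            0.5033736 7.232221 MYS 1965
            0.2264241 8.751317 MYS 2005
            0.6415154 6.974977 ZMB 1965
            0.4968146 6.863708 ZMB 2010
            end
            gen GDP_pc = exp(lnGDP_pc)
            gen mylabel = string(mod(Year, 100), "%02.0f")

            * one new y variable per country: agr_share_emp1, agr_share_emp2, agr_share_emp3
            separate agr_share_emp, by(Country) veryshortlabel

            * three y variables, so the marker label variable is named three times
            scatter agr_share_emp1 agr_share_emp2 agr_share_emp3 GDP_pc, ///
                mlabel(mylabel mylabel mylabel) xsc(log) xla(500 1000 2000 5000 10000)
            ```

            With several y variables, mlabel() takes one variable name per y variable, which is why a single name labels only the first group.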
