  • Multiple lines in graph - odd lines going back and forth (explanation in text body)

    Dear Statalists,

    I am trying to plot how the population ranks of a number of cities in the U.S. changed over time.

    The following is an extract of my data that clarifies the problem. The full data set consists of more than two cities but the problem can be reproduced using only two cities.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(city year) float(city_rank_ top20_1950 top20_2010 top20_always)
    4010 1950 26 0 1 0
    4010 1960 22 0 1 0
    4010 1970 17 0 1 0
    4010 1980 14 0 1 0
    4010 1990 18 0 1 0
    4010 2000 18 0 1 0
    4010 2010 20 0 1 0
    4610 1950  1 1 1 1
    4610 1960  1 1 1 1
    4610 1970  1 1 1 1
    4610 1980  1 1 1 1
    4610 1990  1 1 1 1
    4610 2000  1 1 1 1
    4610 2010  1 1 1 1
    end
    label values city city_lbl
    label def city_lbl 4010 "Memphis, TN", modify
    label def city_lbl 4610 "New York, NY", modify
    This is my Stata code. Essentially, it draws a line for each city showing how its population rank developed over time. A line is red if the city was among the 20 most populous cities in 1950 and grey if it was otherwise in the top 50.
    Code:
            twoway line city_rank_ year if top20_1950==0, mlabel(city) msize(0)  lcolor(gray)  ||  line city_rank_ year if top20_1950==1, mlabel(city) msize(0) lcolor(red) ||  scatter city_rank_ year if top20_1950==1 & year==1950, mlabel(city) msize(0) mlabp(9) mlabs(2) mlabc(black) ||  scatter city_rank_ year if top20_1950==1 & top20_always==1 & year==1950, mlabel(city) msize(0) mlabp(9) mlabs(2) mlabc(red)  legend(off)
    And this is the resulting graph:
    [Image: graph3.png]



    The obvious problem is that my code plots the line for Memphis (at the top) and the line for New York (at the bottom), but because New York directly follows Memphis alphabetically in my data set, a spurious line also connects the two.

    Either I can write my code differently, or I need to reshape the data first (or both), so that Stata treats the line for Memphis and the line for New York as separate entities. If anyone could give me a hint as to how I can tackle the problem, I would appreciate it a lot.

    Thank you for reading my post.

    Best wishes,
    Milan
    Last edited by Milan Quentel; 04 Apr 2017, 08:53. Reason: Added tags

  • #2
    I don't get the graph you show with the code you give. I get a graph without a marker label for Memphis and that's because no code produces that. Here is my slightly revised version of your code. The main change was to add another scatter call. All other changes were just to make the code a little easier for me to read.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(city year) float(city_rank_ top20_1950 top20_2010 top20_always)
    4010 1950 26 0 1 0
    4010 1960 22 0 1 0
    4010 1970 17 0 1 0
    4010 1980 14 0 1 0
    4010 1990 18 0 1 0
    4010 2000 18 0 1 0
    4010 2010 20 0 1 0
    4610 1950  1 1 1 1
    4610 1960  1 1 1 1
    4610 1970  1 1 1 1
    4610 1980  1 1 1 1
    4610 1990  1 1 1 1
    4610 2000  1 1 1 1
    4610 2010  1 1 1 1
    end
    label values city city_lbl
    label def city_lbl 4010 "Memphis, TN", modify
    label def city_lbl 4610 "New York, NY", modify
    
    local mopts msize(0) mlabp(9) mlabs(2) mlabel(city)
    
     twoway line city_rank_ year if top20_1950==0, ///
     mlabel(city) msize(0)  lcolor(gray)           ///
     ||  scatter city_rank_ year if top20_1950==0 & year==1950, `mopts' mlabc(grey)  ///
     ||  line city_rank_ year if top20_1950==1, mlabel(city) msize(0) lcolor(red) ///
     ||  scatter city_rank_ year if top20_1950==1 & year==1950, `mopts' mlabc(black) ///
     ||  scatter city_rank_ year if top20_1950==1 & top20_always==1 & year==1950, `mopts' mlabc(red)  legend(off)
    [Image: milan.png]


    I'd add that ysc(reverse) would put New York at the top, where New Yorkers feel most comfortable.



    • #3
      Thank you, Nick Cox, this is as always very helpful. Do you know why the scatter changes the result? Using my data excerpt and your code, I get the same graph as you, which is very good. However, when I use the same code for all 50 cities, I get these odd cross-lines again.

      Here are the first 10 cities (alphabetically) with your adapted version of the code:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(city year) float(city_rank_ top20_1950 top20_2010 top20_always)
        10 1950 39 0 0 0
        10 1960 44 0 0 0
        10 1970 51 0 0 0
        10 1980 51 0 0 0
        10 1990 51 0 0 0
        10 2000 51 0 0 0
        10 2010 51 0 0 0
        70 1950 51 0 0 0
        70 1960 51 0 0 0
        70 1970 51 0 0 0
        70 1980 43 0 0 0
        70 1990 38 0 0 0
        70 2000 35 0 0 0
        70 2010 32 0 0 0
       290 1950 51 0 0 0
       290 1960 51 0 0 0
       290 1970 51 0 0 0
       290 1980 51 0 0 0
       290 1990 51 0 0 0
       290 2000 51 0 0 0
       290 2010 50 0 0 0
       350 1950 33 0 0 0
       350 1960 24 0 0 0
       350 1970 27 0 0 0
       350 1980 29 0 0 0
       350 1990 36 0 0 0
       350 2000 39 0 0 0
       350 2010 40 0 0 0
       490 1950 51 0 1 0
       490 1960 51 0 1 0
       490 1970 51 0 1 0
       490 1980 41 0 1 0
       490 1990 27 0 1 0
       490 2000 16 0 1 0
       490 2010 14 0 1 0
       530 1950  6 1 0 0
       530 1960  6 1 0 0
       530 1970  7 1 0 0
       530 1980 10 1 0 0
       530 1990 12 1 0 0
       530 2000 17 1 0 0
       530 2010 21 1 0 0
       770 1950 34 0 0 0
       770 1960 36 0 0 0
       770 1970 47 0 0 0
       770 1980 49 0 0 0
       770 1990 51 0 0 0
       770 2000 51 0 0 0
       770 2010 51 0 0 0
       810 1950 10 1 0 0
       810 1960 13 1 0 0
       810 1970 16 1 0 0
       810 1980 20 1 0 0
       810 1990 20 1 0 0
       810 2000 20 1 0 0
       810 2010 22 1 0 0
       890 1950 15 1 0 0
       890 1960 20 1 0 0
       890 1970 28 1 0 0
       890 1980 38 1 0 0
       890 1990 49 1 0 0
       890 2000 51 1 0 0
       890 2010 51 1 0 0
      1090 1950 51 0 1 0
      1090 1960 51 0 1 0
      1090 1970 51 0 1 0
      1090 1980 46 0 1 0
      1090 1990 35 0 1 0
      1090 2000 26 0 1 0
      1090 2010 17 0 1 0
      end
      label values city city_lbl
      label def city_lbl 10 "Akron, OH", modify
      label def city_lbl 70 "Albuquerque, NM", modify
      label def city_lbl 290 "Arlington, TX", modify
      label def city_lbl 350 "Atlanta, GA", modify
      label def city_lbl 490 "Austin, TX", modify
      label def city_lbl 530 "Baltimore, MD", modify
      label def city_lbl 770 "Birmingham, AL", modify
      label def city_lbl 810 "Boston, MA", modify
      label def city_lbl 890 "Buffalo, NY", modify
      label def city_lbl 1090 "Charlotte, NC", modify
      
              
      local mopts msize(0) mlabp(9) mlabs(2) mlabel(city)
      
       twoway line city_rank_ year if top20_1950==0, ///
       mlabel(city) msize(0)  lcolor(gray)           ///
       ||  scatter city_rank_ year if top20_1950==0 & year==1950, `mopts' mlabc(grey)  ///
       ||  line city_rank_ year if top20_1950==1, mlabel(city) msize(0) lcolor(red) ///
       ||  scatter city_rank_ year if top20_1950==1 & year==1950, `mopts' mlabc(black) ///
       ||  scatter city_rank_ year if top20_1950==1 & top20_always==1 & year==1950, `mopts' mlabc(red)  legend(off)
      And this is the graph I get for the first 10 cities (with the same problem as before):

      [Image: 70.png]


      However, when I look at the full data, the problem seems slightly different: now every city is connected to the top of the graph, and I cannot see why.

      [Image: full.png]


      I find this confusing. Is it okay if I post the full data here with dataex? That would be 480 observations. Has anyone had this problem before, and can anyone give me some indication as to why it is occurring?

      Best wishes,
      Milan



      • #4
        I think you need the option c(L): see

        Code:
        help connectstyle
        Stata is doing what you ask: drawing a line plot and connecting according to the current order of observations. You don't want that.
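
        A minimal sketch of how that might look, assuming the variables from the dataex examples above (city, year, city_rank_, top20_1950) are in memory. Here c(L) is written out as connect(L), which joins consecutive points only while the x variable (year) is ascending, so the line breaks whenever the data move on to the next city. The sort and the simplified single scatter call for the labels are choices made for this illustration, not the code from the posts above.

```stata
* Sketch only: relies on the data being ordered by city, then year,
* so that year drops back to 1950 at the start of each new city.
sort city year

* connect(L) draws a segment only where year increases from one
* observation to the next, so no line is drawn between cities.
twoway line city_rank_ year if top20_1950==0, connect(L) lcolor(gray)  ///
    || line city_rank_ year if top20_1950==1, connect(L) lcolor(red)   ///
    || scatter city_rank_ year if year==1950,                          ///
       msymbol(none) mlabel(city) mlabposition(9) mlabsize(vsmall)     ///
       legend(off)
```

        If the data are not sorted by city and year, connect(L) alone may still produce stray segments, which is why the sort comes first in this sketch.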



        • #5
          Spot on! Specifying c(L) did the job. Thank you very much.
