
  • drawing twoway scatter plot AND connecting lines BY groups?

    [Attached image: KakaoTalk_20240718_165144006.jpg]

    The image above shows how I want my graph to look.

    gen percentage=.
    gen payment_method=.
    gen variation=.
    set obs 10

    replace percentage=0.16689679 in 1
    replace percentage=0.31108774 in 2
    replace percentage=0.18705977 in 3
    replace percentage=0.13920317 in 4
    replace percentage=0.19575253 in 5
    replace variation=0 in 1/5
    replace payment_method=1 in 1
    replace payment_method=2 in 2
    replace payment_method=3 in 3
    replace payment_method=4 in 4
    replace payment_method=5 in 5

    replace percentage=0.15781844 in 6
    replace percentage=0.31281248 in 7
    replace percentage=0.17682493 in 8
    replace percentage=0.16750586 in 9
    replace percentage=0.1850383 in 10
    replace variation=1 in 6/10
    replace payment_method=1 in 6
    replace payment_method=2 in 7
    replace payment_method=3 in 8
    replace payment_method=4 in 9
    replace payment_method=5 in 10

    list

    The above is what my dataset looks like.

    Basically, what I want to do is twoway connected percentage variation,
    but I want the marker shape to vary by payment_method and I want a legend entry for each marker shape.

    so I tried:
    twoway (connected percentage variation if payment_method==1, msymbol(circle))(connected percentage variation payment_method==2, msymbol(triangle))(connected percentage variation payment_method==3, msymbol(diamond))(connected percentage variation payment_method==4, msymbol(square))(connected percentage variation payment_method==5, msymbol(X))

    but all I got was:
    == invalid name
    r(198);

    Does anyone have an idea? Thank you in advance!

  • #2


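    Thanks for the clear question. Your code had if payment_method == 1 as a good start; the problem was omitting the if in the other cases. Stata failed at the first instance, where payment_method==2 without an if was read as part of the variable list, hence the "== invalid name" error.

    The code below shows another way to do it, using separate. See https://journals.sagepub.com/doi/pdf...867X0500500412 or sepscatter from SSC. It also shows a third way, which labels the lines directly instead of using a legend.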

    In fact all of these also need something like

    Code:
    xlabel(0 "Before treatment" 1 "After treatment", noticks)
    (Is your "percentage" really a proportion or fraction?)

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(percentage payment_method variation)
     .1668968 1 0
     .3110877 2 0
    .18705977 3 0
    .13920318 4 0
    .19575253 5 0
    .15781844 1 1
     .3128125 2 1
    .17682493 3 1
    .16750586 4 1
     .1850383 5 1
    end
    
    set scheme stcolor
    
    twoway (connected percentage variation if payment_method==1, msymbol(circle)) ///
    (connected percentage variation if payment_method==2, msymbol(triangle))      ///
    (connected percentage variation if payment_method==3, msymbol(diamond))       ///
    (connected percentage variation if payment_method==4, msymbol(square))        ///
    (connected percentage variation if payment_method==5, msymbol(X) name(Park1, replace))
    
    separate percentage, by(payment_method) veryshortlabel
    twoway connected percentage? variation,  msymbol(circle triangle diamond square X) name(Park2, replace)
    
    local call
    
    forval j = 1/5 {
        local call `call' || scatter percentage`j' variation if variation == 1, ms(none) mlabsize(medlarge) mlabel(payment_method) mlabcolor(stc`j')
    }  
    
    twoway connected percentage? variation,  msymbol(circle triangle diamond square X) ///
    `call' legend(off) ytitle(percentage) name(Park3, replace) aspect(1)

    [Figure: Park1.png]

    [Figure: Park2.png]

    Last edited by Nick Cox; 18 Jul 2024, 02:34.



    • #3
      Nick Cox Thank you very much for your thorough explanation, it helped a lot and now I have the very graph that I need.
      [Figure: Graph.png]


      The alternative approaches that you provided are also very intuitive! I have been trying several of them, following your guide.
      The only remaining problem is that I cannot set scheme stcolor, but I have figured out that this is because my Stata version is 17.
      I will check tomorrow whether my university provides the newest version.

      Thank you again, have a nice day.

      Best regards
      Soeun Park


      • #4
        Here is another take. I used myaxis and fabplot from the Stata Journal.

        Code:
        SJ-21-3 st0654  . . Speaking Stata: Ordering or ranking groups of observations
                (help myaxis if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                Q3/21   SJ 21(3):818--837
                discusses procedures for datasets based on aggregate
                frequencies and for datasets based on individuals and
                introduce a new convenience command, myaxis, that handles
                many cases directly
        
        SJ-21-2 gr0087  . . Front-and-back plots to ease spaghetti and paella problems
                (help fabplot if installed) . . . . . . . . . . . . . . . .  N. J. Cox
                Q2/21   SJ 21(2):539--554
                explores front-and-back plots, in which each subset of data
                is shown separately with the other subsets as backdrop

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(percentage payment_method variation)
         .1668968 1 0
         .3110877 2 0
        .18705977 3 0
        .13920318 4 0
        .19575253 5 0
        .15781844 1 1
         .3128125 2 1
        .17682493 3 1
        .16750586 4 1
         .1850383 5 1
        end
        
        set scheme stcolor
        
        myaxis method=payment_method, sort(mean percentage) descending subset(variation==1)
        
        fabplot connected percentage variation, xsc(r(-0.2 1.2))      ///
            xla(0 "Before" 1 "After", tlc(none)) by(method)           ///
            name(Park4, replace) aspect(1)                            ///
            frontopts(lw(thick) ms(O) msize(medlarge)) ytitle(proportion)

        The rationale for myaxis is that your methods 1 2 3 4 5 are likely to be in arbitrary order. Hence order by something else, such as result after treatment.

        Also, your methods presumably have names that in a full report would be used as value labels.
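A minimal sketch of attaching such value labels, using the example data from this thread. The names "Method A" to "Method E" and the label name method_lbl are placeholders I made up, not the real payment methods; replace them with your own.

```stata
clear
input float(percentage payment_method variation)
 .1668968 1 0
 .3110877 2 0
.18705977 3 0
.13920318 4 0
.19575253 5 0
.15781844 1 1
 .3128125 2 1
.17682493 3 1
.16750586 4 1
 .1850383 5 1
end

* Hypothetical names: replace "Method A" ... "Method E" with the real methods
label define method_lbl 1 "Method A" 2 "Method B" 3 "Method C" 4 "Method D" 5 "Method E"
label values payment_method method_lbl

* With veryshortlabel, separate uses the value labels as variable labels,
* so the legend should show the names rather than 1 to 5
separate percentage, by(payment_method) veryshortlabel
twoway connected percentage? variation, msymbol(circle triangle diamond square X)
```

Defining the labels before any graph command means the names only have to be typed once.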

        The rationale for fabplot is that results for most of your methods are hard to tell apart on any graph, with the exception of method 2. Even direct labelling might be hard to read with your full data if they are more extensive than what you show.

        I am taking further the idea that your "percentage" is really a proportion between 0 and 1. If uptake is only at best something like 0.3% all your methods look like failures.

        [Figure: Park4.png]



        • #5
          Reply to #3

          Thanks for positive comments.

          See https://www.statalist.org/forums/help#version for our convention that you're presumed to have access to the latest version of Stata if you don't state otherwise.

          For Stata 17 I recommend almost any scheme but the default s2color. I like s1color. Colours stc1 to stc5 aren't available as such in Stata 17 but can be specified directly. See
          https://www.statalist.org/forums/for...scheme-stcolor
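As a rough sketch of a Stata 17 version of the direct-labelling graph from #2: the code below swaps stc1 to stc5 for five named colours and passes the same list to the lines, markers and labels. The particular colours (navy, maroon, forest_green, dkorange, teal) are my own assumption, chosen only because they are standard named colours; they are not the stcolor palette, whose exact values are given at the link above.

```stata
clear
input float(percentage payment_method variation)
 .1668968 1 0
 .3110877 2 0
.18705977 3 0
.13920318 4 0
.19575253 5 0
.15781844 1 1
 .3128125 2 1
.17682493 3 1
.16750586 4 1
 .1850383 5 1
end

set scheme s1color

separate percentage, by(payment_method) veryshortlabel

* Assumed colour choices, one per method; not the stcolor palette
local colors navy maroon forest_green dkorange teal

* Build one invisible-marker scatter per method that only carries the label
local call
forval j = 1/5 {
    local c : word `j' of `colors'
    local call `call' || scatter percentage`j' variation if variation == 1, ms(none) mlabsize(medlarge) mlabel(payment_method) mlabcolor(`c')
}

twoway connected percentage? variation, msymbol(circle triangle diamond square X) ///
lcolor(`colors') mcolor(`colors')                                                 ///
`call' legend(off) ytitle(percentage) name(Park3, replace) aspect(1)
```

Setting lcolor() and mcolor() explicitly keeps each label the same colour as its line, whatever the scheme's own defaults are.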

          If you prefer a legend (I don't) consider whether you can improve on text such as Group 1 and consider adding order(2 5 3 4 1) as a legend() suboption.
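A sketch of that legend() suggestion with the example data. order(2 5 3 4 1) lists the methods from highest to lowest value after treatment, so the legend reads in the same order as the lines at the right of the graph. The texts "Method 1" to "Method 5" are placeholders only; real method names would be better.

```stata
clear
input float(percentage payment_method variation)
 .1668968 1 0
 .3110877 2 0
.18705977 3 0
.13920318 4 0
.19575253 5 0
.15781844 1 1
 .3128125 2 1
.17682493 3 1
.16750586 4 1
 .1850383 5 1
end

separate percentage, by(payment_method) veryshortlabel

* Placeholder legend text: replace "Method #" with informative names
twoway connected percentage? variation, msymbol(circle triangle diamond square X)   ///
xlabel(0 "Before treatment" 1 "After treatment", noticks)                           ///
legend(order(2 "Method 2" 5 "Method 5" 3 "Method 3" 4 "Method 4" 1 "Method 1"))     ///
ytitle(percentage)
```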
          Last edited by Nick Cox; 18 Jul 2024, 06:23.
