  • Spaghetti plot with line sections color-coded according to binary variable and overlaid scatter

    Hello,

    I'm using Stata 18 with the following data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(pid VISIT age outcome pre_therapy age_therapy_start)
     1 1 4.5863013   .2691433 1   6.02601
     1 3   5.99726 -.13580847 1   6.02601
     1 4  6.967124  -.5588566 0   6.02601
     1 5  7.972678 -.29172885 0   6.02601
     2 3 3.5561645  1.0334456 1 3.6084874
     2 4 4.6748633  .13152725 0 3.6084874
     3 1  4.210959   .5987669 1   5.68104
     3 2  5.158904   .6081526 1   5.68104
     3 3  5.621918   .7302112 1   5.68104
     4 3  4.060109 .024490165 1 4.1122518
     5 3  3.432877   1.504983 1  3.520876
     6 2  3.238356          . 1  4.073922
     6 3 4.0273223   -.972908 1  4.073922
     6 4   5.00274 -1.3019816 0  4.073922
     7 1  4.783562   .4694017 1   6.06434
     7 2  5.780822  .07471675 1   6.06434
     7 3  5.994521  .26519963 1   6.06434
     7 4  7.013661  .23719777 0   6.06434
     7 5  7.986339 .005793178 0   6.06434
     8 1  4.783562   1.251511 1   6.06434
     8 2  5.780822   .5124904 1   6.06434
     8 3  5.994521   .7227523 1   6.06434
     8 4  7.013661   .8358337 0   6.06434
     8 5  7.986339   .5839239 0   6.06434
     9 3  5.361644  -.4420079 1  5.423682
     9 4  6.434426  -.7953085 0  5.423682
    10 3  3.204918 -1.5974298 1  3.233402
    10 4  4.347945  -.8368508 0  3.233402
    11 1  3.457534  1.2626574 1         .
    11 2 4.5342464          . 1         .
    end
    My PI is asking me to make a figure with these data that I can't figure out how to make (I'm not sure it is even possible to make). Specifically, she is asking for a spaghetti plot that would look something like what is generated with
    Code:
    xtline outcome, t(age) i(pid) overlay legend(off)
    But, she would like the sections of the lines for each individual color-coded according to the variable pre_therapy, and then she would like a marker to appear on the line for each individual to indicate where they began therapy (age_therapy_start). Pre_therapy is coded 1 if the visit/outcome measure was completed pre_therapy, and coded 0 if after initiating therapy.

    I'm not sure how to even attempt to go about coloring different sections of the same line according to a binary variable, and I ran into trouble trying to overlay a scatter on a twoway line plot because my x variables aren't the same.

    Any help would be greatly appreciated! Or if anyone can tell me that this is impossible, I can pass that on to her.

    Thanks,
    Meghan

  • #2
    I don't know whether this will be helpful or not. You can also check Mario A. Cleves' command -spaghetti-.
    Code:
    clear
    input float(pid VISIT age outcome pre_therapy age_therapy_start)
     1 1 4.5863013   .2691433 1   6.02601
     1 3   5.99726 -.13580847 1   6.02601
     1 4  6.967124  -.5588566 0   6.02601
     1 5  7.972678 -.29172885 0   6.02601
     2 3 3.5561645  1.0334456 1 3.6084874
     2 4 4.6748633  .13152725 0 3.6084874
     3 1  4.210959   .5987669 1   5.68104
     3 2  5.158904   .6081526 1   5.68104
     3 3  5.621918   .7302112 1   5.68104
     4 3  4.060109 .024490165 1 4.1122518
     5 3  3.432877   1.504983 1  3.520876
     6 2  3.238356          . 1  4.073922
     6 3 4.0273223   -.972908 1  4.073922
     6 4   5.00274 -1.3019816 0  4.073922
     7 1  4.783562   .4694017 1   6.06434
     7 2  5.780822  .07471675 1   6.06434
     7 3  5.994521  .26519963 1   6.06434
     7 4  7.013661  .23719777 0   6.06434
     7 5  7.986339 .005793178 0   6.06434
     8 1  4.783562   1.251511 1   6.06434
     8 2  5.780822   .5124904 1   6.06434
     8 3  5.994521   .7227523 1   6.06434
     8 4  7.013661   .8358337 0   6.06434
     8 5  7.986339   .5839239 0   6.06434
     9 3  5.361644  -.4420079 1  5.423682
     9 4  6.434426  -.7953085 0  5.423682
    10 3  3.204918 -1.5974298 1  3.233402
    10 4  4.347945  -.8368508 0  3.233402
    11 1  3.457534  1.2626574 1         .
    11 2 4.5342464          . 1         .
    end
    
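    * split outcome into outcome0 (post-therapy) and outcome1 (pre-therapy)
    separate outcome, by(pre_therapy)
    graph drop _all
    * base spaghetti plot, one line per person
    xtline outcome, t(age) i(pid) overlay legend(off)
    * note: r(max) is the largest pid value, so the loop assumes pid runs 1, 2, 3, ...
    sum pid, meanonly
    local nn=r(max)
    * build one overlay per person that redraws the post-therapy points
    forvalue i=1(1)`nn' {
      local myg `myg' || conn outcome0 age if pid==`i', msymbol(T) lwidth(thick)
    }
    graph graph addplot `myg'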
    [Attached image: foo.png]


    • #3
      Thanks for this! I attempted with the full dataset and got this error:


      . forvalue i=1(1)`nn' {
      2. local myg `myg' || conn outcome0 age if pid==`i', msymbol(T) lwidth(thick)
      3. }
      macro substitution results in line that is too long
      The line resulting from substituting macros would be longer than allowed. The maximum allowed length is 645,216
      characters, which is calculated on the basis of set maxvar.

      You can change that in Stata/SE and Stata/MP. What follows is relevant only if you are using Stata/SE or
      Stata/MP.

      The maximum line length is defined as 16 more than the maximum macro length, which is currently 645,200
      characters. Each unit increase in set maxvar increases the length maximums by 129. The maximum value of set
      maxvar is 32,767. Thus, the maximum line length may be set up to 4,227,159 characters if you set maxvar to its
      largest value.


      Altogether I have 15 individuals (lines) I'm trying to plot.



      • #4
        It's hard to follow why the command line would be so long with just 15 replications of the main idea.

        Here is my guess. Your real identifiers aren't 1 to 15 at all. They are much more complicated. The number of commands you're writing down is driven by the value of local macro nn, which you don't show us, but I'll bet it's much larger than 15. So, you're writing code for identifiers that just don't exist in the data.

        You have at least two choices. You can map your real identifiers to 1 to 15 with say

        Code:
        egen PID = group(pid), label
        and then loop in terms of that new variable.

        Or you can loop over the values of pid that actually occur in the data. https://www.stata.com/support/faqs/d...-with-foreach/ is a general discussion. Method 3 does not apply to your problem.
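
        A minimal sketch of both choices, on toy data rather than the real dataset. The ids 1001, 20457 and 731502 are made up for illustration. Each pass of the original loop appends roughly 60 characters to the macro, so the 645,200 limit would only be reached after something like ten thousand passes, which fits an r(max) far larger than 15.

        Code:
        * toy data: these ids are hypothetical, not from the thread
        clear
        input long pid float(age outcome)
        1001   4.5  .27
        1001   6.0 -.14
        20457  3.6 1.03
        20457  4.7  .13
        731502 4.2  .60
        731502 5.2  .61
        end

        * r(max) is the largest id, not the number of people
        summarize pid, meanonly
        display "r(max) = " r(max)

        * Choice 1: map the ids to 1, 2, 3, ... and loop over the new variable
        egen PID = group(pid), label
        summarize PID, meanonly
        local nn = r(max)

        * Choice 2: loop only over ids that actually occur in the data
        sort pid age
        levelsof pid, local(ids)
        local myg
        foreach i of local ids {
            local myg `myg' (connected outcome age if pid == `i', msymbol(T))
        }
        twoway `myg', legend(off)

        With either choice the macro grows once per person in the data, so 15 people give 15 plot specifications.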



        • #5
          Apologies - the full code I used was what Chen had suggested above:

          Code:
          separate outcome, by(pre_therapy)
          graph drop _all
          xtline outcome, t(age) i(pid) overlay legend(off)
          sum pid, meanonly
          local nn=r(max)
          forvalue i=1(1)`nn' {
          local myg `myg' || conn outcome0 age if pid==`i', msymbol(T) lwidth(thick)
          }
          graph graph addplot `myg'
          Last edited by Meghan Bezerra; 11 Apr 2025, 06:50.



          • #6
            However, you have more problems than one.

            Do you care about identifying individuals (modulo some anonymous identifier)? If so, you need a legend, and that is going to be a problem as well as a feature.

            In any case are 15 distinct colours going to be easy to show?

            Some variant on a front-to-back design may be helpful. (On front-and-back plots, see https://journals.sagepub.com/doi/pdf...6867X211025838 -- except that I fear you need different code.)

            I doubt I've understood all the details but this may show you a useful direction. I used linkplot from SSC. Extending to 15 should not be a major problem in itself.


            Code:
            forval j = 1/11 {
                linkplot outcome age, link(pid) subtitle("`j'") lc(gs8) ms(none) ///
                    addplot(line outcome age if pid == `j', lc(stc1) lw(thick) ///
                    || scatter outcome age if pid == `j' & pre_therapy, ms(T) msize(medlarge) mcolor(stc1)) ///
                    name(G`j', replace) legend(off)
                local G `G' G`j'
            }
            
            graph combine `G'
            [Attached image: linkplot.png]


            • #7
              Thanks Nick! To your question about identifying individuals, no, I don't need to - we're just looking to color-code by the pre_therapy variable and put some marker in for where therapy started, according to the age_therapy_start variable.
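
              For a single-panel version of that request, here is one possible sketch, not a tested solution for the full dataset. It uses a subset of the example data from #1 and assumes Stata 18 (for the stc1 and stc2 colours). The names is_start, outcome_i, pre and post are introduced here for illustration only. The idea is to add one row per person at age_therapy_start, interpolate outcome there with ipolate, and then draw the pre and post sections as separate lines that meet at that point.

              Code:
              * subset of the example data from #1 (pid 1, 2 and 11)
              clear
              input float(pid VISIT age outcome pre_therapy age_therapy_start)
               1 1 4.5863013   .2691433 1   6.02601
               1 3   5.99726 -.13580847 1   6.02601
               1 4  6.967124  -.5588566 0   6.02601
               1 5  7.972678 -.29172885 0   6.02601
               2 3 3.5561645  1.0334456 1 3.6084874
               2 4 4.6748633  .13152725 0 3.6084874
              11 1  3.457534  1.2626574 1         .
              11 2 4.5342464          . 1         .
              end

              * one extra row per person at the age therapy started
              preserve
              keep if !missing(age_therapy_start)
              bysort pid (age): keep if _n == 1
              replace age = age_therapy_start
              replace outcome = .
              gen byte is_start = 1
              tempfile starts
              save `starts'
              restore
              append using `starts'
              replace is_start = 0 if missing(is_start)

              * linear interpolation of outcome within person (no extrapolation)
              bysort pid: ipolate outcome age, gen(outcome_i)

              * both sections include the therapy-start row, so they join there
              gen pre  = outcome_i if age <= age_therapy_start | missing(age_therapy_start)
              gen post = outcome_i if age >= age_therapy_start & !missing(age_therapy_start)

              * two lines per person, then one marker layer for therapy start
              sort pid age
              levelsof pid, local(ids)
              local plots
              foreach i of local ids {
                  local plots `plots' (line pre  age if pid == `i', lcolor(stc1))
                  local plots `plots' (line post age if pid == `i', lcolor(stc2))
              }
              twoway `plots' ///
                  (scatter outcome_i age if is_start, msymbol(D) mcolor(black)), ///
                  legend(off) ytitle("outcome") xtitle("age")

              Because ipolate does not extrapolate by default, anyone whose therapy start falls outside their observed age range gets no marker, and anyone with a missing age_therapy_start is drawn entirely in the pre-therapy colour.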
