
  • Using loop(s) to plot, name and save multiple graphs

    Hello,

    I'm working with a dataset of 32,399 observations and 4 variables in Stata 17 on Windows 10.

    I'd like to use a loop to produce, name, and save multiple graphs. I found this post: Using for loop to plot multiple graphs. - Statalist, but wasn't able to work out how to modify the code for use with my data.

    My data are in long format, with: 73 unique patient identifiers (PID); between 1 and 3 visits per patient; a time variable that runs from 0 to 180 minutes for each PID-visit combination; and a variable Y, which holds one measurement per minute. I hope the dataex example below is sufficient; I was worried that including too many observations would make this post unwieldy.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(PID visit time) double Y
    4472 1  0 22.548
    4472 1  1 28.894
    4472 1  2  35.73
    4472 1  3 38.848
    4472 1  4 40.612
    4472 1  5 41.891
    4472 1  6 43.003
    4472 1  7 44.068
    4472 1  8 45.127
    4472 1  9 46.198
    4472 1 10 47.285
    4472 1 11 44.999
    4472 1 12 42.425
    4472 1 13 42.182
    4472 1 14   42.8
    4472 1 15 43.738
    4472 1 16 44.797
    4472 1 17 45.904
    4472 1 18 47.033
    4472 1 19 48.173
    4472 1 20 49.321
    4472 1 21  53.73
    4472 1 22 58.438
    4472 1 23 60.935
    4472 1 24 62.632
    4472 1 25 64.046
    4472 1 26 65.368
    4472 1 27 66.668
    4472 1 28 67.972
    4472 1 29 69.289
    4472 1 30 70.624
    4472 1 31 68.578
    4472 1 32 66.244
    4472 1 33 66.243
    4472 1 34 67.103
    4472 1 35 68.282
    4472 1 36 69.581
    4472 1 37 70.928
    4472 1 38 72.294
    4472 1 39  73.67
    4472 1 40 75.053
    4472 1 41 76.442
    4472 1 42 77.834
    4472 1 43 79.231
    4472 1 44 80.633
    4472 1 45 82.038
    4472 1 46 83.449
    4472 1 47 84.863
    4472 1 48 86.281
    4472 1 49 87.704
    end

    I tried a nested loop with:

    Code:
    foreach i of varlist PID {
        foreach j in visit {
            twoway line Y time if PID == `i', ytitle(Y) title(`i' `j')
            graph save graph_`i'_`j', replace
        }
    }

    This gave me


    [Image: Graph.png]


    When what I am after is the following for each PID at each visit, with the title being the PID and visit number:


    [Image: Graph2.png]


    Any advice would be greatly appreciated! Thanks very much.

  • #2
    Let's examine your code and see it the way Stata does.

    Code:
    foreach i of varlist PID {
    I think what you mean to do here is write a loop that will iterate over the different values of PID. But instead what you are doing is looping over a list of variables (that is, after all, what varlist means), and that list contains only one item: PID. Where do we reference local macro i in the code? Well, we do it in the -twoway line- command in its -if PID == `i'- clause. But since i is always PID, this becomes -if PID == PID-, and that is always true. So the -if- qualifier has no effect at all, and this means that all of the data in the entire data set gets plotted on a single graph.

    Similarly
    Code:
    foreach j in visit {
    which I imagine you intended to be a loop iterating over the different values of visit, is in fact a loop over just the single token "visit". So since both your outer (`i') and inner (`j') loops iterate only over a single item, the body of code inside the loop gets executed exactly 1 x 1 = 1 time, with `i' == "PID" and `j' == "visit". I'll also point out that you use `j' only as a decoration in this code: it is part of the title and part of the name of the graph, but it is not used in determining which part of the data set is to be included in the plot.
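    To see this for yourself, here is a minimal sketch that swaps the graph commands for -display- and -count-. The toy values are made up for illustration; with your own data in memory you would skip the -clear- and -input- step.

```stata
* Toy data (hypothetical values) so the sketch runs on its own
clear
input float(PID visit time) double Y
101 1 0 10
101 1 1 14
101 2 0 11
102 1 0  9
end

* Same loop structure as in post #1, but showing what Stata sees
foreach i of varlist PID {
    foreach j in visit {
        * Runs once only: i holds the name PID, j holds the word visit
        display "i = `i', j = `j'"
        * -if PID == PID- is true in every observation
        count if PID == `i'
    }
}
```

    The -display- line appears only once, and -count- reports every observation in the data set, which is why the single graph contained everything.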

    Now if, as best I can guess is the case, you really want a separate graph for each visit of each patient, the code would look like this:

    Code:
    levelsof PID, local(pids)
    foreach p of local pids {
        levelsof visit if PID == `p', local(visits)
        foreach v of local visits {
            twoway line Y time if PID == `p' & visit == `v', ytitle(Y) title(`p' `v')
            graph save graph_`p'_`v', replace
        }
    }
    That said, I think you will regret doing this. You said that each patient has between 1 and 3 visits, and you have 73 patients. Consequently you will end up with somewhere between 73 and 219 separate graphs. I think it will prove an unmanageable task to deploy such a large set of graphs in any useful way. Imagine yourself having to sit through a slide show of all of these graphs. Or if you tried to combine all 73 as small panels in a combined graph, they would inevitably be too small to read. The usual ways of displaying multiple graphs fail at this scale. So I think you need to think about ways of reducing the number of graphs by combining or grouping together meaningful subsets of them.
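    If you want the exact number of graphs before drawing any of them, something along these lines should work. This is only a sketch: the toy values and the names first and ngraphs are made up for illustration, and with your own long-layout data in memory you would skip the -clear- and -input- step.

```stata
* Toy data (hypothetical values) so the sketch runs on its own
clear
input float(PID visit time) double Y
101 1 0 10
101 1 1 14
101 2 0 11
101 2 1 15
102 1 0  9
102 1 1 13
103 1 0  8
103 1 1 12
end

* Flag one observation per PID-visit combination, then count the flags
egen byte first = tag(PID visit)
count if first
local ngraphs = r(N)
display "The nested loop would produce `ngraphs' graphs"
drop first
```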

    The specifics of how you would debulk this unwieldy collection of graphs really depend on what kinds of questions you want to answer with them. But, for example, if the differences between what happens at each of the three visits are important, you could, instead of having a separate graph for each visit, put all of a single patient's visits on the same graph. Like this:
    Code:
    reshape wide Y, i(PID time) j(visit)
    forvalues i = 1/3 {
        label var Y`i' "Y (visit = `i')"
    }
    levelsof PID, local(pids)
    foreach p of local pids {
        twoway line Y* time if PID == `p', ytitle(Y) title(`p')
    }
    Or perhaps a more useful way to reduce the number of graphs would be to identify some meaningful groups of patients and put their graphs together in one multi-patient graph for each group.
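    As a rough sketch of the multi-patient idea, and only as a placeholder for a meaningful grouping, you could batch patients in PID order and draw one -by()- graph per batch. Everything here is an assumption for illustration: the toy values, the names pnum, batch and perbatch, the batch size, and the restriction to visit 1. It also assumes the long layout from post #1, not the reshaped wide data.

```stata
* Toy data (hypothetical values) so the sketch runs on its own
clear
input float(PID visit time) double Y
101 1 0 10
101 1 1 14
101 2 0 11
101 2 1 15
102 1 0  9
102 1 1 13
103 1 0  8
103 1 1 12
end

* Number the patients 1, 2, 3, ... and cut them into batches
local perbatch 2                       // hypothetical; try 9 or 12 with real data
egen long pnum = group(PID)
generate int batch = ceil(pnum / `perbatch')

* One graph per batch, one small panel per patient (visit 1 only here)
levelsof batch, local(batches)
foreach b of local batches {
    twoway line Y time if batch == `b' & visit == 1, sort by(PID) ///
        ytitle(Y) name(batch`b', replace)
}
```

    Batching in PID order is arbitrary; replacing batch with a variable that identifies clinically meaningful groups would be closer to what is suggested above.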

    Or consider data reduction: perhaps the mean values from some subsets of the patients are more helpful to examine visually than the individual curves.
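    A minimal sketch of that kind of data reduction, here averaging Y over all patients at each visit and minute. The toy values and the name meanY are made up for illustration, and which subsets to average over is your call. It again assumes the long layout from post #1.

```stata
* Toy data (hypothetical values) so the sketch runs on its own
clear
input float(PID visit time) double Y
101 1 0 10
101 1 1 14
101 2 0 11
101 2 1 15
102 1 0  9
102 1 1 13
103 1 0  8
103 1 1 12
end

* Work on a temporary copy so the patient-level data come back afterwards
preserve
collapse (mean) meanY = Y, by(visit time)
* One panel per visit, each showing the mean curve over time
twoway line meanY time, by(visit) ytitle("Mean Y")
restore
```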

    There are many possibilities, depending on how you wish to use the graphs.



    • #3
      Thank you Clyde! The code you suggested worked brilliantly and is exactly what I needed. Thanks also for the explanation of what's going on with the different lines of code, that's really helpful to someone new to Stata and new to using for loops.

      You're absolutely right about the number of graphs, it's 165. But we only need it for internal purposes, to assess the presence of multiple peaks (the Y variable is a measure of insulin secretion).
