Using loop(s) to plot, name and save multiple graphs

Meghan Bezerra

Join Date: Sep 2023
Posts: 18

22 Sep 2023, 14:43

Hello,

I'm working with a dataset of 32,399 observations and 4 variables in Stata 17 on Windows 10.

I'd like to use a loop to produce, name, and save multiple graphs. I found this post: Using for loop to plot multiple graphs. - Statalist, but wasn't able to work out how to modify the code for use with my data.

My data are in long format with: 73 unique patient identifiers (PID); between 1 and 3 visits per patient; a time variable that runs from 0-180 minutes for each PID-visit combination; and a variable Y holding the measurement taken at each minute. I hope the dataex below is sufficient; I was worried that including more observations would make this post unwieldy.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(PID visit time) double Y
4472 1  0 22.548
4472 1  1 28.894
4472 1  2  35.73
4472 1  3 38.848
4472 1  4 40.612
4472 1  5 41.891
4472 1  6 43.003
4472 1  7 44.068
4472 1  8 45.127
4472 1  9 46.198
4472 1 10 47.285
4472 1 11 44.999
4472 1 12 42.425
4472 1 13 42.182
4472 1 14   42.8
4472 1 15 43.738
4472 1 16 44.797
4472 1 17 45.904
4472 1 18 47.033
4472 1 19 48.173
4472 1 20 49.321
4472 1 21  53.73
4472 1 22 58.438
4472 1 23 60.935
4472 1 24 62.632
4472 1 25 64.046
4472 1 26 65.368
4472 1 27 66.668
4472 1 28 67.972
4472 1 29 69.289
4472 1 30 70.624
4472 1 31 68.578
4472 1 32 66.244
4472 1 33 66.243
4472 1 34 67.103
4472 1 35 68.282
4472 1 36 69.581
4472 1 37 70.928
4472 1 38 72.294
4472 1 39  73.67
4472 1 40 75.053
4472 1 41 76.442
4472 1 42 77.834
4472 1 43 79.231
4472 1 44 80.633
4472 1 45 82.038
4472 1 46 83.449
4472 1 47 84.863
4472 1 48 86.281
4472 1 49 87.704
end

I tried a nested loop with:

Code:

foreach i of varlist PID {
    foreach j in visit {
        twoway line Y time if PID == `i', ytitle(Y) title(`i' `j')
        graph save graph_`i'_`j', replace
    }
}

This gave me

[Attached image: Graph.png]

When what I am after is the following for each PID at each visit, with the title being the PID and visit number:

[Attached image: Graph2.png]

Any advice would be greatly appreciated! Thanks very much.


Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#2

22 Sep 2023, 15:32

Let's examine your code and see it the way Stata does.

Code:

foreach i of varlist PID {

I think what you mean to do here is write a loop that will iterate over the different values of PID. But instead what you are doing is looping over a list of variables (that is, after all, what varlist means), and that list contains only one item: PID. Where do we reference local macro i in the code? Well, we do it in the -twoway line- command in its -if PID == `i'- clause. But since i is always PID, this becomes -if PID == PID-, and that is always true. So the -if- qualifier has no effect at all, and this means that all of the data in the entire data set gets plotted on a single graph.

Similarly

Code:

foreach j in visit {

which I imagine you intended to be a loop iterating over the different values of visit, is in fact a loop over just the single token "visit". So since both your outer (`i') and inner (`j') loops iterate only over a single item, the body of code inside the loop gets executed exactly 1 x 1 = 1 time, with `i' == "PID" and `j' == "visit". I'll also point out that you use `j' only as a decoration in this code: it is part of the title and part of the name of the graph, but it is not used in determining which part of the data set is to be included in the plot.
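One way to check this for yourself, as a minimal sketch, is to keep the two loop headers and swap the body for a -display- command. It assumes only that a variable named PID exists in the data in memory, as it does in your -dataex- example.

```stata
* Show what each loop actually iterates over.
* -foreach ... of varlist- walks over variable NAMES, and
* -foreach ... in- walks over the literal tokens typed after "in".
foreach i of varlist PID {
    foreach j in visit {
        display "i = `i', j = `j'"
    }
}
```

This prints the single line "i = PID, j = visit", which confirms that the body runs once and that neither macro ever holds a patient or visit number.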

Now if, as best I can guess is the case, you really want a separate graph for each visit of each patient, the code would look like this:

Code:

levelsof PID, local(pids)
foreach p of local pids {
    levelsof visit if PID == `p', local(visits)
    foreach v of local visits {
        twoway line Y time if PID == `p' & visit == `v', ytitle(Y) title(`p' `v')
        graph save graph_`p'_`v', replace
    }
}
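If image files are wanted rather than Stata .gph files, a variant of the loop above could use -graph export- in place of -graph save-. This is only a sketch: the PNG format, the file-name pattern, the wording of the title, and the added -sort- option are my assumptions, not something you asked for. The /// line continuation works in a do-file but not when typed into the Command window.

```stata
* Variant of the loop above: one PNG per patient-visit combination.
* Assumption: PNG output and the graph_<PID>_<visit>.png naming scheme.
levelsof PID, local(pids)
foreach p of local pids {
    levelsof visit if PID == `p', local(visits)
    foreach v of local visits {
        * -sort- draws the line in time order even if the data are unsorted
        twoway line Y time if PID == `p' & visit == `v', sort ///
            ytitle(Y) title("PID `p', visit `v'")
        graph export "graph_`p'_`v'.png", replace
    }
}
```

The files are written to the current working directory, so it may be worth running -cd- to a dedicated folder first.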

That said, I think you will regret doing this. You said that each patient has between 1 and 3 visits, and you have 73 patients. Consequently you will end up with somewhere between 73 and 219 separate graphs. I think it will prove an unmanageable task to deploy such a large set of graphs in any useful way. Imagine yourself having to sit through a slide show of all of these graphs. Or if you tried to combine all 73 as small panels in a combined graph, they would inevitably be too small to read. The usual ways of displaying multiple graphs fail at this scale. So I think you need to think about ways of reducing the number of graphs by combining or grouping together meaningful subsets of them.

The specifics of how you would debulk this unwieldy collection of graphs really depend on what kinds of questions you want to answer with them. But, for example, if the differences between what happens at each of the three visits are important, you could, instead of having a separate graph for each visit, put all of a single patient's visits on the same graph. Like this:

Code:

reshape wide Y, i(PID time) j(visit)
forvalues i = 1/3 {
    label var Y`i' "Y (visit = `i')"
}
levelsof PID, local(pids)
foreach p of local pids {
    twoway line Y* time if PID == `p', ytitle(Y) title(`p')
}

Or perhaps a more useful way to reduce the number of graphs would be to identify some meaningful groups of patients and put their graphs together in one multi-patient graph for each group.

Or consider data reduction: perhaps the mean values from some subsets of the patients are more helpful to examine visually than the individual curves.
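As one hedged illustration of that kind of data reduction, the sketch below plots the mean of Y at each minute, with one line per visit, pooling all patients. It assumes the data are in the original long layout (PID, visit, time, Y) as in the -dataex- example, not the wide layout produced by the -reshape- above. The name meanY and the graph titles are made up for the example, and pooling all patients is only one of many possible subsets.

```stata
* Mean curve per visit across all patients (long data assumed).
preserve
    * One row per visit-time pair, holding the mean of Y
    collapse (mean) meanY = Y, by(visit time)
    * One column of means per visit: meanY1, meanY2, ...
    reshape wide meanY, i(time) j(visit)
    twoway line meanY* time, sort ytitle("Mean Y") title("Mean Y by visit")
restore
```

Because of -preserve- and -restore-, the original data come back unchanged after the graph is drawn.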

There are many possibilities, depending on how you wish to use the graphs.
Meghan Bezerra

Join Date: Sep 2023

Posts: 18
#3

26 Sep 2023, 10:41

Thank you Clyde! The code you suggested worked brilliantly and is exactly what I needed. Thanks also for the explanation of what's going on with the different lines of code; that's really helpful to someone new to Stata and new to using loops.

You're absolutely right about the number of graphs: it's 165. But we only need them for internal purposes, to assess the presence of multiple peaks (the Y variable is a measure of insulin secretion).