Running parallel loops

Ali Arya

Join Date: Nov 2024

Posts: 3
#1

17 Mar 2026, 21:45

Dear Statalist community,

I am encountering an issue when running parallel loops. Since my real dataset is confidential, I am demonstrating the problem using a mock dataset.

My goal is to generate a string variable, color_name, that contains the full color names based on an existing variable, color_code. The variable color_code includes some missing values, and I would like color_name to remain missing in those cases as well.

The problem is that when I run the code below, observations with missing values end up with "red" in color_name instead of missing. After examining the output more closely, I noticed that during the first iteration of the loop, color_name is set to "red" for all observations. In subsequent iterations, this value is correctly replaced for observations whose color_code appears in list1. However, for observations with missing values (id number 10) or those whose codes are not present in list1 (id number 9), the initial value "red" remains.

I was hoping to get some insight into why Stata behaves this way and how I might modify my code to avoid this issue.

I am using Stata version 18.0 on Windows 10.

Code:

clear
set obs 10
gen id = _n

input str12 color_code
rd
blk
bl
yl
vl
cy
gr
wt
pnk
end

gen color_name = ""

local list1 "rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt"
local list2 "red" "black" "blue" "yellow" "violet" "cyan" "green" "white"

local n : word count `list1'
di "`n'"

forvalues i = 1/`n' {
    local x : word `i' of `list1'
    local y : word `i' of `list2'
    di "`x' "
    di " `y'"
    replace color_name = "`y'" if color_code == "`x'"
    list color_code color_name in 1/10
}
Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#2

18 Mar 2026, 01:10

Originally posted by Ali Arya

My goal is to generate a string variable, color_name, that contains the full color names based on an existing variable, color_code. The variable color_code includes some missing values, and I would like color_name to remain missing in those cases as well.

I'm not sure that parallel loops over arrays stored in local macros, looping over observations, are a particularly Stata-ish way of accomplishing your goal. You might be better off with a join approach, something like the following.

Code:

version 18

clear *

/*
set obs 10
gen id = _n
*/

input str12 color_code
rd
blk
bl
yl
vl
cy
gr
wt
pnk
""
end

*
* Begin here
*
frame create Colors
frame Colors {
    input str12(color_code color_name)
    "rd" "red"
    "blk" "black"
    "bl" "blue"
    "yl" "yellow"
    "vl" "violet"
    "cy" "cyan"
    "gr" "green"
    "wt" "white"
    end
    isid color_code, sort
}

frlink m:1 color_code, frame(Colors)
frget color_name, from(Colors)

// Done
list color_*, noobs separator(0) abbreviate(20)

exit

You could also consider a value-label approach, but I'm not sure that it would save much coding.

Last edited by Joseph Coveney; 18 Mar 2026, 01:12.
Nick Cox

Join Date: Mar 2014

Posts: 36058
#3

18 Mar 2026, 05:22

The problem with the code in #1 is that the definitions of the local macros don't do what you want. Stata strips the outermost "" as delimiters and so each macro is created messed up.

Code:

. local list1 "rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt"

. di `"`list1'"'
rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt

This is nothing to do with loops in general or parallel loops in particular. I think you should get closer to what you want by omitting all the quotation marks in

Code:

local list1 "rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt"
local list2 "red" "black" "blue" "yellow" "violet" "cyan" "green" "white"

as each element is already a word in Stata's sense and the quotation marks are not needed at all.

Code:

local list1 rd blk bl yl vl cy gr wt
local list2 red black blue yellow violet cyan green white
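
For concreteness, here is a minimal sketch of the loop from #1 with the quotation marks dropped from both macro definitions. The mock data follow #1, except that the missing value is entered explicitly as "" (as in #2) rather than through set obs. Under these assumptions the pnk observation and the missing observation should keep an empty color_name, because no iteration matches them.

```stata
clear
input str12 color_code
rd
blk
bl
yl
vl
cy
gr
wt
pnk
""
end
gen id = _n
gen color_name = ""

* Each element is a single word, so no quotation marks are needed
local list1 rd blk bl yl vl cy gr wt
local list2 red black blue yellow violet cyan green white
local n : word count `list1'

forvalues i = 1/`n' {
    * Pick the i-th code and the i-th name in parallel
    local x : word `i' of `list1'
    local y : word `i' of `list2'
    replace color_name = "`y'" if color_code == "`x'"
}

list id color_code color_name, noobs separator(0)
```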

I rarely disagree with Joseph Coveney but on this occasion setting this up as a frame problem isn't needed at all.
Ali Arya

Join Date: Nov 2024

Posts: 3
#4

18 Mar 2026, 18:27

Thank you both. The problem was due to the incorrect use of quotation marks, and the suggestion in #3 fully resolved it.

I am just curious how I could go around it if I had two-word elements in list2, something like dark blue instead of blue.

Last edited by Ali Arya; 18 Mar 2026, 19:16.

Andrew Musau

Join Date: Oct 2014

Posts: 10482
#5

19 Mar 2026, 10:17

Originally posted by Ali Arya

I am just curious how I could go around it if I had two-word elements in list2, something like dark blue instead of blue.

You should be fine if you use compound double quotes correctly.

Code:

local list2 red black "dark blue" yellow violet cyan green white

forval i= 1/8{
    di  `" Word `i' is `:word `i' of `list2'' "'
}

Results:

Code:

. forval i= 1/8{
  2. 
.     di  `" Word `i' is `:word `i' of `list2'' "'
  3. 
. }
 Word 1 is red 
 Word 2 is black 
 Word 3 is dark blue 
 Word 4 is yellow 
 Word 5 is violet 
 Word 6 is cyan 
 Word 7 is green 
 Word 8 is white 

.

Last edited by Andrew Musau; 19 Mar 2026, 10:24.


Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#6

19 Mar 2026, 21:37

Originally posted by Andrew Musau

You should be fine if you use compound double quotes correctly.

Perhaps, but

Code:

local list2 "fire engine red" black "dark blue" yellow violet cyan green "bone white"

forval i= 1/8{
    di `" Word `i' is `:word `i' of `list2'' "'
}
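
Presumably the catch is that this definition begins and ends with a double quote, so Stata strips the outermost pair as delimiters, just as described in #3. One way around it, sketched here rather than offered as the only fix, is to wrap the whole definition in compound double quotes so that the inner quotation marks survive. The name list1 and the display line are only for illustration.

```stata
local list1 rd blk bl yl vl cy gr wt

* Compound double quotes around the whole definition protect the inner quotes
local list2 `""fire engine red" black "dark blue" yellow violet cyan green "bone white""'

local n : word count `list1'
forvalues i = 1/`n' {
    local x : word `i' of `list1'
    * A quoted multiword element is returned as one word, without its quotes
    local y : word `i' of `list2'
    di `"`x' -> `y'"'
}
```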

Ali might attain his goal more quickly and with less head-scratching just using underscores for spaces in the local macro and then subinstr(color_name, "_", " ", .) on the generated variable afterward.
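
A minimal sketch of the underscore idea, on a cut-down version of the mock data (three codes, one unmatched code, and one missing value, all chosen only for illustration):

```stata
clear
input str12 color_code
rd
bl
wt
pnk
""
end
gen color_name = ""

* Underscores stand in for spaces, so every element is a single word
local list1 rd bl wt
local list2 fire_engine_red dark_blue bone_white
local n : word count `list1'

forvalues i = 1/`n' {
    local x : word `i' of `list1'
    local y : word `i' of `list2'
    replace color_name = "`y'" if color_code == "`x'"
}

* Turn the placeholders back into spaces in one pass
replace color_name = subinstr(color_name, "_", " ", .)

list color_code color_name, noobs separator(0)
```

This assumes that no real color name contains an underscore of its own; if one did, a different placeholder character would be needed.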

But this whole approach is liable to become unnecessarily involved, and I still recommend going with a more conventional foreign key . . . references approach that I suggested above.

I can see where Nick is coming from in that there is more up-front coding in my suggestion for what seems like such a simple task starting out. But he and I will need to respectfully disagree in that that investment in coding is able to pay dividends down the road, for example, here, where the unanticipated decision afterward to change blue to dark blue would be trivial to effect.

There are other advantages to implementing the lookup table as a separate, stand-alone object (whether in a frame as I illustrated above or in a separate dataset). For example, color_code-color_name tuples are naturally paired as Stata observations and are more easily kept in sync. The approach also facilitates good data-management practices, entity integrity and referential integrity among them, in natural ways that parallel looping over macros doesn't.

Yes, Ali can add code, if he thinks of it, to impose such integrity constraints after executing the parallel loop over the two local macros. But with frlink m:1 and merge m:1, entity integrity of color_code and referential integrity between the main dataset and the lookup table are built in; for example, they naturally flag to the unaware user the orphaned pnk observation lurking in Ali's example dataset above.
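
For the separate-dataset variant, a rough sketch with merge m:1 might look like the following. The temporary file name colors and the keep() choice are assumptions made for illustration; the data are the mock data from above.

```stata
* Lookup table: one observation per color_code
clear
input str12(color_code color_name)
"rd" "red"
"blk" "black"
"bl" "blue"
"yl" "yellow"
"vl" "violet"
"cy" "cyan"
"gr" "green"
"wt" "white"
end
isid color_code
tempfile colors
save `colors'

* Main dataset, including the unmatched pnk and a missing code
clear
input str12 color_code
rd
blk
bl
yl
vl
cy
gr
wt
pnk
""
end

* m:1 fails if color_code is not unique in the lookup table
merge m:1 color_code using `colors', keep(master match)

* _merge marks observations that found no partner in the lookup table
tabulate _merge
list color_code color_name _merge, noobs separator(0)
```

Observations marked as master only in _merge are the ones with no entry in the lookup table, which here should be pnk and the missing code.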
Nick Cox

Join Date: Mar 2014

Posts: 36058
#7

20 Mar 2026, 01:12

Everything hinges on how many more complications arise in the real problem. I was addressing the problem posed in #1.