
  • Running parallel loops

    Dear Statalist community,

    I am encountering an issue when running parallel loops. Since my real dataset is confidential, I am demonstrating the problem using a mock dataset.

    My goal is to generate a string variable, color_name, that contains the full color names based on an existing variable, color_code. The variable color_code includes some missing values, and I would like color_name to remain missing in those cases as well.

    The problem is that when I run the code below, observations with missing values end up with "red" in color_name instead of missing. After examining the output more closely, I noticed that during the first iteration of the loop, color_name is set to "red" for all observations. In subsequent iterations, this value is correctly replaced for observations whose color_code appears in list1. However, for observations with missing values (id number 10) or with codes that are not present in list1 (id number 9), the initial value "red" remains.

    I was hoping to get some insight into why Stata behaves this way and how I might modify my code to avoid this issue.

    I am using Stata version 18.0 on Windows 10.

    Code:
    clear
    set obs 10
    gen id = _n
    input str12 color_code
    rd
    blk
    bl
    yl
    vl
    cy
    gr
    wt
    pnk
    end
    gen color_name = ""
    local list1 "rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt"
    local list2 "red" "black" "blue" "yellow" "violet" "cyan" "green" "white"
    local n : word count `list1'
    di "`n'"
    forvalues i = 1/`n' {
        local x : word `i' of `list1'
        local y : word `i' of `list2'
        di "`x' " 
        di " `y'" 
        replace color_name = "`y'" if color_code == "`x'"
        list color_code color_name in 1/10
    }

  • #2
    Originally posted by Ali Arya
    My goal is to generate a string variable, color_name, that contains the full color names based on an existing variable, color_code. The variable color_code includes some missing values, and I would like color_name to remain missing in those cases as well.
    I'm not sure that running parallel loops over lists stored in local macros is a particularly Stata-ish way of accomplishing your goal. You might be better off with a join approach, something like the following.
    Code:
    version 18
    
    clear *
    
    /* set obs 10
    gen id = _n */
    
    input str12 color_code
    rd
    blk
    bl
    yl
    vl
    cy
    gr
    wt
    pnk
    ""
    end
    
    *
    * Begin here
    *
    frame create Colors
    frame Colors {
    
        input str12(color_code color_name)
        "rd" "red"
        "blk" "black"
        "bl" "blue"
        "yl" "yellow"
        "vl" "violet"
        "cy" "cyan"
        "gr" "green"
        "wt" "white"
        end
    
        isid color_code, sort
    }
    
    frlink m:1 color_code, frame(Colors)
    frget color_name, from(Colors)
    
    // Done
    
    list color_*, noobs separator(0) abbreviate(20)
     
    exit
    You could also consider a value-label approach, but I'm not sure that it would save much coding.
    Last edited by Joseph Coveney; 18 Mar 2026, 01:12.



    • #3
      The problem with the code in #1 is that the local macro definitions don't do what you want. Stata treats the outermost pair of double quotes as delimiters and strips them, so each macro ends up with unbalanced quotation marks inside it.

      Code:
      .  local list1 "rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt"
      
      . di `"`list1'"'
      rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt
      This has nothing to do with loops in general or parallel loops in particular. I think you should get closer to what you want by omitting all the quotation marks in

      Code:
      local list1 "rd" "blk" "bl" "yl" "vl" "cy" "gr" "wt"
      
      local list2 "red" "black" "blue" "yellow" "violet" "cyan" "green" "white"
      as each element is already a word in Stata's sense and the quotation marks are not needed at all.

      Code:
      local list1 rd blk bl yl vl cy gr wt
      
      local list2 red black blue yellow violet cyan green white
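
      For reference, here is a minimal end-to-end sketch of the loop from #1 with the unquoted lists dropped in. It assumes the same mock data as in #1 (with the empty code entered explicitly as "", as in #2) and leaves out the display and list calls inside the loop; it is an illustration, not a tested replacement for the real, confidential dataset.

      ```stata
      clear
      input str12 color_code
      rd
      blk
      bl
      yl
      vl
      cy
      gr
      wt
      pnk
      ""
      end

      * start with color_name empty everywhere
      generate color_name = ""

      * no quotation marks: each element is already one word
      local list1 rd blk bl yl vl cy gr wt
      local list2 red black blue yellow violet cyan green white

      local n : word count `list1'
      forvalues i = 1/`n' {
          local x : word `i' of `list1'
          local y : word `i' of `list2'
          * fill in the name only where the code matches
          replace color_name = "`y'" if color_code == "`x'"
      }

      list color_code color_name, noobs separator(0)
      ```

      With the lists defined this way, the observations with pnk and with an empty color_code match no element of list1, so their color_name should stay empty.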


      I rarely disagree with Joseph Coveney but on this occasion setting this up as a frame problem isn't needed at all.



      • #4
        Thank you both. The problem was due to the incorrect use of quotation marks, and the suggestion in #3 fully resolved it.

        I am just curious how I could get around it if I had two-word elements in list2, something like dark blue instead of blue.
        Last edited by Ali Arya; 18 Mar 2026, 19:16.



        • #5
          Originally posted by Ali Arya
          I am just curious how I could get around it if I had two-word elements in list2, something like dark blue instead of blue.
          You should be fine if you use compound double quotes correctly.

          Code:
          local list2 red black "dark blue" yellow violet cyan green white
          
          forval i= 1/8{
              di  `" Word `i' is `:word `i' of `list2'' "'
          }
          Results:

          Code:
          . forval i= 1/8{
            2. 
          .     di  `" Word `i' is `:word `i' of `list2'' "'
            3. 
          . }
           Word 1 is red 
           Word 2 is black 
           Word 3 is dark blue 
           Word 4 is yellow 
           Word 5 is violet 
           Word 6 is cyan 
           Word 7 is green 
           Word 8 is white 
          
          .
          Last edited by Andrew Musau; 19 Mar 2026, 10:24.



          • #6
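            Originally posted by Andrew Musau
            You should be fine if you use compound double quotes correctly.
            Perhaps, but consider a list whose first and last elements are both quoted, so that the outermost quotation marks are again stripped as delimiters, as explained in #3: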
            Code:
            local list2 "fire engine red" black "dark blue" yellow violet cyan green "bone white"
            
            forval i= 1/8{
                di  `" Word `i' is `:word `i' of `list2'' "'
            }
            Ali might attain his goal more quickly and with less head-scratching by using underscores in place of spaces in the local macro and then applying subinstr(color_name, "_", " ", .) to the generated variable afterward.
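
            As a hedged sketch of the two workarounds in play here: option A wraps the whole list in compound double quotes so that the inner quotation marks survive the macro definition, and option B uses the underscore-then-subinstr() route just described. The three-row dataset and the shortened lists are made up purely for illustration.

            ```stata
            * Option A: compound double quotes around the whole list
            local list2 `""fire engine red" black "dark blue" yellow violet cyan green "bone white""'
            forvalues i = 1/8 {
                local y : word `i' of `list2'
                display `"Word `i' is `y'"'
            }

            * Option B: underscores in the macro, converted to spaces afterward
            clear
            input str12 color_code
            rd
            bl
            wt
            end

            generate color_name = ""
            local list1 rd bl wt
            local list2 fire_engine_red dark_blue bone_white

            forvalues i = 1/3 {
                local x : word `i' of `list1'
                local y : word `i' of `list2'
                replace color_name = "`y'" if color_code == "`x'"
            }

            * turn every underscore back into a space
            replace color_name = subinstr(color_name, "_", " ", .)
            list, noobs
            ```

            Option B only works if no genuine color name contains an underscore, which is an assumption about the real data.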

            But this whole approach is liable to become unnecessarily involved, and I still recommend the more conventional foreign-key (lookup-table) approach that I suggested above.

            I can see where Nick is coming from: my suggestion involves more up-front coding for what seems, starting out, like a simple task. But he and I will have to respectfully disagree, because that investment in coding can pay dividends down the road. Here, for example, the unanticipated later decision to change blue to dark blue would be trivial to effect.

            There are other advantages to implementing the lookup table as a separate, stand-alone object (whether in a frame, as I illustrated above, or in a separate dataset). For example, color_code-color_name tuples are naturally paired as Stata observations and are more easily kept in sync. The approach also facilitates good data-management practices, entity integrity and referential integrity among them, in natural ways that parallel looping over macros doesn't.

            Yes, Ali can add code, if he thinks of it, to impose such integrity constraints after executing the parallel loop over the two local macros. But with frlink m:1 and merge m:1, entity integrity of color_code and referential integrity between the main dataset and the lookup table are built in. For example, they naturally flag to the unaware user the orphaned pnk observation lurking in Ali's example dataset above.
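
            For the separate-dataset variant, a minimal sketch with merge m:1 might look like the following. The tempfile name colors and the cut-down mock data are assumptions made for illustration; the frame version in #2 remains the fuller example.

            ```stata
            * lookup table: one row per color_code
            clear
            input str12(color_code color_name)
            "rd" "red"
            "bl" "dark blue"
            "wt" "white"
            end
            isid color_code
            tempfile colors
            save `colors'

            * main dataset, including an unknown code and an empty code
            clear
            input str12 color_code
            rd
            bl
            wt
            pnk
            ""
            end

            * m:1 requires color_code to be unique in the lookup table
            merge m:1 color_code using `colors', keep(master match)

            * _merge == 1 marks codes with no match in the lookup table
            list color_code color_name _merge, noobs separator(0)
            ```

            Both the pnk observation and the empty code come out with _merge equal to 1 (master only) and an empty color_name, which is the kind of flagging described above.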



            • #7
              Everything hinges on how many more complications arise in the real problem. I was addressing the problem posed in #1.
