
  • Looping over a list of phrases

    I have a variable called fine_grained_color, which records the fine-grained color of each observation. I would like to create a new variable, coarse_grained_color, whose values group specific colors into the same general color category, e.g. different shades of green.

    I am a beginner Stata user but based on a few posts I've read on this forum, I tried the following code:

    Code:
    generate coarse_grained_color = ""
    
    local green_list `" "light green" "green, olive" "green - dark" "lime green" "'
    * note I also tried removing the outer pair of double quotes since that seemed to be making this just one big string but that did not work either
    
    foreach v of local green_list{
        replace coarse_grained_color = "green" if fine_grained_color == "`v'"
    }
    I am trying to assign the variable coarse_grained_color the value green whenever its fine-grained color is one of the values in the macro green_list. Please suggest a way I can do this. When I ran my code, I got no errors, but no changes seemed to occur. In particular, the new variable coarse_grained_color is still blank. Thank you very much.

  • #2
    I cannot reproduce your problem. Your code runs fine for me:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 fine_grained_color
    "light green"
    "green, olive"
    "green - dark"
    "lime green"
    "royal blue"  
    "navy blue"  
    end
    
    generate coarse_grained_color = ""
    
    local green_list `" "light green" "green, olive" "green - dark" "lime green" "'
    * note I also tried removing the outer pair of double quotes since that seemed to be making this just one big string but that did not work either
    
    foreach v of local green_list{
        replace coarse_grained_color = "green" if fine_grained_color == "`v'"
    }
    
    list
    which gives me
    Code:
         +-------------------------+
         | fine_grain~r   coarse~r |
         |-------------------------|
      1. |  light green      green |
      2. | green, olive      green |
      3. | green - dark      green |
      4. |   lime green      green |
      5. |   royal blue            |
         |-------------------------|
      6. |    navy blue            |
         +-------------------------+
    There is nothing wrong with the code. I suspect that the problem is with your data. For example, if there are spelling errors, or extraneous leading, trailing or internal blanks, or differences in capitalization, then you will not get matches to your list. So you might want to clean up the data first with:

    Code:
    replace fine_grained_color = lower(trim(itrim(fine_grained_color)))
    That will ensure that everything is lower case and that all extraneous blanks are removed. Of course, that won't help you with spelling errors. Similarly, if you have "green, dark" in the data, that will not match "green - dark" in your list. So you might also want to go through the fine_grained_color variable and either purge all punctuation or enforce uniform use of punctuation. (If you do that, make sure that the punctuation in your list is likewise either purged or made consistent.)
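
    One way to look for such mismatches before running the loop is sketched below. It assumes the variable names used in this thread; fgc_clean is a hypothetical helper variable introduced only for this illustration, and the punctuation rule in the last line is just one possible convention.

    ```stata
    * Sketch only: fgc_clean is a hypothetical helper variable, not part of the original code
    generate fgc_clean = lower(trim(itrim(fine_grained_color)))

    * How many observations differ from their cleaned version?
    count if fgc_clean != fine_grained_color

    * Inspect the distinct values actually present, to spot misspellings and punctuation variants
    tabulate fgc_clean, missing

    * Optionally enforce one punctuation style, so "green, olive" becomes "green - olive"
    * (the entries in green_list must then follow the same style)
    replace fgc_clean = subinstr(fgc_clean, ", ", " - ", .)
    ```

    Working on a copy keeps the original values available for comparison; the loop would then test fgc_clean instead of fine_grained_color.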



    • #3
      Note that in this particular example, the code could also be simplified to:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str12 fine_grained_color
      "light green"
      "green, olive"
      "green - dark"
      "lime green"
      "royal blue"  
      "navy blue"  
      end
      
      generate coarse_grained_color = ""
      replace coarse_grained_color = "green" if strpos(fine_grained_color, "green")
      And to catch different capitalization:
      Code:
      replace coarse_grained_color = "green" if strpos(fine_grained_color, "green") | strpos(fine_grained_color, "Green") | strpos(fine_grained_color, "GREEN")
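
      A possibly shorter way to ignore capitalization, assuming it is acceptable to compare against a lower-cased copy of the string: wrapping the variable in lower() lets a single strpos() call cover "green", "Green", "GREEN", and mixed cases such as "GrEEn".

      ```stata
      * strpos() returns the position of the match, or 0 if there is none
      replace coarse_grained_color = "green" if strpos(lower(fine_grained_color), "green") > 0
      ```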



      • #4
        FWIW, I considered the solution offered in #3 as well. I decided against it because it might not be fully general. For example, "cerulean" or "turquoise" might be a fine_grained_color corresponding to blue, or "aquamarine" to green, "crimson" to red, etc.
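
        For a short, explicit list of values that share no common substring, one alternative to both the loop and strpos() is inlist(). This is only a sketch: the color names are illustrative, and inlist() accepts a limited number of string arguments (see help inlist()), so the loop over a local macro remains the more general approach for long lists.

        ```stata
        * Sketch: match any of several exact values in a single command
        replace coarse_grained_color = "green" if inlist(fine_grained_color, "light green", "lime green", "aquamarine")
        ```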



        • #5
          Thank you for your replies. One follow-up question. If I have a list of strings in local macro like
          Code:
          local green_list `" "light green" "green, olive" "green - dark" "lime green" "'
          but it's very long, is there a way to wrap it across multiple lines? The usual triple-slash trick does not work here. Thanks



          • #6
            Check the help for -delimit-.
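
            A minimal sketch of what that might look like, assuming the code lives in a do-file (#delimit cannot be used interactively) and that the whole block is run in one pass. With the semicolon as delimiter, a command continues across line breaks until the next semicolon.

            ```stata
            * Switch the command delimiter to a semicolon so the macro definition can span lines
            #delimit ;
            local green_list `" "light green" "green, olive"
                                "green - dark" "lime green" "' ;
            #delimit cr

            * Back to the usual carriage-return delimiter; show each element of the list
            foreach v of local green_list {
                display "`v'"
            }
            ```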



            • #7
              Hm I'm not sure it is a problem with the data. So here's my real code which executes but does not appear to make any changes to the data. It does not say anything about changes being made and when I go into the specialty_group variable it is still blank.

              Code:
              local anesthesiology_list `" "Anesthesiology" "Anesthesiology, Critical Care Medicine" "Anesthesiology, Pain Medicine" "'
               foreach v of local anesthesiology_list{
                  replace specialty_group = "anesthesiology group" if auth_spec_1 == "`v'"
              }
              However, when I execute e.g.
              Code:
              replace specialty_group = "anesthesiology group" if auth_spec_1 == "Anesthesiology, Critical Care Medicine"
              the proper changes are made to the data. In particular, specialty_group is assigned the value "anesthesiology group" when auth_spec_1 is equal to "Anesthesiology, Critical Care Medicine".
              So it seems to me that the problem may be with something about the loop or the macro. Is that possible? thanks
              Last edited by Alex Boche; 25 Dec 2018, 00:53.



              • #8
                This works for me.
                Code:
                clear
                input str50 auth_spec_1
                "Anesthesiology"
                "Anesthesiology, Critical Care Medicine"
                "Anesthesiology, Pain Medicine"
                "Surgery"
                end
                generate str specialty_group = ""
                local anesthesiology_list `" "Anesthesiology" "Anesthesiology, Critical Care Medicine" "Anesthesiology, Pain Medicine" "'
                foreach v of local anesthesiology_list{
                    replace specialty_group = "anesthesiology group" if auth_spec_1 == "`v'"
                }
                list, clean noobs
                Code:
                . generate str specialty_group = ""
                (4 missing values generated)
                
                . local anesthesiology_list `" "Anesthesiology" "Anesthesiology, Critical Care Medicine" "Anesthe
                > siology, Pain Medicine" "'
                
                . foreach v of local anesthesiology_list{
                  2.     replace specialty_group = "anesthesiology group" if auth_spec_1 == "`v'"
                  3. }
                variable specialty_group was str1 now str20
                (1 real change made)
                (1 real change made)
                (1 real change made)
                
                . list, clean noobs
                
                                               auth_spec_1        specialty_group  
                                            Anesthesiology   anesthesiology group  
                    Anesthesiology, Critical Care Medicine   anesthesiology group  
                             Anesthesiology, Pain Medicine   anesthesiology group  
                                                   Surgery                        
                
                .
                Last edited by William Lisowski; 25 Dec 2018, 06:46.



                • #9
                  Here's my best guess at why Alex Boche's results in #7 differ from William Lisowski's in #8. It's only a guess, but remember that local macros have limited scope. In particular, if you execute a do-file one line at a time, or highlight a section and run just that section, any macros defined in that line or section disappear once its execution finishes. So if Alex runs the line defining local macro anesthesiology_list by itself, the macro is defined and then immediately goes out of existence. When Alex then tries to run the loop, the undefined local macro anesthesiology_list is interpreted by Stata as an empty string. So there is nothing to iterate over, and the loop does nothing at all. If this is what was done, the solution is simply to run the entire block of code, i.e. both the definition of the local macro and the loop over it, in one pass.
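
                  A quick way to test this guess, sketched with the macro from #7: display the macro's contents and element count right before the loop. The local name n is introduced here only for illustration.

                  ```stata
                  * Run these lines together, in one pass, so the local macro is still defined
                  local anesthesiology_list `" "Anesthesiology" "Anesthesiology, Critical Care Medicine" "Anesthesiology, Pain Medicine" "'

                  * Diagnostic: show the macro's contents and count its elements
                  display `"`anesthesiology_list'"'
                  local n : word count `anesthesiology_list'
                  display "elements in list: `n'"
                  ```

                  If the two display lines are run separately from the definition, the macro no longer exists at that point, so the first should print an empty line and the count should be 0, which would confirm the scope explanation.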



                  • #10
                    I believe Clyde Schechter has identified the problem, which concerns scope. thanks. So is it correct that I can just replace "local" with "global" and everything should work okay? Seems to work as far as I can see. Happy holidays folks!



                    • #11
                      In the code as presented in post #7, you would be better advised to run the entire code block at the same time, as Clyde recommended. There are good reasons for not making use of global macros if you can avoid doing so, and good reasons to get used to the limitations on local macros.

                      To the extent that the real code you are running differs from the code you presented in post #7, globals might be necessary.

                      One takeaway from this is that when presenting code you claim gives you problems, you should be sure to follow the advice of the Statalist FAQ and copy both the code and Stata's output from your Results window. The appearance of the "do file" messages would have made your problem readily apparent.



                      • #12
                        Yes, that will "work." But global macros are inherently unsafe. They never go out of scope until you shut Stata down. So if you have any other program running that uses a global macro having the same name, they will clash and you may end up overwriting the global macro and breaking the other program, or it may overwrite the reference in your current program and break that. Moreover, given that programs may call other programs, you never really know for sure if you have a program using a same-named macro going. And even if you were to carefully check for that, your global initialized now may interfere with a program that you call on later!

                        So if you are going to use a global macro, at least give it some totally bizarre name that is unlikely to clash with any other global macros that are or will be in the system. If you do that, you reduce the chance of a clash. If you do encounter this problem, however, I promise you that debugging it will be one of the worst experiences of your coding life, and you will never want to use a global macro again, ever. My advice, which I have posted frequently on this forum, is never to use a global macro unless there is absolutely no viable alternative. To give you a sense of that, I have been using Stata since 1994 and have used a global macro in my code only once.

                        This particular situation certainly does not justify the use of a global macro. Just run the code that defines the local macro and the code that refers to it in one fell swoop. If they happen to be separated from each other by other code that you do not want to run in between, just comment out the intervening code. These are easy workarounds to the scope issues and are much better solutions than using global macros.

