  • Macros, quotes, and looping

    I am trying to combine three different datasets into one dataset, which will be (eventually) reshaped. All in all, the datasets contain the following variables:

    Dataset 1
    - var_i02
    - var_i12

    Dataset 2
    - var_g02

    Dataset 3
    - var_h02
    - var_h12

    I'm trying to loop over the datasets and rename the variables as "var_02_datasetnumber" and "var_12_datasetnumber" so they can later be merged and reshaped. Of course, I could do this manually in a few minutes. However, although that would solve the current problem, it won't work should the number of datasets and/or variables be higher. So, I would like to learn to write code that "scales" across different numbers of datasets and variables.

    Now, if all the datasets included both variables, I wouldn't be asking for help. But as it turns out, looping does not work well with the variable missing from dataset 2, as there is nothing to be renamed. I solved this by first checking whether the variables are present in the dataset and, if so, renaming them. I first tried "confirm", but it does not work with the wildcards needed in the variable names. I substituted it with "ds", which allows wildcards and produces an error that can be used in the loop. The following code works, although I need to check one variable at a time:

    Code:
    global PATH "C:\..Datasets"
    
    foreach num in 7 8 9 {
        use "$PATH\data_w`num'.dta", clear
        
        capture ds var_?02
        if !_rc {
            local var1 r(varlist)
            rename ``var1'' fi_test02_w`num'
        }
    
        capture ds var_?12
        if !_rc {
            local var2 r(varlist)
            rename ``var2'' fi_test12_w`num'
        }
        
        su
        global vars`num' = "Variables!"
    }
    So, I managed to check whether the variables are in the dataset and rename them. Next, I need to use the local macros to create a global macro that contains the variables found. This global would be used in merging, so I don't have to manually check which variables are found in which dataset. And this is where I have serious problems:
    1. I discovered macros quite recently and don't know how to refer to them properly, as they act in mysterious ways! For example, I have no idea why
      Code:
      rename ``var1'' fi_test02_w`num'
      results in a variable name, but
      Code:
      rename `var1' fi_test02_w`num'
      does not.
    2. This is related to the first problem. I don't know how to make the global contain the variable names (or whether it is even possible). So far I have read the Stata manuals, the help files, and the Stata forum. I have prayed, cursed, yelled, and waved a stick at the sky. But with scant success. I have also tried every combination of quotes imaginable. But no! Stata stubbornly refuses to put anything other than r(varlist) in the global.

    So, please, help me to understand the macros and how to use them!

  • #2
    So, in the code
    Code:
    if !_rc {
        local var1 r(varlist) 
        rename ``var1'' fi_test02_w`num'
     }
    I suggest that, for learning purposes, you insert the command -macro list _var1- (on separate lines) before and after the -local- command so that you can see exactly what it's doing. But, in the meantime, here is the explanation.

    When you run a command -local my_local whatever-, the contents of local macro my_local are set to the literal text of whatever, unless whatever itself contains embedded macros. In the latter case, the embedded macros get expanded first. So -local var1 r(varlist)- sets the contents of var1 to "r(varlist)". In particular, it does not put the variable names into var1. So if you then tried to run -rename `var1' fi_test02_w`num'-, Stata would expand this to -rename r(varlist) fi_test02_w#-, where # would be replaced by 7, 8, or 9, depending on which iteration of the outer loop is executing. This would throw a syntax error since -rename- expects a variable name in the place where you have the expression r(varlist).

    You ran, happily, -rename ``var1'' fi_test02_w`num'- and this works because `var1' is expanded to r(varlist), and then the outer ` ' further expands that to the variable name that -ds- matched against var_?02, that is, with ? replaced by whichever character gives a valid variable name in that data set. Now, while this worked in your case, this is not a safe way to do this. The problem is that r(varlist) itself is volatile: if you had inserted any other commands between -local- and -rename- that left behind anything in -r()-, then r(varlist) would have disappeared (or, if you gave another command that leaves an -r(varlist)- behind, it would have been overwritten by the new variable list). So the -rename ``var1''...- command would either have nothing or a different variable list than you intended. Here's the safe way to do this:

    Code:
    if !_rc {
        local var1 `r(varlist)'
        rename `var1' fi_test02_w`num'
    }
    By putting `r(varlist)' in the -local- command, local macro var1 is directly set to contain the name of the variable identified in the -ds- command, and the reference to `var1' in the -rename- command would then, as intended, expand to that same variable name.
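
    The volatility of r(varlist) is easy to see in a toy dataset. What follows is only a minimal sketch, assuming a one-observation dataset invented for illustration; -summarize- stands in for any r-class command that might run between -local- and -rename-, and the macro names lazy and safe are made up.

    ```stata
    clear
    set obs 1
    generate var_i02 = 1              // toy variable, made up for this sketch

    ds var_?02
    local lazy r(varlist)             // stores the literal text r(varlist)
    local safe `r(varlist)'           // stores the variable name right away

    summarize var_i02                 // r-class: replaces whatever was in r()

    display "lazy: [``lazy'']"        // r(varlist) no longer exists, so the brackets are empty
    display "safe: [`safe']"          // still var_i02
    ```

    Run the whole block in one go from the do-file editor; the copy taken with `r(varlist)' survives the -summarize-, while the doubly quoted reference does not.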

    It takes a while to get used to working with local macros. If you keep trying, and making mistakes, and correcting them, it will eventually become pretty much second nature. Yes, there will sometimes be complicated cases with multiple levels of embedding of macros, and those may require you to explicitly work through how they expand in order to get them right. Putting -macro list _local_name- commands in your code shows you what Stata is doing with the local macros when you are not, yourself, sure. But apart from the rare seriously complicated situations, you will, in time, develop an intuitive understanding of them.
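
    For the nested cases, it may help to remember that references are resolved from the inside out. A small sketch with made-up macro names (num and w7 are hypothetical, as is the file name):

    ```stata
    local num 7
    local w7 "data_w7.dta"

    macro list _num _w7               // show what each local holds
    display "`w`num''"                // `num' becomes 7 first, then `w7' becomes data_w7.dta
    ```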


    • #3
      Thank you so much for your help! I got my global working on the first try.

      Looking back, I think my problem stems from the fact that I first tried using globals or locals without the quotes (see the code below). At the moment, it seems that locals and globals have different numbers of layers, with globals having an extra layer:

      Code:
      . ds variableX
      variableX
      . global test r(varlist)
      
      . disp $test
      variableX   // The variable names, as intended.
      
      . disp "$test"
      r(varlist)   // Saved results from -ds-.
      
      . disp "`$test'"
      variableX   // Variable name again, for some reason.
      The same does not seem to work with locals, as you wrote. But! On the other hand, using global AND quotes does not work either:

      Code:
      . ds variableX
      variableX
      . global test `r(varlist)'
      
      . disp $test
      0
      . disp `$test'
         // Nothing.
      . disp "`test'"
         // Nothing again.
      At this point I am lost. What is happening here and why? Like they say on the internet, "I am confusion!"


      • #4
        Hello Samuli Koponen

        1. Everything 'works' fine here. Macros are so fundamental in Stata that if anything went wrong with them, much of the ado-code would break. So it's a matter of learning how they behave.

        2. I hope this code will clarify your confusion:

        Code:
        clear all
        set obs 1
        generate variableX=123
        
        ds variableX
        global test r(varlist)  // this is equal literally "r(varlist)"
        display `"$test => `$test' => `=`$test''"'
        
        ds variableX
        global test=r(varlist)  // this is equal literally "variableX"
        display `"$test => `=$test'"'
        Importantly, notice the equal sign used in the last block of code, which makes all the difference! Also, display evaluates the content to be displayed, so be careful with it and compare the behavior with macro dir/macro list commands.
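
        The three ways of filling a macro can also be put side by side. This is just a sketch with hypothetical global names test1 to test3; it uses -macro list- rather than -display- so that nothing is evaluated on the way out.

        ```stata
        clear all
        set obs 1
        generate variableX = 123

        ds variableX
        global test1 r(varlist)           // copies the literal text r(varlist)
        global test2 = r(varlist)         // evaluates the expression: variableX
        global test3 `r(varlist)'         // expands the macro first: variableX

        macro list test1 test2 test3      // shows the stored text, unevaluated
        display $test3                    // evaluates variableX in observation 1: 123
        ```

        The last line shows why -display- can mislead: it is handed the text variableX, treats it as an expression, and prints the value of that variable in the first observation rather than its name.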

        Clyde Schechter made an excellent point about the volatility of r(varlist), whose contents depend on the commands run earlier.

        Finally, nothing in this task seems to warrant the use of globals. Consider working a solution using locals only.


        • #5
          Thank you, Sergiy, for clarifying this!

          As far as I understand, locals disappear after the loop finishes, but globals remain (at least for the session?). To merge the files, I need to do the loop, open the main dataset, and merge the variables from the datasets edited in the loop. The only way I found to store the variables is a global macro. Of course, I'm open to better/smarter/more efficient ways to do this. So if you have something in your mind, please let me know!

          The ways of doing things in Stata actually relate to another issue in coding. Basic commands are usually introduced widely, but the less commonly used commands, such as -ds- or -confirm-, are unknown to many. This severely limits the user's understanding of what can be done. I only discovered these myself when dealing with the current problem. Although I enjoyed trying to solve the problem, I spent 6-8 hours in the process (which could've been done manually in 10 minutes). But I gained a few new commands and a better understanding in the process, so it was worth it!


          • #6
            Indeed: sometimes a problem turns out to require some stuff you didn't know about. Here's a confession. I've been using Stata almost every working day since 1991 and I often need to learn new syntax, look things up or try an experiment to see what works.

            The problem here can be abstracted slightly:

            In each file
            if there's a variable matching a certain pattern then rename it


            I agree with experienced users that this problem doesn't need any global macros.

            Your code simplifies down to something like

            Code:
            cd "C:\..Datasets"
            
            forval num = 7/9 {
                use "data_w`num'.dta", clear
                
                capture noisily rename var_?02 fi_test02_w`num'
                
                capture noisily rename var_?12 fi_test12_w`num'
                
                save, replace
            }
            where I have added

            * a save command as I can't see any point to using the rename command otherwise.

            * a noisily as that way you'd get some information if the command doesn't work.

            I note your comments on ds -- which I had something to do with -- but note also that describe, varlist lets you do what you were trying, and that you can go straight to trying a rename without what I call a middle macro -- a macro in the middle of some processing that can be cut.

            The point of this post is not "This is obvious" as I really don't think that. It's just in the same spirit as earlier replies from others that if you ask for help, you will get it.


            • #7
              As far as I understand, locals disappear after the loop finishes
              I've seen this idea on the forum before that the scope of a variable ends when a code block ends, but I'm not sure what precisely is meant by that. Consider the following code.

              Code:
              local test1 = "test1"
              foreach test2 in "test2" {
                  display "`test2'"
                  local test3 = "test3"
              }
              display "`test1'"
              display "`test2'"
              display "`test3'"
              The first and third macros persist after the foreach loop finishes. The second macro, which is defined as part of the loop, seems to disappear after the loop finishes. Here is the output I get.

              Code:
              . local test1 = "test1"
              
              . foreach test2 in "test2" {
              2. display "`test2'"
              3. local test3 = "test3"
              4. }
              test2
              
              . display "`test1'"
              test1
              
              . display "`test2'"
              
              
              . display "`test3'"
              test3
              The fact that the second macro falls out of scope after the loop finishes is a little unusual. In a lot of programming languages, locals or their equivalents that are defined by a for loop are within the scope of whatever function/namespace/whatever executes the for loop. This might be about avoiding certain kinds of errors where you'd like to reuse a local, I'm not sure.


              • #8
                Daniel Schaefer has the gist of it right. There are two complementary rules:

                Rule 1: Local macros defined inside a loop persist after the loop ends. They are just like local macros defined anywhere else in the same block of code.
                Rule 2: By contrast, local macros defined by the looping command itself go out of scope after the loop is exited.

                There is one subtlety in the latter statement. It applies without question to local macros that serve as the iterators in any of the -foreach- commands. But there are some other ways of looping in Stata. For example, you can do something 5 times in Stata with:

                Code:
                local i  = 0
                while `i' < 5 {
                    // do the thing
                    local ++i
                }
                display `i'
                and you will see that i contains 5 at the end. The key observation here is that even though i is the iterator of the -while- loop, it is not defined by the -while- command. So it falls under Rule 1.
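
                Rule 1 is also what allows a locals-only version of the task from #1: a list built up inside the loop is still there afterwards, as long as everything runs in the same do-file or session. Here is a rough sketch, assuming two toy files held in tempfiles (w7 and w8 are hypothetical stand-ins for the real datasets, and found and old are made-up macro names) and at most one matching variable per pattern:

                ```stata
                clear all
                tempfile w7 w8

                // Build two toy datasets; the second has no *12 variable
                set obs 1
                generate id = 1
                generate var_i02 = 1
                generate var_i12 = 2
                save "`w7'"
                drop var_i02 var_i12
                generate var_g02 = 3
                save "`w8'"

                local found                               // start with an empty list
                foreach num in 7 8 {
                    use "`w`num''", clear
                    foreach yr in 02 12 {
                        capture ds var_?`yr'
                        if !_rc {
                            local old `r(varlist)'        // copy before r() can change
                            rename `old' fi_test`yr'_w`num'
                            local found `found' fi_test`yr'_w`num'
                        }
                    }
                    save "`w`num''", replace
                }
                display "`found'"                         // num and yr are gone by now, found is not
                ```

                The local found could then be used in a later -merge- or -keep- within the same run, which is the job the global was meant to do.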


                • #9
                  An important conceptual point here, which might not be obvious to less experienced folk looking at this thread, is that it's not just the loop that defines the scope of a macro. A macro remains in scope while the block of code that created it is running, but it is out of scope (nonexistent) after that block has run. Here's a minimal example for a new user of macros:

                  Code:
                  // Run just the following line from the do-file editor
                  local test1 = "something"
                  // Run the next line
                  di "test1 = `test1'"
                  test1 will be blank, as it only exists for the life of the code block within which it is run.

                  To my understanding, Stata runs a selection of code by writing it to some temporary do-file, and feeding that file to Stata. When that file has run, none of its macros exist any longer and so are blank.

                  However:

                  1) If the preceding lines were typed into the command window one at a time, "test1" would remain in scope for that interactive environment, and it would contain "something".

                  2) By contrast: If the content "something" were assigned to a string scalar called (say) test2, rather than to a local macro, that scalar would remain after the one-line code block executes. I never use string scalars, and I suspect almost no one else does either, but here's an example, again to be run line by line:
                  Code:
                  // Run the following line from the do-file editor
                  scalar test2 = "something"
                  // Run the next line
                  di test2
                  test2 will exist and contain "something"

                  The preceding examples are contrived, but intended as instructive.
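
                  The same behavior can be seen with a program, which is another kind of code block with its own local macros. This is only a small sketch with a hypothetical program name, show_scope, and it should be run as a single block so that the caller's macro exists when the program is called:

                  ```stata
                  capture program drop show_scope
                  program define show_scope
                      local inside "set inside the program"
                      // the caller's local is not visible here, so the second bracket is empty
                      display "in program: inside = [`inside'], outside = [`outside']"
                  end

                  local outside "set by the caller"
                  show_scope
                  display "after the call: inside = [`inside']"   // empty: the program's local is gone
                  ```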
