  • Loop through Strings Containing Spaces in a Variable / Macro

    I have two string variables that contain movie names: Names1 and Names2. Some values repeat, both within each variable and across the two.

    Example:

    Names1

    Indiana Jones
    James Bond

    Names2

    James Bond
    Top Gun

    I need to loop through all of the movie names, but only once per name.

    The general steps I have in mind:

    1. Create a macro that contains a unique list of names from both variables.

    2. Loop through the macro and tabulate each movie name with a third variable.


    I am having a lot of trouble, mostly, I think, because of the spaces in the movie names. I know about compound double quotes, `" "', which allow quotation marks within strings, but I can't get the macro to work.

    I used levelsof and stored the results in a local macro. I then looped through the local macro and concatenated all of the values into a new variable.

    The variable first looked like this: Indiana Jones James Bond Top Gun.

    Then I added quotation marks: "Indiana Jones" "James Bond" "Top Gun".

    And created a macro to hold all of these.

    But I could not loop through that macro. I tried without quotation marks, with plain quotation marks, and with compound double quotes (`" "'). None of these worked.

    There must be a better way in the first place.

    Any suggestions?

    Thank you.

  • #2
    My main suggestion is that you should show us the exact code you tried and give precise reports on what went wrong, not "Nothing has worked". This is already the advice in the FAQ: see 12.1 within http://www.statalist.org/forums/help#stata

    This shows some technique. I include reshaping code because some questions are easier to answer with a long version of your dataset.

    Code:
    clear 
    input str42 (Names1 Names2) 
    "Indiana Jones"  "James Bond"
    "James Bond" "Top Gun"
    end 
    levelsof Names1, local(N1) 
    levelsof Names2, local(N2) 
    local Names : list N1 | N2 
    
    foreach N of local Names { 
        di `"`N'"'
        count if Names1 == `"`N'"' 
        count if Names2 == `"`N'"' 
    } 
    
    gen id = _n 
    reshape long Names, i(id) j(which)
    list, sepby(id) 
    
         +----------------------------+
         | id   which           Names |
         |----------------------------|
      1. |  1       1   Indiana Jones |
      2. |  1       2      James Bond |
         |----------------------------|
      3. |  2       1      James Bond |
      4. |  2       2         Top Gun |
         +----------------------------+
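
    A minimal sketch of one way to continue from the long layout, assuming the same two example observations as above. Once both sets of names sit in the single variable Names, one levelsof call returns the distinct names, so no union of macros is needed.

    ```stata
    clear
    input str42 (Names1 Names2)
    "Indiana Jones"  "James Bond"
    "James Bond" "Top Gun"
    end
    gen id = _n
    reshape long Names, i(id) j(which)

    * distinct names across both original variables, read from one variable
    levelsof Names, local(Names)
    foreach N of local Names {
        di `"`N'"'
        * occurrences of this name in either original column
        count if Names == `"`N'"'
    }
    ```

    The compound double quotes around `N' are what keep a name such as Indiana Jones together as one string inside the loop.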


    • #3
      Thank you for the quick reply. That was my first post, so thanks also for pointing me to the FAQ.

      I have tried the following:

      Code:
      clear
      input str42 (Names1 Names2)
      "Indiana Jones"  "James Bond"
      "James Bond" "Top Gun"
      end
      
      capture drop NameVar
      
      gen NameVar = ""
      
      levelsof Names1, local(NameString)
      
      foreach n of local NameString {
          replace NameVar = NameVar + " " + `" "`n'" "'
      }
      
      global Varlist = `" NameVar "'
      
      display $Varlist
      
      foreach l of varlist $Varlist {
          display `l'
      }
      
      foreach l of varlist $Varlist {
          display `" "`l'" "'
      }
      
      foreach l of varlist $Varlist {
          display `" `l' "'
      }
      The loops above, plus a variant that used display "`l'", returned the following results:

      Code:
      . do "C:\Temp\2
      > \STD00000000.tmp"
      
      . foreach l of varlist $Varlist {
        2.         display `l'
        3. }
        "Indiana Jones"   "James Bond"
      
      .
      end of do-file
      
      . do "C:\Temp\2
      > \STD00000000.tmp"
      
      .
      . foreach l of varlist $Varlist {
        2.         display "`l'"
        3. }
      NameVar
      
      .
      end of do-file
      
      . do "C:\Temp\2
      > \STD00000000.tmp"
      
      . foreach l of varlist $Varlist {
        2.         display `" "`l'" "'
        3. }
       "NameVar"
      
      .
      end of do-file
      
      . do "C:\Local\Temp\2
      > \STD00000000.tmp"
      
      .
      . foreach l of varlist $Varlist {
        2.         display `" `l' "'
        3. }
       NameVar
      None of those is what I am looking for. The first one treats the contents as one long string rather than looping through each sub-string. The others only display the variable name. I suspect the problem lies in how I use quotation marks in various parts of the code, but I can't get it right.

      Thanks again.
      Last edited by John Grove; 21 Aug 2016, 06:59.


      • #4
        I also tried the reshaping code you provided, but I am not sure how to proceed from that point.


        • #5
          My post #2 showed how to put all the distinct (you say "unique", but unique values occur precisely once, not your case) names in a local macro. It also showed how to loop over the elements of that list of distinct names. I used counting frequency as an example.

          If you have done that, there is no gain in packing all names into one value of a variable. That's just like copying the values from one box into another box.

          Your extra commands achieve looping over a variable list: in your case you have just one name in that variable list. That is, you put one name into a list of names; then you display what you just put into that list, one name. The display doesn't do anything to loop over the values of the variable. Either way, your extra code doesn't seem to do anything extra with the data.

          #1 mentioned tabulating each movie name with a third variable. I don't know what that means, as you haven't given examples of other variables or what the results would look like. We'd be happy to make suggestions if that became clear.
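
          Since #1 gave no example of the third variable, what follows is only a sketch under an invented assumption: a numeric variable rating, which is hypothetical and not part of the original data. It loops over the distinct names held in the local macro (not over a variable list) and tabulates rating for the observations in which each name appears.

          ```stata
          clear
          input str42 (Names1 Names2)
          "Indiana Jones"  "James Bond"
          "James Bond" "Top Gun"
          end
          * rating is a made-up third variable, for illustration only
          gen byte rating = cond(_n == 1, 4, 5)

          levelsof Names1, local(N1)
          levelsof Names2, local(N2)
          local Names : list N1 | N2

          * foreach ... of local walks through the names, one per pass
          foreach N of local Names {
              di `"Movie: `N'"'
              tabulate rating if Names1 == `"`N'"' | Names2 == `"`N'"'
          }
          ```

          By contrast, foreach ... of varlist walks through variable names, which is why the loops in #3 only ever saw NameVar.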


          • #6
            Thank you very much. I understand now how to use the code you provided, and it works well. I think the part I am not understanding is this:

            Code:
            local Names : list N1 | N2
            I hadn't used "list" and "|" in this manner before.

            What exactly is put into Names? How is it discarding duplicates between N1 and N2?

            Thanks again!
            Last edited by John Grove; 21 Aug 2016, 08:04.


            • #7
              See help macrolists for an explanation of the macro list operators. In this case, | creates a list containing the distinct entries that appear in N1, in N2, or in both.


              • #8
                See

                Code:
                help macro
                and start from there. The | operator produces the union of two sets of values.
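
                A small sketch of the union at work, with the two lists typed in by hand rather than produced by levelsof (an assumption made only to keep the example self-contained). The outer compound double quotes are stripped when each macro is defined, leaving the inner quoted names as the list elements.

                ```stata
                * two hand-made lists whose elements contain spaces
                local N1 `" "Indiana Jones" "James Bond" "'
                local N2 `" "James Bond" "Top Gun" "'

                * union: every name found in N1, in N2, or in both
                local Names : list N1 | N2

                * number of elements in the combined list
                local n : list sizeof Names
                di `n'

                foreach N of local Names {
                    di `"`N'"'
                }
                ```

                The same help file documents related operators, such as & for the intersection and - for the difference of two lists.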


                • #9
                  Excellent. Thank you!

                  I was precisely looking for a "union" method...

                  P.S. Apologies for not providing all of the information in the desired manner. I'll try to produce better posts next time.


                  • #10
                    Can the macro list operator be used with more than two local macros at a time, something like:

                    Code:
                    local Names : list N1 | N2 | N3
                    That doesn't seem to work. As a workaround, I am doing this:

                    Code:
                    local Names : list N1 | N2
                    followed by

                    Code:
                    local AllNames : list Names | N3
                    But I wonder if there is an easier way.

                    Thank you.


                    • #11
                      No, you can't chain the | operator within a single command.

                      But you don't need to choose a new local macro name each time. You could just do:

                      Code:
                      local Names: list N1 | N2
                      local Names: list Names | N3
                      If you actually have a long list of macros, N1, N2, N3, ... , N137 that you want to combine, you can do it in a loop:

                      Code:
                      * copy N1 as-is, so the quotes around names with spaces are kept
                      local Names : copy local N1
                      forvalues i = 2/137 {
                          local Names: list Names | N`i'
                      }
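
                      A self-contained sketch of the same loop on three short lists, assuming hand-typed macros in place of levelsof results. The contents of N3, including "Star Wars", are invented for illustration. Starting from an empty Names lets the loop run from 1 rather than 2.

                      ```stata
                      local N1 `" "Indiana Jones" "James Bond" "'
                      local N2 `" "James Bond" "Top Gun" "'
                      * N3 is hypothetical, added only to show a third list
                      local N3 `" "Top Gun" "Star Wars" "'

                      * start with an empty list and add one macro per pass
                      local Names
                      forvalues i = 1/3 {
                          local Names : list Names | N`i'
                      }

                      foreach N of local Names {
                          di `"`N'"'
                      }
                      ```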


                      • #12
                        Good point, thank you!
