  • Foreach across a list of strings?

    Hello,

    I have a string variable called text, which consists of whole sentences in which the names of some cities appear. I am interested in creating a dummy variable that flags whether text contains some specific cities, let's say Paris, Madrid, Berlin, New York. Some of these names contain a space, like New York.

    I could do it with repeated calls such as strpos(text, "Paris") etc. But I have more than 10 cities, so I am thinking a loop may be useful.

    What I have tried so far is the following:

    generate dummy = 0
    foreach city in `"Paris" "Madrid" "Berlin" "New York" ' {
    replace dummy = 1 if strpos(text, " `city' ") >0
    }

    What I get is dummy = 1 for any match within the text. The problem is most obvious for New York, where I get dummy = 1 even if the word "new" appears on its own within the text, whereas I am only searching for the sequence "New York". The same happens with "Parisian", for example.

  • #2

    Code:
    gen wanted = strpos(text, "Paris") | strpos(text, "Madrid") | strpos(text, "Berlin") | strpos(text, "New York")
    should do what you want. It works because strpos() returns zero for no find and a positive number for a find and non-zero counts as true.
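To see that behaviour on a small scale, here is a minimal sketch with made-up data (the three sentences and the variable name pos_paris are illustrative assumptions, not from the original question). It lists the raw strpos() result next to the combined indicator.

```stata
clear
input str40 text
"I flew to Paris last week"
"No city mentioned"
"New York is large"
end

* strpos() gives the starting position of the match, or 0 if there is none
generate pos_paris = strpos(text, "Paris")

* | treats any non-zero value as true, so the result is a 0/1 indicator
generate byte wanted = strpos(text, "Paris") | strpos(text, "New York")

list text pos_paris wanted
```

Only the first and third observations should end up with wanted equal to 1.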

    (Look! No loop! is the Stata equivalent of the child or circus performer's No hands!)

    But you have more than 4. What you are missing, I think, is the combination of double quotes and compound double quotes:

    Code:
    foreach city in `" "Paris" "Madrid" "Berlin" "New York" "'
    Last edited by Nick Cox; 21 Jan 2020, 09:19.



    • #3
      Thank you Nick. You are right: as I understand it, what was missing was the compound double quotes within the strpos() call!
      It works without the double quotes " " at the beginning and at the end of the list, but with the compound double quotes `"`city'"' inside the strpos() call:

      foreach city in "Paris" "Madrid" "Berlin" "New York" {
      replace dummy = 1 if strpos(text, `"`city'"')
      }
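For anyone who wants to try this loop end to end, here is a minimal self-contained sketch. The toy sentences and the local macro name cities are assumptions for illustration. It keeps the list in a local macro wrapped in compound double quotes and uses foreach ... of local, which is one common way to handle elements that contain spaces.

```stata
clear
input str40 text
"We visited New York in May"
"A new idea, nothing more"
"Berlin is cold in winter"
end

* Each city is in plain double quotes; the whole list is in compound double quotes
local cities `" "Paris" "Madrid" "Berlin" "New York" "'

generate byte dummy = 0
foreach city of local cities {
    * Compound double quotes keep a name with a space as a single string
    replace dummy = 1 if strpos(text, `"`city'"') > 0
}

list text dummy
```

Here the second sentence should stay at 0, because "new" on its own is not "New York". Note that strpos() is a plain substring search, so a word such as "Parisian" would still count as a match for "Paris".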



      • #4
        Nick Cox


        what if

        Code:
         
         foreach city in `" "Paris" "Madrid" "Berlin" "New York" "'
        is actually an array of a thousand strings?

        Is there a way to keep a thousand strings (some containing spaces) in an external text file (e.g., see below) and reference them in a loop for this purpose?

        Code:
        Paris
        Madrid
        Berlin
        New York
        South East Corner of Washington Square Park
        .
        .
        .
        .
        <1000 lines>
        Last edited by Suresh Paul; 18 Nov 2020, 11:56.



        • #5
          In general you should read them all in once and then refer to them within a loop. If they aren't part of your main dataset, that's not fatal, as you could store them in other ways, e.g. as a set of scalars or as a vector in Mata.
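One possible way to do this with a Mata vector is sketched below, under stated assumptions: the file name cities.txt, the Mata vector name cities, and the toy data are all hypothetical. Mata's cat() reads a text file into a string column vector, one element per line, and st_local() copies one element at a time into a local macro that the loop can use.

```stata
clear
input str60 text
"She moved from Madrid to Lyon"
"A weekend in New York"
"Nothing to see here"
end

* Write a small stand-in for the external file (hypothetical name: cities.txt)
tempname fh
file open `fh' using "cities.txt", write text replace
file write `fh' "Paris" _n "Madrid" _n "New York" _n
file close `fh'

* Read every line once into a Mata string column vector
mata: cities = cat("cities.txt")
mata: st_local("n", strofreal(rows(cities)))

generate byte dummy = 0
forvalues i = 1/`n' {
    * Copy the i-th string into the local macro city
    mata: st_local("city", cities[`i'])
    replace dummy = 1 if strpos(text, `"`city'"') > 0
}

list text dummy
```

The file is read only once, and the loop then works from memory. With a real list, the lines that create cities.txt would be dropped and the path adjusted.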



          • #6
            Thanks Nick!
