
  • How to replace repeated words in string

    Hi everyone,

    I just learned how to use the regexr() function and I believe it can help solve my problem.

    I have a string variable "response":
    list response
    I'm going there
    where where did you say
    sometimes it is where you think
    i think its where where you go
    its everywhere where you are
    i am planning on going where where where i want to

    As you can see, the word 'where' is repeated quite often. I want to replace the strings "where where" and "where where where" with "where".
    However, I don't want to replace (e.g.) "everywhere where" with "where".

    I know I can do this manually, but I was hoping to condense the code into as few lines as possible. This is what I've been trying so far:
    gen temp = regexr(response, " (where)+ where ", " where ")
    replace temp = regexr(response, "^(where)+ where ", "where ")
    I have been using "(where)+" to capture both "where where" and "where where where", but it doesn't seem to work. I also split the code into two commands, one beginning with "^(where)" and the other with " (where)", in order to avoid capturing the 'where' in "everywhere". If I could condense this into one command, that would be ideal as well.

    Would appreciate any advice! Thanks!




  • #2
    Hi Tiffany,

    Try subinstr.

    Your syntax could be something like this:

    Code:
    gen temp = subinstr(response, " where where", " where", .)
    What this will do is create a variable called temp, which will contain the values of the variable response, but wherever " where where" occurs (please note the space before the first where), it will be replaced by " where". The replacement keeps the leading space so that the preceding word is not glued to "where". Check if it does the trick.
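
    As a rough check of what a single pass does, the sketch below applies subinstr() to three literal strings with display. It assumes the replacement string keeps the leading space (" where") so that neighbouring words stay separated; the brackets are only there to make the spaces visible.

    ```stata
    * Sketch only: literals adapted from the example data in the question
    * One repeat: " where where" becomes " where"
    display "[" + subinstr("i think its where where you go", " where where", " where", .) + "]"
    * Two repeats: matches do not overlap, so one repeat is left after a single pass
    display "[" + subinstr("going where where where i want to", " where where", " where", .) + "]"
    * A repeat at the very start has no leading space, so it is not matched
    display "[" + subinstr("where where did you say", " where where", " where", .) + "]"
    ```

    The second and third cases are the ones a single call leaves unfixed: a triple "where" is only reduced to a double, and a repeat at the start of the string is untouched.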






    • #3
      Originally posted by Igor Paploski
      Hi Tiffany,

      Try subinstr.

      Your syntax could be something like this:

      Code:
      gen temp = subinstr(response, " where where", " where", .)
      What this will do is create a variable called temp, which will contain the values of the variable response, but wherever " where where" occurs (please note the space before the first where), it will be replaced by " where". Check if it does the trick.



      Thanks for this!

      Is there a way I could replace "where where where" AND "where where" with "where" in a single line?



      • #4
        It might be possible, but I can't think of a straightforward way to do so.

        You can think of creative ways to tackle the issue (e.g., creating a new variable that counts how many times the word "where" appears in each observation of your variable response, and then setting up a loop that repeats the subinstr command up to the maximum value of this count variable). At least the way I see it, though, that would require more coding and effort, and you might end up using more lines of code anyway (except that there would be only one subinstr line, repeated as many times as needed via the loop). Somebody else might have a better solution though.
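
        One way to sketch the loop idea is to repeat the substitution until no observation contains a repeated "where" any more, rather than counting occurrences first. That is a swapped-in variant of the suggestion above, not necessarily what was intended. The variable temp and the local n are illustrative names, and the three example strings are taken from the question.

        ```stata
        * Sketch only: temp and n are illustrative names, not from the thread
        clear
        input str100 response
        "where where did you say"
        "its everywhere where you are"
        "i am planning on going where where where i want to"
        end

        * Pad with spaces so that a repeat at the start or end of the string is caught
        gen temp = " " + response + " "

        * Count observations that still contain a repeated "where"
        count if strpos(temp, " where where ") > 0
        local n = r(N)

        * Each pass shortens every run of repeats; stop once none remain
        while `n' > 0 {
            replace temp = subinstr(temp, " where where ", " where ", .)
            count if strpos(temp, " where where ") > 0
            local n = r(N)
        }

        * Remove the padding again
        replace temp = trim(temp)
        list temp, sep(0)
        ```

        Note that subinstr() is case-sensitive, so "Where where" would not be caught by this sketch as written.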



        • #5

          Code:
          clear 
          input str100 response 
          "I'm going there"
          "where where did you say"
          "sometimes it is where you think"
          "i think its where where you go"
          "its everywhere where you are"
          "i am planning on going where where where i want to"
          end 
          
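          * split response into one variable per word: word1, word2, ...
          split response, gen(word) 
          local nwords : word count `r(varlist)' 
          
          * prev flags whether the previous word was "where"; this flags the current word
          gen prev = lower(word1) == "where"  
          gen this = 0 
          forval j = 2/`nwords' { 
              replace this = lower(word`j') == "where" 
              * blank out a "where" that directly follows another "where"
              replace word`j' = "" if prev & this 
              replace prev = this 
          } 
          
          * rejoin the words and collapse the multiple spaces left by blanked words
          egen wanted = concat(word*), p(" ")  
          replace wanted = itrim(wanted) 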
          
          list wanted , sep(0) 
          
          
          
          
               +----------------------------------------+
               |                                 wanted |
               |----------------------------------------|
            1. |                        I'm going there |
            2. |                      where did you say |
            3. |        sometimes it is where you think |
            4. |               i think its where you go |
            5. |           its everywhere where you are |
            6. | i am planning on going where i want to |
               +----------------------------------------+
          
          .



          • #6
            A solution using the more general regular expression engine introduced in Stata 14.
            Code:
            cls
            clear 
            input str100 response 
            "I'm going there"
            "where where did you say"
            "sometimes it is where you think"
            "i think its where where you go"
            "its everywhere where you are"
            "i am going where where where i want to"
            "where where where did you say where where"
            end 
            
            generate wanted = trim( ustrregexra(" "+response+" ", "( where)+ ", " where ") )
            list, clean noobs
            Code:
            . list, clean noobs
            
                                                 response                            wanted  
                                          I'm going there                   I'm going there  
                                  where where did you say                 where did you say  
                          sometimes it is where you think   sometimes it is where you think  
                           i think its where where you go          i think its where you go  
                             its everywhere where you are      its everywhere where you are  
                   i am going where where where i want to        i am going where i want to  
                where where where did you say where where           where did you say where
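
            A small sketch of why the string is padded with a space on each side before matching: the pattern "( where)+ " needs a space before the first "where" and after the last one, so without padding a repeat at the start or end of the string is missed. The literals below come from the example data, the brackets only make leading and trailing spaces visible, and ustrregexra() requires Stata 14 or later.

            ```stata
            * Without padding: the leading "where" has no space in front of it, so only
            * the second "where" matches and the string comes back unchanged
            display "[" + ustrregexra("where where did you say", "( where)+ ", " where ") + "]"
            * With padding: the whole run " where where" plus the following space matches
            display "[" + ustrregexra(" " + "where where did you say" + " ", "( where)+ ", " where ") + "]"
            * "everywhere where" is left alone: the "where" inside "everywhere" has no space before it
            display "[" + trim(ustrregexra(" " + "its everywhere where you are" + " ", "( where)+ ", " where ")) + "]"
            ```

            The outer trim() in the solution above then removes the padding that was added for matching.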



            • #7
              Cross-posted at https://stackoverflow.com/questions/...th-single-word
              Please note our policy on cross-posting, which is to tell us about it: https://www.statalist.org/forums/help#crossposting



              • #8
                Originally posted by Nick Cox
                Code:
                clear
                input str100 response
                "I'm going there"
                "where where did you say"
                "sometimes it is where you think"
                "i think its where where you go"
                "its everywhere where you are"
                "i am planning on going where where where i want to"
                end
                
                split response, gen(word)
                local nwords : word count `r(varlist)'
                
                gen prev = lower(word1) == "where"
                gen this = 0
                forval j = 2/`nwords' {
                    replace this = lower(word`j') == "where"
                    replace word`j' = "" if prev & this
                    replace prev = this
                }
                
                egen wanted = concat(word*), p(" ")
                replace wanted = itrim(wanted)
                
                list wanted , sep(0)
                
                     +----------------------------------------+
                     |                                 wanted |
                     |----------------------------------------|
                  1. |                        I'm going there |
                  2. |                      where did you say |
                  3. |        sometimes it is where you think |
                  4. |               i think its where you go |
                  5. |           its everywhere where you are |
                  6. | i am planning on going where i want to |
                     +----------------------------------------+

                Thank you, this is very helpful!



                • #9
                  Originally posted by William Lisowski
                  A solution using the more general regular expression engine introduced in Stata 14.
                  Code:
                  cls
                  clear
                  input str100 response
                  "I'm going there"
                  "where where did you say"
                  "sometimes it is where you think"
                  "i think its where where you go"
                  "its everywhere where you are"
                  "i am going where where where i want to"
                  "where where where did you say where where"
                  end
                  
                  generate wanted = trim( ustrregexra(" "+response+" ", "( where)+ ", " where ") )
                  list, clean noobs
                  Code:
                  . list, clean noobs
                  
                                                       response                            wanted  
                                                I'm going there                   I'm going there  
                                        where where did you say                 where did you say  
                                sometimes it is where you think   sometimes it is where you think  
                                 i think its where where you go          i think its where you go  
                                   its everywhere where you are      its everywhere where you are  
                         i am going where where where i want to        i am going where i want to  
                      where where where did you say where where           where did you say where

                  Thank you as well, this is very helpful!



                  • #10
                    Originally posted by Nick Cox
                    Cross-posted at https://stackoverflow.com/questions/...th-single-word
                    Please note our policy on cross-posting, which is to tell us about it: https://www.statalist.org/forums/help#crossposting

                    Hi, sorry about that--I am new to Statalist. I will definitely make sure to mention cross-posting moving forward!
