  • How to replace certain occurrences of certain characters with others in a given string with a restricted pattern?

    I have strings like these, each exactly 15 characters long. There are ALWAYS exactly two "1"s, which may or may not be separated by "0"s.

    110000000000000
    100010000000000
    000100001000000
    000000010010000
    000000000001100
    000011000000000

    What I want to do is replace with "1"s all the "0"s that lie between the two "1"s. I do not want to replace the "0"s that come before the first "1" or after the second "1".

    So after the replacement, the data above would look like this:

    110000000000000
    111110000000000
    000111111000000
    000000011110000
    000000000001100
    000011000000000

    Thank you.
    This is basically a string I created out of 15 counting variables with values 0 or 1, and it is a long-winded hack to make the "in-between" zeros count for another purpose.
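For anyone who wants to try the replies, here is one way to enter the example data in a do-file. This is a sketch: the variable name myvar is taken from the replies rather than from the original poster's dataset, and the two assert lines are an added check that every string has 15 characters and exactly two "1"s.

```stata
* Sketch: enter the six example strings from the question
* (myvar is the name used in the replies, assumed here)
clear
input str15 myvar
"110000000000000"
"100010000000000"
"000100001000000"
"000000010010000"
"000000000001100"
"000011000000000"
end
* every string should be 15 characters long
assert length(myvar) == 15
* and contain exactly two "1"s: removing them shortens the string by 2
assert length(myvar) - length(subinstr(myvar, "1", "", .)) == 2
```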

  • #2
    This is pedestrian, but it works. Regular expression solutions are eagerly awaited. I was using an old version of Stata that did not yet allow string multiplication.

    Code:
     
    . l myvar, sep(0)
    
         +-----------------+
         |           myvar |
         |-----------------|
      1. | 110000000000000 |
      2. | 100010000000000 |
      3. | 000100001000000 |
      4. | 000000010010000 |
      5. | 000000000001100 |
      6. | 000011000000000 |
         +-----------------+
    
    . gen first = strpos(myvar, "1")
    
    . gen second = 16 - strpos(reverse(myvar), "1")
    
    . gen myvar2 = subinstr(myvar, substr(myvar, first, second - first + 1), substr("111111111111111", first, second - first + 1), 1)
    
    . l, sep(0)
    
         +----------------------------------------------------+
         |           myvar   first   second            myvar2 |
         |----------------------------------------------------|
      1. | 110000000000000       1        2   110000000000000 |
      2. | 100010000000000       1        5   111110000000000 |
      3. | 000100001000000       4        9   000111111000000 |
      4. | 000000010010000       8       11   000000011110000 |
      5. | 000000000001100      12       13   000000000001100 |
      6. | 000011000000000       5        6   000011000000000 |
         +----------------------------------------------------+
    .
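A quick way to confirm that this approach does what was asked, sketched as a self-contained do-file. The data are the six strings from the question; the checks are additions, not part of the original answer: the result keeps 15 characters, no "0" is left between the two "1"s, and the number of "1"s equals the width of the filled block, so nothing outside it changed. Like the answer itself, it avoids string multiplication.

```stata
* Sketch: rerun the approach above on the question's data and check it
clear
input str15 myvar
"110000000000000"
"100010000000000"
"000100001000000"
"000000010010000"
"000000000001100"
"000011000000000"
end
gen first = strpos(myvar, "1")
* position of the last "1", found by searching the reversed string
gen second = 16 - strpos(reverse(myvar), "1")
gen myvar2 = subinstr(myvar, substr(myvar, first, second - first + 1), substr("111111111111111", first, second - first + 1), 1)
* the result keeps the original length
assert length(myvar2) == 15
* no "0" remains between the first and second "1"
assert strpos(substr(myvar2, first, second - first + 1), "0") == 0
* the count of "1"s equals the block width, so no "1" was added outside it
assert length(myvar2) - length(subinstr(myvar2, "1", "", .)) == second - first + 1
```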



    • #3
      Nick, sorry to say that I don't think a regular expression solution is possible in Stata. In general, regular expression substitution (even in languages like Perl) doesn't support variable-length substitutions (or didn't, the last time I looked). Perhaps Perl's ability to do repeated matches and substitutions within a string with one command would make a regular expression solution possible.

      In your solution, I might use the marginally-easier-to-comprehend-at-the-end-of-a-long-day version:
      Code:
      gen myvar2 = substr(myvar,1,first-1)                                 ///
                   + substr("111111111111111", first, second - first + 1)  ///
                   + substr(myvar,second+1,.)



      • #4
        even longer-hand
        Code:
        gen str15 var3=""
        * build var3 one character at a time: "1" from position first to second, "0" elsewhere
        forvalues i=1/15 {
                replace var3=var3+"0" if `i' <first
                replace var3=var3+"1" if `i' >=first & `i'<=second
                replace var3=var3+"0" if `i' >second
        }



        • #5
          Another less efficient, but hopefully easier to read approach:

          Code:
          forval i=1/13 {
            replace myvar=subinstr(myvar,"1"+"0"*`i'+"1","1"*(`i'+2),.)
          }
          (This approach is less efficient because it makes 13 blind replacement passes where a single targeted replacement would do. 13 is not a typo: with two "1"s in a 15-character string, at most 15 - 2 = 13 "0"s can lie between them.)

          Best, Sergiy Radyakin



          • #6
            If we're going to use string multiplication, which Nick did not have available, then I'd go for
            Code:
            gen myvar2 = "0"*(first-1) + "1"*(second-first+1) + "0"*(15-second)



            • #7
              Revisiting this, here's a revised Lisowski-Cox solution and one using regular expressions.

              Required: Stata 14 and moss (SSC).

              Code:
              . clear
              
              . input str16 myvar
              
                              myvar
                1. 110000000000000
                2. 100010000000000
                3. 000100001000000
                4. 000000010010000
                5. 000000000001100
                6. 000011000000000
                7. end
              
              . gen p1 = strpos(myvar, "1") - 1
              
              . gen p2 = strrpos(myvar, "1")
              
              . gen myvar2 = p1 * "0" + (p2 - p1) * "1" + (16 - p2) * "0"
              
              . l, sep(0)
              
                   +----------------------------------------------+
                   |           myvar   p1   p2             myvar2 |
                   |----------------------------------------------|
                1. | 110000000000000    0    2   1100000000000000 |
                2. | 100010000000000    0    5   1111100000000000 |
                3. | 000100001000000    3    9   0001111110000000 |
                4. | 000000010010000    7   11   0000000111100000 |
                5. | 000000000001100   11   13   0000000000011000 |
                6. | 000011000000000    4    6   0000110000000000 |
                   +----------------------------------------------+
              
              . moss myvar, match("(1.*1)") regex
              
              . l _match1, sep(0)
              
                   +---------+
                   | _match1 |
                   |---------|
                1. |      11 |
                2. |   10001 |
                3. |  100001 |
                4. |    1001 |
                5. |      11 |
                6. |      11 |
                   +---------+
              
              . gen myvar3 = subinstr(myvar, _match1, length(_match1) * "1", 1)
              
              .  drop _count _pos1
              
              . l, sep(0)
              
                   +--------------------------------------------------------------------------+
                   |           myvar   p1   p2             myvar2   _match1            myvar3 |
                   |--------------------------------------------------------------------------|
                1. | 110000000000000    0    2   1100000000000000        11   110000000000000 |
                2. | 100010000000000    0    5   1111100000000000     10001   111110000000000 |
                3. | 000100001000000    3    9   0001111110000000    100001   000111111000000 |
                4. | 000000010010000    7   11   0000000111100000      1001   000000011110000 |
                5. | 000000000001100   11   13   0000000000011000        11   000000000001100 |
                6. | 000011000000000    4    6   0000110000000000        11   000011000000000 |
                   +--------------------------------------------------------------------------+



              • #8
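                In the above, please substitute the following for the line that creates myvar2. The strings are 15 characters long, so the trailing run of "0"s needs length 15 - p2, not 16 - p2 (which is why myvar2 came out 16 characters long in the listing):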

                Code:
                  
                gen myvar2 = p1 * "0" + (p2 - p1) * "1" + (15 - p2) * "0"
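To check the corrected line, here is a self-contained sketch that compares it with the desired output listed in the question. It assumes Stata 14 or later (for strrpos() and string multiplication, as in #7) but does not need moss; the second variable, wanted, is a hypothetical name holding the target strings typed in from the question.

```stata
* Sketch: check the corrected formula against the desired output
* "wanted" is a hypothetical variable holding the target strings from the question
clear
input str15 myvar str15 wanted
"110000000000000" "110000000000000"
"100010000000000" "111110000000000"
"000100001000000" "000111111000000"
"000000010010000" "000000011110000"
"000000000001100" "000000000001100"
"000011000000000" "000011000000000"
end
gen p1 = strpos(myvar, "1") - 1   // number of "0"s before the first "1"
gen p2 = strrpos(myvar, "1")      // position of the second (last) "1"
gen myvar2 = p1 * "0" + (p2 - p1) * "1" + (15 - p2) * "0"
* the pieces add up to p1 + (p2 - p1) + (15 - p2) = 15 characters
assert length(myvar2) == 15
assert myvar2 == wanted
```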
