How to replace certain occurrences of certain characters with others in a given string with a restricted pattern?

Saurabh Chavan

Join Date: Jun 2015

Posts: 25
#1


01 Jul 2015, 17:27

I have strings that are exactly 15 characters long, like these. There are ALWAYS exactly two "1"s, which may or may not be separated by "0"s.

110000000000000
100010000000000
000100001000000
000000010010000
000000000001100
000011000000000

What I want to do is this: replace with "1"s all the "0"s that lie between the two "1"s. I do not want to replace the "0"s before the first "1" or after the second "1".

So the above data after the solution will look like

110000000000000
111110000000000
000111111000000
000000011110000
000000000001100
000011000000000

Thank you.
This is basically a string I created out of 15 counting variables with values 0 or 1 and this is a long-winded hack to make the "in between" zeroes count for another purpose.
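
Any solution here depends on the two stated properties of the data: every string is 15 characters long and contains exactly two "1"s. A minimal sketch of a check for those properties, assuming the string variable is called myvar (n_ones is a hypothetical helper variable):

```stata
* Sketch: confirm the stated assumptions before transforming anything.
* myvar is an assumed variable name; n_ones is a hypothetical helper.
clear
input str15 myvar
"110000000000000"
"100010000000000"
"000100001000000"
"000000010010000"
"000000000001100"
"000011000000000"
end

* Every string must be exactly 15 characters long.
assert length(myvar) == 15

* Count the 1s as the drop in length after deleting all of them.
gen n_ones = length(myvar) - length(subinstr(myvar, "1", "", .))
assert n_ones == 2
drop n_ones
```

If either assert fails, Stata stops with an error, which flags the rows that need attention before any replacement is attempted.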

Nick Cox

Join Date: Mar 2014
Posts: 35709

01 Jul 2015, 17:46

This is pedestrian but it works. Regular expression solutions eagerly awaited. I was using an older version of Stata that does not yet allow string multiplication.

Code:

 
. l myvar, sep(0)

     +-----------------+
     |           myvar |
     |-----------------|
  1. | 110000000000000 |
  2. | 100010000000000 |
  3. | 000100001000000 |
  4. | 000000010010000 |
  5. | 000000000001100 |
  6. | 000011000000000 |
     +-----------------+

. gen first = strpos(myvar, "1")

. gen second = 16 - strpos(reverse(myvar), "1")

. gen myvar2 = subinstr(myvar, substr(myvar, first, second - first + 1), substr("111111111111111", first, second - first + 1), 1)

. l, sep(0)

     +----------------------------------------------------+
     |           myvar   first   second            myvar2 |
     |----------------------------------------------------|
  1. | 110000000000000       1        2   110000000000000 |
  2. | 100010000000000       1        5   111110000000000 |
  3. | 000100001000000       4        9   000111111000000 |
  4. | 000000010010000       8       11   000000011110000 |
  5. | 000000000001100      12       13   000000000001100 |
  6. | 000011000000000       5        6   000011000000000 |
     +----------------------------------------------------+


William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

01 Jul 2015, 18:55

Nick, sorry to say that I don't think a regular expression solution is possible in Stata. In general, regular expression substitution (even in languages like Perl) doesn't support variable-length substitutions (or didn't, the last time I looked). Perhaps Perl's ability to do repeated matches and substitutions within a string with one command would make a regular expression solution possible.

In your solution, I might use the marginally-easier-to-comprehend-at-the-end-of-a-long-day

Code:

gen myvar2 = substr(myvar, 1, first - 1) ///
    + substr("111111111111111", first, second - first + 1) ///
    + substr(myvar, second + 1, .)

ben earnhart

Join Date: May 2014
Posts: 1027

01 Jul 2015, 19:09

even longer-hand

Code:

gen str15 var3=""
forvalues i=1/15 {
        replace var3=var3+"0" if `i' <first
        replace var3=var3+"1" if `i' >=first & `i'<=second
        replace var3=var3+"0" if `i' >second
}


Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#5

01 Jul 2015, 19:31

Another less efficient, but hopefully easier to read approach:

Code:

forval i = 1/13 {
    replace myvar = subinstr(myvar, "1" + "0"*`i' + "1", "1"*(`i'+2), .)
}

(The approach is less efficient since it performs 13 blind replacements instead of the single one that is necessary. 13 is not a typo: 13 = 15 - 2, the largest possible number of zeros between the two 1s.)

Best, Sergiy Radyakin
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

01 Jul 2015, 19:40

If we're going to use string multiplication, which Nick did not have available, then I'd go for

Code:

gen myvar2 = "0"*(first-1) + "1"*(second-first+1) + "0"*(15-second)

Nick Cox

Join Date: Mar 2014
Posts: 35709

02 Jul 2015, 11:31

Revisiting this, here's a revised Lisowski-Cox solution and one using regular expressions.

Required: Stata 14 and moss (SSC).

Code:

. clear

. input str16 myvar

                myvar
  1. 110000000000000
  2. 100010000000000
  3. 000100001000000
  4. 000000010010000
  5. 000000000001100
  6. 000011000000000
  7. end

. gen p1 = strpos(myvar, "1") - 1

. gen p2 = strrpos(myvar, "1")

. gen myvar2 = p1 * "0" + (p2 - p1) * "1" + (16 - p2) * "0"

. l, sep(0)

     +----------------------------------------------+
     |           myvar   p1   p2             myvar2 |
     |----------------------------------------------|
  1. | 110000000000000    0    2   1100000000000000 |
  2. | 100010000000000    0    5   1111100000000000 |
  3. | 000100001000000    3    9   0001111110000000 |
  4. | 000000010010000    7   11   0000000111100000 |
  5. | 000000000001100   11   13   0000000000011000 |
  6. | 000011000000000    4    6   0000110000000000 |
     +----------------------------------------------+

. moss myvar, match("(1.*1)") regex

. l _match1, sep(0)

     +---------+
     | _match1 |
     |---------|
  1. |      11 |
  2. |   10001 |
  3. |  100001 |
  4. |    1001 |
  5. |      11 |
  6. |      11 |
     +---------+

. gen myvar3 = subinstr(myvar, _match1, length(_match1) * "1", 1)

.  drop _count _pos1

. l, sep(0)

     +--------------------------------------------------------------------------+
     |           myvar   p1   p2             myvar2   _match1            myvar3 |
     |--------------------------------------------------------------------------|
  1. | 110000000000000    0    2   1100000000000000        11   110000000000000 |
  2. | 100010000000000    0    5   1111100000000000     10001   111110000000000 |
  3. | 000100001000000    3    9   0001111110000000    100001   000111111000000 |
  4. | 000000010010000    7   11   0000000111100000      1001   000000011110000 |
  5. | 000000000001100   11   13   0000000000011000        11   000000000001100 |
  6. | 000011000000000    4    6   0000110000000000        11   000011000000000 |
     +--------------------------------------------------------------------------+
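
For comparison, here is a sketch of a single-call regular expression that does not need moss. It assumes Stata 14 or later, where ustrregexra() accepts look-ahead assertions, and it relies on each string holding exactly two "1"s. The names myvar4 and expected are hypothetical; the expected values are copied from the desired output in the original question.

```stata
* Sketch, assuming Stata 14+ and exactly two "1"s per string.
* myvar4 and expected are hypothetical names used only for this illustration.
clear
input str15 myvar str15 expected
"110000000000000" "110000000000000"
"100010000000000" "111110000000000"
"000100001000000" "000111111000000"
"000000010010000" "000000011110000"
"000000000001100" "000000000001100"
"000011000000000" "000011000000000"
end

* A "0" lies between the two 1s exactly when one 1, and only one,
* remains between it and the end of the string.
gen myvar4 = ustrregexra(myvar, "0(?=0*10*$)", "1")

assert myvar4 == expected
```

The look-ahead consumes no characters, so each interior zero is matched and replaced on its own. That avoids having to build a replacement string of variable length.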


Nick Cox

Join Date: Mar 2014

Posts: 35709
#8

02 Jul 2015, 12:48

In my previous post, the last term should use 15, not 16, so that myvar2 is 15 characters long. Please substitute

Code:

gen myvar2 = p1 * "0" + (p2 - p1) * "1" + (15 - p2) * "0"
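
A quick way to confirm the corrected line, as a sketch: rebuild p1 and p2 as in the previous post, then check the length and the values of the result. It assumes Stata 14 (for strrpos() and string multiplication); expected is a hypothetical variable holding the desired output from the original question.

```stata
* Sketch: check that the corrected expression yields 15-character strings.
* expected is a hypothetical variable; its values come from the question.
clear
input str15 myvar str15 expected
"110000000000000" "110000000000000"
"100010000000000" "111110000000000"
"000100001000000" "000111111000000"
"000000010010000" "000000011110000"
"000000000001100" "000000000001100"
"000011000000000" "000011000000000"
end

gen p1 = strpos(myvar, "1") - 1      // number of zeros before the first 1
gen p2 = strrpos(myvar, "1")         // position of the last 1
gen myvar2 = p1 * "0" + (p2 - p1) * "1" + (15 - p2) * "0"

assert length(myvar2) == 15
assert myvar2 == expected
```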
