Remove duplicate special characters in a string

Paulo Matos

Join Date: Jun 2018

Posts: 5
#1

06 Sep 2018, 18:37

Dear Stata users,

I am trying to separate addresses inside a string variable into new observations, like:

from:

"address11 // address12 // address13"
"address21 // address22 // address23"

to:

address11
address12
address13
address21
address22
address23

The splitting itself is not the real problem. The real problem is that the separator ("//") is not constant across observations, and my data actually look like this:

"address11 //////// address12 / address13"
"address21 ///// address22 /// address23"
"address31 // address32 // address33"

Now I am trying to make the number of "/" characters constant across rows so that I can then separate the addresses. In other words, I am trying to convert the data above into something like the first example I showed.

I tried using the moss command from SSC, but if you can suggest any way to do this I would be very grateful.

Thank you in advance,
Paulo

Last edited by Paulo Matos; 06 Sep 2018, 18:40.

Chris Larkin

Join Date: Apr 2016
Posts: 296

06 Sep 2018, 20:34

Hi Paulo,

Welcome to Statalist.

The first part of your problem just requires the use of the subinstr() function. For the second part, you can use a combination of split and stack to get what you want.

Here's some code:

Code:

clear
input str42 var1
"address11 //////// address12 / address13"
"address21 ///// address22 /// address23"
"address31 // address32 // address33"    
end

*Remove the / characters
replace var1 = subinstr(var1, "/", "",.)

*Split the variable by empty space and remove the original
split var1
drop var1

*Stack the three variables created by split into a single variable
stack var11 var12 var13, into(var1) clear
drop _stack

*Now display the output
sort var1
list, clean

          var1  
  1.   address11  
  2.   address12  
  3.   address13  
  4.   address21  
  5.   address22  
  6.   address23  
  7.   address31  
  8.   address32  
  9.   address33

Note the use of code delimiters above to display the code and the example data in this forum. Please use them in the future, and please share data using the dataex command (ssc install dataex).
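
As a minimal sketch of that workflow, assuming the string variable is named var1 as in the example above (the variable name and the three-observation range are only illustrative):

Code:

```stata
* Install dataex from SSC (only needed once, and only if it is not already available)
ssc install dataex

* Print the first three observations of var1 as an input block
* that can be pasted between code delimiters in a post
dataex var1 in 1/3
```

Anyone reading the post can then copy the resulting block into Stata and reproduce the example data exactly.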

Last edited by Chris Larkin; 06 Sep 2018, 20:40.

Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#3

07 Sep 2018, 01:39

If there are also spaces in the address strings, you could do:

Code:

split var1, p("/ ")
stack var11 var12 var13, into(var1) clear
replace var1 = subinstr(var1, "/", "",.)
replace var1 = trim(var1)
drop _stack

Daniel Bela

Join Date: Apr 2014
Posts: 246

07 Sep 2018, 02:54

Another solution for replacing several occurrences of the separating character by a single instance is to do so via a regular expression.

Here's my shot at the problem:

Code:

clear
input str42 var1
"address11 //////// address12 / address13"
"address21 ///// address22 /// address23"
"address31 // address32 // address33"    
end

* replace several consecutive occurrences of "/" with a single "/"
replace var1=ustrregexra(var1,"/+","/")
list

* split into several variables, remove original
split var1 , parse("/")
drop var1
list

* reshape to long format
generate id=_n
reshape long var1@ , i(id) j(addressno)
list

This also creates an id variable and an enumerator for the addresses within each observation, which might be useful later on.
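
One detail worth checking: because the separators in the example data are surrounded by spaces, the pieces produced by split carry leading or trailing blanks (for example "address11 " and " address12 "). A minimal sketch of one way to handle that, assuming the same var1 example data, is to let the regular expression consume the surrounding whitespace as well. The pattern and the final strtrim() call are illustrative additions, not part of the code above:

Code:

```stata
clear
input str42 var1
"address11 //////// address12 / address13"
"address21 ///// address22 /// address23"
"address31 // address32 // address33"
end

* collapse each run of "/" plus any surrounding whitespace into a single "/"
replace var1 = ustrregexra(var1, "\s*/+\s*", "/")

* split on "/", drop the original, and reshape to long format
split var1, parse("/")
drop var1
generate id = _n
reshape long var1, i(id) j(addressno)

* defensive trim in case a string started or ended with blanks
replace var1 = strtrim(var1)
list, sepby(id)
```

Spaces inside an address are left untouched, since only whitespace directly next to a run of "/" is removed.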

Regards
Bela


Robert Picard

Join Date: Mar 2014
Posts: 1536

07 Sep 2018, 08:31

You can also find multiple occurrences of substrings that do not contain the delimiter with moss (from SSC):

Code:

clear
input str42 var1
"address11 //////// address12 / address13"
"address21 ///// address22 /// address23"
"address31 // address32 // address33"    
end

moss var1, match("([^/]+)") regex
list _match*

and the results:

Code:

. list _match*

     +---------------------------------------+
     |    _match1       _match2      _match3 |
     |---------------------------------------|
  1. | address11     address12     address13 |
  2. | address21     address22     address23 |
  3. | address31     address32     address33 |
     +---------------------------------------+

.
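
To get from these wide _match variables to one address per observation, as asked in #1, a reshape can follow. This is only a sketch under the assumption that moss has created _match1, _match2, ... as shown above; the names id, addressno and address are chosen here for illustration. Since the pattern matches everything that is not a "/", the matches can still include the blanks around each address, so they are trimmed at the end:

Code:

```stata
* requires moss (ssc install moss)
clear
input str42 var1
"address11 //////// address12 / address13"
"address21 ///// address22 /// address23"
"address31 // address32 // address33"
end

moss var1, match("([^/]+)") regex

* keep only the matched pieces and go from wide to long
keep _match*
generate id = _n
reshape long _match, i(id) j(addressno)

* remove blanks around each address and drop empty pieces
replace _match = strtrim(_match)
drop if _match == ""
rename _match address
list, sepby(id)
```

The drop if line matters when observations hold different numbers of addresses, because reshape then creates empty rows for the missing ones.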


Paulo Matos

Join Date: Jun 2018

Posts: 5
#6

07 Sep 2018, 09:30

Thank you so much, everyone, for the quick answers! All of the advice was very useful!
