local with semi-colon using #delimit

Bryony Simmons

Join Date: Jan 2018

Posts: 37
#1

10 Jul 2019, 04:38

Hello,

I am cleaning a large dataset & now need to make some manual changes to a string variable so that it is in the correct format to split & reshape.

I want to build a local in the format <new>|<old> and use gettoken to separate the local & then run the replace command. Some of the strings are very long & so I have changed the delimiter in order to split over rows. An extract of the do file code for the local is as follows (there are many more strings to be changed):

Code:

* syntax convention: `" <new>|<old> <new>|<old> [...] "'

#delimit ;

local change `"
"Pre-registration, USA, Europe.|Pre-registration, USA and Europe."
"Marketed, UK. Phase III, USA.|Marketed, UK, Phase III, USA."
"Registered, UK. Pre-registration, Worldwide.|Registered, UK; Pre-registration, Worldwide."
"' ;

#delimit cr

The code works fine until I reach a string with a semicolon where I get the error "invalid syntax". Is there a way to overcome this using this current method? Is it possible to change the delimiter to something else?

Alternatively, I can add to the local in each line using local change `" `change' "new text|old text" "' , but I would prefer the first method for readability of the do file. I am working on a PC in Stata/MP 14.2.
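
For comparison, a minimal sketch of that line-by-line alternative. It stays under the default delimiter, where a semicolon inside the quotes is ordinary text; the two pairs are taken from the extract above, and the word count at the end is only an illustrative check.

```stata
* Sketch only: append one quoted pair per line under the default
* (carriage-return) delimiter, where a semicolon is ordinary text.
local change ""
local change `" `change' "Pre-registration, USA, Europe.|Pre-registration, USA and Europe." "'
local change `" `change' "Registered, UK. Pre-registration, Worldwide.|Registered, UK; Pre-registration, Worldwide." "'

* each quoted pair counts as one element of the list
local n : word count `change'
display `n'
```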

Thank you for any help you can provide.

Best wishes,
Bryony
Tags: delimit, delimiter, local

Bjarte Aagnes

Join Date: Apr 2014
Posts: 785

10 Jul 2019, 05:55

Two alternatives: store the semicolon in a local macro, or read the definitions from a file:

Code:

local colon = ";"

#delimit ;

local change `"
"Pre-registration, USA, Europe.|Pre-registration, USA and Europe."
"Marketed, UK. Phase III, USA.|Marketed, UK, Phase III, USA."
"Registered, UK. Pre-registration, Worldwide.|Registered, UK`colon' Pre-registration, Worldwide."
"' ;

#delimit cr

di `"`change'"'

* make example text file
tempfile changetext
local OK = filewrite("`changetext'",`"`change'"',1) 
type `changetext'

* read textfile

local change2 = fileread("`changetext'")

assert `"`change'"' == `"`change2'"'

Bryony Simmons

Join Date: Jan 2018

Posts: 37
#3

11 Jul 2019, 07:39

Hi - thank you for your really useful answer. The first method, using the local containing the semi-colon, works really well.

I am having a bit of trouble understanding & implementing the second method & would like to get my head around it. Would you be able to explain any further?

Thank you again for your time,
Bryony

Bjarte Aagnes

Join Date: Apr 2014
Posts: 785

11 Jul 2019, 09:24

Hi, you could have the definitions saved in a text file (change.txt):

"Pre-registration, USA, Europe.|Pre-registration, USA and Europe."
"Marketed, UK. Phase III, USA.|Marketed, UK, Phase III, USA."
"Registered, UK. Pre-registration, Worldwide.|Registered, UK; Pre-registration, Worldwide."

Then read the definitions and parse:

Code:

********************************************************************************
* assumption: text file with definitions (change.txt) 
********************************************************************************

version 14

type "change.txt"

********************************************************************************
* split using gettoken
********************************************************************************

local change = ustrregexra( fileread("change.txt"), "\r\n", "" )

tokenize `"`change'"'
    
qui forvalues i = 1/1000 {

    if ( "``i''" == "" ) {
    
        continue, break
    }
    
    gettoken from to : `i' , parse("|")
    gettoken  sep to : to  , parse("|")
    
    noi di _n "from:  `from'" _n "  to:  `to'"
}

Results:

Code:

. ********************************************************************************
. * assumption: text file with definitions (change.txt) 
. ********************************************************************************
. 
. type "change.txt"
 "Pre-registration, USA, Europe.|Pre-registration, USA and Europe." 
 "Marketed, UK. Phase III, USA.|Marketed, UK, Phase III, USA." 
 "Registered, UK. Pre-registration, Worldwide.|Registered, UK; Pre-registration, Worldwide." 

. 
. ********************************************************************************
. * split using gettoken
. ********************************************************************************
. 
. local change = ustrregexra( fileread("change.txt"), "\r\n", "" )

. 
. tokenize `"`change'"'

.         
. qui forvalues i = 1/1000 {

from:  Pre-registration, USA, Europe.
  to:  Pre-registration, USA and Europe.

from:  Marketed, UK. Phase III, USA.
  to:  Marketed, UK, Phase III, USA.

from:  Registered, UK. Pre-registration, Worldwide.
  to:  Registered, UK; Pre-registration, Worldwide.

. 
end of do-file
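
One caveat about the line-ending pattern: "\r\n" matches Windows-style line endings only, so a change.txt saved with Unix-style endings would not be joined into one line. The sketch below is one possible variation, not part of the posted code: it uses the pattern "\r?\n" and replaces each line break with a space so that adjacent pairs stay separated. The temporary file and its two toy pairs are hypothetical.

```stata
* Sketch only: write a small definitions file with a Unix-style line break,
* then read it back and turn every line break into a space.
tempfile defs
local ok = filewrite("`defs'", `""new one|old one""' + char(10) + `""new; two|old two""', 1)

* \r?\n matches both Windows (CR LF) and Unix (LF) line endings
local change = ustrregexra(fileread("`defs'"), "\r?\n", " ")

* each quoted pair should now be one element of the list
local n : word count `change'
display `n'
```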

The "|"-separated pairs can alternatively be split using a regular expression or substr():

Code:

********************************************************************************
* split using regexm()
********************************************************************************

local change = ustrregexra( fileread("change.txt"), "\r\n", "" )

tokenize `"`change'"'
    
qui forvalues i = 1/1000 {

    if ( "``i''" == "" ) {
    
        continue, break
    }
    
    local ismatch = regexm("``i''", "^(.*)[|](.*)$" )
    local from = regexs(1) /* 1. subexpression of regexm() */ 
    local to = regexs(2)   /* 2. subexpression of regexm() */ 
    
    noi di _n "from:  `from'" _n "  to:  `to'"
}

********************************************************************************
* split using substr() 
********************************************************************************

local change = ustrregexra( fileread("change.txt"), "\r\n", "" )

tokenize `"`change'"'
    
qui forvalues i = 1/1000 {

    if ( "``i''" == "" ) { 
    
        continue, break
    }
    
    local pair "``i''"
    local sep = "|"
    
    local from = substr( "`pair'", 1 , strpos("`pair'", "`sep'" ) - 1 )
    local to   = substr( "`pair'", strpos("`pair'", "`sep'") + 1, . )

    noi di _n "from:  `from'" _n "  to:  `to'"
}

********************************************************************************

exit
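
To connect this back to the original goal (split each pair, then run replace), here is a minimal sketch of how the parsed pairs might drive the replacement. It follows the <new>|<old> convention from the first post, so the text before the bar is the replacement value. The variable name status, the toy observations, and the choice to match on the whole string are assumptions for illustration.

```stata
* Sketch only: the variable name status and the three toy values are
* hypothetical stand-ins for the real data.
clear
input str60 status
"Registered, UK; Pre-registration, Worldwide."
"Marketed, UK, Phase III, USA."
"Phase II, Japan."
end

* list in the new|old convention, written under the default delimiter
local change `" "Marketed, UK. Phase III, USA.|Marketed, UK, Phase III, USA." "Registered, UK. Pre-registration, Worldwide.|Registered, UK; Pre-registration, Worldwide." "'

foreach pair of local change {
    gettoken new old : pair, parse("|")   // text before the separator
    gettoken bar old : old,  parse("|")   // discard the separator itself
    replace status = `"`new'"' if status == `"`old'"'
}

list, clean
```

Looping with foreach over the local avoids the fixed 1/1000 upper bound; observations that match no <old> string are left untouched.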

Bryony Simmons

Join Date: Jan 2018

Posts: 37
#5

16 Jul 2019, 03:57

Thank you so much for such a comprehensive reply - this is really useful!