Splitting the variable into two parts

Khati Zolfaghari

Join Date: Mar 2021
Posts: 219


27 Jan 2023, 01:25

Hello all,

In my dataset, I have a string variable holding regional ISO codes, which combine letters and numbers, e.g. AT11, AT12, AU1, ES62, FI19, ... I tried to separate the numeric part from the text part.

I tried several approaches, but none of them separated the two parts.

Code:

territorylevelandtypology    reg_id    region    year    emp_new    NUTS2_id
Large regions (TL2)    AT11    Burgenland    1999    123600    AT11
Large regions (TL2)    AT11    Burgenland    2000    124900    AT11
Large regions (TL2)    AT11    Burgenland    2001    120400    AT11
Large regions (TL2)    AT11    Burgenland    2002    122000    AT11
Large regions (TL2)    AT11    Burgenland    2003    124700    AT11
Large regions (TL2)    AT11    Burgenland    2004    119200    AT11
Large regions (TL2)    AT11    Burgenland    2005    126400    AT11
Large regions (TL2)    AT11    Burgenland    2006    128000    AT11
Large regions (TL2)    AT11    Burgenland    2007    132400    AT11
Large regions (TL2)    AT11    Burgenland    2008    133700    AT11
Large regions (TL2)    AT11    Burgenland    2009    133800    AT11
Large regions (TL2)    AT11    Burgenland    2010    135100    AT11

Following the help for split in STATA, I tried:

Code:

split reg_id, generate(ISO2)

I also tried:

Code:

split reg_id, p(")")
foreach v in `r(varlist)' {
replace `v' = `v' + ")"
}

I appreciate any help and guidance you can provide.

Many thanks in advance for your valuable time.

Best,
Khati

Last edited by Khati Zolfaghari; 27 Jan 2023, 01:27.


Nick Cox

Join Date: Mar 2014

Posts: 35720
#2

27 Jan 2023, 03:32

split isn't intended for run-together strings that have no parsing characters separating the substrings. I can say that insofar as I was the original author. The rationale is that there are (usually) quite different solutions, and putting the two kinds of problem together would just complicate the command. At least, that was my line, and StataCorp have not seen fit to change it since folding the code into official Stata in Stata 8.

In any case, your variable, with values like AT11, does not include any instances of ")", so it is hard to see how parsing on ")" could help.
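To illustrate the distinction, here is a small sketch (the hyphenated values and the variable names code_dash, part and piece are made up for illustration). split separates substrings when a parse character is present; with a run-together value like AT11 and a parse character that never occurs, it simply copies the whole string into a single new variable.

```stata
* Hypothetical data: codes with "-" as an explicit separator
clear
set obs 3
gen code_dash = word("AT-11 ES-62 FI-19", _n)

* split parses on "-": part1 holds the letters, part2 the digits (both strings)
split code_dash, parse("-") generate(part)

* Run-together codes: ")" never occurs, so split just copies reg_id into piece1
gen reg_id = word("AT11 ES62 FI19", _n)
split reg_id, parse(")") generate(piece)

list
```

So split is the right tool only when something in the string marks where one part ends and the next begins.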

This works for what you give by way of example.

Code:

. clear

. set obs 5
Number of observations (_N) was 0, now 5.

. gen reg_id = word("AT11 AT12 AU1 ES62 FI19", _n)

. l

     +--------+
     | reg_id |
     |--------|
  1. |   AT11 |
  2. |   AT12 |
  3. |    AU1 |
  4. |   ES62 |
  5. |   FI19 |
     +--------+

. gen s1 = substr(reg_id, 1, 2)

. gen n1 = substr(reg_id, 3, .)

. l

     +------------------+
     | reg_id   s1   n1 |
     |------------------|
  1. |   AT11   AT   11 |
  2. |   AT12   AT   12 |
  3. |    AU1   AU    1 |
  4. |   ES62   ES   62 |
  5. |   FI19   FI   19 |
     +------------------+

If the full rules are more complicated, a solution may require regular expressions.
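A minimal sketch of the regular-expression route, assuming every code is a run of letters followed by a run of digits, as in the examples given. regexm() tests the match and regexs(k) returns the k-th captured subexpression; Stata 14 and later also provide the Unicode-aware ustrregexm() and ustrregexs(). The names s2, n2 and num2 are hypothetical.

```stata
* Assumes codes are letters followed by digits (e.g. AT11, AU1);
* codes that do not fit the pattern are left as empty strings in s2 and n2
clear
set obs 5
gen reg_id = word("AT11 AT12 AU1 ES62 FI19", _n)

* Capture group 1 = leading letters, group 2 = trailing digits
gen s2 = regexs(1) if regexm(reg_id, "^([A-Za-z]+)([0-9]+)$")
gen n2 = regexs(2) if regexm(reg_id, "^([A-Za-z]+)([0-9]+)$")

* Optional: a numeric copy of the digit part
destring n2, generate(num2)

list
```

Unlike the substr() approach, this does not assume the letter prefix is exactly two characters long. Codes that do not match the pattern are easy to find afterwards with list if missing(s2).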

Time for you to read https://www.statalist.org/forums/help#spelling after 175 posts here, please!
Khati Zolfaghari

Join Date: Mar 2021

Posts: 219
#3

27 Jan 2023, 03:35

@Nick Cox Many thanks for your reply and your explanation. It works.

Thanks a lot.

Regards,
Khati