  • Splitting the variable into two parts

    Hello all,

    In my dataset, I have a string variable holding the ISO code of each region. The codes combine letters and digits: AT11, AT12, AU1, ES62, FI19, and so on. I tried to separate the numeric part from the text part.

    I tried several approaches, but none of them separated the two parts.



    Code:
    territorylevelandtypology    reg_id    region    year    emp_new    NUTS2_id
    Large regions (TL2)    AT11    Burgenland    1999    123600    AT11
    Large regions (TL2)    AT11    Burgenland    2000    124900    AT11
    Large regions (TL2)    AT11    Burgenland    2001    120400    AT11
    Large regions (TL2)    AT11    Burgenland    2002    122000    AT11
    Large regions (TL2)    AT11    Burgenland    2003    124700    AT11
    Large regions (TL2)    AT11    Burgenland    2004    119200    AT11
    Large regions (TL2)    AT11    Burgenland    2005    126400    AT11
    Large regions (TL2)    AT11    Burgenland    2006    128000    AT11
    Large regions (TL2)    AT11    Burgenland    2007    132400    AT11
    Large regions (TL2)    AT11    Burgenland    2008    133700    AT11
    Large regions (TL2)    AT11    Burgenland    2009    133800    AT11
    Large regions (TL2)    AT11    Burgenland    2010    135100    AT11
    Following the help file for split in Stata, I tried:

    Code:
    split reg_id, generate(ISO2)
    I also tried:


    Code:
    split reg_id, p(")")
    foreach v in `r(varlist)' {
    replace `v' = `v' + ")"
    }

    I appreciate any help and guidance you can provide.


    Many thanks in advance for your valuable time.



    Best,
    Khati
    Last edited by Khati Zolfaghari; 27 Jan 2023, 01:27.

  • #2
    split isn't intended for run-together strings that have no parsing characters separating the substrings. I can say that insofar as I was the original author. The rationale is that there are (usually) quite different solutions for the two kinds of problem, and putting them together would just complicate the command. At least that was my line, and StataCorp have not seen fit to change that since folding the code into official Stata in Stata 8.

    In any case, your variable, with values like AT11, does not include any instances of ), so it is hard to see how parsing on ) could help.

    This works for what you give by way of example.


    Code:
    . clear
    
    . set obs 5
    Number of observations (_N) was 0, now 5.
    
    . gen reg_id = word("AT11 AT12 AU1 ES62 FI19", _n)
    
    . l
    
         +--------+
         | reg_id |
         |--------|
      1. |   AT11 |
      2. |   AT12 |
      3. |    AU1 |
      4. |   ES62 |
      5. |   FI19 |
         +--------+
    
    . gen s1 = substr(reg_id, 1, 2)
    
    . gen n1 = substr(reg_id, 3, .)
    
    . l
    
         +------------------+
         | reg_id   s1   n1 |
         |------------------|
      1. |   AT11   AT   11 |
      2. |   AT12   AT   12 |
      3. |    AU1   AU    1 |
      4. |   ES62   ES   62 |
      5. |   FI19   FI   19 |
         +------------------+
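    Note that n1 is still a string variable. If a numeric version is wanted, one option is real(), which returns missing for anything it cannot read as a number (destring would be an alternative). This is a minimal sketch that rebuilds the example data and assumes n1 holds only digits; the name n1_num is hypothetical.

```stata
* Rebuild the example data and split it as above
clear
set obs 5
gen reg_id = word("AT11 AT12 AU1 ES62 FI19", _n)
gen s1 = substr(reg_id, 1, 2)
gen n1 = substr(reg_id, 3, .)

* Hypothetical numeric copy of the digit part;
* real() yields missing (.) if n1 is not a valid number
gen n1_num = real(n1)
list reg_id s1 n1 n1_num
```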
    If the full rules are more complicated, a solution may require regular expressions.
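    For example, if the letter prefix were not always two characters long, a regular expression could capture the letters and the digits as separate groups. This is a minimal sketch, assuming Stata 14 or later (for ustrregexm() and ustrregexs()) and assuming every code is a run of letters followed by a run of digits; the names s2 and n2 are hypothetical. Codes that do not fit the pattern are left as empty strings, so they are easy to find and inspect.

```stata
clear
set obs 5
gen reg_id = word("AT11 AT12 AU1 ES62 FI19", _n)

* ustrregexm() tests the pattern; ustrregexs(#) returns the #th captured group
* Pattern (assumed): letters at the start, digits to the end, nothing else
gen s2 = ustrregexs(1) if ustrregexm(reg_id, "^([A-Za-z]+)([0-9]+)$")
gen n2 = ustrregexs(2) if ustrregexm(reg_id, "^([A-Za-z]+)([0-9]+)$")

* Codes not matching the assumed pattern end up with empty s2 and n2
count if missing(s2)
list reg_id s2 n2
```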

    Time for you to read https://www.statalist.org/forums/help#spelling after 175 posts here, please!



    • #3
      @Nick Cox Many thanks for your reply and your explanation. It works.


      Thanks a lot.

      Regards,
      Khati
