
  • Split variable and fill in numeric terms in the string

    Dear Statalist,

    I have a variable called Name, which looks like the example below. I would like to split Name into Name1 and Name2 at '/' or '-', and then have both of them end with the numeric term.
    Name                      Name1              Name2
    Passyunk230kV
    NorthBangor34.5kV
    Peckville/Varden34.5kV    Peckville34.5kV    Varden34.5kV
    ErieSouth-Warren230kV     ErieSouth230kV     Warren230kV
    Frackville-Hauto#369kV    Frackville#369kV   Hauto#369kV
    I thought about using split to separate the Name variable, but I do not know how to make both parts end with the numeric term.

    Thanks!

  • #2
    Here's one way to do this, which presumes that all of your numeric suffixes start with a "#" or a digit, as in the example you show. The strategy is "divide and conquer," that is, separate out the first part and the numeric part of Name.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str22 Name
    "Passyunk230kV"         
    "NorthBangor34.5kV"     
    "Peckville/Varden34.5kV"
    "ErieSouth-Warren230kV"
    "Frackville-Hauto#369kV"
    end
    //
    // Find position of first digit or "#" in Name
    qui desc Name
    qui local maxlen = r(width)
    gen int pos = 0
    local numstart "#0123456789"
    forval i = 1/`maxlen' {
       quiet replace pos = `i' if (pos == 0) & ///
          (strpos("`numstart'", substr(Name, `i',1)) > 0)
    }
    //
    gen str suffix = substr(Name, pos, .)
    gen str first = substr(Name, 1, pos-1)
    split first, gen(Name) parse(/ -)
    //
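    replace Name1 = Name1 + suffix
    // append the suffix to Name2 only where a second part exists,
    // so Names without - or / keep an empty Name2
    replace Name2 = Name2 + suffix if Name2 != ""
    // handle Names without - or / per your example
    replace Name1 = "" if Name1 == Name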
    drop pos first suffix
    list




    • #3
      Here is another solution using regular expressions, which assumes (i) there is only a single place the string splits (with either "/" or "-"), and (ii) the "numeric part" as you call it could actually have a "#" in it, and also always ends with "kV", which are both true in your example.

      Code:
      gen suffix = ustrregexs(0) if ustrregexm(Name,"[0-9#\.]+kV$")
      split Name, parse(/ -)
      replace Name1 = Name1 + suffix if Name1 != Name
      replace Name1 = "" if Name1 == Name
      drop suffix
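
      As a quick check, here is a minimal sketch that runs the same steps on the five example names from the original post and confirms the result with assert. The input block is copied from #2; the assert lines are only spot checks added for illustration, and they assume the data are in the order shown.

```stata
clear
input str22 Name
"Passyunk230kV"
"NorthBangor34.5kV"
"Peckville/Varden34.5kV"
"ErieSouth-Warren230kV"
"Frackville-Hauto#369kV"
end
// capture the trailing run of digits, "#" or "." that ends in "kV"
gen suffix = ustrregexs(0) if ustrregexm(Name, "[0-9#\.]+kV$")
// split at "/" or "-"; the second part already carries the suffix
split Name, parse(/ -)
replace Name1 = Name1 + suffix if Name1 != Name
replace Name1 = "" if Name1 == Name
drop suffix
// spot checks (illustrative): no separator means both new variables are empty
assert Name1 == "" & Name2 == "" in 1/2
assert Name1 == "Peckville34.5kV" & Name2 == "Varden34.5kV" in 3
assert Name1 == "ErieSouth230kV" & Name2 == "Warren230kV" in 4
assert Name1 == "Frackville#369kV" & Name2 == "Hauto#369kV" in 5
list
```

      If any assert fails on your real data, that row probably breaks one of the two assumptions above (more than one separator, or a suffix that does not end in "kV").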
      Last edited by Hemanshu Kumar; 28 Aug 2022, 21:14.
