
  • How to extract a sub string

    Hi all,

    I want to extract a substring from a string variable.
    The strings in my data set look like this:
    222222BTTTTT000000000TTTTTT333333
    TTTT0000000000TTTTT22222222222222
    00000TTTTTT00000000000000000000000

    I have almost 50,000 observations like these in one variable. I want to extract the substring between the two runs of "T" in each string or, where there is only one run, the substring after it. Please help me with this.

    Thanks in advance.

  • #2
    If I understand you correctly, you want something like this.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str34 var
    "222222BTTTTT000000000TTTTTT333333"
    "TTTT0000000000TTTTT22222222222222"
    "00000TTTTTT00000000000000000000000"
    end
    
    replace var = subinstr(var, "TTTTTT", "TTTTT", .) // start from the longest series of T's
    replace var = subinstr(var, "TTTTT", "TTTT", .) // repeat until all series have the same length
    gen wanted = substr(var, strpos(var, "TTTT")+4, .)
    replace wanted = substr(wanted, 1, strpos(wanted, "TTTT")-1) if strpos(wanted, "TTTT") > 0
    list, noobs
    Code:
    . list, noobs
    
      +------------------------------------------------------------+
      |                              var                    wanted |
      |------------------------------------------------------------|
      |   222222BTTTT000000000TTTT333333                 000000000 |
      | TTTT0000000000TTTT22222222222222                0000000000 |
      | 00000TTTT00000000000000000000000   00000000000000000000000 |
      +------------------------------------------------------------+
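
    The two -subinstr()- lines above are tied to series of at most six T's. A minimal sketch of a variant that does not depend on the series length, assuming the separator is always the letter T: collapse every series to a single T, then cut at the first and second T. The names work, after and wanted2 are illustrative and not from the thread.

    ```stata
    * Sketch only: work, after and wanted2 are illustrative names
    clonevar work = var

    * Collapse series of T's: keep replacing "TT" with "T" until no "TT" remains
    quietly count if strpos(work, "TT") > 0
    while r(N) > 0 {
        quietly replace work = subinstr(work, "TT", "T", .)
        quietly count if strpos(work, "TT") > 0
    }

    * Everything after the first T (left empty if there is no T at all)
    gen after = substr(work, strpos(work, "T") + 1, .) if strpos(work, "T") > 0

    * Cut at the next T when there is one; otherwise keep the rest of the string
    gen wanted2 = after
    replace wanted2 = substr(after, 1, strpos(after, "T") - 1) if strpos(after, "T") > 0
    ```

    Working on a copy leaves var untouched, which may be preferable when the original strings are needed later.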



    • #3
      Code:
      gen tokeep = regexs(1) if regexm(s,"T([0-9]+)[^T]")
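
      A note for comparison with #2: in the pattern above the trailing [^T] has to match one character, so when the digits run up to the next T or to the end of the string, the captured part should come out one digit shorter than in #2. A hedged sketch without the trailing class, assuming the variable is named var as in #2 (the post above uses s) and that the wanted part consists of digits only; tokeep2 is an illustrative name.

      ```stata
      * Sketch only: capture the first run of digits that directly follows a T
      gen tokeep2 = regexs(1) if regexm(var, "T([0-9]+)")
      list var tokeep2, noobs
      ```

      Observations with no digit directly after a T get an empty tokeep2.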



      • #4
        Thanks, Bjarte Aagnes, it worked. Thanks a lot.
