  • Keeping certain letters within a string variable

    Hello Statalist

    This is my first post, so please excuse any misconception from my side.

    I currently have a string variable that explains conditions while also stating each condition in medical terms. I would like to remove all the text and keep only the stated condition in medical terms for later analysis. The observations could look something like this:
    LO121 - text
    LO121 - text
    text (LO131-LO135) - text

    My goal is to only keep the LO123 part of the observations.

    Any ideas would be appreciated

    Best regards
    - Mads

  • #2
    Mads:
    welcome to this forum.
    Do you mean -LO121-?
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Assuming all terms have the prefix "LO"

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str25 text
      "LO121 - text"            
      "LO121 - text"            
      "text (LO131-LO135) - text"
      end
      
      gen wanted = ustrregexra(ustrregexra(text, "[^[L][O][0-9]{3}]", ""), "([0-9])L", "$1 L")
      Res.:

      Code:
      . l
      
           +-----------------------------------------+
           |                      text        wanted |
           |-----------------------------------------|
        1. |              LO121 - text         LO121 |
        2. |              LO121 - text         LO121 |
        3. | text (LO131-LO135) - text   LO131 LO135 |
           +-----------------------------------------+

      Otherwise provide a more representative example.



      • #4
        Mads:
        Starting from the same -dataex- step taken by Andrew, my take is terrible when compared to his:
        Code:
        . split var1 in 1/2, p(-)
        variables created as string:
        var11  var12
        
        . replace var11= var1 in 3
        variable var11 was str6 now str25
        
        
        . list
        
             +---------------------------------------------------------------+
             |                      var1                       var11   var12 |
             |---------------------------------------------------------------|
          1. |              LO121 - text                      LO121     text |
          2. |              LO121 - text                      LO121     text |
          3. | text (LO131-LO135) - text   text (LO131-LO135) - text         |
             +---------------------------------------------------------------+
        
        . replace var11="LO131-LO135" in 3
        
        
        . list
        
             +-------------------------------------------------+
             |                      var1         var11   var12 |
             |-------------------------------------------------|
          1. |              LO121 - text        LO121     text |
          2. |              LO121 - text        LO121     text |
          3. | text (LO131-LO135) - text   LO131-LO135         |
             +-------------------------------------------------+
        
        . drop var12
        
        . list
        
             +-----------------------------------------+
             |                      var1         var11 |
             |-----------------------------------------|
          1. |              LO121 - text        LO121  |
          2. |              LO121 - text        LO121  |
          3. | text (LO131-LO135) - text   LO131-LO135 |
             +-----------------------------------------+
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          -moss- from ssc could be used:
          Code:
          moss text , match("LO(\d{3}-*)") regex unicode 
          egen  wanted = concat(_match*)
          Code:
               +----------------------------------------------------------------------------------+
               |                      text   _count   _match1   _pos1   _match2   _pos2    wanted |
               |----------------------------------------------------------------------------------|
            1. |              LO121 - text        1       121       3                 .       121 |
            2. |              LO121 - text        1       121       3                 .       121 |
            3. | text (LO131-LO135) - text        2      131-       9       135      15   131-135 |
               +----------------------------------------------------------------------------------+



          • #6
            Alternative:
            Code:
            gen wanted = ustrregexs(0) if ustrregexm(text, "LO\d{3}(?:-LO\d{3})*")
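
            A minimal sketch, not from the original posters, of how the pattern above could be checked against the example data from #3. The -assert- lines are added here purely for illustration. Note that ustrregexm() paired with ustrregexs(0) returns only the first match in each string, so an observation containing two separate codes or ranges would need a different approach.
            Code:
            clear
            input str25 text
            "LO121 - text"
            "LO121 - text"
            "text (LO131-LO135) - text"
            end

            * keep the first LO code, or a hyphenated range of LO codes
            gen wanted = ustrregexs(0) if ustrregexm(text, "LO\d{3}(?:-LO\d{3})*")

            * illustrative checks against the three example observations
            assert wanted == "LO121" in 1/2
            assert wanted == "LO131-LO135" in 3
            list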
