  • Loop regexm to search for multiple strings stored as observations

    Dear Statalists,

    I have successfully used regexm to search for one string (a diagnosis code, for example "I33.0") among many observations in a variable (called "diagnosis").
    I created a new variable (endocarditis) that is set to 1 if there is a match.

    Code:
    replace endocarditis = regexm(diagnosis, "I33.0")

    Now I want to do exactly the same thing, but instead of searching for one string, I want to search for several strings ("I33.0", "I33.9", "B37.6").
    If there is a match on any of these diagnosis codes, I want the new variable "endocarditis" to be set to 1.

    The strings I want to search for are stored in a variable ("X") in the same dataset.

    I would very much appreciate help with looping regexm so that it searches the variable "diagnosis" for the value of "X" in each observation.

    I want the data to look as follows:
    X       diagnosis           endocarditis
    I33.0   A367 B12.3 B37.6    1
    B37.6   C45.6               0
    I33.9   S536 F6349          0

    I would be thankful for any help on this matter.
    Niko Vähäsarja
    Karolinska Institutet
    Last edited by Niko Vahasarja; 13 Aug 2019, 05:09.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 x str16 diagnosis byte endocarditis
    "I33.0" "A367 B12.3 B37.6" 1
    "B37.6" "C45.6"            0
    "I33.9" "S536 F6349"       0
    end
    gen wanted= regexm(diagnosis, "(I33.0|I33.9|B37.6)")
    Result:

    Code:
     list
    
         +----------------------------------------------+
         |     x          diagnosis   endoca~s   wanted |
         |----------------------------------------------|
      1. | I33.0   A367 B12.3 B37.6          1        1 |
      2. | B37.6              C45.6          0        0 |
      3. | I33.9         S536 F6349          0        0 |
         +----------------------------------------------+


    • #3
      Or, to look for all values in x if there are more than in your example:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str5 x str16 diagnosis byte endocarditis
      "I33.0" "A367 B12.3 B37.6" 1
      "B37.6" "C45.6"            0
      "I33.9" "S536 F6349"       0
      end
      
      * Code
      levelsof x, local(items)
      gen wanted = 0
      foreach item of local items {
          replace wanted = 1 if regexm(diagnosis, "`item'")
      }

      Or you can also use strpos instead, as you are not really using regex.
      Code:
      levelsof x, local(items)
      gen wanted2 = 0
      foreach item of local items {
          replace wanted2 = 1 if strpos(diagnosis, "`item'")
      }


      • #4
        I missed this part of the question:

        "The strings I want to search for are stored in a variable ("X") in the same dataset."

        Note that no loop is needed here.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str5 x str16 diagnosis byte endocarditis
        "I33.0" "A367 B12.3 B37.6" 1
        "B37.6" "C45.6"            0
        "I33.9" "S536 F6349"       0
        end
        levelsof x, local(X) separate(|) clean
        gen wanted= regexm(diagnosis, "(`X')")
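
        One caveat, offered as a sketch rather than a correction: in a regular expression an unescaped "." matches any single character, so the pattern "I33.0" would also match a hypothetical code such as "I3350". If that matters for the data at hand, the periods can be escaped before the pattern is built. The names X_esc and wanted_strict below are illustrative only and do not appear in the original posts.

        Code:
        levelsof x, local(X) separate(|) clean
        * Escape every period so that it matches only a literal "."
        * (X_esc is a hypothetical macro name used for illustration)
        local X_esc : subinstr local X "." "\.", all
        gen wanted_strict = regexm(diagnosis, "(`X_esc')")

        With the three example observations this should give the same result as the unescaped version, since none of the diagnosis strings contains a near-miss code; the difference only shows up when such codes are present.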


        • #5
          That worked! Thank you very much, both of you.

          Best wishes
          Niko
