
  • Difference: regexm() vs strpos()?

    I am trying to search a national mortality database for deaths from a specific disease (X). What's the difference between using regexm() and strpos()?

    gen X = .

    replace X = 1 if regexm(cause, "X") == 1

    vs

    replace X = 1 if strpos(cause, "X") > 0



    Thanks in advance for your response
    Last edited by Alessandro Villegas; 11 Apr 2021, 10:40.

  • #2
    In this example, there is no difference. But regexm() can search for more complicated patterns than a fixed substring.
    Code:
    . * Example generated by -dataex-. For more info, type help dataex
    . clear
    
    . input str14 cause
    
                  cause
      1. "he had X"      
      2. "died of Y"     
      3. "EXPIRED FROM Z"
      4. end
    
    . generate X1 = strpos(cause,"X")>0
    
    . generate X2 = regexm(cause,"X")
    
    . generate X3 = ustrregexm(cause,"\bX\b")
    
    . list, clean
    
                    cause   X1   X2   X3  
      1.         he had X    1    1    1  
      2.        died of Y    0    0    0  
      3.   EXPIRED FROM Z    1    1    0  
    
    .
    Note that you should probably create a 0/1 indicator variable, not missing/1.
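    A minimal sketch of the one-step 0/1 version, assuming the cause-of-death string variable is called `cause` as in the example data above. A comparison in Stata evaluates to 1 (true) or 0 (false), so no separate `replace` step is needed, and deaths that do not match are coded 0 rather than missing, which keeps them in tabulations and makes the mean of the indicator a proportion.

```stata
* Recreate the example data from the reply above
clear
input str14 cause
"he had X"
"died of Y"
"EXPIRED FROM Z"
end

* strpos() returns the position of the first occurrence, or 0 if there is none,
* so the comparison yields a 0/1 value directly (never missing)
generate byte X = strpos(cause, "X") > 0

* With a 0/1 indicator, non-matches are counted and the mean is a proportion
tabulate X
summarize X
```

    Note that the third observation is still flagged, because "EXPIRED" contains the letter X; a plain substring search cannot tell a word from part of a word.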

    Note also that in my third example I used the newer regular expression function ustrregexm(). The same results would have been obtained with the older regexm(), but I prefer the newer functions.

    The Unicode regular expression functions introduced in Stata 14 have a much more powerful definition of regular expressions than the non-Unicode functions. To the best of my knowledge, only in the Statalist post linked here is it documented that Stata's Unicode regular expression parser is the ICU regular expression engine documented at http://userguide.icu-project.org/strings/regexp. A comprehensive discussion of regular expressions can be found at https://www.regular-expressions.info/unicode.html.
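    As a rough illustration of the "more complicated patterns" a regular expression allows, here is a sketch on the same example data. It assumes Stata 14 or later (for ustrregexm()); the variable names XY and Z are made up for the illustration. It combines word boundaries with alternation (X or Y as whole words) and uses the optional third argument of ustrregexm(), which requests a case-insensitive match when nonzero.

```stata
clear
input str14 cause
"he had X"
"died of Y"
"EXPIRED FROM Z"
end

* \b marks a word boundary and | separates alternatives:
* flag deaths whose cause contains X or Y as a whole word
generate byte XY = ustrregexm(cause, "\b(X|Y)\b")

* A nonzero third argument makes the match case-insensitive,
* so the lowercase pattern "z" still matches the word "Z"
generate byte Z = ustrregexm(cause, "\bz\b", 1)

list, clean
```

    Here XY is 1 for the first two observations and 0 for the third (the X inside "EXPIRED" is not a separate word), while Z is 1 only for the third. Doing the same with strpos() would take several calls plus extra checks for surrounding characters.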



    • #3
      I'm really thankful. Your answer really helped me understand. Have a nice day.
