  • Extract only first digits of a string (using regex)?

    Hello,
    I would like to search strings for the digits "5438" and set the variable "find" to 1 only if the first 4 characters of one of the ops_ko* variables are "5438".
    Here is an example of my code:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6(ops_ko1 ops_ko2 ops_ko3 ops_ko4)
    "543871" ""       "89310"  ""
    "5438a"  ""       ""       ""
    "14447"  "575438" "548435" ""
    "3206"   "3203"   "545438" ""
    "545541" ""       ""       ""
    end
    My code is obviously wrong, because it flags every occurrence of 5438, regardless of its position in the string:

    Code:
    gen find=0
    foreach var of varlist ops_ko* {
      recode find 0=1 if regexm(`var', "5438")
     }
    Do you have any advice for me?
    Thank you very much in advance!
    Philip

  • #2
    Code:
    gen find = 0
    foreach var of varlist ops_ko* {
        replace find = 1 if strpos(`var', "5438") == 1
    }
    This will mark find = 1 only when 5438 are the initial 4 characters of one of the ops_ko* variables. Warning: make sure the ops_ko* variables are not padded with leading blanks: this code will fail if they are. See -help trim()- if there is a padding problem.

    One more thing: -recode find 0 = 1- is not legal Stata syntax. And the -replace- command is more appropriate here in any case.
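
    The padding caveat above can be handled inline by trimming inside the test. The following is a minimal sketch on made-up toy data (the padded first value is hypothetical, not from the original example); it uses trim(), which newer Stata versions document as strtrim().

    ```stata
    * Toy data: the first value is deliberately padded with two leading blanks
    clear
    input str8 ops_ko1
    "  543871"
    "5438a"
    "575438"
    end

    gen find = 0
    foreach var of varlist ops_ko* {
        * trim() strips leading and trailing blanks before the position test
        replace find = 1 if strpos(trim(`var'), "5438") == 1
    }
    list
    ```

    Without trim(), the padded first observation would be missed, because 5438 would start at position 3 rather than position 1.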


    • #3
      Thank you very much for your tip, it worked very well!!
      Kindly


      • #4
        In #1, you just missed out using ^ to denote the start of the string. That, and the correction from recode to replace noted in #2, gives you the following code:

        Code:
        gen find=0
        foreach var of varlist ops_ko* {
          replace find = 1 if regexm(`var', "^5438")
         }
        The result:

        Code:
        . li , noobs sep(0)
        
          +----------------------------------------------+
          | ops_ko1   ops_ko2   ops_ko3   ops_ko4   find |
          |----------------------------------------------|
          |  543871               89310                1 |
          |   5438a                                    1 |
          |   14447    575438    548435                0 |
          |    3206      3203    545438                0 |
          |  545541                                    0 |
          +----------------------------------------------+
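
        To see what the ^ anchor changes, it may help to evaluate regexm() directly on single values taken from the example data. A small sketch:

        ```stata
        * "575438" contains 5438, but not at the start
        display regexm("575438", "5438")     // 1: matches anywhere in the string
        display regexm("575438", "^5438")    // 0: ^ requires the match to begin at the start
        display regexm("543871", "^5438")    // 1: the string starts with 5438
        ```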


        • #5
          Another way here is to test whether

          Code:
           
           substr(`var', 1, 4) == "5438"
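
          Placed in the same loop as in the earlier posts, the substr() condition could look like the sketch below. It reuses the example data from #1 and assumes, as there, that the values carry no leading blanks.

          ```stata
          clear
          input str6(ops_ko1 ops_ko2 ops_ko3 ops_ko4)
          "543871" ""       "89310"  ""
          "5438a"  ""       ""       ""
          "14447"  "575438" "548435" ""
          "3206"   "3203"   "545438" ""
          "545541" ""       ""       ""
          end

          gen find = 0
          foreach var of varlist ops_ko* {
              * compare the first 4 characters of each variable with "5438"
              replace find = 1 if substr(`var', 1, 4) == "5438"
          }
          list, noobs sep(0)
          ```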


          • #6
            Also:
            Code:
            recast str4 ops_ko? , force // NB: force truncates longer strings (str5 ...) to 4 characters
            gen byte is5438 = inlist("5438", ops_ko1, ops_ko2, ops_ko3, ops_ko4)
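
            Because -recast ..., force- permanently truncates the stored values, a variant that leaves the data untouched may be preferable when the full codes are still needed. A sketch, assuming the four variables from #1 and combining inlist() with the substr() idea from #5 (run from a do-file, since it uses /// line continuation):

            ```stata
            clear
            input str6(ops_ko1 ops_ko2 ops_ko3 ops_ko4)
            "543871" ""       "89310"  ""
            "5438a"  ""       ""       ""
            "14447"  "575438" "548435" ""
            "3206"   "3203"   "545438" ""
            "545541" ""       ""       ""
            end

            * compare "5438" with the first 4 characters of each variable;
            * the original strings are not modified
            gen byte is5438 = inlist("5438", substr(ops_ko1, 1, 4), ///
                substr(ops_ko2, 1, 4), substr(ops_ko3, 1, 4), substr(ops_ko4, 1, 4))
            list, noobs sep(0)
            ```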
