  • Finding firstname.lastname.number pattern in email

    My data contains observations that have [email protected]. I have the variables firstname and lastname.

    I want to tag observations whose email has firstname followed by lastname followed by some digits before the "@" sign. How do I go about tagging them? The number can have multiple digits.

  • #2
    split email, parse("." "@")

    Edit: that won't work if there is no number.
    Last edited by George Ford; 10 Jul 2024, 11:21.


    • #3
      There is probably a cleaner expression, but I don't use ustrregexm much.

      This asks whether a digit appears immediately before the "@".

      generate byte hasnum = ustrregexm(substr(var1, strpos(var1, "@") - 1, 1), "[0-9]")
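
      A possibly simpler variant, offered only as a sketch: let the regular expression itself require a digit directly before the "@". The variable name var1 follows the line above; hasnum2 and the example values are made up for illustration.

      ```stata
      clear
      input str30 var1
      "JOHNDOE@example.com"
      "JANEDOE122@example.com"
      "NOATSIGN9"
      end
      * Approach from this post: test the single character before the first "@"
      generate byte hasnum = ustrregexm(substr(var1, strpos(var1, "@") - 1, 1), "[0-9]")
      * Variant: match a digit that is immediately followed by "@"
      generate byte hasnum2 = ustrregexm(var1, "[0-9]@")
      list
      ```

      The two should agree whenever the value contains an "@". For a value with no "@", strpos() returns 0 and substr() is then asked to start at -1, which counts from the end of the string, so the regex variant may be the safer check on messy data.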


      • #4
        Originally posted by George Ford
        There is probably a cleaner expression, but I don't use ustrregexm much.

        This asks whether a digit appears immediately before the "@".

        generate byte hasnum = ustrregexm(substr(var1, strpos(var1, "@") - 1, 1), "[0-9]")

        This helped me narrow down the observations with a number before the "@". After running split email, p("@"), the firstname.lastname.number part is split out. Now I want to tag these observations based on the firstname and lastname variables I have, something like this:

        firstname  lastname  email              emailbefore@  hasnum  tag
        JOHN       DOE       [email protected]  JOHNDOE       0       0
        JANE       DOE       [email protected]  JANEDOE122    1       1
        JEFF       DOE       [email protected]  JEFF1DOE32    1       0

        So tag = 1 when emailbefore@ == firstname + lastname + "somecombinationofnumber".

        Ideally I would strip the numbers off emailbefore@ and then check whether firstname + lastname matches the stripped emailbefore@ variable. But the problem is the observation in row 3, where a digit sits between the first and last name: it would get tagged as 1, but I want it tagged as 0.
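
        The strip-and-compare idea above can be sketched as follows, together with one possible adjustment: remove only the trailing digits, so that a digit in the middle of the name part still breaks the match. The example.com addresses and the names naive, stem and tag2 are assumptions for illustration; email1 is the variable that split creates for the part before the "@".

        ```stata
        clear
        input str10 firstname str10 lastname str30 email
        "JOHN" "DOE" "JOHNDOE@example.com"
        "JANE" "DOE" "JANEDOE122@example.com"
        "JEFF" "DOE" "JEFF1DOE32@example.com"
        end
        split email, parse("@")    // email1 = part before "@", email2 = domain
        * Naive idea: drop every digit, compare with the name, and require a digit
        generate byte naive = (ustrregexra(email1, "\d", "") == firstname + lastname) & ustrregexm(email1, "\d")
        * Adjustment: drop only the digits at the end of email1
        generate stem = ustrregexra(email1, "\d+$", "")
        * Tag when the stem equals the name and at least one digit was removed
        generate byte tag2 = (stem == firstname + lastname) & (stem != email1)
        list firstname lastname email1 naive tag2
        ```

        On these three rows, naive tags row 3 as well, which is the problem described above, while tag2 tags only row 2.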

        Thanks again George


        • #5
          I think you want
          Code:
          gen byte wanted = ustrregexm(email, "^[A-Z]+\d+@")


          • #6
            Code:
            generate byte tag = ustrregexm(email, "^"+firstname+lastname+"\d+@")
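
            For anyone comparing the two answers, here is a small sketch on made-up data. The first three rows mirror the example in #4; the fourth row and every example.com address are hypothetical, added only to show where the two patterns may differ.

            ```stata
            clear
            input str10 firstname str10 lastname str30 email
            "JOHN" "DOE" "JOHNDOE@example.com"
            "JANE" "DOE" "JANEDOE122@example.com"
            "JEFF" "DOE" "JEFF1DOE32@example.com"
            "JANE" "ROE" "JANEDOE7@example.com"
            end
            * #5: any run of capital letters, then digits, then "@"
            generate byte wanted = ustrregexm(email, "^[A-Z]+\d+@")
            * #6: this observation's firstname and lastname, then digits, then "@"
            generate byte tag = ustrregexm(email, "^" + firstname + lastname + "\d+@")
            list firstname lastname email wanted tag
            ```

            Both patterns assume upper-case names and addresses, as in the example. The pattern in #5 does not look at firstname or lastname at all, so the fourth row is flagged by wanted but not by tag. The pattern in #6 treats the names as part of the regular expression, so a name containing a character such as "." would act as a pattern rather than as literal text.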


            • #7
              #5 and #6 seem to work.
