  • Identify if single cell contains multiple substrings

    Hi,

    I'd like to identify observations in which a string variable contains multiple substrings.

    In my dataset, I have a string variable containing crime descriptions. For that string variable, I want to identify when "REG" and "GUN" appear together in the same cell.

    The data is messy and not standardized. Here is an example of how some of the strings look:

    crime
    GUN OFFENDER REGISTRATION
    GUN OFFENDER-FAIL TO REGISTER
    GUN OFFENDER/FAIL REG OFFENDER
    GAS/AIR/PAINTBALL GUN: POSSESS
    FIRING HANDGUN IN CITY LIMITS
    FRAUDULENT POSSESSION OF VEH OWNERSHIP REG. PLATE
    KNOWINGLY HOLDING FALSIFIED VEH. REG. PLATE

    I've successfully used the strpos() function to isolate observations containing a single substring, e.g.:

    l if strpos(crime, "REG")
    l if strpos(crime, "GUN")

    And I've been able to identify observations that contain either one substring or the other, i.e.:

    l if strpos(crime, "REG" "GUN")

    But I haven't been able to figure out how to identify if a single cell contains both "REG" and "GUN".

    Any advice is appreciated.

  • #2
    Code:
    l if strpos(crime, "REG") & strpos(crime, "GUN")
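
    A minimal sketch of the same idea stored as an indicator variable, assuming the example strings from #1 are typed in by hand; the variable name gun_reg is made up for illustration. strpos() returns the position of the first match and 0 when there is no match, so each condition is true only when its substring is present.

    ```stata
    * Rebuild the example strings from #1 (entered by hand for illustration)
    clear
    input str60 crime
    "GUN OFFENDER REGISTRATION"
    "GUN OFFENDER-FAIL TO REGISTER"
    "GUN OFFENDER/FAIL REG OFFENDER"
    "GAS/AIR/PAINTBALL GUN: POSSESS"
    "FIRING HANDGUN IN CITY LIMITS"
    "FRAUDULENT POSSESSION OF VEH OWNERSHIP REG. PLATE"
    "KNOWINGLY HOLDING FALSIFIED VEH. REG. PLATE"
    end

    * gun_reg is a hypothetical name: 1 if both substrings occur, 0 otherwise
    generate byte gun_reg = strpos(crime, "REG") > 0 & strpos(crime, "GUN") > 0

    * List only the observations that contain both "REG" and "GUN"
    list crime if gun_reg
    ```

    On these seven strings only the first three should be listed. Note that "REG" also matches inside REGISTRATION and REGISTER, and "GUN" inside HANDGUN, which may or may not be what is wanted.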

    • #3
      (crossed in the ether, deleted.)

      • #4
        I am surprised to hear that

        Code:
        l if strpos(crime, "REG" "GUN")
        works, and I guess that's an illusion.

        Consider this:

        Code:
        . list
        
             +---------+
             |    var1 |
             |---------|
          1. | REG GUN |
          2. |     REG |
          3. |    frog |
          4. |    toad |
             +---------+
        
        . list if strpos(var1, "REG" "GUN")
        
             +---------+
             |    var1 |
             |---------|
          1. | REG GUN |
          2. |     REG |
             +---------+
        I think Stata is just ignoring "GUN" there.
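
        If the aim in #1 really was "either substring", a sketch that states each condition explicitly, rather than putting two string literals in one strpos() call, might look like this. It assumes a string variable named crime as in #1; the three strings are copied from that post.

        ```stata
        * Three of the example strings from #1, entered by hand
        clear
        input str60 crime
        "GUN OFFENDER/FAIL REG OFFENDER"
        "FIRING HANDGUN IN CITY LIMITS"
        "KNOWINGLY HOLDING FALSIFIED VEH. REG. PLATE"
        end

        * Either substring: join two strpos() calls with | (logical or)
        list if strpos(crime, "REG") | strpos(crime, "GUN")

        * Both substrings: join them with & (logical and), as in #2
        list if strpos(crime, "REG") & strpos(crime, "GUN")
        ```

        With these three strings the | version should list all of them, while the & version should list only the first.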

        • #5
          Thank you very much, Nick!
