
  • How to select a specific part in a string variable?

    Greetings,
    I am cleaning a very large dataset and want to drop all records in which a specific variable (dosage_text) contains the word "prevention" anywhere in the text, regardless of capitalization. In the example below, "prevention" is the last word, but in other records it appears at the start or in the middle. I just want to drop every record that mentions prevention.

    This is for illustration:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str71 dosage_text
    "ONE DAILY -HEART DISEASE/STROKE PREVENTION"                  
    "ONE OR TWO UP TO FOUR TIMES A DAY FOR SEVERE PAIN"           
    "ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
    "ONE DAILY -HEART DISEASE/STROKE PREVENTION"                  
    "TWO TO BE TAKEN FOUR TIMES DAILY FOR PAIN"                   
    ""                                                            
    "ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
    end

    I would very much appreciate it if you could point me to the commands for this, as I couldn't figure it out.


  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str71 dosage_text
    "ONE DAILY -HEART DISEASE/STROKE PREVENTION"                  
    "ONE OR TWO UP TO FOUR TIMES A DAY FOR SEVERE PAIN"           
    "ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
    "ONE DAILY -HEART DISEASE/STROKE PREVENTION"                  
    "TWO TO BE TAKEN FOUR TIMES DAILY FOR PAIN"                   
    ""                                                            
    "ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
    end
    
    list if strpos(lower(dosage_text), "prevention")
    
         +--------------------------------------------------------------+
         |                                                  dosage_text |
         |--------------------------------------------------------------|
      1. |                   ONE DAILY -HEART DISEASE/STROKE PREVENTION |
      3. | ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION |
      4. |                   ONE DAILY -HEART DISEASE/STROKE PREVENTION |
      7. | ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION |
         +--------------------------------------------------------------+
    Hence use drop rather than list to remove those observations.
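A minimal sketch of that drop step, run on the same example data. It relies on `strpos()` returning the position of the first match and 0 when there is none, so the condition is true for exactly the observations listed above. The final `list` is only there to inspect what remains.

```stata
* Example data from the original post
clear
input str71 dosage_text
"ONE DAILY -HEART DISEASE/STROKE PREVENTION"
"ONE OR TWO UP TO FOUR TIMES A DAY FOR SEVERE PAIN"
"ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
"ONE DAILY -HEART DISEASE/STROKE PREVENTION"
"TWO TO BE TAKEN FOUR TIMES DAILY FOR PAIN"
""
"ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
end

* Lowercase the text first so the search ignores capitalization,
* then drop every observation in which "prevention" is found
drop if strpos(lower(dosage_text), "prevention")

* Inspect the remaining observations (the two pain records and the empty one)
list
```

On a real dataset it may be worth running `count if strpos(lower(dosage_text), "prevention")` first, to see how many observations would be removed before dropping them.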



    • #3
      Dear Nick Cox
      That's amazing! Many thanks and much appreciation!
      Is there a way for Stata to list/drop any record containing a word that begins with prevent* (prevention, preventing, prevented, etc.)?



      • #4
        In Stata you can go

        Code:
        list if strpos(" " + lower(dosage_text), " prevent") 
        and thereby look for words starting with "prevent".
        Last edited by Nick Cox; 04 Dec 2021, 10:03.
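The space prepended to the text in #4 is what lets a match at the very start of the string count as a word start. One limitation is that a word preceded by something other than a space (for example "/" or "-") is not caught. The sketch below compares the approach from #4 with a possible alternative that is not from this thread: `ustrregexm()` (Stata 14 and later) with a `\b` word boundary, where a third argument of 1 requests case-insensitive matching. The five records and the variable names `hit_space` and `hit_regex` are made up for illustration.

```stata
* Hypothetical records chosen to exercise the edge cases
clear
input str40 dosage_text
"ONE DAILY FOR STROKE PREVENTION"
"Preventing migraine, one at night"
"ONE DAILY -STROKE/PREVENTION"
"NOT UNPREVENTABLE"
"TWO FOUR TIMES DAILY FOR PAIN"
end

* Approach from #4: a word starting with "prevent" must follow a space
gen byte hit_space = strpos(" " + lower(dosage_text), " prevent") > 0

* Assumed alternative: regex word boundary, case-insensitive (third argument 1)
gen byte hit_regex = ustrregexm(dosage_text, "\bprevent", 1)

list
```

Both indicators flag the first two records and skip "UNPREVENTABLE"; only the regex version flags the third record, where "PREVENTION" follows a slash. Replacing `list` with `drop if hit_regex` (or `drop if hit_space`) would remove the flagged observations.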



        • #5
          Nick Cox
          It worked perfectly well! Thank you very much.

