How to select a specific part in a string variable?

Danah Abdul

Join Date: Dec 2020

Posts: 74
#1

How to select a specific part in a string variable?

04 Dec 2021, 04:58

Greetings,
I am cleaning a very large dataset and want to drop all records where a specific variable (dosage_text) contains the word "prevention" anywhere in the text, regardless of letter case. In the example below, "prevention" is the last word, but in other records it appears at the start or in the middle. I want to drop every record that mentions prevention.

This is for illustration:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str71 dosage_text
"ONE DAILY -HEART DISEASE/STROKE PREVENTION"
"ONE OR TWO UP TO FOUR TIMES A DAY FOR SEVERE PAIN"
"ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
"ONE DAILY -HEART DISEASE/STROKE PREVENTION"
"TWO TO BE TAKEN FOUR TIMES DAILY FOR PAIN"
""
"ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
end

I would very much appreciate it if you could point me to the commands for this, as I couldn't figure it out.

Nick Cox

Join Date: Mar 2014
Posts: 36059

04 Dec 2021, 05:09

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str71 dosage_text
"ONE DAILY -HEART DISEASE/STROKE PREVENTION"                  
"ONE OR TWO UP TO FOUR TIMES A DAY FOR SEVERE PAIN"           
"ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
"ONE DAILY -HEART DISEASE/STROKE PREVENTION"                  
"TWO TO BE TAKEN FOUR TIMES DAILY FOR PAIN"                   
""                                                            
"ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION"
end

list if strpos(lower(dosage_text), "prevention")

     +--------------------------------------------------------------+
     |                                                  dosage_text |
     |--------------------------------------------------------------|
  1. |                   ONE DAILY -HEART DISEASE/STROKE PREVENTION |
  3. | ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION |
  4. |                   ONE DAILY -HEART DISEASE/STROKE PREVENTION |
  7. | ONE TO BE TAKEN ONCE DAILY - HEART DISEASE/STROKE PREVENTION |
     +--------------------------------------------------------------+

Hence use drop rather than list.
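
As a minimal sketch of that last step, the same condition can be passed to drop, with a count beforehand to check how many observations will go. The third data line below is a hypothetical record, added only to show a match at the start of the text and in mixed case.

```stata
clear
input str71 dosage_text
"ONE DAILY -HEART DISEASE/STROKE PREVENTION"
"ONE OR TWO UP TO FOUR TIMES A DAY FOR SEVERE PAIN"
"Prevention of stroke: one daily"
""
end

* Check how many observations match before removing anything
count if strpos(lower(dosage_text), "prevention")

* strpos() returns the position of the match, or 0 if there is none;
* any nonzero result counts as true in the -if- qualifier
drop if strpos(lower(dosage_text), "prevention")
```

Because lower() is applied only inside the condition, the stored values of dosage_text keep their original case.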

Danah Abdul

Join Date: Dec 2020

Posts: 74
#3

04 Dec 2021, 09:13

Dear Nick Cox
That's amazing! Many thanks for your help.
Is there a way for Stata to list or drop any word matching prevent* (so prevention, preventing, prevented, etc.)?

Nick Cox

Join Date: Mar 2014

Posts: 36059
#4

04 Dec 2021, 09:17

In Stata you can use

Code:

list if strpos(" " + lower(dosage_text), " prevent")

and thereby look for words starting with "prevent".
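
Note that this test needs a space (or the start of the string) immediately before "prevent", so text such as "DISEASE/PREVENTION" would not match. One possible alternative, assuming Stata 14 or later, is the Unicode regular expression function ustrregexm() with the word boundary \b. This is a sketch with made-up example data, and the variable names by_space and by_regex are hypothetical.

```stata
clear
input str40 dosage_text
"ONE DAILY - STROKE PREVENTION"
"ONE DAILY -HEART DISEASE/PREVENTION"
"TWO DAILY FOR PAIN"
end

* Space-based test: requires a space or the string start before "prevent"
generate byte by_space = strpos(" " + lower(dosage_text), " prevent") > 0

* Regex test: \b matches at any word boundary; third argument 1 ignores case
generate byte by_regex = ustrregexm(dosage_text, "\bprevent", 1)

list dosage_text by_space by_regex
```

The second record is flagged only by the regex test, because "PREVENTION" follows a slash rather than a space.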

Last edited by Nick Cox; 04 Dec 2021, 10:03.
Danah Abdul

Join Date: Dec 2020

Posts: 74
#5

04 Dec 2021, 09:51

Nick Cox
It worked perfectly well! Thank you very much.