Text mining for word pairs

Jonathan Afilalo

28 Mar 2023, 05:30

I am working with a cardiac ultrasound dataset in which there is a long string variable with the key findings. Here is one example:

The LV size and mass are within normal limits. LVEF 60-65%. No RWMA. Dilated RV. Moderate-to-severe TR. PASP is estimated at 54 mmHg.

I would like to remove stopwords and replace synonyms with a single standard term (using a synonym list that I specify).
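To make this first step concrete, here is a minimal sketch of the intended logic. It is written in Python purely for illustration and is not Stata code. The stopword set and synonym map are hypothetical placeholders; the real lists would be user-specified.

```python
import re

# Hypothetical lists for illustration only; real lists would be user-specified.
STOPWORDS = {"the", "and", "are", "is", "at", "within", "limits", "estimated"}
SYNONYMS = {"enlarged": "dilated"}  # synonym -> standard term


def normalize(text):
    """Lowercase, split into tokens, map synonyms, then drop stopwords."""
    # A token is a run of letters/digits, optionally hyphenated (e.g. "60-65").
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]


print(normalize("The LV size and mass are within normal limits."))
```

Mapping synonyms before dropping stopwords means a synonym can never be filtered out by accident if it also happens to appear in the stopword list under its original spelling.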

Additionally, I would like to specify qualifier words such as normal, mild, moderate, severe, dilated, enlarged, and so on, and to treat any number as a qualifier as well.

Finally, rather than a simple bag-of-words representation, I would like to create new variables named after the non-qualifier words, whose values are the qualifier words that surround them (either before or after). For example:

var_lv_size  var_lv_mass  var_lvef  var_rwma  var_rv_size  var_tr           var_pasp
normal       normal       65        no        dilated      moderate-severe  54
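The pairing described above could be sketched roughly as follows, again in Python for illustration only and not as Stata code. Several things here are assumptions rather than part of the question: the `CONCEPTS` patterns and variable names are hypothetical, each sentence is assumed to carry a single qualifier (or a hyphenated range of qualifiers), and for a numeric range such as 60-65 the last number is kept, to match the example row.

```python
import re

# Qualifier words taken from the list in the post, plus "no" (assumption).
QUALIFIERS = ["normal", "mild", "moderate", "severe", "dilated", "enlarged", "no"]
QUAL_RE = re.compile(r"\b(?:%s)\b" % "|".join(QUALIFIERS))
NUM_RE = re.compile(r"\d+(?:\.\d+)?")

# Hypothetical mapping: variable name -> pattern that detects the finding.
CONCEPTS = {
    "var_lv_size": r"\blv\b.*\bsize\b",
    "var_lv_mass": r"\blv\b.*\bmass\b",
    "var_lvef": r"\blvef\b",
    "var_rwma": r"\brwma\b",
    "var_rv_size": r"\brv\b",
    "var_tr": r"\btr\b",
    "var_pasp": r"\bpasp\b",
}


def qualifier_of(sentence):
    """Return the qualifier words joined by '-', else the last number, else None."""
    words = QUAL_RE.findall(sentence)
    if words:
        return "-".join(words)
    numbers = NUM_RE.findall(sentence)
    return numbers[-1] if numbers else None


def extract(text):
    """Build one row: each finding detected in a sentence gets that sentence's qualifier."""
    row = {}
    # Split on a period followed by whitespace or end of text (keeps decimals intact).
    for sentence in re.split(r"\.(?:\s+|$)", text.lower()):
        if not sentence:
            continue
        value = qualifier_of(sentence)
        for name, pattern in CONCEPTS.items():
            if re.search(pattern, sentence):
                row[name] = value
    return row


report = ("The LV size and mass are within normal limits. LVEF 60-65%. No RWMA. "
          "Dilated RV. Moderate-to-severe TR. PASP is estimated at 54 mmHg.")
print(extract(report))
```

Working sentence by sentence keeps each qualifier attached to the findings nearest to it, whether it appears before or after them. A sentence that mentions two findings with different qualifiers would need a finer-grained rule than this sketch provides.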

Is this feasible in Stata?

Thank you,
Jonathan

Last edited by Jonathan Afilalo; 28 Mar 2023, 06:29.