Text mining for word pairs

    I am working with a cardiac ultrasound dataset in which there is a long string variable with the key findings. Here is one example:

    The LV size and mass are within normal limits. LVEF 60-65%. No RWMA. Dilated RV. Moderate-to-severe TR. PASP is estimated at 54 mmHg.
    I would like to remove stopwords and replace synonyms with preferred terms, using lists that I specify.
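A minimal sketch of the stopword and synonym step follows. It is written in Python rather than Stata. Stata 16 and later can run Python code from inside a do-file, but this sketch is plain, standalone Python. The STOPWORDS and SYNONYMS sets are hypothetical placeholders, not the poster's lists. Synonyms are substituted as whole phrases before tokenizing, so multi-word entries such as "moderate-to-severe" are caught. The tokenizer also drops punctuation such as "%".

```python
import re

# Hypothetical placeholder lists; the poster's own lists would replace these.
# Keys are lower case because the text is lower-cased before matching.
STOPWORDS = {"the", "and", "are", "is", "at", "within", "limits", "estimated", "mmhg"}
SYNONYMS = {"moderate-to-severe": "moderate-severe"}

EXAMPLE = ("The LV size and mass are within normal limits. LVEF 60-65%. No RWMA. "
           "Dilated RV. Moderate-to-severe TR. PASP is estimated at 54 mmHg.")


def normalize(text, stopwords, synonyms):
    """Lower-case the text, substitute synonyms, tokenize and drop stopwords."""
    text = text.lower()
    # Replace longer phrases first so a multi-word synonym wins over its parts.
    for phrase in sorted(synonyms, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b",
                      lambda m, rep=synonyms[phrase]: rep, text)
    # Tokens are words, numbers and hyphenated forms such as "60-65";
    # punctuation like "%" and "." is not part of any token.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(normalize(EXAMPLE, STOPWORDS, SYNONYMS))
```

Within Stata itself, similar string cleaning could probably be assembled from functions such as subinstr() and ustrregexra(). The Python version is shown only because it is compact.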

    I would also like to specify a set of qualifier words, such as normal, mild, moderate, severe, dilated, enlarged, ..., plus any numbers.
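The qualifier rule could be expressed as follows. This is one possible sketch, not a definitive method. A token counts as a qualifier if it contains a digit, or if every hyphen-separated part of it is in a user-supplied set. The second condition means combinations such as "moderate-severe" also qualify. The QUALIFIERS set is an assumption. It includes "no" because the desired output later in the post treats "No RWMA" as var_rwma = no.

```python
import re

# Hypothetical qualifier set: the words listed in the post plus "no".
QUALIFIERS = {"normal", "mild", "moderate", "severe", "dilated", "enlarged", "no"}


def is_qualifier(token, qualifiers=QUALIFIERS):
    """True if the token contains a digit, or if every hyphen-separated
    part is a listed qualifier (so 'moderate-severe' qualifies)."""
    if re.search(r"\d", token):
        return True
    return all(part in qualifiers for part in token.split("-"))


if __name__ == "__main__":
    tokens = ["lvef", "60-65", "no", "rwma", "moderate-severe", "tr"]
    print([(t, is_qualifier(t)) for t in tokens])
```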

    Finally, rather than a simple bag-of-words representation, I would like to create new variables named after the non-qualifier words, whose values are the qualifier words that surround them (either before or after). For example:
    var_lv_size  var_lv_mass  var_lvef  var_rwma  var_rv_size  var_tr           var_pasp
    normal       normal       65        no        dilated      moderate-severe  54
    Is this feasible in Stata?
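Below is a rough end-to-end sketch of the pairing step, again in Python. It rests on a simplifying assumption: each sentence describes one finding, its non-qualifier words form the variable name, and its qualifier words form the value. The word lists are hypothetical placeholders. The sketch does not reproduce the target table exactly. It yields var_lv_size_mass instead of separate var_lv_size and var_lv_mass, and var_rv instead of var_rv_size. It also keeps the LVEF range as "60-65". Matching the table would take further, report-specific rules.

```python
import re

# Hypothetical placeholder lists, for illustration only.
STOPWORDS = {"the", "and", "are", "is", "at", "within", "limits", "estimated", "mmhg"}
SYNONYMS = {"moderate-to-severe": "moderate-severe"}
QUALIFIERS = {"normal", "mild", "moderate", "severe", "dilated", "enlarged", "no"}

EXAMPLE = ("The LV size and mass are within normal limits. LVEF 60-65%. No RWMA. "
           "Dilated RV. Moderate-to-severe TR. PASP is estimated at 54 mmHg.")


def is_qualifier(token):
    # Digits count as qualifiers; so do hyphenated combos like "moderate-severe".
    if re.search(r"\d", token):
        return True
    return all(part in QUALIFIERS for part in token.split("-"))


def extract_findings(report):
    """Return {variable_name: value} using a one-finding-per-sentence heuristic."""
    text = report.lower()
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b",
                      lambda m, rep=SYNONYMS[phrase]: rep, text)
    findings = {}
    # A period followed by whitespace or end of text ends a sentence,
    # so decimals such as "1.5" are not split.
    for sentence in re.split(r"\.(?:\s+|$)", text):
        tokens = [t for t in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence)
                  if t not in STOPWORDS]
        names = [t for t in tokens if not is_qualifier(t)]
        values = [t for t in tokens if is_qualifier(t)]
        if names and values:
            # A later sentence with the same name overwrites an earlier one.
            findings["var_" + "_".join(names)] = " ".join(values)
    return findings


def to_wide(reports):
    """One row per report; columns are the union of all variable names."""
    rows = [extract_findings(r) for r in reports]
    columns = sorted({name for row in rows for name in row})
    return columns, [[row.get(c, "") for c in columns] for row in rows]


if __name__ == "__main__":
    print(extract_findings(EXAMPLE))
```

Taking the union of variable names across reports gives the wide layout described in the post. Any finding that a report does not mention is left blank for that report.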

    Thank you,
    Jonathan
    Last edited by Jonathan Afilalo; 28 Mar 2023, 06:29.