Text analysis - number of words + stop-word removal

Robert Adrian Piper

#1

23 Apr 2020, 11:58

I would like to analyse several texts. One step is stop-word removal, so I need to count both the total number of words and the number of stop words in each text. I know that stop words can be removed with the command txttool. However, I do not understand how to use it or which stop words it removes. The texts are in four languages: English, French, Spanish and German. Does Stata provide this option? Or is there better software for text analysis?

I have figured out how to load the texts into a variable. This was possible with Wordstat: it creates one variable holding the file name and one variable, "Document", holding the file's text. My idea is to use this variable containing the whole text and to analyse it with commands that create further variables.
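On the "other software" question, here is a minimal Python sketch of the two counts asked for: total words and stop words per document, with one stop-word list per language. The tiny stop-word lists, the simple regex tokenizer and the name `count_words` are illustrative assumptions, not what txttool does; real stop-word lists are much longer and should be chosen for the research question.

```python
import re

# Hypothetical, deliberately tiny stop-word lists (assumption for illustration);
# real lists are much longer and should be chosen by the analyst.
STOPWORDS = {
    "en": {"the", "a", "an", "and", "of", "to", "in", "is"},
    "fr": {"le", "la", "les", "et", "de", "un", "une", "est"},
    "es": {"el", "la", "los", "las", "y", "de", "un", "una", "es"},
    "de": {"der", "die", "das", "und", "ein", "eine", "ist", "zu"},
}

def tokenize(text):
    """Lower-case the text and split it into word tokens.

    \\w+ matches runs of Unicode letters/digits/underscore, so accented
    characters (é, ñ, ß, ü) stay inside words. French elisions such as
    "l'homme" are split into "l" and "homme" by this simple tokenizer.
    """
    return re.findall(r"\w+", text.lower())

def count_words(text, lang):
    """Return (total words, stop words, remaining words) for one document."""
    tokens = tokenize(text)
    stop = STOPWORDS[lang]
    n_stop = sum(1 for t in tokens if t in stop)
    return len(tokens), n_stop, len(tokens) - n_stop

if __name__ == "__main__":
    doc = "The cat sat on the mat and looked at the dog."
    print(count_words(doc, "en"))  # (11, 4, 7)
```

Each document's language has to be known in advance (for example, stored in its own variable); the counts per document could then be exported back to Stata as new variables.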

Thank you
Marcos Almeida

#2

23 Apr 2020, 12:30

Just to underline, txttool is a user-written program.

That said, as its help file shows, you can define the stop words yourself.
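Because the stop words are user-defined, one convenient arrangement is one plain-text list per language, one word per line. A minimal Python sketch of that idea follows; the file layout, the `#` comment convention, the file name `stopwords_de.txt` and both function names are hypothetical and are not txttool's own format.

```python
import io
import re

def load_stopwords(lines):
    """Read a user-defined stop-word list from an iterable of lines
    (e.g. an open file): one word per line, blank lines and lines
    starting with '#' ignored, words lower-cased."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words

def remove_stopwords(text, stopwords):
    """Return the text's lower-cased word tokens with stop words removed."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]

if __name__ == "__main__":
    # In practice (hypothetical file name):
    #   with open("stopwords_de.txt", encoding="utf-8") as f:
    #       stop = load_stopwords(f)
    custom_list = io.StringIO("# German\nder\ndie\nund\n")
    stop = load_stopwords(custom_list)
    print(remove_stopwords("Der Hund und die Katze", stop))  # ['hund', 'katze']
```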

Best regards,

Marcos
Robert Adrian Piper

#3

14 Jun 2020, 00:37

Thank you, Marcos.