I would like to analyse several texts. One step is stop word removal, so I need to count the total number of words and the number of stop words in each text. I know that stop words can be removed with the txttool command, but I do not understand how to use it or which stop words it removes. The texts are in four languages: English, French, Spanish and German. Does Stata support this, or is there better software for text analysis?

I have already worked out how to load the text into a variable. I did this with Wordstat, which creates one variable holding the file name and one variable, "Document", holding the file text. My idea is to use this variable containing the whole text and to analyse it with commands that create further variables.
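To make the counting step concrete, here is a minimal sketch of what I am after. It is written in Python purely for illustration; it is not Stata code and it does not reproduce what txttool does. The function name `count_words` and the tiny stop-word lists are hypothetical placeholders, and a real analysis would need a fuller list for each language.

```python
import re

# Hypothetical, deliberately tiny stop-word lists (assumption for illustration);
# a real analysis would use a complete list per language.
STOP_WORDS = {
    "english": {"the", "a", "an", "and", "of", "to", "in", "is"},
    "french": {"le", "la", "les", "et", "de", "un", "une", "est"},
    "spanish": {"el", "la", "los", "las", "y", "de", "un", "una", "es"},
    "german": {"der", "die", "das", "und", "ein", "eine", "ist", "von"},
}


def count_words(text, language):
    """Return (total_words, stop_words) for one document in one language."""
    # Lowercase first so "The" and "the" are treated as the same word.
    # \w+ matches runs of letters, digits and underscores (accented letters
    # included); punctuation and apostrophes act as separators.
    tokens = re.findall(r"\w+", text.lower())
    stops = STOP_WORDS[language]
    n_stop = sum(1 for token in tokens if token in stops)
    return len(tokens), n_stop


total, n_stop = count_words("The cat is in the garden.", "english")
print(total, n_stop)  # prints: 6 4
```

Splitting on apostrophes means that a French form such as "l'école" becomes two tokens, so the tokenisation rule would probably need adjusting per language.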
Thank you