Text Mining with Stata (Language, word count, keywords)

Flor Pali

Join Date: Nov 2016

Posts: 1
#1


04 Nov 2016, 10:05

Hello!

For a university project I am working on a text mining task. I have a .csv file with data from AirBnB listings in Berlin, with a total of 96 variables and around 3000 observations, including e.g. the exact location, size of rooms, price, and the title/description of the offer as displayed on the AirBnB website.

I want to analyse the short title/description texts: What language are they written in (English, German or bilingual)? How many words/characters were used? What is the frequency of certain keywords? I also want to break this down by the different districts of Berlin. For example, maybe in district 1 more hosts tend to write English titles to attract foreign tourists? Maybe in district 2 there is a high frequency of location-focused keywords in the title, whereas in district 3 there is a high frequency of emotional keywords?

I am not a very experienced user of Stata, but after some first research into this particular topic, I found that there are ways to do some basic text mining in Stata.
(See this paper: https://poseidon01.ssrn.com/delivery...085069&EXT=pdf or also http://www.stata.com/manuals14/fnstringfunctions.pdf)

I do not really understand the approach though. Especially the word counting command does not seem to work for me.

As an alternative to using plain Stata, I found out that I could try to do it with WordStat and QDA Miner. I would prefer a simple Stata solution though, without having to learn these two other programs first.

Has anybody here done a similar simple text mining task with Stata before? Is it even possible? I would highly appreciate any kind of help! :-)

Thank you in advance!

Best
Flor
Tags: data mining, text mining
