  • Text Mining with Stata (Language, word count, keywords)

    Hello!

    For a university project I am working on a text mining task. I have a .csv file with data from AirBnB listings in Berlin, with a total of 96 variables and around 3000 observations. It includes, for example, the exact location, the size of the rooms, the price, and the title/description of the offer as displayed on the AirBnB website.

    I want to analyse the short title/description texts: What language are they written in (English, German or bilingual)? How many words/characters were used? What is the frequency of certain keywords? I want to break this down by the different districts of Berlin. For example, maybe in district 1 more hosts tend to write English titles to attract foreign tourists? Maybe in district 2 there is a high frequency of location-focused keywords in the title, whereas in district 3 there is a high frequency of emotional keywords?

    I am not a very experienced user of Stata, but after some first research into this particular topic, I found that there are ways to do some basic text mining in Stata.
    (See this paper: https://poseidon01.ssrn.com/delivery...085069&EXT=pdf or also http://www.stata.com/manuals14/fnstringfunctions.pdf)

    I do not really understand the approach though. Especially the word counting command does not seem to work for me.
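
    To make concrete what I am after, here is a minimal sketch of how I understand the string functions from the manual linked above. It runs on made-up toy data rather than my real file, the variable names title and district are placeholders (not the actual names in my dataset), and "near" is just an example keyword. I am not sure this is the right approach, and it does not cover the language question at all.

```stata
* Toy data standing in for the real listings (all values are made up)
clear
input str40 title str20 district
"Cozy flat near Alexanderplatz" "Mitte"
"Helle Wohnung am Park"         "Pankow"
"Sunny room near the canal"     "Neukoelln"
end

* Number of whitespace-separated words in each title
generate n_words = wordcount(title)

* Number of characters; ustrlen() counts Unicode characters (Stata 14+),
* whereas strlen() would count bytes, which differs for umlauts
generate n_chars = ustrlen(title)

* 1 if the title contains the example keyword, ignoring case, else 0
generate has_near = strpos(ustrlower(title), "near") > 0

* Summary statistics of the three measures within each district
bysort district: summarize n_words n_chars has_near
```

    With the real data, I assume the input block would be replaced by import delimited on the .csv file, and the mean of the 0/1 keyword variable would then give the share of titles per district that contain the keyword.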

    As an alternative to plain Stata, I found that I could try WordStat and QDA Data Miner. I would prefer a simple Stata solution though, without having to learn these two other programs first.

    Has anybody here done a similar simple text mining task with Stata before? Is it even possible? I would highly appreciate any kind of help! :-)

    Thank you in advance!

    Best
    Flor


