Hi all,
I have a dataset with string variables that contain text in multiple languages (e.g., obs1 can be in English, obs2 in Spanish, obs3 in Chinese, etc.).
Is it possible to detect the language used for each observation? There seem to be several Python packages (e.g., Polyglot, Langdetect) that can do so, but I have been unable to find anything equivalent for Stata.
For my purposes, it would be sufficient to detect whether the language used is English or not.
I'm using Stata version 16.
Any help is much appreciated!