Dear Statalist, I have a list of codes and descriptions (see below) and I would like to classify each code as green or non-green, based on whether its description relates to the environment. However, because I do not know all the possible word combinations linked to environmentally friendly technologies, I think it would be better to use an algorithm for this.
The idea would be to start from a list of keywords (e.g., environmental, sustainable, renewable energy), and then tell Stata to look for other keywords related to these initial keywords in the whole list of codes and descriptions (e.g., eco-friendly, wind power,…).
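To make the two steps concrete, here is a rough sketch of the logic I have in mind, written in Python purely for illustration. The sample codes and descriptions, the seed list, and the function names `is_green` and `candidate_keywords` are all made up, and counting co-occurring words is only one simple way of finding "related" keywords.

```python
import re
from collections import Counter

# Hypothetical sample data: code -> description (not my real list).
descriptions = {
    "A01": "Renewable energy from wind power turbines",
    "A02": "Sustainable eco-friendly packaging",
    "B01": "Diesel engine parts",
    "B02": "Wind power generators",
}

# Step 0: the initial (seed) keywords.
seeds = ["environmental", "sustainable", "renewable energy"]


def is_green(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive substring match)."""
    text = text.lower()
    return any(kw in text for kw in keywords)


def candidate_keywords(descs, keywords):
    """Count words that co-occur with the keywords in descriptions already flagged green."""
    seed_words = {w for kw in keywords for w in kw.split()}
    counts = Counter()
    for text in descs.values():
        if is_green(text, keywords):
            # Words are runs of letters, hyphens allowed (e.g. "eco-friendly").
            words = set(re.findall(r"[a-z][a-z-]+", text.lower()))
            counts.update(words - seed_words)
    return counts


# Step 1: flag descriptions that contain a seed keyword.
flags = {code: is_green(text, seeds) for code, text in descriptions.items()}

# Step 2: list words appearing alongside the seeds, to be reviewed by hand.
candidates = candidate_keywords(descriptions, seeds)

# Step 3: after review, add the useful candidates and classify again.
expanded = seeds + ["wind power"]
flags_expanded = {code: is_green(text, expanded) for code, text in descriptions.items()}
```

In this toy example, "B02" is missed by the seed keywords alone and is only picked up once "wind power" has been added after reviewing the candidates. The candidate list would still need manual screening, since it also contains uninformative words such as "from".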
I have been looking for a way to do this in Stata but have not found anything. Maybe you can point me in the right direction or give me some feedback on how to proceed.
Thanks in advance!
