Hello,
I am doing some text analysis and have large chunks of text stored as strings in my data.
Now I would like to count how often a specific word occurs in the string.
I have managed to identify IF a specific word occurs by using regexm, but not how often it occurs.
For instance, the code below only sums how many of the keywords appear at least once, but I am also interested in cases where, for instance, the word "fraud" appears several times in body.
Thank you in advance for your help!
Code:
gen negative_count = 0
local keywords "fraud scam misconduct corruption manipulation deception falsification misrepresentation overstatement greenwashing illegal trading non-compliance double counting price manipulation offset fraud unverified credits low-quality offsets worthless credits overestimated reductions questionable projects lack of additionality poor verification lack of transparency flawed methodology unverified claims inflated impact carbon leakage temporary storage loopholes fake reductions non-permanent offsets market failure lack of regulation lack of oversight inconsistent standards conflict of interest weak governance speculation unfair distribution profit-driven market opaque transactions middlemen issues poor enforcement exploitation of communities lack of trust industry capture"
foreach word in `keywords' {
    replace negative_count = negative_count + regexm(lower(body), "`word'")
}
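One direction that might work, as an untested sketch rather than a confirmed solution: delete every occurrence of a phrase with subinstr(), then divide the drop in string length by the length of the phrase. The variable names lbody and negative_total and the shortened keyword list are placeholders for illustration only. Each phrase is quoted so that multi-word phrases stay together; in the unquoted local above, foreach splits the list on spaces, so "carbon leakage" is checked as two separate words.

```stata
* Sketch only: lbody, negative_total and this short phrase list are
* illustrative assumptions, not the full keyword list from above.
gen lbody = lower(body)
gen negative_total = 0

* Quoting each element keeps multi-word phrases as a single loop item.
foreach phrase in "fraud" "scam" "carbon leakage" "lack of transparency" {
    * Occurrences = (characters removed when deleting the phrase) / (phrase length)
    replace negative_total = negative_total + ///
        (strlen(lbody) - strlen(subinstr(lbody, "`phrase'", "", .))) / strlen("`phrase'")
}

drop lbody
```

This counts non-overlapping substring matches, so "fraud" would also be counted inside "fraudulent", and a body containing "offset fraud" would add to both the "fraud" and the "offset fraud" totals if both are in the list. The /// line continuation only works in a do-file, not when typed interactively, and if body contains non-ASCII text the Unicode functions (ustrlen(), usubinstr(), ustrlower()) may be the safer choice.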