Hello,
I want to generate a new variable that displays the words found in a string variable that match my local list of words.
The context is that I want to create a profanity filter. However, not all words are considered profane in every context.
Therefore I want to see which words my profanity filter is classifying as profane.
The approach so far:
gen profanitydummy = 0
gen profanitycount = 0
local badwords "badword1 badword2 badword3 badword4"
foreach b in `badwords' {
    replace profanitydummy = 1 if strpos(varstring, " `b' ") != 0
    replace profanitycount = profanitycount + 1 if strpos(varstring, " `b' ") != 0
}
This produces a dummy that equals 1 if a word in varstring matches a word in the local badwords.
In addition, it counts the number of unique bad words used in the string.
The local badwords list contains approx. 1100 words, which I took from a researcher who collected "offensive" words.
I now want to know for which words the profanity dummy indicates that there is a bad word in varstring.
My approach:
gen badwordinstring = ""
foreach b in `badwords'{
    replace badwordinstring = " `b' " if strpos(varstring), " `b' ")
}
However, I get the error message "invalid syntax" and can't figure out where the problem is.
My desired result would be, for example: badwordinstring: "badword5 badword7"
In addition, as of right now my profanitycount only counts the unique bad words used in the string.
Do you have a hint on how to change it to the absolute number of bad words in the string?
For example, if badword1 is used 2 times and badword2 is used 5 times, the variable should indicate 7; as of right now I am only able to get the number of unique bad words.
Thank you in advance.