Hi everyone,
I am dealing with survey data with open-ended questions that I need to clean. The questions I am interested in are factual knowledge questions about different people. Respondents were asked things like "Do you recall the name of the person who holds X position?" and then had to type in the name of the person. Say the correct response was "John Doe": respondents may have written JOHN DOE, John Doe, john doe, Doe, DOE, Bob Doe, Bill Doe, J Doe...
I now need to clean this and create binary variables where 1=correct and 0=incorrect response. I think that the only possible approach is to do this incrementally by progressively coding all the specific ways of writing "John Doe" used by the respondents.
So I would proceed like this:
gen dummy1=0
That creates a dummy variable for the first knowledge question. But then I need to recode the values of dummy1 according to the answer given to the question about John Doe (let's call that variable "qdoe"). So, first, I would need to recode as 1 in dummy1 all answers in qdoe that contain "doe" (and all its possible variations: Doe, DOE...).
What I have in mind is something like this:
recode dummy1 0=1 if qdoe=="Doe"
recode dummy1 0=1 if qdoe=="doe"
recode dummy1 0=1 if qdoe=="DOE"
etc.
But this does not work. Moreover, what I need is for dummy1 to be recoded to 1 whenever qdoe contains "Doe", whether or not qdoe is exactly equal to "Doe". That way, respondents who answered "John Doe" and those who only wrote "Doe" without the first name would both be coded as 1.
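To make the goal concrete, here is a minimal sketch of the result I am after. It assumes that Stata's strpos() and lower() string functions can be combined for a case-insensitive "contains" check; the toy data below are made up for illustration and only stand in for my real qdoe variable.

```stata
* Hypothetical toy data standing in for the real survey variable qdoe
clear
input str20 qdoe
"JOHN DOE"
"john doe"
"Doe"
"J Doe"
"Bob Doe"
"Smith"
end

* dummy1 = 1 if the lower-cased answer contains "doe" anywhere, 0 otherwise
gen byte dummy1 = strpos(lower(qdoe), "doe") > 0

list qdoe dummy1
```

Note that a rule like this would also code "Bob Doe" as 1, since it only looks for the surname, so wrong first names would still need to be handled separately.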
Any help would be highly appreciated!