I'm using Stata 13.1. How can I delete portions of a string variable based on the contents of other variables?
My dataset has a variable called comments that contains brief write-ups of student disciplinary issues (this is individual-level data). Some comments may include a student's name, and I need to de-identify those comments before the data can be shared. Can I use the variables for first, middle, and last names to search the comments field and remove identifying information?
I realize there are probably at least two ways to accomplish what I'm asking.
1. The first is to take the data one row at a time, removing only that row's own name values from its comment.
2. The other is to search each row's comment field and remove name strings based on the complete list of names available in the data.
The second way is preferable, if doable.
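To make the two options concrete, here is a rough sketch of what each might look like. The variable names first_name, middle_name, and last_name are placeholders for whatever the name variables are actually called, and "[NAME]" is just an arbitrary replacement token. It relies only on subinstr(), levelsof, and foreach, which should all be available in Stata 13.1, but treat it as an untested outline rather than a finished solution.

```stata
* Assumed (hypothetical) variable names: first_name middle_name last_name
* comments is the free-text field described above.

* Option 1: row by row, remove only that row's own names from its comment
generate clean_row = comments
foreach v of varlist first_name middle_name last_name {
    * skip empty name fields so nothing is searched for in those rows
    replace clean_row = subinstr(clean_row, `v', "[NAME]", .) if `v' != ""
}

* Option 2: remove every name found anywhere in the data from every comment
generate clean_all = comments
foreach v of varlist first_name middle_name last_name {
    * collect the distinct non-empty values of this name variable
    quietly levelsof `v', local(names)
    foreach n of local names {
        * compound quotes guard against names containing a double quote
        quietly replace clean_all = subinstr(clean_all, `"`n'"', "[NAME]", .)
    }
}
```

Two caveats with this sketch: subinstr() is case-sensitive, so "SMITH" or "smith" in a comment would not match a stored "Smith", and it matches substrings, so a short name such as "Art" would also be removed from inside a word like "Party". With a very large number of distinct names, the list held in the local macro could also run into macro length limits in some Stata flavors.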