I am working with a survey data set of over 300 variables and nearly 10,000 observations. There is a risk that some of the data are fabricated, so I need to detect observations that are more than 90% similar to one another. Is this possible in Stata? I know the duplicates command, but it only detects exact (100%) duplicates.
