Hi
I'd like to drop duplicates at random, rather than always keeping the first observation in each group of duplicates.
A snapshot of my data set:
Each patent-invt_id has several co_invt_id values. I want to keep only one co_invt_id for each, picked at random.
I found the following code on the predecessor of Statalist:
Does the code below make sense? (I'm not very familiar with Stata syntax.) I can run it on my dataset, but because I have over 1 million observations it's difficult to check whether the duplicates were indeed dropped at random. Any feedback would be welcome.
Code:
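* varnames = the variables that define a group of duplicates; runiform() supersedes the older uniform()
bys varnames: gen rnd = runiform()
bys varnames (rnd): keep if _n == 1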
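Since it is hard to eyeball a random draw in a million rows, one option might be to try the same two steps on a tiny made-up dataset first. The sketch below is only an illustration: the variable `group` is a hypothetical stand-in for the patent/invt_id grouping, the values are invented, and the seed is arbitrary.

```stata
* Toy data: 3 groups with 4 co-inventors each (all values are made up)
clear
set obs 12
* group is a hypothetical stand-in for the variables that identify patent-invt_id
gen group = ceil(_n / 4)
gen co_invt_id = _n

* Fix the seed so the same random draw can be reproduced later
set seed 12345
gen double rnd = runiform()

* Sort within group by the random number and keep the first row of each group
bysort group (rnd): keep if _n == 1

* Inspect the result; isid exits with an error if group is not unique
list group co_invt_id
isid group
```

Rerunning this with a different seed should change which co_invt_id survives in at least some groups, while `isid group` should keep passing. On the full dataset, the same `isid` check on the real grouping variables would confirm that exactly one observation per group remains.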