I am working with a dataset with 24 variables and ~44 million observations. For unknown reasons, the raw data contains about 5,000 duplicates (found via duplicates tag, gen). I am trying to drop the duplicates right now with:
My computer has been running for 6 hours now and doesn't seem to come to an end. It is not a huge workstation: it has an i7 processor, a 256 GB SSD, and 8 GB of RAM. Right now the job is using 7,840 MB of RAM.
Has anyone had similar experiences and knows a more efficient way to drop the duplicates?
P.S.: An estimated processing time in the data section in the bottom-right corner would be a nice additional feature.
Best regards,
Felix
Code:
duplicates drop permno date, force
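For comparison, here is a sketch of two alternatives that are often faster on large datasets. Both assume that permno and date identify the duplicate groups and that you only want to keep one observation per group; gduplicates is part of the community-contributed gtools package, not official Stata.

[CODE]
* 1) Keep the first observation in each permno-date group.
*    A single sort plus keep can be faster than duplicates drop,
*    which also checks and reports duplicate patterns.
bysort permno date: keep if _n == 1

* 2) If community packages are an option, gtools provides a
*    plugin-based reimplementation that is typically much faster:
* ssc install gtools
* gduplicates drop permno date, force
[/CODE]

Note that both variants, like duplicates drop with force, arbitrarily keep one observation per group, so make sure the duplicates really are interchangeable before dropping them.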