Dear All,
I am looking for a fast and, ideally, one-command way to build a histogram of duplicates. Basically, I want a bar-chart version of the duplicates report Id output, which may look, for example, like the output in the first Code block below.
So I want a histogram of the observations column over the copies column.
This can be done with collapse (see below), but for performance reasons I don't want to destroy (or sort) the dataset, so I am looking for the fastest way to achieve this.
Thanks in advance for all the advice,
Sergiy
Code:
. duplicates report Id

Duplicates in terms of Id

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |          153             0
        2 |          658           329
        3 |         1737          1158
        4 |         2664          1998
        5 |         2490          1992
        6 |         1164           970
        7 |          588           504
        8 |          232           203
        9 |          144           128
       10 |          100            90
       11 |           33            30
       12 |           24            22
       13 |           13            12
--------------------------------------
Code:
preserve
tempvar one cop
generate `one'=1
collapse (count) `cop'=`one', by(Id)
label variable `cop' "Copies"
histogram `cop', d freq
restore