Dear All,
I am looking for a fast and, ideally, one-command way to build a histogram of duplicates. Basically, I want a bar-chart version of the duplicates report Id output, that is, a histogram of the observations column over the copies column. The output may look, for example, like this:
Code:
. duplicates report Id
Duplicates in terms of Id
--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |          153             0
        2 |          658           329
        3 |         1737          1158
        4 |         2664          1998
        5 |         2490          1992
        6 |         1164           970
        7 |          588           504
        8 |          232           203
        9 |          144           128
       10 |          100            90
       11 |           33            30
       12 |           24            22
       13 |           13            12
--------------------------------------
This can be done with collapse (see below), but I don't want to destroy (or sort) the dataset, for performance reasons, so I am looking for the fastest way to achieve this.
Code:
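preserve
tempvar one cop
generate `one' = 1
* Reduce the data to one row per Id, holding its number of copies
collapse (count) `cop' = `one', by(Id)
label variable `cop' "Copies"
* Discrete histogram of the number of Ids at each number of copies
histogram `cop', d freq
restore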
Thanks in advance for all the advice,
Sergiy
