Hi all,
I have a dataset with 22,534,283 records. I would like to create a simple unique id so I used the following: g id=_n.
When I examine the contents of this variable, I find that there are duplicates. See the attached screenshot of the Results window; the same output is pasted below.
The other odd thing is that id's storage type is float rather than long. Not a big deal (maybe), but it struck me as odd.
Any ideas how to resolve this issue?
Thanks in advance,
Ben Hoen
Berkeley Lab
=================================================
. drop id
. g id=_n
. duplicates report id
Duplicates in terms of id
--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |     18216483             0
        2 |            2             1
        3 |      4317798       2878532
--------------------------------------
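A possible explanation, offered as a sketch rather than a confirmed diagnosis: if g used Stata's default storage type (float), then id can hold integers exactly only up to 16,777,216 (2^24). Larger values of _n are rounded to the nearest representable float, so neighboring observation numbers collide. That would fit both the float storage type and the pattern of copies in the report. The commands below assume this is the cause and ask for long explicitly; isid is added only as a check, since it exits with an error if id does not uniquely identify the observations.

. * float() rounds its argument to float precision; 16777217 is not representable
. display %12.0f float(16777217)
. drop id
. * assumption: the duplicates come from float rounding, so request long
. g long id=_n
. * should report no surplus observations if the assumption holds
. duplicates report id
. isid id

Storing id as double would work as well; long takes less memory and covers any observation number up to 2,147,483,620.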