  • Duplicates in very large dataset

    Hi all,

    I am trying to merge two huge datasets.
    To do so, I am generating a unique identifier as
    Code:
    gen id_mas = _n
    However, when I check for duplicates, Stata reports that there are some, even though the numbers displayed are different.
    For instance:

    [Screenshot: Screenshot 2022-11-04 at 11.26.35 AM.png]


    As you can see, the number displayed is 2.33e+07, but it is precisely 23309572. The number below is again displayed as 2.33e+07, but it is 23309514. So they are uniquely defined, but Stata seems to care only about the rounded value.
    How can I solve this issue and tell Stata that these are two separate numbers?

    Thank you
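    The question mixes up two things: how a value is displayed and how it is stored. Here is a minimal sketch for checking both. It uses a small toy dataset in place of the real one and assumes the default storage type has not been changed with -set type-. -describe- reports the storage type, -format- changes only the display, and Stata's float() function rounds a number to float precision, which makes any collision visible.

    ```stata
    * Toy data standing in for the real dataset
    clear
    set obs 5
    gen id_mas = _n              // no type given, so the default (float) is used
    describe id_mas              // the storage type column reports float

    * -format- changes only how values are shown, not what is stored
    format id_mas %12.0f
    list id_mas

    * float() rounds to float precision. Above 16,777,216 a float can hold
    * only every second integer, so neighbouring ids end up as one value
    display %12.0f float(23309572)    // even number: stored exactly
    display %12.0f float(23309573)    // odd number: cannot be stored exactly
    display float(23309573) == float(23309572) | float(23309573) == float(23309574)
    ```

    The last line evaluates to 1 (true), because an odd id in this range is rounded onto one of its even neighbours when stored as a float.
    
    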

  • #2
    Duplicate thread. Interested readers can follow the discussion here: https://www.statalist.org/forums/for...-large-dataset



    • #3
      From -help precision- you have

      Floats can store up to 16,777,215 exactly
      So you are asking for trouble if the storage type is float.
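      The usual fix is to request an integer storage type when the identifier is created. Here is a minimal sketch. The 17,000,000 row count is an arbitrary choice, picked only to exceed the float limit. long stores integers exactly up to 2,147,483,620, and -isid- checks that a variable uniquely identifies the observations.

      ```stata
      * Toy data: more rows than a float can number exactly
      clear
      set obs 17000000

      * Default (float) storage: ids above 16,777,216 collide
      gen id_float = _n
      capture isid id_float
      display _rc                  // nonzero: id_float is not a unique identifier

      * long storage: every integer up to 2,147,483,620 is exact
      gen long id_mas = _n
      isid id_mas                  // no error: id_mas is unique
      ```

      -gen double id_mas = _n- would also work, since doubles hold integers exactly up to 9,007,199,254,740,992. However, a double takes 8 bytes per observation, while a long takes 4, which matters in a very large dataset.
      
      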
