
  • Falsely identical numerical values in tabulate by group

    Dear all,
    I am running Stata 15. I have the following simple truncated results:
    Code:
    . tab beta author
    
               |              author
          beta |         0          1          2 |     Total
    -----------+---------------------------------+----------
             0 |        25          6         11 |        42
          .025 |         0          5         17 |        22
          .025 |        38          0          0 |        38
    ...
          .925 |         0          0          1 |         1
          .925 |         2          0          0 |         2
          .975 |         0          2          6 |         8
          .975 |        31          0          0 |        31
             1 |        57          2          7 |        66
    -----------+---------------------------------+----------
         Total |       408         61        143 |       612
    As one can see, the beta column, which is numeric (double), displays most values twice. One occurrence (the second line of each beta value) is allocated to author 0, while the other (the first line) is allocated to authors 1 and 2. There should obviously be only one line per beta value, and I do not know what generates this issue. Authors 1 and 2 were stored in different datasets that I appended to the main dataset (author 0). Could this have created the issue? How can I solve it? My apologies if this is regarded as a simple beginner problem.
    Last edited by Therese Rebiere; 22 Jan 2019, 10:18.

  • #2
    It's a precision problem.

    Code:
    search precision
    will show numerous resources: blog posts, manual sections, short papers, etc.

    My guess is that beta was double in one dataset and float in another.
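
    A quick way to test that guess is to look at the storage type of beta in each dataset before appending. The sketch below uses two made-up variables, beta_f and beta_d (hypothetical names, not from the original data), to show how the type can be read and that the same decimal held as float and as double need not compare equal.

    Code:
    * toy data: the same decimal stored once as float, once as double
    clear
    set obs 1
    gen float  beta_f = 0.025
    gen double beta_d = 0.025

    * the storage type is shown in the second column of describe
    describe beta_f beta_d

    * the type can also be read programmatically
    display "`: type beta_f'"
    display "`: type beta_d'"

    * 0 here means the two stored values differ
    display beta_f == beta_d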

    The root problem, as I guess it: multiples of 0.025 (1/40) usually can't be stored exactly in binary. Let's see why.

    Generate a variable with values 0(1)40 divided by 40. Which values can be stored exactly, meaning exactly? The list is short. It is 0/40 = 0, 40/40 = 1, 20/40 = 0.5, 10/40 = 0.25, 30/40 = 0.75, 5/40 = 0.125, 15/40 = 0.375, 25/40 = 0.625, 35/40 = 0.875. These numbers have exact binary equivalents, namely 0.0, 1.0, 0.1, 0.01, 0.11, etc. in binary. The other numbers can only be approximated, so the stored value depends on how many bits the storage type has.
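
    One way to pick out those exact values, as a minimal sketch: float() rounds its argument to float precision, so for these fractions a value that survives float() unchanged should be one with a short exact binary expansion. The variable name exact is my own choice for illustration.

    Code:
    clear
    set obs 41
    gen double beta = (_n - 1)/40

    * flag values that are unchanged when rounded to float precision
    gen byte exact = (beta == float(beta))

    * these should be the multiples of 1/8 listed above
    list beta if exact
    count if exact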

    Usually this doesn't bite hard, but it will if you combine datasets in which the same variable has different storage types.

    So on the information you gave, and you don't give a data example,

    Code:
    gen beta2 = round(beta, 0.025) 
    may help.
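
    To see the problem and the fix together, here is a sketch on toy data, assuming my guess about mixed storage types is right. It is not the original data: the author codes, the 41 values, and the tempfile name other are all invented for illustration. Run it from a do-file in one go, since the tempfile lives in a local macro.

    Code:
    * dataset 1: beta stored as float
    clear
    set obs 41
    gen float beta = (_n - 1)/40
    gen byte author = 1
    tempfile other
    save `other'

    * dataset 2: beta stored as double, then append the float dataset
    clear
    set obs 41
    gen double beta = (_n - 1)/40
    gen byte author = 0
    append using `other'

    * most beta values should now appear on two rows
    tab beta author

    * after rounding, each value should appear on one row
    gen beta2 = round(beta, 0.025)
    tab beta2 author

    The rounding works here because both versions of each value round to the same multiple of 0.025, which is then stored with a single storage type.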

    Code:
    . clear
    
    . set obs 41
    number of observations (_N) was 0, now 41
    
    . gen double beta = (_n - 1)/40
    
    . format beta %23.18f
    
    . list
    
         +----------------------+
         |                 beta |
         |----------------------|
      1. | 0.000000000000000000 |
      2. | 0.025000000000000001 |
      3. | 0.050000000000000003 |
      4. | 0.074999999999999997 |
      5. | 0.100000000000000006 |
         |----------------------|
      6. | 0.125000000000000000 |
      7. | 0.149999999999999994 |
      8. | 0.174999999999999989 |
      9. | 0.200000000000000011 |
     10. | 0.225000000000000006 |
         |----------------------|
     11. | 0.250000000000000000 |
     12. | 0.275000000000000022 |
     13. | 0.299999999999999989 |
     14. | 0.325000000000000011 |
     15. | 0.349999999999999978 |
         |----------------------|
     16. | 0.375000000000000000 |
     17. | 0.400000000000000022 |
     18. | 0.424999999999999989 |
     19. | 0.450000000000000011 |
     20. | 0.474999999999999978 |
         |----------------------|
     21. | 0.500000000000000000 |
     22. | 0.525000000000000022 |
     23. | 0.550000000000000044 |
     24. | 0.574999999999999956 |
     25. | 0.599999999999999978 |
         |----------------------|
     26. | 0.625000000000000000 |
     27. | 0.650000000000000022 |
     28. | 0.675000000000000044 |
     29. | 0.699999999999999956 |
     30. | 0.724999999999999978 |
         |----------------------|
     31. | 0.750000000000000000 |
     32. | 0.775000000000000022 |
     33. | 0.800000000000000044 |
     34. | 0.824999999999999956 |
     35. | 0.849999999999999978 |
         |----------------------|
     36. | 0.875000000000000000 |
     37. | 0.900000000000000022 |
     38. | 0.925000000000000044 |
     39. | 0.949999999999999956 |
     40. | 0.974999999999999978 |
         |----------------------|
     41. | 1.000000000000000000 |
         +----------------------+
    (Please use full family name in your identifier, as requested in the FAQ Advice.)



    • #3
      Thank you for your precise explanation. One of the datasets (author 0) is the result of some calculation, while the two appended ones (authors 1 and 2) did not suffer from a potential precision problem. Replacing beta with a rounded beta indeed solves the issue.
