Falsely identical numerical values in tabulate by group

Therese Rebiere

Join Date: Aug 2015

Posts: 7
#1


22 Jan 2019, 10:16

Dear all,
I am running Stata 15. I have the following simple truncated results:

Code:

. tab beta author

           |              author
      beta |         0          1          2 |     Total
-----------+---------------------------------+----------
         0 |        25          6         11 |        42
      .025 |         0          5         17 |        22
      .025 |        38          0          0 |        38
...
      .925 |         0          0          1 |         1
      .925 |         2          0          0 |         2
      .975 |         0          2          6 |         8
      .975 |        31          0          0 |        31
         1 |        57          2          7 |        66
-----------+---------------------------------+----------
     Total |       408         61        143 |       612

As one can see, the beta column, which is numeric (double), displays the same value twice. One occurrence (the second line of each beta value) is allocated to author 0, while the other (the first line) is allocated to authors 1 and 2. There obviously should be only one line per beta value, and I do not know what generates this issue. Authors 1 and 2 were stored in different databases that I appended to the main dataset (author 0). Could this have created the issue? How can I solve it? My apologies if this is regarded as a simple beginner problem.

Last edited by Therese Rebiere; 22 Jan 2019, 10:18.

Nick Cox

Join Date: Mar 2014
Posts: 35792

22 Jan 2019, 10:40

It's a precision problem.

Code:

search precision

will show numerous resources: blog posts, manual sections, short papers, etc.

My guess is that beta was double in one dataset and float in another.
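To make that guess concrete, here is a minimal sketch on made-up data (not the poster's files). It assumes one dataset holds 0.025 as a double and the other holds it as a float; the tempfile name `other` and the single-observation setup are purely illustrative.

```stata
* Assumed setup: the same 0.025 stored as double in one dataset
* and as float in another, then combined with append.
clear
set obs 1
generate double beta = 0.025
generate byte author = 1
tempfile other
save `other'

clear
set obs 1
generate float beta = 0.025
generate byte author = 0
* On append, Stata should promote beta to double so that it can hold
* both values; the former float keeps its float-precision value.
append using `other'

* Two rows, both displayed as .025 under the default format
tabulate beta author

* A wide format reveals that the two values differ
format beta %23.18f
list beta author
```

If the guess is right, the two values differ only beyond the digits that the default display format shows, which would explain why `tabulate` prints what looks like the same value twice.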

The root problem, as I guess it: Multiples of 0.025 (1/40) can't usually be stored exactly, because they are decimal fractions and Stata stores numbers in binary. Let's see why: generate a variable with values 0(1)40 out of 40. Which values can be stored exactly, meaning exactly? The list is short. It is 0/40 = 0, 40/40 = 1, 20/40 = 0.5, 10/40 = 0.25, 30/40 = 0.75, 5/40 = 0.125, 15/40 = 0.375, 25/40 = 0.625, 35/40 = 0.875. These numbers have exact binary equivalents, namely 0.0, 1.0, 0.1, 0.01, 0.11, etc. in binary. The other numbers are all a little problematic.

Usually this doesn't bite hard, but it will do if you combine datasets in which there are different storage types.

So on the information you gave, and you don't give a data example,

Code:

gen beta2 = round(beta, 0.025)

may help.
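A sketch of how the rounding fix behaves, again on made-up data rather than the poster's: each multiple of 1/40 appears twice, once computed in double and once after passing through float precision. The integer code `beta40` is an alternative that is my assumption and not something suggested in this thread; the names `k`, `beta2` and `beta40` are illustrative.

```stata
* Made-up data: rows 1-41 hold k/40 in double; rows 42-82 hold the
* same values after rounding to float precision.
clear
set obs 82
generate int k = mod(_n - 1, 41)
generate double beta = k/40
replace beta = float(beta) in 42/82

* The suggested fix: both versions of each value round to one double
generate double beta2 = round(beta, 0.025)

* Alternative (an assumption, not from the thread): an integer code
* 0, 1, ..., 40, which is held exactly
generate int beta40 = round(40*beta)

tabulate beta40
```

The integer code avoids the precision question altogether, at the cost of needing value labels or a rescaling if the original scale is wanted in output.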

Code:

. clear

. set obs 41
number of observations (_N) was 0, now 41

. gen double beta = (_n - 1)/40

. format beta %23.18f

. list

     +----------------------+
     |                 beta |
     |----------------------|
  1. | 0.000000000000000000 |
  2. | 0.025000000000000001 |
  3. | 0.050000000000000003 |
  4. | 0.074999999999999997 |
  5. | 0.100000000000000006 |
     |----------------------|
  6. | 0.125000000000000000 |
  7. | 0.149999999999999994 |
  8. | 0.174999999999999989 |
  9. | 0.200000000000000011 |
 10. | 0.225000000000000006 |
     |----------------------|
 11. | 0.250000000000000000 |
 12. | 0.275000000000000022 |
 13. | 0.299999999999999989 |
 14. | 0.325000000000000011 |
 15. | 0.349999999999999978 |
     |----------------------|
 16. | 0.375000000000000000 |
 17. | 0.400000000000000022 |
 18. | 0.424999999999999989 |
 19. | 0.450000000000000011 |
 20. | 0.474999999999999978 |
     |----------------------|
 21. | 0.500000000000000000 |
 22. | 0.525000000000000022 |
 23. | 0.550000000000000044 |
 24. | 0.574999999999999956 |
 25. | 0.599999999999999978 |
     |----------------------|
 26. | 0.625000000000000000 |
 27. | 0.650000000000000022 |
 28. | 0.675000000000000044 |
 29. | 0.699999999999999956 |
 30. | 0.724999999999999978 |
     |----------------------|
 31. | 0.750000000000000000 |
 32. | 0.775000000000000022 |
 33. | 0.800000000000000044 |
 34. | 0.824999999999999956 |
 35. | 0.849999999999999978 |
     |----------------------|
 36. | 0.875000000000000000 |
 37. | 0.900000000000000022 |
 38. | 0.925000000000000044 |
 39. | 0.949999999999999956 |
 40. | 0.974999999999999978 |
     |----------------------|
 41. | 1.000000000000000000 |
     +----------------------+
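The exact values in the listing above can also be flagged programmatically. This is a sketch under one assumption: a double that is unchanged by rounding to float precision is exactly representable in binary, which holds for these 41 values. The variable name `exact` is illustrative.

```stata
* Flag which multiples of 1/40 are held exactly.
clear
set obs 41
generate double beta = (_n - 1)/40
* 1 if rounding to float precision leaves the value unchanged
generate byte exact = (beta == float(beta))
list beta if exact, noobs
* Should count 9 values, matching the short list given earlier
count if exact
```

The flagged values are those where (_n - 1) is a multiple of 5, so that the fraction reduces to a denominator of 1, 2, 4 or 8.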

(Please use full family name in your identifier, as requested in the FAQ Advice.)


Therese Rebiere

Join Date: Aug 2015

Posts: 7
#3

22 Jan 2019, 11:05

Thank you for your precise explanation. One of the datasets (author 0) is the result of some calculation, while the two appended ones (authors 1 and 2) did not suffer from a potential precision problem. Replacing beta with a rounded beta indeed solved the issue.