accuracy with large numbers in generating

Jae Yu

Join Date: Mar 2019

Posts: 42
#1

06 Apr 2021, 09:05

I would like to generate a new ID variable based on ID1 and ID2. Rather than assigning a unique ID after sorting, I want to combine the two as shown below, which has the advantage of keeping both IDs visible in id12. But the code doesn't work as I expect: id12 for the first observation should be 135205439, not 135205440. I hope that there is a simple fix for this.

Code:

clear
input id1 id2
1352 5439
5386 9615
4359 8139
end
gen id12= id1*10^5 + id2
format id* %16.0f
list

     +-------------------------+
     |  id1    id2        id12 |
     |-------------------------|
  1. | 1352   5439   135205440 |
  2. | 5386   9615   538609600 |
  3. | 4359   8139   435908128 |
     +-------------------------+

William Lisowski

Join Date: Dec 2014
Posts: 10150

06 Apr 2021, 09:12

You are suffering from a problem of precision.

Code:

. gen double id12= id1*10^5 + id2

. format id* %16.0f

. list

     +-------------------------+
     |  id1    id2        id12 |
     |-------------------------|
  1. | 1352   5439   135205439 |
  2. | 5386   9615   538609615 |
  3. | 4359   8139   435908139 |
     +-------------------------+

Here are the limits on storage of decimal integers with full accuracy in the various numeric storage types. As you can see, you tried to store a 9-digit number in the default float storage type, which holds integers exactly only up to 16,777,216, that is, about 7 digits of accuracy.

The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

Storage type                     Minimum                Maximum
byte - 7 bits                       -127                    100
int - 15 bits                    -32,767                 32,740
long - 31 bits            -2,147,483,647          2,147,483,620
float - 24 bits              -16,777,216             16,777,216
double - 53 bits  -9,007,199,254,740,992  9,007,199,254,740,992
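The float limit can be checked directly with Stata's float() function, which rounds its argument to float precision. The following is a minimal sketch meant to be run from a do-file (the // comments do not work at the interactive prompt). Between 2^27 and 2^28 consecutive floats are 16 apart, so 135205439 should come back as the nearest multiple of 16, which is the 135205440 shown in the original listing.

```stata
* float() rounds a value to what a float variable can actually hold
display %16.0f float(16777216)    // 2^24, still exactly representable
display %16.0f float(16777217)    // 2^24 + 1, cannot be held exactly in a float
display %16.0f float(135205439)   // id12 for the first observation

* returns 0 if the value would change when stored as a float
display float(135205439) == 135205439
```

A comparison of this kind is one way to test whether a particular integer survives storage in a float before generating the variable.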

Last edited by William Lisowski; 06 Apr 2021, 09:15.

Jae Yu

Join Date: Mar 2019

Posts: 42
#3

06 Apr 2021, 09:23

William Lisowski Thanks for the information. Is it possible to change this setting, or do I need to type "double" whenever I generate variables that may have large numbers?
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#4

06 Apr 2021, 09:45

This is discussed under

Code:

help generate

where you find, among other things,

Code:

set type {float|double} [, permanently]
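
As a rough illustration of that syntax, the sketch below changes the default for the current session only and then restores it. It assumes id1 and id2 are in memory and that id12 does not exist yet; it is meant for a do-file, since // comments do not work interactively.

```stata
display c(type)                  // current default storage type, normally float
set type double                  // without "permanently": this session only
generate id12 = id1*10^5 + id2   // now created as a double
set type float                   // restore the usual default
```

Leaving off the permanently option keeps the change from carrying over to later sessions.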
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

06 Apr 2021, 10:12

Adding to what Rich wrote, you should start by reading

Code:

help precision

to understand that your question is not about "variables that may have large numbers" but rather about "variables containing numbers that need to be stored precisely".

For example, a GNP of 435,908,139 is a large number, but one that is not as precise as the 9 digits would suggest, and would under most circumstances be suitably stored as a float rather than a double. On the other hand, an identifier of 435908139 needs to be stored without loss of precision.

Understand that while you can

Code:

set type double, permanently

that may be problematic when you deal with very large datasets, as we sometimes see other Statalist members doing, because every new variable then takes 8 bytes per observation rather than the 4 bytes of a float.

You would, in my opinion, be better advised to think consciously about the appropriate storage type for variables as you create them, rather than adopt a standard in the hope of avoiding the effort of thinking.
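
One way to act on this for the identifier in this thread is sketched below. It assumes the combined value never exceeds the long maximum of 2,147,483,620 (so id1 stays below roughly 21,474); the variable name id12d is made up for the illustration.

```stata
* a long holds these 9-digit integers exactly in 4 bytes
generate long id12 = id1*10^5 + id2
format id12 %12.0f

* alternatively, create a double and let compress pick the smallest
* storage type that still holds every value without loss
generate double id12d = id1*10^5 + id2
compress id12d
```

Either route keeps the identifier exact without changing the default type for every other variable.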
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#6

06 Apr 2021, 10:21

Apart from the useful advice given above, there are two other ways:

1. Create the new identifier as a string.

2. Do not create a new identifier at all. The two variables already identify an observation, so there is no need to create a new single variable.
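
Both alternatives can be sketched briefly. The variable name id12s is hypothetical, and the zero-padding assumes id2 is a non-negative integer of at most five digits.

```stata
* 1. string identifier: id2 is zero-padded to five digits so that
*    1352 and 5439 combine to "135205439"
generate id12s = string(id1) + string(id2, "%05.0f")

* 2. no new variable: confirm that the pair already identifies observations
*    (isid exits with an error if it does not), then use both as the key
isid id1 id2
sort id1 id2
```

A string identifier avoids numeric precision limits entirely, at the cost of more storage per observation than a long.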