  • accuracy with large numbers in generating

    I would like to generate a new ID variable based on id1 and id2. Instead of assigning a sequential unique ID after sorting, I want to combine the two as shown below, which has the advantage of keeping both IDs visible in id12. But the code doesn't work as I expect: id12 for the first observation should be 135205439, not 135205440. I hope there is a simple fix for this.

    Code:
            clear 
            input id1 id2
            
            1352     5439
            5386    9615
            4359    8139
            
            end
            
            gen id12= id1*10^5 + id2
            format id* %16.0f
            
            list
            
         |  id1    id2        id12 |
         |-------------------------|
      1. | 1352   5439   135205440 |
      2. | 5386   9615   538609600 |
      3. | 4359   8139   435908128 |

  • #2
    You are suffering from a problem of precision.
    Code:
    . gen double id12= id1*10^5 + id2
    
    . format id* %16.0f
    
    . list
    
         +-------------------------+
         |  id1    id2        id12 |
         |-------------------------|
      1. | 1352   5439   135205439 |
      2. | 5386   9615   538609615 |
      3. | 4359   8139   435908139 |
         +-------------------------+
    Here are the limits on storage of decimal integers with full accuracy in the various numeric storage types. As you can see, you tried to store a 9-digit number into the default float variable type, which is limited to 7 digits of accuracy.

    The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

    type     precision   smallest integer              largest integer
    byte      7 bits     -127                          100
    int      15 bits     -32,767                       32,740
    long     31 bits     -2,147,483,647                2,147,483,620
    float    24 bits     -16,777,216                   16,777,216
    double   53 bits     -9,007,199,254,740,992        9,007,199,254,740,992
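
    As a small illustrative check of the float row, Stata's float() function rounds its argument to float precision, so it can reproduce the rounding without creating any variables. This is only a sketch; the values come from the example data and from the table above.

    Code:
    * 135205439 has no exact float representation; this displays 135205440
    display %12.0f float(135205439)

    * 2^24 = 16777216 is the last point where every integer is still exact
    display %12.0f 2^24

    * one past that limit is rounded back down, so this comparison displays 0
    display float(16777217) == 16777217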
    Last edited by William Lisowski; 06 Apr 2021, 09:15.



    • #3
      William Lisowski Thanks for the information. Is it possible to change this setting, or do I need to type "double" whenever I generate variables that may have large numbers?



      • #4
        This is discussed under
        Code:
        help generate
        where you find, among other things,
        Code:
        set type {float|double} [, permanently]
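
        A rough sketch of how this might be used for a single session (that is, without the permanently option), assuming the id1 and id2 variables from the original post are in memory:

        Code:
        * show the current default storage type (float unless it has been changed)
        display c(type)

        * make double the default for variables created from here on
        set type double
        generate id12 = id1*10^5 + id2
        format id12 %16.0f

        * restore the usual default afterwards
        set type float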



        • #5
          Adding to what Rich wrote, you should start by reading
          Code:
          help precision
          to understand that your question is not about "variables that may have large numbers" but rather about "variables containing numbers that need to be stored precisely".

          For example, a GNP of 435,908,139 is a large number, but one that is not as precise as the 9 digits would suggest, and it would under most circumstances be suitably stored as a float rather than a double. On the other hand, an identifier of 435908139 needs to be stored without loss of precision.

          Understand that while you can
          Code:
          set type double, permanently
          that may be problematic when you deal with very large datasets, as we sometimes see other Statalist members doing, because every double takes 8 bytes of storage rather than the 4 bytes of a float.

          You would, in my opinion, be better advised to think consciously about the appropriate storage type for variables as you create them, rather than adopt a standard in the hope of avoiding the effort of thinking.
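
          One way this might look for the identifier in this thread is sketched below. It assumes id1 stays at four digits and id2 stays below 100,000, so that the combined value has at most 9 digits and fits within the exact range of a long; with wider IDs, double would be the safer choice.

          Code:
          * a long stores integers exactly up to 2,147,483,620 in 4 bytes
          generate long id12 = id1*10^5 + id2
          format id12 %12.0f
          list

          * optionally let Stata demote variables to the smallest type
          * that holds their values without loss
          compress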



          • #6
            Apart from the useful advice given above, there are two other ways:

            1. Create the new identifier as a string.

            2. Do not create a new identifier at all. The two variables already identify an observation, so there is no need to create a new single variable.
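
            Both options could be sketched roughly as follows. The variable name id12s, the %04.0f format, and the "-" separator are illustrative choices, not part of the original post.

            Code:
            * 1. string identifier: no rounding, and both IDs stay readable
            generate id12s = string(id1, "%04.0f") + "-" + string(id2, "%04.0f")

            * 2. no new variable: check that the pair already identifies
            *    each observation (isid exits with an error if it does not)
            isid id1 id2
            sort id1 id2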
