  • Does cascade replace round off large values?

    Hello! Thank you for your help in advance.

    I am using cascade replace to replace missing values with a function of the lagged value & another variable.

    I have run into issues that only seem to come up when I am replacing a "large" value. Specifically (see reproducible example below), the cascade replace generates the correct answer when my initial value is <=10 million, but the wrong value when my initial value is >=20 million.

    Code:
    * When initial value set to 20 million, doesn't work
    * Missing values get replaced with running_count[_n-1], whether or not third_obs=0
    
    clear
    set obs 10
    gen third_obs = (_n==3)
    gen running_count = 20000000  if _n==1
    replace running_count = running_count[_n-1] +third_obs[_n]  if missing(running_count)
    tab running_count /*shows only 1 unique value*/
    
    
    * When initial value set to 10 million, does work
    
    clear
    set obs 10
    gen third_obs = (_n==3)
    gen running_count = 10000000  if _n==1
    replace running_count = running_count[_n-1] +third_obs[_n]  if missing(running_count)
    tab running_count /*shows 2 unique values*/

    I don't know what to think, especially since 10 million and 20 million have the same number of digits. Any help is appreciated.

  • #2
    The number of decimal digits is less important than whether there are enough bits in your chosen variable type to hold values with full accuracy. You need to use a double, not the default float.
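
    As a minimal sketch of that suggestion, here is the example from #1 with the only change being that running_count is created as a double. The variable names are the ones from #1; nothing else is assumed.

    ```stata
    * Same setup as #1, but running_count is stored as a double
    clear
    set obs 10
    gen third_obs = (_n==3)
    gen double running_count = 20000000 if _n==1
    * replace works through the observations in order, so each row sees
    * the value already filled in for the row before it
    replace running_count = running_count[_n-1] + third_obs if missing(running_count)
    * Observations 1-2 stay at 20,000,000; from observation 3 on, the count is 20,000,001
    assert running_count == 20000000 in 1/2
    assert running_count == 20000001 in 3/10
    tab running_count
    ```

    If most of your variables need this, set type double changes the default storage type for newly generated variables, at the cost of 8 bytes per value instead of 4.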


    • #3
      Here are the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

      You will see that 10,000,000 worked for you because a float holds every integer exactly up to 16,777,216. Beyond that, only some integers can be represented: 20,000,000 itself is stored exactly, but 20,000,001 is not, so adding 1 rounds straight back to 20,000,000.
      Type     Bits   Minimum                        Maximum
      byte        7   -127                           100
      int        15   -32,767                        32,740
      long       31   -2,147,483,647                 2,147,483,620
      float      24   -16,777,216                    16,777,216
      double     53   -9,007,199,254,740,992         9,007,199,254,740,992
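
      The float limit can be checked directly with Stata's float() function, which rounds its argument to float precision. This is only a small sketch; the particular test values are chosen for illustration.

      ```stata
      * Below 2^24 = 16,777,216 every integer survives rounding to float
      assert float(10000001) == 10000001
      assert float(16777216) == 16777216
      * 16,777,217 is the first positive integer a float cannot hold
      assert float(16777217) != 16777217
      * Above the limit even integers can still be exact, odd ones cannot
      assert float(20000000) == 20000000
      assert float(20000001) != 20000001
      * Show what 20,000,001 becomes when stored as a float
      display %12.0f float(20000001)
      ```

      All five assertions pass silently, which is why the example in #1 never moves off 20,000,000 when running_count is a float.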


      • #4
        It seems that a long would serve also for #1.
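
        A sketch of that alternative, assuming the counts are always whole numbers and stay within the range of a long (4 bytes per value, the same as a float):

        ```stata
        * Same setup as #1, but running_count is stored as a long
        clear
        set obs 10
        gen third_obs = (_n==3)
        gen long running_count = 20000000 if _n==1
        replace running_count = running_count[_n-1] + third_obs if missing(running_count)
        * The increment at observation 3 is now kept
        assert running_count == 20000001 in 3/10
        tab running_count
        ```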


        • #5
          Thanks so much to both of you!

          For any future readers with similar problems, I also found this Stata Blog post helpful for understanding what I was doing wrong: https://blog.stata.com/2012/04/02/th...-to-precision/
