  • Issue with running sum

    Hi all!

    I have problems running the following code (please find attached a sample data with the two relevant variables I'm using):

    Code:
    gen double ___diff_y = ___startinc - ___startinc[_n-1]
    recode ___diff_y (.=0)
    gen double ___cum_w = sum(w)
    gen double ___diff_y_i = ___diff_y*___cum_w[_n-1]
    recode ___diff_y_i (.=0)
    gen double ___cum_diff_y_i = sum(___diff_y_i)
    gen double difference=___cum_diff_y_i[_n]-___cum_diff_y_i[_n-1]
    gen double comparing=difference-___diff_y_i
    More specifically, the issue arises when I create the variable "___cum_diff_y_i". Since it is (supposed to be) the running sum of "___diff_y_i", by construction the variable "difference" should be exactly the same as "___diff_y_i". However, I created the variable "comparing" to check that this was actually true and, to my surprise, there are many cases in which they differ, although only by very small amounts. Below is a subsample of the data set showing what I'm getting.


    ___diff_y_i  ___cum_diff_y_i  difference  comparing
    0            0
    -8.1E+07     -8.1E+07         -8.1E+07    0
    -2784104     -8.4E+07         -2784104    -9.3E-10
    -9907621     -9.4E+07         -9907621    7.45E-09
    -1.2E+07     -1.1E+08         -1.2E+07    -5.6E-09
    -2.1E+07     -1.3E+08         -2.1E+07    -3.7E-09
    -3.4E+07     -1.6E+08         -3.4E+07    -1.5E-08
    -1.1E+07     -1.7E+08         -1.1E+07    3.73E-09
    -2869477     -1.8E+08         -2869477    -1.2E-08
    -1.6E+07     -1.9E+08         -1.6E+07    1.3E-08
    -1.1E+07     -2E+08           -1.1E+07    7.45E-09
    -8204058     -2.1E+08         -8204058    -1.1E-08
    -8377956     -2.2E+08         -8377956    -5.6E-09
    -1.5E+07     -2.3E+08         -1.5E+07    1.3E-08
    -5970314     -2.4E+08         -5970314    -1E-08
    -2E+07       -2.6E+08         -2E+07      -3.7E-09
    -5340540     -2.7E+08         -5340540    3.73E-09
    -1849475     -2.7E+08         -1849475    1.16E-08
    -5497057     -2.7E+08         -5497057    4.66E-09
    -6667035     -2.8E+08         -6667035    1.3E-08
    -7224984     -2.9E+08         -7224984    0
    -4901580     -2.9E+08         -4901580    1.96E-08
    -4280221     -3E+08           -4280221    -7.5E-09
    First, I tried calculating the running sum manually, in case the problem was in the sum() function, but I got the same result. Then I thought that the problem might be the precision of the numbers. However, adding double to the gen commands did not change the result either. So I was wondering what else could be the problem. Any help would be extremely welcome!
    Last edited by Caterina Brest Lopez; 23 Jun 2020, 16:29.

  • #2
    To explain very crudely: Stata operates on floating-point numbers, and arithmetic on floating-point numbers has a certain (and fixed) accuracy.
    This means abs(computed(A?B) - exact(A?B)) < epsilon, where ? is any operation, such as addition or multiplication, and epsilon is small relative to the size of the result.
    Theoretically, you should not even expect A?B to give the same result every time you run it. In practice it commonly remains constant, but it may change if you re-run it on a different platform, e.g. moving from 32-bit to 64-bit or from a LoHi to a HiLo machine.
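
    As a minimal illustration (a sketch added for clarity, not taken from the posted data), the classic case below shows one such rounding: 0.1, 0.2 and 0.3 cannot be stored exactly in binary, so the computed sum is very close to 0.3 but not equal to it. The tolerance 1e-15 is an arbitrary value chosen for this example.

```stata
* Show the stored values with 17 decimal places
display %21.17f 0.1 + 0.2
display %21.17f 0.3

* Exact equality fails: this displays 0 (false)
display 0.1 + 0.2 == 0.3

* Comparison within a small tolerance succeeds: this displays 1 (true)
display abs((0.1 + 0.2) - 0.3) < 1e-15
```

    The same reasoning applies to every gen and replace statement: each result is rounded to the nearest representable double before it is stored.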

    Doubles represent a number accurately to about 16 significant digits.
    It is easy to see from your data that the numbers are large (Xe+07 to Xe+08) and the discrepancies are small (Ye-08 or smaller).
    The gap between e+07 and e-08 is about 15-16 digits, and that is the promised precision.
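
    The effect can be reproduced with a few lines. The sketch below uses hypothetical values of roughly the same magnitude as the posted data (they are not the poster's numbers) and the hypothetical variable names x, cum, diff and err. It then checks the discrepancy against a tolerance scaled to the running sum instead of testing for exact equality. The factor 1e-15 is an assumption chosen to sit comfortably above double-precision rounding error. Note that the posted discrepancy 7.45E-09 matches 2^(-27), which is half the spacing between adjacent doubles near 1e+08.

```stata
clear
set obs 4

* Hypothetical increments, similar in size to ___diff_y_i
gen double x = 0 in 1
replace x = -81000000.3 in 2
replace x = -2784104.1 in 3
replace x = -9907621.7 in 4

* Running sum, then recover each increment by differencing
gen double cum  = sum(x)
gen double diff = cum - cum[_n-1]
gen double err  = diff - x

format x cum diff %18.0g
format err %9.2e
list

* Half the spacing between adjacent doubles near 1e+08
display 2^(-27)

* Tolerance-based check: passes even where err is not exactly zero
assert abs(err) <= 1e-15 * abs(cum) if _n > 1
```

    Any nonzero values of err here are rounding noise of the kind seen in "comparing", so a check of this sort is safer than asserting that difference equals ___diff_y_i exactly.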

    A much more accurate and detailed explanation of precision is in Bill Gould's "The Penultimate Guide to Precision" on the Stata Blog.


    • #3
      Thank you, Sergiy Radyakin, for your explanation and the reference!
