Issue with running sum

Caterina Brest Lopez

Join Date: Dec 2019
Posts: 46

23 Jun 2020, 16:25

Hi all!

I am having trouble with the following code (a sample dataset with the two relevant variables is attached):

Code:

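* First difference of ___startinc; the first observation has no predecessor, so set it to 0
gen double ___diff_y = ___startinc - ___startinc[_n-1]
recode ___diff_y (.=0)
* Running sum of the weights
gen double ___cum_w = sum(w)
* Change times the cumulative weight of the previous observation
gen double ___diff_y_i = ___diff_y*___cum_w[_n-1]
recode ___diff_y_i (.=0)
* Running sum of the weighted changes
gen double ___cum_diff_y_i = sum(___diff_y_i)
* Check: differencing the running sum should recover ___diff_y_i
gen double difference = ___cum_diff_y_i[_n] - ___cum_diff_y_i[_n-1]
gen double comparing = difference - ___diff_y_i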

More specifically, the issue arises when I create the variable "___cum_diff_y_i". Since it is (supposed to be) the running sum of "___diff_y_i", the variable "difference" should by construction be exactly equal to "___diff_y_i". However, I created the variable "comparing" to check that this is actually true and, to my surprise, the two differ in many cases, although only by very small amounts. Below is a subsample of the data set showing what I get.

___diff_y_i	___cum_diff_y_i	difference	comparing
0	0
-8.1E+07	-8.1E+07	-8.1E+07	0
-2784104	-8.4E+07	-2784104	-9.3E-10
-9907621	-9.4E+07	-9907621	7.45E-09
-1.2E+07	-1.1E+08	-1.2E+07	-5.6E-09
-2.1E+07	-1.3E+08	-2.1E+07	-3.7E-09
-3.4E+07	-1.6E+08	-3.4E+07	-1.5E-08
-1.1E+07	-1.7E+08	-1.1E+07	3.73E-09
-2869477	-1.8E+08	-2869477	-1.2E-08
-1.6E+07	-1.9E+08	-1.6E+07	1.3E-08
-1.1E+07	-2E+08	-1.1E+07	7.45E-09
-8204058	-2.1E+08	-8204058	-1.1E-08
-8377956	-2.2E+08	-8377956	-5.6E-09
-1.5E+07	-2.3E+08	-1.5E+07	1.3E-08
-5970314	-2.4E+08	-5970314	-1E-08
-2E+07	-2.6E+08	-2E+07	-3.7E-09
-5340540	-2.7E+08	-5340540	3.73E-09
-1849475	-2.7E+08	-1849475	1.16E-08
-5497057	-2.7E+08	-5497057	4.66E-09
-6667035	-2.8E+08	-6667035	1.3E-08
-7224984	-2.9E+08	-7224984	0
-4901580	-2.9E+08	-4901580	1.96E-08
-4280221	-3E+08	-4280221	-7.5E-09

First, I tried calculating the running sum manually, in case the problem was in the sum() function, but I got the same result. Then I thought the problem might be the precision of the numbers. However, adding double to the gen commands did not change anything either. So I was wondering what else the problem could be. Any help would be extremely welcome!

Attached Files

Sampledata.dta (333.5 KB, 1 view)

Last edited by Caterina Brest Lopez; 23 Jun 2020, 16:29.

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#2

23 Jun 2020, 19:00

To explain very crudely, Stata operates on floating point numbers, and when you do math on FPNs there is a certain (and fixed) accuracy of calculations.
This means abs(A?B - A?B) < epsilon, where ? is any operation, such as addition or multiplication.
Theoretically, you should not even expect A?B to be the same each time you compute it (it usually stays the same on a given machine, but it may change if you re-run the calculation, e.g., on a 64-bit instead of a 32-bit machine, or on a HiLo instead of a LoHi machine).
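
To make the tolerance idea concrete, here is a minimal Stata sketch (not part of the original posts). The five values, the variable names x, cum, back, gap, and agree, and the 1e-12 threshold are all made up for illustration; reldif() is Stata's built-in relative-difference function, |x-y|/(|y|+1).

```stata
* Illustrative data only: a few large values with fractional parts
clear
input double x
0
-81000000.1
-2784104.3
-9907621.7
-12000000.9
end

* Running sum, then difference it back, as in post #1
gen double cum  = sum(x)
gen double back = cum - cum[_n-1]

* Exact gap between the round trip and the original value (may be tiny but nonzero)
gen double gap = back - x

* Tolerance-based check instead of exact equality;
* the first observation is skipped because back is missing there
gen byte agree = reldif(back, x) < 1e-12 if _n > 1

list, clean
count if agree == 0
```

With a relative tolerance like this, rows that differ only by rounding error in the last digits are treated as equal, which is usually the comparison one actually wants.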

For doubles, about 16 significant decimal digits of a number are represented accurately.
It is easy to see from your data that the numbers are large (Xe+07 to Xe+08) while the differences are small (Ye-08 or less).
The gap between e+07 and e-08 is about 15-16 orders of magnitude, and that is the promised precision.
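
The arithmetic behind that estimate can be checked directly in Stata. This is only a rough sketch: it assumes the usual IEEE 754 double layout with a 53-bit significand, and the value 1e8 is picked merely because it is close in size to the running sums shown in post #1.

```stata
* Relative spacing of doubles: c(epsdouble) is 2^-52, about 2.22e-16
display c(epsdouble)

* Scaled to a number of size 1e8, the spacing is on the order of 1e-08,
* the same scale as the values in the "comparing" column
display 1e8 * c(epsdouble)

* Exact spacing between adjacent doubles in the range [2^26, 2^27),
* which contains 1e8: 2^-26, about 1.49e-08
display 2^-26

* Adding a quarter of that spacing to 1e8 is lost to rounding,
* so the comparison below displays 1 (true)
display 1e8 + 2^-28 == 1e8
```

Anything much smaller than the spacing at the magnitude of the running sum cannot be stored in it, so differencing the sum can only recover each term up to an error of that size.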

A much more accurate and detailed explanation of precision is in Bill Gould's "The Penultimate Guide to Precision" on the Stata Blog.
Caterina Brest Lopez

Join Date: Dec 2019

Posts: 46
#3

24 Jun 2020, 11:43

Thank you, Sergiy Radyakin, for your explanation and the reference!
