The modulus -mod()- function is giving me negative values. Is this a bug?

Joro Kolev

Join Date: Aug 2018
Posts: 3050

06 Jan 2021, 05:33

The help for the modulus function, -mod()-, reads:

mod(x,y)
Description: the modulus of x with respect to y
mod(x,y) = x - y*floor(x/y)
mod(x,0) = .
Domain x: -8e+307 to 8e+307
Domain y: 0 to 8e+307
Range: 0 to 8e+307

However, it is giving me negative values, i.e., values outside the documented Range. As far as I can see, I am not doing anything outside the limits of its Domain:

Code:

. clear

. 
. sca Seed = 2147483647

. 
. set obs 10000000
number of observations (_N) was 0, now 10,000,000

. 
. gen double x = _n*Seed^2

. 
. summ x

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25

. 
. replace x = mod(x, 1000000000)
(10,000,000 real changes made)

. 
. summ x

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09

William Lisowski

Join Date: Dec 2014
Posts: 10150

06 Jan 2021, 13:48

I agree with your interpretation. The following example suggests that the code run by Stata does not exactly match the formula in the documentation. This would be worth referring to Stata Technical Services.

Code:

. clear

. scalar Seed = 2147483647

. set obs 10000000
number of observations (_N) was 0, now 10,000,000

. gen double x = _n*Seed^2

. gen double y = 1000000000

. gen double fxy = floor(x/y)

. gen double mxy = y * fxy

. gen double zv = mod(x,y)

. gen double zc = mod(x,1000000000)

. summ x y fxy mxy

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
           y | 10,000,000    1.00e+09           0   1.00e+09   1.00e+09
         fxy | 10,000,000    2.31e+16    1.33e+16   4.61e+09   4.61e+16
         mxy | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25

. summ zv zc

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          zv | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09
          zc | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09

William Lisowski

Join Date: Dec 2014
Posts: 10150

06 Jan 2021, 15:32

I am embarrassed to admit that the analysis in post #2 was incomplete and misleading. Here is a corrected analysis; discussion follows.

Code:

. clear

. scalar Seed = 2147483647

. set obs 10000000
number of observations (_N) was 0, now 10,000,000

. generate double x = _n*Seed^2

. generate double y = 1000000000

. generate double fxy = floor(x/y)

. generate double pxy = y * fxy

. generate double mxy = x - pxy

. generate double zv = mod(x,y)

. generate double zc = mod(x,1000000000)

. summarize x y fxy pxy mxy

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
           y | 10,000,000    1.00e+09           0   1.00e+09   1.00e+09
         fxy | 10,000,000    2.31e+16    1.33e+16   4.61e+09   4.61e+16
         pxy | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
         mxy | 10,000,000    3.31e+07    7.62e+08  -4.29e+09   4.29e+09

. summarize zv zc

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          zv | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09
          zc | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09

. generate neg = x<pxy

. tab neg

        neg |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |  9,744,294       97.44       97.44
          1 |    255,706        2.56      100.00
------------+-----------------------------------
      Total | 10,000,000      100.00

. generate tolist = cond(neg==1,sum(neg),0)

. format %21x x pxy

. list x pxy if inrange(tolist,1,10)

          +-----------------------------------------------+
          |                     x                     pxy |
          |-----------------------------------------------|
   15345. | +1.df87fff881e00X+04b   +1.df87fff881e01X+04b |
   15564. | +1.e65ffff866800X+04b   +1.e65ffff866801X+04b |
   15783. | +1.ed37fff84b200X+04b   +1.ed37fff84b201X+04b |
   16002. | +1.f40ffff82fc00X+04b   +1.f40ffff82fc01X+04b |
   16221. | +1.fae7fff814600X+04b   +1.fae7fff814601X+04b |
          |-----------------------------------------------|
   30690. | +1.df87fff881e00X+04c   +1.df87fff881e01X+04c |
   30909. | +1.e2f3fff874300X+04c   +1.e2f3fff874301X+04c |
   31128. | +1.e65ffff866800X+04c   +1.e65ffff866801X+04c |
   31347. | +1.e9cbfff858d00X+04c   +1.e9cbfff858d01X+04c |
   31566. | +1.ed37fff84b200X+04c   +1.ed37fff84b201X+04c |
          +-----------------------------------------------+

What we have is the result of precision errors. Dividing x by 1,000,000,000 is not always exact: sometimes the floor of the result is slightly larger than the true value, enough that when it is multiplied by 1,000,000,000 the product exceeds the original value of x.

Here are the results of rerunning the code but replacing 1,000,000,000 with 8,388,608 (2²³). Because this divisor is a power of 2, the division is exact.

Code:

. summarize x y fxy pxy mxy

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
           y | 10,000,000     8388608           0    8388608    8388608
         fxy | 10,000,000    2.75e+18    1.59e+18   5.50e+11   5.50e+18
         pxy | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
         mxy | 10,000,000           0           0          0          0

. summarize zv zc

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          zv | 10,000,000           0           0          0          0
          zc | 10,000,000           0           0          0          0
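
The rounding described above can also be checked on a single observation. The following is only a sketch, meant to be run from a do-file: it redoes observation 15345 from the listing with scalars instead of variables, using the divisor 1000000000 from the code. The scalar names s_x, s_y, s_q and s_p are made up for this illustration, and it assumes that scalar arithmetic is carried out in double precision just as it is for the double variables.

```stata
* Sketch: redo observation 15345 from the listing with scalars.
* The scalar names s_x, s_y, s_q, s_p are illustrative only.
scalar s_x = 15345*2147483647^2   // same expression as _n*Seed^2 at _n = 15345
scalar s_y = 1000000000
scalar s_q = floor(s_x/s_y)       // the quotient is rounded to a double before floor()
scalar s_p = s_y*s_q              // the product is rounded to a double again
display %21x s_x
display %21x s_p
display %21.0fc s_x - s_p         // negative when s_p is rounded above s_x
display %21.0fc mod(s_x, s_y)     // compare with the built-in function
```

If the scalar arithmetic matches the variable arithmetic, the two %21x lines should differ only in the last hexadecimal digit, as in the listing, which at this magnitude is a difference of 2^23.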

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

07 Jan 2021, 03:22

Thank you very much, William Lisowski, for uncovering what is going on here and resolving the mystery.

Yet I think that if a function has a domain D that supposedly maps to a range R, any supplied legitimate value in D should result in the function producing a legitimate value in R.

Hence I will write to Stata support, and I will also forward them your explanation of why the problem occurs.

William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

07 Jan 2021, 10:15

On further reflection, I think there is no good solution to the example you present. Presumably you are trying to extract the rightmost seven digits of the variable x in each of your 10,000,000 observations. Your problem lies in the following comparisons based on the first observation, where x = 1*Seed².

Code:

Seed^2
     4,611,686,014,132,420,609
largest decimal integer precisely representable in a Stata double variable
         9,007,199,254,740,992

. display %30.0fc 2147483647^2
     4,611,686,014,132,420,608

. display %30.0fc 10000000*floor(2147483647^2/10000000)
     4,611,686,014,129,999,872

. display %30.0fc 2147483647^2 - 10000000*floor(2147483647^2/10000000)
                     2,420,736

The third number shows us that Stata starts off with a misrepresentation of the number central to your calculations.

The fourth number does not end with 7 zeroes as we might have expected to obtain from multiplying an integer floor() by 10,000,000.

The fifth number is neither 2,420,609 nor 2,420,608, the last seven digits of the value of 2147483647^2 represented in the first and third numbers.
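
The third number can be accounted for exactly. Since 2147483647 = 2^31 - 1, its square is 2^62 - 2^32 + 1, and doubles of that magnitude are spaced 512 apart, so the trailing +1 cannot be stored. A minimal sketch of that check, assuming only the built-in display command:

```stata
* Sketch: the stored value of 2147483647^2 is 2^62 - 2^32; the exact +1 is lost.
display %30.0fc 2^62 - 2^32
display %30.0fc 2147483647^2
display 2147483647^2 == 2^62 - 2^32    // 1 means the two doubles are identical
display %21x 2147483647^2              // hexadecimal view of the stored double
```

The first two lines should show the same value as the third number above, since 2^62 - 2^32 is itself exactly representable.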

The problem is that you have exceeded the range of the precise integer calculation required by your example. Stata has no inherent way of knowing that you require precise integer calculations when you have declared a floating-point (float or double) variable. Had you instead worked with long variables, Stata would have detected the problem.

Code:

. clear

. set obs 1
number of observations (_N) was 0, now 1

. generate long Seed = 2147483647

. generate long x = _n*Seed^2
(1 missing value generated)
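
For double variables, a similar warning has to be added by hand. The following is only a sketch of one possible guard, not a feature of mod() itself: before taking the modulus, it counts values beyond 2^53, the limit quoted above, past which a double no longer holds every integer exactly. The three-observation dataset and the variable name x are illustrative.

```stata
* Sketch: flag values too large for exact integer arithmetic in a double.
* The dataset and the variable name x are illustrative only.
clear
set obs 3
generate double x = _n*2147483647^2
display %21.0fc 2^53
* missing values compare as larger than any number, so exclude them explicitly
count if abs(x) > 2^53 & !missing(x)
* a nonzero count means mod(x, y) cannot be relied on to return exact digits
```

Here all three values exceed 2^53, so the count is 3.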