
  • The modulus -mod()- function is giving me negative values. Is this a bug?

    The help for the modulus function, -mod()- reads:

    mod(x,y)
    Description: the modulus of x with respect to y
    mod(x,y) = x - y*floor(x/y)
    mod(x,0) = .
    Domain x: -8e+307 to 8e+307
    Domain y: 0 to 8e+307
    Range: 0 to 8e+307

    However, it is giving me negative values, i.e., results outside the documented Range, and as far as I can see my arguments are within the limits of its Domain:

    Code:
    . clear
    
    . 
    . sca Seed = 2147483647
    
    . 
    . set obs 10000000
    number of observations (_N) was 0, now 10,000,000
    
    . 
    . gen double x = _n*Seed^2
    
    . 
    . summ x
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
    
    . 
    . replace x = mod(x, 1000000000)
    (10,000,000 real changes made)
    
    . 
    . summ x
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               x | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09

  • #2
    I agree with your interpretation. The following example suggests that the code run by Stata does not identically match the code in the documentation. This would be worth referring to Stata Technical Services.
    Code:
    . clear
    
    . scalar Seed = 2147483647
    
    . set obs 10000000
    number of observations (_N) was 0, now 10,000,000
    
    . gen double x = _n*Seed^2
    
    . gen double y = 1000000000
    
    . gen double fxy = floor(x/y)
    
    . gen double mxy = y * fxy
    
    . gen double zv = mod(x,y)
    
    . gen double zc = mod(x,1000000000)
    
    . summ x y fxy mxy
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
               y | 10,000,000    1.00e+09           0   1.00e+09   1.00e+09
             fxy | 10,000,000    2.31e+16    1.33e+16   4.61e+09   4.61e+16
             mxy | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
    
    . summ zv zc
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              zv | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09
              zc | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09


    • #3
      I am embarrassed to admit that the analysis in post #2 was incomplete and misleading. Here is a corrected analysis; discussion follows.
      Code:
      . clear
      
      . scalar Seed = 2147483647
      
      . set obs 10000000
      number of observations (_N) was 0, now 10,000,000
      
      . generate double x = _n*Seed^2
      
      . generate double y = 1000000000
      
      . generate double fxy = floor(x/y)
      
      . generate double pxy = y * fxy
      
      . generate double mxy = x - pxy
      
      . generate double zv = mod(x,y)
      
      . generate double zc = mod(x,1000000000)
      
      . summarize x y fxy pxy mxy
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                 x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
                 y | 10,000,000    1.00e+09           0   1.00e+09   1.00e+09
               fxy | 10,000,000    2.31e+16    1.33e+16   4.61e+09   4.61e+16
               pxy | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
               mxy | 10,000,000    3.31e+07    7.62e+08  -4.29e+09   4.29e+09
      
      . summarize zv zc
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                zv | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09
                zc | 10,000,000    5.86e+07    6.72e+08  -3.29e+09   4.29e+09
      
      . generate neg = x<pxy
      
      . tab neg
      
              neg |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |  9,744,294       97.44       97.44
                1 |    255,706        2.56      100.00
      ------------+-----------------------------------
            Total | 10,000,000      100.00
      
      . generate tolist = cond(neg==1,sum(neg),0)
      
      . format %21x x pxy
      
      . list x pxy if inrange(tolist,1,10)
      
                +-----------------------------------------------+
                |                     x                     pxy |
                |-----------------------------------------------|
         15345. | +1.df87fff881e00X+04b   +1.df87fff881e01X+04b |
         15564. | +1.e65ffff866800X+04b   +1.e65ffff866801X+04b |
         15783. | +1.ed37fff84b200X+04b   +1.ed37fff84b201X+04b |
         16002. | +1.f40ffff82fc00X+04b   +1.f40ffff82fc01X+04b |
         16221. | +1.fae7fff814600X+04b   +1.fae7fff814601X+04b |
                |-----------------------------------------------|
         30690. | +1.df87fff881e00X+04c   +1.df87fff881e01X+04c |
         30909. | +1.e2f3fff874300X+04c   +1.e2f3fff874301X+04c |
         31128. | +1.e65ffff866800X+04c   +1.e65ffff866801X+04c |
         31347. | +1.e9cbfff858d00X+04c   +1.e9cbfff858d01X+04c |
         31566. | +1.ed37fff84b200X+04c   +1.ed37fff84b201X+04c |
                +-----------------------------------------------+
      What we have is the result of precision errors. Dividing x by 1,000,000,000 is not exact in double precision. Sometimes the rounded quotient lands on the next integer, so its floor is one larger than the true floor; when that floor is multiplied by 1,000,000,000 and the product is rounded again, the result can be greater than the original value of x.
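
      The mechanism can be checked by hand on a single value. The following is a minimal do-file sketch, assuming observation 15345 (the first one in the listing above) and scalars, which Stata stores in double precision. The scalar names xs, ys, qs and ps are illustrative and not part of the code above.

      ```stata
      * Reproduce x, floor(x/y) and y*floor(x/y) for observation 15345 only
      scalar xs = 15345*2147483647^2             // same expression as _n*Seed^2
      scalar ys = 1000000000
      scalar qs = floor(scalar(xs)/scalar(ys))   // quotient is rounded to double, then floored
      scalar ps = scalar(ys)*scalar(qs)          // product is rounded to double again

      display %21x scalar(xs)
      display %21x scalar(ps)
      display %15.0fc scalar(xs) - scalar(ps)    // negative whenever ps was rounded above xs
      ```

      If this reproduces the listing, ps shows a hexadecimal fraction one unit above that of xs. With an exponent of 2^75 (X+04b) and a 52-bit fraction, one unit in the last place is 2^23 = 8,388,608, so the last line should display -8,388,608.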

      Here are the results of rerunning the code but replacing 1,000,000,000 with 8,388,608 (2^23). Because this divisor is a power of two, the division is exact.
      Code:
      . summarize x y fxy pxy mxy
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                 x | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
                 y | 10,000,000     8388608           0    8388608    8388608
               fxy | 10,000,000    2.75e+18    1.59e+18   5.50e+11   5.50e+18
               pxy | 10,000,000    2.31e+25    1.33e+25   4.61e+18   4.61e+25
               mxy | 10,000,000           0           0          0          0
      
      . summarize zv zc
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                zv | 10,000,000           0           0          0          0
                zc | 10,000,000           0           0          0          0
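
      Why the power-of-two divisor behaves differently can be seen with the same %21x format used earlier. This is a small sketch on one value; the scalar name xs is illustrative and not part of the code above.

      ```stata
      * Compare the binary representation of the two quotients
      scalar xs = 15345*2147483647^2
      display %21x scalar(xs)
      display %21x scalar(xs)/8388608       // divide by 2^23
      display %21x scalar(xs)/1000000000    // divide by 10^9
      ```

      Dividing by 2^23 only lowers the binary exponent by 23 (0x17), so the first two lines should show the same hexadecimal fraction. The quotient by 10^9 cannot be represented exactly in a double and has to be rounded, which is where the error enters.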


      • #4
        Thank you very much, William Lisowski, for uncovering what is going on here and resolving the mystery.

        Yet I think that if a function has a domain D that supposedly maps to a range R, any supplied legitimate value in D should result in the function producing a legitimate value in R.

        Hence I will write to Stata support, and I will also forward them your explanation of why the problem occurs.


        • #5
          On further reflection, I think there is no good solution to the example you present. Presumably you are trying to extract the rightmost seven digits of the variable x in each of your 10,000,000 observations. Your problem lies in the following comparisons based on the first observation, where x = 1*Seed^2.
          Code:
          Seed^2
           4,611,686,014,132,420,609
          
          largest decimal integer precisely representable in a Stata double variable
               9,007,199,254,740,992
          
          . display %30.0fc 2147483647^2
           4,611,686,014,132,420,608
          
          . display %30.0fc 10000000*floor(2147483647^2/10000000)
           4,611,686,014,129,999,872
          
          . display %30.0fc 2147483647^2 - 10000000*floor(2147483647^2/10000000)
                           2,420,736
          The third number shows us that Stata starts off with a misrepresentation of the number central to your calculations.

          The fourth number does not end with 7 zeroes as we might have expected to obtain from multiplying an integer floor() by 10,000,000.

          The fifth number is neither 2,420,609 nor 2,420,608, the last seven digits of the value of 2147483647^2 represented in the first and third numbers.

          The problem is that you have exceeded the range of the precise integer calculation required by your example. Stata has no inherent way of understanding that you require precise integer calculations if you have declared a floating point (float or double) variable. Had you instead worked with long variables the problem would have been detected by Stata.
          Code:
          . clear
          
          . set obs 1
          number of observations (_N) was 0, now 1
          
          . generate long Seed = 2147483647
          
          . generate long x = _n*Seed^2
          (1 missing value generated)
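
          If the goal really is the last seven digits of _n*Seed^2, one possible workaround is to reduce modulo 10,000,000 before each multiplication, so that no intermediate result exceeds 9,007,199,254,740,992. This is a do-file sketch under that assumption, not an official recommendation; the names M, s, s2 and r are illustrative.

          ```stata
          clear
          set obs 10000000
          scalar M  = 10000000                                // modulus: keep the last seven digits
          scalar s  = mod(2147483647, scalar(M))              // Seed mod M = 7483647
          scalar s2 = mod(scalar(s)*scalar(s), scalar(M))     // s*s < 1e14, exact in a double
          generate double r = mod(_n*scalar(s2), scalar(M))   // _n*s2 < 1e14, still exact
          assert inrange(r, 0, scalar(M) - 1)                 // every result is inside the documented Range
          assert r[1] == 2420609                              // last seven digits of Seed^2
          ```

          The second assert checks the first observation against the exact value 4,611,686,014,132,420,609 shown above. For the modulus 1,000,000,000 used in the first post this simple version is not enough: Seed mod M is 147,483,647 and its square (about 2.2e16) already exceeds the exact integer range of a double, so that multiplication would itself have to be split further.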

