  • Decimal Points precisions

    I have noticed a peculiar pattern in the decimal places when one number is divided by another. In the following example, I generate a variable ri, rank its values, and convert those ranks into proportions. Since there are 10 observations, and the rank function assigns each of them a value from 1 to 10, dividing each rank by 10 should yield values from 0.1 to 1. However, many of these values are not exact to one decimal place: when you double-click on them in the Data Editor, the stored values are different. For example, the third value shows .30000001 instead of 0.3.
    Code:
    clear
    set obs 10
    gen ri=runiform()
    egen rank=rank(ri)
    egen N=count(ri)
    gen pc=rank/N
    When I test the value for equality, as one would in an if qualifier, the test fails. For example,
    Code:
    sort pc
    assert pc==.3 in 3
    assertion is false
    r(9);
    The problem is not unique to rank; it is general in nature. For example,
    Code:
    clear
    input float ri
    1 
    2 
    3 
    4 
    5 
    6 
    7 
    8 
    9 
    10 
    end
    gen pc=ri/10
    Last edited by Attaullah Shah; 24 Jul 2015, 04:52.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

  • #2
    You are quite right. This problem is general in nature. A one-sentence summary is that users sometimes see the consequences of the fact that Stata necessarily uses binary approximations. 0.1 is the canonical example: it is an exact decimal, but there is no exact binary equivalent.
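
A minimal sketch of that point, assuming nothing beyond Stata's built-in display command and the %21x and %23.18f formats already mentioned in this thread. Stata evaluates expressions in double precision, so even there the binary approximations show through:

```stata
* 0.1, 0.2 and 0.3 are each held as the nearest binary fraction,
* so the sum of the first two need not equal the third
display 0.1 + 0.2 == 0.3

* Ask for more digits than the default display format shows
display %23.18f 0.1

* The hexadecimal format shows exactly what is stored
display %21x 0.1
```

The first line displays 0 (false). The other two lines show that what is stored for 0.1 is a nearby binary fraction, not 0.1 itself.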

    For much, much more, see many posts in this forum under the heading precision and/or (for example)

    Search of official help files, FAQs, Examples, SJs, and STBs

    [U] Chapter 13.12 . . . . . . . . . . . . . Precision and problems therein
    (help precision)

    Blog . . . . . . . . . . . . . . . . . . The penultimate guide to precision
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
    4/12 http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/

    Blog . . . . . . . . . . . . . . . . . . . . Precision (yet again), part II
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
    6/11 http://blog.stata.com/2011/06/23/pre...again-part-ii/

    Blog . . . . . . . . . . . . . . . . . . . . Precision (yet again), part I
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
    6/11 http://blog.stata.com/2011/06/17/pre...-again-part-i/

    Blog . . . . . . . . . . . . . . . . . How to read the %21x format, part 2
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
    2/11 http://blog.stata.com/2011/02/10/how-to-read-the-percent-21x-format-part-2/

    FAQ . . . . . . . . . Comparing floating-point values (the float function)
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Wernow
    9/05 Why can't I compare two values that I know are equal?
    http://www.stata.com/support/faqs/data-management/comparing-floating-point-values/

    FAQ . . . . . . . . . . . . . . . . . . . Results of the mod(x,y) function
    . . . . . . . . . . . . . . . . . . . . . N. J. Cox and T. J. Steichen
    9/05 Why does the mod(x,y) function sometimes give
    puzzling results?
    Why is mod(0.3,0.1) not equal to 0?
    http://www.stata.com/support/faqs/data-management/mod-function/

    FAQ . . . . . . . . . . . . . . . . . The accuracy of the float data type
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
    5/01 How many significant digits are there in a float?
    http://www.stata.com/support/faqs/data-management/float-data-type/

    FAQ . . . . . . . . . Why am I losing precision with large whole numbers?
    . . . . . . . . . . . . . . . . . . UCLA Academic Technology Services
    7/08 http://www.ats.ucla.edu/stat/stata/faq/longid.htm

    SJ-8-2 pr0038 Mata Matters: Overflow, underflow & IEEE floating-point format
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. M. Linhart
    Q2/08 SJ 8(2):255--268 (no commands)
    focuses on underflow and overflow and details of how
    floating-point numbers are stored in the IEEE 754
    floating-point standard

    SJ-6-4 pr0025 . . . . . . . . . . . . . . . . . . . Mata matters: Precision
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
    Q4/06 SJ 6(4):550--560 (no commands)
    looks at programming implications of the floating-point,
    base-2 encoding that modern computers use

    SJ-6-2 dm0022 . Tip 33: Sweet sixteen: Hexadec. formats & precision problems
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q2/06 SJ 6(2):282--283 (no commands)
    tip for using hexadecimal formats to understand precision
    problems in Stata

    • #3
      Thanks, Nick. For less sophisticated users, I might suggest that the problem can be handled to some extent by using double instead of float. For example,
      Code:
      clear
      input float ri
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      end
      gen double pc=ri/10


      • #4
        I don't know precisely who qualifies as "less sophisticated" here.

        But "to some extent" is the right wording.

        In a way, the problem is with user perception. Using an appropriate display format is one answer for reducing puzzlement. Rounding isn't: round can't convert to exact decimals.
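
A short sketch of both halves of that remark. The variable names pc and pc_r are illustrative only, and the example assumes the default storage type is float (that is, set type double has not been issued):

```stata
clear
set obs 10
* Stored as float because no type is specified
gen pc = _n/10

* A display format changes what is shown, not what is stored
format pc %3.1f
list pc in 3

* round() returns the nearest value it can, which is then stored
* as a float: still a binary approximation, not the decimal 0.3
gen pc_r = round(pc, 0.1)
assert pc_r != 0.3
assert pc_r == float(0.3) in 3
```

The list output shows 0.3, which removes the puzzlement on screen. The rounded variable still cannot equal the double-precision literal 0.3 in any observation, because no float holds that value.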


        • #5
          I think the problem is more than "user perception". See the example,
          Code:
          clear
          input float ri
          1
          2
          3
          4
          5
          6
          7
          8
          9
          10
          end
          gen double pc=ri/10
          gen pc2=ri/10
          sort pc
          assert pc==.3 in 3
          assert pc2==.3 in 3
          The assert fails for pc2, so here I think it is Stata's perception that matters, not the user's.


          • #6
            "In a way" was my wording. As said, these matters have been discussed many times over and how to think about them explained repeatedly.


            • #7
              re: #5, it's not a question of perception; you just have to compare apples with apples

              Code:
              clear
              input float ri
              1
              2
              3
              4
              5
              6
              7
              8
              9
              10
              end
              gen double pc=ri/10
              gen pc2=ri/10
              sort pc
              assert pc==.3 in 3
              assert pc2==float(.3) in 3
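
Why float() makes the comparison work can be sketched as follows. The literal .3 is held as a double, while pc2 holds the float approximation, and float() rounds its argument to float precision so that both sides match. The tolerance check with reldif() at the end is an alternative not used in this thread, and the 1e-6 threshold is an arbitrary assumption for illustration:

```stata
clear
set obs 10
gen float pc2 = _n/10

* The same decimal, held at two different precisions
display %21x 0.3
display %21x float(0.3)
assert float(0.3) != 0.3

* Apples with apples: both sides at float precision
assert pc2 == float(0.3) in 3

* Alternative: accept equality within a relative tolerance
assert reldif(pc2, 0.3) < 1e-6 in 3
```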


              • #8
                Robert, you may have read through all the messages. My intent in the first message was clear: the fraction 3/10 should return 0.3, as basic mathematics leads us to expect, and I just wanted to get that from Stata. Using double when creating the new variable does that without further modification or extra type conversions.


                • #9
                  Attaullah: In turn we might ask whether you have read (some of) the references given earlier, as you still seem to be missing the fundamental point.

                  Using a double just gives you a better (binary) approximation to the decimal 0.3 than does a float. That it appears more satisfactory in output is mostly to the credit of default display formats. 0.3 is one of many simple exact decimals that cannot be matched by exact binary equivalents.

                  Consider these experiments:

                  . set obs 1
                  number of observations (_N) was 0, now 1

                  . gen myfloat = 0.3

                  . gen double mydouble = 0.3

                  . l

                       +--------------------+
                       | myfloat   mydouble |
                       |--------------------|
                    1. |      .3         .3 |
                       +--------------------+

                  . di myfloat[1]
                  .30000001

                  . di mydouble[1]
                  .3

                  . di %23.18f mydouble[1]
                  0.299999999999999990

                  . di %23.18f myfloat[1]
                  0.300000011920928960

                  So, 0.3 held as a double is really just a better binary approximation to 0.3, not 0.3 itself.
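
A further sketch of why a double only appears to solve the problem. The comparison against .3 succeeds when both sides arrive at the same double approximation, and fails as soon as the arithmetic takes a different route. The expressions are chosen for illustration only:

```stata
* 3/10 and the literal 0.3 round to the same double, so this holds
assert 3/10 == 0.3

* Three times the approximation to 0.1 lands on a neighbouring double
assert 0.1*3 != 0.3

* The hexadecimal format shows the two values differ in the last digit
display %21x 0.3
display %21x 0.1*3
```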

                  If you want perfect arithmetic to the first (second, third, ...) decimal place the only way to get it is to multiply by 10, 100, 1000, ..., work in integers and finally write your own display routines to emit strings with the decimal point shifted.
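
The integer approach can be sketched like this for one decimal place. The variable names tenths and shown are hypothetical, and the string expression is just one simple way to shift the decimal point for display:

```stata
clear
set obs 10
* Hold each value in units of 0.1, so 3 stands for 0.3
gen int tenths = _n

* Integer comparisons and sums are exact
assert tenths == 3 in 3
assert tenths[1] + tenths[2] == tenths[3]

* Build a display string with the decimal point shifted one place
gen str4 shown = string(floor(tenths/10)) + "." + string(mod(tenths, 10))
list tenths shown
```

The arithmetic stays exact because small integers are representable exactly in binary. Only the final string carries a decimal point.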

                  For almost no statistical purposes is that really needed. The advice for users who become puzzled by this is to learn to understand it and then to appreciate that it doesn't really matter anyway.

                  But your goal that 0.3 should be exactly that in Stata, reasonable though it sounds from understanding elementary mathematics, is in a strict sense impossible.
