Decimal Points precisions

Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#1


24 Jul 2015, 04:47

I have noticed a peculiar pattern in decimal values when one number is divided by another. In the following example, I generate a variable ri, rank its values, and convert the ranks into proportions. Since there are 10 observations and the rank function assigns each of them a value from 1 to 10, dividing each rank by 10 should yield values from 0.1 to 1. However, many of these values are not exact one-decimal numbers: when you double-click on them in the Data Editor, the values shown are different. For example, the third value shows .30000001 instead of 0.3.

Code:

clear
set obs 10
gen ri=runiform()
egen rank=rank(ri)
egen N=count(ri)
* pc is created as float, Stata's default storage type
gen pc=rank/N
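The stored values can be reproduced outside Stata. The following is a rough emulation in Python, not Stata. It assumes that a Stata float is a 4-byte IEEE 754 single-precision number, and that rank/N is computed in double precision and then rounded to float when stored. `as_float32` is a hypothetical helper name.

```python
import struct

def as_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value.

    Hypothetical helper: emulates storing a value in a Stata float variable.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

N = 10
for rank in range(1, N + 1):
    stored = as_float32(rank / N)      # quotient in double, then stored as float
    shown = "%.8g" % stored            # 8 significant digits expose the float error
    flag = "" if stored == rank / N else "  <- differs from the double nearest rank/N"
    print(f"{rank:2d}/{N} -> {shown}{flag}")
```

Only 5/10 and 10/10 are stored exactly, because 1/2 and 1 are the only values in the list with finite binary expansions. Values such as 1/10 and 2/10 merely look exact at 8 significant digits.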

When I then test that value with assert, the assertion fails. For example,

Code:

. sort pc

. assert pc==.3 in 3
assertion is false
r(9);

The problem is not unique to rank; it is general in nature. For example,

Code:

clear
input float ri
1
2
3
4
5
6
7
8
9
10
end
gen pc=ri/10

Last edited by Attaullah Shah; 24 Jul 2015, 04:52.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Nick Cox

Join Date: Mar 2014

Posts: 35610
#2

24 Jul 2015, 05:07

You are quite right. This problem is general in nature. A one-sentence summary is that users sometimes see the consequences of the fact that Stata necessarily uses binary approximations. 0.1 is the canonical example: it is an exact decimal, but there is no exact binary equivalent.
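The point that 0.1 is an exact decimal with no exact binary equivalent can be checked using exact rational arithmetic. The minimal sketch below is written in Python, not Stata. Python floats are IEEE doubles, like Stata's double, and the standard `fractions` module keeps every comparison exact. `is_exact_binary` is a hypothetical helper name.

```python
from fractions import Fraction

def is_exact_binary(num: int, den: int) -> bool:
    """True if num/den has a finite binary expansion.

    A reduced fraction terminates in base 2 exactly when its
    denominator is a power of two.
    """
    d = Fraction(num, den).denominator
    return (d & (d - 1)) == 0  # power-of-two test for positive integers

# Tenths that are exact in binary: only 5/10 = 1/2 and 10/10 = 1.
print([k for k in range(1, 11) if is_exact_binary(k, 10)])

# What a double actually holds when you type 0.1: the nearest binary fraction.
held = Fraction(0.1)
print(held)                       # 3602879701896397/36028797018963968
print(held == Fraction(1, 10))    # False
```

The denominator of the stored value is 2**55, so it can never equal 1/10, whose reduced denominator contains the factor 5.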

For much, much more, see many posts in this forum under the heading precision and/or (for example)

Search of official help files, FAQs, Examples, SJs, and STBs

[U] Chapter 13.12: Precision and problems therein (help precision)

Blog: The penultimate guide to precision. W. Gould, 4/12.
http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/

Blog: Precision (yet again), part II. W. Gould, 6/11.
http://blog.stata.com/2011/06/23/pre...again-part-ii/

Blog: Precision (yet again), part I. W. Gould, 6/11.
http://blog.stata.com/2011/06/17/pre...-again-part-i/

Blog: How to read the %21x format, part 2. W. Gould, 2/11.
http://blog.stata.com/2011/02/10/how-to-read-the-percent-21x-format-part-2/

FAQ: Comparing floating-point values (the float function). J. Wernow, 9/05.
Why can't I compare two values that I know are equal?
http://www.stata.com/support/faqs/data-management/comparing-floating-point-values/

FAQ: Results of the mod(x,y) function. N. J. Cox and T. J. Steichen, 9/05.
Why does the mod(x,y) function sometimes give puzzling results? Why is mod(0.3,0.1) not equal to 0?
http://www.stata.com/support/faqs/data-management/mod-function/

FAQ: The accuracy of the float data type. W. Gould, 5/01.
How many significant digits are there in a float?
http://www.stata.com/support/faqs/data-management/float-data-type/

FAQ: Why am I losing precision with large whole numbers? UCLA Academic Technology Services, 7/08.
http://www.ats.ucla.edu/stat/stata/faq/longid.htm

SJ-8-2 pr0038: Mata Matters: Overflow, underflow & IEEE floating-point format. J. M. Linhart, Q2/08.
SJ 8(2):255-268 (no commands). Focuses on underflow and overflow and details of how floating-point numbers are stored in the IEEE 754 floating-point standard.

SJ-6-4 pr0025: Mata matters: Precision. W. Gould, Q4/06.
SJ 6(4):550-560 (no commands). Looks at programming implications of the floating-point, base-2 encoding that modern computers use.

SJ-6-2 dm0022: Tip 33: Sweet sixteen: Hexadec. formats & precision problems. N. J. Cox, Q2/06.
SJ 6(2):282-283 (no commands). Tip for using hexadecimal formats to understand precision problems in Stata.
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#3

24 Jul 2015, 05:13

Thanks, Nick. For less sophisticated users, I might suggest that the problem can be handled to some extent by using double instead of float. For example,

Code:

clear
input float ri
1
2
3
4
5
6
7
8
9
10
end
gen double pc=ri/10

Nick Cox

Join Date: Mar 2014

Posts: 35610
#4

24 Jul 2015, 05:19

I don't know precisely who qualifies as "less sophisticated" here.

But "to some extent" is the right wording.

In a way, the problem is with user perception. Using an appropriate display format is one answer for reducing puzzlement. Rounding isn't: round can't convert to exact decimals.
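The contrast between a display format and rounding can be mimicked outside Stata. This is a Python sketch, not Stata code. It assumes that a Stata float is IEEE single precision. Python's `round` and `%` formatting stand in only as analogues of Stata's round() and of a display format such as %3.1f. `as_float32` is a hypothetical helper name.

```python
import struct

def as_float32(x: float) -> float:
    """Hypothetical helper: nearest IEEE single-precision value (Stata float storage)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

stored = as_float32(0.3)

# Rounding to one decimal and storing the result as float again lands on the
# very same binary value: no float is closer to 0.3 than the one already held.
rounded_back = as_float32(round(stored, 1))
print(rounded_back == stored)     # True

# A display format, by contrast, changes only what is printed.
print("%.1f" % stored)            # 0.3
print("%.8g" % stored)            # 0.30000001
```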
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#5

24 Jul 2015, 05:26

I think the problem is more than "user perception". See the example,

Code:

clear
input float ri
1
2
3
4
5
6
7
8
9
10
end
gen double pc=ri/10
gen pc2=ri/10
sort pc
assert pc==.3 in 3
assert pc2==.3 in 3

The assert fails for pc2, so here I think it is Stata's perception that matters, not the user's.

Nick Cox

Join Date: Mar 2014

Posts: 35610
#6

24 Jul 2015, 05:34

"In a way" was my wording. As said, these matters have been discussed many times over and how to think about them explained repeatedly.
Robert Picard

Join Date: Mar 2014

Posts: 1536
#7

24 Jul 2015, 08:21

re: #5, it's not a question of perception; you just have to compare apples with apples

Code:

clear
input float ri
1
2
3
4
5
6
7
8
9
10
end
gen double pc=ri/10
gen pc2=ri/10
sort pc
assert pc==.3 in 3
* float(.3) rounds .3 to float precision, matching how pc2 is stored
assert pc2==float(.3) in 3
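The "apples with apples" comparison can be sketched in Python, not Stata. The sketch assumes that Stata compares the stored float after promoting it to double, and that the literal .3 is the double nearest 0.3. The hypothetical helper `as_float32` plays the role that Stata's float() function plays in the code above.

```python
import struct

def as_float32(x: float) -> float:
    """Hypothetical helper playing the role of Stata's float() function."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

pc2 = as_float32(3 / 10)   # what gen pc2=ri/10 stores for ri == 3
pc = 3 / 10                # what gen double pc=ri/10 stores

# The literal 0.3 is a double, so the float-stored value fails the comparison...
print(pc2 == 0.3)               # False
# ...while comparing like with like succeeds.
print(pc2 == as_float32(0.3))   # True
print(pc == 0.3)                # True
```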
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#8

25 Jul 2015, 04:00

Robert, you might have gone through all the messages. My intent in the first message is clear: the fraction 3/10 should return 0.3, as we would expect from any mathematical principle. I just wanted to get that from Stata, and using double when creating the new variable does so without further modification or additional type conversions.

Nick Cox

Join Date: Mar 2014

Posts: 35610
#9

25 Jul 2015, 08:19

Attaullah: In turn we might ask whether you have read (some of) the references given earlier, as you still seem to be missing the fundamental point.

Using a double just gives you a better (binary) approximation to the decimal 0.3 than does a float. That it appears more satisfactory in output is mostly to the credit of default display formats. 0.3 is one of many simple exact decimals that cannot be matched by exact binary equivalents.

Consider these experiments:

. set obs 1
number of observations (_N) was 0, now 1

. gen myfloat = 0.3

. gen double mydouble = 0.3

. l

     +--------------------+
     | myfloat   mydouble |
     |--------------------|
  1. |      .3         .3 |
     +--------------------+

. di myfloat[1]
.30000001

. di mydouble[1]
.3

. di %23.18f mydouble[1]
0.299999999999999990

. di %23.18f myfloat[1]
0.300000011920928960

So, 0.3 held as a double is really just a better binary approximation to 0.3, not 0.3 itself.
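How much better the double approximation is can be measured exactly. The sketch below is in Python, not Stata, and assumes that Stata's float and double are IEEE single and double precision. `Decimal` and `Fraction` both convert a binary float without any rounding. `as_float32` is a hypothetical helper name.

```python
import struct
from decimal import Decimal
from fractions import Fraction

def as_float32(x: float) -> float:
    """Hypothetical helper: nearest IEEE single-precision value (Stata float storage)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Exact decimal expansions of what each storage type really holds for 0.3.
print(Decimal(as_float32(0.3)))   # 0.300000011920928955078125
print(Decimal(0.3))               # 0.299999999999999988897769753748434595763683319091796875

# Exact errors relative to the decimal 0.3.
target = Fraction(3, 10)
err_float = abs(Fraction(as_float32(0.3)) - target)
err_double = abs(Fraction(0.3) - target)
print(err_float, err_double)      # 1/83886080 1/90071992547409920
print(err_float / err_double)     # 1073741824, i.e. 2**30
```

Both errors are nonzero. The double is about 2**30 times closer, but it is still not 0.3. The %23.18f digits in the log above appear to be these exact values rounded to 17 significant digits.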

If you want perfect arithmetic to the first (second, third, ...) decimal place the only way to get it is to multiply by 10, 100, 1000, ..., work in integers and finally write your own display routines to emit strings with the decimal point shifted.
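The integer recipe (scale by 10, work in integers, and shift the decimal point only on display) might look like this in Python rather than Stata. `tenths_to_str` is a hypothetical name for the kind of custom display routine described above.

```python
def tenths_to_str(tenths: int) -> str:
    """Hypothetical display routine: show an integer count of tenths as a decimal string."""
    sign = "-" if tenths < 0 else ""
    whole, frac = divmod(abs(tenths), 10)
    return f"{sign}{whole}.{frac}"

# Binary floating point: three tenths summed do not give the double nearest 0.3.
print(0.1 + 0.1 + 0.1 == 0.3)     # False

# Integer tenths: the arithmetic is exact; the decimal point appears only on display.
pc_tenths = 1 + 1 + 1             # 0.1 + 0.1 + 0.1, held as 3 tenths
print(pc_tenths == 3)             # True
print(tenths_to_str(pc_tenths))   # 0.3
print([tenths_to_str(ri) for ri in range(1, 11)])
```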

For almost no statistical purposes is that really needed. The advice for users who become puzzled by this is to learn to understand it and then to appreciate that it doesn't really matter anyway.

But your goal that 0.3 should be exactly that in Stata, reasonable though it sounds from understanding elementary mathematics, is in a strict sense impossible.