What rules dictate how Python floats are rounded?

The Python documentation on floats states that the value written as 0.1 is actually stored as
0.1000000000000000055511151231257827021181583404541015625
and that, because that is more digits than most people find useful, Python keeps the number of digits manageable by displaying a rounded value instead:
0.1
What are the rules surrounding which floats get rounded for display and which ones don't? I've encountered some funny scenarios where
1.1+2.2 returns 3.3000000000000003 (unrounded)
but
1.0+2.3 returns 3.3 (rounded)
I know that the decimal module exists for making these things consistent, but am curious as to what determines the displayed rounding in floats.

What are the rules surrounding which floats get rounded for display
and which ones don't? I've encountered some funny scenarios where
1.1+2.2 returns 3.3000000000000003 (unrounded)
but
1.0+2.3 returns 3.3 (rounded)
Part of the explanation is of course that 1.1 + 2.2 and 1.0 + 2.3 produce different floating-point numbers, and part of the explanation for that is that 1.1 is not really 11/10, 2.2 not really 22/10, and of course floating-point + is not rational addition either.
Many modern programming languages, including recent versions of Python, display a double-precision floating-point value d using exactly as many decimal digits as are necessary for the decimal representation, when parsed back as a double, to convert exactly to d again. As a consequence:
There is exactly one floating-point value that prints as 3.3. There cannot be two, because by the definition above they would have to be the same, and there is at least one, because converting the decimal representation 3.3 to a double yields a double that produces the string “3.3” when converted back to decimal with this algorithm.
The values are rounded for the purpose of showing them with decimal digits, but they otherwise remain exactly the numbers that they are. So some of the “rules” that you are asking about are rules about how floating-point operations are rounded. This is very simple, but you have to look at the binary representation of the arguments and results for it to be simple. If you look at the decimal representations, the rounding looks random (it isn't).
The numbers have a compact representation only in binary; the exact value may take many decimal digits to write out. “3.3000000000000003” is not “unrounded”, it is simply rounded to more digits than “3.3”, specifically, exactly the number of digits necessary to distinguish that double-precision number from its neighbor (the one that is represented by “3.3”). They are in fact, respectively, the numbers below:
3.29999999999999982236431605997495353221893310546875
3.300000000000000266453525910037569701671600341796875
Of the two, 33/10 is closest to the former, so the former can be printed as “3.3”. The latter cannot be printed as “3.3”, and neither can it be printed as “3.30”, “3.300”, …, “3.300000000000000”, since all these representations are equivalent and parse back to the floating-point number 3.29999999999999982236431605997495353221893310546875. So it has to be printed as “3.3000000000000003”, where the final 3 is obtained by rounding the 2 up, because the next digit of the exact value is a 6.
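To see both effects side by side, one can compare each sum's default display with its exact stored value; a minimal sketch using the standard decimal module:
from decimal import Decimal

for label, value in (("1.1 + 2.2", 1.1 + 2.2), ("1.0 + 2.3", 1.0 + 2.3)):
    # repr() gives the shortest decimal string that parses back to this exact double;
    # Decimal(value) shows the exact binary value that is actually stored.
    print(label, "->", repr(value), "exactly", Decimal(value))
Neither result is exact; they only differ in how many digits the shortest round-tripping string needs.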

Related

How does Python handle repeated subtraction of floating numbers? [duplicate]

I have a very simple piece of code that takes a floating-point number and uses a while-loop to keep subtracting 1 until it reaches zero:
nr = 4.2
while nr > 0:
    print(nr)
    nr -= 1
I expected the output to look like this:
4.2
3.2
2.2
etc...
But instead, I get this:
4.2
3.2
2.2
1.2000000000000002
0.20000000000000018
Where do these weird floating numbers come from? Why does it only happen after the third time executing the loop? Also, very interestingly, this does not happen when the last decimal of nr is a 5.
What happened and how can I prevent this?
Upon execution of nr = 4.2, your Python set nr to exactly 4.20000000000000017763568394002504646778106689453125. This is the value that results from converting 4.2 to a binary-based floating-point format.
The results shown for subsequent subtractions appear to vary in the low digits solely due to formatting decisions. The default formatting for floating-point numbers does not show all of the digits. Python is not strict about floating-point behavior, but I suspect your implementation may be showing just as many decimal digits as needed to uniquely distinguish the binary floating-point number.
For “4.2”, “3.2”, and “2.2”, this is just two significant digits, because these decimal numbers are near the actual binary floating-point value.
Near 1.2, the floating-point format has more resolution (because the value dropped under 2, meaning the exponent decreased, thus shifting the effective position of the significand lower, allowing it to include another bit of resolution on an absolute scale). In consequence, there happens to be another binary floating-point number near 1.2, so “1.2000000000000002” is shown to distinguish the number currently in nr from this other number.
Near .2, there is even more resolution, and so there are even more binary floating-point numbers nearby, and more digits have to be used to distinguish the value.
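As for preventing it: the cleanest fix I know of (my suggestion, not part of the answer above) is to keep the running value in decimal arithmetic, so that 4.2 really is 4.2; a small sketch:
from decimal import Decimal

nr = Decimal("4.2")   # built from a string, so it is exactly 4.2
while nr > 0:
    print(nr)         # prints 4.2, 3.2, 2.2, 1.2, 0.2
    nr -= 1
Alternatively, keep the float and round only for display, e.g. print(f"{nr:.1f}").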

Python float approximations interpreted as actual values by cython

Note: I already know that Python uses approximations for floating-point numbers. However, their behaviour seems inconsistent even after a subtraction (which should not increase the representation error, should it?). Furthermore, I would like to know how to fix the problem.
Python sometimes seems to use an approximation for the actual value of a float, as you can see in the 5th and 6th element of the this_spectrum variable in the image. All values were read from a text file and should have only 2 decimals.
When you print them or use them in a Python calculation, they behave as if they have the intended value:
In: this_spectrum[4] == 1716.31
Out: True
However, after using them in a Cython script which calculates all pairwise distances between elements (i.e. performs a simple subtraction of values) and stores them in alphaMatrix, it seems the Python approximations were used instead of the actual values:
In: alphaMatrix[0][0]
Out: 161.08
In: alphaMatrix[0][0] == 161.08
Out: False
In: alphaMatrix[0][0] == 161.07999999999996
Out: True
Why is this happening, and what would be the proper way to fix this problem? Is this really a Cython problem/bug/feature(?) or is there something else going on?
Your problem is here: “a Cython script which calculates all pairwise distances…”
In essentially every floating-point operation (add, subtract, multiply, divide, or evaluation of any “elementary” function such as square root, cosine, or logarithm), some rounding error may be added. For the basic arithmetic operations, the result is calculated as if the exact mathematical result were computed and then rounded to the nearest representable floating-point value. Ideally, the elementary functions would behave this way too, but most implementations have some additional error, because calculating these functions precisely is difficult.
As an example, consider calculating 1/3*18 using decimal floating-point with four digits. 1 and 3 in decimal floating-point have no error; they are representable as 1.000 and 3.000. But their quotient, ⅓, is not representable. When you do the division, the result is .3333. Then, when you multiply .3333 by 18, the exact result is 5.9994. Because our floating-point format has only four digits, we have to round this to some number we can represent. The nearest representable value is 5.999, so that is returned.
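That four-digit example can be reproduced directly with Python's decimal module by shrinking the working precision; a quick sketch:
from decimal import Decimal, getcontext

getcontext().prec = 4                 # four significant digits, as in the example
third = Decimal(1) / Decimal(3)
print(third)                          # 0.3333 (the quotient had to be rounded)
print(third * 18)                     # 5.999 (exact product 5.9994, rounded to 4 digits)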
When you compare a calculated value to a constant such as 161.08, you are comparing a value with several accumulated rounding errors to a value with only one rounding error (when the decimal numeral “161.08” in the source is converted to binary floating-point, there is a rounding error). So the values are different.
You often will not see this difference when numbers are printed with default precision because only a few digits are shown. This means the full internal value is rounded to a few digits for display, and that rounding conceals the differences. If you print the numbers with more precision, you will see the differences.
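As for the fix: rather than testing with ==, compare with a tolerance. A sketch using math.isclose, with made-up stand-ins for the spectrum values (1716.31 appears in the question; 1555.23 is hypothetical):
import math

a, b = 1716.31, 1555.23            # hypothetical example values
diff = a - b                       # carries a small accumulated rounding error

print(diff == 161.08)              # may well be False
print(math.isclose(diff, 161.08))  # True with the default rel_tol of 1e-9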

2 == 1.9999999999999999 for large enough numbers of 9s

I'm wondering what causes this behaviour. I haven't been able to find an answer that covers this. It is probably something simple and obvious, but it is not to me. I am using python 2.7.3 in Ubuntu.
In [1]: 2 == 1.9999999999999999
Out[1]: True
In [2]: 2 == 1.999999999999999
Out[2]: False
EDIT:
To clarify my question: is there a documented maximum number of 9s at which Python will evaluate the expression above as being equal to 2?
Python uses floating point representation
What a floating point actually is, is a fixed-width binary number (called the "significand") plus a small integer to tell you how many powers of two to shift that value by (the "exponent"). Plus a sign bit. Just like scientific notation, but in base 2 instead of 10.
The closest 64-bit floating-point value to 1.9999999999999999 is 2.0, because 64-bit floating-point values (so-called "double precision") use 52 explicitly stored significand bits (53 bits of precision), which is equivalent to about 15 to 16 significant decimal digits. So the literal 1.9999999999999999 is just another way of writing 2.0. However, the closest value to 1.999999999999999 is less than 2.0 (I think it's 1.9999999999999988897769753748434595763683319091796875 exactly, but I'm too lazy to check that's correct; I'm just relying on Python's formatting code to be exact).
I don't actually know whether the use specifically of 64 bit floats is required by the Python language, or is an implementation detail of CPython. But whatever size is used, the important thing is not specifically the number of decimal places, it is where the closest floating-point value of that size lies to your decimal literal. It will be closer for some literals than others.
Hence, 1.9999999999999999 == 2 for the same reason that 2.0 == 2 (Python allows mixed-type numeric operations including comparison, and the integer 2 is equal to the float 2.0). Whereas 1.999999999999999 != 2.
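A quick way to check the neighbour claim (my illustration; math.nextafter requires Python 3.9+):
import math
from decimal import Decimal

print(1.9999999999999999 == 2.0)   # True: this literal parses to exactly 2.0
print(Decimal(1.999999999999999))  # the exact value stored for the shorter literal
print(math.nextafter(2.0, 0.0))    # 1.9999999999999998, the closest double strictly below 2.0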
Type coercion
>>> 2 == 2.0
True
And a consequence of the maximum number of decimal digits that a Python float can reliably represent:
>>> import sys
>>> sys.float_info.dig
15
>>> 1.9999999999999999
2.0
More from the docs:
>>> float('9876543211234567')
9876543211234568.0
Note the ...68 at the end instead of the expected ...67.
This is due to the way floats are implemented in Python. To keep it short and simple: Since floats almost always are an approximation and thus have more digits than most people find useful, the Python interpreter displays a rounded value.
In more detail, floats are stored in binary. This means that they're stored as fractions to the base 2, unlike decimal notation, where a number is written as fractions to the base 10. However, most decimal fractions don't have an exact representation in binary, so they are typically stored with a precision of 53 bits. This can make them troublesome for arithmetic that is meant to be exact, since you'll run into some strange problems, e.g.:
>>> 0.1 + 0.2
0.30000000000000004
>>> round(2.675, 2)
2.67
See the docs on floats as well.
Mathematically speaking, 2.0 does equal 1.9999... forever. They are two different ways of writing the same number.
However, in software, it's important to never compare two floats or decimals for equality - instead, subtract them, take the absolute value, and verify that the (always positive) difference is sufficiently low for your purposes.
EG:
if abs(value1 - value2) < 1e-10:
    pass  # they are close enough
else:
    pass  # they are not
You should probably set EPSILON = 1e-10 and use that symbolic constant instead of scattering 1e-10 throughout your code, or better still use a comparison function, as sketched below.
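A minimal version of such a comparison function, with the corrected tolerance (the standard library's math.isclose does a similar job with a relative tolerance):
import math

EPSILON = 1e-10   # absolute tolerance; pick one that matches the scale of your data

def approximately_equal(a, b, eps=EPSILON):
    # True when a and b differ by less than eps
    return abs(a - b) < eps

print(approximately_equal(2.0, 1.9999999999999))   # True
print(math.isclose(2.0, 1.9999999999999))          # True as well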

Why is Ruby's Float#round behavior different than Python's?

"Behavior of “round” function in Python" observes that Python rounds floats like this:
>>> round(0.45, 1)
0.5
>>> round(1.45, 1)
1.4
>>> round(2.45, 1)
2.5
>>> round(3.45, 1)
3.5
>>> round(4.45, 1)
4.5
>>> round(5.45, 1)
5.5
>>> round(6.45, 1)
6.5
>>> round(7.45, 1)
7.5
>>> round(8.45, 1)
8.4
>>> round(9.45, 1)
9.4
The accepted answer confirms this is caused by the binary representation of floats being inaccurate, which is all logical.
Assuming that Ruby floats are just as inaccurate as Python's, how come Ruby floats round like a human would? Does Ruby cheat?
1.9.3p194 :009 > 0.upto(9) do |n|
1.9.3p194 :010 > puts (n+0.45).round(1)
1.9.3p194 :011?> end
0.5
1.5
2.5
3.5
4.5
5.5
6.5
7.5
8.5
9.5
Summary
Both implementations confront the same issues surrounding binary floating-point numbers.
Ruby operates directly on the floating point number with simple operations (multiply by a power of ten, adjust, and truncate).
Python converts the binary floating point number to a string using David Gay's sophisticated algorithm that yields the shortest decimal representation that is exactly equal to the binary floating point number. This does not do any additional rounding, it is an exact conversion to a string.
With the shortest string representation in hand, Python rounds to the appropriate number of decimal places using exact string operations. The goal of the float-to-string conversion is to attempt to "undo" some of the binary floating-point representation error (i.e. if you enter 6.6, Python rounds on the 6.6 rather than on 6.5999999999999996).
In addition, Ruby differs from some versions of Python in rounding modes: round-away-from-zero versus round-half-even.
Detail
Ruby doesn't cheat. It starts with plain old binary floating-point numbers, the same as Python does. Accordingly, it is subject to some of the same challenges (such as 3.35 being represented as slightly more than 3.35 and 4.35 being represented as slightly less than 4.35):
>>> Decimal.from_float(3.35)
Decimal('3.350000000000000088817841970012523233890533447265625')
>>> Decimal.from_float(4.35)
Decimal('4.3499999999999996447286321199499070644378662109375')
The best way to see the implementation differences is to look at the underlying source code:
Here's a link to the Ruby source code: https://github.com/ruby/ruby/blob/trunk/numeric.c#L1587
The Python source starts here: http://hg.python.org/cpython/file/37352a3ccd54/Python/bltinmodule.c
and finishes here: http://hg.python.org/cpython/file/37352a3ccd54/Objects/floatobject.c#l1080
The latter has an extensive comment that reveals the differences between the two implementations:
The basic idea is very simple: convert and round the double to a
decimal string using _Py_dg_dtoa, then convert that decimal string
back to a double with _Py_dg_strtod. There's one minor difficulty:
Python 2.x expects round to do round-half-away-from-zero, while
_Py_dg_dtoa does round-half-to-even. So we need some way to detect and correct the halfway cases.
Detection: a halfway value has the form k * 0.5 * 10**-ndigits for
some odd integer k. Or in other words, a rational number x is exactly
halfway between two multiples of 10**-ndigits if its 2-valuation is
exactly -ndigits-1 and its 5-valuation is at least
-ndigits. For ndigits >= 0 the latter condition is automatically satisfied for a binary float x, since any such float has nonnegative
5-valuation. For 0 > ndigits >= -22, x needs to be an integral
multiple of 5**-ndigits; we can check this using fmod. For -22 >
ndigits, there are no halfway cases: 5**23 takes 54 bits to represent
exactly, so any odd multiple of 0.5 * 10**n for n >= 23 takes at least
54 bits of precision to represent exactly.
Correction: a simple strategy for dealing with halfway cases is to
(for the halfway cases only) call _Py_dg_dtoa with an argument of
ndigits+1 instead of ndigits (thus doing an exact conversion to
decimal), round the resulting string manually, and then convert back
using _Py_dg_strtod.
In short, Python 2.7 goes to great lengths to accurately follow a round-away-from-zero rule.
In Python 3.3, it goes to equally great lengths to accurately follow a round-half-to-even rule.
Here's a little additional detail on the _Py_dg_dtoa function. Python calls the float to string function because it implements an algorithm that gives the shortest possible string representation among equal alternatives. In Python 2.6, for example, the number 1.1 shows up as 1.1000000000000001, but in Python 2.7 and later, it is simply 1.1. David Gay's sophisticated dtoa.c algorithm gives "the-result-that-people-expect" without forgoing accuracy.
That string conversion algorithm tends to make up for some of the issues that plague any implementation of round() on binary floating-point numbers (i.e. it lets the rounding of 4.35 start from 4.35 instead of 4.3499999999999996447286321199499070644378662109375).
That and the rounding mode (round-half-even vs round-away-from-zero) are the essential differences between the Python and Ruby round() functions.
The fundamental difference is:
Python: Convert to decimal and then round
Ruby: Round and then convert to decimal
Ruby is rounding it from the original floating-point bit string, but after operating on it with 10**n. You can't see the original binary value without looking very closely. The values are inexact because they are binary, and we are used to writing in decimal, and as it happens almost all of the decimal fraction strings we are likely to write do not have an exact equivalent as a base 2 fraction string.
In particular, 0.45 looks like this:
01111111101 1100110011001100110011001100110011001100110011001101
In hex, that is 3fdccccccccccccd.
It repeats in binary, the first unrepresented digit is 0xc, and the clever decimal input conversion has accurately rounded this very last fractional digit to 0xd.
This means that inside the machine, the value is greater than 0.45 by roughly 1/2**50. This is obviously a very, very small number, but it's enough to cause the default round-nearest algorithm to round up instead of to the tie-breaker of even.
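You can inspect those bits from Python itself; a small illustration (added here, not from the original answer):
import struct

print((0.45).hex())                   # 0x1.ccccccccccccdp-2
print(struct.pack(">d", 0.45).hex())  # 3fdccccccccccccd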
Both Python and Ruby are potentially rounding more than once as every operation effectively rounds into the least significant bit.
I'm not sure I agree that Ruby does what a human would do. I think Python is approximating what decimal arithmetic would do. Python (depending on version) is applying round-nearest to the decimal string and Ruby is applying the round nearest algorithm to a computed binary value.
Note that we can see here quite clearly the reason people say that FP is inexact. It's a reasonably true statement, but it's more true to say that we simply can't convert accurately between binary and most decimal fractions. (Some do: 0.25, 0.5, 0.75, ...) Most simple decimal numbers are repeating numbers in binary, so we can never store the exact equivalent value. But, every value we can store is known exactly and all arithmetic performed on it is performed exactly. If we wrote our fractions in binary in the first place our FP arithmetic would be considered exact.
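The "some do" cases are easy to demonstrate (my illustration): fractions whose denominators are powers of two survive the conversion exactly, everything else gets rounded.
print(0.25 + 0.5 == 0.75)   # True: all three values are exact binary fractions
print(0.1 + 0.2 == 0.3)     # False: none of these has an exact binary representation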
Ruby doesn't cheat. It just chose another way to implement round.
In Ruby, 9.45.round(1) is almost equivalent to (9.45*10.0).round / 10.0.
irb(main):001:0> printf "%.20f", 9.45
9.44999999999999928946=> nil
irb(main):002:0> printf "%.20f", 9.45*10.0
94.50000000000000000000=> nil
So
irb(main):003:0> puts 9.45.round(1)
9.5
If we use the same approach in Python, we get 9.5 as well (whereas the built-in round gives 9.4):
>>> round(9.45, 1)
9.4
>>> round(9.45*10)/10
9.5
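If what you actually want in Python is the "round like a human" result, one option (my suggestion, not from these answers) is to build a Decimal from the text of the number and round it with ROUND_HALF_UP:
from decimal import Decimal, ROUND_HALF_UP

for n in range(10):
    d = Decimal(n) + Decimal("0.45")    # exactly n.45, no binary conversion involved
    print(d.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))   # 0.5, 1.5, ..., 9.5
This only works because the value is constructed from the decimal string; Decimal(9.45) would inherit the float's representation error and still round to 9.4.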

Why do simple math operations on floating point return unexpected (inaccurate) results in VB.Net and Python?

x = 4.2 - 0.1
vb.net gives 4.1000000000000005
python gives 4.1000000000000005
Excel gives 4.1
Google calc gives 4.1
What is the reason this happens?
Float/double precision.
You must remember that in binary, 4.1 = 4 + 1/10, and 1/10 is an infinitely repeating fraction in binary, much like 1/9 is in decimal.
>>> x = 4.2 - 0.1
>>> x
4.1000000000000005
>>> print(x)
4.1
This happens because of how numbers are stored internally.
Computers represent numbers in binary, instead of decimal, as us humans are used to. With floating point numbers, computers have to make an approximation to the closest binary floating point value.
Almost all machines today (November 2000) use IEEE-754 floating point arithmetic, and almost all platforms map Python floats to IEEE-754 “double precision”. 754 doubles contain 53 bits of precision, so on input the computer strives to convert 0.1 to the closest fraction it can of the form J/2**N where J is an integer containing exactly 53 bits.
If you print the number, it shows a rounded approximation rather than the full stored value. For example, the real value stored for 0.1 is 0.1000000000000000055511151231257827021181583404541015625.
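You can ask Python for that fraction directly; a short illustration:
from decimal import Decimal

print((0.1).as_integer_ratio())   # (3602879701896397, 36028797018963968): J/2**N in lowest terms, denominator 2**55
print(Decimal(0.1))               # the exact stored value quoted above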
If you really need a base 10 based number (if you don't know the answer to this question, you don't), you could use (in Python) decimal.Decimal:
>>> from decimal import Decimal
>>> Decimal("4.2") - Decimal("0.1")
Decimal("4.1")
Binary floating-point arithmetic holds many surprises like this. The problem with “0.1” is explained in precise detail in the “Representation Error” section of the Python tutorial. See The Perils of Floating Point for a more complete account of other common surprises.
As that says near the end, “there are no easy answers.” Still, don’t be unduly wary of floating-point! The errors in Python float operations are inherited from the floating-point hardware, and on most machines are on the order of no more than 1 part in 2**53 per operation. That’s more than adequate for most tasks, but you do need to keep in mind that it’s not decimal arithmetic, and that every float operation can suffer a new rounding error.
While pathological cases do exist, for most casual use of floating-point arithmetic you’ll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. str() usually suffices, and for finer control see the str.format() method’s format specifiers in Format String Syntax.
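A small example of that advice in practice (mine, using ordinary format specifiers):
x = 4.2 - 0.1
print(repr(x))           # 4.1000000000000005: the shortest round-tripping representation
print(f"{x:.1f}")        # 4.1: rounded only for display
print(format(x, ".3f"))  # 4.100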
There is no problem, really. It is just the way floats work (their internal binary representation). Anyway:
>>> from decimal import Decimal
>>> Decimal('4.2')-Decimal('0.1')
Decimal('4.1')
In vb.net, you can avoid this problem by using Decimal type instead:
Dim x As Decimal = 4.2D - 0.1D
The result is 4.1.
