How does Python handle repeated subtraction of floating numbers? [duplicate] - python

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 4 years ago.
I have a very simple code that takes a floating point number, and uses a while-loop to keep subtracting 1 until it reaches zero:
nr = 4.2
while nr > 0:
    print(nr)
    nr -= 1
I expected the output to look like this:
4.2
3.2
2.2
etc...
But instead, I get this:
4.2
3.2
2.2
1.2000000000000002
0.20000000000000018
Where do these weird floating numbers come from? Why does it only happen after the third time executing the loop? Also, very interestingly, this does not happen when the last decimal of nr is a 5.
What happened and how can I prevent this?

Upon execution of nr = 4.2, your Python set nr to exactly 4.20000000000000017763568394002504646778106689453125. This is the value that results from converting 4.2 to a binary-based floating-point format.
The results shown for subsequent subtractions appear to vary in the low digits solely due to formatting decisions. The default formatting for floating-point numbers does not show all of the digits. The Python language is not strict about how floating-point values must be formatted, but your implementation is most likely showing just as many decimal digits as are needed to uniquely distinguish the underlying binary floating-point number.
For “4.2”, “3.2”, and “2.2”, this is just two significant digits, because these decimal numbers are near the actual binary floating-point value.
Near 1.2, the floating-point format has more resolution (because the value dropped under 2, meaning the exponent decreased, thus shifting the effective position of the significand lower, allowing it to include another bit of resolution on an absolute scale). In consequence, there happens to be another binary floating-point number near 1.2, so “1.2000000000000002” is shown to distinguish the number currently in nr from this other number.
Near .2, there is even more resolution, and so there are even more binary floating-point numbers nearby, and more digits have to be used to distinguish the value.
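If you want to see exactly what each iteration holds, here is a minimal sketch (assuming CPython's usual IEEE-754 doubles and the standard decimal module):
from decimal import Decimal

nr = 4.2
while nr > 0:
    # repr() shows the shortest decimal string that round-trips to the stored double;
    # Decimal(nr) shows the exact binary value hiding behind it.
    print(repr(nr), Decimal(nr))
    nr -= 1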

Related

Python and R are returning different results where they should be exactly the same [closed]

Closed 6 years ago.
[Python numpy code]
In [171]: A1*b
Out[171]:
array([ -7.55603523e-01, 7.18519356e-01, 3.98628050e-03,
9.27047917e-04, -1.31074698e-03, 1.44455190e-03,
1.02676602e-03, 5.03891225e-02, -1.15752426e-03,
-2.43685270e-02, 5.88382307e-03, 2.63372861e-04])
In [172]: (A1*b).sum()
Out[172]: -1.6702134467139196e-16
[R code]
> cholcholT[2,] * b
[1] -0.7556035225 0.7185193560 0.0039862805 0.0009270479 -0.0013107470
[6] 0.0014445519 0.0010267660 0.0503891225 -0.0011575243 -0.0243685270
[11] 0.0058838231 0.0002633729
> sum(cholcholT[2,] * b)
[1] -9.616873e-17
The first snippet is the numpy code and the second is R. Up until the element-wise product of the two vectors, they return the same result. However, when I sum the products, the results differ. I believe it has nothing to do with the precision settings of the two, since both use double precision. Why is this happening?
You are experiencing what is called catastrophic cancellation. You are subtracting numbers from each other which differ only very slightly. As a result you get numbers which have a very high error relative to their value. The error stems from rounding errors which are introduced when your system stores values which cannot be represented by the binary system accurately.
Intuitively, you can think of this as the same difficulties you have when writing 1/3 as a decimal number. You would have to write 0.3333... , so infinitely many 3s behind the decimal point. You cannot do this and your computer can't either.
So your computer has to round the numbers somewhere.
You can see the rounding errors if you use something like
"{:.20e}".format(0.1)
You will see that after the 16th digit or so, the number you wanted to store (1.0000000000000000000...×10^-1) differs from the number the computer actually stores (1.00000000000000005551...×10^-1).
To see the order of magnitude of this inaccuracy, you can look at the machine epsilon. In simplified terms, it is the smallest relative amount you can add to a value such that the computer can still distinguish the result from the old value (i.e. the addition does not get rounded away when the result is stored in memory).
If you execute
import numpy as np
eps = np.finfo(float).eps
print(eps)  # 2.220446049250313e-16
you can see that this value lies on the order of magnitude of 10^-16.
The computer represents floats in a form like SIGN|EXPONENT|FRACTION. To simplify greatly: if computer memory stored numbers in decimal format, a number like -0.0053 would be stored as 1|-2|.53, where 1 stands for the negative sign and -2 means 'FRACTION times 10^-2'.
If you sum up floats, the computer must represent each float with the same exponent in order to add/subtract the digits of the FRACTION from each other. Therefore all your values are represented in terms of the greatest exponent of your data, which is -1, and your rounding error ends up on the order of 10^-16 * 10^-1 = 10^-17. You can see that your result is of this order of magnitude as well, so it is dominated by the rounding errors of your terms.
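As a small illustration (not your exact data), terms that nearly cancel leave a residue on the order of the machine epsilon scaled by the magnitude of the intermediate values, and the residue depends on the order of summation:
import numpy as np

terms = np.array([0.1, 0.2, -0.3])   # mathematically sums to zero
print(terms.sum())                   # tiny residue, about 5.6e-17
print(0.1 + (0.2 - 0.3))             # different grouping, different residue (~2.8e-17)
print(np.finfo(float).eps)           # about 2.2e-16 for double precision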
You are using floats and applying arithmetic to them. Floating-point arithmetic is a dangerous thing because it always introduces a small rounding error. Depending on whether this error is rounded up, rounded down, or simply cut off from the binary representation, different results may appear.

What rules dictate how Python floats are rounded?

The Python documentation on floats states:
0.1
0.1000000000000000055511151231257827021181583404541015625
That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead:
0.1
What are the rules surrounding which floats get rounded for display and which ones don't? I've encountered some funny scenarios where
1.1+2.2 returns 3.3000000000000003 (unrounded)
but
1.0+2.3 returns 3.3 (rounded)
I know that the decimal module exists for making these things consistent, but am curious as to what determines the displayed rounding in floats.
Part of the explanation is of course that 1.1 + 2.2 and 1.0 + 2.3 produce different floating-point numbers, and part of the explanation for that is that 1.1 is not really 11/10, 2.2 not really 22/10, and of course floating-point + is not rational addition either.
Many modern programming languages, including the most recent Python versions, display a double-precision floating-point value d using exactly as many decimal digits as are necessary for that decimal representation, when parsed back as a double, to yield exactly d again. As a consequence:
there is exactly one floating-point value that prints as 3.3. There cannot be two, because they would have to be the same by application of the definition, and there is at least one because if you convert the decimal representation 3.3 to a double, you get a double that has the property of producing the string “3.3” when converted to decimal with the algorithm in question.
the values are rounded for the purpose of showing them with decimal digits, but they otherwise remain the numbers that they are. So some of the “rules” that you are asking for are rules about how floating-point operations are rounded. This is very simple but you have to look at the binary representation of the arguments and results for it to be simple. If you look at the decimal representations, the rounding looks random (but it isn't).
the numbers only have a compact representation in binary. The exact value may take many decimal digits to represent exactly. “3.3000000000000003” is not “unrounded”, it is simply rounded to more digits than “3.3”, specifically, just exactly the number of digits necessary to distinguish that double-precision number from its neighbor (the one that is represented by “3.3”). They are in fact respectively the numbers below:
3.29999999999999982236431605997495353221893310546875
3.300000000000000266453525910037569701671600341796875
Of the two, 33/10 is closest to the former, so the former can be printed as “3.3”. The latter cannot be printed as “3.3”, and neither can it be printed as “3.30”, “3.300”, …, “3.300000000000000” since all these representations are equivalent and parse back to the floating-point number 3.29999999999999982236431605997495353221893310546875. So it has to be printed as “3.3000000000000003”, where the 3 is obtained because the digit 2 is followed by 6.
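You can verify all of this from the Python REPL; here is a small sketch using the standard decimal module:
from decimal import Decimal

a = 1.1 + 2.2
b = 1.0 + 2.3
print(repr(a))            # 3.3000000000000003
print(repr(b))            # 3.3
print(Decimal(a))         # exact value: 3.300000000000000266453525910037569701671600341796875
print(Decimal(b))         # exact value: 3.29999999999999982236431605997495353221893310546875
print(float("3.3") == b)  # True: "3.3" parses back to b ...
print(float("3.3") == a)  # False: ... but not to a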

1 == 2 for large numbers of 1

I'm wondering what causes this behaviour. I haven't been able to find an answer that covers this. It is probably something simple and obvious, but it is not to me. I am using python 2.7.3 in Ubuntu.
In [1]: 2 == 1.9999999999999999
Out[1]: True
In [2]: 2 == 1.999999999999999
Out[2]: False
EDIT:
To clarify my question: is there a documented maximum number of 9's at which Python will evaluate the expression above as being equal to 2?
Python uses floating point representation
What a floating point actually is, is a fixed-width binary number (called the "significand") plus a small integer to tell you how many powers of two to shift that value by (the "exponent"). Plus a sign bit. Just like scientific notation, but in base 2 instead of 10.
The closest 64 bit floating point value to 1.9999999999999999 is 2.0, because 64 bit floating point values (so-called "double precision") use 52 bits of significand, which is equivalent to about 15 decimal places. So the literal 1.9999999999999999 is just another way of writing 2.0. However, the closest value to 1.999999999999999 is less than 2.0 (I think it's 1.9999999999999988897769753748434595763683319091796875 exactly, but I'm too lazy to check that's correct, I'm just relying on Python's formatting code to be exact).
I don't actually know whether the use specifically of 64 bit floats is required by the Python language, or is an implementation detail of CPython. But whatever size is used, the important thing is not specifically the number of decimal places, it is where the closest floating-point value of that size lies to your decimal literal. It will be closer for some literals than others.
Hence, 1.9999999999999999 == 2 for the same reason that 2.0 == 2 (Python allows mixed-type numeric operations including comparison, and the integer 2 is equal to the float 2.0). Whereas 1.999999999999999 != 2.
Type coercion
>>> 2 == 2.0
True
And a consequence of the maximum number of decimal digits that can be faithfully represented in Python:
>>> import sys
>>> sys.float_info.dig
15
>>> 1.9999999999999999
2.0
More from the docs:
>>> float('9876543211234567')
9876543211234568.0
Note the ...68 at the end instead of the expected ...67.
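A quick way to poke at the nearest doubles yourself; this sketch assumes Python 3.9+ for math.nextafter:
import math

print(1.9999999999999999 == 2.0)   # True: the literal parses to exactly 2.0
print(1.999999999999999 == 2.0)    # False: this one still fits below 2.0
print(math.nextafter(2.0, 0.0))    # 1.9999999999999998, the largest double below 2.0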
This is due to the way floats are implemented in Python. To keep it short and simple: Since floats almost always are an approximation and thus have more digits than most people find useful, the Python interpreter displays a rounded value.
More specifically, floats are stored in binary, which means they are stored as fractions to the base 2, unlike decimal notation, where you can write a number as a fraction to the base 10. However, most decimal fractions don't have an exact representation in binary, so they are stored rounded to a precision of 53 bits. This can produce surprising results when you do further arithmetic on them, e.g.:
>>> 0.1 + 0.2
0.30000000000000004
>>> round(2.675, 2)
2.67
See The docs on floats as well.
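To see why round(2.675, 2) gives 2.67, you can inspect the stored value with the decimal module:
from decimal import Decimal

# The nearest double to 2.675 is just below 2.675, so rounding to two
# decimal places correctly yields 2.67:
print(Decimal(2.675))
# 2.67499999999999982236431605997495353221893310546875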
Mathematically speaking, 2.0 does equal 1.9999... forever. They are two different ways of writing the same number.
However, in software, it's important to never compare two floats or decimals for equality - instead, subtract them, take the absolute value, and verify that the (always positive) difference is sufficiently low for your purposes.
EG:
if abs(value1 - value2) < 1e-10:
    pass  # they are close enough
else:
    pass  # they are not
You should probably set EPSILON = 1e-10 and use the symbolic constant instead of scattering 1e-10 throughout your code, or better still use a comparison function.
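If you are on Python 3.5 or later, the standard library's math.isclose already implements such a comparison function (default relative tolerance 1e-09):
import math

print((0.1 + 0.2) == 0.3)                     # False: exact comparison fails
print(math.isclose(0.1 + 0.2, 0.3))           # True: within the default tolerance
print(math.isclose(2.0, 1.9999999999999999))  # True: both literals are the same double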

Why is Ruby's Float#round behavior different than Python's?

"Behavior of “round” function in Python" observes that Python rounds floats like this:
>>> round(0.45, 1)
0.5
>>> round(1.45, 1)
1.4
>>> round(2.45, 1)
2.5
>>> round(3.45, 1)
3.5
>>> round(4.45, 1)
4.5
>>> round(5.45, 1)
5.5
>>> round(6.45, 1)
6.5
>>> round(7.45, 1)
7.5
>>> round(8.45, 1)
8.4
>>> round(9.45, 1)
9.4
The accepted answer confirms this is caused by the binary representation of floats being inaccurate, which is all logical.
Assuming that Ruby floats are just as inaccurate as Python's, how come Ruby floats round like a human would? Does Ruby cheat?
1.9.3p194 :009 > 0.upto(9) do |n|
1.9.3p194 :010 > puts (n+0.45).round(1)
1.9.3p194 :011?> end
0.5
1.5
2.5
3.5
4.5
5.5
6.5
7.5
8.5
9.5
Summary
Both implementations confront the same issues surrounding binary floating-point numbers.
Ruby operates directly on the floating point number with simple operations (multiply by a power of ten, adjust, and truncate).
Python converts the binary floating point number to a string using David Gay's sophisticated algorithm that yields the shortest decimal representation that is exactly equal to the binary floating point number. This does not do any additional rounding, it is an exact conversion to a string.
With the shortest string representation in hand, Python rounds to the appropriate number of decimal places using exact string operations. The goal of the float-to-string conversion is to "undo" some of the binary floating-point representation error (i.e. if you enter 6.6, Python rounds based on "6.6" rather than on 6.5999999999999996).
In addition, Ruby differs from some versions of Python in rounding modes: round-away-from-zero versus round-half-even.
Detail
Ruby doesn't cheat. It starts with plain old binary floating-point numbers the same as Python does. Accordingly, it is subject to some of the same challenges (such as 3.35 being represented as slightly more than 3.35 and 4.35 being represented as slightly less than 4.35):
>>> from decimal import Decimal
>>> Decimal.from_float(3.35)
Decimal('3.350000000000000088817841970012523233890533447265625')
>>> Decimal.from_float(4.35)
Decimal('4.3499999999999996447286321199499070644378662109375')
The best way to see the implementation differences is to look at the underlying source code:
Here's a link to the Ruby source code: https://github.com/ruby/ruby/blob/trunk/numeric.c#L1587
The Python source starts here: http://hg.python.org/cpython/file/37352a3ccd54/Python/bltinmodule.c
and finishes here: http://hg.python.org/cpython/file/37352a3ccd54/Objects/floatobject.c#l1080
The latter has an extensive comment that reveals the differences between the two implementations:
The basic idea is very simple: convert and round the double to a decimal string using _Py_dg_dtoa, then convert that decimal string back to a double with _Py_dg_strtod. There's one minor difficulty: Python 2.x expects round to do round-half-away-from-zero, while _Py_dg_dtoa does round-half-to-even. So we need some way to detect and correct the halfway cases.
Detection: a halfway value has the form k * 0.5 * 10**-ndigits for some odd integer k. Or in other words, a rational number x is exactly halfway between two multiples of 10**-ndigits if its 2-valuation is exactly -ndigits-1 and its 5-valuation is at least -ndigits. For ndigits >= 0 the latter condition is automatically satisfied for a binary float x, since any such float has nonnegative 5-valuation. For 0 > ndigits >= -22, x needs to be an integral multiple of 5**-ndigits; we can check this using fmod. For -22 > ndigits, there are no halfway cases: 5**23 takes 54 bits to represent exactly, so any odd multiple of 0.5 * 10**n for n >= 23 takes at least 54 bits of precision to represent exactly.
Correction: a simple strategy for dealing with halfway cases is to (for the halfway cases only) call _Py_dg_dtoa with an argument of ndigits+1 instead of ndigits (thus doing an exact conversion to decimal), round the resulting string manually, and then convert back using _Py_dg_strtod.
In short, Python 2.7 goes to great lengths to accurately follow a round-away-from-zero rule.
In Python 3.3, it goes to equally great length to accurately follow a round-to-even rule.
Here's a little additional detail on the _Py_dg_dtoa function. Python calls this float-to-string function because it implements an algorithm that gives the shortest possible string representation among equal alternatives. In Python 2.6, for example, the number 1.1 shows up as 1.1000000000000001, but in Python 2.7 and later, it is simply 1.1. David Gay's sophisticated dtoa.c algorithm gives "the result that people expect" without forgoing accuracy.
That string conversion algorithm tends to make up for some of the issues that plague any implementation of round() on binary floating-point numbers (i.e. it lets the rounding of 4.35 start from "4.35" instead of from 4.3499999999999996447286321199499070644378662109375).
That and the rounding mode (round-half-even vs round-away-from-zero) are the essential differences between the Python and Ruby round() functions.
The fundamental difference is:
Python: Convert to decimal and then round
Ruby: Round and then convert to decimal
Ruby is rounding it from the original floating-point bit pattern, but after operating on it with 10^n. You can't see the original binary value without looking very closely. The values are inexact because they are binary, we are used to writing in decimal, and, as it happens, almost all of the decimal fraction strings we are likely to write do not have an exact equivalent as a base-2 fraction string.
In particular, 0.45 looks like this:
01111111101 1100110011001100110011001100110011001100110011001101
In hex, that is 3fdccccccccccccd.
It repeats in binary, the first unrepresented digit is 0xc, and the clever decimal input conversion has accurately rounded this very last fractional digit to 0xd.
This means that inside the machine, the value is greater than 0.45 by roughly 1/2^50. This is obviously a very, very small number, but it's enough to cause the default round-to-nearest algorithm to round up instead of to the tie-breaker of even.
Both Python and Ruby are potentially rounding more than once as every operation effectively rounds into the least significant bit.
I'm not sure I agree that Ruby does what a human would do. I think Python is approximating what decimal arithmetic would do. Python (depending on version) is applying round-nearest to the decimal string and Ruby is applying the round nearest algorithm to a computed binary value.
Note that we can see here quite clearly the reason people say that FP is inexact. It's a reasonably true statement, but it's more true to say that we simply can't convert accurately between binary and most decimal fractions. (Some do: 0.25, 0.5, 0.75, ...) Most simple decimal numbers are repeating numbers in binary, so we can never store the exact equivalent value. But, every value we can store is known exactly and all arithmetic performed on it is performed exactly. If we wrote our fractions in binary in the first place our FP arithmetic would be considered exact.
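If you want to check the bit pattern quoted above for yourself, the standard library can show it; a quick sketch:
import struct
from decimal import Decimal

print(float.hex(0.45))                # 0x1.cccccccccccccdp-2
print(struct.pack('>d', 0.45).hex())  # 3fdccccccccccccd
print(Decimal(0.45))                  # 0.450000000000000011102230246251565404236316680908203125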
Ruby doesn't cheat. It just chose another way to implement round.
In Ruby, 9.45.round(1) is almost equivalent to (9.45*10.0).round / 10.0.
irb(main):001:0> printf "%.20f", 9.45
9.44999999999999928946=> nil
irb(main):002:0> printf "%.20f", 9.45*10.0
94.50000000000000000000=> nil
So
irb(main):003:0> puts 9.45.round(1)
9.5
If we round this way in Python, we get 9.5 as well (at least on Python 2; Python 3's round-half-to-even makes round(94.5) return 94, and hence 9.4):
>>> round(9.45, 1)
9.4
>>> round(9.45*10)/10
9.5
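To make the "round, then convert" idea concrete, here is a hedged Python sketch of the scale / round-half-away-from-zero / unscale approach described above; ruby_like_round is just an illustrative helper, not Ruby's actual C implementation:
import math

def ruby_like_round(x, ndigits=0):
    # Scale up, round halves away from zero on the binary value, scale back down.
    scale = 10.0 ** ndigits
    scaled = x * scale
    if scaled >= 0:
        rounded = math.floor(scaled + 0.5)
    else:
        rounded = math.ceil(scaled - 0.5)
    return rounded / scale

for n in range(10):
    # The first column should mirror the Ruby session above (0.5, 1.5, ...);
    # the second column is Python's own round() for comparison.
    print(ruby_like_round(n + 0.45, 1), round(n + 0.45, 1))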

Why do simple math operations on floating point return unexpected (inaccurate) results in VB.Net and Python?

x = 4.2 - 0.1
vb.net gives 4.1000000000000005
python gives 4.1000000000000005
Excel gives 4.1
Google calc gives 4.1
What is the reason this happens?
Float/double precision.
You must remember that, in binary, 4.1 = 4 + 1/10, and 1/10 is an infinitely repeating fraction in binary, much like 1/9 is in decimal.
>>> x = 4.2 - 0.1
>>> x
4.1000000000000005
>>> print(x)
4.1
This happens because of how numbers are stored internally.
Computers represent numbers in binary instead of decimal, as we humans are used to. With floating-point numbers, computers have to approximate your decimal value with the closest binary floating-point value.
Almost all machines today (November 2000) use IEEE-754 floating point arithmetic, and almost all platforms map Python floats to IEEE-754 “double precision”. 754 doubles contain 53 bits of precision, so on input the computer strives to convert 0.1 to the closest fraction it can of the form J/2**N where J is an integer containing exactly 53 bits.
If you print the number, you see a rounded, shorter form of the approximation. For example, the real value stored for 0.1 is 0.1000000000000000055511151231257827021181583404541015625.
If you really need a base 10 based number (if you don't know the answer to this question, you don't), you could use (in Python) decimal.Decimal:
>>> from decimal import Decimal
>>> Decimal("4.2") - Decimal("0.1")
Decimal("4.1")
Binary floating-point arithmetic holds many surprises like this. The problem with “0.1” is explained in precise detail in the “Representation Error” section of the Python tutorial. See The Perils of Floating Point for a more complete account of other common surprises.
As that says near the end, “there are no easy answers.” Still, don’t be unduly wary of floating-point! The errors in Python float operations are inherited from the floating-point hardware, and on most machines are on the order of no more than 1 part in 2**53 per operation. That’s more than adequate for most tasks, but you do need to keep in mind that it’s not decimal arithmetic, and that every float operation can suffer a new rounding error.
While pathological cases do exist, for most casual use of floating-point arithmetic you’ll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. str() usually suffices, and for finer control see the str.format() method’s format specifiers in Format String Syntax.
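In Python, that advice amounts to formatting only the final result for display; a short sketch (f-strings need Python 3.6+):
x = 4.2 - 0.1
print(repr(x))             # 4.1000000000000005  (shortest repr of the stored value)
print(format(x, '.2f'))    # 4.10  (rounded for display only)
print(f"{x:.1f}")          # 4.1
print(round(x, 1) == 4.1)  # True: rounding also lands on the double nearest 4.1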
There is no problem, really. It is just the way floats work (their internal binary representation). Anyway:
>>> from decimal import Decimal
>>> Decimal('4.2')-Decimal('0.1')
Decimal('4.1')
In vb.net, you can avoid this problem by using Decimal type instead:
Dim x As Decimal = 4.2D - 0.1D
The result is 4.1.
