"Behavior of “round” function in Python" observes that Python rounds floats like this:
>>> round(0.45, 1)
0.5
>>> round(1.45, 1)
1.4
>>> round(2.45, 1)
2.5
>>> round(3.45, 1)
3.5
>>> round(4.45, 1)
4.5
>>> round(5.45, 1)
5.5
>>> round(6.45, 1)
6.5
>>> round(7.45, 1)
7.5
>>> round(8.45, 1)
8.4
>>> round(9.45, 1)
9.4
The accepted answer confirms this is caused by the binary representation of floats being inaccurate, which is all logical.
Assuming that Ruby floats are just as inaccurate as Python's, how come Ruby floats round like a human would? Does Ruby cheat?
1.9.3p194 :009 > 0.upto(9) do |n|
1.9.3p194 :010 > puts (n+0.45).round(1)
1.9.3p194 :011?> end
0.5
1.5
2.5
3.5
4.5
5.5
6.5
7.5
8.5
9.5
Summary
Both implementations confront the same issues surrounding binary floating point numbers.
Ruby operates directly on the floating point number with simple operations (multiply by a power of ten, adjust, and truncate).
Python converts the binary floating point number to a string using David Gay's sophisticated algorithm that yields the shortest decimal representation that is exactly equal to the binary floating point number. This does not do any additional rounding, it is an exact conversion to a string.
With the shortest string representation in-hand, Python rounds to the appropriate number of decimal places using exact string operations. The goal of the float-to-string conversion is to attempt to "undo" some of the binary floating point representation error (i.e. if you enter 6.6, Python rounds on 6.6 rather than on 6.5999999999999996).
In addition, Ruby differs from some versions of Python in rounding modes: round-away-from-zero versus round-half-even.
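To make the contrast concrete, here is a small Python sketch (an illustration of the idea only, not either interpreter's actual code) that emulates Ruby's scale-round-truncate strategy next to Python's built-in round():
import math

def ruby_like_round(x, ndigits):
    # Ruby-style: scale by a power of ten, round the resulting binary float
    # half-away-from-zero, then scale back down.
    scaled = x * 10 ** ndigits
    rounded = math.floor(scaled + 0.5) if scaled >= 0 else math.ceil(scaled - 0.5)
    return rounded / 10 ** ndigits

for n in range(10):
    x = n + 0.45
    print(x, ruby_like_round(x, 1), round(x, 1))   # e.g. 1.45 -> 1.5 vs 1.4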
Detail
Ruby doesn't cheat. It starts with plain old binary floating point numbers the same as Python does. Accordingly, it is subject to the same challenges (such as 3.35 being represented as slightly more than 3.35 and 4.35 being represented as slightly less than 4.35):
>>> from decimal import Decimal
>>> Decimal.from_float(3.35)
Decimal('3.350000000000000088817841970012523233890533447265625')
>>> Decimal.from_float(4.35)
Decimal('4.3499999999999996447286321199499070644378662109375')
The best way to see the implementation differences is to look at the underlying source code:
Here's a link to the Ruby source code: https://github.com/ruby/ruby/blob/trunk/numeric.c#L1587
The Python source starts here: http://hg.python.org/cpython/file/37352a3ccd54/Python/bltinmodule.c
and finishes here: http://hg.python.org/cpython/file/37352a3ccd54/Objects/floatobject.c#l1080
The latter has an extensive comment that reveals the differences between the two implementations:
The basic idea is very simple: convert and round the double to a
decimal string using _Py_dg_dtoa, then convert that decimal string
back to a double with _Py_dg_strtod. There's one minor difficulty:
Python 2.x expects round to do round-half-away-from-zero, while
_Py_dg_dtoa does round-half-to-even. So we need some way to detect and correct the halfway cases.
Detection: a halfway value has the form k * 0.5 * 10**-ndigits for
some odd integer k. Or in other words, a rational number x is exactly
halfway between two multiples of 10**-ndigits if its 2-valuation is
exactly -ndigits-1 and its 5-valuation is at least
-ndigits. For ndigits >= 0 the latter condition is automatically satisfied for a binary float x, since any such float has nonnegative
5-valuation. For 0 > ndigits >= -22, x needs to be an integral
multiple of 5**-ndigits; we can check this using fmod. For -22 >
ndigits, there are no halfway cases: 5**23 takes 54 bits to represent
exactly, so any odd multiple of 0.5 * 10**n for n >= 23 takes at least
54 bits of precision to represent exactly.
Correction: a simple strategy for dealing with halfway cases is to
(for the halfway cases only) call _Py_dg_dtoa with an argument of
ndigits+1 instead of ndigits (thus doing an exact conversion to
decimal), round the resulting string manually, and then convert back
using _Py_dg_strtod.
In short, Python 2.7 goes to great lengths to accurately follow a round-away-from-zero rule.
In Python 3.3, it goes to equally great lengths to accurately follow a round-half-to-even rule.
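The difference in tie-breaking rules is easiest to see with values that are exactly representable in binary, so that round() really does face a halfway case (results shown are for CPython; the Python 2.7 values are noted in the comments):
print(round(0.5))        # Python 3: 0      Python 2.7: 1.0 (ties away from zero)
print(round(1.5))        # Python 3: 2      Python 2.7: 2.0
print(round(2.5))        # Python 3: 2      Python 2.7: 3.0
print(round(0.125, 2))   # Python 3: 0.12   Python 2.7: 0.13
print(round(0.625, 2))   # Python 3: 0.62   Python 2.7: 0.63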
Here's a little additional detail on the _Py_dg_dtoa function. Python calls the float to string function because it implements an algorithm that gives the shortest possible string representation among equal alternatives. In Python 2.6, for example, the number 1.1 shows up as 1.1000000000000001, but in Python 2.7 and later, it is simply 1.1. David Gay's sophisticated dtoa.c algorithm gives "the-result-that-people-expect" without forgoing accuracy.
That string conversion algorithm tends to make up for some of the issues that plague any implementation of round() on binary floating point numbers (i.e. it lets the rounding of 4.35 start with 4.35 instead of 4.3499999999999996447286321199499070644378662109375).
That and the rounding mode (round-half-even vs round-away-from-zero) are the essential differences between the Python and Ruby round() functions.
The fundamental difference is:
Python: Convert to decimal and then round
Ruby: Round and then convert to decimal
Ruby is rounding it from the original floating point bit string, but after operating on it with 10**n. You can't see the original binary value without looking very closely. The values are inexact because they are binary, and we are used to writing in decimal, and as it happens almost all of the decimal fraction strings we are likely to write do not have an exact equivalence as a base 2 fraction string.
In particular, 0.45 looks like this:
01111111101 1100110011001100110011001100110011001100110011001101
In hex, that is 3fdccccccccccccd.
It repeats in binary, the first unrepresented digit is 0xc, and the clever decimal input conversion has accurately rounded this very last fractional digit to 0xd.
This means that inside the machine, the value is greater than 0.45 by roughly 1/2**50. This is obviously a very, very small number but it's enough to cause the default round-nearest algorithm to round up instead of to the tie-breaker of even.
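You can inspect those bits from Python itself (Python 3 shown; bytes.hex() needs 3.5 or later):
import struct
from decimal import Decimal

print((0.45).hex())                  # 0x1.cccccccccccccdp-2
print(struct.pack('>d', 0.45).hex()) # 3fdccccccccccccd
print(Decimal(0.45))                 # exact stored value, slightly above 0.45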
Both Python and Ruby are potentially rounding more than once as every operation effectively rounds into the least significant bit.
I'm not sure I agree that Ruby does what a human would do. I think Python is approximating what decimal arithmetic would do. Python (depending on version) is applying round-nearest to the decimal string and Ruby is applying the round nearest algorithm to a computed binary value.
Note that we can see here quite clearly the reason people say that FP is inexact. It's a reasonably true statement, but it's more true to say that we simply can't convert accurately between binary and most decimal fractions. (Some do: 0.25, 0.5, 0.75, ...) Most simple decimal numbers are repeating numbers in binary, so we can never store the exact equivalent value. But, every value we can store is known exactly and all arithmetic performed on it is performed exactly. If we wrote our fractions in binary in the first place our FP arithmetic would be considered exact.
Ruby doesn't cheat. It just chose another way to implement round.
In Ruby, 9.45.round(1) is almost equivalent to (9.45*10.0).round / 10.0.
irb(main):001:0> printf "%.20f", 9.45
9.44999999999999928946=> nil
irb(main):002:0> printf "%.20f", 9.45*10.0
94.50000000000000000000=> nil
So
irb(main):003:0> puts 9.45.round(1)
9.5
If we use the same approach in Python, we get 9.5 as well.
>>> round(9.45, 1)
9.4
>>> round(9.45*10)/10
9.5
Related
I work daily with Python 2.4 at my company. I used the versatile logarithm function 'log' from the standard math library, and when I entered log(2**31, 2) it returned 31.000000000000004, which struck me as a bit odd.
I did the same thing with other powers of 2, and it worked perfectly. I ran 'log10(2**31) / log10(2)' and I got a round 31.0
I tried running the same original function in Python 3.0.1, assuming that it was fixed in a more advanced version.
Why does this happen? Is it possible that there are some inaccuracies in mathematical functions in Python?
This is to be expected with computer arithmetic. It is following particular rules, such as IEEE 754, that probably don't match the math you learned in school.
If this actually matters, use Python's decimal type.
Example:
from decimal import Decimal, Context

ctx = Context(prec=20)   # 20 significant decimal digits
two = Decimal(2)
# log2(2**31) computed as ln(2**31) / ln(2), all in decimal arithmetic
ctx.divide(ctx.power(two, Decimal(31)).ln(ctx), two.ln(ctx))
You should read "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
http://docs.sun.com/source/806-3568/ncg_goldberg.html
Always assume that floating point operations will have some error in them and check for equality taking that error into account (either a percentage value like 0.00001% or a fixed value like 0.00000000001). This inaccuracy is a given since not all decimal numbers can be represented in binary with a fixed number of bits precision.
Your particular case is not one of them if Python uses IEEE754 since 31 should be easily representable with even single precision. It's possible however that it loses precision in one of the many steps it takes to calculate log(2**31, 2), simply because it doesn't have code to detect special cases like a direct power of two.
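For the comparison itself, math.isclose (Python 3.5+) packages up the tolerance check described above; a rough illustration:
import math

a = math.log(2 ** 31, 2)   # typically 31.000000000000004 rather than exactly 31.0
b = 31.0
print(a == b)                              # usually False
print(math.isclose(a, b, rel_tol=1e-9))    # True: compare with a relative tolerance
print(abs(a - b) < 1e-9)                   # True: compare with a fixed absolute tolerance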
Floating-point operations are never exact. They return a result which has an acceptable relative error for the language/hardware infrastructure.
In general, it's quite wrong to assume that floating-point operations are precise, especially with single-precision. "Accuracy problems" section from Wikipedia Floating point article :)
IEEE double precision floating point numbers have 53 bits of precision (52 bits explicitly stored plus an implicit leading bit). Since 10^15 < 2^53 < 10^16, a double has between 15 and 16 significant figures. The result 31.000000000000004 is correct to 16 figures, so it is as good as you can expect.
This is normal. I would expect log10 to be more accurate than log(x, y), since it knows exactly what the base of the logarithm is; also, there may be some hardware support for calculating base-10 logarithms.
floats are imprecise
I don't buy that argument, because exact powers of two are represented exactly on most platforms (with underlying IEEE 754 floating point).
So if we really want log2 of an exact power of 2 to be exact, we can make it so.
I'll demonstrate it in Squeak Smalltalk, because it is easy to change the base system in that language, but the language does not really matter: floating point computations are universal, and Python's object model is not that far from Smalltalk's.
For taking the log in base n, there is the log: function defined in Number, which naively uses the natural (Napierian) logarithm ln:
log: aNumber
"Answer the log base aNumber of the receiver."
^self ln / aNumber ln
self ln (take the natural logarithm of the receiver), aNumber ln and / are three operations that will round their result to the nearest Float, and these rounding errors can accumulate... So the naive implementation is subject to the rounding error you observe, and I guess that Python's implementation of the log function is not much different.
((2 raisedTo: 31) log: 2) = 31.000000000000004
But if I change the definition like this:
log: aNumber
"Answer the log base aNumber of the receiver."
aNumber = 2 ifTrue: [^self log2].
^self ln / aNumber ln
provide a generic log2 in Number class:
log2
"Answer the base-2 log of the receiver."
^self asFloat log2
and this refinement in the Float class:
log2
"Answer the base 2 logarithm of the receiver.
Care to answer exact result for exact power of two."
^self significand ln / Ln2 + self exponent asFloat
where Ln2 is a constant (2 ln), then I effectively get an exact log2 for exact powers of two, because the significand of such a number = 1.0 (including subnormals, for Squeak's exponent/significand definition), and 1.0 ln = 0.0.
The implementation is quite trivial and should translate without difficulty into Python (probably in the VM); the runtime cost is very cheap, so it's just a matter of how important we think this feature is, or is not.
As I always say, the fact that floating point operation results are rounded to the nearest (or whatever rounding direction) representable value is not a license to waste ulps. Exactness has a cost, both in terms of runtime penalty and implementation complexity, so it's trade-offs driven.
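A rough Python translation of the same idea, using math.frexp to split off the exponent (an illustration only, not what CPython's math module does; note that Python 3.3+ already offers math.log2, which handles exact powers of two well):
import math

def log2_exactish(x):
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= m < 1.0
    # For an exact power of two, m == 0.5, so the fractional part contributes
    # exactly 0.0 and only the integer exponent remains.
    return math.log(2.0 * m) / math.log(2.0) + (e - 1)

print(log2_exactish(2 ** 31))     # 31.0 exactly
print(math.log(2 ** 31, 2))       # may show 31.000000000000004
print(math.log2(2 ** 31))         # 31.0 (Python 3.3+)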
The representation (float.__repr__) of a number in python tries to return a string of digits as close to the real value as possible when converted back, given that IEEE-754 arithmetic is precise up to a limit. In any case, if you printed the result, you wouldn't notice:
>>> from math import log
>>> log(2**31,2)
31.000000000000004
>>> print log(2**31,2)
31.0
print converts its arguments to strings (in this case, through the float.__str__ method), which caters for the inaccuracy by displaying fewer digits:
>>> log(1000000,2)
19.931568569324174
>>> print log(1000000,2)
19.9315685693
>>> 1.0/10
0.10000000000000001
>>> print 1.0/10
0.1
usuallyuseless' answer is very useful, actually :)
If you wish to calculate the highest power of 'k' in a number 'n', the code below might be helpful:
import math
answer = math.ceil(math.log(n, k))
while k**answer > n:
    answer -= 1
NOTE: You shouldn't use 'if' instead of 'while' because that will give wrong results in some cases like n=2**51-1 and k=2. In this example with 'if' the answer is 51 whereas with 'while' the answer is 50, which is correct.
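A quick check of the case mentioned in the note (the exact float returned by math.log can vary by platform, but the pattern is typical):
import math

n, k = 2 ** 51 - 1, 2
print(math.log(n, k))        # a float close to 51.0, although the true log2 is just below 51

answer = math.ceil(math.log(n, k))
while k ** answer > n:       # keep decrementing until k**answer no longer exceeds n
    answer -= 1
print(answer)                # 50, since 2**51 > n but 2**50 <= n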
I have a very simple code that takes a floating point number, and uses a while-loop to keep subtracting 1 until it reaches zero:
nr = 4.2
while nr > 0:
print(nr)
nr -= 1
I expected the output to look like this:
4.2
3.2
2.2
etc...
But instead, I get this:
4.2
3.2
2.2
1.2000000000000002
0.20000000000000018
Where do these weird floating numbers come from? Why does it only happen after the third time executing the loop? Also, very interestingly, this does not happen when the last decimal of nr is a 5.
What happened and how can I prevent this?
Upon execution of nr = 4.2, your Python set nr to exactly 4.20000000000000017763568394002504646778106689453125. This is the value that results from converting 4.2 to a binary-based floating-point format.
The results shown for subsequent subtractions appear to vary in the low digits solely due to formatting decisions. The default formatting for floating-point numbers does not show all of the digits. Python is not strict about floating-point behavior, but I suspect your implementation may be showing just as many decimal digits as needed to uniquely distinguish the binary floating-point number.
For “4.2”, “3.2”, and “2.2”, this is just two significant digits, because these decimal numbers are near the actual binary floating-point value.
Near 1.2, the floating-point format has more resolution (because the value dropped under 2, meaning the exponent decreased, thus shifting the effective position of the significand lower, allowing it to include another bit of resolution on an absolute scale). In consequence, there happens to be another binary floating-point number near 1.2, so “1.2000000000000002” is shown to distinguish the number currently in nr from this other number.
Near .2, there is even more resolution, and so there are even more binary floating-point numbers nearby, and more digits have to be used to distinguish the value.
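You can watch this happen by printing the exact stored value at each step; decimal.Decimal of a float shows it without any display rounding:
from decimal import Decimal

nr = 4.2
while nr > 0:
    # repr gives the shortest string that round-trips; Decimal gives the
    # exact binary value currently stored in nr.
    print(repr(nr), Decimal(nr))
    nr -= 1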
The Python documentation on floats states:
0.1
0.1000000000000000055511151231257827021181583404541015625
That is more digits than most people find useful, so Python keeps the
number of digits manageable by displaying a rounded value instead
0.1
What are the rules surrounding which floats get rounded for display and which ones don't? I've encountered some funny scenarios where
1.1+2.2 returns 3.3000000000000003 (unrounded)
but
1.0+2.3 returns 3.3 (rounded)
I know that the decimal module exists for making these things consistent, but am curious as to what determines the displayed rounding in floats.
Part of the explanation is of course that 1.1 + 2.2 and 1.0 + 2.3 produce different floating-point numbers, and part of the explanation for that is that 1.1 is not really 11/10, 2.2 not really 22/10, and of course floating-point + is not rational addition either.
Many modern programming languages, including the most recent Python variations, when displaying a double-precision floating-point value d, show exactly the number of decimal digits necessary for the decimal representation, re-parsed as a double, to be converted again exactly to d. As a consequence:
there is exactly one floating-point value that prints as 3.3. There cannot be two, because they would have to be the same by application of the definition, and there is at least one because if you convert the decimal representation 3.3 to a double, you get a double that has the property of producing the string “3.3” when converted to decimal with the algorithm in question.
the values are rounded for the purpose of showing them with decimal digits, but they otherwise remain the numbers that they are. So some of the “rules” that you are asking for are rules about how floating-point operations are rounded. This is very simple but you have to look at the binary representation of the arguments and results for it to be simple. If you look at the decimal representations, the rounding looks random (but it isn't).
the numbers only have a compact representation in binary. The exact value may take many decimal digits to represent exactly. “3.3000000000000003” is not “unrounded”, it is simply rounded to more digits than “3.3”, specifically, just exactly the number of digits necessary to distinguish that double-precision number from its neighbor (the one that is represented by “3.3”). They are in fact respectively the numbers below:
3.29999999999999982236431605997495353221893310546875
3.300000000000000266453525910037569701671600341796875
Of the two, 33/10 is closest to the former, so the former can be printed as “3.3”. The latter cannot be printed as “3.3”, and neither can it be printed as “3.30”, “3.300”, …, “3.300000000000000” since all these representations are equivalent and parse back to the floating-point number 3.29999999999999982236431605997495353221893310546875. So it has to be printed as “3.3000000000000003”, where the 3 is obtained because the digit 2 is followed by 6.
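The same thing can be seen directly in the interpreter; converting a float to Decimal shows its exact stored value:
from decimal import Decimal

a = 1.1 + 2.2
b = 1.0 + 2.3
print(a == b)       # False: the two sums are different doubles
print(repr(a))      # 3.3000000000000003
print(repr(b))      # 3.3
print(Decimal(a))   # 3.300000000000000266453525910037569701671600341796875
print(Decimal(b))   # 3.29999999999999982236431605997495353221893310546875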
I'm wondering what causes this behaviour. I haven't been able to find an answer that covers this. It is probably something simple and obvious, but it is not to me. I am using python 2.7.3 in Ubuntu.
In [1]: 2 == 1.9999999999999999
Out[1]: True
In [2]: 2 == 1.999999999999999
Out[2]: False
EDIT:
To clarify my question: is there a documented maximum number of 9's for which Python will evaluate the expression above as being equal to 2?
Python uses floating point representation
What a floating point actually is, is a fixed-width binary number (called the "significand") plus a small integer to tell you how many powers of two to shift that value by (the "exponent"). Plus a sign bit. Just like scientific notation, but in base 2 instead of 10.
The closest 64 bit floating point value to 1.9999999999999999 is 2.0, because 64 bit floating point values (so-called "double precision") uses 52 bits of significand, which is equivalent to about 15 decimal places. So the literal 1.9999999999999999 is just another way of writing 2.0. However, the closest value to 1.999999999999999 is less than 2.0 (I think it's 1.9999999999999988897769753748434595763683319091796875 exactly, but I'm too lazy to check that's correct, I'm just relying on Python's formatting code to be exact).
I don't actually know whether the use specifically of 64 bit floats is required by the Python language, or is an implementation detail of CPython. But whatever size is used, the important thing is not specifically the number of decimal places, it is where the closest floating-point value of that size lies to your decimal literal. It will be closer for some literals than others.
Hence, 1.9999999999999999 == 2 for the same reason that 2.0 == 2 (Python allows mixed-type numeric operations including comparison, and the integer 2 is equal to the float 2.0). Whereas 1.999999999999999 != 2.
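A quick way to see where the two literals land (math.nextafter needs Python 3.9+; Decimal shows the exact stored value):
import math
from decimal import Decimal

print(1.9999999999999999 == 2.0)   # True: this literal parses to exactly 2.0
print(1.999999999999999 == 2.0)    # False
print(Decimal(1.999999999999999))  # the exact double this literal parses to
print(math.nextafter(2.0, 0.0))    # 1.9999999999999998, the double just below 2.0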
Type coercion
>>> 2 == 2.0
True
And a consequence of the maximum number of significant digits that can be represented in Python:
>>> import sys
>>> sys.float_info.dig
15
>>> 1.9999999999999999
2.0
More from the docs:
>>> float('9876543211234567')
9876543211234568.0
Note the ...68 at the end instead of the expected ...67.
This is due to the way floats are implemented in Python. To keep it short and simple: Since floats almost always are an approximation and thus have more digits than most people find useful, the Python interpreter displays a rounded value.
More detailed: floats are stored in binary. This means that they're stored as fractions to the base 2, unlike decimal, where you can display a float as fractions to the base 10. However, most decimal fractions don't have an exact representation in binary. Because of that, they are typically stored with a precision of 53 bits. This can lead to surprising results when you do further arithmetic with them, e.g.:
>>> 0.1 + 0.2
0.30000000000000004
>>> round(2.675, 2)
2.67
See The docs on floats as well.
Mathematically speaking, 2.0 does equal 1.9999... forever. They are two different ways of writing the same number.
However, in software, it's important to never compare two floats or decimals for equality - instead, subtract them, take the absolute value, and verify that the (always positive) difference is sufficiently low for your purposes.
EG:
if abs(value1 - value2) < 1e-10:
    pass  # they are close enough
else:
    pass  # they are not
You probably should set EPSILON = 1e-10 and use the symbolic constant instead of scattering 1e-10 throughout your code, or better still use a comparison function.
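A minimal sketch of such a comparison function (the name and the 1e-10 tolerance are placeholders to adapt to your data; math.isclose in Python 3.5+ provides much the same thing):
EPSILON = 1e-10

def nearly_equal(a, b, eps=EPSILON):
    # Relative tolerance for large values, with an absolute floor near zero.
    return abs(a - b) <= max(eps * max(abs(a), abs(b)), eps)

print(nearly_equal(2.0, 1.9999999999999))   # True
print(nearly_equal(2.0, 1.99))              # False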
x = 4.2 - 0.1
vb.net gives 4.1000000000000005
python gives 4.1000000000000005
Excel gives 4.1
Google calc gives 4.1
What is the reason this happens?
Float/double precision.
You must remember that in binary, 4.1 = 4 + 1/10. 1/10 is an infinitely repeating sum in binary, much like 1/9 is an infinite sum in decimal.
>>> x = 4.2 - 0.1
>>> x
4.1000000000000005
>>> print(x)
4.1
This happens because of how numbers are stored internally.
Computers represent numbers in binary, instead of decimal, as us humans are used to. With floating point numbers, computers have to make an approximation to the closest binary floating point value.
Almost all machines today (November 2000) use IEEE-754 floating point arithmetic, and almost all platforms map Python floats to IEEE-754 “double precision”. 754 doubles contain 53 bits of precision, so on input the computer strives to convert 0.1 to the closest fraction it can of the form J/2**N where J is an integer containing exactly 53 bits.
If you print the number, it will show the approximation, truncated to a normal value. For example, the real value of 0.1 is 0.1000000000000000055511151231257827021181583404541015625.
If you really need a base 10 based number (if you don't know the answer to this question, you don't), you could use (in Python) decimal.Decimal:
>>> from decimal import Decimal
>>> Decimal("4.2") - Decimal("0.1")
Decimal("4.1")
Binary floating-point arithmetic holds many surprises like this. The problem with “0.1” is explained in precise detail below, in the “Representation Error” section. See The Perils of Floating Point for a more complete account of other common surprises.
As that says near the end, “there are no easy answers.” Still, don’t be unduly wary of floating-point! The errors in Python float operations are inherited from the floating-point hardware, and on most machines are on the order of no more than 1 part in 2**53 per operation. That’s more than adequate for most tasks, but you do need to keep in mind that it’s not decimal arithmetic, and that every float operation can suffer a new rounding error.
While pathological cases do exist, for most casual use of floating-point arithmetic you’ll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. str() usually suffices, and for finer control see the str.format() method’s format specifiers in Format String Syntax.
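For example, rounding only the displayed result (Python 3 shown; the same specifiers work with %-formatting and f-strings):
x = 4.2 - 0.1
print(x)                     # 4.1000000000000005
print('{:.2f}'.format(x))    # 4.10 -- rounded for display only
print(format(x, '.3g'))      # 4.1  -- three significant digits
print(round(x, 1))           # 4.1  -- rounds the value itself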
There is no problem, really. It is just the way floats work (their internal binary representation). Anyway:
>>> from decimal import Decimal
>>> Decimal('4.2')-Decimal('0.1')
Decimal('4.1')
In vb.net, you can avoid this problem by using Decimal type instead:
Dim x As Decimal = 4.2D - 0.1D
The result is 4.1.