2 == 1.9999… for large numbers of 9s - python

I'm wondering what causes this behaviour. I haven't been able to find an answer that covers this. It is probably something simple and obvious, but it is not to me. I am using python 2.7.3 in Ubuntu.
In [1]: 2 == 1.9999999999999999
Out[1]: True
In [2]: 2 == 1.999999999999999
Out[2]: False
EDIT:
To clarify my question: is there a documented maximum number of 9s at which Python will evaluate the expression above as being equal to 2?

Python uses floating point representation
What a floating point actually is, is a fixed-width binary number (called the "significand") plus a small integer to tell you how many powers of two to shift that value by (the "exponent"). Plus a sign bit. Just like scientific notation, but in base 2 instead of 10.
The closest 64 bit floating point value to 1.9999999999999999 is 2.0, because 64 bit floating point values (so-called "double precision") use a 53-bit significand (52 bits stored explicitly), which is equivalent to about 15-16 significant decimal digits. So the literal 1.9999999999999999 is just another way of writing 2.0. However, the closest value to 1.999999999999999 is less than 2.0 (it's 1.9999999999999988897769753748434595763683319091796875 exactly; you can check this yourself with decimal.Decimal(1.999999999999999)).
I don't actually know whether the use specifically of 64 bit floats is required by the Python language, or is an implementation detail of CPython. But whatever size is used, the important thing is not specifically the number of decimal places, it is where the closest floating-point value of that size lies to your decimal literal. It will be closer for some literals than others.
Hence, 1.9999999999999999 == 2 for the same reason that 2.0 == 2 (Python allows mixed-type numeric operations including comparison, and the integer 2 is equal to the float 2.0). Whereas 1.999999999999999 != 2.
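If you want to see exactly which double a given literal maps to, a quick way (standard library only) is to pass the float to decimal.Decimal, which shows its exact stored value; a minimal sketch:
from decimal import Decimal

# The literal with sixteen 9s rounds to exactly 2; the one with fifteen 9s does not.
print(Decimal(1.9999999999999999))   # 2
print(Decimal(1.999999999999999))    # 1.9999999999999988897769753748434595763683319091796875
print(1.9999999999999999 == 2, 1.999999999999999 == 2)   # True False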

Type coercion
>>> 2 == 2.0
True
And the consequences of the maximum number of decimal digits that a Python float can faithfully represent:
>>> import sys
>>> sys.float_info.dig
15
>>> 1.9999999999999999
2.0
More from the docs:
>>> float('9876543211234567')
9876543211234568.0
Note the ...68 at the end instead of the expected ...67.
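A small sketch of the float spacing at that magnitude (assuming Python 3.9+, which added math.ulp and math.nextafter):
import math

x = float(9876543211234567)    # rounds to the nearest representable double
print(x)                       # 9876543211234568.0
print(math.ulp(x))             # 2.0 -- adjacent doubles at this magnitude are 2 apart
print(math.nextafter(x, 0.0))  # 9876543211234566.0 -- so ...67 simply cannot be stored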

This is due to the way floats are implemented in Python. To keep it short and simple: since floats are almost always an approximation, and thus have more digits than most people find useful, the Python interpreter displays a rounded value.
In more detail, floats are stored in binary. This means they're stored as fractions with a power of 2 in the denominator, unlike decimal notation, where a number is written as fractions with a power of 10 in the denominator. Most decimal fractions have no exact representation in binary, so they are stored rounded to 53 bits of precision. This can produce surprising results as soon as you do arithmetic with them, e.g.:
>>> 0.1 + 0.2
0.30000000000000004
>>> round(2.675, 2)
2.67
See the docs on floats as well.
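You can see why round(2.675, 2) goes down rather than up by looking at the exact value that 2.675 is stored as; a quick illustration with the standard decimal module:
from decimal import Decimal

print(Decimal(2.675))    # 2.674999999999999822... -- slightly below 2.675
print(round(2.675, 2))   # 2.67, because the stored value is below the halfway point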

Mathematically speaking, 2.0 does equal 1.9999... forever. They are two different ways of writing the same number.
However, in software, it's important never to compare two computed floats for equality - instead, subtract them, take the absolute value, and verify that the (always non-negative) difference is sufficiently small for your purposes.
EG:
if abs(value1 - value2) < 1e-10:
    pass  # they are close enough to treat as equal
else:
    pass  # they are not
You probably should define EPSILON = 1e-10 and use that symbolic constant instead of scattering 1e-10 throughout your code, or better still use a comparison function.
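Note that since Python 3.5 the standard library provides math.isclose, which does this kind of tolerance comparison for you (with both relative and absolute tolerances), so you don't have to pick a single epsilon by hand:
import math

a = 0.1 + 0.2
b = 0.3

print(a == b)                             # False
print(math.isclose(a, b))                 # True, default rel_tol is 1e-09
print(math.isclose(a, b, abs_tol=1e-12))  # abs_tol matters when comparing values near zero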

Related

Mitigating Floating Point Approximation Issues with Numpy

My code is quite simple, and only 1 line is causing an issue:
np.tan(np.radians(rotation))
Instead of the expected output of 1 for rotation = 45, I get 0.9999999999999999. I understand that 0.9999... with a ton of 9s is effectively 1. In my use case, however, it seems like the type of thing that will definitely build up over iterations.
What is causing the floating point error: np.tan or np.radians, and how do I get the problem function to come out correctly regardless of floating point inaccuracies?
Edit:
I should clarify that I am familiar with floating point inaccuracies. My concern is that as that number gets multiplied, added, and compared, the 1e-6 error suddenly becomes a tangible issue. I've normally been able to safely ignore floating point issues, but now I am far more concerned about the build up of error. I would like to reduce the possibility of such an error.
Edit 2:
My current solution is to just round to 8 decimal places because that's most likely enough. It's sort of a temporary solution because I'd much prefer a way to get around the IEEE decimal representations.
What is causing the floating point error: np.tan or np.radians, and how do I get the problem function to come out correctly regardless of floating point inaccuracies?
Both functions incur rounding error, since in neither case is the exact result representable in floating point.
My current solution is to just round to 8 decimal places because that's most likely enough. It's sort of a temporary solution because I'd much prefer a way to get around the IEEE decimal representations.
The problem has nothing to do with decimal representation, and this will give worse results outside of the exact case you mention above, e.g.
>>> np.tan(np.radians(60))
1.7320508075688767
>>> round(np.tan(np.radians(60)), 8)
1.73205081
>>> np.sqrt(3) # sqrt is correctly rounded, so this is the closest float to the true result
1.7320508075688772
If you absolutely need higher accuracy than the roughly 15 decimal digits you get from the code above, you can use an arbitrary-precision library like gmpy2.
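For example, a sketch of a higher-precision version with gmpy2 might look like the following (the precision setting, const_pi and tan calls reflect my reading of gmpy2's API; check the gmpy2 documentation for your version):
import gmpy2

gmpy2.get_context().precision = 100   # precision in bits; roughly 30 decimal digits

angle = gmpy2.const_pi() / 4          # 45 degrees in radians, computed at full precision
print(gmpy2.tan(angle))               # agrees with 1 to roughly 30 significant digits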
Take a look here: https://docs.scipy.org/doc/numpy/user/basics.types.html .
Standard dtypes in numpy do not go beyond 64 bits precision. From the docs:
Be warned that even if np.longdouble offers more precision than python
float, it is easy to lose that extra precision, since python often
forces values to pass through float. For example, the % formatting
operator requires its arguments to be converted to standard python
types, and it is therefore impossible to preserve extended precision
even if many decimal places are requested. It can be useful to test
your code with the value 1 + np.finfo(np.longdouble).eps.
You can increase precision with np.longdouble, but this is platform dependent
In spyder (windows):
np.finfo(np.longdouble).eps #same precision as float
>> 2.220446049250313e-16
np.finfo(np.longdouble).precision
>> 15
In google colab:
np.finfo(np.longdouble).eps #larger precision
>> 1.084202172485504434e-19
np.finfo(np.longdouble).precision
>> 18
print(np.tan(np.radians(45, dtype=np.float64), dtype=np.float64) - 1)
print(np.tan(np.radians(45, dtype=np.longdouble), dtype=np.longdouble) - 1)
>> -1.1102230246251565e-16
0.0

Comparing Python Decimals created from float and string

Can someone explain why the following three examples are not all equal?
ipdb> Decimal(71.60) == Decimal(71.60)
True
ipdb> Decimal('71.60') == Decimal('71.60')
True
ipdb> Decimal(71.60) == Decimal('71.60')
False
Is there a general 'correct' way to create Decimal objects in Python? (ie, as strings or as floats)
Floating point numbers, which Python uses by default, are stored in base 2. 71.6 can't be represented exactly in base 2 (think of numbers like 1/3 in base 10).
Because the binary expansion of 71.6 would go on forever, and you almost certainly don't have infinite memory to play with, the computer represents it (well, is told to) with a limited number of bits, so the stored value is only the closest approximation the format allows.
If you use a string instead, the program can run an algorithm to convert it exactly, instead of starting from the already-rounded floating point number.
>>> decimal.Decimal(71.6)
Decimal('71.599999999999994315658113919198513031005859375')
Compared to
>>> decimal.Decimal("71.6")
Decimal('71.6')
However, if your number is representable exactly as a float, it is just as accurate as a string
>>> decimal.Decimal(71.5)
Decimal('71.5')
Normally Decimal is used to avoid the floating point precision problem. For example, the float literal 71.60 isn't mathematically 71.60, but a number very close to it.
As a result, using float to initialize Decimal won't avoid the problem. In general, you should use strings to initialize Decimal.
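A common pattern, if all you have in hand is a float, is to go through str() so that Decimal sees the short decimal form rather than the float's full binary expansion; a small sketch:
from decimal import Decimal

x = 71.60   # already a float, so some precision is already gone

print(Decimal(x))          # 71.599999999999994315... -- the float's exact stored value
print(Decimal(str(x)))     # 71.6
print(Decimal(str(x)) == Decimal('71.60'))   # True -- trailing zeros don't affect Decimal equality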

Rounding ** 0.5 and math.sqrt

In Python, are either
n**0.5 # or
math.sqrt(n)
recognized when a number is a perfect square? Specifically, should I worry that when I use
int(n**0.5) # instead of
int(n**0.5 + 0.000000001)
I might accidentally end up with the number one less than the actual square root due to precision error?
As several answers have suggested integer arithmetic, I'll recommend the gmpy2 library. It provides functions for checking if a number is a perfect power, calculating integer square roots, and integer square root with remainder.
>>> import gmpy2
>>> gmpy2.is_power(9)
True
>>> gmpy2.is_power(10)
False
>>> gmpy2.isqrt(10)
mpz(3)
>>> gmpy2.isqrt_rem(10)
(mpz(3), mpz(1))
Disclaimer: I maintain gmpy2.
Yes, you should worry:
In [11]: int((100000000000000000000000000000000000**2) ** 0.5)
Out[11]: 99999999999999996863366107917975552L
In [12]: int(math.sqrt(100000000000000000000000000000000000**2))
Out[12]: 99999999999999996863366107917975552L
obviously adding the 0.000000001 doesn't help here either...
As @DSM points out, you can use the decimal library:
In [21]: from decimal import Decimal
In [22]: x = Decimal('100000000000000000000000000000000000')
In [23]: (x ** 2).sqrt() == x
Out[23]: True
For numbers over 10**999999999 (the default Decimal exponent limit), and provided you keep a check on the precision (which is configurable), it'll throw an error rather than give an incorrect answer...
Both **0.5 and math.sqrt() perform the calculation using floating point arithmetic. The input is converted to float before the square root is calculated.
Do these calculations recognize when the input value is a perfect square?
No they do not. Floating arithmetic has no concept of perfect squares.
Large integers may not be representable when the number has more significant digits than the floating point mantissa can hold. It's easy to see, therefore, that for non-representable input values n**0.5 may be inaccurate, and your proposed fix of adding a small value will not in general solve the problem.
If your input is an integer then you should consider performing your calculation using integer arithmetic. That ultimately is the right way to deal with this.
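Since Python 3.8 the standard library has math.isqrt, which computes integer square roots exactly on arbitrarily large ints; a short sketch of the perfect-square check:
import math

n = 100000000000000000000000000000000000 ** 2

root = math.isqrt(n)       # exact integer square root, no floating point involved
print(root)                # 100000000000000000000000000000000000
print(root * root == n)    # True, so n is a perfect square
print(int(n ** 0.5))       # wrong, because n was squeezed through a 53-bit float first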
You can use round(number, significant_figures) before converting to an int; for the record, int() truncates toward zero when converting a float, it does not round.
In any case, since python uses floating point arithmetic, all the pitfalls apply. See:
http://docs.python.org/2/tutorial/floatingpoint.html
Perfect-square values will have no fractional components, so your main worry would be very large values. For such values, where a difference of 1 or 2 is significant, you're going to want a specific numerical library that supports high precision (as DSM mentions, the decimal library, standard since Python 2.4, should be able to do what you want, as it supports arbitrary precision).
http://docs.python.org/library/decimal.html
sqrt is one of the easier math library functions to implement, and any math library of reasonable quality will implement it with faithful rounding (sub-ULP accuracy). If the input is a perfect square, its square root is representable (in a reasonable floating-point format). In this case, faithful rounding guarantees the result is exact.
This addresses only the value actually passed to sqrt. Whether a number can be converted without error from another format to the floating-point input for sqrt is a separate issue.
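As a quick sanity check of that claim (assuming the platform math library provides the correctly rounded sqrt that IEEE 754 requires, which is the case on mainstream CPython builds):
import math

# Every perfect square whose root fits comfortably in 53 bits round-trips exactly.
print(all(math.sqrt(i * i) == i for i in range(1, 10**6)))   # True

# The failures shown in other answers come from the *input* conversion, not from sqrt:
# 10**17 + 1 cannot be represented as a float, so it is rounded before sqrt ever runs.
print(float(10**17 + 1) == 10**17 + 1)   # False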

Why is Ruby's Float#round behavior different than Python's?

"Behavior of “round” function in Python" observes that Python rounds floats like this:
>>> round(0.45, 1)
0.5
>>> round(1.45, 1)
1.4
>>> round(2.45, 1)
2.5
>>> round(3.45, 1)
3.5
>>> round(4.45, 1)
4.5
>>> round(5.45, 1)
5.5
>>> round(6.45, 1)
6.5
>>> round(7.45, 1)
7.5
>>> round(8.45, 1)
8.4
>>> round(9.45, 1)
9.4
The accepted answer confirms this is caused by the binary representation of floats being inaccurate, which is all logical.
Assuming that Ruby floats are just as inaccurate as Python's, how come Ruby floats round like a human would? Does Ruby cheat?
1.9.3p194 :009 > 0.upto(9) do |n|
1.9.3p194 :010 > puts (n+0.45).round(1)
1.9.3p194 :011?> end
0.5
1.5
2.5
3.5
4.5
5.5
6.5
7.5
8.5
9.5
Summary
Both implementations confront the same issues surrounding binary floating point numbers.
Ruby operates directly on the floating point number with simple operations (multiply by a power of ten, adjust, and truncate).
Python converts the binary floating point number to a string using David Gay's sophisticated algorithm that yields the shortest decimal representation that is exactly equal to the binary floating point number. This does not do any additional rounding, it is an exact conversion to a string.
With the shortest string representation in hand, Python rounds to the appropriate number of decimal places using exact string operations. The goal of the float-to-string conversion is to attempt to "undo" some of the binary floating point representation error (i.e. if you enter 6.6, Python rounds on the 6.6 rather than on 6.5999999999999996).
In addition, Ruby differs from some versions of Python in rounding modes: round-away-from-zero versus round-half-even.
Detail
Ruby doesn't cheat. It starts with plain old binary floating point numbers, the same as Python does. Accordingly, it is subject to some of the same challenges (such as 3.35 being represented as slightly more than 3.35 and 4.35 being represented as slightly less than 4.35):
>>> Decimal.from_float(3.35)
Decimal('3.350000000000000088817841970012523233890533447265625')
>>> Decimal.from_float(4.35)
Decimal('4.3499999999999996447286321199499070644378662109375')
The best way to see the implementation differences is to look at the underlying source code:
Here's a link to the Ruby source code: https://github.com/ruby/ruby/blob/trunk/numeric.c#L1587
The Python source starts here: http://hg.python.org/cpython/file/37352a3ccd54/Python/bltinmodule.c
and finishes here: http://hg.python.org/cpython/file/37352a3ccd54/Objects/floatobject.c#l1080
The latter has an extensive comment that reveals the differences between the two implementations:
The basic idea is very simple: convert and round the double to a
decimal string using _Py_dg_dtoa, then convert that decimal string
back to a double with _Py_dg_strtod. There's one minor difficulty:
Python 2.x expects round to do round-half-away-from-zero, while
_Py_dg_dtoa does round-half-to-even. So we need some way to detect and correct the halfway cases.
Detection: a halfway value has the form k * 0.5 * 10**-ndigits for
some odd integer k. Or in other words, a rational number x is exactly
halfway between two multiples of 10**-ndigits if its 2-valuation is
exactly -ndigits-1 and its 5-valuation is at least
-ndigits. For ndigits >= 0 the latter condition is automatically satisfied for a binary float x, since any such float has nonnegative
5-valuation. For 0 > ndigits >= -22, x needs to be an integral
multiple of 5**-ndigits; we can check this using fmod. For -22 >
ndigits, there are no halfway cases: 5**23 takes 54 bits to represent
exactly, so any odd multiple of 0.5 * 10**n for n >= 23 takes at least
54 bits of precision to represent exactly.
Correction: a simple strategy for dealing with halfway cases is to
(for the halfway cases only) call _Py_dg_dtoa with an argument of
ndigits+1 instead of ndigits (thus doing an exact conversion to
decimal), round the resulting string manually, and then convert back
using _Py_dg_strtod.
In short, Python 2.7 goes to great lengths to accurately follow a round-away-from-zero rule.
In Python 3.3, it goes to equally great lengths to accurately follow a round-half-to-even rule.
Here's a little additional detail on the _Py_dg_dtoa function. Python calls the float to string function because it implements an algorithm that gives the shortest possible string representation among equal alternatives. In Python 2.6, for example, the number 1.1 shows up as 1.1000000000000001, but in Python 2.7 and later, it is simply 1.1. David Gay's sophisticated dtoa.c algorithm gives "the-result-that-people-expect" without forgoing accuracy.
That string conversion algorithm tends to make up for some of the issues that plague any implementation of round() on binary floating point numbers (i.e. it lets the rounding of 4.35 start from 4.35 instead of from 4.3499999999999996447286321199499070644378662109375).
That and the rounding mode (round-half-even vs round-away-from-zero) are the essential differences between the Python and Ruby round() functions.
The fundamental difference is:
Python: Convert to decimal and then round
Ruby: Round and then convert to decimal
Ruby is rounding it from the original floating point bit string, but after operating on it with 10**n. You can't see the original binary value without looking very closely. The values are inexact because they are binary, and we are used to writing in decimal, and as it happens almost all of the decimal fraction strings we are likely to write do not have an exact equivalent as a base 2 fraction string.
In particular, 0.45 looks like this:
01111111101 1100110011001100110011001100110011001100110011001101
In hex, that is 3fdccccccccccccd.
It repeats in binary, the first unrepresented digit is 0xc, and the clever decimal input conversion has accurately rounded this very last fractional digit to 0xd.
This means that inside the machine, the value is greater than 0.45 by a tiny amount (about 1.1e-17). This is obviously a very, very small number, but it's enough to cause the default round-nearest algorithm to round up instead of to the tie-breaker of even.
Both Python and Ruby are potentially rounding more than once as every operation effectively rounds into the least significant bit.
I'm not sure I agree that Ruby does what a human would do. I think Python is approximating what decimal arithmetic would do. Python (depending on version) is applying round-nearest to the decimal string and Ruby is applying the round nearest algorithm to a computed binary value.
Note that we can see here quite clearly the reason people say that FP is inexact. It's a reasonably true statement, but it's more true to say that we simply can't convert accurately between binary and most decimal fractions. (Some do: 0.25, 0.5, 0.75, ...) Most simple decimal numbers are repeating numbers in binary, so we can never store the exact equivalent value. But, every value we can store is known exactly and all arithmetic performed on it is performed exactly. If we wrote our fractions in binary in the first place our FP arithmetic would be considered exact.
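You can confirm the representation described above directly from Python:
from decimal import Decimal

print((0.45).hex())    # 0x1.cccccccccccccdp-2 -- note the final hex digit rounded up to d
print(Decimal(0.45))   # 0.45000000000000001110223... -- about 1.1e-17 above 0.45,
                       # which is why rounding the scaled value can tip upward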
Ruby doesn't cheat. It just chose another way to implement round.
In Ruby, 9.45.round(1) is almost equivalent to (9.45*10.0).round / 10.0.
irb(main):001:0> printf "%.20f", 9.45
9.44999999999999928946=> nil
irb(main):002:0> printf "%.20f", 9.45*10.0
94.50000000000000000000=> nil
So
irb(main):003:0> puts 9.45.round(1)
9.5
If we use the same approach in Python, we get 9.5 as well.
>>> round(9.45, 1)
9.4
>>> round(9.45*10)/10
9.5
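If you want the round-half-away-from-zero behaviour without depending on float internals at all, the decimal module lets you ask for it explicitly; a brief sketch:
from decimal import Decimal, ROUND_HALF_UP

# Starting from the string "9.45" avoids the binary representation error entirely.
print(Decimal('9.45').quantize(Decimal('0.1'), rounding=ROUND_HALF_UP))   # 9.5
print(round(9.45, 1))   # 9.4, as shown above: round() sees the stored float, which is just below 9.45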

Why do simple math operations on floating point return unexpected (inaccurate) results in VB.Net and Python?

x = 4.2 - 0.1
vb.net gives 4.1000000000000005
python gives 4.1000000000000005
Excel gives 4.1
Google calc gives 4.1
What is the reason this happens?
Float/double precision.
You must remember that in binary, 4.1 = 4 + 1/10, and 1/10 has an infinitely repeating expansion in binary, much like 1/9 does in decimal.
>>> x = 4.2 - 0.1
>>> x
4.1000000000000005
>>> print(x)
4.1
This happens because of how numbers are stored internally.
Computers represent numbers in binary, instead of decimal, as us humans are used to. With floating point numbers, computers have to make an approximation to the closest binary floating point value.
Almost all machines today (November 2000) use IEEE-754 floating point arithmetic, and almost all platforms map Python floats to IEEE-754 “double precision”. 754 doubles contain 53 bits of precision, so on input the computer strives to convert 0.1 to the closest fraction it can of the form J/2**N where J is an integer containing exactly 53 bits.
If you print the number, it will show a rounded approximation rather than the exact stored value. For example, the real value of 0.1 is 0.1000000000000000055511151231257827021181583404541015625.
If you really need a base 10 based number (if you don't know the answer to this question, you don't), you could use (in Python) decimal.Decimal:
>>> from decimal import Decimal
>>> Decimal("4.2") - Decimal("0.1")
Decimal("4.1")
Binary floating-point arithmetic holds many surprises like this. The problem with “0.1” is explained in precise detail in the “Representation Error” section of the Python tutorial. See The Perils of Floating Point for a more complete account of other common surprises.
As that says near the end, “there are no easy answers.” Still, don’t be unduly wary of floating-point! The errors in Python float operations are inherited from the floating-point hardware, and on most machines are on the order of no more than 1 part in 2**53 per operation. That’s more than adequate for most tasks, but you do need to keep in mind that it’s not decimal arithmetic, and that every float operation can suffer a new rounding error.
While pathological cases do exist, for most casual use of floating-point arithmetic you’ll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. str() usually suffices, and for finer control see the str.format() method’s format specifiers in Format String Syntax.
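In current Python the usual way to get that "display rounding" is a format specifier; for instance:
x = 4.2 - 0.1

print(repr(x))            # '4.1000000000000005' -- the shortest string that round-trips exactly
print(f"{x:.10g}")        # 4.1 -- rounded only for display, x itself is unchanged
print(format(x, ".2f"))   # 4.10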
There is no problem, really. It is just the way floats work (their internal binary representation). Anyway:
>>> from decimal import Decimal
>>> Decimal('4.2')-Decimal('0.1')
Decimal('4.1')
In vb.net, you can avoid this problem by using Decimal type instead:
Dim x As Decimal = 4.2D - 0.1D
The result is 4.1.
