This might seem really silly, but I am new to Python, I like to use equality conditions in my programs, and I have hit a very surprising roadblock. The practical issue here is that the last condition, r == r_max, is never satisfied, so I miss the final iteration of the loop; but that is not what is worrying me.
Rather than (trivial) workarounds, can someone explain to me in simple terms what is going on? (Also: why do the numbers turn out the same no matter how many times I run this loop? It is evidently something systematic, not something probabilistic.)
And what is a proper way to make sure this does not happen? By proper I mean a programming practice I should adopt in all my coding, so that such unintentional discrepancies never occur again. Loops like this are omnipresent in my code, and it worries me.
It seems I cannot trust numbers in Python, which would make it useless as a computational tool.
PS: I am working on a scientific computing project with NumPy.
>>> r, r_step, r_max = 2.4, 0.1, 4.0   # starting values inferred from the output below
>>> while r <= r_max:
...     print repr(r)
...     r = r + r_step
...
2.4
2.5
2.6
2.7
2.8000000000000003
2.9000000000000004
3.0000000000000004
3.1000000000000005
3.2000000000000006
3.3000000000000007
3.400000000000001
3.500000000000001
3.600000000000001
3.700000000000001
3.800000000000001
3.9000000000000012
The simple answer is that floating-point numbers, as usually represented in computing, aren't exact, and you can't treat them as if they were exact. Treat them as if they're fuzzy; see if they're within a certain range, not whether they "equal" something.
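For example, here is one common pattern (a sketch, in Python 3 syntax; math.isclose needs 3.5+, and on older versions you can use abs(a - b) <= tol instead): give the loop bound a little slack, or compare against a tolerance rather than with ==. The 1e-9 slack is an arbitrary choice, not a magic constant.

import math

r, r_step, r_max = 2.4, 0.1, 4.0
while r <= r_max + 1e-9:          # a little slack so the final step isn't lost
    print(repr(r))
    r = r + r_step

print(math.isclose(2.4 + 0.1 * 4, 2.8))   # True, even though == is False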
The numbers turn out the same because it's all deterministic. The calculations are being done in exactly the same way each time. This isn't a case of random errors, it's a case of the machine representing floating-point numbers in an inexact way.
Python has some exact datatypes you can use instead of inexact floats; see the decimal and fractions modules.
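A quick sketch of both modules: Decimal stores base-10 digits exactly, and Fraction stores an exact ratio of integers.

from decimal import Decimal
from fractions import Fraction

print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))     # True
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
print(0.1 + 0.2 == 0.3)                                      # False with binary floats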
There's a classic article called "What Every Computer Scientist Should Know About Floating-Point Arithmetic"; google it and pick any link you like.
Pierre G. is correct. Because computers calculate numbers in binary, they cannot represent many float values exactly. A float should, however, be accurate to within a certain number of digits, depending on the data type you use.
For your case, you could use the round(number, digits) function to get a rounded number and then compare it, as in the sketch below.
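For instance, reusing a value from the question's loop:

>>> r = 2.4 + 0.1 * 4        # accumulates to 2.8000000000000003
>>> round(r, 6) == 2.8
True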
It's just a matter of float arithmetic. If you know how floating-point numbers are represented in computer memory, you'll see that certain numbers we consider exact are not stored exactly: they are stored only to a fixed number of significant binary digits. This problem will always be present when you do numerical programming. I suggest you use a comparison that accepts a tolerance value; NumPy has methods that facilitate such comparisons, shown below.
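A minimal sketch with NumPy's tolerance-based comparisons, numpy.isclose and numpy.allclose:

import numpy as np

r = 2.4 + 0.1 * 4                       # 2.8000000000000003
print(r == 2.8)                         # False
print(np.isclose(r, 2.8))               # True, within default tolerances
print(np.allclose([0.1 + 0.2], [0.3]))  # True, elementwise over whole arrays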
Related
I have been learning Python recently and read about the "infinite" bit representation of integers in Python, but I did not actually understand the technology or math behind it. Does anyone here know how it works?
I read about it through this link while studying bitwise operators:
https://wiki.python.org/moin/BitwiseOperators
While I do not know exactly how it's implemented in Python, what's important is the recognition that Python can handle arbitrarily large integers, but not infinite ones. So the wiki is a little misleading.
Any integer is finite, so it can be expressed in finite form. One way to deal with this is to adopt the convention that the leading bit is the one repeated infinitely to the left (sign extension). For example:
-5 = 11111011 = ...1111111011
+3 = 00000011 = ...0000000011
For these examples, I have shown the fixed 8-bit representation of each number alongside its "infinite" representation.
These are basically bit arrays, on top of which arithmetic can then be implemented.
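You can see this convention at work in Python's bitwise operators: a negative int behaves as though its sign bit repeated forever to the left.

>>> -5 & 0xFF    # take the low 8 bits of ...1111111011
251
>>> bin(251)
'0b11111011'
>>> -5 >> 2      # right shifts keep pulling in 1 bits
-2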
OK: 2^32, 2^32+1, and 2^32+2 aren't precise when stored as a float; however, 2^64 is. Is there a function out there that I can feed a number, and it will tell me whether a float will store it precisely? The only reason I am interested in floats is that I want to store very long numbers in as few bytes as possible; 2^64 is 20 digits. I found this article, but I'm not sure if it applies, since it talks about doubles, and I know Python does things differently:
How to Calculate Double + Float Precision
This is a complex topic. Not only may a float/double be unable to represent a specific value, but computations can add error of their own on top. There is a whole branch of mathematics (numerical analysis) that studies this problem.
The solution, generally, is to use integers. If the integers in a given language are too small, there is always the option of software-emulated integers of arbitrary length.
Python's built-in ints already work this way, and there are solid libraries that support even more.
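To answer the "is there a function" part: a simple round-trip check works as a sketch (float_exact is just a made-up helper name). Python floats are IEEE 754 doubles with 53 significant bits, so that is where exactness ends.

def float_exact(n):
    """Return True if the integer n survives a round trip through float."""
    try:
        return int(float(n)) == n
    except OverflowError:        # n is too large to convert to float at all
        return False

print(float_exact(2**64))        # True: powers of two are exact
print(float_exact(2**64 + 1))    # False: needs 65 significant bits
print(float_exact(2**53 + 1))    # False: the first integer a double can't hold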
>>> float(str(0.65000000000000002))
0.65000000000000002
>>> float(str(0.47000000000000003))
0.46999999999999997 ???
What is going on here?
How do I convert 0.47000000000000003 to string and the resultant value back to float?
I am using Python 2.5.4 on Windows.
str(0.47000000000000003) gives '0.47', and float('0.47') can be 0.46999999999999997.
This is due to the way floating-point numbers are represented (see the Wikipedia article on floating point).
Note: float(repr(0.47000000000000003)) or eval(repr(0.47000000000000003)) will give you the expected result, but you should use Decimal if you need precision.
float (and double) do not have infinite precision. Naturally, rounding errors occur when you operate on them.
This is a Python FAQ
The same question comes up quite regularly in comp.lang.python also.
I think the reason it is a FAQ is that, because Python is perfect in all other respects ;-), we expect it to perform arithmetic perfectly, just like we were taught at school. However, as anyone who has done a numerical methods course will tell you, floating-point numbers are a very long way from perfect.
Decimal is a good alternative, and if you want more speed and more options, gmpy is great too.
Take this example, where I think there is an error in Python when you divide:
>>> print(int(((48/5.0)-9)*5))
2
The easy way I solve this problem is like this:
>>> print(int(round(((48/5.0)-9)*5,2)))
3
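Printing the intermediate value shows what int() is actually truncating; this is float representation error, not a division bug:

>>> ((48 / 5.0) - 9) * 5
2.9999999999999982
>>> int(2.9999999999999982)   # int() truncates toward zero
2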
I'm porting a MATLAB code to Python 3.5.1 and I found a float round-off issue.
In MATLAB, the following number is rounded up to the 6th decimal place:
fprintf(1,'%f', -67.6640625);
-67.664063
In Python, on the other hand, the same number is rounded down at the 6th decimal place:
print('%f' % -67.6640625)
-67.664062
Interestingly enough, if the number is '-67.6000625', then it is rounded up even in Python:
print('%f' % -67.6000625)
-67.600063
... Why does this happen?
What are the criteria to round-off/up in Python?
(I believe this has something to do with handling hexadecimal values.)
More importantly, how can I prevent this difference?
I'm supposed to write Python code that reproduces exactly the same output as MATLAB produces.
The reason for the Python behavior has to do with how floating-point numbers are stored in a computer, and with the standardized rounding rules defined by IEEE 754, the standard for the number formats and mathematical operations used on pretty much all modern computers.
The need to store numbers efficiently in binary has led computers to use floating-point numbers. These are easy for processors to work with, but have the disadvantage that many decimal numbers cannot be represented exactly. The result is numbers that are sometimes a little off from what we think they should be.
The situation becomes a bit clearer if we expand the values in Python, rather than truncating them:
>>> print('%.20f' % -67.6640625)
-67.66406250000000000000
>>> print('%.20f' % -67.6000625)
-67.60006250000000704858
So as you can see, -67.6640625 is a number that can be represented exactly, but -67.6000625 isn't: it is actually stored as a slightly bigger value. The default rounding mode defined by the IEEE standard for floating-point numbers says that anything above a half should be rounded up and anything below a half rounded down. In the case of -67.6000625, the dropped part is a half plus a small amount, so it is rounded up. In the case of -67.6640625, however, the dropped part is exactly a half, so a tiebreak rule comes into play. The default tiebreak rule is round-half-to-even: the last kept digit, 2, is already even, so it stays 2 instead of rounding up to 3.
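Python 3's round() uses the same round-half-to-even tiebreak, which you can see on halves that are exactly representable:

>>> round(0.5), round(1.5), round(2.5)   # ties go to the even neighbor
(0, 2, 2)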
So Python is following the approach recommended by the floating-point standard. The question, then, is why your version of MATLAB doesn't do this. I tried it on my computer with 64bit MATLAB R2016a, and I got the same result as in Python:
>> fprintf(1,'%f', -67.6640625)
-67.664062>>
So it seems like MATLAB was, at some point, using a different rounding approach (perhaps a non-standard approach, perhaps one of the alternatives specified in the standard), and has since switched to follow the same rules as everyone else.
As I found out, decimal is more precise, at the cost of processing power. And I found out that
from decimal import Decimal, getcontext

getcontext().prec = 2
Decimal(number)   # 'number' is whatever value you start from
also counts the digits before the decimal point. In my case I only need to calculate with two digits after the point, no matter how big the number is, because sometimes I get numbers like 12345.15 and sometimes like 2.53. And what if the numbers are 5 or 298.1?
I'm a bit confused by all the differences between float, decimal, rounding, and truncating.
My main question is:
How can I calculate with a number like 254.12 or 15.35 at the lowest resource cost? Maybe it is even possible to fake these numbers? The rounding doesn't matter, but calculating with floats to 8 digits after the point and then truncating them seems like a waste of resources to me. Please correct me if I'm wrong.
I also know how to do benchmarks with
import time
start_time = time.clock()
main()
print time.clock() - start_time, "seconds"
But I'm sure there are plenty of things I don't know about. Since I'm quite new to programming, I would be very happy if someone could give me a few hints and a piece of code to work and learn with. Thank you for taking the time to read this! :)
First, please be aware that floating-point operations are not necessarily as expensive as you might fear. This depends on the CPU you are using, but, for example, a single floating-point operation in a mainly integer program will cost about as much as an integer operation, due to pipelining. It's like going to the bathroom at a nightclub: there's always a line for the girls' bathroom (integer ops) but never a line for the guys' (floating-point ops).
On the other hand, low-power CPUs may not include floating-point support at all, making any float operation hideously expensive! So before you get all judgy about whether you should use float or integer operations, do some profiling. You mention using time.clock and comparing start with end times; have a look instead at the timeit module shipped with Python, for example:
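A minimal sketch comparing one addition with floats versus Decimals (the values are arbitrary examples; timeit runs each statement a million times by default):

import timeit

float_time = timeit.timeit('a + b', setup='a, b = 254.12, 15.35')
decimal_time = timeit.timeit(
    'a + b',
    setup="from decimal import Decimal; a, b = Decimal('254.12'), Decimal('15.35')")
print(float_time, decimal_time)    # seconds per million additions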
Worse than bad performance, though, is the fact that floats don't always represent the number you want. Regardless of decimal point, if a number is large enough, or if you do the wrong operation to it, you can end up with a float that "approximates" your result without storing it exactly.
If you know that your application requires two digits beyond the decimal, I'd suggest that you write a class implementing that behavior using integers. Python's integers automatically extend to bignums when they get large, and their precision is exact. There is a performance penalty (bignum ops are slower than machine integer or float ops on top-end hardware), but you can guarantee whatever behavior you want. A toy sketch follows.
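Here is one way that idea could look (Fixed2 is a made-up name, and this is a sketch rather than production money-handling code): store integer hundredths, and only bring back the decimal point when formatting.

class Fixed2(object):
    """Toy fixed-point value with 2 decimal places, stored as integer hundredths."""

    def __init__(self, units):           # units = value * 100, e.g. 25412 for 254.12
        self.units = units

    @classmethod
    def from_str(cls, s):
        whole, _, frac = s.partition('.')
        frac = (frac + '00')[:2]         # pad or truncate to exactly 2 digits
        sign = -1 if whole.startswith('-') else 1
        return cls(int(whole) * 100 + sign * int(frac))

    def __add__(self, other):
        return Fixed2(self.units + other.units)

    def __str__(self):
        sign = '-' if self.units < 0 else ''
        q, r = divmod(abs(self.units), 100)
        return '%s%d.%02d' % (sign, q, r)

print(Fixed2.from_str('254.12') + Fixed2.from_str('15.35'))   # 269.47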
If your application is financial, please be aware that you are going to have to spend some time dealing with rounding issues. Everybody saw Superman 3, and now they think you're stealing their .00001 cents...
In Python, all floats are the same size, regardless of precision, because they are all represented by a single 'double' type. This means that either way you will be 'wasting' memory (24 bytes is a tiny amount, really). Using sys.getsizeof shows this:
>>> import sys
>>> sys.getsizeof(8.13333333)
24
>>> sys.getsizeof(8.13)
24
This is also shown by the fact that if an int is too big (ints themselves have no maximum value) it can't be converted into a float; you get an OverflowError:
>>> 2**1024 + 0.5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float
Using Decimal is even less efficient, even when the precision is set up right. This is because class instances take up lots of space on their own, regardless of their content:
>>> import decimal
>>> sys.getsizeof(decimal.Decimal(0))
80
The prec setting actually affects the total number of digits, not only the ones after the decimal point. Regarding that, the documentation says
The significance of a new Decimal is determined solely by the number
of digits input. Context precision and rounding only come into play
during arithmetic operations.
and also
The quantize() method rounds a number to a fixed exponent. This method
is useful for monetary applications that often round results to a
fixed number of places
So these are a few things you can look for if you need to work with a fixed number of digits after the point.
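For example, quantize pins the result to two places regardless of the magnitude of the number. By default it uses the same round-half-to-even rule discussed above; ROUND_HALF_UP is one of the alternatives:

from decimal import Decimal, ROUND_HALF_UP

amount = Decimal('254.125')
print(amount.quantize(Decimal('0.01')))                   # 254.12, ties go to even
print(amount.quantize(Decimal('0.01'), ROUND_HALF_UP))    # 254.13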
Regarding performance, it seems to me that you are prematurely optimizing. You generally don't need to worry about the fastest way to do a calculation that will take less than a microsecond (unless, of course, you need to do something on the order of millions of such calculations per second). On a quick benchmark, a sum of two numbers takes 48 nanoseconds for floats and 82 nanoseconds for Decimals. The impact of that difference should be very little for most applications.