I would like to check if a float is a multiple of another float, but am running into issues with machine precision. For example:
t1 = 0.02
factor = 0.01
print(t1%factor==0)
The above outputs True, but
t2 = 0.030000000000000002
print(round(t2,5)%factor==0)
This outputs False. At some point in my code the number I am checking develops these machine-precision errors, and I thought I could fix the issue simply by rounding it (I need 5 decimal places for later in my code, but it also doesn't work if I round to just 2 decimal places).
Any ideas why the above check round(t2,5)%factor==0 doesn't work as expected, and how I can fix it?
It doesn't work as expected because checking floats for equality almost never works as expected. A quick fix would be to use math.isclose. This allows you to adjust your tolerance as well. Remember that when doing arithmetic mod r, r is equivalent to 0, so you should check if you're close to 0 or r.
import math
t1 = 0.02
factor = 0.01
res = t1 % factor
print(math.isclose(res, 0, abs_tol=1e-9) or math.isclose(res, factor))  # abs_tol is needed: with the default abs_tol=0.0, isclose(x, 0) is False for every nonzero x
This is pretty quick and dirty and you will want to make sure your tolerances are working correctly and equivalently for both of those checks.
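To make that idea reusable, here is a hedged sketch of a helper function; the name `is_multiple` and the default tolerance are my own choices, not from the question:

```python
import math

def is_multiple(x, factor, abs_tol=1e-9):
    """Check whether x is (approximately) an integer multiple of factor.

    The remainder of a near-multiple can land near 0 *or* near factor,
    so both cases are checked.  abs_tol is required because
    math.isclose(r, 0) with the default abs_tol=0.0 is only ever true
    for an exactly zero remainder.
    """
    r = x % factor
    return math.isclose(r, 0.0, abs_tol=abs_tol) or \
           math.isclose(r, factor, abs_tol=abs_tol)

print(is_multiple(0.02, 0.01))                  # True
print(is_multiple(0.03, 0.01))                  # True (remainder lands near factor)
print(is_multiple(0.030000000000000002, 0.01))  # True
print(is_multiple(0.025, 0.01))                 # False
```

The tolerance here is absolute, which is reasonable when the values being checked are all of similar magnitude; for widely varying magnitudes you would want to scale it.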
You should use the decimal module. The decimal module provides support for fast correctly-rounded decimal floating point arithmetic.
import decimal
print( decimal.Decimal('0.03') % decimal.Decimal('0.01') == decimal.Decimal('0') )
This gives:
True
Generally, floats in Python are... messed up, for lack of a better word, and they can act in very unexpected ways.
For your goal however, a better way is this:
t2 = 0.03000003
factor = 0.01
precision = 10000 # 4 digits
print(int(t2*precision)%int(factor*precision)==0)
Moving the maths to an integer-based calculation solves most of those issues.
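One caveat with the snippet above: `int()` truncates toward zero, so a value that is a hair *below* a multiple scales to the wrong integer. A hedged sketch of the pitfall, with `round()` as the safer choice (the specific value of `t` is my own illustrative example):

```python
t = 0.29999999999999993   # a noisy value that "should" be 0.3
factor = 0.01
precision = 10000

print(int(t * precision))    # 2999: int() truncates, losing the multiple
print(round(t * precision))  # 3000: round() snaps to the nearest integer

# Using round() on both sides makes the integer-based check robust:
print(round(t * precision) % round(factor * precision) == 0)  # True
```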
Related
I am attempting to do a few different operations in Numpy (mean and interp), and with both operations I am getting the result 2.77555756156e-17 at various times, usually when I'm expecting a zero. Even attempting to filter these out with array[array < 0.0] = 0.0 fails to remove the values.
I assume there's some sort of underlying data type or environment error that's causing this. The data should all be float.
Edit: It's been helpfully pointed out that I was only filtering out the values of -2.77555756156e-17 but still seeing positive 2.77555756156e-17. The crux of the question is what might be causing these wacky values to appear when doing simple functions like interpolating values between 0-10 and taking a mean of floats in the same range, and how can I avoid it without having to explicitly filter the arrays after every statement.
You're running into the limits of numerical precision, which is a huge topic in numerical computing: whenever you compute with floating-point numbers, you risk producing tiny stray values like the one you've posted here. What's happening is that your calculations produce values that can't quite be expressed exactly as floating-point numbers.
Floating-point numbers are expressed with a fixed amount of information (in Python, this amount defaults to 64 bits). You can read more about how that information is encoded on the very good Floating point Wikipedia page. In short, some calculation that you're performing in the process of computing your mean produces an intermediate value that cannot be precisely expressed.
This isn't a property of numpy (and it's not even really a property of Python); it's really a property of the computer itself. You can see this in normal Python by playing around in the repl:
>>> repr(3.0)
'3.0'
>>> repr(3.0 + 1e-10)
'3.0000000001'
>>> repr(3.0 + 1e-18)
'3.0'
For the last result, you would expect 3.000000000000000001, but that number can't be expressed in a 64-bit floating point number, so the computer uses the closest approximation, which in this case is just 3.0. If you were trying to average the following list of numbers:
[3., -3., 1e-18]
Depending on the order in which you summed them, you could get 1e-18 / 3., which is the "correct" answer, or zero. You're in a slightly stranger situation; two numbers that you expected to cancel didn't quite cancel out.
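That order-dependence can be seen directly; the list below is the one from the text, and `math.fsum` is the standard-library remedy when it matters:

```python
import math

nums = [3.0, -3.0, 1e-18]

# Left-to-right: (3.0 + -3.0) cancels exactly, so the tiny value survives.
print(sum(nums))                        # 1e-18

# Reordered: 3.0 + 1e-18 rounds back to 3.0, so the tiny value is lost.
print(sum([3.0, 1e-18, -3.0]))          # 0.0

# math.fsum tracks the lost low-order bits and returns the exact sum.
print(math.fsum([3.0, 1e-18, -3.0]))    # 1e-18
```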
This is just a fact of life when you're dealing with floating point mathematics. The common way of working around it is to eschew the equals sign entirely and to only perform "numerically tolerant comparison", which means equality-with-a-bound. So this check:
a == b
Would become this check:
abs(a - b) < TOLERANCE
For some tolerance amount. The tolerance depends on what you know about your inputs and the precision of your computer; if you're using a 64-bit machine, you want this to be at least 1e-10 times the largest amount you'll be working with. For example, if the biggest input you'll be working with is around 100, it's reasonable to use a tolerance of 1e-8.
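That magnitude-scaled tolerance can be sketched as a small helper (the function name and default tolerances below are my own choices; in practice `math.isclose` implements essentially this logic for you):

```python
def nearly_equal(a, b, rel=1e-9, abs_tol=1e-12):
    # The tolerance scales with the magnitude of the inputs, with an
    # absolute floor so that comparisons against 0.0 still work.
    return abs(a - b) <= max(rel * max(abs(a), abs(b)), abs_tol)

print(0.1 + 0.2 == 0.3)              # False: exact comparison fails
print(nearly_equal(0.1 + 0.2, 0.3))  # True:  tolerant comparison succeeds
print(nearly_equal(1e8, 1e8 + 1.0))  # False: 1.0 exceeds the scaled tolerance
```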
You can round your values to 15 digits:
a = a.round(15)
Now the array a should show you 0.0 values.
Example:
>>> a = np.array([2.77555756156e-17])
>>> a.round(15)
array([ 0.])
This is most likely the result of floating point arithmetic errors. For instance:
In [3]: 0.1 + 0.2 - 0.3
Out[3]: 5.551115123125783e-17
Not what you would expect? NumPy has a built-in isclose() function that can deal with these things. You can also see the machine precision with:
eps = np.finfo(np.float64).eps  # np.float was removed in newer NumPy releases
So, perhaps something like this could work too:
a = np.array([[-1e-17, 1.0], [1e-16, 1.0]])
a[np.abs(a) <= eps] = 0.0
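Along the same lines, `np.isclose` can zero out the noise in one vectorized step, and it catches both signs (the original `array[array < 0.0]` filter missed the positive residue); the array values below are my own illustrative example:

```python
import numpy as np

a = np.array([2.77555756156e-17, -2.77555756156e-17, 0.5, 1.0])

# Zero out every entry numerically indistinguishable from 0,
# regardless of sign.
a[np.isclose(a, 0.0, atol=1e-12)] = 0.0
print(a)   # the two tiny residues become exactly 0.0
```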
What I wanted to do:
Paradox: Suppose Peter Parker were running to catch a bus. To reach it, he’d first need to get halfway there. Before that, he’d need to get a quarter of the way there……before a quarter, an eighth; before an eighth, a 16th; and so on. Since the distance can be halved infinitely, he’d be trying to complete an infinite number of tasks… WHICH WOULD BE LOGICALLY IMPOSSIBLE!
I tried to resolve this paradox using Python.
I have some questions:
How can I get a number with no limit on its decimal places? Python seems to limit the number of decimals (to 12, I think); how can I make it unlimited?
Apparently there is no way to make a float's decimals infinite; the closest I could get was using this
from decimal import Decimal
Is this the correct way of asking the user for an input in numbers?
Code modified
from decimal import Decimal

def infinite_loop():
    x = 0
    number = Decimal(raw_input())
    while x != number:
        x = x + number
        number = number / 2
        print x

infinite_loop()
What you ask is impossible: there are no "infinite precision" floating point values in the real world of finite computing systems. If there were, a single floating point value could consume all of the system's resources. pi * d? Oops, pi has infinitely many digits; there goes the system!
What you can do, however, is get arbitrary precision decimal values. They're still finite, but you can choose how much precision you want (and are willing to pay for). E.g.:
>>> from decimal import Decimal
>>> x = Decimal('1.' + '0' * 200)
>>> x
Decimal('1.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000')
Now you have 200 digits of precision. Not enough? Go 400. 800. However many you like. As long as that's a finite, practical value.
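The precision of arithmetic (not just of literals) is set through decimal's context; a minimal sketch, with 50 digits chosen arbitrarily for illustration:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50            # 50 significant digits for all arithmetic
d = Decimal(1) / Decimal(7)
print(d)                          # 50 digits of the repeating expansion
print(Decimal(2).sqrt())          # sqrt() also honours the context precision
```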
If you want "infinite" precision (i.e., a decimal number that can be extended as far as you have memory for), either use Python's built-in decimal module or, for heavier computation, mpmath:
import mpmath as mp
mp.mp.dps = 100
print(mp.sqrt(mp.mpf(2)))
>> 1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641573
I would like to generate uniformly distributed random numbers between 0 and 0.5, but truncated to 2 decimal places.
without the truncation, I know this is done by
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
could anyone help me with suggestions on how to generate random numbers up to 2 d.p. only? Thanks!
A float cannot be truncated (or rounded) to 2 decimal digits, because there are many values with 2 decimal digits that just cannot be represented exactly as an IEEE double.
If you really want what you say you want, you need to use a type with exact precision, like Decimal.
Of course there are downsides to doing that—the most obvious one for numpy users being that you will have to use dtype=object, with all of the compactness and performance implications.
But it's the only way to actually do what you asked for.
Most likely, what you actually want to do is either Joran Beasley's answer (leave them untruncated, and just round at print-out time) or something similar to Lauritz V. Thaulow's answer (get the closest approximation you can, then use explicit epsilon checks everywhere).
Alternatively, you can do implicitly fixed-point arithmetic, as David Heffernan suggests in a comment: Generate random integers between 0 and 50, keep them as integers within numpy, and just format them as fixed point decimals and/or convert to Decimal when necessary (e.g., for printing results). This gives you all of the advantages of Decimal without the costs… although it does open an obvious window to create new bugs by forgetting to shift 2 places somewhere.
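A minimal sketch of that fixed-point idea, reusing the question's seed (the variable names and the choice to print with `%.2f` are mine):

```python
import numpy as np

rng = np.random.RandomState(123456)

# Work in units of 0.01: the integers 0..50 stand for 0.00..0.50.
cents = rng.randint(0, 51, size=(5, 1))   # randint's upper bound is exclusive

# Shift the decimal point only at the boundary, e.g. when printing:
print(["%.2f" % (c / 100.0) for c in cents.ravel()])
```

All arithmetic on `cents` stays exact integer arithmetic; only the final display involves floats.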
Float values are never truncated to 2 decimal places; however, their string representation may be:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
print(["%0.2f" % val for val in set])
How about this?
np.random.randint(0, 51, size=(50,1)).astype("float") / 100
That is, create random integers between 0 and 50 (randint's upper bound is exclusive, hence the 51), and divide by 100.
EDIT:
As made clear in the comments, this will not give you exact two-digit decimals to work with, due to the nature of float representations in memory. It may look like you have the exact float 0.1 in your array, but it definitely isn't exactly 0.1. It is, however, very close, and you can get closer still with a higher-precision type such as np.longdouble.
You can postpone this problem by just keeping the numbers as integers, and remember that they're to be divided by 100 when you use them.
hundreds = np.random.randint(0, 51, size=(50, 1))
Then at least the roundoff won't happen until at the last minute (or maybe not at all, if the numerator of the equation is a multiple of the denominator).
I managed to find another alternative:
import numpy as np

rs = np.random.RandomState(123456)
set = rs.uniform(size=(50, 2))
for i in range(50):
    for j in range(2):
        set[i, j] = round(set[i, j], 2)
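The nested loops above can be collapsed into a single vectorized call, since numpy arrays have a `round` method (a sketch; I've renamed the array `vals` to avoid shadowing the built-in `set`):

```python
import numpy as np

rs = np.random.RandomState(123456)
vals = rs.uniform(size=(50, 2)) * 0.5

# Round the whole array at once instead of element by element.
vals = vals.round(2)
print(vals[:3])
```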
The reason I'm asking this is because there is a validation in OpenERP that it's driving me crazy:
>>> round(1.2 / 0.01) * 0.01
1.2
>>> round(12.2 / 0.01) * 0.01
12.200000000000001
>>> round(122.2 / 0.01) * 0.01
122.2
>>> round(1222.2 / 0.01) * 0.01
1222.2
As you can see, the second round is returning an odd value.
Can someone explain to me why is this happening?
This has in fact nothing to do with round; you can witness the exact same problem if you just compute 1220 * 0.01:
>>> 1220*0.01
12.200000000000001
What you see here is a standard floating point issue.
You might want to read what Wikipedia has to say about floating point accuracy problems:
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
Also see:
Numerical analysis
Numerical stability
A simple example for numerical instability with floating-point:
The numbers are finite. Say a given computer or language keeps 4 digits after the decimal point. Then 0.0001 multiplied by 0.0001 gives a result smaller than 0.0001, which is therefore impossible to store! So if you calculate (0.0001 x 0.0001) / 0.0001, which should equal 0.0001, this simple computer fails to be accurate, because it multiplies first and only divides afterwards. In JavaScript, dividing by fractions leads to similar inaccuracies.
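The same failure can be reproduced with real doubles by picking values near the underflow threshold (a toy illustration; the value 1e-200 is chosen so that its square falls below the smallest representable double, about 5e-324):

```python
tiny = 1e-200

# The true product 1e-400 is below the smallest positive double,
# so it underflows to exactly 0.0 ...
product = tiny * tiny
print(product)             # 0.0

# ... and no later division can bring it back:
print(product / tiny)      # 0.0, not the exact answer 1e-200

# Reordering keeps every intermediate result in range:
print(tiny / tiny * tiny)  # 1e-200
```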
The float type that you are using stores binary floating point numbers. Not every decimal number is exactly representable as a float. In particular there is no exact representation of 1.2 or 0.01, so the actual number stored in the computer will differ very slightly from the value written in the source code. This representation error can cause calculations to give slightly different results from the exact mathematical result.
It is important to be aware of the possibility of small errors whenever you use floating point arithmetic, and write your code to work well even when the values calculated are not exactly correct. For example, you should consider rounding values to a certain number of decimal places when displaying them to the user.
You could also consider using the decimal type which stores decimal floating point numbers. If you use decimal then 1.2 can be stored exactly. However, working with decimal will reduce the performance of your code. You should only use it if exact representation of decimal numbers is important. You should also be aware that decimal does not mean that you'll never have any problems. For example 0.33333... has no exact representation as a decimal.
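Both halves of that paragraph can be demonstrated in a few lines: constructing a Decimal from a float exposes the binary approximation, constructing it from a string is exact, and 1/3 shows decimal's own limits:

```python
from decimal import Decimal

# The binary double nearest to 1.2, written out exactly:
print(Decimal(1.2))     # 1.1999999999999999555910790149937383830547332763671875

# The decimal value 1.2, stored exactly:
print(Decimal("1.2"))   # 1.2

# ... but decimal has non-representable values of its own:
print(Decimal(1) / Decimal(3))   # 28 threes under the default context
```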
There is a loss of accuracy from the division due to the way floating point numbers are stored, so you see that this identity doesn't hold
>>> 12.2 / 0.01 * 0.01 == 12.2
False
bArmageddon has provided a bunch of links which you should read, but I believe the takeaway message is: don't expect floats to give exact results unless you fully understand the limits of the representation.
In particular, don't use floats to represent amounts of money; that is a pretty common mistake.
Python also has the decimal module, which may be useful to you
Others have answered your question and mentioned that many numbers don't have an exact binary fractional representation. If you are accustomed to working only with decimal numbers, it can seem deeply weird that a nice, "round" number like 0.01 could be a non-terminating number in some other base. In the spirit of "seeing is believing," here's a little Python program that will print out a binary representation of any number to any desired number of digits.
from decimal import Decimal, getcontext

n = Decimal("0.01")  # the number to print the binary equivalent of
m = 1000             # maximum number of digits to print

getcontext().prec = m + 10  # enough decimal precision to keep m bits exact
p = -1
r = []
w = int(n)
n = abs(n) - abs(w)
while n and -p < m:
    s = Decimal(2) ** p
    if n >= s:
        r.append("1")
        n -= s
    else:
        r.append("0")
    p -= 1
print("%s%s.%s%s" % ("-" if w < 0 else "", bin(abs(w))[2:],
                     "".join(r), "..." if n else ""))
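If you just want to inspect what a float actually stores, without writing the conversion loop yourself, two built-ins do the job: `float.hex` shows the exact binary mantissa and exponent, and constructing a `Decimal` from a float expands the stored value exactly in base 10:

```python
from decimal import Decimal

# The exact bits: hex mantissa times a power of two.
print((0.01).hex())    # 0x1.47ae147ae147bp-7

# The same stored double, expanded exactly in decimal.
print(Decimal(0.01))   # 0.01000000000000000020816681711721685...
```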
I have two floats in Python that I'd like to subtract, i.e.
v1 = float(value1)
v2 = float(value2)
diff = v1 - v2
I want "diff" to be computed to two decimal places, that is, computed from the %.2f-rounded values of v1 and v2. How can I do this? I know how to print v1 and v2 to two decimals, but not how to do arithmetic that way.
The particular issue I am trying to avoid is this. Suppose that:
v1 = 0.982769777778
v2 = 0.985980444444
diff = v1 - v2
and then I print to file the following:
myfile.write("%.2f\t%.2f\t%.2f\n" %(v1, v2, diff))
then I get the output 0.98 0.99 0.00: the diff column claims there is no difference between v1 and v2, even though the printed values differ by 0.01. How can I get around this?
thanks.
You said in a comment that you don't want to use decimal, but it sounds like that's what you really should use here. Note that it isn't an "extra library", in that it is provided by default with Python since v2.4, you just need to import decimal. When you want to display the values you can use Decimal.quantize to round the numbers to 2 decimal places for display purposes, and then take the difference of the resulting decimals.
>>> v1 = 0.982769777778
>>> v2 = 0.985980444444
>>> from decimal import Decimal
>>> d1 = Decimal(str(v1)).quantize(Decimal('0.01'))
>>> d2 = Decimal(str(v2)).quantize(Decimal('0.01'))
>>> diff = d2 - d1
>>> print d1, d2, diff
0.98 0.99 0.01
I find round is a good alternative.
a = 2.000006
b = 7.45001
c = b - a
print(c)            # gives 5.450004
print(round(c, 2))  # gives 5.45
I've used poor man's fixed point in the past. Essentially, use ints, multiply all of your numbers by 100 and then divide them by 100 before you print.
There was a good post on similar issues on Slashdot recently.
The thing about the float type is that you can't really control what precision calculations are done with. The float type is letting the hardware do the calculations, and that typically does them the way it's most efficient. Because floats are (like most machine-optimized things) binary, and not decimal, it's not straightforward (or efficient) to force these calculations to be of a particular precision in decimal. For a fairly on-point explanation, see the last chapter of the Python tutorial. The best you can do with floats is round the result of calculations (preferably just when formatting them.)
For controlling the actual calculations and precision (in decimal notation, no less), you should consider using Decimals from the decimal module instead. It gives you much more control over pretty much everything, although it does so at a speed cost.
What do you mean "two significant figures"? Python float has about 15 sigfigs, but it does have representation errors and roundoff errors (as does decimal.Decimal.) http://docs.python.org/tutorial/floatingpoint.html might prove an interesting read for you, along with the great amount of resources out there about floating point numbers.
float usually has the right kind of precision for representing real numbers, such as physical measurements: weights, distances, durations, temperatures, etc. If you want to print out a certain way of displaying floats, use the string formatting as you suggest.
If you want to represent fixed-precision things exactly, you probably want to use ints. You'll have to keep track of the decimal place yourself, but this is often not too tough.
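A minimal sketch of that integer approach applied to money, where the decimal place is tracked implicitly by working in cents (the prices and the 8.25% tax rate are hypothetical examples of mine):

```python
# Money kept as integer cents: addition and comparison are exact,
# and rounding happens only where we explicitly choose.
price_cents = 1999                        # $19.99
tax_cents = round(price_cents * 0.0825)   # one deliberate rounding step
total_cents = price_cents + tax_cents

# Shift the decimal point back only when formatting for display.
print("$%d.%02d" % divmod(total_cents, 100))   # $21.64
```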
decimal.Decimal is recommended way too much. It can be tuned to specific, higher-than-float precision, but this is very seldom needed; I don't think I've ever seen a problem where this was the reason someone should use decimal.Decimal. decimal.Decimal lets you represent decimal numbers exactly and control how rounding works, which makes it suitable for representing money in some contexts.
You could format it using a different format string. Try '%.2g' or '%.2e'. Any decent C/C++ reference will describe the different specifiers. %.2e formats the value to three significant digits in exponential notation - the 2 means two digits following the decimal point and one digit preceding it. %.2g will result in either %.2f or %.2e depending on which will yield two significant digits in the minimal amount of space.
>>> v1 = 0.982769777778
>>> v2 = 0.985980444444
>>> print '%.2f' %v1
0.98
>>> print '%.2g' %v1
0.98
>>> print '%.2e' %v1
9.83e-01
>>> print '%.2g' %(v2-v1)
0.0032
>>> print '%.2e' %(v2-v1)
3.21e-03