Floating point math accuracy in Python 2.7 - python

I am having a bit of trouble understanding some results I am getting while working with Python 2.7.
>>> x=1
>>> e=1e-20
>>> x+e
1.0
>>> x+e-x
0.0
>>> e+x-x
0.0
>>> x-x+e
1e-20
This is copied directly from Python. I am taking a class on programming in Python and I do not understand the disparity of results (x+e == 1.0 and x-x+e == 1e-20, but x+e-x == 0.0 and e+x-x == 0.0).
I have already read the Python tutorial section on Representation Error, but I do not believe it covers this.
Thanks in advance

Floating-point addition is not associative.
x+e-x is grouped as (x+e)-x. It adds x and e, rounds the result to the nearest representable number (which is 1), then subtracts x from the result and rounds again, producing 0.
x-x+e is grouped as (x-x)+e. It subtracts x from x, producing 0, and rounds it to the nearest representable number, which is 0. It then adds e to 0, producing e, and rounds it to the nearest representable number, which is e.
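Adding explicit parentheses makes the grouping visible; this is the same arithmetic as above, just spelled out:
>>> x = 1
>>> e = 1e-20
>>> (x + e) - x    # x + e rounds to 1.0 before the subtraction happens
0.0
>>> (x - x) + e    # x - x is exactly 0.0, so e comes through untouched
1e-20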

This is because of the way that computers represent floating point numbers.
This is all really in binary format but let's pretend that it works with base 10 numbers because that's a lot easier for us to relate to.
A floating point number is expressed in the form 0.x*10^y, where x is a 10-digit number (I'm omitting trailing zeroes here) and y is the exponent. This means that the number 1.0 is expressed as 0.1*10^1 and the number 0.1 as 0.1*10^0.
To add these two numbers together we need to make sure that they have the same exponent. We can do this easily by shifting the numbers back and forth, i.e. we change 0.1*10^0 to 0.01*10^1 and then we add them together to get 0.11*10^1.
When we have 0.1*10^1 and 0.1*10^-19 (1e-20) we will shift 0.1*10^-19 20 steps, meaning that the 1 will fall outside the range of our 10 digit number so we will end up with 0.1*10^1 + 0.0*10^1 = 0.1*10^1.
The reason you end up with 1e-20 in your last example is that addition is evaluated from left to right. We first subtract 0.1*10^1 from 0.1*10^1, ending up with 0.0, and then add 0.1*10^-19 to that. This is a special case where we don't need to shift either operand, because one of them is exactly zero.
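If you want to play with this ten-digit analogy directly, Python's decimal module really does work in base 10 with a configurable number of digits; a rough sketch (the precision of 10 digits is an arbitrary choice):
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 10                      # keep only 10 significant decimal digits
>>> Decimal(1) + Decimal("1e-20")               # the tiny term is shifted out of the window
Decimal('1.000000000')
>>> Decimal(1) - Decimal(1) + Decimal("1e-20")  # the exact zero leaves room for the tiny value
Decimal('1E-20')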

Related

Convert scientific to decimal - dynamic float precision?

I have a random set of numbers in a SQL database:
1.2
0.4
5.1
0.0000000000232
1
7.54
0.000000000000006534
The numbers with many leading zeros are displayed in scientific notation:
num = 0.0000000000232
print(num)
> 2.32e-11
But that causes the rest of my code to bug out, as the API behind it expects a decimal number. I checked by increasing the precision with :.20f, and that works fine.
Since the very small numbers do not have a constant precision, it would be unwise to simply hard-code a static .20f.
What is a more elegant way to translate this to the correct decimal, always dynamic with the precision?
If Python provides a way to do this, they've hidden it very well. But a simple function can do it.
import sys
from math import floor, log

def float_to_str(x):
    # How many digits sit to the left of the decimal point
    # (negative means leading zeros to the right of it instead).
    to_the_left = 1 + floor(log(x, 10))
    # Significant digits left over for the right-hand side.
    to_the_right = sys.float_info.dig - to_the_left
    if to_the_right <= 0:
        # Nothing meaningful after the point; int() never uses scientific notation.
        s = str(int(x))
    else:
        s = format(x, f'0.{to_the_right}f').rstrip('0')
    return s
>>> for num in [1.2, 0.4, 5.1, 0.0000000000232, 1, 7.54, 0.000000000000006534]:
...     print(float_to_str(num))
...
1.2
0.4
5.1
0.0000000000232
1.
7.54
0.000000000000006534
The first part uses the base-10 logarithm to figure out how many digits will be to the left of the decimal point, or, if that count is negative, how many zeros sit to the right of it before the first significant digit. To find out how many digits can go to the right, we take the total number of significant digits a float can hold, given by sys.float_info.dig (15 on most Python implementations), and subtract the digits on the left. If this number is zero or negative there won't be anything but garbage after the decimal point, so we can rely on integer conversion instead - it never uses scientific notation. Otherwise we simply build the proper precision string to use with format. As a final step we strip off the redundant trailing zeros.
Using integers for large numbers isn't perfect because we lose the rounding that naturally occurs with floating point string conversion. float_to_str(1e25) for example will return '10000000000000000905969664'. Since your examples didn't contain any such large numbers I didn't worry about it, but it could be fixed with a little more work. For the reasons behind this see Is floating point math broken?
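If all you need is a fixed-point string with exactly the digits Python itself would print, one possible alternative (a sketch, not a drop-in replacement for the function above) is to route the value through the decimal module, whose 'f' format type never switches to scientific notation:
from decimal import Decimal

def float_to_str_via_decimal(x):
    # repr(x) is the shortest string that round-trips to the same float,
    # so Decimal sees exactly the digits Python would display.
    return format(Decimal(repr(x)), 'f')

for num in [1.2, 0.4, 5.1, 0.0000000000232, 1, 7.54, 0.000000000000006534]:
    print(float_to_str_via_decimal(num))
Because it starts from repr(), this variant also sidesteps the large-number issue: float_to_str_via_decimal(1e25) gives '10000000000000000000000000' rather than the float's full binary expansion.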

How to disable rounding in Decimal python?

When calculating, I get the wrong last digit of the number. At first, I just calculated with an accuracy of one digit more than I needed, and then I just removed the last rounded digit with a slice. But then I noticed that sometimes Decimal rounds more than one digit. Is it possible to calculate without rounding?
For example
from decimal import Decimal as dec, Context, setcontext, getcontext, ROUND_DOWN
from math import log

def sqr(x):
    return x * x

def pi(n):
    getcontext().prec = n + 1
    a = p = 1
    b = dec(1) / dec(2).sqrt()
    t = dec(1) / dec(4)
    for _ in range(int(log(n, 2))):
        an = (a + b) / 2
        b = (a * b).sqrt()
        t -= p * sqr(a - an)
        p *= 2
        a = an
    return sqr(a + b) / (4 * t)
If I try pi(12) I get "3.141592653591" (the last 2 digits are wrong), but if I try pi(13), they both change to the correct ones - "3.1415926535899".
This is called roundoff error and it is unavoidable when working with floating-point arithmetic. You can evaluate the following in your Python REPL and, interestingly, you get False:
0.2 + 0.1 == 0.3 # False
That is because the last bits (or, for Decimal, the last digits) of a computed result are effectively garbage. One way to work around this is to carry extra precision (or extra terms in a series) during the computation and then round the result to the precision you actually want.
If you want to understand this more deeply, read up on roundoff error and floating-point arithmetic, or consult a numerical computing textbook.
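A sketch of that advice applied to the code above: keep a handful of guard digits while iterating, then round once at the end. The guard value of 10 extra digits is an arbitrary choice here, not something prescribed by the decimal module:
from decimal import Decimal as dec, getcontext
from math import log

def sqr(x):
    return x * x

def pi(n, guard=10):
    # Work with extra guard digits so roundoff stays far away
    # from the digits we actually care about.
    getcontext().prec = n + guard
    a = p = 1
    b = dec(1) / dec(2).sqrt()
    t = dec(1) / dec(4)
    for _ in range(int(log(n, 2)) + 1):
        an = (a + b) / 2
        b = (a * b).sqrt()
        t -= p * sqr(a - an)
        p *= 2
        a = an
    result = sqr(a + b) / (4 * t)
    # Round only once, at the end, to the precision we actually want.
    getcontext().prec = n
    return +result  # unary plus re-rounds to the current context precision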

Python behaviour when approaching 0.0

I have written a Python program that needs to run until the initial value reaches 0, or until another condition is met. I wanted to say something useful about the number of loops the program goes through before it reaches 0, as this is needed for the solution to another problem. But for some reason I get the following results depending on the input variables, and searching the Internet so far has not helped. I wrote the following code to reproduce my problem.
temp = 1
cool = 0.4999 # This is the important variable
count = 0
while temp != 0.0:
    temp *= cool
    count += 1
    print(count, temp)
print(count)
So at some point I'd expect the script to stop and print the number of loops needed to get to 0.0. With the code above that is indeed the case: after 1075 loops the program stops and returns 1075. However, if I change the value of cool to something above 0.5 (for example 0.5001) the program seems to run indefinitely. Why is this the case?
The default behavior in many implementations of floating-point arithmetic is to round the real-number arithmetic result to the nearest number representable in floating-point arithmetic.
In any floating-point number format, there is some smallest positive representable value, say s. Consider how s * .25 would be evaluated. The real number result is .25•s. This is not a representable value, so it must be rounded to the nearest representable number. It is between 0 and s, so those are the two closest representable values. Clearly, it is closer to 0 than it is to s, so it is rounded to 0, and that is the floating-point result of the operation.
Similarly, when s is multiplied by any number between 0 and ½, the result is rounded to zero.
In contrast, consider s * .75. The real number result is .75•s. This is closer to s than it is to 0, so it is rounded to s, and that is the floating-point result of the operation.
Similarly, when s is multiplied by any number between ½ and 1, the result is rounded to s.
Thus, if you start with a positive number and continually multiply by some fraction between 0 and 1, the number will get smaller and smaller until it reaches s, and then it will either jump to 0 or remain at s depending on whether the fraction is less than or greater than ½.
If the fraction is exactly ½, .5•s is the same distance from 0 and s. The usual rule for breaking ties favors the number with the even low bit in its fraction portion, so the result is rounded to 0.
Note that Python does not fully specify its floating-point semantics, so the behaviour may vary from implementation to implementation. Most commonly, IEEE-754 binary64 is used, in which the smallest representable positive number is 2^-1074.
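On a typical IEEE-754 binary64 setup you can watch this happen at that smallest positive value; 5e-324 is how Python prints 2^-1074:
>>> s = 5e-324      # smallest positive (subnormal) binary64 value
>>> s * 0.4999      # real result is below s/2, so it rounds down to 0
0.0
>>> s * 0.5001      # real result is above s/2, so it rounds back up to s
5e-324
>>> s * 0.5         # exactly halfway: ties go to even, which is 0
0.0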

Extract significant digit from 0.0007

I want to extract the significant digit 7 from XX = 0.0007.
The code is as follows
import numpy as np

XX = 0.0007
enX1 = XX // 10**np.floor(np.log10(XX))
But enX1 becomes 6, not 7. Can anyone help me?
In some sense, you were lucky to start out with the value 0.0007. As it turns out, that value is one of the (many!) decimal values that cannot be represented exactly in a floating point format.
A floating point number is usually stored in the common IEEE-754 format as powers of 2. Just like a whole number such as 165 is stored as a sum of bits with increasing powers-of-two values (165 = 128 + 32 + 4 + 1), fractions are stored as a sum of 1/power-of-two numbers. That means that values such as 1/2, 1/4, and 1/65536 can be stored exactly (and sums thereof, such as 3/4), but your 0.0007 can not. The closest representable value is actually 0.000699999999999999992887633748495. ("Closest" in the sense that adding one more one-bit at the end would make it slightly larger than 0.0007, by a slightly greater margin than this lower value falls short.)
In your calculation, you use the double divide slash //, which instructs Python to do an integer division and discard the fractional part. So while the intermediate calculation is correct and you get something like 6.99999..., this gets truncated and you end up with 6.
If you use a single slash, the result keeps its fractional part, and Python will display it as 7.0, give or take a few zeroes; by default, Python displays only a limited number of decimals.
Note that you still cannot rely on ending up with the exact value 7. The calculation starts out with an imprecise number, and although there may be some intermediate rounding here and there, there is only a small chance you end up with a precise integer. That holds not for every decimal, but for a large number of them. Other fractional values may be stored fractionally larger than the value you enter - 0.0004, for example [a] - but the underlying 'problem' of accuracy is present there as well. It's just not as visible as with yours.
If you want the nearest integer as your result, use a single divide slash for the division, followed by round to force the number to the nearest integer anyway.
[a] To be precise, it is stored as roughly 0.000400000000000000019168694409544. After your routine, Python will display it as 4, but internally it is still just a bit larger than that.
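A minimal sketch of that fix, based on the snippet from the question (assuming NumPy is available, as in the original code):
import numpy as np

XX = 0.0007
# Single slash keeps the (almost exactly) 7 result; round() snaps it to the nearest integer.
enX1 = round(XX / 10**np.floor(np.log10(XX)))
print(enX1)  # 7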

Python: Why does 2^-n work for n>52 but 1+2^-n-1 does not?

I'm pretty new to Python, and I've made a table which calculates T=1+2^-n-1 and C=2^-n. Both give the same values from n=40 to n=52, but from n=53 to n=60 I get 0.0 for T, whereas C gives me a progressively smaller decimal each time - why is this?
I think I understand why T becomes 0.0, because Python uses binary floating point and because of the machine epsilon value - but I'm slightly confused as to why C doesn't also become 0.0.
import numpy as np
import math
t=np.zeros(21)
c=np.zeros(21)
for n in range(40, 61):
    m = n - 40
    t[m] = 1 + 2**(-n) - 1
    c[m] = 2**(-n)
    print(n, t[m], c[m])
The "floating" in floating point means that values are represented by storing a fixed number of leading digits and a scale factor, rather than assuming a fixed scale (which would be fixed point).
2**-53 only takes one (binary) digit to represent (not including the scale), but 1+2**-53 would take 54 to represent exactly. Python floats only have 53 binary digits of precision; 2**-53 can be represented exactly, but 1+2**-53 gets rounded to exactly 1, and subtracting 1 from that gives exactly 0. Thus, we have
>>> 2**-53
1.1102230246251565e-16
>>> 1+(2**-53)-1
0.0
Postscript: you might wonder why 2**-53 displays as a value not equal to the exact mathematical value when I said it was exact. That's due to the float->string conversion logic, which only keeps enough decimal digits to reconstruct the original float (instead of printing a bunch of digits at the end that are usually just noise).
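You can check this by asking for more digits explicitly; the extra digits printed are exact rather than noise:
>>> format(2**-53, '.20e')
'1.11022302462515654042e-16'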
The difference between the two is indeed due to the floating-point representation. If you compute 1 + x where x is a very small number, the result's exponent is that of 1, and the precision is limited by the mantissa, which has 52 explicitly stored bits (53 bits of precision) in a 64-bit double. Therefore 1 + 2^(-n) is equal to 1 for n > 52. However, even 2^-100 on its own can be represented in double precision, which is why you see C keep decreasing as n grows.
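Two quick REPL checks of those two statements:
>>> 2**-100 == 0.0        # still representable on its own
False
>>> 1 + 2**-100 == 1.0    # but lost when added to 1, which already uses all 53 bits
True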
