Computing e becomes chaotic at limits around 10**12 in Python?

I wrote a program that computes the base of natural logarithms (known as e in mathematics) using the following well-known limit formula, evaluated at large n:
e ≈ (1 + 1.0/n) ** n
The code is:
def e_formula(lim):
    n = lim
    e = (1 + 1.0/n) ** n
    return e
I set up a test that iterates over powers of ten from 10**1 to 10**99:
if __name__ == "__main__":
    for i in range(1, 100):
        print e_formula(10**i)
However, the results start to blow up around 10**11.
Actual results from shell:
2.5937424601
2.70481382942
2.71692393224
2.71814592682
2.71826823719
2.7182804691
2.71828169413
2.71828179835
2.71828205201
2.71828205323
2.71828205336
2.71852349604
2.71611003409
2.71611003409
3.03503520655
1.0
I am looking for a reason for this, either the result exceeding the floating-point limits of a 32-bit machine or the way Python itself computes floating-point numbers. I am not looking for a better solution; I just want to understand why it blows up.

This is simply due to the limited precision of floating-point numbers; you get about 15 significant digits.
You are computing (1 + very_small_number), and most of the digits of very_small_number are truncated at this stage.
The ** n then amplifies this error.
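Here is a minimal sketch of that truncation (assuming IEEE 754 double precision; Python 3 print syntax):
import sys

print(sys.float_info.epsilon)   # ~2.22e-16: the gap between 1.0 and the next float

n = 10**11
print((1 + 1.0/n) - 1)          # ~1.0000000827e-11, not exactly 1e-11

n = 10**16
print(1 + 1.0/n == 1.0)         # True: 1/n is below half the gap, so it vanishes
Once 1/n is lost completely, (1 + 1.0/n) ** n is just 1.0 ** n, which explains the final 1.0 in the output above.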

Related

Slight difference in results between Python and Matlab [duplicate]

I am writing a program where I need to delete duplicate points stored in a matrix. The problem is that when it comes to checking whether those points are in the matrix, MATLAB fails to recognize them even though they exist.
In the following code, the intersections function gets the intersection points:
[points(:,1), points(:,2)] = intersections(...
    obj.modifiedVGVertices(1,:), obj.modifiedVGVertices(2,:), ...
    [vertex1(1) vertex2(1)], [vertex1(2) vertex2(2)]);
The result:
>> points
points =
12.0000 15.0000
33.0000 24.0000
33.0000 24.0000
>> vertex1
vertex1 =
12
15
>> vertex2
vertex2 =
33
24
Two points (vertex1 and vertex2) should be eliminated from the result. This should be done by the commands below:
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
After doing that, we have this unexpected outcome:
>> points
points =
33.0000 24.0000
The outcome should be an empty matrix. As you can see, the first (or second?) pair of [33.0000 24.0000] has been eliminated, but not the second one.
Then I checked these two expressions:
>> points(1) ~= vertex2(1)
ans =
0
>> points(2) ~= vertex2(2)
ans =
1 % <-- It means 24.0000 is not equal to 24.0000?
What is the problem?
More surprisingly, I made a new script that has only these commands:
points = [12.0000 15.0000
33.0000 24.0000
33.0000 24.0000];
vertex1 = [12 ; 15];
vertex2 = [33 ; 24];
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
The result is as expected:
>> points
points =
Empty matrix: 0-by-2
The problem you're having relates to how floating-point numbers are represented on a computer. A more detailed discussion of floating-point representations appears towards the end of my answer (The "Floating-point representation" section). The TL;DR version: because computers have finite amounts of memory, numbers can only be represented with finite precision. Thus, the accuracy of floating-point numbers is limited to a certain number of decimal places (about 16 significant digits for double-precision values, the default used in MATLAB).
Actual vs. displayed precision
Now to address the specific example in the question... while 24.0000 and 24.0000 are displayed in the same manner, it turns out that they actually differ by very small decimal amounts in this case. You don't see it because MATLAB's default format short displays only about 5 significant digits, keeping the overall display neat and tidy. If you want to see the full precision, you should either issue the format long command or view a hexadecimal representation of the number:
>> pi
ans =
3.1416
>> format long
>> pi
ans =
3.141592653589793
>> num2hex(pi)
ans =
400921fb54442d18
Initialized values vs. computed values
Since there are only a finite number of values that can be represented for a floating-point number, it's possible for a computation to result in a value that falls between two of these representations. In such a case, the result has to be rounded off to one of them. This introduces a small machine-precision error. This also means that initializing a value directly or by some computation can give slightly different results. For example, the value 0.1 doesn't have an exact floating-point representation (i.e. it gets slightly rounded off), and so you end up with counter-intuitive results like this due to the way round-off errors accumulate:
>> a=sum([0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]); % Sum 10 0.1s
>> b=1; % Initialize to 1
>> a == b
ans =
logical
0 % They are unequal!
>> num2hex(a) % Let's check their hex representation to confirm
ans =
3fefffffffffffff
>> num2hex(b)
ans =
3ff0000000000000
How to correctly handle floating-point comparisons
Since floating-point values can differ by very small amounts, any comparisons should be done by checking that the values are within some range (i.e. tolerance) of one another, as opposed to exactly equal to each other. For example:
a = 24;
b = 24.000001;
tolerance = 0.001;
if abs(a-b) < tolerance, disp('Equal!'); end
will display "Equal!".
You could then change your code to something like:
points = points((abs(points(:,1) - vertex1(1)) > tolerance) | ...
                (abs(points(:,2) - vertex1(2)) > tolerance), :)
Floating-point representation
A good overview of floating-point numbers (and specifically the IEEE 754 standard for floating-point arithmetic) is What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.
A binary floating-point number is actually represented by three integers: a sign bit s, a significand (or coefficient/fraction) b, and an exponent e. For the double-precision floating-point format, each number is represented by 64 bits laid out in memory as follows: 1 sign bit, 11 exponent bits, and 52 significand bits.
The real value can then be found with the following formula:
value = (-1)^s * (1 + b * 2^-52) * 2^(e - 1023)
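As an illustrative aside (written in Python to match the rest of this page), you can extract these three fields from a double yourself; note the hex value matches the num2hex(pi) output shown earlier:
import struct

# Pack pi as a big-endian IEEE 754 double and view it as a 64-bit integer
bits = int.from_bytes(struct.pack('>d', 3.141592653589793), 'big')
print(hex(bits))                 # 0x400921fb54442d18, same as num2hex(pi)

s = bits >> 63                   # 1 sign bit
e = (bits >> 52) & 0x7FF         # 11 exponent bits (biased by 1023)
b = bits & ((1 << 52) - 1)       # 52 significand bits
print((-1)**s * (1 + b * 2**-52) * 2.0**(e - 1023))   # 3.141592653589793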
This format allows for number representations in the range 10^-308 to 10^308. For MATLAB you can get these limits from realmin and realmax:
>> realmin
ans =
2.225073858507201e-308
>> realmax
ans =
1.797693134862316e+308
Since there are a finite number of bits used to represent a floating-point number, there are only so many finite numbers that can be represented within the above given range. Computations will often result in a value that doesn't exactly match one of these finite representations, so the values must be rounded off. These machine-precision errors make themselves evident in different ways, as discussed in the above examples.
In order to better understand these round-off errors it's useful to look at the relative floating-point accuracy provided by the function eps, which quantifies the distance from a given number to the next largest floating-point representation:
>> eps(1)
ans =
2.220446049250313e-16
>> eps(1000)
ans =
1.136868377216160e-13
Notice that the precision is relative to the size of a given number being represented; larger numbers will have larger distances between floating-point representations, and will thus have fewer digits of precision following the decimal point. This can be an important consideration with some calculations. Consider the following example:
>> format long % Display full precision
>> x = rand(1, 10); % Get 10 random values between 0 and 1
>> a = mean(x) % Take the mean
a =
0.587307428244141
>> b = mean(x+10000)-10000 % Take the mean at a different scale, then shift back
b =
0.587307428244458
Note that when we shift the values of x from the range [0 1] to the range [10000 10001], compute a mean, then subtract the mean offset for comparison, we get a value that differs for the last 3 significant digits. This illustrates how an offset or scaling of data can change the accuracy of calculations performed on it, which is something that has to be accounted for with certain problems.
Look at this article: The Perils of Floating Point. Though its examples are in FORTRAN, it applies to virtually any modern programming language, including MATLAB. Your problem (and its solution) is described in the "Safe Comparisons" section.
Type
format long g
This command will show the full value of the numbers. They are likely to be something like 24.00000021321 != 24.00000123124.
Try writing
0.1 + 0.1 + 0.1 == 0.3
Warning: You might be surprised by the result!
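For the curious, here is what that comparison gives (shown as a Python session for consistency with this page; MATLAB behaves the same way):
>>> 0.1 + 0.1 + 0.1 == 0.3
False
>>> 0.1 + 0.1 + 0.1
0.30000000000000004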
Maybe the two numbers are really 24.0 and 24.000000001, but you're not seeing all the decimal places.
Check out the MATLAB eps function.
MATLAB uses floating-point math with up to 16 significant digits of precision (only 5 are displayed by default).

Rounding floats in Python (handling 5s)

I am trying to figure out the best way to robustly round floats in Python using the round-half-up algorithm. It seems the best way to do this is with the decimal library. However, I would expect this method to carry over the rounding-up of a 5 across the digits of a float. For example:
from decimal import *
Decimal('3.445').quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)
The result is 3.4. What I would expect the algorithm to do is carry over the round-up of the 5, such that 3.445 → 3.45 → 3.5.
Does anybody know how to do this in python? I cannot seem to find a robust way of doing this.
Rounding half up doesn't work by carrying over the round-ups from lower digits; it simply determines the half point at the given exponent. Since 3.445 % 0.1 == 0.045, which is less than half of 0.1, it correctly rounds down to 3.4.
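You can verify this directly in a quick session:
>>> from decimal import Decimal, ROUND_HALF_UP
>>> Decimal('3.445') % Decimal('0.1')
Decimal('0.045')
>>> Decimal('3.445').quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)
Decimal('3.4')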
You can instead implement the desired carry-over logic by rounding half up from the second-least significant digit of the given decimal number up to the target exponent, one digit at a time, in a loop:
def round_half_up_carryover(d, target_exp):
    # Start one digit above d's least significant digit
    exp = Decimal(10) ** (d.as_tuple()[2] + 1)
    # Round half up one digit at a time until the target exponent is reached
    while exp <= target_exp:
        d = d.quantize(exp.normalize(), rounding=ROUND_HALF_UP)
        exp *= 10
    return d
so that:
print(round_half_up_carryover(Decimal('3.445'), Decimal('0.1')))
would output:
3.5

Million decimal places in Python

We recently delved into infinite series in calculus, and I'm having so much fun with it. I derived my own inverse-tan infinite series in Python and evaluated it at 1 to get pi/4, then multiplied by 4 to get pi. I know it's not the fastest algorithm, so please let's not discuss my algorithm. What I would like to discuss is how to represent very, very small numbers in Python. What I notice is that as my program iterates the series, it stops somewhere around 20 decimal places (give or take). I tried using the decimal module, and that only pushed it to about 509 places. I want an (almost) infinite representation.
Is there a way to do such a thing? I reckon no data type will be able to handle such immensity, but if you can show me a way around that, I would appreciate it very much.
Python's decimal module lets you specify the "context", which controls how precise the representation will be.
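For instance (a minimal sketch; the 1000-digit setting is arbitrary):
from decimal import Decimal, getcontext

getcontext().prec = 1000          # carry 1000 significant digits
print(Decimal(1) / Decimal(7))    # 0.142857142857... to 1000 digits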
I might recommend gmpy2 for this type of thing - you can do the calculation on rational numbers (arbitrary precision) and convert to decimal at the last step.
Here's an example - substitute your own algorithm as needed:
import gmpy2

# See https://gmpy2.readthedocs.org/en/latest/mpfr.html
gmpy2.get_context().precision = 10000

pi = 0
for n in range(1000000):
    # Formula from http://en.wikipedia.org/wiki/Calculating_pi#Arctangent
    numer = pow(2, n + 1)
    denom = gmpy2.bincoef(n + n, n) * (n + n + 1)
    frac = gmpy2.mpq(numer, denom)
    pi += frac
    # Print every 1000 iterations
    if n % 1000 == 0:
        print(gmpy2.mpfr(pi))

Prevent Rounding to Zero in Python

I have a program meant to approximate pi using the Chudnovsky Algorithm, but a term in my equation that is very small keeps being rounded to zero.
Here is the algorithm:
import math
from decimal import *

getcontext().prec = 100
pi = Decimal(0.0)
C = Decimal(12/(math.sqrt(640320**3)))
k = 0
x = Decimal(0.0)
result = Decimal(0.0)
sign = 1
while k < 10:
    r = Decimal(math.factorial(6*k)/((math.factorial(k)**3)*math.factorial(3*k)))
    s = Decimal((13591409+545140134*k)/((640320**3)**k))
    x += Decimal(sign*r*s)
    sign = sign*(-1)
    k += 1
result = Decimal(C*x)
pi = Decimal(1/result)
print Decimal(pi)
The equations may be clearer without the "decimal" terms.
import math

pi = 0.0
C = 12/(math.sqrt(640320**3))
k = 0
x = 0.0
result = 0.0
sign = 1
while k < 10:
    r = math.factorial(6*k)/((math.factorial(k)**3)*math.factorial(3*k))
    s = (13591409+545140134*k)/((640320**3)**k)
    x += sign*r*s
    sign = sign*(-1)
    k += 1
result = C*x
pi = 1/result
print pi
The issue is with the s variable. For k>0 it always comes out to zero; e.g. at k=1, s should equal about 2.1e-9, but instead it is just zero. Because of this, all of my terms after the first are 0. How do I get Python to calculate the exact value of s instead of rounding it down to 0?
Try:
s = Decimal((13591409+545140134*k)) / Decimal(((640320**3)**k))
The arithmetic you're doing is native Python - by allowing the Decimal object to perform your division, you should eliminate your error.
You can then do the same when computing r.
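As a sketch, both lines could look like this, keeping the question's structure and only moving where the Decimal conversions happen:
r = Decimal(math.factorial(6*k)) / (Decimal(math.factorial(k))**3 * Decimal(math.factorial(3*k)))
s = Decimal(13591409 + 545140134*k) / Decimal((640320**3)**k)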
A couple of comments:
If you are using Python 2.x, the / operator returns an integer result when both operands are integers. If you want a Decimal result, convert at least one side to Decimal first.
math.sqrt() only returns ~16 digits of precision. Since your value for C will only be accurate to ~16 digits, your final result will only be accurate to ~16 digits.
If you're doing maths in Python 2.x, you should probably be putting this line into every module:
from __future__ import division
This changes the meaning of the division operator so that it will return a floating point number if needed to give a (closer to) precise answer. The historical behaviour is for x / y to return an int if both x and y are ints, which usually forces the answer to be rounded down.
Returning a float if necessary is generally regarded as a better way to handle division in a language like Python where duck typing is encouraged, since you can just worry about the value of your numbers rather than getting different behaviour for different types.
In Python 3 this is in fact the default, but since old programs relied on the historical behaviour of the division operator it was felt the change was too backwards-incompatible to be made in Python 2. This is why you have to explicitly turn it on with the __future__ import. I would recommend always adding that import in any module that might be doing any mathematics (or just any module at all, if you can be bothered). You'll almost never be upset that it's there, but not having it there has been the cause of a number of obscure bugs I've had to chase.
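To make the difference concrete, here is an illustrative Python 2 session:
>>> 5 / 2
2
>>> from __future__ import division
>>> 5 / 2
2.5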
I feel that the problem with s is that all terms are integers, so you are doing integer maths. A very simple workaround would be to use 3.0 in the denominator (i.e. (640320**3.0)**k); it only takes one float in the calculation to get a float returned.
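For example (illustrative; Python 2 division semantics assumed):
k = 1
print (13591409 + 545140134*k) / ((640320**3.0)**k)   # ~2.13e-09 instead of 0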

Floating point modulus problem

I'm having a problem with modulus on a floating point number in Python. This code:
...
print '(' + repr(olddir) + ' + ' + repr(self.colsize) + ') % (math.pi*2) = ' + repr((olddir+self.colsize) % (math.pi*2))
...
Prints:
(6.281876310240881 + 0.001308996938995747) % (math.pi*2) = 2.9043434324194095e-13
I know floating point numbers aren't precise. But I can't get this to make any sense.
I don't know if it is in any way related, but Google Calculator can't handle this calculation either. This is the output from Google Calculator:
(6.28187631024 + 0.001308996939) % (pi * 2) = 6.28318531
What is causing this calculation error? And how can I avoid it in my Python program?
Using str() to print a floating point number actually prints a rounded version of the number:
>>> print repr(math.pi)
3.1415926535897931
>>> print str(math.pi)
3.14159265359
So we can't really reproduce your results, since we don't know the exact values you are doing the computation with. Obviously, the exact value of olddir+self.colsize is slightly greater than 2*math.pi, while the sum of the rounded values you used in Google Calculator is slightly less than 2*math.pi.
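You can reproduce both behaviours with values that straddle 2*math.pi (illustrative; the exact digits printed may vary):
import math

two_pi = 2 * math.pi
print(repr((two_pi + 1e-13) % two_pi))   # just above 2*pi: a tiny remainder, ~1e-13
print(repr((two_pi - 1e-13) % two_pi))   # just below 2*pi: the value itself, ~6.283185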
The difference between str and repr
>>> import scipy
>>> pi = scipy.pi
>>> str(pi)
'3.14159265359'
>>> repr(pi)
'3.1415926535897931'
str truncates floating-point numbers to 12 digits, whereas repr gives the full internal representation (as a string).
EDIT: So in summary, the problem arose because you rounded prematurely and are calculating the modulus of a number by something very close to it. With floating-point numbers, rounding is inevitably involved in converting decimal numbers into binary.
First, here is an example of how rounding hurts you with exact math (not floating-point math). Look at (3.14 + 3.14) % (3.14 + 3.14), which is obviously zero. Now what would happen if we rounded to one decimal digit on one side first? Well, (3.1 + 3.1) % (3.14 + 3.14) = 6.2 % 6.28 = 6.2 (what Google gave you). Or, rounding to 4 decimal places: (round(3.14159, 4) + round(3.14159, 4)) % (3.14159 + 3.14159) = 6.2832 % 6.28318 = 2e-5.
So by rounding to N digits (which is what str effectively does), your calculation is only accurate to fewer than N digits. To make this work going forward, you need to force rounding at some earlier digit (keeping a couple of calculated guard digits for safety). E.g., str rounds at digit 12, so round at digit 10:
>>> round(6.28187631024 + 0.001308996939, 10) % round(pi * 2, 10)
0.0
