So, I'm having some precision issues in Python.
I would like to calculate functions like this:
P(x,y) = exp(-x)/(exp(-x) + exp(-y))
Where x and y might be >1000. Python's math.exp(-1000) (in 2.6 at least!) doesn't have enough floating point precision to handle this.
This form looks like the logistic / logit / log-odds function, but it's not, right? Is there some algebraic simplification I'm missing here?
I know about Decimal, but am not sure if it applies here
looks like homework, but it's not, I promise!
(Also, I'm open to titles! I couldn't think of a good one for this question!)
You could divide the numerator and denominator by exp(-x):
P(x,y) = 1/(1 + exp(x-y))
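A minimal sketch of the rewritten form (the 700 cutoff is my choice, just under the ~709 point where exp overflows a double):

```python
import math

def p(x, y):
    # exp(-x) / (exp(-x) + exp(-y)) rewritten as 1 / (1 + exp(x - y)),
    # which never evaluates exp of a huge negative argument.
    d = x - y
    if d > 700:  # exp(d) would overflow a double; the true result underflows to 0
        return 0.0
    return 1.0 / (1.0 + math.exp(d))

print(p(1000, 1001))  # the naive form would compute 0.0 / 0.0 here
```

With x = 1000 and y = 1001, both exp(-1000) and exp(-1001) underflow to 0.0, so the original expression raises ZeroDivisionError; the rewritten form only ever sees exp(-1).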
>>> import decimal
>>> decimal.Decimal(-1000).exp()
Decimal('5.075958897549456765291809480E-435')
>>> decimal.getcontext().prec = 60
>>> decimal.Decimal(-1000).exp()
Decimal('5.07595889754945676529180947957433691930559928289283736183239E-435')
P(x,y) = exp(-x)/(exp(-x) + exp(-y))
is equivalent to:
P(x,y) = 1 / (1 + exp(x-y))
The second form can be evaluated without needing any extra precision.
I think you are looking for the bigfloat package for reliable arbitrary-precision floating-point arithmetic.
Related
I would like to check if a float is a multiple of another float, but am running into issues with machine precision. For example:
t1 = 0.02
factor = 0.01
print(t1%factor==0)
The above outputs True, but
t2 = 0.030000000000000002
print(round(t2,5)%factor==0)
This outputs False. At some points in my code the number I am checking develops these machine-precision errors, and I thought I could fix the issue simply by rounding it. (I need 5 decimal places for later in my code, but rounding to 2 decimal places doesn't work either.)
Any ideas why the above check round(t2,5)%factor==0 doesn't work as expected, and how I can fix it?
It doesn't work as expected because checking floats for equality almost never works as expected. A quick fix is math.isclose, which also lets you adjust the tolerance. Note that isclose with only its default rel_tol is never "close" to 0 for a nonzero input, so pass an explicit abs_tol when comparing against 0. Also remember that when doing arithmetic mod r, r is equivalent to 0, so check whether the result is close to 0 or to r.
import math
t1 = 0.02
factor = 0.01
res = t1 % factor
# abs_tol is required when comparing against 0; the default rel_tol alone
# would only return True for an exact 0.
print(math.isclose(res, 0, abs_tol=1e-9) or math.isclose(res, factor, abs_tol=1e-9))
This is pretty quick and dirty, and you will want to make sure your tolerances work correctly and equivalently for both of those checks.
You should use the decimal module. The decimal module provides support for fast correctly-rounded decimal floating point arithmetic.
import decimal
print( decimal.Decimal('0.03') % decimal.Decimal('0.01') == decimal.Decimal('0') )
This gives:
True
Generally, floats in Python are... messed up, for lack of a better word, and they can act in very unexpected ways. (You can read more about that behaviour here.)
For your goal however, a better way is this:
t2 = 0.03000003
factor = 0.01
precision = 10000  # 4 decimal digits
# Use round() rather than int(): truncation can land one below the intended
# integer (e.g. int(0.29 * 10000) == 2899, not 2900).
print(round(t2*precision) % round(factor*precision) == 0)
Moving the maths to an integer-based calculation avoids most of those issues.
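In the same spirit, here is a sketch using fractions.Fraction (the name is_multiple and the 5-digit rounding are my assumptions): str() of a rounded float gives its shortest decimal form, and Fraction parses that exactly, so the check becomes exact rational arithmetic.

```python
from fractions import Fraction

def is_multiple(x, factor, digits=5):
    # Round to the digits you trust, take the shortest decimal repr,
    # then do the modulo exactly in rational arithmetic.
    return Fraction(str(round(x, digits))) % Fraction(str(factor)) == 0

print(is_multiple(0.02, 0.01))                  # True
print(is_multiple(0.030000000000000002, 0.01))  # True
print(is_multiple(0.025, 0.01))                 # False
```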
I am trying to compute math.tan(0.00000001) and I am getting 0.00000001
>>> math.tan(0.00000001) == 0.00000001
True
Is this due to how math.tan is implemented? Does it use small-angle approximation?
Where can I get more documentation about this
One way to go would be, by analogy with numpy.expm1, to implement a function that computes tan(x)-x in double precision.
While a production-quality version of that might be tricky, here is a simple version that should give accurate answers for |x| < 1e-6.
tan(x)-x = sin(x)/cos(x) - x = (sin(x)-x*cos(x))/cos(x)
for such small x we can write, to better than double precision
sin(x) = x - x*x*x/6 + x*x*x*x*x/120
cos(x) = 1 - x*x/2 + x*x*x*x/24
Substituting these we get
tan(x)-x = x*x*x*(1.0/3 - (1.0/30)*x*x)/cos(x)
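The formula above, as a runnable sketch (the function name is mine):

```python
import math

def tan_minus_x(x):
    # tan(x) - x without the catastrophic cancellation of math.tan(x) - x.
    # Series-based; accurate for |x| < 1e-6, where the omitted O(x**7)
    # term is negligible at double precision.
    return x*x*x * (1.0/3 - (1.0/30)*x*x) / math.cos(x)

x = 1e-8
print(math.tan(x) - x)  # 0.0 -- the difference is lost entirely in doubles
print(tan_minus_x(x))   # ~3.33e-25, the actual difference
```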
There's nothing special about this. Python's float only has limited precision, which we can explore with numpy:
0.000000010000000000000000209226 # np.tan(0.00000001)
0.000000009999999999999998554864 # np.nextafter(np.tan(0.00000001), -1)
0.000000010000000000000001863587 # np.nextafter(np.tan(0.00000001), 1)
0.000000010000000000000000333... # True value
From this we can see that 0.000000010000000000000000209226 is the closest representation to the true value, but also that it's safe to round-trip this to 0.00000001, thus Python chooses to print it that way.
This is not a duplicate of this, I'll explain here.
Consider x = 1.2. I'd like to separate it out into 1 and 0.2. I've tried all these methods as outlined in the linked question:
In [370]: x = 1.2
In [371]: divmod(x, 1)
Out[371]: (1.0, 0.19999999999999996)
In [372]: math.modf(x)
Out[372]: (0.19999999999999996, 1.0)
In [373]: x - int(x)
Out[373]: 0.19999999999999996
In [374]: x - int(str(x).split('.')[0])
Out[374]: 0.19999999999999996
Nothing I try gives me exactly 1 and 0.2.
Is there any way to reliably split a floating-point number into its integer and fractional parts that is not hindered by the limitations of floating-point representation?
I understand this might be due to the limitation of how the number is itself stored, so I'm open to any suggestion (like a package or otherwise) that overcomes this.
Edit: Would prefer a way that didn't involve string manipulation, if possible.
Solution
It may seem like a hack, but you could separate the string form (actually repr) and convert it back to ints and floats:
In [1]: x = 1.2
In [2]: s = repr(x)
In [3]: p, q = s.split('.')
In [4]: int(p)
Out[4]: 1
In [5]: float('.' + q)
Out[5]: 0.2
How it works
The reason for approaching it this way is that the internal algorithm for displaying 1.2 is very sophisticated (a fast variant of David Gay's algorithm). It works hard to show the shortest of the possible representations of numbers that cannot be represented exactly. By splitting the repr form, you're taking advantage of that algorithm.
Internally, the value entered as 1.2 is stored as the binary fraction, 5404319552844595 / 4503599627370496 which is actually equal to 1.1999999999999999555910790149937383830547332763671875. The Gay algorithm is used to display this as the string 1.2. The split then reliably extracts the integer portion.
In [6]: from decimal import Decimal
In [7]: Decimal(1.2)
Out[7]: Decimal('1.1999999999999999555910790149937383830547332763671875')
In [8]: (1.2).as_integer_ratio()
Out[8]: (5404319552844595, 4503599627370496)
Rationale and problem analysis
As stated, your problem roughly translates to "I want to split the integral and fractional parts of the number as it appears visually rather than according to how it is actually stored".
Framed that way, it is clear that the solution involves parsing how it is displayed visually. While it may feel like a hack, this is the most direct way to take advantage of the very sophisticated display algorithms and actually match what you see.
This may be the only reliable way to match what you see unless you manually reproduce the internal display algorithms.
Failure of alternatives
If you want to stay in realm of integers, you could try rounding and subtraction but that would give you an unexpected value for the floating point portion:
In [9]: round(x)
Out[9]: 1.0
In [10]: x - round(x)
Out[10]: 0.19999999999999996
Here is a solution without string manipulation (frac_digits is the count of decimal digits that you can guarantee the fractional part of your numbers will fit into):
>>> def integer_and_fraction(x, frac_digits=3):
... i = int(x)
... c = 10**frac_digits
... f = round(x*c-i*c)/c
... return (i, f)
...
>>> integer_and_fraction(1.2)
(1, 0.2)
>>> integer_and_fraction(1.2, 1)
(1, 0.2)
>>> integer_and_fraction(1.2, 2)
(1, 0.2)
>>> integer_and_fraction(1.2, 5)
(1, 0.2)
>>>
You could try converting 1.2 to string, splitting on the '.' and then converting the two strings ("1" and "2") back to the format you want.
Additionally padding the second portion with a '0.' will give you a nice format.
So I just did the following in a Python terminal and it seemed to work properly...
x = 1.2
s = str(x).split('.')
i = int(s[0])
d = int(s[1]) / 10**len(s[1])  # scale by the digit count so e.g. '25' becomes 0.25, not 2.5
The following simple code:
from decimal import getcontext
from decimal import *
import math
context = getcontext()
context.prec = 300
def f(x):
    return Decimal(math.atan(10**(-x+1)))

def xNext(x,y):
    return x-y*f(2)

def yNext(x,y):
    return y+x*f(2)
x= Decimal(1)
y = Decimal(0)
x=xNext(x,y)
y=yNext(x,y)
x=xNext(x,y)
y=yNext(x,y)
x=xNext(x,y)
y=yNext(x,y)
print("{:.16f}".format(x))
print("{:.16f}".format(y))
returns
0.9702971603146833
0.2950554229911823
Which is wrong, should be around 0.97019857 and 0.2980158649
I thought this was a rounding error but this code should be working to 300 decimal places.
Not sure if different problem or not really going to 300 places...
EDIT: Yeah, I doubt it's a rounding error, I've just done the same process on wolfram only to around 20 decimal places at a time and my answer's more accurate than this one.
Decimal doesn't extend your precision, because you go through the math module, which works in double precision regardless. But that's not the point. Are you sure your calculation is correct? Your version updates x first and then uses the new x to compute y, while the rotation needs the old values of both. Just tried:
x, y = 1, 0
x, y = xNext(x,y), yNext(x,y)
x, y = xNext(x,y), yNext(x,y)
x, y = xNext(x,y), yNext(x,y)
And it leads to
0.970198479132
0.298015864998
which is basically your expected result.
I think that the problem lies here :
return Decimal(math.atan(10**(-x+1)))
I would imagine that ALL of the calculations in that formula (especially the math.atan function) will be carried out as normal-precision floating-point numbers and only then converted back to a 300-digit Decimal.
If you want 300 digits of precision, you MUST find a way to ensure that every calculation is executed to that level of precision or better, as your result will only be as precise as your LEAST precise calculation.
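To illustrate, here is a sketch of an arctan that stays in Decimal throughout (the function name and termination bound are my choices; the Taylor series used converges for 0 < z < 1, which covers the 10**(-x+1) arguments above):

```python
from decimal import Decimal, getcontext
import math

getcontext().prec = 60

def atan_decimal(z):
    # arctan(z) = z - z**3/3 + z**5/5 - ...  Every operation here is a
    # Decimal operation, so the result carries the full context precision,
    # unlike Decimal(math.atan(z)), which is stuck at ~16 digits.
    z = Decimal(z)
    z2 = z * z
    term, total, sign, n = z, Decimal(0), 1, 0
    eps = Decimal(10) ** -getcontext().prec
    while term / (2*n + 1) > eps:
        total += sign * term / (2*n + 1)
        term *= z2
        sign = -sign
        n += 1
    return total

print(float(atan_decimal('0.1')), math.atan(0.1))  # agree to ~16 significant digits
```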
This question already has answers here:
Python cos(90) and cos(270) not 0
(3 answers)
Closed 9 years ago.
Is there a way to get the exact Tangent/Cosine/Sine of an angle (in radians)?
math.tan()/math.sin()/math.cos() does not give the exact for some angles:
>>> from math import *
>>> from decimal import Decimal
>>> sin(pi) # should be 0
1.2246467991473532e-16
>>> sin(2*pi) # should be 0
-2.4492935982947064e-16
>>> cos(pi/2) # should be 0
6.123233995736766e-17
>>> cos(3*pi/2) # 0
-1.8369701987210297e-16
>>> tan(pi/2) # invalid; tan(pi/2) is undefined
1.633123935319537e+16
>>> tan(3*pi/2) # also undefined
5443746451065123.0
>>> tan(2*pi) # 0
-2.4492935982947064e-16
>>> tan(pi) # 0
-1.2246467991473532e-16
I tried using Decimal(), but this does not help either:
>>> tan(Decimal(pi)*2)
-2.4492935982947064e-16
numpy.sin(x) and the other trigonometric functions also have the same issue.
Alternatively, I could always create a new function with a dictionary of values such as:
def new_sin(x):
    sin_values = {math.pi: 0, 2*math.pi: 0}
    return sin_values[x] if x in sin_values else math.sin(x)
However, this seems like a cheap way to get around it. Is there any other way? Thanks!
It is impossible to store the exact numerical value of pi in a computer. math.pi is the closest approximation to pi that can be stored in a Python float. math.sin(math.pi) returns the correct result for the approximate input.
To avoid this, you need to use a library that supports symbolic arithmetic. For example, with sympy:
>>> from sympy import *
>>> sin(pi)
0
>>> pi
pi
>>>
sympy will operate on an object that represents pi and can give exact results.
When you're dealing with inexact numbers, you need to deal with error explicitly. math.pi (or numpy.pi) isn't exactly π; it's the closest 53-bit binary floating-point number to π. And the sin of that number is not 0.
But it is very close to 0. And likewise, tan(pi/2) is not infinity (or NaN), but huge, and asin(1)/pi is very close to 0.5.
So, even if the algorithms were somehow exact, the results still wouldn't be exact.
If you've never read What Every Computer Scientist Should Know About Floating-Point Arithmetic, you should do so now.
The way to deal with this is to use epsilon-comparisons rather than exact comparisons everywhere, and explicitly round things when printing them out, and so on.
Using decimal.Decimal numbers instead of float numbers makes this easier. First, you probably think in decimal rather than binary, so it's easier for you to understand and make decisions about the error. Second, you can explicitly set precision and other context information on Decimal values, while floats are always IEEE doubles.
The right way to do it is to do full error analysis on your algorithms, propagate the errors appropriately, and use that information where it's needed. The simple way is to just pick some explicit absolute or relative epsilon (and the equivalent for infinity) that's "good enough" for your application, and use that everywhere. (You'll probably also want to use the appropriate domain-specific knowledge to treat some values as multiples of pi instead of just raw values.)
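For example, the epsilon-comparison approach with math.isclose (the 1e-12 tolerance is an arbitrary choice for illustration; pick one suited to your application's scale):

```python
import math

x = math.sin(math.pi)
print(x)                                     # ~1.2246e-16: not 0, but within rounding error of it
print(x == 0.0)                              # False -- exact comparison fails
print(math.isclose(x, 0.0, abs_tol=1e-12))   # True  -- tolerance comparison succeeds
```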