I would like to compute 1/(1+exp(x)) for (possibly large) x. This is a well-behaved function bounded between 0 and 1. I could just do
import numpy as np
1.0/(1.0+np.exp(x))
but in this naive implementation np.exp(x) will likely just return 0 or infinity for large x, depending on the sign. Are there functions available in Python that will help me out here?
I am considering implementing a series expansion and series acceleration, but I am wondering if this problem has already been solved.
You can use scipy.special.expit(-x). It will avoid the overflow warnings generated by 1.0/(1.0 + exp(x)).
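For example, a quick check (expit is vectorized, so it works on whole arrays; the sample values are just for illustration):
import numpy as np
from scipy.special import expit

x = np.array([-1000.0, 0.0, 1000.0])
print(expit(-x))                 # [1.  0.5 0. ] -- no warnings
print(1.0 / (1.0 + np.exp(x)))   # same values, but warns of overflow at x=1000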
Fundamentally you are limited by floating point precision. For example, if you are using 64 bit floats:
fmax_64 = np.finfo(np.float64).max # the largest representable 64 bit float
print(np.log(fmax_64))
# 709.782712893
If x is larger than about 709 then you simply won't be able to represent np.exp(x) (or 1. / (1 + np.exp(x))) using a 64 bit float.
You could use an extended precision float (i.e. np.longdouble):
fmax_long = np.finfo(np.longdouble).max
print(np.log(fmax_long))
# 11356.5234063
The precision of np.longdouble may vary depending on your platform - on x86 it is usually 80 bit, which would allow you to work with x values up to about 11356:
func = lambda x: 1. / (1. + np.exp(np.longdouble(x)))
print(func(11356))
# 1.41861159972e-4932
Beyond that you would need to rethink how you're computing your expansion, or else use something like mpmath, which supports arbitrary-precision arithmetic. However, this usually comes at the cost of much worse runtime performance than numpy, since vectorization is no longer possible.
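If you do go the mpmath route, a minimal sketch might look like this (scalar calls only, so much slower than vectorized numpy; the precision setting is just an example):
import mpmath

mpmath.mp.dps = 30                  # work with 30 significant decimal digits

x = mpmath.mpf(100000)
print(1 / (1 + mpmath.exp(x)))      # tiny but fully representable result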
I am developing a machine learning based algorithm in Python. The main thing that I need to calculate to solve this problem is probabilities. To that end I have the following code:
class_ans = class_probability[current_class] * lambdas[current_class]
for word in appears_words:
    if word in message:
        class_ans *= words_probability[(word, current_class)]
    else:
        class_ans *= (1 - words_probability[(word, current_class)])
ans.append(class_ans)
ans[current_class] /= summ
It works, but when the dataset is too big or the lambdas values are too small, I run out of float precision.
I've tried to find another way of calculating the answer's value, multiplying and dividing different variables by arbitrary constants to keep them from over- or underflowing, but nothing helped.
So I would like to ask: is there any way to increase the float precision in Python?
Thanks!
You cannot. For serious scientific computation where precision is key (and speed is not), consider the following two options:
Instead of using float, switch your datatype to decimal.Decimal and set your desired precision.
For a more battle-hardened, thorough implementation, switch to gmpy2.mpfr as your data type.
However, if your entire computation (or at least the problematic part) involves the multiplication of factors, you can often bypass the need for the above by working in log-space, as Konrad Rudolph suggests in the comments:
a * b * c * d * ... = exp(log(a) + log(b) + log(c) + log(d) + ...)
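As a minimal sketch of the log-space idea (the factors list below is a hypothetical stand-in for the probabilities in your loop):
import math

factors = [0.5, 1e-8, 3e-7, 0.25]   # hypothetical class/word probabilities

# the naive running product can underflow to 0.0 for long factor lists
naive = 1.0
for p in factors:
    naive *= p

# summing logs keeps the magnitude manageable; exponentiate only at the
# very end, or better, compare the log-values directly
log_ans = sum(math.log(p) for p in factors)
print(naive, math.exp(log_ans))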
I'm trying to create a function in a network with trainable parameters. In my function I have an exponential that for large tensor values goes to infinity. What would the best way to avoid this be?
The function is as follows:
step1 = Pss-(k*Pvv)
step2 = step1*s
step3 = torch.exp(step2)
step4 = torch.log10(1+step3)
step5 = step4/s
#or equivalently
# train_curve = torch.log(1+torch.exp((Pss-k*Pvv)*s))/s
If it makes it easier to understand, the basic function is log10(1+e^(x-const)*10)/10. The exponential inside the log gets too big and goes to inf.
I think I might have to normalize my tensor x, and this would mean normalizing the constants and the rest of the function also. Would someone have any thoughts on the best way to go about this?
Thanks so much.
One solution is to just use a more numerically stable computation. Notice that log(1 + exp(x)) is approximately equal to x when x is large enough. Intuitively this can be observed by noting that, for example, exp(50) is approximately 5.18e+21, to which adding 1 has no effect in 32-bit floating point arithmetic, which is what PyTorch uses by default. Further verification with an arbitrary-precision calculator shows that the error of this approximation at x = 50 is far below what 32-bit floating point precision (about 7 decimal digits) can resolve.
Using this information we can implement a simple piecewise function in PyTorch, using log1p(exp(x)) for values less than 50 and x itself for values of 50 or more. With the argument to exp clamped (see the code below), the function is also autograd compatible:
import torch

def log1pexp(x):
    # stable log(1 + exp(x)); the clamp keeps the unselected exp branch
    # finite so autograd does not produce NaN gradients for large x
    return torch.where(x < 50, torch.log1p(torch.exp(x.clamp(max=50))), x)
This gets us most of the way to a solution, since what you actually want to evaluate is torch.log10(1+torch.exp((Pss-k*Pvv)*s))/s.
Now we can use our new log1pexp function to compute this expression without worrying about infinities:
(log1pexp((Pss - k*Pvv)*s) / math.log(10)) / s
and note the conversion from natural log to base-10 log via the division by math.log(10).
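For instance, a hypothetical usage sketch building on the log1pexp defined above (the tensor values and parameters below are made up for illustration):
import math
import torch

Pss = torch.tensor([500.0, -2.0], requires_grad=True)  # made-up values
Pvv = torch.tensor([1.0, 1.0])
k, s = 2.0, 10.0

train_curve = (log1pexp((Pss - k*Pvv)*s) / math.log(10)) / s
print(train_curve)              # finite even where the naive exp overflows
train_curve.sum().backward()
print(Pss.grad)                 # gradients are finite as well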
I calculate the first derivative using the following code:
import numpy as np

h = 1e-5  # global step size; the question experiments with values down to 10E-14

def f(x):
    return np.exp(x)

def dfdx(x):
    # symmetric (central) difference approximation of f'(x)
    Df = (f(x+h) - f(x-h)) / (2*h)
    return Df
For example, for x == 10 this works fine. But when I set h to around 10E-14 or below, Df starts
to get values that are really far away from the expected value f(10) and the relative error between the expected value and Df becomes huge.
Why is that? What is happening here?
The evaluation of f(x) has, at best, a rounding error of |f(x)|*mu, where mu is the machine epsilon of the floating point type. The total error of the central difference formula is thus approximately
2*|f(x)|*mu/(2*h) + |f'''(x)|/6 * h^2
In the present case, the exponential function is equal to all of its derivatives, so that the error is proportional to
mu/h + h^2/6
which has a minimum at h = (3*mu)^(1/3), which for the double format with mu=1e-16 is around h=1e-5.
The precision is increased if, instead of 2*h, the actual difference (x+h)-(x-h) between the evaluation points is used in the denominator. This can be verified with a log-log plot of the distance to the exact derivative as a function of h.
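A quick numerical check of this analysis (a sketch; the exact error values will vary, but the minimum should appear near h ~ 1e-5):
import numpy as np

x = 10.0
exact = np.exp(x)                      # f'(x) = f(x) for the exponential
for h in 10.0 ** -np.arange(1.0, 15.0):
    approx = (np.exp(x + h) - np.exp(x - h)) / (2*h)
    print("h = %.0e   relative error = %.3e" % (h, abs(approx - exact) / exact))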
You are probably encountering numerical instability: for x = 10 and h ≈ 1e-13, the argument to np.exp is very nearly the same whether h is added or subtracted, so the small rounding errors in the values of np.exp are scaled up enormously by the division by the very small 2*h.
In addition to the answer by @LutzL, I will add some information from the book Numerical Recipes, 3rd Edition: The Art of Scientific Computing, chapter 5.7 on numerical derivatives, especially about the choice of the optimal h value for a given x:
Always choose h so that h and x differ by an exactly representable number. Funny stuff like 1/3 should be avoided, except when x is equal to something along the lines of 14.3333333.
Round-off error is approximately epsilon * |f(x) / h|, where epsilon is the floating point accuracy; Python represents floating point numbers in double precision, so epsilon is about 1e-16. The estimate may differ for more complicated functions (where precision errors accumulate further), though that's not your case.
Choice of optimal h: without getting into details, it would be sqrt(epsilon) * x for the simple forward case, except when x is near zero (you will find more information in the book); in that situation you may want to use larger x values, as a complementary answer already suggests. For the symmetric f(x+h) - f(x-h) case, as in your example, it amounts to epsilon**(1/3) * x, so approximately 5e-6 times x, a choice that can be tricky for step sizes as small as yours. This is quite close (if one can say so, bearing floating point arithmetic in mind...) to the practical results posted by @LutzL.
You may also use derivative formulas other than the symmetric one you are using. Forward or backward evaluation is an option if the function is costly to evaluate and you have calculated f(x) beforehand. If your function is cheap to evaluate, you may instead evaluate it at more points and use higher-order methods to make the truncation error smaller (see the five-point stencil mentioned in the comments to your question, and the sketch below).
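As a rough sketch of the five-point approach mentioned above (the stencil formula is standard; the helper name is mine):
import numpy as np

def dfdx_5pt(f, x, h):
    # five-point stencil: truncation error O(h**4) instead of the O(h**2)
    # of the symmetric two-point formula
    return (-f(x + 2*h) + 8*f(x + h) - 8*f(x - h) + f(x - 2*h)) / (12*h)

# quick check against the exact derivative of exp at x = 10
print(dfdx_5pt(np.exp, 10.0, 1e-3), np.exp(10.0))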
This Python tutorial explains the reason behind the limited precision. In summary, decimals are ultimately represented in binary and the precision is about 17 significant digits. So, you are right that it gets fuzzy beyond 10E-14.
I want to solve an equation using scipy.optimize.
I want to find the solution, n, for the equation
a**n + b**n = c**n
where
a=2.3
b=2.4
c=2.94
I have a list of triplets (a,b,c) I want to experiment with, and I know the range of the exponent n will always be 2.0 < n < 4.0. Could I use this fact to speed up the convergence of the solution?
If your function is scalar, and accepts a scalar (your case), and if you know that:
your solution is in a given interval, and the function is continuous in the same interval (your case)
you are interested in one solution, not necessarily in all (if more than 1) solutions in that interval
You can speed up the solution using the bisection algorithm, implemented in scipy as scipy.optimize.bisect, which requires the conditions above to guarantee convergence.
The idea behind the algorithm is quite simple: the bracketing interval halves at every iteration, so the number of iterations grows only logarithmically with the required accuracy.
The algorithm rests on the intermediate value theorem from fundamental calculus.
EDIT: I couldn't resist, so here is an MWE:
import scipy.optimize as opt

def sol(a, b, c):
    f = lambda n: a**n + b**n - c**n
    return opt.bisect(f, 2, 4)

print(sol(2.3, 2.4, 2.94))
# 3.1010655957
As requested in the comments, here's how to do it using mpmath.
We supply the a, b, c parameters as strings rather than as Python floats for maximum accuracy. Converting strings to mpf (mp floats) will be as accurate as the current precision allows. If instead we convert from Python floats then we'd be using numbers that suffer from the imprecision inherent in Python floats.
mp.dps allows us to set the precision in the form of the number of decimal digits.
The mpmath findroot function accepts an initial approximation argument. This can be a single value, or it may be an interval, given as a list or a tuple. It's ok to use Python floats in that interval.
from mpmath import mp
mp.dps = 30
a, b, c = [mp.mpf(u) for u in ('2.3', '2.4', '2.94')]
def f(x):
    return a**x + b**x - c**x
x = mp.findroot(f, [2, 4])
print(x, f(x))
output
3.10106559575904097402104750305 -3.15544362088404722164691426113e-30
By default, findroot uses a simple secant solver. The docs recommend using the 'anderson' or 'ridder' solvers when supplying an interval, but for this equation all 3 solvers give identical results.
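For instance, to request one of the bracketing solvers explicitly (a minimal variation on the code above):
x = mp.findroot(f, [2, 4], solver='anderson')
print(x)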
First off, I'm not a math guy, so large number precision rarely filters into my daily work. Please be gentle. ;)
Using NumPy to generate a matrix of 1000 values, each an equal 1/1000 share of 1:
>>> m = numpy.matrix([(1.0 / 1000) for x in xrange(1000)]).T
>>> m
matrix([[ 0.001 ],
[ 0.001 ],
...
[ 0.001 ]])
On 64-bit Windows with Python 2.6, summing rarely works out to exactly 1.0. math.fsum() does with this matrix, though it doesn't if I change the matrix to use smaller numbers.
>>> numpy.sum(m)
1.0000000000000007
>>> math.fsum(m)
1.0
>>> sum(m)
matrix([[ 1.]])
>>> float(sum(m))
1.0000000000000007
On 32-bit Linux (Ubuntu) with Python 2.6, summing always works out to 1.0.
>>> numpy.sum(m)
1.0
>>> math.fsum(m)
1.0
>>> sum(m)
matrix([[ 1.]])
>>> float(sum(m))
1.0000000000000007
I can add an epsilon to my code when assessing whether the matrix sums to 1 (e.g. 1 - epsilon < sum(m) < 1 + epsilon), but I first want to understand what causes the difference within Python, and whether there's a better way to determine the sum correctly.
My understanding is that the sums process the machine representation of the numbers (floats), which differs from how they're displayed. However, looking at the 3 methods I used to calculate the sum, it's not clear why they're all different, or why they differ between the platforms.
What's the best way to correctly calculate the sum of a matrix?
If you're looking for a more interesting matrix, this simple change will have smaller matrix numbers:
>>> m = numpy.matrix([(1.0 / 999) for x in xrange(999)]).T
Thanks in advance for any help!
Update
I think I figured something out. If I coerce the stored value to a 32-bit float, the results match the 32-bit Linux summing.
>>> m = numpy.matrix([(numpy.float32(1.0) / 1000) for x in xrange(1000)]).T
>>> m
matrix([[ 0.001 ],
[ 0.001 ],
...
[ 0.001 ]])
>>> numpy.sum(m)
1.0
This sets the matrix machine numbers to 32-bit floats rather than 64-bit in my Windows test, and the sum works out correctly. Why does 0.001 not have the same machine representation as a 32-bit and as a 64-bit float? I would expect them to differ only when storing very small numbers with lots of decimal places.
Does anyone have any thoughts on this? Should I explicitly switch to 32-bit floats in this case, or is there a 64-bit summing method? Or am I back to adding an epsilon? Sorry if I sound dumb, I'm interested in opinions. Thanks!
It's because you're comparing 32-bit floats to 64-bit floats, as you've already found out.
If you specify a 32-bit or 64-bit dtype on both machines, you'll see the same result.
Numpy's default floating point dtype (the numerical type for a numpy array) is the same as the machine precision. This is why you're seeing different results on different machines.
E.g.
The 32-bit version:
m = numpy.ones(1000, dtype=numpy.float32) / 1000
print repr(m.sum())
and the 64-bit version:
m = numpy.ones(1000, dtype=numpy.float64) / 1000
print repr(m.sum())
Will be different due to the differing precision, but you'll see the same results on different machines. (However, the 64-bit operation will be much slower on a 32-bit machine)
If you just specify numpy.float, note that this is merely an alias for Python's built-in float, i.e. a 64-bit double; to control the precision reliably, specify numpy.float32 or numpy.float64 explicitly.
I'd say that the most accurate way (not the most efficient) is to use the decimal module:
>>> from decimal import Decimal
>>> m = numpy.matrix([(Decimal(1) / 1000) for x in xrange(1000)])
>>> numpy.sum(m)
Decimal('1.000')
>>> numpy.sum(m) == 1.0
True
First, if you use numpy to store values, you should use numpy's methods, if provided, to work with the array/matrix. That is, if you want to trust the extremely capable people that have put numpy together.
Now, the 64-bit result of numpy's sum() cannot come out as exactly 1 because of how floating point numbers are handled in computers (murgatroid99 provided you with a link; there are hundreds more out there).
Therefore, the only safe way (and one that is very helpful for understanding your mathematical treatment of the code, and therefore your problem per se) is to use an epsilon value to cut off at a certain precision.
Why do I think it is helpful? Because computational science needs to deal with errors just as much as experimental science does, and by deliberately determining your errors at this point you have already taken the first step in dealing with the computational errors of your code.
So, there may be other ways to deal with it, but most of the time I would use an epsilon to specify the precision I require for a given problem.
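A minimal sketch of that epsilon check (numpy.isclose is a modern helper for the same pattern; the tolerance value here is only an example):
import numpy as np

m = np.full(1000, 1.0 / 1000)            # same values as the question's matrix
total = np.sum(m)

eps = 1e-9                               # tolerance chosen for this problem
print(abs(total - 1.0) < eps)            # explicit epsilon comparison
print(np.isclose(total, 1.0, atol=eps))  # same idea via the numpy helper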