Avoiding overflow error for exp in numpy - python

I am implementing the following function in numpy:
def weak_softmax(a):
    b = np.exp(a)
    return b / (1 + np.sum(b))
The size of array a is small, but its entries can sometimes be big, maybe as large as 1000. So I very often get overflow in the exponential function, as in the following example:
a=np.array([1000,1000])
a=weak_softmax(a)
The above code returns the vector a=[nan nan] and raises the following warning:
Warning: overflow encountered in exp
Is there any clever way to avoid this issue but still return the array b as intended? All the entries of b are less than one, so I feel it must be possible to avoid this issue using some trick.

You can simply divide the numerator and denominator by the same factor exp(c) for suitably sized c.
The following code uses np.finfo to check whether overflow may happen and to calculate c.
def modified_soft_max(a, SAFETY=2.0):
    a = np.asarray(a, dtype=float)  # ensure a float dtype so np.finfo applies (the question's example is an int array)
    mrn = np.finfo(a.dtype).max     # largest representable number
    thr = np.log(mrn / a.size) - SAFETY
    amx = a.max()
    if amx > thr:
        b = np.exp(a - (amx - thr))
        return b / (np.exp(thr - amx) + b.sum())
    else:
        b = np.exp(a)
        return b / (1.0 + b.sum())
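As a quick sanity check, the failing input from the question now goes through without warnings; each entry is effectively exp(1000)/(1 + 2*exp(1000)), which is just under 0.5:
a = np.array([1000, 1000])
print(modified_soft_max(a))   # [0.5 0.5], no overflow warning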

Handling operations with infinities in python

I have a piece of code that does a simple calculation.
import numpy as np
# Constants
R = 8.314462
T = 298.15
e = -678.692
e_overkbT = e * 1000 / (R * T)
# Independent variable
mu = np.linspace(-2000, 2000, 1000)
mu_overkbT = mu * 1000 / (R * T)
# Calculation
aa = np.exp(mu_overkbT - e_overkbT)
theta = aa / (1 + aa)
For negative values of 'mu', 'aa' is very small, so 'theta' is very close to 0. For positive values of 'mu', 'aa' is very large, so 'theta' approaches 1 (a large number divided by that large number plus 1).
For large values of 'aa' Python rounds 'theta' to 1, which is fine. However, for large enough inputs 'aa' becomes 'inf', and in the final step of calculating 'theta' I encounter a runtime error from dividing 'inf' by 'inf'.
I need some way to handle this error so that it gives me 1 as the result for 'theta'. I can't just reduce the range of 'mu' and stop before the error, because this calculation sits inside a larger function that changes the value of 'e', so the error does not always occur at the same spot.
Thanks.
Such overflow happens very often when applying the exponential function to large terms. Besides the very good comment noting that exp(x)/(1+exp(x)) = 1/(1+exp(-x)), another general approach, in case you don't find an easy transformation, is to use the logarithm to keep intermediate numbers manageable and only reverse the operation at the very end. This is especially useful with products of many large (or very small) terms, which the logarithm turns into a simple sum.
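As a sketch of that log-space idea applied to this exact computation (reusing the question's variables; the exponential is only taken at the very end):
x = mu_overkbT - e_overkbT
log_theta = x - np.logaddexp(0, x)   # log(aa / (1 + aa)) without ever forming aa
theta = np.exp(log_theta)            # log_theta <= 0, so exp never overflows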
If you don't mind a dependency on SciPy, you can replace
aa = np.exp(mu_overkbT - e_overkbT)
theta = aa / (1 + aa)
with
from scipy.special import expit
theta = expit(mu_overkbT - e_overkbT)
expit is an implementation of the logistic sigmoid function. It handles very large positive and negative numbers correctly. Note that 1/(1 + np.exp(-x)) will generate a warning for large negative values (but it still correctly returns 0):
In [148]: x = -1500
In [149]: 1/(1 + np.exp(-x))
<ipython-input-149-0afe09c93af3>:1: RuntimeWarning: overflow encountered in exp
1/(1 + np.exp(-x))
Out[149]: 0.0
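If you'd rather stay with plain NumPy, the same idea gives a warning-free sigmoid; this is a sketch based on the identity 1/(1 + exp(-x)) = exp(-logaddexp(0, -x)):
import numpy as np

def sigmoid(x):
    # the argument to exp is -logaddexp(0, -x) <= 0, so no overflow for any x
    return np.exp(-np.logaddexp(0, -x))

theta = sigmoid(mu_overkbT - e_overkbT)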

Avoiding overflow in log(cosh(x))

My simulation needs to implement
np.log(np.cosh(x))
This overflows for large x, i.e. I'm getting the RuntimeWarning: overflow encountered in cosh warning. In principle, since the logarithm shrinks its argument, there is a range of x where cosh overflows but log(cosh(x)) is still representable.
Is there any solution for that in NumPy, for example similar in spirit to the np.log1p() function?
To provide more info: I am aware that a possible solution might be symbolic using SymPy
https://github.com/sympy/sympy/issues/12671
however the simulation should be fast, and symbolic calculation AFAIK might slow it down significantly.
The following implementation of log(cosh(x)) should be numerically stable:
import numpy as np
def logcosh(x):
    # s always has real part >= 0
    s = np.sign(x) * x
    p = np.exp(-2 * s)
    return s + np.log1p(p) - np.log(2)
Explanation:
For real values you could use the following identity:
log(cosh(x)) = logaddexp(x, -x) - log(2)
             = abs(x) + log1p(exp(-2 * abs(x))) - log(2)
which is numerically stable because the argument to exp is always non-positive. For complex numbers we instead require that the argument to exp has non-positive real part, which we achieve by using -x when real(x) > 0 and x otherwise.
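For real inputs, a quick numerical check (values chosen only for illustration):
x = np.array([1.0, 10.0, 1000.0])
print(logcosh(x))   # [  0.43378083   9.30685282 999.30685282]
# np.log(np.cosh(x)) would warn and return inf for the last entry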

np.exp overflow workaround

I have the following equation:
result = (A * np.exp(b * (t - t0))) / (1 + np.exp(c * (t - t0)))
I feed in an array of t values to get results out. A, b, c, and t0 are all constants (b and c are very large, t0 is small, but not as small as b and c are large). The problem is that I run into an overflow error because the exponentials quickly become far too large to fit in a float64 beyond a certain range of t. I'm trying to find a workaround that still maintains a decent level of precision. The result itself is well within the range of a float64; it is the overly large intermediate values of the np.exp calls that prevent me from getting as far as the result.
Some thoughts I had:
Scale down the t input to be able to get the desired range of values, and then de-scale the output so the result is correct
Convert the exponential to a log function
However, I'm not sure how to implement either of these ideas, or whether they would actually work.
Essentially, this problem boils down to result = np.exp(a) / np.exp(b), where a and b are in the range of 100-1000. np.exp(709) is about 8.2e307, right at the limit of a float64, but I have larger values that need to feed into it. While the ratio of the two exponentials is a reasonable value, the exponentials themselves are too large to be calculated.
Keeping everything in the log scale is the common solution to this sort of thing; at least that's what we do in statistics, where you're often down in the 1e-10000 range, especially at the start, before you're anywhere near convergence. For example, all the SciPy probability density functions have logpdf variants which work in the log scale.
I think your expression would be rewritten something like:
d = t - t0
log_result = np.log(A) + b * d - np.logaddexp(0, c * d)
(untested)
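To make that concrete, here is a runnable sketch; A, b, c, and t0 are placeholder values (assumptions, not from the question), and the final exp recovers the result only once it is guaranteed to be representable:
import numpy as np

A, b, c, t0 = 2.0, 400.0, 405.0, 0.1   # placeholder constants (b, c large)
t = np.linspace(0, 5, 11)

d = t - t0
log_result = np.log(A) + b * d - np.logaddexp(0, c * d)
result = np.exp(log_result)   # no overflow: log_result stays moderate here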

Why am I getting errors using the Lambda function?

I am messing around with the lambda function and I understand what I can do with it in a simple fashion, but when I try something more advanced I am running into errors and I don't see why.
Here is what I am trying; if you can tell me where I am going wrong, it would be appreciated.
import math
C = lambda n,k: math.factorial(n)/(math.factorial(k))(math.factorial(n-k))
print C(10,5)
I should note that I am running into errors trying to run the code on Codepad. I do not have access to IDLE.
Try this:
from __future__ import division
from math import factorial
C = lambda n, k : factorial(n) / factorial(k) * factorial(n-k)
print C(10,5)
> 3628800.0
You were missing a *, and the division should probably also handle decimals, so the old truncating / operator won't do. That's why I'm importing the new / operator, which performs true (floating-point) division.
UPDATE:
Well, after all it seems like it's Codepad's fault - it supports Python 2.5.1, and factorial was added in Python 2.6. Just implement your own factorial function and be done with it, or even better start using a real Python interpreter.
def factorial(n):
    fac = 1
    for i in xrange(1, n+1):
        fac *= i
    return fac
I think you're missing a * between the last two factorial terms. You're getting an error because you're trying to run (math.factorial(k))(math.factorial(n-k)), which turns into something like 120(math.factorial(n-k)), i.e. calling a number as if it were a function, which makes no sense.
Presumably the value you wish to compute is “n-choose-k”, the number of combinations of n things taken k at a time. The formula for that is n!/(k! * (n-k)!). When the missing * is added to your calculation, it produces n!/k! * (n-k)!, which equals (n!/k!)*(n-k)!. (Note, k! evenly divides n!.) For example, with n=10 and k=5, C(10,5) should be 3628800/(120*120) = 252, but your calculation would give 3628800/120*120 = 3628800, which is incorrect by a factor of 14400.
You can of course fix the parenthesization:
>>> C = lambda n,k: math.factorial(n)/(math.factorial(k)*math.factorial(n-k))
>>> C(10,5)
252
But note that if math.factorial(j) takes j-1 multiplications to calculate, then C(n,k) takes n-1+k-1+n-k-1+1 = 2*n-2 multiplications and one division. That's about four times as many multiply operations as necessary. The code shown below uses j multiplies and j divides, where j is the smaller of k and n-k, so j is at most n/2. On some machines division is much slower than multiplication, but on most machines j multiplies and j divides will run a lot faster than 2*n-2 multiplications and one division.
More importantly, C(n,k) is far smaller than n!. Computing via the n!/(k!*(n-k)!) formula requires more than 64-bit precision whenever n exceeds 20. For example, C(21,1) returns the value 21L. By contrast, the code below computes up to D(61,30)=232714176627630544 before requiring more than 64 bits to compute D(62,31)=465428353255261088L. (I named the function below “D” instead of “C” to avoid name clash.)
For small computations on big fast machines, the extra multiplies and extra precision requirements are unimportant. However, for big computations on small machines, they become important.
In short, the order of multiplications and divisions in D() keeps the maximum intermediate values that appear minimal. The largest values appear in the last pass of the for loop. Also note that in the for loop, i is always an exact divisor of c*j and no truncation occurs. This is a fairly standard algorithm for computing “n-choose-k”.
def D(n, k):
    c, j, k = 1, n, min(k, n-k)
    for i in range(1, k+1):
        c, j = c*j/i, j-1
    return c
Results from interpreter:
>>> D(10,5)
252
>>> D(61,30)
232714176627630544
>>> D(62,31)
465428353255261088L
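(As an aside: on Python 3.8 or newer, which Codepad's 2.5 long predates, the standard library computes this directly with arbitrary precision:)
from math import comb
print(comb(10, 5))   # 252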

mrdivide function in MATLAB: what is it doing, and how can I do it in Python?

I have this line of MATLAB code:
a/b
I am using these inputs:
a = [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9]
b = ones(25, 18)
This is the result (a 1x25 matrix):
[5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
What is MATLAB doing? I am trying to duplicate this behavior in Python, and the mrdivide documentation in MATLAB was unhelpful. Where does the 5 come from, and why are the rest of the values 0?
I have tried this with other inputs and receive similar results: usually just a different first element, with zeros filling the remainder of the matrix. In Python, when I use linalg.lstsq(b.T, a.T), all of the values in the first array returned (i.e. the solution, not the singular values) are 0.2. I have already tried right division in Python, and it gives something completely off with the wrong dimensions.
I understand what a least square approximation is, I just need to know what mrdivide is doing.
Related:
Array division- translating from MATLAB to Python
MRDIVIDE or the / operator actually solves the xb = a linear system, as opposed to MLDIVIDE or the \ operator which will solve the system bx = a.
To solve a system xb = a with a non-symmetric, non-invertible matrix b, you can either rely on mrdivide(), which works via factorization of b with Gaussian elimination, or pinv(), which works via singular value decomposition, zeroing the singular values below a (default) tolerance level.
Here is the difference (for the case of mldivide): What is the difference between PINV and MLDIVIDE when I solve A*x=b?
When the system is overdetermined, both algorithms provide the same answer. When the system is underdetermined, PINV will return the solution x that has the minimum norm (min NORM(x)). MLDIVIDE will pick the solution with the least number of non-zero elements.
In your example:
% solve xb = a
a = [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9];
b = ones(25, 18);
the system is underdetermined, and the two different solutions will be:
x1 = a/b; % MRDIVIDE: sparsest solution (min L0 norm)
x2 = a*pinv(b); % PINV: minimum norm solution (min L2)
>> x1 = a/b
Warning: Rank deficient, rank = 1, tol = 2.3551e-014.
x1 =
5.0000 0 0 ... 0
>> x2 = a*pinv(b)
x2 =
0.2 0.2 0.2 ... 0.2
In both cases the approximation error of xb-a is non-negligible (non-exact solution) and the same, i.e. norm(x1*b-a) and norm(x2*b-a) will return the same result.
What is MATLAB doing?
A great break-down of the algorithms (and checks on matrix properties) invoked by the '\' operator, depending upon the structure of matrix b, is given in this post on scicomp.stackexchange.com. I am assuming similar options apply for the / operator.
For your example, MATLAB is most probably doing Gaussian elimination, giving the sparsest solution amongst an infinitude of possible solutions (that's where the 5 comes from).
What is Python doing?
Python, in linalg.lstsq, uses the pseudo-inverse/SVD, as demonstrated above (that's why you get a vector of 0.2's). In effect, both of the following will give you the same result as MATLAB's pinv():
from numpy import *
a = array([1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9])
b = ones((25, 18))
# xb = a: solve b.T x.T = a.T instead
x2 = linalg.lstsq(b.T, a.T)[0]
x2 = dot(a, linalg.pinv(b))
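As a quick check of the earlier claim that both solutions leave the same residual norm (a sketch; x1 is typed in by hand to mirror MATLAB's output):
x1 = zeros(25)
x1[0] = 5.0                          # MATLAB's sparse mrdivide solution
x2 = dot(a, linalg.pinv(b))          # minimum-norm solution
print(linalg.norm(dot(x1, b) - a))   # ~10.95
print(linalg.norm(dot(x2, b) - a))   # ~10.95, identical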
TL;DR: A/B = np.linalg.solve(B.conj().T, A.conj().T).conj().T
I did not find that the earlier answers provided a satisfactory substitute, so I dug further into MATLAB's reference documentation for mrdivide and found the solution. I cannot explain the actual mathematics here or take credit for coming up with the answer; I'm just following MATLAB's explanation. Additionally, I wanted to post the actual detail from MATLAB to give credit. If it's a copyright issue, someone tell me and I'll remove the quoted text.
%/ Slash or right matrix divide.
% A/B is the matrix division of B into A, which is roughly the
% same as A*INV(B) , except it is computed in a different way.
% More precisely, A/B = (B'\A')'. See MLDIVIDE for details.
%
% C = MRDIVIDE(A,B) is called for the syntax 'A / B' when A or B is an
% object.
%
% See also MLDIVIDE, RDIVIDE, LDIVIDE.
% Copyright 1984-2005 The MathWorks, Inc.
Note that the ' symbol indicates the complex conjugate transpose. In python using numpy, that requires .conj().T chained together.
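Here is a sketch of that identity as a small helper (the function name mrdivide is mine, not NumPy's). Note that np.linalg.solve requires a square B; for rectangular systems like the one in this question you would fall back to lstsq, which returns the minimum-norm solution rather than MATLAB's sparse one, as discussed above:
import numpy as np

def mrdivide(A, B):
    # Solve x @ B = A (MATLAB's A/B) using the identity A/B = (B' \ A')'
    Bct, Act = B.conj().T, A.conj().T
    if B.shape[0] == B.shape[1]:
        x = np.linalg.solve(Bct, Act)                 # square B: exact solve
    else:
        x = np.linalg.lstsq(Bct, Act, rcond=None)[0]  # rectangular B: least squares
    return x.conj().T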
Per this handy "cheat sheet" of NumPy for MATLAB users, the equivalent is linalg.lstsq(b, a); here linalg is numpy.linalg, a light-weight version of the full scipy.linalg.
a/b finds the least-squares solution to the system of linear equations x*b = a.
If b is invertible, this is a*inv(b); if it isn't, it is the x which minimises norm(x*b - a).
You can read more about least squares on Wikipedia.
According to the MATLAB documentation, mrdivide will return at most k non-zero values, where k is the computed rank of b. My guess is that MATLAB, in your case, solves the least-squares problem obtained by replacing b with b(1,:) (which has the same rank). In that case the Moore-Penrose inverse b2 = b(1,:); inv(b2*b2')*b2*a' is defined and gives the same answer.
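A tiny NumPy illustration of that guess (just reproducing the arithmetic): the 5 is the least-squares scalar for a single row of ones, i.e. the mean of a:
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
b2 = np.ones(18)                # b(1,:) in MATLAB terms
print((b2 @ a) / (b2 @ b2))     # 5.0 == sum(a) / 18 == mean(a)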
