Avoid underflow using exp and minimum positive float128 in numpy - python

I am trying to calculate the following ratio:
w(i) / sum(w(j)), where the w are updated using an exponentially decreasing function, i.e. w(i) = w(i) * exp(-k), with k a positive parameter. All the numbers are non-negative.
This ratio is then used in a formula (multiplied by a constant and added to another constant). As expected, I soon run into underflow problems.
I guess this happens often, but can someone give me some references on how to deal with it? I did not find an appropriate transformation, so one thing I tried is to set some minimum positive number as a safety threshold, but I did not manage to find out what the minimum positive float is (I am representing numbers as numpy.float128). How can I actually get the minimum positive such number on my machine?
The code looks like this:
w = np.ones(n, dtype='float128')
lt = np.ones(n)
for t in range(T):
    p = (1-k) * w / w.sum() + (k/n)
    # Process a subset of the n elements, call it set I, j is some range()
    for i in I:
        s = p[list(j[i])].sum()
        lt /= s
        w[s] *= np.exp(-k * lt)
where k is some constant in (0,1) and n is the length of the array
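As for the direct question about the smallest positive float128: NumPy exposes each float type's limits through np.finfo, so you can query them on your machine. A minimal sketch (assuming np.float128 exists on your platform; it is not available everywhere, e.g. on Windows builds):
import numpy as np

info = np.finfo(np.float128)
print(info.tiny)  # smallest positive normalized float128 on this machine
print(info.eps)   # machine epsilon for float128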

When working with exponentially small numbers it's usually better to work in log space. For example, log(w*exp(-k)) = log(w) - k, which won't have any over/underflow problems unless k is itself exponentially large or w is zero. And, if w is zero, numpy will correctly return -inf. Then, when doing the sum, you factor out the largest term:
log_w = np.log(w) - k
max_log_w = np.max(log_w)
# Individual terms in the following may underflow, but then they wouldn't
# contribute to the sum anyways.
log_sum_w = max_log_w + np.log(np.sum(np.exp(log_w - max_log_w)))
log_ratio = log_w - log_sum_w
This probably isn't exactly what you want since you could just factor out the k completely (assuming it's a constant and not an array), but it should get you on your way.
Scikit-learn implements a similar thing with extmath.logsumexp, but it's basically the same as the above.
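SciPy also ships this as scipy.special.logsumexp, which applies the same factor-out-the-max trick internally. A short sketch of the log-space ratio using it (w and k as in the question; this is an addition, not part of the original answer):
import numpy as np
from scipy.special import logsumexp

log_w = np.log(w) - k                 # log of the updated weights
log_ratio = log_w - logsumexp(log_w)  # log(w_i / sum_j w_j), computed stably
ratio = np.exp(log_ratio)             # exponentiate only at the very end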

Related

Using Python to use Euler's formula into matrix approximation

I am trying to use Python and NumPy to evaluate Euler's formula e^(iπ) represented as a matrix, i.e. compute e^A where
A = [0 -π]
    [π  0]
and then apply the Maclaurin series for the exponential function,
e^x = SUM(n=0, infinity) x^n/n! = 1 + x + x^2/2! + x^3/3! + ...
So I am trying to compute the approximation matrix S^(N+1) and print the matrix and its four entries.
I have tried emulating Euler's and Maclaurin's series; I think the final approximation matrix will be reached around N = 20, but currently my values do not add up. I am also trying to use np.linalg.norm to compute a 2-norm as well.
import math
import numpy as np

n = 0
A = np.eye(2)
A = math.pi * np.rot90(A)
A[0,1] = -A[0,1]
A
mac_series = 0
while n < 120:
    print(n)
    n += 1
    mac_series = (A**n) / (math.factorial(n))
    print("\n", mac_series)
np.linalg.norm(mac_series)
The main problem here is that you are confusing A**3 with A@A@A.
Just look at case n=0.
A**0
# array([[1., 1.],
#        [1., 1.]])
I am pretty sure you were expecting A⁰ to be the identity (that is the only way this mapping x+iy ⇔ np.array([[x,-y],[y,x]]) makes sense).
In numpy, you have np.linalg.matrix_power for that (or you could just accumulate the powers yourself):
sum(np.linalg.matrix_power(A,i) / math.factorial(i) for i in range(20))
is
array([[-1.00000000e+00,  5.28918267e-10],
       [-5.28918267e-10, -1.00000000e+00]])
for example. That is almost certainly what you were expecting (it is the matrix that represents the real number -1 under the same mapping, and the whole point of Euler's identity is e^(iπ) = -1).
By comparison,
sum(A**i / math.factorial(i) for i in range(20))
returns
array([[ 1.        ,  0.04321392],
       [23.14069263,  1.        ]])
which is just the Maclaurin series computed element-wise for all four entries of the matrix. In other words, since your matrix is [[0,-π],[π,0]], you are evaluating the Maclaurin series of [[e⁰, exp(-π)], [exp(π), e⁰]]. And it works: e⁰ = 1, obviously; exp(π) is 23.140692632779267, so we got a very good approximation; and exp(-π) is its reciprocal, 0.04321391826377226, also well approximated.
So it works. It just does not do what you obviously intend: prove Euler's identity in matrix form, i.e. compute exp(iπ), not just exp(π).
Without matrix_power, and with code closer to your initial code, you could do
n = 0
mac_series = 0
Apowern = np.eye(2)  # A⁰ = Id for now
while n < 20:
    print(n)
    mac_series += Apowern / math.factorial(n)
    Apowern = Apowern @ A  # @ is the matrix multiplication operator
    n += 1
Note that I've also moved n += 1, which was misplaced in your code. You were accumulating Aⁿ⁺¹/(n+1)!, not Aⁿ/n! (in other words, your sum misses the A⁰/0! = Id term).
With this, I get the expected result
>>> mac_series
array([[-1.00000000e+00,  5.28918724e-10],
       [-5.28918724e-10, -1.00000000e+00]])
Last problem, more subtle: you may have noticed that I do only 20 iterations, not 120. That is because after about 20 you start to have a numerical problem: Apowern (or np.linalg.matrix_power(A, n); it is the same problem for both methods) becomes too big. Since it is divided by n! in the sum, that doesn't prevent convergence mathematically, but it does prevent numerical convergence. And, in practice, after a while, numpy changes the type of Apowern.
So we should not have a big matrix divided by a big number; we should instead iterate quantities that stay small enough. Like this, for example:
n = 0
mac_series = 0
NthTerm = np.eye(2)  # Aⁿ/n!; A⁰/0! = Id for now
while n < 120:  # 120 is no longer a problem
    print(n)
    mac_series += NthTerm
    n += 1
    NthTerm = (NthTerm @ A) / n  # so if NthTerm was Aⁿ/n!, it now
    # becomes Aⁿ/n! @ A/(n+1) = Aⁿ⁺¹/(n+1)!
Result
>>> mac_series
array([[-1.00000000e+00, -2.34844612e-16],
       [ 2.34844612e-16, -1.00000000e+00]])
tl;dr
You have 4 problems:
1. The one already mentioned by Roy: you are not accumulating the Aⁿ/n! terms, just replacing them, and eventually keeping only the last one. In other words, you need += instead of =.
2. A**n is not Aⁿ. It is just A with all of its elements raised to the power n. Said otherwise, [[x,-y],[y,x]]**n is not [[x,-y],[y,x]]ⁿ; it is [[xⁿ,(-y)ⁿ],[yⁿ,xⁿ]]. So you end up computing [[e⁰, exp(-π)], [exp(π), e⁰]] ≈ [[1, 0.0432], [23.14, 1]], which is irrelevant here.
3. n += 1 is misplaced.
4. The numerical problem of Aⁿ becoming huge. Even though you intend to divide it by an even huger n!, so it poses no problem mathematically, it does numerically, since the intermediate result is too big for the computer.
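As an independent sanity check (an addition here, assuming SciPy is available), scipy.linalg.expm computes the matrix exponential directly, so you can compare it against the truncated series:
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, -np.pi],
              [np.pi, 0.0]])
print(expm(A))  # close to [[-1, 0], [0, -1]], the matrix form of e^(iπ) = -1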

How to determine a proportionately decreasing weight given a list of N elements?

I have a list composed of N elements. For context, I am making a time series forecast, and, once the forecasts have been made, I would like to weight the forecasts made at the beginning as more important than the later forecasts. This is useful because when I calculate performance error scores (MAPE), the score will then reflect both the forecasts per item and the way I want to distinguish good models from bad ones.
How should I update my existing function in order to take any list of elements (N) in order to generate these steadily decreasing weights?
Here is the function that I have come up with on my own. It works for examples like compute_equal_perc(5), but not for other combinations...
def compute_equal_perc(rng):
    perc_allocation = []
    equal_perc = 1 / rng
    half_rng = rng / 2
    step_val = equal_perc / (rng - 1)
    print(step_val)
    for x in [v for v in range(0, rng)]:
        if x == int(half_rng):
            perc_allocation.append(equal_perc)
        elif x < int(half_rng):
            diff_plus = ((abs(int(half_rng) - x) * step_val)) + equal_perc
            perc_allocation.append(round(float(diff_plus), 3))
        elif x >= int(half_rng):
            diff_minus = equal_perc - ((abs(int(half_rng) - x) * step_val))
            perc_allocation.append(round(float(diff_minus), 3))
    return perc_allocation
For compute_equal_perc(5), the output that I get is:
[0.3, 0.25, 0.2, 0.15, 0.1]
The sum of this sequence should always equal 1, and the increments between values should always be equal.
This can be solved through the application of basic algebra. An arithmetic sequence is defined as
A[i] = a + b*i, for i = 0, 1, 2, 3, ... where a is the initial term
The sum of a sequence of elements 0 through n is
S = (A[0] + A[n]) * (n+1) / 2
in words, the sum of the first and last terms, times half the number of terms.
Since you know S and n, you need only decide one more "spread" factor to generate your sequence. The mean element must be 1/n -- this is where your algorithm is wrong, as it fumbles this computation for even values of n.
Your code fails in this coupling of statements:
half_rng = rng / 2
step_val = equal_perc / (rng - 1)
# comparing x to int(half_rng)
If rng is even, you assign the mean value to position rng/2, giving you something such as the list for 4 elements:
[0.417, 0.333, 0.25, 0.167]
This means that you have two elements larger than the desired mean, and only one smaller, forcing the sum over 1.0. Instead, when you have an even quantity of elements, you have to make the mean a "phantom" middle element, and take half-steps around it. Let's look at this with fractions: you already have
[5/12, 4/12, 3/12, 2/12]
Your common difference is 1/12, i.e. 1 / (n * (n-1)), and you need to shift these values lower by half a step: with the spread you've chosen (1/12), start half a step to the side by subtracting 1/24 from each element.
[9/24, 7/24, 5/24, 3/24]
You could also change your step with a simple linear factor. Decide on the ratio you want for your elements in simple integers, such as 5:4:3:2, and then generate your weights from the obvious sum of 5+4+3+2:
[5/14, 4/14, 3/14, 2/14]
Note that this works with any arithmetic sequence of integers, another way of choosing your "spread". If you use 4:3:2:1 you get
[4/10, 3/10, 2/10, 1/10]
or you can cluster them more closely with, say, 13:12:11:10
[13/46, 12/46, 11/46, 10/46]
So ... pick the spread you want and simplify your code to take advantage of that.
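To make that concrete, here is a small sketch (an addition, not part of the answer) that uses the integer-ratio spread n : n-1 : ... : 1 and normalizes by its sum, so the weights always add to 1 with equal increments, for odd and even n alike:
def decreasing_weights(n):
    # ratios n : n-1 : ... : 1, normalized by their sum n*(n+1)/2
    total = n * (n + 1) / 2
    return [i / total for i in range(n, 0, -1)]

print(decreasing_weights(5))       # [0.333..., 0.266..., 0.2, 0.133..., 0.066...]
print(sum(decreasing_weights(4)))  # 1.0; even lengths work too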

given x, compute minimum k for which x * 10^k is integer

I have small (Python) command line tool to which the user provides a float, e.g.,
foobar 3.141
and I'd like to compute the granularity of the number, i.e., find the minimum k for which x * 10^k is an integer (up to a given tolerance). Note that the number may be represented in different ways (3141.0e-3 etc.), so counting the number of characters after . isn't necessarily going to cut it.
This
def get_granularity(x):
    for k in range(100):
        if abs(x - int(x)) < 1.0e-15 * x:
            break
        x *= 10
    return k
works, but I don't like it too much since it makes an assumption about the maximum k (99 in this case). One could invite log10 to the party, but perhaps there's a more elegant solution.
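One possible alternative (a suggestion, assuming the shortest decimal representation produced by repr is an acceptable reading of the input): the decimal module can read the exponent off directly, with no hard-coded bound on k:
from decimal import Decimal

def get_granularity(x):
    # normalize() strips trailing zeros, so 100.0 -> 1E+2 and 3.141 stays 3.141
    exponent = Decimal(repr(x)).normalize().as_tuple().exponent
    return max(0, -exponent)

print(get_granularity(3.141))      # 3
print(get_granularity(3141.0e-3))  # 3
print(get_granularity(100.0))      # 0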

Normalize Small Probabilities in Python

I have a list of probabilities, which I need to normalize to equal 1.0.
e.g. probs = [0.01,0.03,0.005]
I realize that this is done by dividing each probability by the sum of probs. However, if the probabilities become really small, Python will tell me that sum(probs)=0.0. I understand that this is an underflow issue. I suppose I should use the log of each probability. How would I do this?
The sum of even very small positive floating point values will never truly be 0; it may be close to zero, but it can never be exactly zero.
Just divide 1 by their sum, and multiply the probabilities by that factor:
def normalize(probs):
    prob_factor = 1 / sum(probs)
    return [prob_factor * p for p in probs]
Some probabilities may make up only a very small fraction of the total sum, of course, and that fraction may approach zero. But this just means that when normalizing you may end up with normalized probabilities that are either very close to zero, or, if smaller than the smallest representable floating point value, equal to zero. The latter only happens if some probabilities in the list are so much smaller than the others that they no longer represent anything that will ever realistically occur.
Demo:
>>> def normalize(probs):
...     prob_factor = 1 / sum(probs)
...     return [prob_factor * p for p in probs]
...
>>> normalize([0.0000000001,0.000000000003,0.000000000000005])
[0.9708266589000533, 0.029124799767001597, 4.854133294500266e-05]
And the extreme case:
>>> import sys
>>> normalize([sys.float_info.max, sys.float_info.min])
[0.9999999999999999, 0.0]
>>> normalize([sys.float_info.max, sys.float_info.min])[-1] == 0
True
You can always use a scale factor to avoid the underflow problem, either manually entered or automatically calculated, e.g.:
import math

no_z = [x for x in probs if x > 0.0]
if len(no_z) == 0:
    print("Unable to calculate with 0.0 as all the probabilities")
order = int(-math.log10(min(no_z)))
if order < 0:  # only scale up small values, never scale down
    order = 0
sf = 10**order
scaled = [x * sf for x in probs]
tot = sum(scaled)
norm = [x / tot for x in scaled]
Of course you would probably be better off just using bigfloat or numpy and doing high precision maths.
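If you do want to work with the logs of the probabilities, as the question suggests, the standard trick is to subtract the maximum log before exponentiating so that nothing underflows to zero; a minimal sketch (an addition, not from either answer above):
import math

def normalize_from_logs(log_probs):
    # log_probs are (natural) logs of possibly tiny unnormalized probabilities
    m = max(log_probs)
    shifted = [math.exp(lp - m) for lp in log_probs]  # largest term becomes 1.0
    total = sum(shifted)
    return [s / total for s in shifted]

print(normalize_from_logs([math.log(p) for p in [0.01, 0.03, 0.005]]))
# [0.222..., 0.666..., 0.111...]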

Generate "random" matrix of certain rank over a fixed set of elements

I'd like to generate matrices of size mxn and rank r, with elements coming from a specified finite set, e.g. {0,1} or {1,2,3,4,5}. I want them to be "random" in some very loose sense of that word, i.e. I want to get a variety of possible outputs from the algorithm with distribution vaguely similar to the distribution of all matrices over that set of elements with the specified rank.
In fact, I don't actually care that it has rank r, just that it's close to a matrix of rank r (measured by the Frobenius norm).
When the set at hand is the reals, I've been doing the following, which is perfectly adequate for my needs: generate matrices U of size mxr and V of nxr, with elements independently sampled from e.g. Normal(0, 2). Then U V' is an mxn matrix of rank r (well, <= r, but I think it's r with high probability).
If I just do that and then round to binary / 1-5, though, the rank increases.
It's also possible to get a lower-rank approximation to a matrix by doing an SVD and taking the first r singular values. Those values, though, won't lie in the desired set, and rounding them will again increase the rank.
This question is related, but the accepted answer isn't "random," and the other answer suggests SVD, which doesn't work here, as noted.
One possibility I've thought of is to make r linearly independent row or column vectors from the set and then get the rest of the matrix by linear combinations of those. I'm not really clear, though, either on how to get "random" linearly independent vectors, or how to combine them in a quasirandom way after that.
(Not that it's super-relevant, but I'm doing this in numpy.)
Update: I've tried the approach suggested by EMS in the comments, with this simple implementation:
real = np.dot(np.random.normal(0, 1, (10, 3)), np.random.normal(0, 1, (3, 10)))
bin = (real > .5).astype(int)
rank = np.linalg.matrix_rank(bin)
niter = 0
while rank > des_rank:
    cand_changes = np.zeros((21, 5))
    for n in range(20):
        i, j = random.randrange(5), random.randrange(5)
        v = 1 - bin[i,j]
        x = bin.copy()
        x[i, j] = v
        x_rank = np.linalg.matrix_rank(x)
        cand_changes[n,:] = (i, j, v, x_rank, max((rank + 1e-4) - x_rank, 0))
    cand_changes[-1,:] = (0, 0, bin[0,0], rank, 1e-4)
    cdf = np.cumsum(cand_changes[:,-1])
    cdf /= cdf[-1]
    i, j, v, rank, score = cand_changes[np.searchsorted(cdf, random.random()), :]
    bin[i, j] = v
    niter += 1
    if niter % 1000 == 0:
        print(niter, rank)
It works quickly for small matrices but falls apart for e.g. 10x10 -- it seems to get stuck at rank 6 or 7, at least for hundreds of thousands of iterations.
It seems like this might work better with a better (ie less-flat) objective function, but I don't know what that would be.
I've also tried a simple rejection method for building up the matrix:
def fill_matrix(m, n, r, vals):
    assert m >= r and n >= r
    trans = False
    if m > n:  # more columns than rows I think is better
        m, n = n, m
        trans = True
    get_vec = lambda: np.array([random.choice(vals) for i in range(n)])
    vecs = []
    n_rejects = 0
    # fill in r linearly independent rows
    while len(vecs) < r:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            vecs.append(v)
        else:
            n_rejects += 1
    print("have {} independent ({} rejects)".format(r, n_rejects))
    # fill in the rest of the dependent rows
    while len(vecs) < m:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            n_rejects += 1
            if n_rejects % 1000 == 0:
                print(n_rejects)
        else:
            vecs.append(v)
    print("done ({} total rejects)".format(n_rejects))
    m = np.vstack(vecs)
    return m.T if trans else m
This works okay for e.g. 10x10 binary matrices with any rank, but not for 0-4 matrices or much larger binaries with lower rank. (For example, getting a 20x20 binary matrix of rank 15 took me 42,000 rejections; with 20x20 of rank 10, it took 1.2 million.)
This is clearly because the space spanned by the first r rows is too small a portion of the space I'm sampling from, e.g. {0,1}^10, in these cases.
We want the intersection of the span of the first r rows with the set of valid values.
So we could try sampling from the span and looking for valid values, but since the span involves real-valued coefficients that's never going to find us valid vectors (even if we normalize so that e.g. the first component is in the valid set).
Maybe this can be formulated as an integer programming problem, or something?
My friend Daniel Johnson, who commented above, came up with an idea, but I see he never posted it. It's not very fleshed out, but you might be able to adapt it.
If A is m-by-r and B is r-by-n and both have rank r then AB has rank r. Now, we just have to pick A and B such that AB has values only in the given set. The simplest case is S = {0,1,2,...,j}.
One choice would be to make A binary with appropriate row/col sums that guaranteed the correct rank and B with column sums adding to no more than j (so that each term in the product is in S) and row sums picked to cause rank r (or at least encourage it as rejection can be used).
I just think that we can come up with two independent sampling schemes on A and B that are less complicated and quicker than trying to attack the whole matrix at once. Unfortunately, all my matrix sampling code is on the other computer. I know it generalized easily to allowing entries in a bigger set than {0,1} (i.e. S), but I can't remember how the computation scaled with m*n.
I am not sure how useful this solution will be, but you can construct a matrix that will allow you to search for the solution on another matrix with only 0 and 1 as entries. If you search randomly on the binary matrix, it is equivalent to randomly modifying the elements of the final matrix, but it is possible to come up with some rules to do better than a random search.
If you want to generate an m-by-n matrix over the element set E with elements ei, 0 <= i < k, you start off with the m-by-(k*m) matrix A whose i-th row contains the elements e0, e1, ..., e(k-1) in its own block of k columns (columns i*k through (i+1)*k - 1) and zeros everywhere else.
Clearly, this matrix has rank m. Now, you can construct another matrix, B, that has 1s at certain locations to pick the elements from the set E. B is a vertical stack of blocks B0, B1, ..., B(m-1), one per row of A.
Each Bi is a k-by-n matrix. So the size of AB is m-by-n and rank(AB) is min(m, rank(B)). If we want the output matrix to have only elements from our set E, then each column of each Bi has to have exactly one element set to 1, and the rest set to 0.
If you want to search for a certain rank on B randomly, you need to start off with a valid B with maximal rank and rotate a random column j of a random Bi by a random amount. This is equivalent to changing row i, column j of A*B to a random element from our set, so it is not a very useful method.
However, you can do certain tricks with the matrices. For example, if k is 2, and there are no overlaps on first rows of B0 and B1, you can generate a linearly dependent row by adding the first rows of these two sub-matrices. The second row will also be linearly dependent on rows of these two matrices. I am not sure if this will easily generalize to k larger than 2, but I am sure there will be other tricks you can employ.
For example, one simple method to generate rank at most k (when m is k+1) is to get a random valid B0, keep rotating all rows of this matrix upwards to obtain B1 through Bm-2, set the first row of Bm-1 to all 1s, and its remaining rows to all 0s. The rank cannot be less than k (assuming n > k), because the columns of B0 have exactly 1 nonzero element. The remaining rows of the matrices are all linear combinations (in fact exact copies, for almost all submatrices) of these rows. The first row of the last submatrix is the sum of all rows of the first submatrix, and its remaining rows are all zeros. For larger values of m, you can use permutations of the rows of B0 instead of simple rotation.
Once you generate one matrix that satisfies the rank constraint, you may get away with randomly shuffling the rows and columns of it to generate others.
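For concreteness, here is a rough numpy sketch of that A/B construction (an illustration of the idea above; the helper name construct_AB is made up, and the random one-1-per-column choice gives a B of full rank with high probability rather than a specific target rank):
import numpy as np

def construct_AB(m, n, E, rng=None):
    # A: m-by-(k*m), row i holds the elements of E in its own block of k columns
    # B: (k*m)-by-n, stacked blocks B0..B(m-1), each with exactly one 1 per column
    rng = np.random.default_rng() if rng is None else rng
    E = np.asarray(E)
    k = len(E)
    A = np.zeros((m, k * m))
    for i in range(m):
        A[i, i * k:(i + 1) * k] = E
    B = np.zeros((k * m, n))
    for i in range(m):
        picks = i * k + rng.integers(0, k, size=n)  # which element each column selects
        B[picks, np.arange(n)] = 1
    return A, B

A, B = construct_AB(4, 6, [0, 1, 2])
M = A @ B                        # every entry of M is drawn from E
print(np.linalg.matrix_rank(M))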
How about like this?
import numpy as np
from sklearn.decomposition import NMF

rank = 30
n1 = 100; n2 = 100
model = NMF(n_components=rank, init='random', random_state=0)
U = model.fit_transform(np.random.randint(1, 5, size=(n1, n2)))
V = model.components_
M = np.around(U) @ np.around(V)
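A quick follow-up check you might add (an addition, not part of the answer): rounding U and V keeps the product low-rank, but it does not guarantee that the entries of M stay inside the original element set, so it is worth inspecting both:
print(np.linalg.matrix_rank(M))  # at most rank
print(np.unique(M))              # which values actually appear in M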
