Additional information on numpy.einsum() - python

I am trying to understand numpy.einsum() function but the documentation as well as this answer from stackoverflow still leave me with some questions.
Let's take the Einstein sum and the matrices defined in the answer.
A = np.array([0, 1, 2])
B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
np.einsum('i,ij->i', A, B)
So, based on my understanding of the Einstein sum, I would translate this function to the notation (A_i*B_ij), so I would obtain:
j = 1 : A_1*B_11 + A_2*B_21 + A_3*B_31
j = 2 : A_1*B_12 + A_2*B_22+ A_3*B_32
and so on until j = 4. This gives
j = 1 : 0 + 4 + 16
j = 2 : 0 + 5 + 18
which would be the Einstein summation according to my understanding. Instead, the function does not perform the overall sum but stores the separate terms in a matrix where we can spot the results of the (A_i * B_ij):
 0  0  0  0
 4  5  6  7
16 18 20 22
How is this actually controlled by the function? I feel this is controlled by the output labels, as mentioned in the documentation:
The output can be controlled by specifying output subscript labels as
well. This specifies the label order, and allows summing to be
disallowed or forced when desired
so somehow I assume that putting ->i disables summing of the inner sums. But how does this work exactly? It is not clear to me. Putting ->j gives the actual Einstein sum as expected.

It seems your understanding of the Einstein summation is not correct. The subscript operations you've written out have the multiplication correct, but the summation is over the wrong axis.
Think about what this means: np.einsum('i,ij->i', A, B).
A has shape (i,) and B has shape (i, j).
Multiply every column of B by A.
Sum over the second axis of B, i.e., over the axis labeled j.
This gives an output of shape (i,) == (3,), while your subscript notation gives an output of shape (j,) == (4,). You're summing over the wrong axis.
More detail:
Remember that the multiplication always happens first. The left-hand subscripts tell the np.einsum function which rows/columns/etc of the input arrays are to be multiplied with one another. The output of this step always has the same shape as the highest-dimensional input array. I.e., at this point, a hypothetical "intermediate" array has shape (3, 4) == B.shape.
After multiplication, there is summation. This is controlled by which subscripts are omitted from the right-hand side. In this case, j is omitted, which means to sum along axis 1 of the array (the axis labeled j). Your version sums along axis 0.
If you instead wrote: np.einsum('i,ij->ij', A, B), there would be no summation, as no subscripts are omitted. Thus you'd get the array you've got at the end of your question.
Here are a couple of examples:
Ex 1:
No omitted subscripts, so no summation. Just multiply columns of B by A. This is the last array you've written out.
>>> (np.einsum('i,ij->ij', A, B) == (A[:, None] * B)).all()
True
Ex 2:
Same as the example. Multiply columns, then sum across the output's columns.
>>> (np.einsum('i,ij->i', A, B) == (A[:, None] * B).sum(axis=-1)).all()
True
Ex 3:
The sum as you've written it above. Multiply columns, then sum across the output's rows.
>>> (np.einsum('i,ij->j', A, B) == (A[:, None] * B).sum(axis=0)).all()
True
Ex 4:
Note that we can omit all axes at the end, to just get the total sum across the whole array.
>>> np.einsum('i,ij->', A, B)
98
Ex 5:
Note that the summation really happens because we repeated the input label 'i'. If we instead use different labels for each axis of the input arrays, we can compute things similar to Kronecker products:
>>> np.einsum('i,jk', A, B).shape
(3, 3, 4)
EDIT
The NumPy implementation of the Einstein sum differs a bit from the traditional definition. Technically, the Einstein sum doesn't have the idea of "output labels". Those are always implied by the repeated input labels.
From the docs: "Whenever a label is repeated, it is summed." So, traditionally, we'd write something like np.einsum('i,ij', A, B). This is equivalent to np.einsum('i,ij->j', A, B). The i is repeated, so it is summed, leaving only the axis labeled j. You can think about the sum in which we specify no output labels as being the same as specifying only the labels that are not repeated in the input. That is, the label 'i,ij' is the same as 'i,ij->j'.
The output labels are an extension or augmentation implemented in NumPy, which allow the caller to force summation or to enforce no summation on an axis. From the docs: "The output can be controlled by specifying output subscript labels as well. This specifies the label order, and allows summing to be disallowed or forced when desired."
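To make the implicit/explicit correspondence concrete, here is a small check using the arrays from the question (the equality follows directly from the rules quoted above):

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)

# Implicit form: the repeated label i is summed, leaving only axis j.
implicit = np.einsum('i,ij', A, B)
explicit = np.einsum('i,ij->j', A, B)

assert (implicit == explicit).all()
# Both equal multiply-then-sum over axis 0 (the axis labeled i).
assert (explicit == (A[:, None] * B).sum(axis=0)).all()
```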


Strange numpy divide behaviour for scalars

I have been trying to upgrade a library which has a bunch of geometric operations for scalars so that they will work with numpy arrays as well. While doing this I noticed some strange behaviour with numpy divide.
The original code checks a normalised difference between two variables if neither variable is zero. Swapping across to numpy, this ended up looking something like:
import numpy as np
a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 3, 0, 4])
o = np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(a != 0, b != 0))
print(f'First implementation: {o}')
where I passed in an output buffer initialised to zero for the instances which could not be calculated; this returns:
First implementation: [ 0. -0.5 -0.33333333 0. 0. ]
I had to slightly modify this for scalars, as out requires an array, but it seemed fine.
a = 0
b = 4
o = None if np.isscalar(a) else np.zeros(len(a))
o = np.divide(np.subtract(a, b), b, out=o, where=np.logical_and(b != 0, a != 0))
print(f'Modified out for scalar: {o}')
returns
Modified out for scalar: 0.0.
Then I ran this through some test functions and found a lot of them failed. Digging into this, I found that the first time I call divide on a scalar with where set to False the function returns zero, but if I call it again, the second time it returns something unpredictable.
a = 0
b = 4
print(f'First divide: {np.divide(b, a, where=False)}')
print(f'Second divide: {np.divide(b, a, where=False)}')
returns
First divide: 0.0
Second divide: 4.0
Looking at the documentation, it says "locations within it where the condition is False will remain uninitialized", so I guess numpy has some internal buffer which is initially set to zero and subsequently ends up carrying over an earlier intermediate value.
I am struggling to see how I can use divide with or without a where clause: if I use where I get unpredictable output, and if I don't I can't protect against divide by zero. Am I missing something, or do I just need to have a different code path in these cases? I realise I'm halfway to a different code path already with the out variable.
I would be grateful for any advice.
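A minimal illustration of the fix implied by the documentation quote above (my example, not from the original post): if np.divide is always handed a pre-initialized out array, the entries where the condition is False keep the value already stored there, so repeated calls stay deterministic.

```python
import numpy as np

b = 4.0

# Without out=, the skipped entries are uninitialized memory.
# With a pre-filled out=, they keep the fill value on every call.
o1 = np.zeros(())
o2 = np.zeros(())
np.divide(b, 0.0, out=o1, where=False)
np.divide(b, 0.0, out=o2, where=False)
print(o1, o2)  # both stay 0.0
```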
It looks like a bug to me. But I think you'd want to short-circuit the calls to ufuncs in the case of scalars for performance reasons anyway, so it's a question of trying to keep it from being too messy. Since either a or b could be a scalar, you need to check them both. Put that check into a function that conditionally returns an output array or None, and you could do
def scalar_test_np_zeros(a, b):
    """Return np.zeros for the length of the arguments unless both
    arguments are scalar, then None."""
    if (a_is := np.isscalar(a)) and np.isscalar(b):
        return None
    else:
        return np.zeros(len(b) if a_is else len(a))
a = 0
b = 4
if (o := scalar_test_np_zeros(a, b)) is None:
    o = (a - b) / b if a and b else 0.0
else:
    np.divide(np.subtract(a, b), b, out=o,
              where=np.logical_and(b != 0, a != 0))
The scalar test would be useful in other code with similar problems.
For what it's worth, if it helps anyone: I have come to the conclusion that I need to wrap np.divide to use it safely in functions which can take both arrays and scalars. This is my wrapping function:
import numpy as np

def divide_where(a, b, where, out=None, fill=0):
    """Wraps numpy divide to safely handle the where clause for both arrays and scalars.
    - a: dividend array or scalar
    - b: divisor array or scalar
    - where: locations where a/b will be computed
    - out: location the result is written to; if None, an output array is
      created, initialised with the fill value
    - fill: the fill value; if both inputs are scalar and where is False,
      fill is returned directly
    """
    if (a_is_scalar := np.isscalar(a)) and np.isscalar(b):
        return fill if not where else a / b
    if out is None:
        # float dtype so integer inputs don't make np.divide reject the buffer
        out = np.full_like(b if a_is_scalar else a, fill, dtype=float)
    return np.divide(a, b, out=out, where=where)
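For the array path, the guarded pattern can be sanity-checked against an explicit Python loop (a standalone sketch using the arrays from the question, independent of the wrapper above):

```python
import numpy as np

a = np.array([0., 1., 2., 3., 4.])
b = np.array([1., 2., 3., 0., 4.])
mask = (a != 0) & (b != 0)

out = np.zeros_like(a)  # pre-initialized so masked-out slots are well defined
np.divide(a - b, b, out=out, where=mask)

# Compare against an explicit element-by-element loop.
expected = [(x - y) / y if x != 0 and y != 0 else 0.0 for x, y in zip(a, b)]
assert np.allclose(out, expected)  # matches the "First implementation" output above
```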

Python: general sum over numpy rows

I want to sum all the rows of a matrix; hence, if I have an n x 2 matrix, the result should be a 1 x 2 vector with all rows summed. I can do something like that with np.sum(arg, axis=1), but I get an error if I supply a vector as the argument. Is there any more general sum function which doesn't throw an error when a vector is supplied? Note: this was never a problem in MATLAB.
Background: I wrote a function which calculates some stuff and sums over all rows of the matrix. Depending on the number of inputs, the matrix has a different number of rows and the number of rows is >= 1
According to numpy.sum documentation, you cannot specify axis=1 for vectors as you would get a numpy AxisError saying axis 1 is out of bounds for array of dimension 1.
A possible workaround could be, for example, writing a dedicated function that checks the size before performing the sum. Please find below a possible implementation:
import numpy as np

M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])

def sum_over_columns(input_arr):
    if len(input_arr.shape) > 1:
        return input_arr.sum(axis=1)
    return input_arr.sum()

print(sum_over_columns(M))
print(sum_over_columns(v))
In a more pythonic way (not necessarily more readable):
def oneliner_sum(input_arr):
    return input_arr.sum(axis=(1 if len(input_arr.shape) > 1 else None))
You can do
np.sum(np.atleast_2d(x), axis=1)
This will first convert vectors to singleton-dimensional 2D matrices if necessary.
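One more option worth noting (my addition, not from the answers above): summing over the last axis with axis=-1 is valid for both 1-D and 2-D inputs, giving per-row sums for a matrix and the total for a vector, with no branching or reshaping.

```python
import numpy as np

M = np.array([[1, 4],
              [2, 3]])
v = np.array([1, 4])

print(M.sum(axis=-1))  # [5 5]
print(v.sum(axis=-1))  # 5
```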

efficient way to calculate all possible multiplication elements inside a vector in numpy

Is there any efficient way of doing the following:
Assume I have a vector A of length n, I want to calculate a second vector B, where
B[i] = A[0] * A[1] * .. *A[i-1] * A[i+1] *..*A[n-1]
i.e., B[i] is the multiplication of all elements in A except for the i-th element.
Initially, I thought of doing something like:
C = np.prod(A)
B = C/A
But then I have a problem when an element of A is equal to zero. Of course, I can find out whether there is exactly one zero, set B to the all-zero vector except at that position, where I put the product of the rest of A, and zero out B completely in the case of more than one zero. But this becomes a little cumbersome when I want to do that operation for every row of a matrix and not just for a single vector.
Of course, I can do it in a loop but I was wondering if there is a more efficient way?
You could slice up to (but not including) i, then from i+1 on. Concatenate those slices together and multiply; this gives a single B[i].
np.prod(np.concatenate([a[:i], a[i+1:]]))
A possible one liner using np.eye, np.tile and np.prod:
np.prod(np.tile(A, (A.size, 1))[(1 - np.eye(A.size)).astype(bool)].reshape(A.size, -1), axis=1)
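If the O(n²) memory of the tile-based one-liner matters, a cumulative-product sketch (my addition, not from the answers above) computes every B[i] in O(n) and handles zeros without any special-casing:

```python
import numpy as np

def prod_except_self(A):
    """B[i] = product of all entries of A except A[i], via prefix/suffix products."""
    left = np.concatenate(([1], np.cumprod(A[:-1])))            # left[i] = prod(A[:i])
    right = np.concatenate((np.cumprod(A[::-1])[-2::-1], [1]))  # right[i] = prod(A[i+1:])
    return left * right

assert (prod_except_self(np.array([1, 2, 3, 4])) == [24, 12, 8, 6]).all()
assert (prod_except_self(np.array([0, 1, 2])) == [2, 0, 0]).all()
```

Applied row-wise (via np.cumprod(..., axis=1)), the same idea extends to the per-row case in the question.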

Numpy: function that creates block matrices

Say I have a dimension k. What I'm looking for is a function that takes k as an input and returns the following block matrix.
Let I be a k-dimensional identity matrix and 0 be a k-dimensional square matrix of zeros.
That is:
def function(k):
    ...
    return matrix

function(2) -> np.array([I, 0])
function(3) -> np.array([[I, 0, 0],
                         [0, I, 0]])
function(4) -> np.array([[I, 0, 0, 0],
                         [0, I, 0, 0],
                         [0, 0, I, 0]])
function(5) -> np.array([[I, 0, 0, 0, 0],
                         [0, I, 0, 0, 0],
                         [0, 0, I, 0, 0],
                         [0, 0, 0, I, 0]])
That is, the output is a (k-1,k) matrix where identity matrices are on the diagonal elements and zero matrices elsewhere.
What I've tried:
I know how to create any individual row, I just can't think of a way to put it into a function so that it takes a dimension, k, and spits out the matrix I need.
e.g.
np.block([[np.eye(3), np.zeros((3, 3)), np.zeros((3, 3))],
          [np.zeros((3, 3)), np.eye(3), np.zeros((3, 3))]])
Would be the desired output for k=3
scipy.linalg.block_diag seems like it might be on the right track...
IMO, np.eye already has everything you need, as you can specify the number of rows and columns separately.
So your function should simply look like
def fct(k):
    return np.eye(k**2 - k, k**2)
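A quick check that this matches the explicit np.block construction from the question, for k = 3 (the fct definition is repeated here so the snippet is self-contained):

```python
import numpy as np

def fct(k):
    return np.eye(k**2 - k, k**2)

k = 3
I, Z = np.eye(k), np.zeros((k, k))
expected = np.block([[I, Z, Z],
                     [Z, I, Z]])
assert (fct(k) == expected).all()
print(fct(k).shape)  # (6, 9)
```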
If I understand you correctly, this should work:
a = np.concatenate((np.eye((k-1)*k), np.zeros([(k-1)*k, k])), axis=1)
(at least, when I set k=3 and compare with the np.block(...) expression you gave, both results are identical)
IIUC, you can also try np.fill_diagonal such that you create the right shape of matrices and then fill in the diagonal parts.
def make_block(k):
    arr = np.zeros(((k-1)*k, k*k))
    np.fill_diagonal(arr, 1)
    return arr
There are two interpretations to your question. One is where you are basically creating a matrix of the form [[1, 0, 0], [0, 1, 0]], which can be mathematically represented as [I 0], and another where each element contains its own numpy array entirely (which does reduce computational ability but might be what you want).
The former:
np.append(np.eye(k-1), np.zeros((k-1, 1)), axis=1)
The latter (a bit more complicated):
I = np.eye(m)  # whatever dimensions you want, although probably m == n
Z = np.zeros((n, n))
arr = np.empty((k-1, k), dtype=object)  # object dtype so each cell can hold an array
for i in range(k-1):
    for j in range(k):
        if i == j:
            arr[i, j] = np.array(I)
        else:
            arr[i, j] = np.array(Z)
I really have no idea how the second one would be useful, so I think you might be a bit confused about the fundamental structure of a block matrix if that's what you think you want. [A b], for example, with A being a matrix and b being a vector, is generally thought of as representing a single matrix, with block notation just existing for simplicity's sake. Hope this helps!

Generate "random" matrix of certain rank over a fixed set of elements

I'd like to generate matrices of size mxn and rank r, with elements coming from a specified finite set, e.g. {0,1} or {1,2,3,4,5}. I want them to be "random" in some very loose sense of that word, i.e. I want to get a variety of possible outputs from the algorithm with distribution vaguely similar to the distribution of all matrices over that set of elements with the specified rank.
In fact, I don't actually care that it has rank r, just that it's close to a matrix of rank r (measured by the Frobenius norm).
When the set at hand is the reals, I've been doing the following, which is perfectly adequate for my needs: generate matrices U of size mxr and V of nxr, with elements independently sampled from e.g. Normal(0, 2). Then U V' is an mxn matrix of rank r (well, <= r, but I think it's r with high probability).
If I just do that and then round to binary / 1-5, though, the rank increases.
It's also possible to get a lower-rank approximation to a matrix by doing an SVD and taking the first r singular values. Those values, though, won't lie in the desired set, and rounding them will again increase the rank.
This question is related, but the accepted answer isn't "random," and the other answer suggests an SVD, which doesn't work here, as noted.
One possibility I've thought of is to make r linearly independent row or column vectors from the set and then get the rest of the matrix by linear combinations of those. I'm not really clear, though, either on how to get "random" linearly independent vectors, or how to combine them in a quasirandom way after that.
(Not that it's super-relevant, but I'm doing this in numpy.)
Update: I've tried the approach suggested by EMS in the comments, with this simple implementation:
import random
import numpy as np

des_rank = 3
real = np.dot(np.random.normal(0, 1, (10, 3)), np.random.normal(0, 1, (3, 10)))
bin = (real > .5).astype(int)
rank = np.linalg.matrix_rank(bin)
niter = 0
while rank > des_rank:
    cand_changes = np.zeros((21, 5))
    for n in range(20):
        i, j = random.randrange(bin.shape[0]), random.randrange(bin.shape[1])
        v = 1 - bin[i, j]
        x = bin.copy()
        x[i, j] = v
        x_rank = np.linalg.matrix_rank(x)
        cand_changes[n, :] = (i, j, v, x_rank, max((rank + 1e-4) - x_rank, 0))
    cand_changes[-1, :] = (0, 0, bin[0, 0], rank, 1e-4)
    cdf = np.cumsum(cand_changes[:, -1])
    cdf /= cdf[-1]
    i, j, v, rank, score = cand_changes[np.searchsorted(cdf, random.random()), :]
    bin[int(i), int(j)] = int(v)
    niter += 1
    if niter % 1000 == 0:
        print(niter, rank)
It works quickly for small matrices but falls apart for e.g. 10x10 -- it seems to get stuck at rank 6 or 7, at least for hundreds of thousands of iterations.
It seems like this might work better with a better (ie less-flat) objective function, but I don't know what that would be.
I've also tried a simple rejection method for building up the matrix:
def fill_matrix(m, n, r, vals):
    assert m >= r and n >= r
    trans = False
    if m > n:  # more columns than rows, I think, is better
        m, n = n, m
        trans = True
    get_vec = lambda: np.array([random.choice(vals) for i in range(n)])
    vecs = []
    n_rejects = 0
    # fill in r linearly independent rows
    while len(vecs) < r:
        v = get_vec()
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > len(vecs):
            vecs.append(v)
        else:
            n_rejects += 1
    print("have {} independent ({} rejects)".format(r, n_rejects))
    # fill in the rest of the dependent rows
    while len(vecs) < m:
        v = get_vec()
        # reject v if it is independent of the first r rows (rank would exceed r)
        if np.linalg.matrix_rank(np.vstack(vecs + [v])) > r:
            n_rejects += 1
            if n_rejects % 1000 == 0:
                print(n_rejects)
        else:
            vecs.append(v)
    print("done ({} total rejects)".format(n_rejects))
    m = np.vstack(vecs)
    return m.T if trans else m
This works okay for e.g. 10x10 binary matrices with any rank, but not for 0-4 matrices or much larger binaries with lower rank. (For example, getting a 20x20 binary matrix of rank 15 took me 42,000 rejections; with 20x20 of rank 10, it took 1.2 million.)
This is clearly because the space spanned by the first r rows is too small a portion of the space I'm sampling from, e.g. {0,1}^10, in these cases.
We want the intersection of the span of the first r rows with the set of valid values.
So we could try sampling from the span and looking for valid values, but since the span involves real-valued coefficients that's never going to find us valid vectors (even if we normalize so that e.g. the first component is in the valid set).
Maybe this can be formulated as an integer programming problem, or something?
My friend, Daniel Johnson who commented above, came up with an idea but I see he never posted it. It's not very fleshed-out, but you might be able to adapt it.
If A is m-by-r and B is r-by-n and both have rank r then AB has rank r. Now, we just have to pick A and B such that AB has values only in the given set. The simplest case is S = {0,1,2,...,j}.
One choice would be to make A binary with appropriate row/col sums that guaranteed the correct rank and B with column sums adding to no more than j (so that each term in the product is in S) and row sums picked to cause rank r (or at least encourage it, as rejection can be used).
I just think that we can come up with two independent sampling schemes on A and B that are less complicated and quicker than trying to attack the whole matrix at once. Unfortunately, all my matrix sampling code is on the other computer. I know it generalized easily to allowing entries in a bigger set than {0,1} (i.e. S), but I can't remember how the computation scaled with m*n.
I am not sure how useful this solution will be, but you can construct a matrix that will allow you to search for the solution on another matrix with only 0 and 1 as entries. If you search randomly on the binary matrix, it is equivalent to randomly modifying the elements of the final matrix, but it is possible to come up with some rules to do better than a random search.
If you want to generate an m-by-n matrix over the element set E with elements e_i, 0 <= i < k, you start off with the m-by-k*m matrix A whose row i holds the elements (e_0, ..., e_{k-1}) in columns i*k through i*k + k - 1, with zeros elsewhere.
Clearly, this matrix has rank m. Now, you can construct another matrix, B, that has 1s at certain locations to pick the elements from the set E. B is the vertical stack of sub-matrices B_0, ..., B_{m-1}, where each B_i is a k-by-n matrix. So, the size of AB is m-by-n and rank(AB) is min(m, rank(B)). If we want the output matrix to have only elements from our set E, then each column of B_i has to have exactly one element set to 1, and the rest set to 0.
If you want to search for a certain rank on B randomly, you need to start off with a valid B of maximal rank and rotate a random column j of a random B_i by a random amount. This is equivalent to changing row i, column j of A*B to a random element from our set, so it is not a very useful method.
However, you can do certain tricks with the matrices. For example, if k is 2, and there are no overlaps on first rows of B0 and B1, you can generate a linearly dependent row by adding the first rows of these two sub-matrices. The second row will also be linearly dependent on rows of these two matrices. I am not sure if this will easily generalize to k larger than 2, but I am sure there will be other tricks you can employ.
For example, one simple method to generate at most rank k (when m is k+1) is to get a random valid B0, keep rotating all rows of this matrix up to get B1 to Bm-2, set first row of Bm-1 to all 1, and the remaining rows to all 0. The rank cannot be less than k (assuming n > k), because B_0 columns have exactly 1 nonzero element. The remaining rows of the matrices are all linear combinations (in fact exact copies for almost all submatrices) of these rows. The first row of the last submatrix is the sum of all rows of the first submatrix, and the remaining rows of it are all zeros. For larger values of m, you can use permutations of rows of B0 instead of simple rotation.
Once you generate one matrix that satisfies the rank constraint, you may get away with randomly shuffling the rows and columns of it to generate others.
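As a rough illustration of the A*B construction described above (a sketch under my reading of the layout: row i of A holds (e_0, ..., e_{k-1}) in block i, and each column of each B_i has a single 1), the product does land in the element set. This snippet only demonstrates value containment, not the rank-controlling tricks:

```python
import numpy as np

E = np.array([1, 2, 3, 4, 5])  # element set, k = 5
k, m, n = len(E), 3, 6

# A: row i carries the elements of E in columns i*k .. i*k + k - 1.
A = np.zeros((m, k * m))
for i in range(m):
    A[i, i*k:(i+1)*k] = E

# B: stack of m blocks, each k-by-n with exactly one 1 per column,
# so column j of block i selects one element of E for entry (i, j).
rng = np.random.default_rng(0)
B = np.zeros((k * m, n))
for i in range(m):
    choices = rng.integers(0, k, size=n)
    B[i*k + choices, np.arange(n)] = 1

M = A @ B
assert np.isin(M, E).all()  # every entry of the product is in E
```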
How about like this?
import numpy as np
from sklearn.decomposition import NMF

rank = 30
n1 = 100; n2 = 100
model = NMF(n_components=rank, init='random', random_state=0)
U = model.fit_transform(np.random.randint(1, 5, size=(n1, n2)))
V = model.components_
M = np.around(U) @ np.around(V)
