If I have a.shape = (3,4,5) and b.shape = (3,5), using np.einsum() makes broadcasting then multiplying the two arrays super easy and explicit:
result = np.einsum('abc, ac -> abc', a, b)
But if I want to add the two arrays, so far as I can tell, I need two separate steps so that the broadcasting happens properly, and the code feels less explicit.
b = np.expand_dims(b, 1)
result = a + b
Is there a way to do this array addition with the clarity of np.einsum()?
Broadcasting matches shapes starting from the trailing dimensions, so b with shape (3,5) does not line up with a with shape (3,4,5) until it gets a length-1 axis in the middle. You can add that axis inline and do the addition in one line:
import numpy as np
a = np.random.rand(3, 4, 5)
b = np.random.rand(3, 5)
c = a + b[:, None, :] # c is shape of a, broadcasting occurs along 2nd dimension
Note this is no different from c = a + np.expand_dims(b, 1). In terms of clarity, it is a matter of personal style: I prefer broadcasting, others prefer einsum.
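To make the comparison concrete, here is a minimal sketch that puts the einsum product next to the two spellings of the broadcast addition. The random generator, the seed and the variable names are only illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4, 5))
b = rng.random((3, 5))

# Multiplication: einsum names the axes explicitly.
prod_einsum = np.einsum('abc,ac->abc', a, b)
prod_bcast = a * b[:, None, :]

# Addition: insert a length-1 axis so b lines up with axes 0 and 2 of a.
sum_index = a + b[:, None, :]
sum_expand = a + np.expand_dims(b, 1)

print(np.allclose(prod_einsum, prod_bcast))
print(np.array_equal(sum_index, sum_expand))
```

Both spellings of the addition build the same (3, 1, 5) view of b before adding, so the choice between them is purely about readability.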
I have the following code:
import numpy as np
from numba import jit
Nx = 15
Ny = 1000
v = np.ones((Nx,Ny))
v = np.reshape(v,(Nx*Ny))
A = np.random.rand(Nx*Ny,Nx*Ny,5)
B = np.random.rand(Nx*Ny,Nx*Ny,5)
C = np.random.rand(Nx*Ny,5)
@jit(nopython=True)
def dotplus(B, v, C):
    return np.dot(B, v) + C
k = 2
D = dotplus(B[:,:,k], v, C[:,k])
I get the following warning, which I guess refers to arrays B[:,:,k] and v:
NumbaPerformanceWarning: np.dot() is faster on contiguous arrays, called on (array(float64, 2d, A), array(float64, 1d, C))
return np.dot(B, v0) + C
Is there a way to make the two arrays contiguous, so that Numba can speed up the code?
PS in case you're wondering about the meaning of k, note this is just an MRE. In the actual code, dotplus is called multiple times inside a for loop for different values of k (so, different slices of B and C). The for loop updates the values of v, but B and C don't change.
Flawr is correct. B[..., k] returns a view into B and does not copy any data. In memory, two neighbouring elements along the last axis of the view are B.strides[1] bytes apart, which evaluates to B.shape[-1]*B.itemsize and is greater than B.itemsize. Consequently, the slice is not contiguous.
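You can check this yourself with a small stand-in array. The sketch below assumes a (6, 6, 5) array in place of the much larger one from the question.

```python
import numpy as np

B = np.random.rand(6, 6, 5)    # small stand-in for the (Nx*Ny, Nx*Ny, 5) array
k = 2
view = B[..., k]

print(np.shares_memory(view, B))     # True: the slice is a view, nothing was copied
print(view.flags['C_CONTIGUOUS'])    # False: elements are not adjacent in memory
print(view.strides, B.itemsize)      # the inner stride is larger than the item size

# An explicit copy lays the selected elements out contiguously.
copy = np.ascontiguousarray(view)
print(copy.flags['C_CONTIGUOUS'])    # True
```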
The best optimization is to vectorize the dotplus loop and write
D = np.tensordot(B, v, axes=(1, 0)) + C
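As a sanity check, the sketch below compares that single tensordot call against the per-slice computation from the question. It uses a small n as a stand-in for Nx*Ny, and the names D_all and D_loop are just illustrative.

```python
import numpy as np

n = 8                           # small stand-in for Nx*Ny
rng = np.random.default_rng(0)
B = rng.random((n, n, 5))
C = rng.random((n, 5))
v = np.ones(n)

# One call handles every k: contract axis 1 of B with axis 0 of v.
D_all = np.tensordot(B, v, axes=(1, 0)) + C       # shape (n, 5)

# Reference: the original per-slice computation, stacked along the last axis.
D_loop = np.stack([B[:, :, k] @ v + C[:, k] for k in range(5)], axis=-1)

print(np.allclose(D_all, D_loop))
```

Column k of D_all corresponds to what dotplus(B[:,:,k], v, C[:,k]) returns in the question.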
The second best optimization is to refactor and let the batch dimension be the first dimension of the array. This can be done on top of the above vectorization and is generally advisable. It would look something like
A = np.random.rand(5, Nx*Ny,Nx*Ny)
# rather than
A = np.random.rand(Nx*Ny,Nx*Ny,5)
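A quick sketch of why the batch-first layout helps, again with a small n standing in for Nx*Ny. The one-off conversion at the end is only one possible way to migrate existing data, and it does copy the array once.

```python
import numpy as np

n = 8
k = 2
B_last = np.random.rand(n, n, 5)     # batch axis last, as in the question
B_first = np.random.rand(5, n, n)    # batch axis first, as suggested

print(B_last[:, :, k].flags['C_CONTIGUOUS'])   # False
print(B_first[k].flags['C_CONTIGUOUS'])        # True

# One-off conversion of existing data (this copies the whole array once).
B_conv = np.ascontiguousarray(np.moveaxis(B_last, -1, 0))
print(B_conv[k].flags['C_CONTIGUOUS'])         # True
```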
If you can't refactor the code, you need to start profiling. You can easily swap axes temporarily via
B = np.moveaxis(B, -1, 0)
some_op(B[k, ...], ...)
B = np.moveaxis(B, 0, -1)
Contrary to max9111's comment, this will not net you anything compared to np.ascontiguousarray() because the data has to be copied in both cases. That said, a copy is O(Nx*Ny*k) + buffer allocation. Direct matrix-vector multiplication is O(Nx*Ny), but you have to gather the elements first, which is really expensive. It comes down to your specific architecture and concrete problem, so profiling is the way to go.
I'm wondering if there is a more "numpythonic" or efficient way of doing the following:
Suppose I have a 1D array A of known length L, and I have a multidimensional array B, which has a dimension also with length L. Suppose I want to add (or set) the value of B[:, ..., x, ..., :] += A[x]. In other words, add the value A[x] to every value of the entire sub-array of B in the matching index x.
An extremely simple working example is this:
A = np.arange(10, 20)
B = np.random.rand(3, len(A), 3)
for iii in range(len(A)):
    B[:, iii, :] += A[iii]
Clearly I can always loop over the index I want as above, but I'm curious if there's a more efficient way. If there's some more common terminology which describes this process, I'd also be interested because I'm having difficulty even constructing an appropriate Google search.
I'm also attempting to avoid creating a new array of the same shape as B and tiling the A-vector repeatedly over other indices and then adding that to B, as a more "real" world application would likely involve B being a relatively large array.
For your simple case, you can do:
B[:] = A[:, None]
This works because of broadcasting. By simulating the dimensions of B in A, you tell numpy where to place the elements unambiguously. For a more general case, where you want to place A along dimension k of B, you can do:
B[:] = np.expand_dims(A, tuple(range(A.ndim, A.ndim + B.ndim - k - 1)))
np.expand_dims will add axes at the indices you tell it to. There are other ways too. For example, you can index A with B.ndim - k - 1 instances of None:
B[:] = A[(slice(None), *(None,) * (B.ndim - k - 1))]
You can also use np.reshape to get the correctly shaped view:
B[:] = A.reshape(-1, *np.ones(B.ndim - k - 1, dtype=int))
OR
B[:] = A.reshape(-1, *(1,) * (B.ndim - k - 1))
OR
B[:] = A.reshape((-1,) + (1,) * (B.ndim - k - 1))
In all these cases, you only need to expand the trailing dimensions of A, since that's how broadcasting works. Since broadcasting is such an integral part of numpy, you can simply replace = with += to get the expected result.
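Putting the reshape variant together with +=, a minimal sketch could look like the following. The helper name add_along_axis is hypothetical (it is not a NumPy function), and k is assumed to be a non-negative axis index.

```python
import numpy as np

def add_along_axis(B, A, k):
    """Hypothetical helper: add A[x] to B[..., x, ...] along axis k, in place."""
    # Give A one length-1 axis for every axis of B that comes after k.
    shape = (-1,) + (1,) * (B.ndim - k - 1)
    B += A.reshape(shape)
    return B

A = np.arange(10, 20)
B = np.random.rand(3, len(A), 3)

# Reference result from the explicit loop in the question.
expected = B.copy()
for i in range(len(A)):
    expected[:, i, :] += A[i]

add_along_axis(B, A, k=1)
print(np.allclose(B, expected))
```

No array of B's full shape is built from A here; the reshaped A is a view, and the broadcast happens inside the in-place addition.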
I have two lists of matrices (numpy.ndarray): (a1, a2, a3, ..., an) and (b1, b2, b3, ..., bn). Each one is a square matrix of some size. Not all a matrices are the same size and not all b matrices are the same size, but it is guaranteed that dim(a[i]) == dim(b[i]) (which means we only multiply matrices of the same size).
I want to take the dot product of each pair: a1*b1, a2*b2, ..., an*bn, and store the results in, say, c1, c2, etc.
Is there any way to do it besides going over the pairs one by one in a for loop?
I'm currently using:
# a_list and b_list contain n matrices each
# a[i] & b[i] are numpy.ndarray objects
a_list = [a1,a2,.....]
b_list = [b1,b2,.....]
result_list = []
for i in range(n):
    result_list.append(numpy.dot(a_list[i], b_list[i]))
I think the accepted solution is syntactic sugar for a for loop, however we can look for a more interesting option here.
Technically what we want is a numpy array of numpy arrays, allowing us to do vectorized operations between them, similar to how np.array([1,2,3]) * np.array([3,4,5]) multiplies the two arrays element by element.
So we'd like a numpy array of numpy arrays, except that we'd like the * operator to be defined as matrix multiplication instead of element-wise multiplication. It's interesting to note that this is the case for the np.matrix class. Note, however, that this class is deprecated and can cause complications, but for the sake of learning and understanding things all the way, we can try using it.
import numpy as np
b_0 = np.asmatrix(np.arange(9).reshape(3, 3))
# b_0 = 0 1 2
#       3 4 5
#       6 7 8
b_1 = np.asmatrix(np.arange(4).reshape(2, 2))
# b_1 = 0 1
#       2 3
a_0 = np.asmatrix(np.eye(3))
a_1 = np.asmatrix(np.eye(2))
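# dtype=object is needed on recent NumPy versions because the matrices differ in size
a = np.asarray([a_0, a_1], dtype=object)
b = np.asarray([b_0, b_1], dtype=object)
a * b  # We get [b_0, b_1]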
If this were an important syntactic option for you, you could perhaps write a custom class that would be compatible with numpy arrays (and thus not use np.matrix). This will probably however be slightly slower than using a plain old for loop with np.dot.
You can use python list comprehensions:
result_list = [a.dot(b) for a, b in zip(a_list, b_list)]
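A runnable sketch of the comprehension with matrices of different sizes. The sample matrices are made up for illustration, and a @ b is used as an equivalent spelling of a.dot(b) for 2-D arrays.

```python
import numpy as np

# Two pairs of square matrices with different sizes per pair.
a_list = [np.eye(3), np.eye(2)]
b_list = [np.arange(9.0).reshape(3, 3), np.arange(4.0).reshape(2, 2)]

# For 2-D arrays, a @ b is the same as a.dot(b).
result_list = [a @ b for a, b in zip(a_list, b_list)]

for c in result_list:
    print(c.shape)
```

Because the matrices have different shapes they cannot be stacked into one regular array, so a Python-level loop (explicit or as a comprehension) is hard to avoid here.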
I want to slice an array so that I can use it to perform an operation with another array of arbitrary dimension. In other words, I am doing the following:
A = np.random.rand(5)
B = np.random.rand(5,2,3,4)
slicer = [slice(None)] + [None]*(len(B.shape)-1)
result = B*A[tuple(slicer)]
Is there some syntax that I can use so that I do not have to construct slicer?
In this specific case you can use np.einsum with an ellipsis.
result2 = np.einsum('i,i...->i...', A, B)
np.allclose(result, result2)
Out[232]: True
Although, as @hpaulj points out, this only works for multiplication (or division if you use 1/B).
Since broadcasting normally works from the other end, you can use np.transpose twice to get the axes in the right order.
result3 = np.transpose(np.transpose(B) * A)
But that's also not a general solution.
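For operations other than multiplication, one option is to reshape A so that it carries trailing length-1 axes. This is a sketch rather than dedicated syntax: it still builds a shape tuple, though not a slicer, and the names A_col, prod and total are illustrative.

```python
import numpy as np

A = np.random.rand(5)
B = np.random.rand(5, 2, 3, 4)

# Append one length-1 axis per remaining dimension of B: shape (5, 1, 1, 1).
A_col = A.reshape(A.shape + (1,) * (B.ndim - 1))

prod = B * A_col     # same result as the einsum and transpose versions
total = B + A_col    # works for any broadcasting operation, not just products

print(np.allclose(prod, np.einsum('i,i...->i...', A, B)))
print(np.allclose(prod, np.transpose(np.transpose(B) * A)))
```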
I have two multidimensional NumPy arrays, A and B, with A.shape = (K, d, N) and B.shape = (K, N, d). I would like to perform an element-wise operation over axis 0 (K), with that operation being matrix multiplication over axes 1 and 2 (d, N and N, d). So the result should be a multidimensional array C with C.shape = (K, d, d), so that C[k] = np.dot(A[k], B[k]). A naive implementation would look like this:
C = np.vstack([np.dot(A[k], B[k])[np.newaxis, :, :] for k in range(K)])
but this implementation is slow. A slightly faster approach looks like this:
C = np.dot(A, B)[:, :, 0, :]
which uses the default behaviour of np.dot on multidimensional arrays, giving me an array with shape (K, d, K, d). However, this approach computes the required answer K times (each of the entries along axis 2 are the same). Asymptotically it will be slower than the first approach, but the overhead is much less. I am also aware of the following approach:
from numpy.core.umath_tests import matrix_multiply
C = matrix_multiply(A, B)
but I am not guaranteed that this function will be available. My question is thus, does NumPy provide a standard way of doing this efficiently? An answer which applies to multidimensional arrays in general would be perfect, but an answer specific to only this case would be great too.
Edit: As pointed out by @Juh_, the second approach is incorrect. The correct version is:
C = np.dot(A, B).diagonal(axis1=0, axis2=2).transpose(2, 0, 1)
but the overhead added makes it slower than the first approach, even for small matrices. The last approach is winning by a long shot on all my timing tests, for small and large matrices. I'm now strongly considering using this if no better solution crops up, even if that would mean copying the numpy.core.umath_tests library (written in C) into my project.
A possible solution to your problem is:
C = np.sum(A[:,:,:,np.newaxis]*B[:,np.newaxis,:,:],axis=2)
However:
it is quicker than the vstack approach only if K is much bigger than d and N
there might be a memory issue: the above solution allocates a K x d x N x d array (i.e. all possible pairwise products, before summing). In fact, I could not test with big K, d and N because I ran out of memory.
btw, note that:
C = np.dot(A, B)[:, :, 0, :]
does not give the correct result. It got me tricked because I first checked my method by comparing the results to those given by this np.dot command.
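To see why, here is a small sketch with arbitrary sizes. np.dot on these 3-D arrays pairs every A[i] with every B[j], so indexing axis 2 at 0 always picks B[0].

```python
import numpy as np

K, d, N = 4, 3, 2
rng = np.random.default_rng(0)
A = rng.random((K, d, N))
B = rng.random((K, N, d))

full = np.dot(A, B)              # shape (K, d, K, d); full[i, :, j, :] == A[i] @ B[j]
wrong = full[:, :, 0, :]         # pairs every A[k] with B[0]
right = full.diagonal(axis1=0, axis2=2).transpose(2, 0, 1)

# Reference: one matrix product per k.
reference = np.stack([A[k] @ B[k] for k in range(K)])

print(np.allclose(right, reference))
print(np.allclose(wrong, reference))
```

Only the k = 0 block of the sliced result agrees with the reference, which is easy to miss when checking a single entry.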
I have this same issue in my project. The best I've been able to come up with is the following, which I think is a little faster (maybe 10%) than using vstack:
K, d, N = A.shape
C = np.empty((K, d, d))
for k in range(K):
    C[k] = np.dot(A[k], B[k])
I'd love to see a better solution, I can't quite see how one would use tensordot to do this.
A very flexible, compact, and fast solution:
C = np.einsum('Kab,Kbc->Kac', A, B, optimize=True)
Confirmation:
import numpy as np
K = 10
d = 5
N = 3
A = np.random.rand(K,d,N)
B = np.random.rand(K,N,d)
C_old = np.dot(A, B).diagonal(axis1=0, axis2=2).transpose(2, 0, 1)
C_new = np.einsum('Kab,Kbc->Kac', A, B)
print(np.max(np.abs(C_old-C_new))) # should be 0 or a very small number
For large multi-dimensional arrays, the optional parameter optimize=True can save you a lot of time.
You can learn about einsum here:
https://ajcr.net/Basic-guide-to-einsum/
https://rockt.github.io/2018/04/30/einsum
https://numpy.org/doc/stable/reference/generated/numpy.einsum.html
Quote:
The Einstein summation convention can be used to compute many multi-dimensional, linear algebraic array operations. einsum provides a succinct way of representing these. A non-exhaustive list of these operations is:
Trace of an array, numpy.trace.
Return a diagonal, numpy.diag.
Array axis summations, numpy.sum.
Transpositions and permutations, numpy.transpose.
Matrix multiplication and dot product, numpy.matmul numpy.dot.
Vector inner and outer products, numpy.inner numpy.outer.
Broadcasting, element-wise and scalar multiplication, numpy.multiply.
Tensor contractions, numpy.tensordot.
Chained array operations, in efficient calculation order, numpy.einsum_path.
You can do
np.matmul(A, B)
Look at https://numpy.org/doc/stable/reference/generated/numpy.matmul.html.
Should be faster than einsum for big enough K.
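A short sketch checking that matmul, einsum and the explicit loop agree. The sizes are arbitrary, and no timing is shown here, so the speed claim still needs profiling on your own data.

```python
import numpy as np

K, d, N = 10, 5, 3
rng = np.random.default_rng(0)
A = rng.random((K, d, N))
B = rng.random((K, N, d))

# matmul treats the leading axis as a batch of matrices (same as A @ B).
C_matmul = np.matmul(A, B)
C_einsum = np.einsum('Kab,Kbc->Kac', A, B)
C_loop = np.stack([np.dot(A[k], B[k]) for k in range(K)])

print(C_matmul.shape)
print(np.allclose(C_matmul, C_einsum), np.allclose(C_matmul, C_loop))
```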