I have two NxM arrays in numpy, a and b. I would like to perform a vectorized operation that does the following:
c = np.zeros(N)
for i in range(N):
    for j in range(M):
        c[i] += a[i, j]*b[i, j]
Stated in a more mathematical way, I have two matrices A and B, and I want to compute the diagonal of the matrix product A*B (being imprecise about matrix transposition, etc.). I've been trying to accomplish this with the tensordot function, but haven't had much success. This operation is going to be performed many times, so I would like it to be efficient (i.e., without literally computing the full matrix AB and then taking its diagonal).
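For reference, this row-wise sum of products has a standard vectorized form; the snippet below is mine, not part of the original post:
import numpy as np

# Illustrative sizes; a and b stand in for the poster's NxM arrays.
N, M = 4, 3
a = np.random.randn(N, M)
b = np.random.randn(N, M)

# Both lines compute the same thing as the double loop above.
c = np.einsum('ij,ij->i', a, b)   # sum over j of a[i,j]*b[i,j]
c_alt = (a * b).sum(axis=1)       # elementwise product, then row sums
assert np.allclose(c, c_alt)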
Related
Imagine having two sparse matrices:
> A, A.shape = (n,m)
> B, B.shape = (m,n)
I would like to compute the dot product A*B, but then keep only the diagonal. Since the matrices are big, I don't want to compute any values other than the diagonal ones.
This is a variant of the question "Is there a numpy/scipy dot product, calculating only the diagonal entries of the result?", where the most relevant answer seems to be to use np.einsum:
np.einsum('ij,ji->i', A, B)
However, this does not work with sparse matrices:
ValueError: einstein sum subscripts string contains too many subscripts for operand 0
The solution is to use todense(), but it increases memory usage a lot: np.einsum('ij,ji->i', A.todense(), B.todense())
The other solution, which I currently use, is to iterate over all the rows of A and compute each product inside the loop:
for i in range(len_A):
    result = np.float32(A[i].dot(B[:, i])[0, 0])
    ...
Neither of these solutions seems perfect. Is there an equivalent of np.einsum that works with sparse matrices?
One option is a list comprehension over the rows:
[sum(A[i]*B.T[i]) for i in range(min(A.shape[0], B.shape[1]))]
but this is faster:
l = min(A.shape[0], B.shape[1])
(A[np.arange(l)]*B.T[np.arange(l)]).sum(axis=1)
In general you shouldn't try to use numpy functions on scipy.sparse arrays. In your case I'd first make sure both arrays actually have compatible shapes, that is
A, A.shape = (r,m)
B, B.shape = (m,r)
where r = min(A.shape[0], B.shape[1]). Then we can compute the diagonal of the matrix product using
d = (A.multiply(B.T)).sum(axis=1)
Here we compute the entry-wise row-column products and manually sum them up. This avoids all the unnecessary computations you'd get using dot/@/*. (Note that, unlike in numpy, both * and @ perform matrix multiplication on sparse matrices.)
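A small self-contained check of this approach (the shapes and density below are made up for illustration):
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
A = sparse.random(200, 100, density=0.05, format='csr', random_state=rng)
B = sparse.random(100, 200, density=0.05, format='csc', random_state=rng)

# Diagonal of A @ B without forming the full product
d = np.asarray(A.multiply(B.T).sum(axis=1)).ravel()

# Verify against the direct (expensive) computation
assert np.allclose(d, (A @ B).diagonal())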
Suppose we are given a two-dimensional matrix A of dtype=uint8 with N rows and M columns, and a uint8 vector x of size N. We need to bitwise-XOR each row of A, i.e. A[i], with the corresponding element of x, i.e. x[i].
Currently I am doing this as follows, but I think there are more efficient ways to do it with numpy's vectorization capabilities.
for i in range(A.shape[0]):
    A[i, :] = np.bitwise_xor(A[i, :], x[i])
This is the row-wise XOR. Besides this, the XOR also needs to be applied column-wise.
Thanks in advance.
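Broadcasting removes the loop entirely. A minimal sketch (the column-wise vector y is my assumption, since the post doesn't name one):
import numpy as np

A = np.random.randint(0, 256, size=(4, 5), dtype=np.uint8)
x = np.random.randint(0, 256, size=4, dtype=np.uint8)   # row-wise operand
y = np.random.randint(0, 256, size=5, dtype=np.uint8)   # hypothetical column-wise operand

A ^= x[:, np.newaxis]   # XOR x[i] into every element of row i
A ^= y[np.newaxis, :]   # XOR y[j] into every element of column j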
How can I speed up this code in python?
while norm_corr > corr_len:
    correlation = 0.0
    for i in xrange(6):
        for j in xrange(6):
            correlation += (p[i] * T_n[j][i]) * ((F[j] - Fbar) * (F[i] - Fbar))
    Integral += correlation
    T_n = np.mat(T_n) * np.mat(TT)
    T_n = T_n.tolist()
    norm_corr = correlation / variance
Here, TT is a fixed 6x6 matrix, p is a fixed 1x6 matrix, and F is a fixed 1x6 matrix. T_n is the nth power of TT.
This while loop might be repeated up to 10^4 times.
The way to do these things quickly is to use Numpy's built-in functions and operators to perform the operations. Numpy is implemented internally with optimized C code and if you set up your computation properly, it will run much faster.
But leveraging Numpy effectively can sometimes be tricky. It's called "vectorizing" your code - you have to figure out how to express it in a way that acts on whole arrays, rather than with explicit loops.
For example, in your loop you have p[i] * T_n[j][i], which IMHO can be handled with a vector-by-matrix multiplication: if v is 1x6 and m is 6x6, then v.dot(m) is a 1x6 vector whose entries are the dot products of v with the columns of m. You can use transposes and reshapes to work across different dimensions if necessary.
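Concretely, the whole double loop collapses to two dot products. A sketch under the assumption that p, F, and TT are plain numpy arrays of the stated sizes (the random values below are stand-ins for the poster's fixed data):
import numpy as np

# Illustrative stand-ins for the poster's fixed data
TT = np.random.rand(6, 6)
p = np.random.rand(6)
F = np.random.rand(6)
Fbar = F.mean()

g = F - Fbar        # loop-invariant: compute once, before the while loop
w = p * g           # w[i] = p[i] * (F[i] - Fbar)

T_n = TT.copy()
# Inside the loop, the double sum over i and j becomes:
#   correlation = sum_{i,j} p[i] * T_n[j, i] * g[j] * g[i]
correlation = w.dot(g.dot(T_n))   # g.dot(T_n)[i] = sum_j g[j] * T_n[j, i]
T_n = T_n.dot(TT)                 # advance to the next power of TT, replacing np.mat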
I have several numpy arrays that I would like to multiply (using dot, so matrix multiplication). I'd like to put them all into a numpy array, but I can't figure out how to do it.
E.g.
a = np.random.randn(10, 2, 2)
b = np.random.randn(10, 2)
So I have 10 2x2 matrices (a) and 10 2x1 matrices (b). What I could do is this:
c = np.zeros((10, 2))
for i in range(10):
    c[i] = np.dot(a[i, :, :], b[i, :])
You get the idea.
But I feel like there's a usage of dot or tensordot or something that would do this in one line really easily. I just can't make sense of the dot and tensordot functions for >2 dimensions like this.
You could use np.einsum:
c = np.einsum('ijk,ik->ij', a, b)
einsum performs a sum of products. Since matrix multiplication is a sum of products, any matrix multiplication can be expressed using einsum. It is based on Einstein summation notation.
The first argument to einsum, 'ijk,ik->ij', is a string of subscripts.
ijk declares that a has three axes which are denoted by i, j, and k.
ik, similarly, declares that the axes of b will be denoted i and k.
When the subscripts repeat, those axes are locked together for purposes of summation.
The part of the subscript that follows the -> shows the axes which will remain after summation.
Since k appears on the left of the -> but disappears on the right, there is a summation over k. It means that the sum
c_ij = sum over k ( a_ijk * b_ik )
should be computed. Since this sum can be computed for each i and j, the result is an array with subscripts i and j.
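A quick check that the einsum call matches the original loop (sizes as in the question):
import numpy as np

a = np.random.randn(10, 2, 2)
b = np.random.randn(10, 2)

c = np.einsum('ijk,ik->ij', a, b)

c_loop = np.zeros((10, 2))
for i in range(10):
    c_loop[i] = np.dot(a[i], b[i])

assert np.allclose(c, c_loop)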
I have a large symmetric matrix A of dimensions (N, N) (N is about twenty million), and I certainly cannot store this matrix (about 50% of its components are zeros).
But every component A[i, j] is explicitly known: A[i, j] = f(i, j). For example, A[i, j] = cos(i)*cos(j).
I need to multiply this matrix by a vector of length N. What is a "doable" way to do that on a machine with 64 cores and 128 GB of RAM?
If you have a way to compute the elements of the matrix on the fly, there is no need to store the whole matrix in memory. Also, each element of the result vector is independent of the others, so you can run as many parallel workers as you want.
The only algorithmic optimization I can think of is to take into account that f(i, j) = cos(i)*cos(j) is a symmetric function (f(i, j) = f(j, i)). But that only helps if this is your real function.
Also check out numpy and Cython for much faster computations, as pure Python can be a little slow for this kind of job.
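As a concrete illustration of the on-the-fly idea, here is a minimal single-process sketch that materializes only one block x block tile of A at a time (the function name and block size are mine, not from the thread); the outer loop over row tiles is embarrassingly parallel:
import numpy as np

def f(i, j):
    # The poster's example; i and j broadcast against each other
    return np.cos(i) * np.cos(j)

def matvec_on_the_fly(x, block=10_000):
    # y = A @ x with A[i, j] = f(i, j), without ever storing A
    N = x.size
    y = np.zeros(N)
    for i0 in range(0, N, block):
        i = np.arange(i0, min(i0 + block, N))
        for j0 in range(0, N, block):
            j = np.arange(j0, min(j0 + block, N))
            tile = f(i[:, None], j[None, :])       # one block x block tile of A
            y[i0:i0 + len(i)] += tile @ x[j0:j0 + len(j)]
    return y
Incidentally, for this particular f the matrix is rank one (cos(i)*cos(j) is an outer product), so y = np.cos(np.arange(N)) * (np.cos(np.arange(N)) @ x) computes the product in O(N) time; the tiling approach is what you need when f doesn't factor like that.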