I'm trying to speed up my code to perform some numerical calculations where I need to multiply 3 matrices with an array. The structure of the problem is the following:
The array has a shape of (N, 10)
The first matrix is constant along the first (N) dimension of the array and has a shape of (10, 10)
The other two matrices vary along the first dimension of the array and have a shape of (N, 10, 10)
The result of the calculation should be an array of shape (N, 10)
I've implemented a working solution using for loops, but I'd like better performance, so I'm trying to use numpy functions. I've tried numpy.tensordot, but when multiplying the dynamic matrices with the array I get a shape of (N, 10, N) instead of (N, 10)
My for loop is the following:
res = np.zeros(temp_rho.shape, dtype=np.complex128)
for i in range(temp_rho.shape[0]):
    res[i] = np.dot(self.constMatrix, temp_rho[i])
    res[i] += np.dot(self.dinMat1[i], temp_rho[i])
    res[i] += np.dot(self.dinMat2[i], np.conj(temp_rho[i]))
#temp_rho.shape = (N, 10)
#res.shape = (N, 10)
#self.constMatrix.shape = (10, 10)
#self.dinMat1.shape = (N, 10, 10)
#self.dinMat2.shape = (N, 10, 10)
How can this code be implemented using numpy's dot products (or similar functions) so that it returns the correct dimensions?
Here's an approach using a combination of np.dot and np.einsum -
parte1 = constMatrix.dot(temp_rho.T).T
parte2 = np.einsum('ijk,ik->ij', dinMat1, temp_rho)
parte3 = np.einsum('ijk,ik->ij', dinMat2, np.conj(temp_rho))
out = parte1 + parte2 + parte3
An alternative way to get parte1 would be with np.tensordot -
parte1 = np.tensordot(temp_rho, constMatrix, axes=([1],[1]))
Why doesn't numpy.tensordot work for the latter two sum-reductions?
We need to keep the first axis of dinMat1/dinMat2 aligned with the first axis of temp_rho/np.conj(temp_rho). That isn't possible with tensordot: axes that are not sum-reduced are not kept aligned elementwise, but spread out into separate output axes (effectively an outer product). Hence, with np.tensordot we would end up with two axes of length N, one coming from the first axis of each input.
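For reference, a quick self-contained check with N = 5 and random data (names mirror the question's attributes, with the self. prefix dropped) confirms the vectorized version matches the loop:

import numpy as np

N = 5
rng = np.random.default_rng(0)
temp_rho = rng.standard_normal((N, 10)) + 1j*rng.standard_normal((N, 10))
constMatrix = rng.standard_normal((10, 10))
dinMat1 = rng.standard_normal((N, 10, 10))
dinMat2 = rng.standard_normal((N, 10, 10))

# Loop version from the question
res = np.zeros(temp_rho.shape, dtype=np.complex128)
for i in range(N):
    res[i] = np.dot(constMatrix, temp_rho[i])
    res[i] += np.dot(dinMat1[i], temp_rho[i])
    res[i] += np.dot(dinMat2[i], np.conj(temp_rho[i]))

# Vectorized version from this answer
out = (constMatrix.dot(temp_rho.T).T
       + np.einsum('ijk,ik->ij', dinMat1, temp_rho)
       + np.einsum('ijk,ik->ij', dinMat2, np.conj(temp_rho)))
print(np.allclose(res, out))  # True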
I'm currently trying to fill a matrix K where each entry in the matrix is just a function applied to two entries of an array x.
At the moment I'm using the most obvious method of running through rows and columns one at a time using a double for-loop:
K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    for j in range(x.shape[0]):
        K[i,j] = f(x[i], x[j])
While this works fine, the resulting matrix is 10,000 by 10,000 and takes very long to calculate. I was wondering if there is a more efficient way to do this built into NumPy?
EDIT: The function in question here is a gaussian kernel:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.dot(vec, vec)/(2*sigma**2))
where I set sigma in advance before calculating the matrix.
The array x is an array of shape (10000, 8). So the scalar product in the gaussian is between two vectors of dimension 8.
You can use a single for loop together with broadcasting. This requires changing the implementation of the gaussian function to accept 2D inputs:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.sum(vec**2, axis=-1)/(2*sigma**2))

K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    K[i] = gaussian(x[i:i+1], x, sigma)  # sigma is set in advance, as in the question
Theoretically you could accomplish this even without any for loop, again by using broadcasting, but here an intermediate array of size len(x)**2 * x.shape[1] will be created (10,000 × 10,000 × 8 float64 values, roughly 6.4 GB), which might run out of memory for your array sizes:
K = gaussian(x[None, :, :], x[:, None, :], sigma)
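As a quick sanity check on smaller, illustrative sizes (a sketch assuming sigma = 1.0 and the vectorised gaussian above), the looped and fully broadcast versions agree:

import numpy as np

x_small = np.random.random((100, 8))
sigma = 1.0
K_loop = np.zeros((100, 100), dtype=np.float32)
for i in range(100):
    K_loop[i] = gaussian(x_small[i:i+1], x_small, sigma)
K_bcast = gaussian(x_small[None, :, :], x_small[:, None, :], sigma)
print(np.allclose(K_loop, K_bcast))  # True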
I want to repeat a 1D array along the dimensions of another array, where that number of dimensions can change.
For example:
import numpy as np
to_repeat = np.linspace(0, 100, 10)
base_array = np.random.random((24, 60)) ## this one can have more than two dimensions.
final_array = np.array([[to_repeat for i in range(base_array.shape[0])] for j in range(base_array.shape[1])]).T
print(final_array.shape)
# >>> (10, 24, 60)
How can this be extended to an array base_array with an arbitrary number of dimensions?
Possibly using numpy vectorized functions in order to avoid loops?
EDIT (bigger picture):
base_array is in fact of shape (10, 24, 60) (if we stick to this example), where the coordinates along the first dimension are the vector to_repeat.
I'm looking for the minimum along the first dimension of base_array, and create the array of corresponding coordinates, here of shape (24, 60).
You don't need final_array; you can get the result you want with:
to_repeat[base_array.argmin(0)]
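If the repeated array itself is still needed, np.broadcast_to extends the example above to any number of dimensions without loops (a sketch; note that it returns a read-only view, so add .copy() if you need to write to the result):

import numpy as np

to_repeat = np.linspace(0, 100, 10)
base_array = np.random.random((24, 60))  # works unchanged for any ndim
final_array = np.broadcast_to(
    to_repeat.reshape((-1,) + (1,)*base_array.ndim),
    (len(to_repeat),) + base_array.shape
)
print(final_array.shape)  # (10, 24, 60)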
I have numpy array 'test' of dimension (100, 100, 16, 16) which gives me a different 16x16 array for points on a 100x100 grid.
I also have some eigenvalues and vectors where vals has the dimension (100, 100, 16) and vecs (100, 100, 16, 16) where vecs[x, y, :, i] would be the ith eigenvector of the matrix at the point (x, y) corresponding to the ith eigenvalue vals[x, y, i].
Now I want to take the first eigenvector of the array at ALL points on the grid, do a matrix product with the test matrix and then do a scalar product of the resulting vector with all the other eigenvectors of the array at all points on the grid and sum them.
The resulting array should have the dimension (100, 100). After this I would like to take the 2nd eigenvector, matrix-multiply it with test, and then take the scalar product of the result with all eigenvectors other than the 2nd, and so on, so that in the end I have 16 (100, 100) arrays, or rather a (100, 100, 16) array. So far I have only succeeded with a lot of for loops, which I would like to avoid, but using tensordot gives me the wrong dimensions and I don't see how to pick the axis that np.dot vectorizes along.
I heard that einsum might be suitable to this task, but everything that doesn't rely on the python loops is fine by me.
import numpy as np
from numpy import linalg as la
test = np.arange(16*16*100*100).reshape((100, 100, 16, 16))
vals, vecs = la.eig(test + 1)
np.tensordot(vecs, test, axes=[2, 3]).shape
>>> (100, 100, 16, 100, 100, 16)
EDIT: Ok, so I used np.einsum to get a correct intermediate result.
np.einsum('ijkl, ijkm -> ijlm', vecs, test)
But in the next step I want to do the scalarproduct only with all the other entries of vec. Can I implement maybe some inverse Kronecker delta in this einsum formalism? Or should I switch back to the usual numpy now?
Ok, I played around and with np.einsum I found a way to do what is described above. A nice feature of einsum is that an index occurring in both inputs that is repeated in the 'output' (to the right of the '->') is multiplied element-wise instead of summed over, so you can combine element-wise multiplication along some axes with contraction along others (something that you don't have in handwritten tensor algebra notation).
result = np.einsum('ijkl, ijlm -> ijkm', np.einsum('ijkl, ijkm -> ijlm', vecs, test), vecs)
This nearly does the trick. Now only the diagonal terms have to be taken out. We can do this by subtracting the diagonal terms like this:
result = result - result * np.eye(np.shape(test)[-1])[None, None, ...]
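If, as described in the question, the off-diagonal scalar products should then be summed per eigenvector, one more reduction gives the desired (100, 100, 16) array (a sketch based on the shapes above; use axis=-2 instead if the other factor plays the role of the fixed eigenvector):

final = result.sum(axis=-1)  # shape (100, 100, 16)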
I'm trying to implement the n-mode tensor-matrix product (as defined by Kolda and Bader: https://www.sandia.gov/~tgkolda/pubs/pubfiles/SAND2007-6702.pdf) efficiently in Python using Numpy. The operation effectively gets down to (for matrix U, tensor X and axis/mode k):
Extract all vectors along axis k from X by collapsing all other axes.
Multiply these vectors on the left by U using standard matrix multiplication.
Insert the vectors again into the output tensor, keeping the same shape except that X.shape[k] is now equal to U.shape[0] (for the matrix multiplication to work, X.shape[k] must initially be equal to U.shape[1]).
I've been using an explicit implementation for a while which performs all these steps separately:
Transpose the tensor to bring axis k to the front (in my full code I added an exception in case k == X.ndim - 1, in which case it's faster to leave it there and transpose all future operations, or at least in my application, but that's not relevant here).
Reshape the tensor to collapse all other axes.
Calculate the matrix multiplication.
Reshape the tensor to reconstruct all other axes.
Transpose the tensor back into the original order.
I would think this implementation creates a lot of unnecessary (big) arrays, so once I discovered np.einsum I thought this would speed things up considerably. However using the code below I got worse results:
import numpy as np
from time import time
def mode_k_product(U, X, mode):
    transposition_order = list(range(X.ndim))
    transposition_order[mode] = 0
    transposition_order[0] = mode
    Y = np.transpose(X, transposition_order)
    transposed_ranks = list(Y.shape)
    Y = np.reshape(Y, (Y.shape[0], -1))
    Y = U @ Y
    transposed_ranks[0] = Y.shape[0]
    Y = np.reshape(Y, transposed_ranks)
    Y = np.transpose(Y, transposition_order)
    return Y

def einsum_product(U, X, mode):
    axes1 = list(range(X.ndim))
    axes1[mode] = X.ndim + 1
    axes2 = list(range(X.ndim))
    axes2[mode] = X.ndim
    return np.einsum(U, [X.ndim, X.ndim + 1], X, axes1, axes2, optimize=True)
def test_correctness():
    A = np.random.rand(3, 4, 5)
    for i in range(3):
        B = np.random.rand(6, A.shape[i])
        X = mode_k_product(B, A, i)
        Y = einsum_product(B, A, i)
        print(np.allclose(X, Y))

def test_time(method, amount):
    U = np.random.rand(256, 512)
    X = np.random.rand(512, 512, 256)
    start = time()
    for i in range(amount):
        method(U, X, 1)
    return (time() - start)/amount

def test_times():
    print("Explicit:", test_time(mode_k_product, 10))
    print("Einsum:", test_time(einsum_product, 10))

test_correctness()
test_times()
Timings for me:
Explicit: 3.9450525522232054
Einsum: 15.873924326896667
Is this normal or am I doing something wrong? I know there are circumstances where storing intermediate results can decrease complexity (e.g. chained matrix multiplication); however, in this case I can't think of any calculations that are being repeated. Is matrix multiplication so optimized that it removes the benefits of not transposing (which technically has a lower complexity)?
I'm more familiar with the subscripts style of using einsum, so worked out these equivalences:
In [194]: np.allclose(np.einsum('ij,jkl->ikl',B0,A), einsum_product(B0,A,0))
Out[194]: True
In [195]: np.allclose(np.einsum('ij,kjl->kil',B1,A), einsum_product(B1,A,1))
Out[195]: True
In [196]: np.allclose(np.einsum('ij,klj->kli',B2,A), einsum_product(B2,A,2))
Out[196]: True
With a mode parameter, your approach in einsum_product may be best. But the equivalences help me visualize the calculation better, and may help others.
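For an arbitrary mode the subscript string can also be built programmatically; a sketch of a letter-based equivalent of einsum_product (the function name is illustrative):

def einsum_product_subscripts(U, X, mode):
    # Label X's axes 'a', 'b', 'c', ...; on the output side the mode
    # axis is replaced by the fresh letter 'z', which indexes U's rows.
    in_x = 'abcdefghijklmnopqrstuvwxy'[:X.ndim]
    out = in_x[:mode] + 'z' + in_x[mode+1:]
    return np.einsum(f'z{in_x[mode]},{in_x}->{out}', U, X, optimize=True)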
Timings should basically be the same. There's an extra setup time in einsum_product that should disappear in larger dimensions.
After updating Numpy, Einsum is only slightly slower than the explicit method, with or without multi-threading (see comments to my question).
I am wondering how to use np.reshape to reshape a long vector into n columns array without giving the row numbers.
Normally I can find the number of rows with len(a)//n:
a = np.arange(0, 10)
n = 2
b = a.reshape(len(a)//n,n)
Is there a more direct way, without using len(a)//n?
You can use -1 for one dimension; numpy will figure out what this number should be:
a = np.arange(0, 10)
n = 2
b = a.reshape(-1, n)
The doc is pretty clear about this feature: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
One shape dimension can be -1. In this case, the value is inferred
from the length of the array and remaining dimensions.
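For example, a quick check of the inferred dimension:

import numpy as np

a = np.arange(0, 10)
print(a.reshape(-1, 2).shape)  # (5, 2)
print(a.reshape(-1, 5).shape)  # (2, 5)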