Say, I have a matrix: v_mat=[v_1, v_2, ... v_N] where v_i is the i-th vector. Here v_i is drawn from R^N. I am looking to calculate:
result = np.zeros((N, N, N))
for i in range(N):
for j in range(N):
for k in range(N):
result[i, j, k] = v[i, j] - v[i, k]
Essentially, I want to efficiently determine the difference between every pair of elements in a vector, for N vectors.
I have tried:
v_diff = v_mat.T[:, :, np.newaxis] - v_mat.T[:, np.newaxis, :]
where v_diff is a NxNxN matrix, and index ijk represents:
i: i-th vector
j: j-th eleemnt
k: k-th element,
so v[i,j,k] = v[i, j] - v[i, k]
I tested the execution time of this function:
On: 10
v_diff_mat_time: 3.783032298088074e-05 seconds
On: 100
v_diff_mat_time: 0.002480798400938511 seconds
On: 1000
v_diff_mat_time: 47.49346733186394 seconds
On: 10000
MemoryError: Unable to allocate 7.28 TiB for an array with shape (10000, 10000, 10000) and data type float64
There seem to be two problems. 1) the run-time of calculating v_diff is growing exponentially 2) the space needed to store v_diff is also growing.
I think I can optimize 1) since v[i,j] - v[i,k] = -(v[i,k] - v[i,j]). Therefore, I need only compute half of i,j. I am not sure how to write this in numpy though.
I am not sure how to address 2), and would appreciate any suggestions on the matter.
Related
Suppose that I have a sparse matrix X with dimensions (N*J) x K. I want to compute the following sum:
sum_Xi = 0
for i in range(N):
Xi = X[i*J:(i+1)*J,:] # (J x K)
Xi_sum = Xi.sum(axis=0) # (1 x K)
temp = Xi_sum.transpose().dot(Xi_sum) # (K x K)
sum_Xi += temp
That is, my sparse matrix has this "block" structure with N blocks of dimension J x K. The end result of this sum is K x K. Obviously, the sum above is very inefficient, but doesn't rely on any intermediate (and potentially large) matrices.
My current approach is the following
V = csr_matrix((np.ones(N*J), (np.repeat(range(N), J), range(N*J)))) # (N x (N*J))
temp = V.dot(X) # (N x K)
sum_Xi = temp.transpose().dot(temp).toarray(). # (K x K)
which is significantly faster, but K <<< N < N*J so I am not thrilled with having this V and V.dot(X) sit in memory when the end result is so much smaller.
Any advice?
Thank you in advance!!
Edit: making V csr instead of csc is obvious improvement, changing above.
In Python, I have a matrix K of dimensions (N x N). I want to normalize K by dividing every entry K_ij by sqrt(K_(i,i)*K_(j,j)). What is a fast way to achieve this in Python without iterating through every entry?
My current solution is:
import numpy as np
K = np.random.rand(3,3)
diag = np.diag(K)
for i in range(np.shape(K)[0]):
for j in range(np.shape(K)[1]):
K[i,j] = K[i,j]/np.sqrt(diag[i]*diag[j])
Of course you have to iterate through every entry, at least internally. For square matrices:
K / np.sqrt(np.einsum('ii,jj->ij', K, K))
If the matrix is not square, you first have to define what should replace the "missing" values K[i,i] where i > j etc.
Alternative: use numba to leave your loop as is, get free speedup, and even avoid intermediate allocation:
#njit
def normalize(K):
M = np.empty_like(K)
m, n = K.shape
for i in range(m):
Kii = K[i,i]
for j in range(n):
Kjj = K[j,j]
M[i,j] = K[i,j] / np.sqrt(Kii * Kjj)
return M
I have two vectors X = [a,b,c,d] and Y = [m,n,o]. I'd like to construct a matrix M where each element is an operation on each pair from X and Y. i.e.
M[j,i] = f(X[i], Y[j])
# e.g. where f(x,y) = x-y:
M :=
a-m b-m c-m d-m
a-n b-n c-n d-n
a-o b-o c-o d-o
I imagine I could do this with two tf.while_loop(), but that seems inefficient, I was wondering if there is a more compact and parallel way of doing this.
P.S. There is a slight complication that X and Y are in fact not vectors, but R2. i.e. each element in X and Y is itself a fixed length vector, and f(X, Y) performs f() element wise. Plus there is a batch component too.
I.e.
X.shape => [BATCH, I, K]
Y.shape => [BATCH, J, K]
M[batch, j, i, k] = f( X[batch, i, k], Y[batch, j, k] )
# e.g.:
= X[batch, i, k] - Y[batch, j, k]
this is using the python API btw
I found a way of doing this by increasing rank and using broadcasting. I still don't know if this is the most efficient way of doing it, but it's a heck of a lot better than using tf.while_loop I guess! I'm still open to suggestions / improvements.
X_expand = tf.expand_dims(X, 1)
Y_expand = tf.expand_dims(Y, 2)
# now I think M = f(X,Y) will broadcast each tensor to the higher dimension on each axis duplicating the data e.g.:
M = X-Y
I have a large 2D array of size Nx3. This array contains point cloud data in (X,Y,Z) format. I am using Python in Ubuntu in a virtual environment to read data from a .ply file.
When I am trying to find the covariance of this array with rowvar set to True (meaning each row being considered a variable), I am getting MemoryError.
I understand that this is creating a very large array, apparently too large for my 8 Gb allocated memory to handle. Without increasing memory allocation, is there a different way of getting around this issue? Are there different methods of calculating the covariance matrix elements so that the memory is not overloaded?
You could chop it up in a loop and keep the upper triangle only.
import numpy as np
N = 23000
a = np.random.random((N, 3))
c = a - a.mean(axis=-1, keepdims=True)
out = np.empty((N*(N+1) // 2,))
def ravel_triu(i, j, n):
i, j = np.where(i>j, np.broadcast_arrays(j, i), np.broadcast_arrays(i, j))
return i*n - i*(i+1) // 2 + j
def unravel_triu(k, n):
i = n - (0.5 + np.sqrt(n*(n+1) - 2*k - 1)).astype(int)
return i, k - (i*n - i*(i+1) // 2)
ii, jj = np.ogrid[:N, :N]
for j in range(0, N, 500):
out[ravel_triu(j, j, N):ravel_triu(min(N, j+500), min(N, j+500), N)] \
= np.einsum(
'i...k,...jk->ij', c[j:j+500], c[j:]) [ii[j:j+500] <= jj[:, j:]]
Obviously your covariances will be quite undersampled and the covariance matrix highly rank-defective...
I'm currently writting code where I need to compute as fast as possible a kind of inner product between three 2-D arrays.
Let's call them a,b,c. They all have the same size (N x M).
I want to compute the following 3-d array, op, of size (N x N x N), such that op[i, j, k] is the sum over m of the a[i, m] b[j, m] c[k, m]
(click here for the nice Latex formula)
This is basically an extend version of np.inner to 3 inputs rather than 2.
In practice, the dimensions I will run into are something like N = 100 and M = 300 000. The matrices are not going to be sparse at all, so op contains about 1 million nonzero values.
So far, I've attempted two methods.
The first one uses broadcasting:
import numpy as np
N = 100
M = 300000
a = np.random.randn(N, M)
b = np.random.randn(N, M)
c = np.random.randn(N, M)
def method1(a, b, c):
a_i = a[:, None, None, :]
b_j = b[None, :, None, :]
c_k = c[None, None, :, :]
return np.sum(a_i * b_j * c_k, axis=3)
The problem with this is that it first computes a_i * b_j * c_k which is an N x N x N x M array, so in my case it is simply too much to handle.
I've tried another method using np.einsum, and it is much faster than the previous method:
def method2(a, b, c):
return np.einsum('im,jm,km', a, b, c)
My problem is that it is still too slow. For N = 100 and M = 30 000, it already takes 95 seconds to run on my computer, so taking M to its actual value of 300 000 is impossible.
My question is: do you know any pythonic way to solve my problem (maybe a magic numpy function?), or do I have to resort to things like cython or numba to actually make this computation feasible?
Thanks in advance for any help!
Very interesting one and related to this other problem.
Approach #1: For decent size arrays
Based on the winning approach there at the above mentioned Q&A, here's one solution -
np.tensordot(a[:,None]*b,c,axes=(2,1))
Explanation :
1) a[:,None]*b : Get a 3D array of shape (N, N, M). So, for the use case, it would be (100, 100, 30000), which might be a bit too much for regular systems, but might just work out given some extra system memory juice.
2) np.tensordot(..): Next up, we would sum-reduce that last axis from previous step with tensor-dot against the third array c to have a (100, 100, 100) shaped output array.
Approach #2: For very large arrays and with b identical to c
out = np.zeros((N, N, N))
for i in range(N):
for j in range(N):
for k in range(j+1):
out[i,j,k] = np.einsum('i,i,i->',a[i],b[j],b[k])
r,c = np.triu_indices(N,1)
out[np.arange(N)[:,None], r,c] = out[np.arange(N)[:,None], c,r]