Efficient sparse matrix multiplication special case - python

Suppose that I have a sparse matrix X with dimensions (N*J) x K. I want to compute the following sum:
sum_Xi = 0
for i in range(N):
    Xi = X[i*J:(i+1)*J, :]                 # (J x K)
    Xi_sum = Xi.sum(axis=0)                # (1 x K)
    temp = Xi_sum.transpose().dot(Xi_sum)  # (K x K)
    sum_Xi += temp
That is, my sparse matrix has a "block" structure with N blocks of dimension J x K, and the end result of the sum is K x K. The loop above is slow, but it does not rely on any intermediate (and potentially large) matrices.
My current approach is the following:
import numpy as np
from scipy.sparse import csr_matrix
V = csr_matrix((np.ones(N*J), (np.repeat(range(N), J), range(N*J))))  # (N x (N*J))
temp = V.dot(X)                                # (N x K)
sum_Xi = temp.transpose().dot(temp).toarray()  # (K x K)
This is significantly faster, but K <<< N < N*J, so I am not thrilled about having V and V.dot(X) sit in memory when the end result is so much smaller.
Any advice?
Thank you in advance!!
Edit: making V CSR instead of CSC is an obvious improvement; I have changed the code above accordingly.
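One way to bound the memory is to process the blocks in chunks and accumulate the K x K result as you go. The following is a minimal sketch under that assumption, not a benchmarked solution; the function name block_sum_outer and the chunk parameter are hypothetical, and the small random X only exists to check the result against the original loop.

```python
import numpy as np
from scipy.sparse import csr_matrix

def block_sum_outer(X, N, J, chunk=1024):
    """Accumulate sum_i s_i^T s_i, where s_i is the column sum of block i."""
    X = csr_matrix(X)                  # CSR makes row slicing cheap
    K = X.shape[1]
    total = np.zeros((K, K))
    for start in range(0, N, chunk):
        stop = min(N, start + chunk)
        b = stop - start               # number of blocks in this chunk
        # Small indicator matrix: row r sums the J rows of block r.
        V = csr_matrix(
            (np.ones(b * J), (np.repeat(np.arange(b), J), np.arange(b * J))),
            shape=(b, b * J),
        )
        S = V @ X[start * J:stop * J, :]   # (b x K) block sums
        total += (S.T @ S).toarray()       # (K x K) contribution
    return total

# Small check against the original loop.
N, J, K = 7, 3, 4
rng = np.random.default_rng(0)
dense = rng.random((N * J, K)) * (rng.random((N * J, K)) < 0.5)
X = csr_matrix(dense)

expected = np.zeros((K, K))
for i in range(N):
    s = np.asarray(X[i * J:(i + 1) * J, :].sum(axis=0))  # (1 x K)
    expected += s.T @ s

result = block_sum_outer(X, N, J, chunk=2)
```

With this layout, only a (chunk x chunk*J) indicator and a (chunk x K) intermediate exist at any time, so chunk trades memory against the number of sparse products.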


Fastest Way to Compute Element-wise Difference Between Every Two Vectors

Say I have a matrix v_mat = [v_1, v_2, ..., v_N], where v_i is the i-th vector and each v_i is drawn from R^N. Writing v = v_mat.T, so that v[i] is the i-th vector, I want to calculate:
result = np.zeros((N, N, N))
for i in range(N):
    for j in range(N):
        for k in range(N):
            result[i, j, k] = v[i, j] - v[i, k]
Essentially, I want to efficiently determine the difference between every pair of elements in a vector, for N vectors.
I have tried:
v_diff = v_mat.T[:, :, np.newaxis] - v_mat.T[:, np.newaxis, :]
where v_diff is an N x N x N array, and index ijk represents:
i: i-th vector
j: j-th element
k: k-th element,
so v_diff[i, j, k] = v[i, j] - v[i, k]
I tested the execution time of this function:
On: 10
v_diff_mat_time: 3.783032298088074e-05 seconds
On: 100
v_diff_mat_time: 0.002480798400938511 seconds
On: 1000
v_diff_mat_time: 47.49346733186394 seconds
On: 10000
MemoryError: Unable to allocate 7.28 TiB for an array with shape (10000, 10000, 10000) and data type float64
There seem to be two problems: 1) the run-time of calculating v_diff grows very quickly (the array has N^3 entries, so the work is cubic in N), and 2) the space needed to store v_diff grows at the same rate.
I think I can optimize 1), since v[i,j] - v[i,k] = -(v[i,k] - v[i,j]). Therefore, I only need to compute half of the (j, k) pairs. I am not sure how to write this in numpy though.
I am not sure how to address 2), and would appreciate any suggestions on the matter.
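As a rough sketch of both ideas: np.triu_indices can pick out only the pairs with j < k (exploiting the antisymmetry), and a generator can produce the full differences a few vectors at a time so that the N x N x N array never exists in memory. The helper names pairwise_diffs_upper and iter_diffs and the chunk parameter are hypothetical, and v is assumed to hold one vector per row (v = v_mat.T).

```python
import numpy as np

def pairwise_diffs_upper(v):
    """Differences v[i, j] - v[i, k] for j < k only (one row per vector)."""
    j, k = np.triu_indices(v.shape[1], k=1)
    return v[:, j] - v[:, k], j, k

def iter_diffs(v, chunk=64):
    """Yield (start, diffs) where diffs covers vectors start:start+chunk."""
    for start in range(0, v.shape[0], chunk):
        block = v[start:start + chunk]
        # shape (len(block), n_elements, n_elements)
        yield start, block[:, :, np.newaxis] - block[:, np.newaxis, :]

# Small check against the full broadcasted result.
v = np.arange(12.0).reshape(3, 4) ** 2
full = v[:, :, np.newaxis] - v[:, np.newaxis, :]
upper, j, k = pairwise_diffs_upper(v)
chunks = np.concatenate([d for _, d in iter_diffs(v, chunk=2)], axis=0)
```

The upper-triangle form roughly halves the storage, but it is still cubic in N; the chunked generator only helps if each chunk can be consumed (reduced, written to disk, and so on) before the next one is produced.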

Is there a way to speed up these nested loops (Laplacian case) in Python?

I am trying to speed up the nested loop in my function Gram.
The case that causes a big delay is the Laplacian (Abel) kernel, because for each cell of the matrix it has to compute the norm of the difference between a row of X and a row of Y.
import numpy as np
abel = lambda x, y, t, p: np.exp(-np.abs(p) * np.linalg.norm(x - y))
def Gram(X, Y, function, t, p):
    n = X.shape[0]
    s = Y.shape[0]
    K = np.zeros((n, s))
    if function == abel:
        for i in range(n):
            for j in range(s):
                K[i, j] = abel(X[i, :], Y[j, :], t, p)
    else:
        K = polynomial(X, Y, t, p)  # polynomial is defined elsewhere (not shown)
    return K
I was able to speed up the function a bit by keeping the exponential out of the abel function and applying it to the whole matrix afterwards (don't mind the t and p, they are unused here).
abel_2 = lambda x, y, t, p: np.linalg.norm(x - y)
def Gram_2(X, Y, function, t, p):
    n = X.shape[0]
    s = Y.shape[0]
    K = np.zeros((n, s))
    if function == abel_2:
        for i in range(n):
            for j in range(s):
                K[i, j] = abel_2(X[i, :], Y[j, :], 0, 0)
        K = np.exp(-abs(p) * K)  # apply the exponential once, to the whole matrix
    else:
        K = polynomial(X, Y, t, p)
    return K
This reduces the time by 50%; however, I believe the nested loops are still a major problem.
Can someone help with this?
Thank you!
Basically, instead of going through the loops one by one to subtract Y[j,:] from X[i,:], it saves a lot of time to select X[i,:], subtract all of Y from it at once, and then apply the norm along one axis.
In my case it was axis=1.
def Gram_10(X, Y, function, t, p):
    n = X.shape[0]
    s = Y.shape[0]
    K = np.zeros((n, s))
    if function == abel_2:
        for i in range(n):
            # It is important to use the correct slice (:s), so that the vector
            # returned by the norm (one distance per row of Y) lands in row i of K.
            K[i, :s] = np.linalg.norm(X[i, :] - Y, axis=1)
        K = np.exp(-abs(p) * K)
    else:
        K = polynomial(X, Y, t, p)
    return K
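The remaining loop over i can also be removed. A minimal sketch, assuming SciPy is available: scipy.spatial.distance.cdist computes all pairwise Euclidean distances between the rows of X and the rows of Y in one call. The function name gram_abel is hypothetical, and the loop at the end is only there to check the result on small random data.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gram_abel(X, Y, p):
    """Laplacian (Abel) kernel matrix: exp(-|p| * ||x_i - y_j||)."""
    D = cdist(X, Y, metric="euclidean")   # (n x s) pairwise distances
    return np.exp(-abs(p) * D)

# Small check against the explicit double loop.
rng = np.random.default_rng(0)
X = rng.random((5, 3))
Y = rng.random((4, 3))
p = -0.7

K_fast = gram_abel(X, Y, p)
K_loop = np.zeros((5, 4))
for i in range(5):
    for j in range(4):
        K_loop[i, j] = np.exp(-abs(p) * np.linalg.norm(X[i, :] - Y[j, :]))
```

For very large n and s the (n x s) distance matrix itself becomes the memory bottleneck, in which case the row-by-row version above may still be the better trade-off.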

Quick way to divide matrix entries K_ij by K_ii*K_jj in Python

In Python, I have a matrix K of dimensions (N x N). I want to normalize K by dividing every entry K_ij by sqrt(K_(i,i)*K_(j,j)). What is a fast way to achieve this in Python without iterating through every entry?
My current solution is:
import numpy as np
K = np.random.rand(3, 3)
diag = np.diag(K)
for i in range(np.shape(K)[0]):
    for j in range(np.shape(K)[1]):
        K[i, j] = K[i, j] / np.sqrt(diag[i] * diag[j])
Of course you have to iterate through every entry, at least internally. For square matrices:
K / np.sqrt(np.einsum('ii,jj->ij', K, K))
If the matrix is not square, you first have to define what should replace the "missing" values K[i,i] where i > j etc.
Alternative: use numba to leave your loop as is, get a free speedup, and even avoid the intermediate allocation:
import numpy as np
from numba import njit
@njit
def normalize(K):
    M = np.empty_like(K)
    m, n = K.shape
    for i in range(m):
        Kii = K[i, i]
        for j in range(n):
            Kjj = K[j, j]
            M[i, j] = K[i, j] / np.sqrt(Kii * Kjj)
    return M
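For comparison, here is a plain NumPy sketch of the same normalization using an outer product of the square-rooted diagonal. It assumes K is square with a strictly positive diagonal (so that sqrt(K_ii * K_jj) equals sqrt(K_ii) * sqrt(K_jj)); the function name normalize_by_diag is hypothetical.

```python
import numpy as np

def normalize_by_diag(K):
    """Return K[i, j] / sqrt(K[i, i] * K[j, j]) for a square K."""
    d = np.sqrt(np.diag(K))        # sqrt of the diagonal, shape (N,)
    return K / np.outer(d, d)      # outer(d, d)[i, j] = d[i] * d[j]

# Small check against the explicit double loop.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
K = A @ A.T + np.eye(4)            # symmetric with a positive diagonal

expected = np.empty_like(K)
for i in range(4):
    for j in range(4):
        expected[i, j] = K[i, j] / np.sqrt(K[i, i] * K[j, j])

M = normalize_by_diag(K)
```

Unlike the in-place loop in the question, this returns a new array and leaves K unchanged.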

How to create a matrix from combinations of elements from two vectors in TensorFlow

I have two vectors X = [a,b,c,d] and Y = [m,n,o]. I'd like to construct a matrix M where each element is an operation on each pair from X and Y. i.e.
M[j,i] = f(X[i], Y[j])
# e.g. where f(x,y) = x-y:
M :=
a-m b-m c-m d-m
a-n b-n c-n d-n
a-o b-o c-o d-o
I imagine I could do this with two tf.while_loop() calls, but that seems inefficient. I was wondering if there is a more compact and parallel way of doing this.
P.S. There is a slight complication: X and Y are in fact not vectors but rank-2 tensors, i.e. each element in X and Y is itself a fixed-length vector, and f(X, Y) applies f() element-wise. There is also a batch dimension.
X.shape => [BATCH, I, K]
Y.shape => [BATCH, J, K]
M[batch, j, i, k] = f( X[batch, i, k], Y[batch, j, k] )
# e.g.:
M[batch, j, i, k] = X[batch, i, k] - Y[batch, j, k]
This is using the Python API, by the way.
I found a way of doing this by increasing rank and using broadcasting. I still don't know if this is the most efficient way of doing it, but it's a heck of a lot better than using tf.while_loop I guess! I'm still open to suggestions / improvements.
X_expand = tf.expand_dims(X, 1)  # [BATCH, 1, I, K]
Y_expand = tf.expand_dims(Y, 2)  # [BATCH, J, 1, K]
# M = f(X_expand, Y_expand) now broadcasts both tensors to [BATCH, J, I, K], e.g.:
M = X_expand - Y_expand
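To sanity-check the shapes without a TensorFlow session, the same expand-and-broadcast pattern can be tried in NumPy, which uses the same broadcasting rules. This is only an illustrative analogue with made-up sizes, not TensorFlow code; np.expand_dims plays the role of tf.expand_dims here.

```python
import numpy as np

BATCH, I, J, K = 2, 4, 3, 5            # arbitrary small sizes for the check
rng = np.random.default_rng(0)
X = rng.random((BATCH, I, K))
Y = rng.random((BATCH, J, K))

X_expand = np.expand_dims(X, 1)        # [BATCH, 1, I, K]
Y_expand = np.expand_dims(Y, 2)        # [BATCH, J, 1, K]
M = X_expand - Y_expand                # broadcasts to [BATCH, J, I, K]

# Reference result built with explicit loops over j and i.
M_loop = np.empty((BATCH, J, I, K))
for j in range(J):
    for i in range(I):
        M_loop[:, j, i, :] = X[:, i, :] - Y[:, j, :]
```

The size-1 axes are what let each tensor be repeated along the other's index, so M[batch, j, i, k] ends up equal to X[batch, i, k] - Y[batch, j, k].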

Numpy.Cov of a Large Nx3 Array Produces MemoryError

I have a large 2D array of size Nx3. This array contains point cloud data in (X,Y,Z) format. I am using Python in Ubuntu in a virtual environment to read data from a .ply file.
When I try to compute the covariance of this array with rowvar set to True (meaning each row is treated as a variable), I get a MemoryError.
I understand that this creates a very large (N x N) array, apparently too large for my 8 GB of allocated memory to handle. Without increasing the memory allocation, is there a different way of getting around this issue? Are there different methods of calculating the covariance matrix elements so that the memory is not overloaded?
You could chop it up in a loop and keep the upper triangle only.
import numpy as np
N = 23000
a = np.random.random((N, 3))
c = a - a.mean(axis=-1, keepdims=True)   # center each row (variable)
out = np.empty((N*(N+1) // 2,))          # packed upper triangle, row-major
def ravel_triu(i, j, n):
    # flat index of entry (i, j) in the packed upper triangle
    i, j = np.where(i>j, np.broadcast_arrays(j, i), np.broadcast_arrays(i, j))
    return i*n - i*(i+1) // 2 + j
def unravel_triu(k, n):
    # inverse of ravel_triu: flat index k -> (i, j)
    i = n - (0.5 + np.sqrt(n*(n+1) - 2*k - 1)).astype(int)
    return i, k - (i*n - i*(i+1) // 2)
ii, jj = np.ogrid[:N, :N]
for j in range(0, N, 500):
    # 500 rows at a time; divide by (3 - 1) to match np.cov's default normalization
    out[ravel_triu(j, j, N):ravel_triu(min(N, j+500), min(N, j+500), N)] \
        = (np.einsum(
            'i...k,...jk->ij', c[j:j+500], c[j:]) / (c.shape[1] - 1))[ii[j:j+500] <= jj[:, j:]]
Obviously, with only 3 observations per variable, your covariances will be quite undersampled and the covariance matrix highly rank-deficient...
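If the packed upper triangle is still too large to hold, another option is to never store the matrix and instead produce it in row blocks that are consumed one at a time. The sketch below assumes that is acceptable for the downstream use; the generator name cov_row_blocks and the block parameter are hypothetical, and the small array at the end is only used to compare against np.cov.

```python
import numpy as np

def cov_row_blocks(a, block=500):
    """Yield (start, stop, rows start:stop of np.cov(a)) without building it whole."""
    c = a - a.mean(axis=1, keepdims=True)   # center each row (variable)
    norm = a.shape[1] - 1                    # np.cov's default normalization
    for start in range(0, a.shape[0], block):
        stop = min(a.shape[0], start + block)
        yield start, stop, c[start:stop] @ c.T / norm

# Small check against np.cov with rows as variables.
rng = np.random.default_rng(0)
a = rng.random((20, 3))
full = np.vstack([blk for _, _, blk in cov_row_blocks(a, block=7)])
reference = np.cov(a)
```

Each yielded block has shape (block x N), so peak memory is governed by block rather than by N squared.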
