Numpy.cov of a Large Nx3 Array Produces MemoryError

I have a large 2D array of size Nx3. This array contains point cloud data in (X,Y,Z) format. I am using Python in Ubuntu in a virtual environment to read data from a .ply file.
When I try to find the covariance of this array with rowvar set to True (meaning each row is treated as a variable), I get a MemoryError.
I understand that this creates a very large (N x N) array, apparently too large for my 8 GB of allocated memory to handle. Without increasing the memory allocation, is there a different way of getting around this issue? Are there different methods of calculating the covariance matrix elements so that memory is not overloaded?

You could chop it up in a loop and keep the upper triangle only.
import numpy as np

N = 23000
a = np.random.random((N, 3))

# Center each row (variable) by its own mean across the 3 observations.
c = a - a.mean(axis=-1, keepdims=True)

# Packed storage for the upper triangle (including the diagonal) of the N x N matrix.
out = np.empty((N*(N+1) // 2,))

def ravel_triu(i, j, n):
    # Map (i, j), swapping so that i <= j, to its index in the packed upper triangle.
    i, j = np.where(i > j, np.broadcast_arrays(j, i), np.broadcast_arrays(i, j))
    return i*n - i*(i+1) // 2 + j

def unravel_triu(k, n):
    # Inverse mapping: packed index k back to (i, j).
    i = n - (0.5 + np.sqrt(n*(n+1) - 2*k - 1)).astype(int)
    return i, k - (i*n - i*(i+1) // 2)

ii, jj = np.ogrid[:N, :N]

# Process 500 rows at a time; each chunk fills a contiguous slice of `out`.
for j in range(0, N, 500):
    out[ravel_triu(j, j, N):ravel_triu(min(N, j+500), min(N, j+500), N)] \
        = np.einsum(
            'i...k,...jk->ij', c[j:j+500], c[j:])[ii[j:j+500] <= jj[:, j:]]
Obviously your covariances will be quite undersampled (only 3 observations per variable) and the covariance matrix highly rank-deficient...
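To read individual entries back out of the packed triangle you can reuse ravel_triu. A minimal sketch (my addition, not part of the original answer): it assumes the arrays defined above, and remembers that the loop stores raw dot products of the centered rows, not yet divided by n_obs - 1 = 2 as np.cov would do.
# Sketch: recover cov(row_i, row_j) from the packed upper triangle `out`.
def cov_entry(i, j, out, n):
    # divide by (observations - 1) = 2 to match np.cov's normalization
    return out[ravel_triu(i, j, n)] / 2.0

print(cov_entry(0, 10, out, N))   # covariance of rows 0 and 10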

Related

Fastest Way to Compute Element-wise Difference Between Every Two Vectors

Say I have a matrix v_mat = [v_1, v_2, ..., v_N], where v_i is the i-th vector and each v_i is drawn from R^N. I am looking to calculate:
result = np.zeros((N, N, N))
for i in range(N):
    for j in range(N):
        for k in range(N):
            result[i, j, k] = v[i, j] - v[i, k]
Essentially, I want to efficiently determine the difference between every pair of elements in a vector, for N vectors.
I have tried:
v_diff = v_mat.T[:, :, np.newaxis] - v_mat.T[:, np.newaxis, :]
where v_diff is an NxNxN array, and index ijk represents:
i: i-th vector
j: j-th element
k: k-th element,
so v_diff[i, j, k] = v[i, j] - v[i, k]
I tested the execution time of this function:
On: 10
v_diff_mat_time: 3.783032298088074e-05 seconds
On: 100
v_diff_mat_time: 0.002480798400938511 seconds
On: 1000
v_diff_mat_time: 47.49346733186394 seconds
On: 10000
MemoryError: Unable to allocate 7.28 TiB for an array with shape (10000, 10000, 10000) and data type float64
There seem to be two problems: 1) the run time of calculating v_diff grows quickly (cubically in N, since the output has N^3 entries), and 2) the space needed to store v_diff grows just as fast.
I think I can optimize 1), since v[i, j] - v[i, k] = -(v[i, k] - v[i, j]); therefore I only need to compute half of the (j, k) pairs. I am not sure how to write this in numpy, though.
I am not sure how to address 2), and would appreciate any suggestions on the matter.
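One possible direction, sketched here as an assumption of mine rather than an answer from the thread: keep only the j < k half of the pairs and fill the result one vector at a time, so the largest live array is N x N*(N-1)/2 instead of N x N x N. This assumes v_mat's columns are the vectors, as in the question.
import numpy as np

def pairwise_diffs_upper(v_mat):
    vt = v_mat.T                              # row i is the i-th vector
    n_vec, n_elem = vt.shape
    ju, ku = np.triu_indices(n_elem, k=1)     # j < k index pairs
    out = np.empty((n_vec, ju.size))
    for i in range(n_vec):
        out[i] = vt[i, ju] - vt[i, ku]        # the j > k half is just the negation
    return out, ju, ku
This halves memory but still stores N * N*(N-1)/2 numbers when the vectors live in R^N, so for N = 10000 it will not fit either; at that size the differences have to be consumed chunk by chunk (e.g. per vector) rather than stored.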

Efficient sparse matrix multiplication special case

Suppose that I have a sparse matrix X with dimensions (N*J) x K. I want to compute the following sum:
sum_Xi = 0
for i in range(N):
    Xi = X[i*J:(i+1)*J, :]                 # (J x K)
    Xi_sum = Xi.sum(axis=0)                # (1 x K)
    temp = Xi_sum.transpose().dot(Xi_sum)  # (K x K)
    sum_Xi += temp
That is, my sparse matrix has this "block" structure with N blocks of dimension J x K. The end result of this sum is K x K. Obviously, the sum above is very inefficient, but doesn't rely on any intermediate (and potentially large) matrices.
My current approach is the following
from scipy.sparse import csr_matrix

V = csr_matrix((np.ones(N*J), (np.repeat(range(N), J), range(N*J))))  # (N x (N*J))
temp = V.dot(X)  # (N x K)
sum_Xi = temp.transpose().dot(temp).toarray()  # (K x K)
which is significantly faster, but K << N < N*J, so I am not thrilled about having V and V.dot(X) sit in memory when the end result is so much smaller.
Any advice?
Thank you in advance!!
Edit: making V csr instead of csc is an obvious improvement; changed above.
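One way to keep the intermediates small, sketched here under the assumption that the block structure is exactly as described (this is my sketch, not code from the thread): build the selector and the block sums for a chunk of blocks at a time and accumulate the K x K result, so only a (chunk x K) dense intermediate exists at any time.
import numpy as np
from scipy import sparse

def blockwise_gram(X, N, J, chunk=1000):
    K = X.shape[1]
    acc = np.zeros((K, K))
    for start in range(0, N, chunk):
        stop = min(N, start + chunk)
        n_blk = stop - start
        # Selector that sums each group of J consecutive rows within this chunk.
        V = sparse.csr_matrix(
            (np.ones(n_blk * J),
             (np.repeat(np.arange(n_blk), J), np.arange(n_blk * J))),
            shape=(n_blk, n_blk * J))
        S = V.dot(X[start * J:stop * J, :]).toarray()   # (n_blk x K) block sums
        acc += S.T.dot(S)
    return acc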

Quick way to divide matrix entries K_ij by K_ii*K_jj in Python

In Python, I have a matrix K of dimensions (N x N). I want to normalize K by dividing every entry K_ij by sqrt(K_ii * K_jj). What is a fast way to achieve this in Python without iterating through every entry?
My current solution is:
import numpy as np
K = np.random.rand(3,3)
diag = np.diag(K)
for i in range(np.shape(K)[0]):
    for j in range(np.shape(K)[1]):
        K[i,j] = K[i,j]/np.sqrt(diag[i]*diag[j])
Of course you have to iterate through every entry, at least internally. For square matrices:
K / np.sqrt(np.einsum('ii,jj->ij', K, K))
If the matrix is not square, you first have to define what should replace the "missing" diagonal values K[i,i] beyond the shorter dimension.
Alternative: use numba to leave your loop as is, get free speedup, and even avoid intermediate allocation:
from numba import njit

@njit
def normalize(K):
    M = np.empty_like(K)
    m, n = K.shape
    for i in range(m):
        Kii = K[i,i]
        for j in range(n):
            Kjj = K[j,j]
            M[i,j] = K[i,j] / np.sqrt(Kii * Kjj)
    return M
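For square matrices there is also a broadcasting variant, equivalent to the einsum one-liner above (my sketch, not from the thread):
import numpy as np

# Divide by the outer product of the square-rooted diagonal.
d = np.sqrt(np.diag(K))
K_normalized = K / (d[:, None] * d[None, :])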

How to increase speed while maintaining memory with numpy arrays?

I need to write code to do a one-sample t-test given the sample mean (E(X)) and the sample second raw moment (E(X^2)) for each entry in a 2-dimensional array.
There are two ways I am doing this, but neither is quite working.
With numpy vectorized operations - an out-of-memory error for certain sizes of the array.
import numpy as np
from scipy.stats import t

def calc_normal_pvals(vt_sum_counter, vt_ssum_counter):
    global nsubs
    vt_sum_counter = vt_sum_counter/nsubs
    vt_ssum_counter = vt_ssum_counter/nsubs
    sample_var = nsubs * (vt_ssum_counter - np.square(vt_sum_counter))/(nsubs - 1)
    t_array = np.divide(vt_sum_counter, np.sqrt(sample_var/nsubs))
    pvals = t.sf(t_array, nsubs-1)
    pvals[np.isnan(pvals)] = 0
    return pvals
Normal for loop method - takes a lot of time in comparison
def calc_normal_pvals(vt_sum_counter, vt_ssum_counter, tail=1):
    global nsubs
    V, T = vt_sum_counter.shape
    pvals = np.zeros((V, T))
    for i in range(V):
        for j in range(T):
            sigma = ((vt_ssum_counter[i, j]/nsubs - (vt_sum_counter[i, j]/nsubs)**2)/(nsubs - 1))**0.5
            if (sigma != 0):
                pvals[i, j] = t.sf(vt_sum_counter[i, j]/(nsubs*sigma), nsubs-1)
    return pvals
The input arrays are huge - typically of size ~ 900000 X 400.
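Since each row is computed independently, one way to keep the vectorized version within memory (a sketch of mine under the assumption that nsubs and the two counter arrays are as described, not the asker's code) is to run the same computation on row chunks, so the temporaries are at most (chunk x 400) instead of (900000 x 400):
import numpy as np
from scipy.stats import t

def calc_normal_pvals_chunked(vt_sum_counter, vt_ssum_counter, nsubs, chunk=50000):
    pvals = np.zeros(vt_sum_counter.shape)
    for start in range(0, vt_sum_counter.shape[0], chunk):
        stop = start + chunk
        mean = vt_sum_counter[start:stop] / nsubs
        second_moment = vt_ssum_counter[start:stop] / nsubs
        sample_var = nsubs * (second_moment - np.square(mean)) / (nsubs - 1)
        t_array = mean / np.sqrt(sample_var / nsubs)
        block = t.sf(t_array, nsubs - 1)
        block[np.isnan(block)] = 0
        pvals[start:stop] = block
    return pvals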

Fast inner product of more than two matrices in python

I'm currently writing code where I need to compute, as fast as possible, a kind of inner product between three 2-D arrays.
Let's call them a, b, c. They all have the same size (N x M).
I want to compute the following 3-D array, op, of size (N x N x N), such that op[i, j, k] = sum_m a[i, m] * b[j, m] * c[k, m].
This is basically an extended version of np.inner with 3 inputs rather than 2.
In practice, the dimensions I will run into are something like N = 100 and M = 300 000. The matrices are not going to be sparse at all, so op contains about 1 million nonzero values.
So far, I've attempted two methods.
The first one uses broadcasting:
import numpy as np
N = 100
M = 300000
a = np.random.randn(N, M)
b = np.random.randn(N, M)
c = np.random.randn(N, M)
def method1(a, b, c):
    a_i = a[:, None, None, :]
    b_j = b[None, :, None, :]
    c_k = c[None, None, :, :]
    return np.sum(a_i * b_j * c_k, axis=3)
The problem with this is that it first computes a_i * b_j * c_k which is an N x N x N x M array, so in my case it is simply too much to handle.
I've tried another method using np.einsum, and it is much faster than the previous method:
def method2(a, b, c):
    return np.einsum('im,jm,km', a, b, c)
My problem is that it is still too slow. For N = 100 and M = 30 000, it already takes 95 seconds to run on my computer, so taking M to its actual value of 300 000 is impossible.
My question is: do you know any pythonic way to solve my problem (maybe a magic numpy function?), or do I have to resort to things like cython or numba to actually make this computation feasible?
Thanks in advance for any help!
Very interesting one and related to this other problem.
Approach #1: For decent size arrays
Based on the winning approach in the above-mentioned Q&A, here's one solution -
np.tensordot(a[:,None]*b,c,axes=(2,1))
Explanation:
1) a[:,None]*b : gets a 3D array of shape (N, N, M). So, for the use case, it would be (100, 100, 30000), which might be a bit too much for regular systems, but might just work out given some extra system memory juice.
2) np.tensordot(..): next, we sum-reduce that last axis from the previous step with a tensor dot against the third array c, giving a (100, 100, 100) shaped output array.
Approach #2: For very large arrays and with b identical to c
out = np.zeros((N, N, N))
for i in range(N):
    for j in range(N):
        for k in range(j+1):
            out[i,j,k] = np.einsum('i,i,i->',a[i],b[j],b[k])
r,c = np.triu_indices(N,1)
out[np.arange(N)[:,None], r,c] = out[np.arange(N)[:,None], c,r]
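If the (N, N, M) intermediate from approach #1 is too large, a chunked variant (my sketch, not part of the original answer) accumulates the same tensordot over slabs of the m axis, so only an (N, N, chunk) intermediate is formed per iteration:
import numpy as np

def method_chunked(a, b, c, chunk=10000):
    N = a.shape[0]
    out = np.zeros((N, N, N))
    for start in range(0, a.shape[1], chunk):
        stop = start + chunk
        ab = a[:, None, start:stop] * b[None, :, start:stop]    # (N, N, chunk)
        out += np.tensordot(ab, c[:, start:stop], axes=(2, 1))  # sum over this slab of m
    return out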
