How to increase speed while maintaining memory with numpy arrays?

How to increase speed while maintaining memory with numpy arrays? - python

I need to write a code to do a one-sample t-test given the sample mean (E(X)) and sample second raw moment (E(X^2)) for each entry in a 2-dimensional array.
There are two ways I am doing this but both of them are not quite working.
With numpy vetorized operations - out of memory error for certain sizes of the array.
def calc_normal_pvals(vt_sum_counter, vt_ssum_counter):
global nsubs
vt_sum_counter = vt_sum_counter/nsubs
vt_ssum_counter = vt_ssum_counter/nsubs
sample_var = nsubs * (vt_ssum_counter - np.square(vt_sum_counter))/(nsubs - 1)
t_array = np.divide(vt_sum_counter, (np.sqrt(sample_var/nsubs)))
pvals = t.sf(t_array, nsubs-1)
pvals[np.isnan(pvals)] = 0
return pvals
Normal for loop method - takes a lot of time in comparison
def calc_normal_pvals(vt_sum_counter, vt_ssum_counter, tail=1):
global nsubs
V, T = vt_sum_counter.shape
pvals = np.zeros((V, T))
for i in range(V):
for j in range(T):
sigma = ((vt_ssum_counter[i, j]/nsubs -(vt_sum_counter[i,j]/nsubs)**2)/(nsubs - 1))**0.5
if (sigma != 0):
pvals[i, j] = t.sf(vt_sum_counter[i, j]/(nsubs*sigma), nsubs-1)
return pvals
The input arrays are huge - typically of size ~ 900000 X 400.

Related

Speed up distance calculations, sliding window

I have two time series A and B. A with length m and B with length n. m << n. Both have the dimension d.
I calculate the distance between A and all subsequencies in B by sliding A over B.
In python the code looks like this.
def sliding_dist(A,B)
n = len(B)
dist = np.zeros(n)
for i in range(n-m):
subrange = B[i:i+m,:]
distance = np.linalg.norm(A-subrange)
dist[i] = distance
return dist
Now this code takes a lot of time to execute and I have very many calculations to do.
I need to speed up the calculations. My guess is that I could do this by using convolutions, and multiplication in frequency domain(FFT). However, I have been unable to implement it.
Any ideas? :) Thanks

norm(A - subrange) isn't a convolution in itself, but it may be expressed as:
sqrt(dot(A, A) + dot(subrange, subrange) - 2 * dot(A, subrange))
How to calculate each term fast:
dot(A, A) - this is just a constant.
dot(subrange, subrange) - this can be calculated in O(1) (per position) using a recursive approach.
dot(A, subrange) - this is a convolution in this context. So this can be calculated in the frequency domain via the convolution theorem.1
Note, however, that you're unlikely to see a performance improvement if the subrange size is only 10.
1. AKA fast convolution.

Implementation with matrix operations, like I mentioned in the comment. Idea is to evaluate norm step by step. In your case i'th value is:
d[i] = sqrt((A[0] - B[i])^2 + (A[1] - B[+1])^2 + ... + (A[m-1] - B[i+m-1])^2)
First three lines calculate sum of squares, and last line is doing sqrt().
Speed-up is ~60x.
import numpy
import time
def sliding_dist(A, B):
m = len(A)
n = len(B)
dist = numpy.zeros(n-m)
for i in range(n-m):
subrange = B[i:i+m]
distance = numpy.linalg.norm(A-subrange)
dist[i] = distance
return dist
def sd_2(A, B):
m = len(A)
dist = numpy.square(A[0] - B[:-m])
for i in range(1, m):
dist += numpy.square(A[i] - B[i:-m+i])
return numpy.sqrt(dist, out=dist)
A = numpy.random.rand(10)
B = numpy.random.rand(500)
x = 1000
t = time.time()
for _ in range(x):
d1 = sliding_dist(A, B)
t1 = time.time()
for _ in range(x):
d2 = sd_2(A, B)
t2 = time.time()
print numpy.allclose(d1, d2)
print 'Orig %0.3f ms, second approach %0.3f ms' % ((t1 - t) * 1000., (t2 - t1) * 1000.)
print 'Speedup ', (t1 - t) / (t2 - t1)
Update
This is 're-implementation' of norm you need in matrix operations. It is not flexible if you want some other norm that numpy offers. Different approach is possible, to create matrix of B sliding windows and make norm on that whole array, since norm() receives parameter axis. Here is implementation of that approach, but speed-up is ~40x, which is slower than previous.
def sd_3(A, B):
m = len(A)
n = len(B)
bb = numpy.empty((len(B) - m, m))
for i in range(m):
bb[:, i] = B[i:-m+i]
return numpy.linalg.norm(A - bb, axis=1)

Numpy.Cov of a Large Nx3 Array Produces MemoryError

I have a large 2D array of size Nx3. This array contains point cloud data in (X,Y,Z) format. I am using Python in Ubuntu in a virtual environment to read data from a .ply file.
When I am trying to find the covariance of this array with rowvar set to True (meaning each row being considered a variable), I am getting MemoryError.
I understand that this is creating a very large array, apparently too large for my 8 Gb allocated memory to handle. Without increasing memory allocation, is there a different way of getting around this issue? Are there different methods of calculating the covariance matrix elements so that the memory is not overloaded?

You could chop it up in a loop and keep the upper triangle only.
import numpy as np
N = 23000
a = np.random.random((N, 3))
c = a - a.mean(axis=-1, keepdims=True)
out = np.empty((N*(N+1) // 2,))
def ravel_triu(i, j, n):
i, j = np.where(i>j, np.broadcast_arrays(j, i), np.broadcast_arrays(i, j))
return i*n - i*(i+1) // 2 + j
def unravel_triu(k, n):
i = n - (0.5 + np.sqrt(n*(n+1) - 2*k - 1)).astype(int)
return i, k - (i*n - i*(i+1) // 2)
ii, jj = np.ogrid[:N, :N]
for j in range(0, N, 500):
out[ravel_triu(j, j, N):ravel_triu(min(N, j+500), min(N, j+500), N)] \
= np.einsum(
'i...k,...jk->ij', c[j:j+500], c[j:]) [ii[j:j+500] <= jj[:, j:]]
Obviously your covariances will be quite undersampled and the covariance matrix highly rank-defective...

How to create an array that can be accessed according to its indices in Numpy?

I am trying to solve the following problem via a Finite Difference Approximation in Python using NumPy:
$u_t = k \, u_{xx}$, on $0 < x < L$ and $t > 0$;
$u(0,t) = u(L,t) = 0$;
$u(x,0) = f(x)$.
I take $u(x,0) = f(x) = x^2$ for my problem.
Programming is not my forte so I need help with the implementation of my code. Here is my code (I'm sorry it is a bit messy, but not too bad I hope):
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt
# definition of initial condition function
def f(x):
return x^2
# parameters
L = 1
T = 10
N = 10
M = 100
s = 0.25
# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
#x = np.zeros(N+1)
x = np.arange(x_init, x_end, dx)
x[0] = x_init
# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
#t = np.zeros(M+1)
t = np.arange(t_init, t_end, dt)
t[0] = t_init
# Boundary Conditions
for m in xrange(0, M):
t[m] = m * dt
# Initial Conditions
for j in xrange(0, N):
x[j] = j * dx
# definition of solution to u_t = k * u_xx
u = np.zeros((N+1, M+1)) # NxM array to store values of the solution
# finite difference scheme
for j in xrange(0, N-1):
u[j][0] = x**2 #initial condition
for m in xrange(0, M):
for j in xrange(1, N-1):
if j == 1:
u[j-1][m] = 0 # Boundary condition
else:
u[j][m+1] = u[j][m] + s * ( u[j+1][m] - #FDM scheme
2 * u[j][m] + u[j-1][m] )
else:
if j == N-1:
u[j+1][m] = 0 # Boundary Condition
print u, t, x
#plt.plot(t, u)
#plt.show()
So the first issue I am having is I am trying to create an array/matrix to store values for the solution. I wanted it to be an NxM matrix, but in my code I made the matrix (N+1)x(M+1) because I kept getting an error that the index was going out of bounds. Anyways how can I make such a matrix using numpy.array so as not to needlessly take up memory by creating a (N+1)x(M+1) matrix filled with zeros?
Second, how can I "access" such an array? The real solution u(x,t) is approximated by u(x[j], t[m]) were j is the jth spatial value, and m is the mth time value. The finite difference scheme is given by:
u(x[j],t[m+1]) = u(x[j],t[m]) + s * ( u(x[j+1],t[m]) - 2 * u(x[j],t[m]) + u(x[j-1],t[m]) )
(See here for the formulation)
I want to be able to implement the Initial Condition u(x[j],t[0]) = x**2 for all values of j = 0,...,N-1. I also need to implement Boundary Conditions u(x[0],t[m]) = 0 = u(x[N],t[m]) for all values of t = 0,...,M. Is the nested loop I created the best way to do this? Originally I tried implementing the I.C. and B.C. under two different for loops which I used to calculate values of the matrices x and t (in my code I still have comments placed where I tried to do this)
I think I am just not using the right notation but I cannot find anywhere in the documentation for NumPy how to "call" such an array so at to iterate through each value in the proposed scheme. Can anyone shed some light on what I am doing wrong?
Any help is very greatly appreciated. This is not homework but rather to understand how to program FDM for Heat Equation because later I will use similar methods to solve the Black-Scholes PDE.
EDIT: So when I run my code on line 60 (the last "else" that I use) I get an error that says invalid syntax, and on line 51 (u[j][0] = x**2 #initial condition) I get an error that reads "setting an array element with a sequence." What does that mean?

Vectorizing for loop with repeated indices in python

I am trying to optimize a snippet that gets called a lot (millions of times) so any type of speed improvement (hopefully removing the for-loop) would be great.
I am computing a correlation function of some j'th particle with all others
C_j(|r-r'|) = sqrt(E((s_j(r')-s_k(r))^2)) averaged over k.
My idea is to have a variable corrfun which bins data into some bins (the r, defined elsewhere). I find what bin of r each s_k belongs to and this is stored in ind. So ind[0] is the index of r (and thus the corrfun) for which the j=0 point corresponds to. Multiple points can fall into the same bin (in fact I want bins to be big enough to contain multiple points) so I sum together all of the (s_j(r')-s_k(r))^2 and then divide by number of points in that bin (stored in variable rw). The code I ended up making for this is the following (np is for numpy):
for k, v in enumerate(ind):
if j==k:
continue
corrfun[v] += (s[k]-s[j])**2
rw[v] += 1
rw2 = rw
rw2[rw < 1] = 1
corrfun = np.sqrt(np.divide(corrfun, rw2))
Note, the rw2 business was because I want to avoid divide by 0 problems but I do return the rw array and I want to be able to differentiate between the rw=0 and rw=1 elements. Perhaps there is a more elegant solution for this as well.
Is there a way to make the for-loop faster? While I would like to not add the self interaction (j==k) I am even ok with having self interaction if it means I can get significantly faster calculation (length of ind ~ 1E6 so self interaction is probably insignificant anyways).
Thank you!
Ilya
Edit:
Here is the full code. Note, in the full code I am averaging over j as well.
import numpy as np
def twopointcorr(x,y,s,dr):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR, dr)
print(r)
corrfun = r*0
rw = r*0
print(maxR)
''' go through all points'''
for j in range(0, n-1):
hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
ind = [np.abs(r-h).argmin() for h in hypot]
for k, v in enumerate(ind):
if j==k:
continue
corrfun[v] += (s[k]-s[j])**2
rw[v] += 1
rw2 = rw
rw2[rw < 1] = 1
corrfun = np.sqrt(np.divide(corrfun, rw2))
return r, corrfun, rw
I debug test it the following way
from twopointcorr import twopointcorr
import numpy as np
import matplotlib.pyplot as plt
import time
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
print('running two point corr functinon')
start_time = time.time()
r,corrfun,rw = twopointcorr(x,y,s,0.1)
print("--- Execution time is %s seconds ---" % (time.time() - start_time))
fig1=plt.figure()
plt.plot(r, corrfun,'-x')
fig2=plt.figure()
plt.plot(r, rw,'-x')
plt.show()
Again, the main issue is that in the real dataset n~1E6. I can resample to make it smaller, of course, but I would love to actually crank through the dataset.

Here is the code that use broadcast, hypot, round, bincount to remove all the loops:
def twopointcorr2(x, y, s, dr):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR, dr)
osub = lambda x:np.subtract.outer(x, x)
ind = np.clip(np.round(np.hypot(osub(x), osub(y)) / dr), 0, len(r)-1).astype(int)
rw = np.bincount(ind.ravel())
rw[0] -= len(x)
corrfun = np.bincount(ind.ravel(), (osub(s)**2).ravel())
return r, corrfun, rw
to compare, I modified your code as follows:
def twopointcorr(x,y,s,dr):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR, dr)
corrfun = r*0
rw = r*0
for j in range(0, n):
hypot = np.sqrt((x[j]-x)**2+(y[j]-y)**2)
ind = [np.abs(r-h).argmin() for h in hypot]
for k, v in enumerate(ind):
if j==k:
continue
corrfun[v] += (s[k]-s[j])**2
rw[v] += 1
return r, corrfun, rw
and here is the code to check the results:
import numpy as np
n=1000
x = np.random.rand(n)
y = np.random.rand(n)
s = np.random.rand(n)
r1, corrfun1, rw1 = twopointcorr(x,y,s,0.1)
r2, corrfun2, rw2 = twopointcorr2(x,y,s,0.1)
assert np.allclose(r1, r2)
assert np.allclose(corrfun1, corrfun2)
assert np.allclose(rw1, rw2)
and the %timeit results:
%timeit twopointcorr(x,y,s,0.1)
%timeit twopointcorr2(x,y,s,0.1)
outputs:
1 loop, best of 3: 5.16 s per loop
10 loops, best of 3: 134 ms per loop

Your original code on my system runs in about 5.7 seconds. I fully vectorized the inner loop and got it to run in 0.39 seconds. Simply replace your "go through all points" loop with this:
points = np.column_stack((x,y))
hypots = scipy.spatial.distance.cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(np.int)
# go through all points
for j in range(n): # n.b. previously n-1, not sure why
ind = inds[j]
np.add.at(corrfun, ind, (s - s[j])**2)
np.add.at(rw, ind, 1)
rw[ind[j]] -= 1 # subtract self
The first observation was that your hypot code was computing 2D distances, so I replaced that with cdist from SciPy to do it all in a single call. The second was that the inner for loop was slow, and thanks to an insightful comment from #hpaulj I vectorized that as well using np.add.at().
Since you asked how to vectorize the inner loop as well, I did that later. It now takes 0.25 seconds to run, for a total speedup of over 20x. Here's the final code:
points = np.column_stack((x,y))
hypots = scipy.spatial.distance.cdist(points, points)
inds = np.rint(hypots.clip(max=maxR) / dr).astype(np.int)
sn = np.tile(s, (n,1)) # n copies of s
diffs = (sn - sn.T)**2 # squares of pairwise differences
np.add.at(corrfun, inds, diffs)
rw = np.bincount(inds.flatten(), minlength=len(r))
np.subtract.at(rw, inds.diagonal(), 1) # subtract self
This uses more memory but does produce a substantial speedup vs. the single-loop version above.

Ok, so as it turns out outer products are incredibly memory expensive, however, using answers from #HYRY and #JohnZwinck i was able to make code that is still roughly linear in n in memory and computes fast (0.5 seconds for the test case)
import numpy as np
def twopointcorr(x,y,s,dr,maxR=-1):
width = np.max(x)-np.min(x)
height = np.max(y)-np.min(y)
n = len(x)
if maxR < dr:
maxR = np.sqrt((width/2)**2 + (height/2)**2)
r = np.arange(0, maxR+dr, dr)
corrfun = r*0
rw = r*0
for j in range(0, n):
ind = np.clip(np.round(np.hypot(x[j]-x,y[j]-y) / dr), 0, len(r)-1).astype(int)
np.add.at(corrfun, ind, (s - s[j])**2)
np.add.at(rw, ind, 1)
rw[0] -= n
corrfun = np.sqrt(np.divide(corrfun, np.maximum(rw,1)))
r=np.delete(r,-1)
rw=np.delete(rw,-1)
corrfun=np.delete(corrfun,-1)
return r, corrfun, rw

Build an approximately uniform grid from random sample (python)

I want to build a grid from sampled data. I could use a machine learning - clustering algorithm, like k-means, but I want to restrict the centres to be roughly uniformly distributed.
I have come up with an approach using the scikit-learn nearest neighbours search: pick a point at random, delete all points within radius r then repeat. This works well, but wondering if anyone has a better (faster) way of doing this.
In response to comments I have tried two alternate methods, one turns out much slower the other is about the same...
Method 0 (my first attempt):
def get_centers0(X, r):
N = X.shape[0]
D = X.shape[1]
grid = np.zeros([0,D])
nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
while N > 0:
nearest.fit(X)
x = X[int(random()*N), :]
_, del_x = nearest.radius_neighbors(x)
X = np.delete(X, del_x[0], axis = 0)
grid = np.vstack([grid, x])
N = X.shape[0]
return grid
Method 1 (using the precomputed graph):
def get_centers1(X, r):
N = X.shape[0]
D = X.shape[1]
grid = np.zeros([0,D])
nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
nearest.fit(X)
graph = nearest.radius_neighbors_graph(X)
#This method is very slow even before doing any 'pruning'
Method 2:
def get_centers2(X, r, k):
N = X.shape[0]
D = X.shape[1]
k = k
grid = np.zeros([0,D])
nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
while N > 0:
nearest.fit(X)
x = X[np.random.randint(0,N,k), :]
#min_dist = near.NearestNeighbors().fit(x).kneighbors(x, n_neighbors = 1, return_distance = True)
min_dist = dist(x, k, 2, np.ones(k)) # where dist is a cython compiled function
x = x[min_dist < 0.1,:]
_, del_x = nearest.radius_neighbors(x)
X = np.delete(X, del_x[0], axis = 0)
grid = np.vstack([grid, x])
N = X.shape[0]
return grid
Running these as follows:
N = 50000
r = 0.1
x1 = np.random.rand(N)
x2 = np.random.rand(N)
X = np.vstack([x1, x2]).T
tic = time.time()
grid0 = get_centers0(X, r)
toc = time.time()
print 'Method 0: ' + str(toc - tic)
tic = time.time()
get_centers1(X, r)
toc = time.time()
print 'Method 1: ' + str(toc - tic)
tic = time.time()
grid2 = get_centers2(X, r)
toc = time.time()
print 'Method 1: ' + str(toc - tic)
Method 0 and 2 are about the same...
Method 0: 0.840130090714
Method 1: 2.23365592957
Method 2: 0.774812936783

I'm not sure from the question exactly what you are trying to do. You mention wanting to create an "approximate grid", or a "uniform distribution", while the code you provide selects a subset of points such that no pairwise distance is greater than r.
A couple possible suggestions:
if what you want is an approximate grid, I would construct the grid you want to approximate, and then query for the nearest neighbor of each grid point. Depending on your application, you might further trim these results to cut-out points whose distance from the grid point is larger than is useful for you.
if what you want is an approximately uniform distribution drawn from among the points, I would do a kernel density estimate (sklearn.neighbors.KernelDensity) at each point, and do a randomized sub-selection from the dataset weighted by the inverse of the local density at each point.
if what you want is a subset of points such that no pairwise distance is greater than r, I would start by constructing a radius_neighbors_graph with radius r, which will, in one go, give you a list of all points which are too close together. You can then use a pruning algorithm similar to the one you wrote above to remove points based on these sparse graph distances.
I hope that helps!

I have come up with a very simple method which is much more efficient than my previous attempts.
This one simply loops over the data set and adds the current point to the list of grid points only if it is greater than r distance from all existing centers. This method is around 20 times faster than my previous attempts. Because there are no external libraries involved I can run this all in cython...
#cython.boundscheck(False)
#cython.wraparound(False)
#cython.nonecheck(False)
def get_centers_fast(np.ndarray[DTYPE_t, ndim = 2] x, double radius):
cdef int N = x.shape[0]
cdef int D = x.shape[1]
cdef int m = 1
cdef np.ndarray[DTYPE_t, ndim = 2] xc = np.zeros([10000, D])
cdef double r = 0
cdef double r_min = 10
cdef int i, j, k
for k in range(D):
xc[0,k] = x[0,k]
for i in range(1, N):
r_min = 10
for j in range(m):
r = 0
for k in range(D):
r += (x[i, k] - xc[j, k])**2
r = r**0.5
if r < r_min:
r_min = r
if r_min > radius:
m = m + 1
for k in range(D):
xc[m - 1,k] = x[i,k]
nonzero = np.nonzero(xc[:,0])[0]
xc = xc[nonzero,:]
return xc
Running these methods as follows:
N = 40000
r = 0.1
x1 = np.random.normal(size = N)
x1 = (x1 - min(x1)) / (max(x1)-min(x1))
x2 = np.random.normal(size = N)
x2 = (x2 - min(x2)) / (max(x2)-min(x2))
X = np.vstack([x1, x2]).T
tic = time.time()
grid0 = gt.get_centers0(X, r)
toc = time.time()
print 'Method 0: ' + str(toc - tic)
tic = time.time()
grid2 = gt.get_centers2(X, r, 10)
toc = time.time()
print 'Method 2: ' + str(toc - tic)
tic = time.time()
grid3 = gt.get_centers_fast(X, r)
toc = time.time()
print 'Method 3: ' + str(toc - tic)
The new method is around 20 times faster. It could be made even faster, if I stopped looping early (e.g. if k successive iterations fail to produce a new center).
Method 0: 0.219595909119
Method 2: 0.191949129105
Method 3: 0.0127329826355

Maybe you could only re-fit the nearest object every k << N deletions to speedup the process. Most of the time the neighborhood structure should not change much.

Sounds like you are trying to reinvent one of the following:
cluster features (see BIRCH)
data bubbles (see "Data bubbles: Quality preserving performance boosting for hierarchical clustering")
canopy pre-clustering
i.e. this concept has already been invented at least three times with small variations.
Technically, it is not clustering. K-means isn't really clustering either.
It is much more adequately described as vector quantization.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to increase speed while maintaining memory with numpy arrays? - python

Related

Speed up distance calculations, sliding window

Numpy.Cov of a Large Nx3 Array Produces MemoryError

How to create an array that can be accessed according to its indices in Numpy?

Vectorizing for loop with repeated indices in python

Build an approximately uniform grid from random sample (python)

Categories

Resources