Looking for a memory-efficient outer product sum - Python

I have an equation of the form:
$$\text{total}_k = \sum_{j=0}^{n-1} a_j \, b_k^{\,j}$$
where a and b are 1D arrays of size n and m. I want to avoid forming anything of size n*m in an intermediate step, because n and m are both very large and the memory required would be prohibitive.
One solution is a simple python loop:
total = 0
for j in range(n):
    total += a[j] * b**j
I'm looking for a native numpy solution without a python loop, but I can't find a numpy function that will work without taking the n-by-m array [b**0, b, b**2, b**3, ...] as an input.
Edit:
As hpaulj pointed out a polynomial solution might work. I found this:
from numpy.polynomial import polynomial
total = polynomial.polyval(b, a)
This gives the same values in a quick test, and according to the notes on polyval it uses Horner's method and appears to be optimal. My actual problem is a sum over two dimensions, but I think this solution will generalize.
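For reference, here is a minimal sketch of the Horner recurrence that polyval's notes describe (my own illustration, not polyval's actual source); it never forms anything larger than b itself:
import numpy as np

def horner_sum(a, b):
    # evaluates total[k] = sum_j a[j] * b[k]**j via Horner's method,
    # keeping only a single m-sized accumulator (m = b.size)
    total = np.zeros_like(b)
    for coeff in a[::-1]:  # highest-order coefficient first
        total = total * b + coeff
    return total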
Edit 2:
polyval seems to raise a
RuntimeWarning: overflow encountered in multiply c0 = c[-i] + c0*x
at array sizes smaller than those at which even np.outer hits a MemoryError. When it does work, it is much faster than the python loop.
Edit 3:
Ignore Edits 1 & 2. The following runs faster (on my machine) with the python loop than with polyval; polyval does at least seem to be memory efficient.
import time
import numpy as np
from numpy.polynomial import polynomial

a_size = 3
b_size = 6
a = np.random.rand(10**a_size)
b = np.exp(2j * np.pi * np.random.rand(10**b_size))

start = time.time()
c = polynomial.polyval(b, a)
middle = time.time()
print(middle - start)

c2, b_current = 0, b**0
for i in range(0, a.size):
    c2 += a[i] * b_current
    b_current *= b
end = time.time()
print(end - middle)

Related

Creating a symmetric array with power of an element

I am trying to create a symmetric array whose element (i, j) is a power of a parameter p, namely p**|i - j|.
I have written the following code to get this form, with the parameter being 0.5 and the dimension being 4-by-4.
import numpy as np

a = np.eye(4)
for i in range(4):
    for j in range(4):
        a[i, j] = 0.5 ** np.abs(i - j)
This does what I need, but for large dimensions (in the thousands) it causes a lot of overhead. Is there any other low-complexity method to get this matrix? Thanks.
We can leverage broadcasting: create a ranged array to represent the iterator variable, then perform an outer subtraction to simulate the i-j part -
n = 4
p = 0.5
I = np.arange(n)
out = p ** (np.abs(I[:,None]-I))
Optimization #1
We can use a lookup-based approach with indexing, so that we avoid the expensive power computations, like so -
out = (p**np.arange(n))[(np.abs(I[:,None]-I))]
Optimization #2
We can optimize further to use multiple cores with numexpr -
import numexpr as ne
out = ne.evaluate('p**abs(I2D-I)', {'I2D': I[:, None], 'I': I, 'p': p})
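As a quick sanity check (my own sketch), the broadcasting and lookup-table variants should agree:
import numpy as np

n, p = 4, 0.5
I = np.arange(n)
base = p ** np.abs(I[:, None] - I)                 # broadcasting version
lut = (p ** np.arange(n))[np.abs(I[:, None] - I)]  # lookup-table version
assert np.allclose(base, lut)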

Speeding up nested loops in python

How can I speed up this code in python?
while norm_corr > corr_len:
    correlation = 0.0
    for i in xrange(6):
        for j in xrange(6):
            correlation += (p[i] * T_n[j][i]) * ((F[j] - Fbar) * (F[i] - Fbar))
    Integral += correlation
    T_n = np.mat(T_n) * np.mat(TT)
    T_n = T_n.tolist()
    norm_corr = correlation / variance
Here, TT is a fixed 6x6 matrix, p is a fixed 1x6 matrix, and F is a fixed 1x6 matrix. T_n is the nth power of TT.
This while loop might be repeated about 10^4 times.
The way to do these things quickly is to use Numpy's built-in functions and operators to perform the operations. Numpy is implemented internally with optimized C code and if you set up your computation properly, it will run much faster.
But leveraging Numpy effectively can sometimes be tricky. It's called "vectorizing" your code - you have to figure out how to express it in a way that acts on whole arrays, rather than with explicit loops.
For example, in your loop you have p[i] * T_n[j][i], which can be done with a vector-by-matrix multiplication: if v is 1x6 and m is 6x6, then v.dot(m) is a 1x6 vector whose entries are the dot products of v with the columns of m. You can use transposes and reshapes to work in different dimensions, if necessary.
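To make that concrete, here is a sketch (my own, assuming p, F, and T_n are numpy arrays, with p and F one-dimensional) that collapses the double loop into two dot products:
import numpy as np

def correlation_step(p, T_n, F, Fbar):
    # computes sum_ij p[i] * T_n[j, i] * (F[j] - Fbar) * (F[i] - Fbar);
    # d.dot(T_n)[i] equals sum_j d[j] * T_n[j, i]
    d = F - Fbar
    return d.dot(T_n).dot(p * d)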

find 2d elements in a 3d array which are similar to 2d elements in another 3d array

I have two 3D arrays and want to identify 2D elements in one array, which have one or more similar counterparts in the other array.
This works in Python 3:
import numpy as np

np.random.seed(123)
A = np.round(np.random.rand(25000, 2, 2), 2)
B = np.round(np.random.rand(25000, 2, 2), 2)

a_index = np.zeros(A.shape[0])
for a in range(A.shape[0]):
    for b in range(B.shape[0]):
        if np.allclose(A[a, :, :], B[b, :, :], rtol=1e-04, atol=1e-06):
            a_index[a] = 1
            break
np.nonzero(a_index)[0]
But of course this approach is awfully slow. Please tell me there is a more efficient way (and what it is). Thanks.
You are trying to do an all-nearest-neighbors type of query. There are specialized O(n log n) algorithms for this, though I'm not aware of a Python implementation. However, you can use a regular nearest-neighbor structure, which is also O(n log n), just a bit slower. For example, scipy.spatial.KDTree or cKDTree.
import numpy as np
import scipy.spatial

np.random.seed(123)
A = np.round(np.random.rand(25000, 2, 2), 2)
B = np.round(np.random.rand(25000, 2, 2), 2)

tree = scipy.spatial.cKDTree(A.reshape(25000, 4))
results = tree.query_ball_point(B.reshape(25000, 4), r=1e-04, p=1)
print([r for r in results if r != []])
# [[14252], [1972], [7108], [13369], [23171]]
query_ball_point() is not an exact equivalent to allclose() but it is close enough, especially if you don't care about the rtol parameter to allclose(). You also get a choice of metric (p=1 for city block, or p=2 for Euclidean).
P.S. Consider using query_ball_tree() for very large data sets; both A and B have to be indexed in that case.
P.P.S. I'm not sure what effect the 2D-ness of the elements should have; the sample code above treats them as 1D, which is identical at least when using the city block metric.
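To make the query_ball_tree() suggestion concrete, here is a sketch (my own; it reuses A and B from above with the same radius and metric):
tree_a = scipy.spatial.cKDTree(A.reshape(25000, 4))
tree_b = scipy.spatial.cKDTree(B.reshape(25000, 4))
# results[i] lists the indices in B that fall within r of A[i]
results = tree_a.query_ball_tree(tree_b, r=1e-04, p=1)
print([i for i, r in enumerate(results) if r])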
From the docs of np.allclose, we have:
If the following equation is element-wise True, then allclose returns True.
absolute(a - b) <= (atol + rtol * absolute(b))
Using that criterion, we can write a vectorized implementation with broadcasting, customized for the stated problem, like so -
# Setup parameters
rtol, atol = 1e-04, 1e-06

# Use the np.allclose criterion to get True/False across all pairwise elements
mask = np.abs(A[:, None] - B) <= (atol + rtol * np.abs(B))

# Use the problem context to get the final output
out = np.nonzero(mask.all(axis=(2, 3)).any(1))[0]
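One caveat: mask is an n_A x n_B x 2 x 2 boolean array, roughly 2.5 GB for the 25000-row samples. If that is too large, here is a chunked sketch of the same idea (my own; it reuses A, B, rtol, and atol from above) that keeps the footprint bounded:
out_parts = []
chunk = 1000  # tune to the available RAM
for s in range(0, A.shape[0], chunk):
    m = np.abs(A[s:s+chunk, None] - B) <= (atol + rtol * np.abs(B))
    out_parts.append(s + np.nonzero(m.all(axis=(2, 3)).any(1))[0])
out = np.concatenate(out_parts)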

Optimizing histogram distance metric for two matrices in Python

I have two matrices A and B, each with a size of NxM, where N is the number of samples and M is the size of histogram bins. Thus, each row represents a histogram for that particular sample.
What I would like to do is compute the chi-square distance between the two matrices for each pair of samples. That is, each row in matrix A will be compared to all rows in the other matrix B, resulting in a final matrix C with a size of NxN, where C[i,j] corresponds to the chi-square distance between the A[i] and B[j] histograms.
Here is my python code that does the job:
def chi_square(histA, histB):
    eps = 1.e-10
    d = sum((histA - histB)**2 / (histA + histB + eps))
    return 0.5 * d

def matrix_cost(A, B):
    a, _ = A.shape
    b, _ = B.shape
    C = zeros((a, b))
    for i in xrange(a):
        for j in xrange(b):
            C[i, j] = chi_square(A[i], B[j])
    return C
Currently, for two 100x70 matrices, this entire process takes 0.1 seconds.
Is there any way to improve this performance?
I would appreciate any thoughts or recommendations.
Thank you.
Sure! I'm assuming you're using numpy?
If you have the RAM available, you can broadcast the arrays and use numpy's efficient vectorization of the operations on those arrays.
Here's how:
Abroad = A[:, np.newaxis, :]  # prepared for broadcasting
C = np.sum((Abroad - B)**2 / (Abroad + B), axis=-1) / 2.
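# note: unlike chi_square above, this omits the eps guard; add a small
# constant to the denominator if matching bins can both be zero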
Timing considerations on my platform show a factor of 10 speed gain compared to your algorithm.
A slower option (but still faster than your original algorithm) that uses less RAM than the previous option is simply to broadcast the rows of A into 2D arrays:
def new_way(A, B):
    C = np.empty((A.shape[0], B.shape[0]))
    for rowind, row in enumerate(A):
        C[rowind, :] = np.sum((row - B)**2 / (row + B), axis=-1) / 2.
    return C
This has the advantage that it can be run for arrays with shape (N,M) much larger than (100,70).
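As a quick sanity check of both versions (my own sketch; it assumes numpy is imported as np and strictly positive histogram entries, so the denominator never vanishes):
A = np.random.rand(100, 70) + 1e-10
B = np.random.rand(100, 70) + 1e-10
Abroad = A[:, np.newaxis, :]
C1 = np.sum((Abroad - B)**2 / (Abroad + B), axis=-1) / 2.
assert np.allclose(C1, new_way(A, B))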
You could also look to Theano to push the expensive for-loops to the C level if you don't have the memory available. I get a factor-2 speed gain compared to the first option (not counting the initial compile time) for both the (100,70) arrays and the (1000,70) ones:
import theano
import theano.tensor as T
X = T.matrix("X")
Y = T.matrix("Y")
results, updates = theano.scan(lambda x_i: ((x_i - Y)**2/(x_i+Y)).sum(axis=1)/2., sequences=X)
chi_square_norm = theano.function(inputs=[X, Y], outputs=[results])
chi_square_norm(A,B) # same result

numpy row pair sum of squared row wise differences without for loops (only api calls)

For those who can read LaTeX, this is what I am trying to compute:
$$k_{xyi} = \sum_{j}\left ( \left ( x_{i}-x_{j} \right )^{2}+\left ( y_{i}-y_{j} \right )^{2} \right )$$
where x and y are rows of a matrix A.
For computer language only folk this would translate as:
k(x,y,i) = sum_j( (xi - xj)^2 + (yi - yj)^2 )
where x and y are rows of a matrix A.
So k is a 3D array.
Can this be done with API calls only (no for loops)?
Here is a testing setup:
import numpy as np

A = np.random.rand(4, 4)
k = np.empty((4, 4, 4))
for ix in range(4):
    for iy in range(4):
        x = A[ix, :]
        y = A[iy, :]
        sx = np.power(x - x[:, np.newaxis], 2)
        sy = np.power(y - y[:, np.newaxis], 2)
        k[ix, iy] = (sx + sy).sum(axis=1).T
And now for the master coders, please replace the two for loops with numpy API calls.
Update:
I forgot to mention that I need a method that saves RAM: my A matrices are usually 20-30 thousand on a side. So it would be great if your answer does not create huge temporary multidimensional arrays.
I would change your LaTeX to look something more like the following - it is much less confusing imo:
$$k_{xyi} = \sum_{j}\left ( \left ( A_{xi}-A_{xj} \right )^{2}+\left ( A_{yi}-A_{yj} \right )^{2} \right )$$
From this I assume the last line in your expression should really be:
k[ix,iy] = (sx + sy).sum(axis=-1)
If so, you can compute the above expression as follows:
Axij = (A[:, None, :] - A[..., None])**2
k = np.sum(Axij[:, None, :, :] + Axij, axis=-1)
The above first expands out a memory-intensive 4D array. You can skip this, if you are worried about memory, by introducing a new for loop:
k = np.empty((4, 4, 4))
Axij = (A[:, None, :] - A[..., None])**2
for xi in range(A.shape[0]):
    k[xi] = np.sum(Axij[xi, None, :, :] + Axij, axis=-1)
This will be slower, but not by as much as you would think since you still do a lot of the operations in numpy. You could probably skip the 3D Axij intermediate, but again you are going to take a performance penalty doing so.
If your matrices are really 20k on an edge, your 3D output will be 64 TB. You are not going to do this in numpy, or even in memory (unless you have a large-scale distributed-memory system).
