Sum of Gaussians in fast NumPy? - python

Here is my problem:
I have two sets of 3D points. Let's call them "Gausspoints" and "XYZ". I define a function which is a sum of Gaussians, where every Gaussian is centered at one of the Gausspoints. Now I want to evaluate this function on the XYZ points. My approach works fine but it is rather slow. Any idea how to speed it up by exploiting NumPy a little better?
import numpy as np

def sumgaus(r):
    t = r - Gausspoints
    t = list(map(np.linalg.norm, t))  # list() so NumPy can consume it on Python 3
    t = -np.power(t, 2.0)
    t = np.exp(t)
    res = np.sum(t)
    return res

result = list(map(sumgaus, XYZ))
Thanks for any help
Edit:
XYZ has shape N*3 and Gausspoints has shape M*3, with M and N being different integers.
Edit 2: I want to apply the above function to each item in XYZ.

The tricky part is how to vectorize the computation of all the differences between your points without any explicit Python looping or mapping. You can roll your own implementation using broadcasting by doing something like:
dist2 = XYZ[:, np.newaxis, :] - Gausspoints
dist2 *= dist2
dist2 = np.sum(dist2, axis=-1)
If XYZ has shape (n, 3) and Gausspoints has shape (m, 3), then dist2 will have shape (n, m), with dist2[i, j] being the squared distance between the points XYZ[i] and Gausspoints[j].
It may be easier to understand using scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist
dist2 = cdist(XYZ, Gausspoints)
dist2 *= dist2
But once you have your array of squared distances, it's child's play:
f = np.sum(np.exp(-dist2), axis=1)
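For completeness, a minimal self-contained sketch of the whole computation (the random test data is my own, added just to make the snippet runnable):
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
XYZ = rng.random((1000, 3))        # n sample points
Gausspoints = rng.random((50, 3))  # m Gaussian centres

dist2 = cdist(XYZ, Gausspoints, 'sqeuclidean')  # squared distances, shape (n, m)
f = np.sum(np.exp(-dist2), axis=1)              # sum of Gaussians at each XYZ point, shape (n,)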

Related

Efficient way to fill NumPy array for independent entries?

I'm currently trying to fill a matrix K where each entry in the matrix is just a function applied to two entries of an array x.
At the moment I'm using the most obvious method of running through rows and columns one at a time using a double for-loop:
K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    for j in range(x.shape[0]):
        K[i, j] = f(x[i], x[j])
While this works fine, the resulting matrix is 10,000 by 10,000 and takes very long to calculate. I was wondering if there is a more efficient way to do this built into NumPy?
EDIT: The function in question here is a gaussian kernel:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.dot(vec, vec) / (2 * sigma**2))
where I set sigma in advance before calculating the matrix.
The array x is an array of shape (10000, 8). So the scalar product in the gaussian is between two vectors of dimension 8.
You can use a single for loop together with broadcasting. This requires changing the implementation of the gaussian function to accept 2D inputs:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.sum(vec**2, axis=-1) / (2 * sigma**2))

K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    K[i] = gaussian(x[i:i+1], x)
Theoretically you could accomplish this even without any for loop, again by using broadcasting, but here an intermediate array of size len(x)**2 * x.shape[1] will be created, which might run out of memory for your array sizes:
K = gaussian(x[None, :, :], x[:, None, :])
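If memory is the bottleneck, an alternative sketch (my suggestion, not part of the original answer) is to let SciPy build the squared-distance matrix directly; note it still allocates a len(x) by len(x) float64 array before the cast:
from scipy.spatial.distance import cdist

sqdist = cdist(x, x, 'sqeuclidean')                      # squared Euclidean distances
K = np.exp(-sqdist / (2 * sigma**2)).astype(np.float32)  # same kernel matrix as above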

vectorized / linear algebra distance between points?

Suppose I have an array of points,
import numpy as np
pts = np.random.rand(100,3) # 100 points, X, Y, Z along second dimension
The naive approach to calculate the distance between each combination of points involves a double for loop and will be excruciatingly slow for large numbers of points,
def euclidean_distance(p1, p2):
    d = p2 - p1
    return np.sqrt((d**2).sum())

out = np.empty((pts.shape[0], pts.shape[0]))
for idx, point in enumerate(pts):
    for idx2, point_inner in enumerate(pts):
        out[idx, idx2] = euclidean_distance(point, point_inner)
How do I vectorize this calculation?
Take a look at scipy.spatial.distance.cdist. I'm not sure, but I assume that SciPy has optimized this quite a lot. If you use the pts array for both inputs, you'll get an M x M array with zeros on the diagonal.
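For concreteness, a short sketch of both routes (pdist computes only one triangle of the symmetric matrix, so it does roughly half the work):
from scipy.spatial.distance import cdist, pdist, squareform

out = cdist(pts, pts)          # (100, 100) array, zeros on the diagonal
out2 = squareform(pdist(pts))  # same matrix, from the condensed form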

Pairwise Euclidean distances between two binary tensors

I am trying to compute the pairwise distances between all points in two binary areas/volumes/hypervolumes in TensorFlow.
E.g. In 2D the areas are defined as binary tensors with ones and zeros:
input1 = tf.constant(np.array([[1,0,0], [0,1,0], [0,0,1]]))
input2 = tf.constant(np.array([[0,0,0], [0,0,1], [0,1,0]]))
input1 has 3 points and input2 has 2 points.
So far I have managed to convert the binary tensors into arrays of spatial coordinates:
coord1 = tf.where(tf.cast(input1, tf.bool))
coord2 = tf.where(tf.cast(input2, tf.bool))
Here, coord1 will have shape (3, 2) and coord2 will have shape (2, 2). The first dimension refers to the number of points and the second to their spatial coordinates (in this case 2D).
The result that I want is a tensor with shape=(6, ) with the pairwise Euclidean distances between all of the points in the areas.
Example (the order of the distances might be incorrect):
output = [1, sqrt(5), 1, 1, sqrt(5), 1]
Since TensorFlow isn't great with loops and in my real application the number of points in each tensor is unknown, I think I might be missing some linear algebra here.
I'm not familiar with Tensorflow, but my understanding from reading this is that the underlying NumPy arrays should be easy to extract from your data. So I will provide a solution which shows how to calculate pairwise Euclidean distances between points of 3x2 and 2x2 NumPy arrays, and hopefully it helps.
Generating random NumPy arrays in the same shape as your data:
coord1 = np.random.random((3, 2))
coord2 = np.random.random((2, 2))
Import the relevant SciPy function and run:
from scipy.spatial.distance import cdist
distances = cdist(coord1, coord2, metric='euclidean')
This will return a 3x2 array, but you can use distances.flatten() to get your desired 1-dimensional array of length 6.
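As a sanity check with the points from the question (coordinates written out by hand rather than extracted with tf.where):
import numpy as np
from scipy.spatial.distance import cdist

coord1 = np.array([[0, 0], [1, 1], [2, 2]])  # the three points of input1
coord2 = np.array([[1, 2], [2, 1]])          # the two points of input2

distances = cdist(coord1, coord2, metric='euclidean')
print(distances.flatten())  # [sqrt(5), sqrt(5), 1, 1, 1, 1], up to ordering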
I have come up with an answer using only matrix multiplies and transposition. This makes use of the fact that squared distances can be expressed with inner products (d^2 = |x|^2 + |y|^2 - 2 x.y):
input1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
input2 = np.array([[1,1,0],[0,0,1],[1,0,0]])
c1 = tf.cast(tf.where(tf.cast(input1, tf.bool)), tf.float32)
c2 = tf.cast(tf.where(tf.cast(input2, tf.bool)), tf.float32)
distances = tf.sqrt(
    -2 * tf.matmul(c1, tf.transpose(c2))
    + tf.reduce_sum(tf.square(c2), axis=1)
    + tf.expand_dims(tf.reduce_sum(tf.square(c1), axis=1), axis=1))
with tf.Session() as sess:
    d = sess.run(distances)
Since TensorFlow broadcasts by default, the fact that the arrays have different dimensions doesn't matter.
Hope it helps somebody.
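The same identity is easy to verify in plain NumPy if the TensorFlow session machinery gets in the way (this check is my addition, not part of the original answer):
import numpy as np

c1 = np.array([[0., 0.], [1., 1.], [2., 2.]])
c2 = np.array([[1., 2.], [2., 1.]])

# d[i, j]^2 = |c1[i]|^2 + |c2[j]|^2 - 2 * c1[i] . c2[j]
sq = (c1**2).sum(axis=1)[:, None] + (c2**2).sum(axis=1) - 2 * c1 @ c2.T
distances = np.sqrt(np.maximum(sq, 0))  # clip tiny negatives caused by round-off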

Vectorise Python code

I have coded a kriging algorithm but I find it quite slow. In particular, do you have an idea how I could vectorise the piece of code in the cons function below?
import time
import numpy as np

B = np.zeros((200, 6))
P = np.zeros((len(B), len(B)))

def cons():
    time1 = time.time()
    for i in range(len(B)):
        for j in range(len(B)):
            P[i, j] = corr(B[i], B[j])
    time2 = time.time()
    return time2 - time1

def corr(x, x_i):
    return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))))

time_av = 0.
for i in range(30):
    time_av += cons()
print("Average =", time_av / 30.)  # average over the 30 runs
Edit: Bonus questions
1. What happens to the broadcasting solution if I want corr(B[i], C[j]), with C of the same dimensions as B?
2. What happens to the scipy solution if my p-norm orders are an array:
p = np.array([1., 2., 1., 2., 1., 2.])

def corr(x, x_i):
    return np.exp(-np.sum(np.abs(np.array(x) - np.array(x_i))**p))
For 2., I tried P = np.exp(-cdist(B, C, 'minkowski', p)) but SciPy is expecting a scalar.
Your problem seems very simple to vectorize. For each pair of rows of B you want to compute
P[i,j] = np.exp(-np.sum(np.abs(B[i,:] - B[j,:])))
You can make use of array broadcasting and introduce a third dimension, summing along the last one:
P2 = np.exp(-np.sum(np.abs(B[:,None,:] - B),axis=-1))
The idea is to reshape the first occurrence of B to shape (N,1,M), while the second B is left with shape (N,M). With array broadcasting, the latter is equivalent to (1,N,M), so
B[:,None,:] - B
is of shape (N,N,M). Summing along the last index will then result in the (N,N)-shape correlation array you're looking for.
Note that if you were using scipy, you would be able to do this using scipy.spatial.distance.cdist (or, equivalently, a combination of scipy.spatial.distance.pdist and scipy.spatial.distance.squareform), without unnecessarily computing the lower triangular half of this symmetric matrix. Using @Divakar's suggestion in comments for the simplest solution this way:
from scipy.spatial.distance import cdist
P3 = 1/np.exp(cdist(B, B, 'minkowski', p=1))
cdist will compute the Minkowski distance in 1-norm, which is exactly the sum of the absolute values of coordinate differences.
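As for the bonus questions, the broadcasting solution carries over unchanged, since nothing in it requires the two arrays to be the same object; a sketch, assuming C has the same shape as B:
P_BC = np.exp(-np.sum(np.abs(B[:, None, :] - C), axis=-1))  # P_BC[i, j] = corr(B[i], C[j])
And cdist's 'minkowski' metric does indeed expect a scalar p, but per-coordinate exponents slot naturally into the same broadcast, since p (shape (6,)) broadcasts along the last axis:
p = np.array([1., 2., 1., 2., 1., 2.])
P2 = np.exp(-np.sum(np.abs(B[:, None, :] - C)**p, axis=-1))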

Python-Numpy: Convolution, Code optimization

In the Python code I'm currently developing there is a particular function that really requires a speed optimization.
To a first approximation I would like to focus on pure Python code (no C or Cython implementations).
The function generates a series of gaussian curves with varying sigma depending on the x-axis position. It takes three arguments:
x0: 1D numpy array, central values of the Gaussian curves
h: 1D numpy array, heights of the Gaussian curves
x: 1D numpy array, values at which the total sum is evaluated
My goal is to obtain the sum of all the curves in the fastest way possible (it is a sort of convolution with a Gaussian curve that has a position-dependent sigma).
At the moment my code is:
sigs = get_sigmas(x0)  # function that returns the value of sigma at each position
all_gauss_args = -0.5 * np.power((x[:, np.newaxis] - x0[np.newaxis, :]) /
                                 sigs[np.newaxis, :], 2.0)
total = (1.0 / (np.sqrt(2 * np.pi) * sigs[np.newaxis, :])) * np.exp(all_gauss_args) * \
        h[np.newaxis, :]
total = np.sum(total, axis=1)
return total
Is it possible to make it faster?
Thanks in advance for the help
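One possible micro-optimisation, sketched under the assumption that get_sigmas behaves as described above: fold the heights and the normalisation into a single weight vector and let a matrix-vector product do the final reduction, so the weighted (len(x), len(x0)) intermediate is never materialised:
sigs = get_sigmas(x0)
weights = h / (np.sqrt(2 * np.pi) * sigs)             # heights and normalisation combined, shape (len(x0),)
args = -0.5 * ((x[:, None] - x0[None, :]) / sigs)**2  # exponent arguments, shape (len(x), len(x0))
total = np.exp(args) @ weights                        # equivalent to the row-wise weighted sum, shape (len(x),)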
