Pairwise Euclidean distances between two binary tensors - python

I am trying to compute the pairwise distances between all points in two binary areas/volumes/hypervolumes in TensorFlow.
E.g. in 2D the areas are defined as binary tensors with ones and zeros:
input1 = tf.constant(np.array([[1,0,0], [0,1,0], [0,0,1]]))
input2 = tf.constant(np.array([[0,1,0], [0,0,1], [0,0,0]]))
input1 has 3 points and input2 has 2 points.
So far I have managed to convert the binary tensors into arrays of spatial coordinates:
coord1 = tf.where(tf.cast(input1, tf.bool))
coord2 = tf.where(tf.cast(input2, tf.bool))
Where, coord1 will have shape=(3,2) and coord2 will have shape=(2,2). The first dimension refers to the number of points and the second to their spatial coordinates (in this case 2D).
The result that I want is a tensor with shape=(6, ) with the pairwise Euclidean distances between all of the points in the areas.
Example (the order of the distances might be incorrect):
output = [1, sqrt(5), 1, 1, sqrt(5), 1]
Since TensorFlow isn't great with loops and in my real application the number of points in each tensor is unknown, I think I might be missing some linear algebra here.

I'm not familiar with Tensorflow, but my understanding from reading this is that the underlying NumPy arrays should be easy to extract from your data. So I will provide a solution which shows how to calculate pairwise Euclidean distances between points of 3x2 and 2x2 NumPy arrays, and hopefully it helps.
Generating random NumPy arrays in the same shape as your data:
coord1 = np.random.random((3, 2))
coord2 = np.random.random((2, 2))
Import the relevant SciPy function and run:
from scipy.spatial.distance import cdist
distances = cdist(coord1, coord2, metric='euclidean')
This will return a 3x2 array, but you can use distances.flatten() to get your desired 1-dimensional array of length 6.
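Putting the two pieces together, here is a minimal end-to-end sketch (my addition, assuming a TF1-style session to pull the coordinates out as NumPy arrays):
import numpy as np
import tensorflow as tf
from scipy.spatial.distance import cdist

input1 = tf.constant(np.array([[1,0,0], [0,1,0], [0,0,1]]))
input2 = tf.constant(np.array([[0,1,0], [0,0,1], [0,0,0]]))
coord1 = tf.where(tf.cast(input1, tf.bool))
coord2 = tf.where(tf.cast(input2, tf.bool))
with tf.Session() as sess:
    c1, c2 = sess.run([coord1, coord2])  # NumPy arrays, shapes (3, 2) and (2, 2)
distances = cdist(c1, c2, metric='euclidean').flatten()  # shape (6,)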

I have come up with an answer using only matrix multiplication and transposition. It uses the fact that squared distances can be expressed with inner products: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y.
input1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
input2 = np.array([[1,1,0],[0,0,1],[1,0,0]])
c1 = tf.cast(tf.where(tf.cast(input1, tf.bool)), tf.float32)
c2 = tf.cast(tf.where(tf.cast(input2, tf.bool)), tf.float32)
distances = tf.sqrt(-2 * tf.matmul(c1, tf.transpose(c2))
                    + tf.reduce_sum(tf.square(c2), axis=1)
                    + tf.expand_dims(tf.reduce_sum(tf.square(c1), axis=1), axis=1))
with tf.Session() as sess:
    d = sess.run(distances)
Since TensorFlow broadcasts by default, the fact that the arrays have different shapes doesn't matter.
Hope it helps somebody.
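One caveat worth adding (my addition, not part of the original answer): floating-point error can push some squared distances slightly below zero, which makes tf.sqrt produce NaN. Clamping before the square root guards against this, and a reshape yields the flat (6,)-style tensor the question asks for:
sq_dists = (-2 * tf.matmul(c1, tf.transpose(c2))
            + tf.reduce_sum(tf.square(c2), axis=1)
            + tf.expand_dims(tf.reduce_sum(tf.square(c1), axis=1), axis=1))
# Clamp tiny negative values produced by rounding error before taking the root
distances = tf.reshape(tf.sqrt(tf.maximum(sq_dists, 0.0)), [-1])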

Related

How to align dimensions to use np.dot() between two different matrices

I'm trying to use the dot product in Numpy between two matrices with different dimensions.
w is (1, 5) and X is (3, 5)
I'm not sure which command I can use to change the dimensions as I am new to python.
Thank you.
When I try running my function, it gives me an error saying:
ValueError: shapes (1,5) and (3,5) not aligned: 5 (dim 1) != 3 (dim 0)
import numpy as np

def L(w, X, y):
    """
    Arguments:
    w -- vector of size n representing weights of input features n
    X -- matrix of size m x n representing input data, m data samples with n features each
    y -- vector of size m (true labels)
    Returns:
    loss -- the value of the loss function defined above
    """
    ### START CODE HERE ### (2-4 lines of code)
    # w needs to match X matrix
    # w = (1, 5)
    # X = (3, 5)
    yhat = np.dot(w, X)
    L1 = y - yhat
    loss = np.dot(L1, L1)
    ### END CODE HERE ###
    return loss
Here is the picture of directions:
image of directions
The dot product of two vectors is the sum of the products of elements with regards to position. The first element of the first vector is multiplied by the first element of the second vector and so on. The sum of these products is the dot product which can be done with np.dot() function.
Since we multiply elements at the same positions, the two vectors must have same length in order to have a dot product.
import numpy as np
a = np.array([[1,2],[3,4]])
b = np.array([[11,12],[13,14]])
np.dot(a,b)
It will produce the following output:
[[37 40]
[85 92]]
Note that the dot product is calculated as:
[[1*11+2*13, 1*12+2*14],[3*11+4*13, 3*12+4*14]]
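Applied to the shapes from the question (a sketch, my addition): the inner dimensions must match, so transposing X makes (1, 5) dot (5, 3) work:
import numpy as np

w = np.ones((1, 5))
X = np.ones((3, 5))
yhat = np.dot(w, X.T)  # (1, 5) dot (5, 3) -> shape (1, 3), one value per sample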
You get the full flexibility with tensordot which implements tensor products with arbitrary choice of axes.
A nice application is estimating the covariance matrix, without messing with transpositions:
import numpy as np
from scipy.stats import multivariate_normal
dist = multivariate_normal(mean=[0,0],cov=[[1,1],[1,2]])
samples = dist.rvs(1000)  # 1000 samples, shape (1000, 2)
np.tensordot(samples, samples, axes=[0,0]) / len(samples)  # close to [[1,1],[1,2]]

Get the components of a multidimensional array dot product without a loop

I want to vectorise the dot product of several 3x3 matrices (rotation matrix around x-axis) with several 3x1 vectors. The application is the transformation of points (approx 500k per array) from one to another coordinate system.
Here in the example there are only four of each. Hence, the result should again be four 3x1 vectors, or equivalently the single components x, y, z as length-4 vectors. But I cannot get the dimensions figured out: the dot product with tensordot results in a shape of (4,3,4), of which I need the diagonals again:
x, y, z = np.zeros((3, 4, 1))
rota = np.arange(4 * 3 * 3).reshape((4, 3, 3))
v = np.arange(4 * 3).reshape((4, 3))
result = np.zeros_like(v, dtype=np.float64)
vec_rotated = np.tensordot(rota, v, axes=([-1], [1]))
for i in range(result.shape[0]):
    result[i, :] = vec_rotated[i, :, i]
x, y, z = result.T
How can i vectorise the complete thing?
Use np.einsum for an efficient solution -
x,y,z = np.einsum('ijk,ik->ji',rota,v)
Alternative with np.matmul/@ operator in Python 3.x -
x,y,z = np.matmul(rota,v[:,:,None])[...,0].T
x,y,z = (rota@v[...,None])[...,0].T
Alternatively, it works via transpose to obtain one component per diagonal:
vec_rotated = vec_rotated.transpose((1, 0, 2))
x, y, z = (np.diag(vec_rotated[0, :, :]),
           np.diag(vec_rotated[1, :, :]),
           np.diag(vec_rotated[2, :, :]))
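As a quick sanity check (my addition), the vectorized one-liners reproduce the loop-and-diagonal result from the question:
import numpy as np

rota = np.arange(4 * 3 * 3).reshape((4, 3, 3))
v = np.arange(4 * 3).reshape((4, 3))
loop_result = np.empty((4, 3))
for i in range(4):
    loop_result[i] = rota[i] @ v[i]               # the per-point rotation the question wants
einsum_result = np.einsum('ijk,ik->ij', rota, v)  # same thing without a loop
matmul_result = np.matmul(rota, v[:, :, None])[..., 0]
print(np.allclose(loop_result, einsum_result))  # True
print(np.allclose(loop_result, matmul_result))  # True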

Is there a way to implement convex optimization using N-dimensional arrays?

Given data with shape = (t,m,n), I need to find a vector variable of shape (n,) that minimizes a convex function of the data and vector. I've used cvxopt (and cvxpy) to perform convex optimizations using 2D input, but it seems like they don't support 3D arrays. Is there a way to implement this convex optimization using these or other similar packages?
Given data with shape (t,m,n) and (t,m) and var with shape (n,), here's a simplification of the type of function I need to minimize:
import numpy as np
def obj_func(var, data1, data2):
    # data1.shape = (t,m,n)
    # data2.shape = (t,m)
    # var.shape = (n,)
    score = np.sum(data1*var, axis=2)          # dot product along axis 2
    time_series = np.sum(score*data2, axis=1)  # weighted sum along axis 1
    return np.sum(time_series) - np.sum(time_series**2)  # some function
This seems like it should be a simple convex optimization, but unfortunately these functions aren't supported on N-dimensional arrays in cvxopt/cvxpy. Is there a way to implement this?
I think if you simply reshape data1 to be 2d temporarily you'll be fine, e.g.
import numpy as np
import cvxpy as cp
t, m, n = 10, 8, 6
data1 = np.ones((t, m, n))
data2 = np.ones((t, m))
x = cp.Variable(n)
score = cp.reshape(data1.reshape(-1, n) @ x, (t, m))
time_series = cp.sum(cp.multiply(score, data2), axis=1)
expr = cp.sum(time_series) - cp.sum(time_series ** 2)
print(repr(expr))
Outputs:
Expression(CONCAVE, UNKNOWN, ())
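If you want the optimum itself, a natural follow-up (my addition, not part of the original answer) is to maximize the concave expression, which cvxpy accepts as a convex problem:
prob = cp.Problem(cp.Maximize(expr))  # maximizing a concave expression is a convex problem
prob.solve()
print(x.value)  # optimal vector of shape (n,)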

How to construct square of pairwise difference from a vector in tensorflow?

I have a 1D vector of dimension N in TensorFlow. How do I construct the sum of pairwise squared differences?
Example
Input Vector
[1,2,3]
Output
6
Computed As
(1-2)^2+(1-3)^2+(2-3)^2.
In general, for an N-dim input vector l, the output should be sum_{i<j} (l_i - l_j)^2.
Added question: if I have a 2d matrix and want to perform the same process for each row of the matrix, and then average the results from all the rows, how can I do it? Many thanks!
For the pairwise differences, subtract the transpose of the input from the input and keep only the upper triangular part:
pair_diff = tf.matrix_band_part(a[..., None] - tf.transpose(a[..., None]), 0, -1)
Then you can square and sum the differences.
Code:
a = tf.constant([1, 2, 3])
pair_diff = tf.matrix_band_part(a[..., None] - tf.transpose(a[..., None]), 0, -1)
output = tf.reduce_sum(tf.square(pair_diff))
with tf.Session() as sess:
    print(sess.run(output))
    # 6
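For the added question about a 2D matrix, the same idea applies per row (a sketch, my addition): tf.matrix_band_part operates on the last two axes, so broadcasting the row-wise differences into a (rows, N, N) tensor works directly, and a final mean averages over rows:
a = tf.constant([[1, 2, 3],
                 [4, 6, 8]])
pair_diff = tf.matrix_band_part(a[:, :, None] - a[:, None, :], 0, -1)  # per-row pairwise diffs
per_row = tf.reduce_sum(tf.square(pair_diff), axis=[1, 2])             # sum of squares per row
output = tf.reduce_mean(tf.cast(per_row, tf.float32))                  # average over rows
with tf.Session() as sess:
    print(sess.run(output))  # (6 + 24) / 2 = 15.0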

Sum of Gaussians into fast Numpy?

Here is my problem:
I have two sets of 3D points. Let's call them "Gausspoints" and "XYZ". I define a function which is a sum of Gaussians, each centered at one of the Gausspoints. Now I want to evaluate this function on the XYZ points. My approach is working fine, but it is rather slow. Any idea how to speed it up by exploiting NumPy a little better?
def sumgaus(r):
    t = r - Gausspoints
    t = np.array(list(map(np.linalg.norm, t)))  # distance from r to each Gausspoint
    t = -np.power(t, 2.0)
    t = np.exp(t)
    res = np.sum(t)
    return res

result = list(map(sumgaus, XYZ))
Thanks for any help
Edit:
XYZ has shape N*3 and Gausspoints has shape M*3, with M and N being different integers.
Edit2: I want to apply the following function on each item in XYZ
The tricky part is how to vectorize the computation of all the differences between your points without any explicit Python looping or mapping. You can roll your own implementation using broadcasting by doing something like:
dist2 = XYZ[:, np.newaxis, :] - Gausspoints
dist2 *= dist2
dist2 = np.sum(dist2, axis=-1)
And if XYZ has shape (n, 3) and Gausspoints has shape (m, 3), then dist2 will have shape (n, m), with dist2[i, j] being the squared distance between points XYZ[i] and Gausspoints[j].
It may be easier to understand using scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist
dist2 = cdist(XYZ, Gausspoints)
dist2 *= dist2
But once you have your array of squared distances, it's child's play:
f = np.sum(np.exp(-dist2), axis=1)
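Putting it together, a self-contained sketch with random arrays of the shapes from the question (N = 5 and M = 4 chosen purely for illustration):
import numpy as np
from scipy.spatial.distance import cdist

XYZ = np.random.random((5, 3))          # N x 3 evaluation points
Gausspoints = np.random.random((4, 3))  # M x 3 Gaussian centers
dist2 = cdist(XYZ, Gausspoints) ** 2    # (N, M) squared distances
f = np.sum(np.exp(-dist2), axis=1)      # sum of Gaussians at each XYZ point, shape (N,)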
