Scipy cdist maximum distance - python

New to scipy. I am trying to use the cdist function to pick the greatest distance between vectors. My attempt is
dm = cdist(XA, XB, lambda u, v: np.max(np.sqrt(((u-v)**2).sum())))
but it doesn't seem to produce the correct result. Any suggestions?

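The cdist function returns an N×M matrix containing the distances between each of the N vectors of XA and each of the M vectors of XB. A custom metric is called on one pair of vectors at a time and returns a single number, so np.max inside the lambda has no effect. If you want the max distance, regardless of the vectors that originate it, take the max() of the returned array (ravel() flattens the 2D array into a 1D array first, although max() on the 2D array gives the same value):
dm = cdist(XA, XB, metric='euclidean').ravel().max()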
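If you also need to know which pair of vectors produces the largest distance, a minimal sketch along these lines may help. The arrays XA and XB here are made-up sample data, and np.unravel_index converts the flat argmax position back into a (row, column) pair.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical sample data: 2 points in XA, 2 points in XB
XA = np.array([[0.0, 0.0], [1.0, 0.0]])
XB = np.array([[0.0, 3.0], [4.0, 0.0]])

D = cdist(XA, XB, metric='euclidean')          # shape (2, 2): all pairs
dm = D.max()                                   # largest distance overall
i, j = np.unravel_index(D.argmax(), D.shape)   # index into XA, index into XB
print(dm, i, j)
```

Here XA[i] and XB[j] are the two vectors that are farthest apart.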

Related

Implementation of Hellinger distance with numpy only

I got this task to implement a python function using NumPy.
The function should compute the Hellinger distance between two matrices P and Q with dimensions (n, k). p_i is the vector of row i of P and p_i,j is the value of row i in column j of P.
The Hellinger distance for matrices is defined as follows:
h_i = \frac{1}{\sqrt{2}} \sqrt{\sum_{j=1}^{k} \left(\sqrt{p_{i,j}} - \sqrt{q_{i,j}}\right)^2}
H is a vector of length n and h_i is the value i of H, with i = 1,...,n. So the Hellinger distance between two matrices is equivalent to the Hellinger distance between the rows of the matrices. For each row, the distance is stored in the output vector H.
The task now is to implement the function (using NumPy), which will compute the above-described problem. It gets handed over two 2D-NumPy-Arrays P and Q, and it should return a 1D-Numpy-Array H of the right length.
I have never worked with NumPy before. I have read a little of the NumPy docs, but I would be very grateful for any suggestions.
I found out that you need to use the axis argument in certain NumPy functions (e.g. np.sum()) in order to tell NumPy if it should iterate over the rows or columns of an array. I did exactly that: return np.sqrt(1/2) * np.sqrt( np.sum((np.sqrt(P) - np.sqrt(Q))**2,axis=1) ) and it works.
The only problem is that it still gives back negative values. How is that possible, since the subtraction is taken to the power of 2?
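For reference, here is a minimal sketch of the row-wise computation described above, assuming P and Q contain only non-negative entries (the sample matrices are made up). np.sqrt does not return negative numbers, so with inputs like these the result should be non-negative; writing 0.5 instead of 1/2 also avoids integer division under Python 2.

```python
import numpy as np

def hellinger(P, Q):
    """Row-wise Hellinger distance between two (n, k) arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Sum over axis=1, i.e. over the k columns of each row
    return np.sqrt(0.5) * np.sqrt(np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2, axis=1))

# Hypothetical sample input: each row sums to 1
P = np.array([[0.5, 0.5], [1.0, 0.0]])
Q = np.array([[0.5, 0.5], [0.0, 1.0]])
H = hellinger(P, Q)
print(H)
```

If negative values still appear, it may be worth checking the inputs and any code that runs after this expression.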

Finding Hessian matrix of multi dimensional function

I am trying to create a 10-dimensional convex function. I know that the eigenvalues of its Hessian matrix must be non-negative for the function to be convex. I am using the code below to find the Hessian matrix, but its input is an array, and I don't know how to represent a function as an array.
import numpy as np

def hessian(x):
    """
    Calculate the hessian matrix with finite differences
    Parameters:
       - x : ndarray
    Returns:
       an array of shape (x.ndim, x.ndim) + x.shape
       where the array[i, j, ...] corresponds to the second derivative x_ij
    """
    x_grad = np.gradient(x)
    hessian = np.empty((x.ndim, x.ndim) + x.shape, dtype=x.dtype)
    for k, grad_k in enumerate(x_grad):
        # iterate over dimensions
        # apply gradient again to every component of the first derivative.
        tmp_grad = np.gradient(grad_k)
        for l, grad_kl in enumerate(tmp_grad):
            hessian[k, l, :, :] = grad_kl
    return hessian

x = np.random.randn(100, 100)
t = hessian(x)
As stated in the question you got this code from, x is the value of the function at the nodes of an evenly spaced mesh in parameter space, not the function itself.
If it's not possible to calculate the hessian of your function analytically, you have to use finite difference formulas like in this example code.
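As a rough illustration of the finite-difference route, the sketch below estimates the Hessian of a function at a single point with central differences and checks its eigenvalues. Unlike the mesh-based code in the question, the function itself is passed in. The helper name numerical_hessian, the step size h, and the test function f are all assumptions made for this example.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-3):
    """Central-difference estimate of the Hessian of f at the point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    steps = np.eye(n) * h          # steps[i] is a step of size h along axis i
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = steps[i], steps[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Hypothetical 10-dimensional convex function: a weighted sum of squares
w = np.arange(1, 11, dtype=float)

def f(x):
    return np.sum(w * x ** 2)

H = numerical_hessian(f, np.ones(10))
eigenvalues = np.linalg.eigvalsh((H + H.T) / 2)   # symmetrize before eigvalsh
print(eigenvalues.min() > 0)
```

For this particular f the exact Hessian is 2 * diag(w), so all eigenvalues are positive. A single point only gives a local check; convexity requires the condition to hold everywhere.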

How to use scipy minimize in a 2d array?

I want to use the minimization function from SciPy, scipy.optimize.minimize.
I have a function def f(x, a, b, c) whose parameters a, b, c are scalars. I have three NumPy matrices A, B, C and I want to calculate a matrix whose (i,j) component is the minimum of f(x, A[i,j], B[i,j], C[i,j]) over all possible x. Just calling scipy.optimize.minimize(f, 1, args=(A, B, C)) does not work. Any idea how I can do this efficiently?
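One straightforward (not necessarily the most efficient) approach is to run a separate one-dimensional minimization for every matrix entry. The sketch below assumes x is a single scalar and therefore uses scipy.optimize.minimize_scalar in place of minimize; the parabola f and the matrices A, B, C are made-up examples.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical objective: a parabola in x with scalar parameters a, b, c
def f(x, a, b, c):
    return a * x ** 2 + b * x + c

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, -4.0], [6.0, 8.0]])
C = np.array([[1.0, 1.0], [0.0, 5.0]])

minima = np.empty_like(A)
for idx in np.ndindex(A.shape):
    # One independent scalar minimization per (i, j) entry
    res = minimize_scalar(f, args=(A[idx], B[idx], C[idx]))
    minima[idx] = res.fun

print(minima)
```

For this parabola the minimum is known in closed form, c - b**2 / (4 * a), which gives a quick way to check the loop. When such a closed form exists, evaluating it on the whole arrays at once is much faster than calling the optimizer per entry.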

Find the distance of each pair between two vectors

I have two vectors, let's say x=[2,4,6,7] and y=[2,6,7,8] and I want to find the euclidean distance, or any other implemented distance (from scipy for example), between each corresponding pair. That will be
dist=[0, 2, 1, 1].
When I try
dist = scipy.spatial.distance.cdist(x,y, metric='sqeuclidean')
or
dist = [scipy.spatial.distance.cdist(x,y, metric='sqeuclidean') for x,y in zip(x,y)]
I get
ValueError: XA must be a 2-dimensional array.
How am I supposed to calculate dist and why do I have to reshape data for that purpose?
cdist does not compute the list of distances between corresponding pairs, but the matrix of distances between all pairs. It also expects 2-dimensional input with one point per row, which is why passing 1-dimensional data raises the ValueError.
This is how I'd typically write the Euclidean distance between corresponding n-dimensional points:
np.linalg.norm((np.asarray(x)-np.asarray(y))[:, None], axis=1)
If you are only dealing with 1-dimensional points, the absolute difference, as suggested by elpres, would be simpler.
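To make the difference concrete, here is a short sketch using the x and y from the question. It computes the element-wise absolute difference, and then shows that cdist, once the data is reshaped to one point per row, returns every pair with the corresponding pairs on the diagonal.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([2, 4, 6, 7])
y = np.array([2, 6, 7, 8])

# Distance between corresponding 1-D points
dist = np.abs(x - y)

# cdist needs 2-D input (one point per row) and returns all 4 x 4 pairs;
# the corresponding-pair distances sit on the diagonal.
all_pairs = cdist(x[:, None], y[:, None], metric='euclidean')
diag = np.diag(all_pairs)
print(dist, diag)
```

Computing the full matrix just to read its diagonal does far more work than needed, so the element-wise form is preferable here.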

Compute numpy array pairwise Euclidean distance except with self

edit: this question is not specifically about calculating distances, rather the most efficient way to loop through a numpy array, specifying that for index i all comparisons should be made with the rest of the array, as long as the second index is not i.
I have a numpy array with columns (X, Y, ID) and want to compare each element to each other element, but not itself. So, for each X, Y coordinate, I want to calculate the distance to each other X, Y coordinate, but not itself (where distance = 0).
Here is what I have - there must be a more "numpy" way to write this.
import math, arcpy
# Point feature class
fc = "MY_FEATURE_CLASS"
# Load points to numpy array: (X, Y, ID)
npArray = arcpy.da.FeatureClassToNumPyArray(fc,["SHAPE#X","SHAPE#Y","OID#"])
for row in npArray:
    for row2 in npArray:
        if row[2] != row2[2]:
            # Pythagoras's theorem
            distance = math.sqrt(math.pow((row[0]-row2[0]),2)+math.pow((row[1]-row2[1]),2))
Obviously, I'm a numpy newbie. I will not be surprised to find this a duplicate, but I don't have the numpy vocabulary to search out the answer. Any help appreciated!
Using SciPy's pdist, you could write something like
import numpy as np
from scipy.spatial.distance import pdist, squareform
distances = squareform(pdist(npArray, lambda a,b: np.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)))
pdist will compute the pair-wise distances using the custom metric that ignores the 3rd coordinate (which is your ID in this case). squareform turns this into a more readable matrix such that distances[0,1] gives the distance between the 0th and 1st rows.
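A small sketch of the same idea on a plain 2-D float array may make the shapes clearer. The array pts is a made-up stand-in for the (X, Y, ID) data, and slicing off the ID column lets pdist use its built-in Euclidean metric instead of a Python lambda.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the (X, Y, ID) data as a plain 2-D float array
pts = np.array([[0.0, 0.0, 1], [3.0, 4.0, 2], [6.0, 8.0, 3]])
xy = pts[:, :2]                      # keep X and Y, drop the ID column

condensed = pdist(xy)                # each unordered pair once, no self-pairs
D = squareform(condensed)            # full symmetric matrix, zeros on diagonal

# Distances from each point to every *other* point: mask out the diagonal
n = len(xy)
others = D[~np.eye(n, dtype=bool)].reshape(n, n - 1)
print(others)
```

The condensed output of pdist already excludes self-comparisons; the diagonal mask is only needed if you want one row of "other point" distances per point.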
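Each row of X is a 3-dimensional data instance or point.
The output pairwisedist[i, j] is the squared Euclidean distance between X[i, :] and X[j, :]; take np.sqrt of it to get the distance itself.
import numpy as np
X = np.array([[6,1,7],[10,9,4],[13,9,3],[10,8,15],[14,4,1]])
a = np.sum(X*X,1)  # squared norm of each row
b = np.repeat(a[:,np.newaxis], X.shape[0], axis=1)  # b[i, j] = |X[i]|^2
pairwisedist = b + b.T - 2*X.dot(X.T)  # |X[i]|^2 + |X[j]|^2 - 2 X[i].X[j]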
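As a sanity check, the sketch below compares this dot-product expansion with SciPy's cdist using the 'sqeuclidean' metric. It uses broadcasting rather than np.repeat, which is an equivalent way to build the same sum.

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[6, 1, 7], [10, 9, 4], [13, 9, 3], [10, 8, 15], [14, 4, 1]])

a = np.sum(X * X, axis=1)
# Broadcasting a[:, None] + a[None, :] replaces the explicit np.repeat
sq = a[:, None] + a[None, :] - 2 * X.dot(X.T)

reference = cdist(X, X, metric='sqeuclidean')
print(np.allclose(sq, reference))

# With float data, rounding can leave tiny negative values, so clip before sqrt
dist = np.sqrt(np.maximum(sq, 0))
```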
I wanted to point out that a hand-written square root of a sum of squares is prone to overflow and underflow. The built-in math.hypot and np.hypot are much safer, with no compromise on performance.
import math
from scipy.spatial.distance import pdist, squareform
# only X and Y are used; the ID column is ignored
distances = squareform(pdist(npArray, lambda a, b: math.hypot(a[0]-b[0], a[1]-b[1])))
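A quick way to see the overflow issue is to use values whose squares exceed the float64 range. In this sketch the sample values are arbitrary: the naive formula produces inf, while np.hypot returns a finite result.

```python
import numpy as np

a = np.float64(1e200)
b = np.float64(1e200)

# Naive formula: the intermediate a * a overflows to inf
with np.errstate(over='ignore'):
    naive = np.sqrt(a * a + b * b)

# np.hypot avoids the overflowing intermediate
safe = np.hypot(a, b)
print(naive, safe)
```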
