Calculate Euclidean distance with NumPy - Python

I have a point set whose coordinates I have stored in three different arrays (xa, ya, za). Now I want to calculate the Euclidean distance between each point of this point set (xa[0], ya[0], za[0], and so on) and all the points of another point set (xb, yb, zb), and each time store the minimum distance in a new array.
Let's say that xa.shape = (11,), ya.shape = (11,), za.shape = (11,), and respectively xb.shape = (13,), yb.shape = (13,), zb.shape = (13,). What I want to do is take one xa[i], ya[i], za[i] at a time, calculate its distance to all the elements of xb, yb, zb, and at the end store the minimum value in an array with xfinal.shape = (11,).
Do you think that this would be possible with numpy?

A different solution would be to use the spatial module from scipy, the KDTree in particular.
This class learns from a set of data points and can then be queried with a new dataset:
import numpy as np
from scipy.spatial import KDTree
# create some fake data
x = np.arange(20)
y = np.random.rand(20)
z = x**2
# put them together; the input should have the form [n_points, n_dimensions]
data = np.vstack([x, y, z]).T
# create the KDTree
kd = KDTree(data)
Now, if you have a point, you can ask for the distance and the index of the closest point (or the N closest points) simply by doing:
kd.query([1, 2, 3])
# (1.8650720813822905, 2)
# your values may differ since the data are random
or, given an array of positions:
# bogus positions
x2 = np.random.rand(20) * 20
y2 = np.random.rand(20) * 20
z2 = np.random.rand(20) * 20
# join them together as the input
data2 = np.vstack([x2, y2, z2]).T
# query them
kd.query(data2)
#(array([ 14.96118553, 9.15924813, 16.08269197, 21.50037074,
# 18.14665096, 13.81840533, 17.464429 , 13.29368755,
# 20.22427196, 9.95286671, 5.326888 , 17.00112683,
# 3.66931946, 20.370496 , 13.4808055 , 11.92078034,
# 5.58668204, 20.20004206, 5.41354322, 4.25145521]),
#array([4, 3, 2, 4, 2, 2, 4, 2, 3, 3, 2, 3, 4, 4, 3, 3, 3, 4, 4, 4]))
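Applied directly to the arrays from the question, a minimal sketch could look like this (made-up random data stands in for xa, ya, za and xb, yb, zb):
import numpy as np
from scipy.spatial import KDTree
# made-up data with the shapes from the question
xa, ya, za = np.random.rand(3, 11)
xb, yb, zb = np.random.rand(3, 13)
# build the tree on the b point set, then query it with all a points at once
tree = KDTree(np.vstack([xb, yb, zb]).T)
xfinal, idx = tree.query(np.vstack([xa, ya, za]).T)
print(xfinal.shape)  # (11,) -- minimum distance from each a point to the b set
print(idx.shape)     # (11,) -- index of the nearest b point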

You can calculate the difference from each xa to each xb with np.subtract.outer(xa, xb). The distance to the nearest xb is given by
np.min(np.abs(np.subtract.outer(xa, xb)), axis=1)
To extend this to 3D,
distances = np.sqrt(np.subtract.outer(xa, xb)**2 +
                    np.subtract.outer(ya, yb)**2 +
                    np.subtract.outer(za, zb)**2)
distance_to_nearest = np.min(distances, axis=1)
If you actually want to know which of the b points is the nearest, use argmin in place of min:
index_of_nearest = np.argmin(distances, axis=1)

There is more than one way of doing this. Most importantly, there's a trade-off between memory usage and speed. Here's the wasteful method:
s = (1, -1)
d = np.min((xa.reshape(s) - xb.reshape(s).T)**2
           + (ya.reshape(s) - yb.reshape(s).T)**2
           + (za.reshape(s) - zb.reshape(s).T)**2, axis=0)
# take np.sqrt(d) if you need the actual distances rather than the squared distances
The other method would be to iterate over the point set in b and keep a running minimum, to avoid expanding to the full-blown distance matrix, as sketched below.
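A minimal sketch of that memory-friendlier variant (my own loop with made-up data, not the answer's original code):
import numpy as np
# made-up data with the shapes from the question
xa, ya, za = np.random.rand(3, 11)
xb, yb, zb = np.random.rand(3, 13)
d_min_sq = np.full(xa.shape, np.inf)      # running minimum of the squared distances
for xbi, ybi, zbi in zip(xb, yb, zb):
    d_sq = (xa - xbi)**2 + (ya - ybi)**2 + (za - zbi)**2
    d_min_sq = np.minimum(d_min_sq, d_sq)
xfinal = np.sqrt(d_min_sq)                # shape (11,), distance to the nearest b point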

Related

Optimize the python function with numpy without using the for loop

I have the following python function:
def npnearest(u: np.ndarray, X: np.ndarray, Y: np.ndarray, distance: 'callable' = npdistance):
    '''
    Finds x1 so that x1 is in X and u and x1 have a minimal distance (according to the
    provided distance function) compared to all other data points in X. Returns the label of x1

    Args:
        u (np.ndarray): The vector (ndim=1) we want to classify
        X (np.ndarray): A matrix (ndim=2) with training data points (vectors)
        Y (np.ndarray): A vector containing the label of each data point in X
        distance (callable): A function that receives two inputs and defines the distance function used

    Returns:
        int: The label of the data point which is closest to `u`
    '''
    xbest = None
    ybest = None
    dbest = float('inf')
    for x, y in zip(X, Y):
        d = distance(u, x)
        if d < dbest:
            ybest = y
            xbest = x
            dbest = d
    return ybest
where npdistance simply gives the distance between two points, i.e.
def npdistance(x1, x2):
    return np.sum((x1 - x2)**2)
I want to optimize npnearest by performing nearest neighbor search directly in numpy. This means that the function cannot use for/while loops.
Thanks
Since you don't need to use that exact function, you can simply change the sum to work over a particular axis. This will return a new array with the calculations, and you can call argmin to get the index of the minimum value. Use that to look up your label:
import numpy as np

def npdistance_idx(x1, x2):
    return np.argmin(np.sum((x1 - x2)**2, axis=1))

Y = ["label 0", "label 1", "label 2", "label 3"]
u = np.array([[1, 5.5]])
X = np.array([[1, 2], [1, 5], [0, 0], [7, 7]])

idx = npdistance_idx(X, u)
print(Y[idx])  # label 1
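If you prefer to keep something close to the original npnearest signature, a fully vectorized version might look like this (a sketch assuming the squared-Euclidean npdistance from the question; npnearest_vec is a made-up name):
import numpy as np

def npnearest_vec(u, X, Y):
    d = np.sum((X - u)**2, axis=1)  # squared distance from u to every row of X via broadcasting
    return Y[np.argmin(d)]

Y = np.array([0, 1, 2, 3])
u = np.array([1, 5.5])
X = np.array([[1, 2], [1, 5], [0, 0], [7, 7]])
print(npnearest_vec(u, X, Y))  # 1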
NumPy supports vectorized operations (broadcasting).
This means you can pass in arrays and operations will be applied to entire arrays in an optimized way (SIMD: single instruction, multiple data).
You can then get the index of the array minimum using .argmin().
Hope this helps.
In [9]: numbers = np.arange(10); numbers
Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [10]: numbers -= 5; numbers
Out[10]: array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])
In [11]: numbers = np.power(numbers, 2); numbers
Out[11]: array([25, 16, 9, 4, 1, 0, 1, 4, 9, 16])
In [12]: numbers.argmin()
Out[12]: 5

Generating Position Vectors from Numpy Meshgrid

I'll try to explain my issue here without going into too much detail on the actual application so that we can stay grounded in the code. Basically, I need to do operations to a vector field. My first step is to generate the field as
x,y,z = np.meshgrid(np.linspace(-5,5,10),np.linspace(-5,5,10),np.linspace(-5,5,10))
Keep in mind that this is a generalized case; in the actual program, the bounds of the vector field are not all the same. In the general run of things, I would expect to write something along the lines of
u,v,w = f(x,y,z).
Unfortunately, this case requires some more difficult operations. I need to use a formula similar to
where the vector r is defined in the program as np.array([xgrid-x, ygrid-y, zgrid-z]) divided by its own norm. Basically, this is a vector pointing from every point in space to the position (x, y, z).
Now Numpy has implemented a cross product function using np.cross(), but I can't seem to create a "meshgrid of vectors" like I need.
I have a lambda function that is essentially
xgrid,ygrid,zgrid=np.meshgrid(np.linspace(-5,5,10),np.linspace(-5,5,10),np.linspace(-5,5,10))
B = lambda x, y, z: np.cross(v, np.array([xgrid - x, ygrid - y, zgrid - z]))
Now the array v is imported from another class and seems to work just fine, but the second array, np.array([xgrid-x,ygrid-y,zgrid-z]) is not a proper shape because it is a "vector of meshgrids" instead of a "meshgrid of vectors". My big issue is that I cannot seem to find a method by which to format the meshgrid in such a way that the np.cross() function can use the position vector. Is there a way to do this?
Originally I thought that I could do something along the lines of:
x,y,z = np.meshgrid(np.linspace(-2,2,5),np.linspace(-2,2,5),np.linspace(-2,2,5))
A = np.array([x,y,z])
cross_result = np.cross(np.array(v),A)
This, however, returns the following error, which I cannot seem to circumvent:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 1682, in cross
raise ValueError(msg)
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)
There's a workaround with reshape and broadcasting:
import numpy as np

# x_grid, y_grid, z_grid are the meshgrids from the question (here with 5 points per axis)
x_grid, y_grid, z_grid = np.meshgrid(np.linspace(-2, 2, 5),
                                     np.linspace(-2, 2, 5),
                                     np.linspace(-2, 2, 5))
A = np.array([x_grid, y_grid, z_grid])
# A.shape == (3, 5, 5, 5)

def B(v, p):
    '''
    v.shape == (3,)  -- the fixed vector
    p.shape == (3,)  -- the reference position
    '''
    shape = A.shape
    Ap = A.reshape(3, -1) - p[:, None]                  # vectors from p to every grid point, shape (3, 125)
    return np.cross(v[None, :], Ap.T).T.reshape(shape)  # transpose back so the vector components lie along axis 0

v = np.array([0.0, 0.0, 1.0])  # example vector
p = np.array([0.0, 0.0, 0.0])  # example reference point
print(B(v, p).shape)
# (3, 5, 5, 5)
I think your original attempt only lacks the specification of the axis along which the cross product should be executed.
x, y, z = np.meshgrid(np.linspace(-2, 2, 5),np.linspace(-2, 2, 5), np.linspace(-2, 2, 5))
A = np.array([x, y, z])
cross_result = np.cross(np.array(v), A, axis=0)
I tested this with the code below. As an alternative to np.array([x, y, z]), you can also use np.stack((x, y, z), axis=0), which clearly shows along which axis the meshgrids are stacked to form a meshgrid of vectors, the vectors being aligned with axis 0. I also printed the shapes and used random input for testing. In the test, the output of the formula is compared, at a random index, to the cross product of the input vector at the same index with the vector v.
import numpy as np
x, y, z = np.meshgrid(np.linspace(-5, 5, 10), np.linspace(-5, 5, 10), np.linspace(-5, 5, 10))
p = np.random.rand(3) # random reference point
A = np.array([x-p[0], y-p[1], z-p[2]]) # vectors from positions to reference
A_bis = np.stack((x-p[0], y-p[1], z-p[2]), axis=0)
print(f"A equals A_bis? {np.allclose(A, A_bis)}") # the two methods of stacking yield the same
v = -1 + 2*np.random.rand(3) # random vector v
B = np.cross(v, A, axis=0) # cross-product for all points along correct axis
print(f"Shape of v: {v.shape}")
print(f"Shape of A: {A.shape}")
print(f"Shape of B: {B.shape}")
print("\nComparison for random locations: ")
point = np.random.randint(0, 9, 3) # generate random multi-index
a = A[:, point[0], point[1], point[2]] # look up input-vector corresponding to index
b = B[:, point[0], point[1], point[2]] # look up output-vector corresponding to index
print(f"A[:, {point[0]}, {point[1]}, {point[2]}] = {a}")
print(f"v = {v}")
print(f"Cross-product as v x a: {np.cross(v, a)}")
print(f"Cross-product from B (= v x A): {b}")
The resulting output looks like:
A equals A_bis? True
Shape of v: (3,)
Shape of A: (3, 10, 10, 10)
Shape of B: (3, 10, 10, 10)
Comparison for random locations:
A[:, 8, 1, 1] = [-4.03607312 3.72661831 -4.87453077]
v = [-0.90817859 0.10110274 -0.17848181]
Cross-product as v x a: [ 0.17230515 -3.70657882 -2.97637688]
Cross-product from B (= v x A): [ 0.17230515 -3.70657882 -2.97637688]

Getting eigenvalues from 3x3 matrix in Python using Power method

I'm trying to get all eigenvalues of a 3x3 matrix by using the power method in Python. However, my method returns eigenvalues different from the correct ones for some reason.
My matrix: A = [[1, 2, 3], [2, 4, 5], [3, 5,-1]]
Correct eigenvalues: [ 8.54851285, -4.57408723, 0.02557437 ]
Eigenvalues returned by my method: [ 8.5485128481521926, 4.5740872291939381, 9.148174458392436 ]
So the first one is correct, the second one has the wrong sign, and the third one is completely wrong. I don't know what I'm doing wrong and I can't see where I have made a mistake.
Here's my code:
import numpy as np
import numpy.linalg as la

eps = 1e-8  # precision of the eigenvalue

def trans(v):  # transposes vector (v^T)
    v_1 = np.copy(v)
    return v_1.reshape((-1, 1))

def power(A):
    eig = []
    Ac = np.copy(A)
    lamb = 0
    for i in range(3):
        x = np.array([1, 1, 1])
        while True:
            x_1 = Ac.dot(x)                 # y_n = A*x_(n-1)
            x_norm = la.norm(x_1)
            x_1 = x_1/x_norm                # x_n = y_n/||y_n||
            if abs(lamb - x_norm) <= eps:   # if the precision is reached, accept the eigenvalue
                break
            else:
                lamb = x_norm
                x = x_1
        eig.append(lamb)
        # Matrix deflation: A - lambda * norm[V]*norm[V]^T
        v = x_1/la.norm(x_1)
        R = v * trans(v)
        R = eig[i]*R
        Ac = Ac - R
    return eig

def main():
    A = np.array([1, 2, 3, 2, 4, 5, 3, 5, -1]).reshape((3, 3))
    print(power(A))

if __name__ == '__main__':
    main()
PS. Is there a simpler way to get the second and third eigenvalues from the power method, instead of matrix deflation?
With
lamb = x_norm
you only ever compute the absolute value of the eigenvalues. Better to compute it as
lamb = np.dot(x, x_1)
where x is assumed to be normalized.
As you do not remove the negative eigenvalue -4.57408723, but effectively add it instead, the largest eigenvalue in the third stage is 2*(-4.574..) = -9.148.., of which you again computed the absolute value.
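A minimal sketch of the suggested fix (my own reading of the advice, not the answerer's code): use the Rayleigh quotient so that the sign of the eigenvalue is preserved.
import numpy as np
import numpy.linalg as la

def power_signed(A, iters=1000, eps=1e-8):
    x = np.ones(A.shape[0])
    x = x / la.norm(x)
    lamb = 0.0
    for _ in range(iters):
        x_1 = A.dot(x)
        lamb_new = np.dot(x, x_1)       # Rayleigh quotient: keeps the sign of the eigenvalue
        x_new = x_1 / la.norm(x_1)
        if abs(lamb_new - lamb) <= eps:
            return lamb_new, x_new
        lamb, x = lamb_new, x_new
    return lamb, x

A = np.array([[1, 2, 3], [2, 4, 5], [3, 5, -1]], dtype=float)
lamb, x = power_signed(A)
print(lamb)  # approximately 8.54851285, the leading eigenvalue from the question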
I didn't know this method, so I googled it and found here:
http://ergodic.ugr.es/cphys/LECCIONES/FORTRAN/power_method.pdf
that it is valid only for finding the leading (largest-magnitude) eigenvalue; thus, it seems to be working fine for you, and it is not guaranteed that the following eigenvalues will be correct.
Btw. numpy.linalg.eig() works faster than your code for this matrix, but I am guessing you implemented it as an exercise.
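For reference, a quick check against NumPy's built-in solver, using the matrix from the question:
import numpy as np
A = np.array([[1, 2, 3], [2, 4, 5], [3, 5, -1]], dtype=float)
print(np.linalg.eigvals(A))
# gives 8.54851285, -4.57408723, 0.02557437 (the order may differ)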

Get shortest path to a cell in a 2D array in Python

I have a 2D array, arr, where each cell has a value of 1, 2, or 3; for example, arr[0][0] = 3, arr[2][1] = 2, and arr[0][4] = 1.
I want to know the shortest path from a given cell, for example arr[5][5], to the closest cell that has the value 2, where the path shouldn't contain any cells with the value 1. How can I do this?
Below is a script for BFS, but how can I make it accept a 2D array as the graph and a certain cell location in the array as the starting point, and then go to the nearest cell containing 2 while avoiding cells with 1s, so that it looks like bfs(2darray, starting location, 2)?
def bfs(graph, start, end):
    # Maintain a queue of paths
    queue = []
    # Push the first path into the queue
    queue.append([start])
    while queue:
        # Get the first path from the queue
        path = queue.pop(0)
        # Get the last node from the path
        node = path[-1]
        # Path found
        if node == end:
            return path
        # Enumerate all adjacent nodes, construct a new path and push it into the queue
        for adjacent in graph.get(node, []):
            new_path = list(path)
            new_path.append(adjacent)
            queue.append(new_path)

print(bfs(graph, '1', '11'))
You can use a simple breadth first search for this. Basically, each cell in your grid corresponds to a node in the graph, with edges between adjacent cells. Start at the starting position, and keep expanding passable cells until you find a goal cell.
import collections

def bfs(grid, start):
    queue = collections.deque([[start]])
    seen = set([start])
    while queue:
        path = queue.popleft()
        x, y = path[-1]
        if grid[y][x] == goal:
            return path
        for x2, y2 in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
            if 0 <= x2 < width and 0 <= y2 < height and grid[y2][x2] != wall and (x2, y2) not in seen:
                queue.append(path + [(x2, y2)])
                seen.add((x2, y2))
Grid setup and results: (Note that I'm using symbols instead of numbers, simply because it's easier to visually parse the grid this way and to verify the solution.)
wall, clear, goal = "#", ".", "*"
width, height = 10, 5
grid = ["..........",
        "..*#...##.",
        "..##...#*.",
        ".....###..",
        "......*..."]
path = bfs(grid, (5, 2))
# [(5, 2), (4, 2), (4, 3), (4, 4), (5, 4), (6, 4)]
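A sketch of how this could be adapted to the numeric grid from the question (assuming, as stated there, that 1 marks blocked cells and 2 is the target value; bfs_grid is a made-up name):
import collections

def bfs_grid(grid, start, goal_value=2, wall_value=1):
    height, width = len(grid), len(grid[0])
    queue = collections.deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        x, y = path[-1]
        if grid[y][x] == goal_value:
            return path
        for x2, y2 in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
            if 0 <= x2 < width and 0 <= y2 < height and grid[y2][x2] != wall_value and (x2, y2) not in seen:
                queue.append(path + [(x2, y2)])
                seen.add((x2, y2))
    return None  # no reachable cell with the target value

# made-up 2D array: 1 = blocked, 2 = target, 3 = free
arr = [[3, 3, 1, 3],
       [3, 1, 3, 3],
       [3, 3, 3, 2]]
print(bfs_grid(arr, (0, 0)))  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 2)]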
If the list is not too big, the easiest solution I find is using the where function of the NumPy library to find the cells that have the value you are looking for. So you will need to convert your list into a NumPy array.
The code below could be simplified to make it shorter and more efficient, but written this way it is clearer. By the way, you can compute two kinds of distances: the typical Euclidean and the Manhattan.
If there is more than one target cell at the same distance from the origin cell, min_coords corresponds to the first cell found (first by rows, then by columns).
import numpy as np

# The list needs to be transformed into an array in order to use the np.where method
# arr = np.random.randint(5, size=(6, 6))
arr = np.array([[0, 0, 0, 1, 1, 3],
                [0, 0, 2, 1, 1, 0],
                [0, 0, 1, 1, 1, 1],
                [3, 0, 3, 1, 1, 1]])

# Origin cell for the search
x0, y0 = (1, 1)
targetValue = 3

# This is the key point of the problem: find the positions of the cells containing the searched value
positions = np.where(arr == targetValue)
x, y = positions

dx = abs(x0 - x)  # Horizontal distance
dy = abs(y0 - y)  # Vertical distance

# There are different criteria to compute distances
euclidean_distance = np.sqrt(dx ** 2 + dy ** 2)
manhattan_distance = abs(dx + dy)

my_distance = euclidean_distance  # Criterion choice
min_dist = min(my_distance)
print(min_dist)

min_pos = np.argmin(my_distance)  # This method will only return the first occurrence (!)
min_coords = x[min_pos], y[min_pos]
print(min_coords)

Phase correlation for rotation registration using opencv

I'm trying to register two images that are a rotated and translated version of one another using opencv. Generally speaking, the procedure is (pseudo code):
a. IF1 = FFT2(I1); IF2 = FFT2(I2)
b. R_translation = (IF1).*(IF2_conjugate)
c. R_translation = R_translation./abs(R_translation)
d. r_translation = IFFT2(R_translation)
where the maximum of r_translation corresponds to the translation. Moving on to calculate the rotation, the abs value removes the translation part,
e. IF1_abs = abs(IF1); IF2_abs = abs(IF2)
Converting to Linear-Polar coordinates,
f. IF1_abs_pol = LINPOL(IF1_abs); IF2_abs_pol = LINPOL(IF2_abs)
g. IFF1 = FFT2(IF1_abs_pol); IFF2 = FFT2(IF2_abs_pol)
h. R_rot = (IFF1).*(IFF2_conjugate)
i. R_rot = R_rot./abs(R_rot)
j. r_rot = IFFT2(R_rot)
where the maximum of r_rot corresponds to the rotation. While for translation alone the cv2.phaseCorrelate function returns the expected results, for rotation it returns odd results. So I tried the following.
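(For reference, steps a-d on their own amount to plain translation phase correlation; a minimal NumPy sketch with made-up single-channel images, not part of the original experiment:)
import numpy as np

def phase_corr_translation(I1, I2):
    # a. forward FFTs
    IF1 = np.fft.fft2(I1)
    IF2 = np.fft.fft2(I2)
    # b. + c. normalized cross-power spectrum
    R = IF1 * np.conjugate(IF2)
    R = R / np.abs(R)
    # d. back to the spatial domain; the peak location gives the shift of I1 relative to I2
    r = np.fft.ifft2(R).real
    return np.unravel_index(np.argmax(r), r.shape)

# made-up test: I1 is I2 cyclically shifted by (2, 3)
I2 = np.random.rand(32, 32)
I1 = np.roll(I2, shift=(2, 3), axis=(0, 1))
print(phase_corr_translation(I1, I2))  # (2, 3)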
I took two 5x5 numpy arrays, which are rotated versions of one another, like so:
a = numpy.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
a = a.astype('float')/a.astype('float').max()
b = numpy.array([[5, 5, 5, 5, 5], [4, 4, 4, 4, 4], [3, 3, 3, 3, 3], [2, 2, 2, 2, 2], [1, 1, 1, 1, 1]])
b = b.astype('float') / b.astype('float').max()
First I calculated the phase correlation myself:
center_x = numpy.floor(a.shape[0] / 2.0)  # the x center of rotation (= x center of image)
center_y = numpy.floor(a.shape[1] / 2.0)  # the y center of rotation (= y center of image)
Mvalue = a.shape[1] / numpy.sqrt(
    ((a.shape[0] / 2.0) ** 2.0) + ((a.shape[1] / 2.0) ** 2.0))  # rotation radius
Calculating the FFT, taking the absolute value (discarding the translation difference, if any existed), and switching to linear-polar coordinates and normalizing:
a_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(a)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
b_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(b)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
a_polar = a_polar/a_polar.max()
b_polar = b_polar / b_polar.max()
Another FFT step, multiplying point wise, and IFFT back:
aff = numpy.fft.fft2(a_polar)
bff = numpy.fft.fft2(b_polar)
R = aff * numpy.ma.conjugate(bff)
R = R / numpy.absolute(R)
r = numpy.fft.ifft2(R).real
r = r/r.max()
yields:
[figure: Phase correlation for rotation, b with respect to a]
According to cv2.linearPolar(), the rows span the angle (in this case with a step size of 360/5 = 72 degrees) and the columns span the radius (from 0 to the maximum radius given in Mvalue). The maximum is evident at the last row (corresponding to approximately a -90 degree shift). So far so good.
The second method is using cv2.phaseCorrelate() directly,
r_direct = cv2.phaseCorrelate(a_polar, b_polar)
which yields:
[figure: Phase correlation for rotation, b with respect to a, direct method]
The first tuple is the X, Y correlation coefficient (in pixels?), and the third number is the fit grade. When it is close to unity, the correlation coefficient represents the data better (the blob around the maximum is more distinct).
Other than the fact that the result is not distinct enough (why?), it is also confusing...
Generally, the first FFT step in this 5x5 example was not necessary. If rotation is the only difference between the images, one can immediately switch to linear-polar coordinates and use cv2.phaseCorrelate. In that case, the result is also confusing.
Any help would be appreciated :)
Thanks!
David
