Python: Connected components on a sphere

I have been banging my head against this for some time now. My problem is very simple to explain:
I have data containing longitudes and latitudes. For simplicity, let us assume these are coordinates of cities. What I want is to separate these city coordinates into groups, so that every city within a group lies within a given 'maximum distance' of its nearest neighbour. All cities within a group must have at least one neighbour within this distance limit; the minimum distance between any two separate groups is therefore greater than the 'maximum distance' mentioned above.
My understanding is that this is a clustering problem (e.g. minimum spanning tree). The distance on the sphere can be calculated with the haversine formula, but I can't wrap my head around how to implement this... My restriction is that I can only use numpy, scipy, and scikit-learn.
I hope someone can help.
Thanks.
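For illustration, a minimal sketch that stays within those restrictions, treating 'maximum distance' as a great-circle separation in degrees. The conversion to 3-D unit vectors and the chord-length trick are assumptions about the intended geometry, not part of the question:

import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse.csgraph import connected_components

def group_cities(lon, lat, max_distance_deg):
    # Convert lon/lat (degrees) to 3-D unit vectors on the sphere.
    lon_r, lat_r = np.radians(lon), np.radians(lat)
    xyz = np.c_[np.cos(lat_r) * np.cos(lon_r),
                np.cos(lat_r) * np.sin(lon_r),
                np.sin(lat_r)]
    # A great-circle separation of max_distance_deg corresponds to this
    # chord length on the unit sphere.
    chord = 2 * np.sin(np.radians(max_distance_deg) / 2)
    # Link every pair of cities closer than the threshold, then label
    # the connected components of the resulting graph.
    tree = cKDTree(xyz)
    adjacency = tree.sparse_distance_matrix(tree, chord)
    n_groups, labels = connected_components(adjacency, directed=False)
    return labels

Cities i and j end up in the same group exactly when a chain of sub-threshold hops connects them, which matches the grouping described above.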

OK, so I have implemented a brute-force approach to solve this. I am not 100% sure that the results are correct in all cases, though... if some of you have time to check this, it would be greatly appreciated.
import numpy as np
import matplotlib.pyplot as plt

# -------------------------------------------------------------------
def distance_sphere(lon1, lat1, lon2, lat2):
    # Calculate the great-circle distance on the sphere (in degrees)
    return np.degrees(np.arccos(np.sin(np.radians(lat1)) * np.sin(np.radians(lat2)) +
                                np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) *
                                np.cos(np.radians(lon1 - lon2))))

# -------------------------------------------------------------------
def distance_euclid(lon1, lat1, lon2, lat2):
    # Calculate the Euclidean distance in the lon/lat plane
    return np.sqrt((lon1 - lon2)**2 + (lat1 - lat2)**2)

# -------------------------------------------------------------------
# Maximum allowed distance in degrees
max_distance = 10

# Generate city coordinates
lon_all = np.random.random(100) * 100
lat_all = np.random.random(100) * 100

# Start with as many groups as cities
group = np.arange(len(lon_all))

# Loop over all city coordinates
for lon, lat in zip(lon_all, lat_all):

    # Calculate distance to all other cities
    dis = distance_euclid(lon1=lon, lat1=lat, lon2=lon_all, lat2=lat_all)

    # Get indices of those which are within the given limit
    idx = np.where(dis <= max_distance)[0]

    # If there is no other city within the limit, we continue
    if len(idx) == 0:
        continue

    # Set a common group label for all cities within the limit
    for i in idx:
        group[group == group[i]] = min(group[idx])

# Rewrite the labels starting from 0
for old, new in zip(set(group), range(len(set(group)))):
    idx = [i for i, j in enumerate(group) if j == old]
    group[idx] = new

# -------------------------------------------------------------------
# Plot results
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[10, 10])
for g, lon, lat in zip(group, lon_all, lat_all):
    ax.annotate(str(g), xy=(lon, lat), xycoords="data", size=12, ha="center", va="center")
    circ = plt.Circle((lon, lat), radius=max_distance / 2, lw=0, color="gray")
    ax.add_patch(circ)
ax.set_xlim(-10, 110)
ax.set_ylim(-10, 110)
plt.show()

From the graphical output as it stands in your answer, I believe that your clusters are being terminated prematurely. This is my approach to the problem; the code is ugly because I really just wanted to demonstrate the concept and didn't have time to think about the most elegant way to illustrate it. Also, it's not in numpy, because that way I could steal my old distance-calculation function to save myself some time. Hopefully the concept is clear enough, and you'll see how it could be made faster and cleaner, e.g. by not repeatedly rebuilding available_locations and by not re-scanning items already in the cluster from the previous iteration.
Edit: Illustrated behaviour:
1) It always converges on the same solution for each DISTANCE_CAP, regardless of all the randomisation in the initialisation and progression of the solution
2) Modifying DISTANCE_CAP can result in single-location clusters or one giant blob
import math
from random import choice, shuffle

DISTANCE_CAP = 20

def crow_flies(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance in km
    dx1, dy1 = (lat1 / 180) * 3.141593, (lon1 / 180) * 3.141593
    dx2, dy2 = (lat2 / 180) * 3.141593, (lon2 / 180) * 3.141593
    dlat, dlon = abs(dx2 - dx1), abs(dy2 - dy1)
    a = (math.sin(dlat / 2))**2 + (math.cos(dx1) * math.cos(dx2)
                                   * (math.sin(dlon / 2))**2)
    c = 2 * (math.atan2(math.sqrt(a), math.sqrt(1 - a)))
    km = 6373 * c
    return km

# Aim: separate these back out
manchester = [[53.486286, -2.251476, 1],
              [53.483586, -2.254534, 2],
              [53.475158, -2.248011, 3],
              [53.397161, -2.509189, 4]]
stoke = [[53.037375, -2.262903, 5],
         [53.031031, -2.199587, 6]]
birmingham = [[52.443368, -1.975714, 7],
              [52.429641, -1.902849, 8],
              [52.483326, -1.817483, 9]]

# Mix them all together
combined_list = [item for item in manchester]
for item in stoke:
    combined_list.append(item)
for item in birmingham:
    combined_list.append(item)
shuffle(combined_list)

# Build a distance matrix keyed by location ID pairs
matrix = {}
for item in combined_list:
    for pair_item in combined_list:
        if item[2] != pair_item[2]:
            distance = crow_flies(item[0], item[1], pair_item[0], pair_item[1])
            matrix[(item[2], pair_item[2])] = distance

# Pick a random starting location
available_locations = [combined_list[x][2] for x in range(len(combined_list))]
start_loc = choice(available_locations)
available_locations = [a for a in available_locations if a != start_loc]

all_clusters = []
single_cluster = []
single_cluster.append(start_loc)

# RECURSIVELY add items to our cluster until it cannot get larger, then start a
# new one
cluster_got_bigger = True
while available_locations:
    if cluster_got_bigger:
        cluster_got_bigger = False
        for loc in single_cluster:
            for item in available_locations:
                distance = matrix[(loc, item)]
                if distance < DISTANCE_CAP:
                    single_cluster.append(item)
                    available_locations = [a for a in available_locations if a != item]
                    cluster_got_bigger = True
    if not cluster_got_bigger:
        all_clusters.append(single_cluster)
        single_cluster = []
        new_seed = choice(available_locations)
        single_cluster.append(new_seed)
        available_locations = [a for a in available_locations if a != new_seed]
        cluster_got_bigger = True
    if not available_locations:
        all_clusters.append(single_cluster)

print(all_clusters)

Maybe my answer is too late.
But a quick solution is to construct a network data structure from your cities and get the connected components of your graph:
Each city is a node.
There is an edge between two cities if their inter-distance is lower than some threshold.
Finally, use a Python network module (e.g. NetworkX).
The code will be something like this:
import networkx as nx

graph = nx.Graph()

# Add all vertices (cities) to the graph
for i, city in enumerate(cities):
    graph.add_node(i)

# Add edges between cities that lie under a distance threshold
for i, city_one in enumerate(cities):
    for j, city_two in enumerate(cities):
        if j > i:
            link_exists = calculate_distance(city_one, city_two) < threshold
            if link_exists:
                graph.add_edge(i, j)

# A list of sets; each set holds the indices of the cities in one component
components = list(nx.connected_components(graph))
Here, calculate_distance and threshold are assumed to be known; the first is a function and the second is the distance threshold.
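For completeness, one possible calculate_distance is the haversine formula; this sketch assumes each city is a (lat, lon) pair in degrees and returns kilometres:

import math

def calculate_distance(city_one, city_two):
    # Haversine great-circle distance between two (lat, lon) pairs.
    lat1, lon1 = map(math.radians, city_one)
    lat2, lon2 = map(math.radians, city_two)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))  # Earth radius ~ 6371 km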

Related

Order 2d points based on distance from each other [duplicate]

I have a list of (x,y)-coordinates that represent a line skeleton.
The list is obtained directly from a binary image:
import numpy as np
list = np.where(img_skeleton > 0)
Now the points in the list are sorted according to their position in the image along one of the axes.
I would like to sort the list such that the order represents a smooth path along the line. (This is currently not the case where the line curves back).
Subsequently, I want to fit a spline to these points.
A similar problem has been described and solved using arcPy here. Is there a convenient way to achieve this using python, numpy, scipy, OpenCV (or another library)?
Below is an example image. It results in a list of 59 (x, y)-coordinates.
When I send the list to scipy's spline fitting routine, I run into a problem because the points aren't 'ordered' along the line:
I apologize for the long answer in advance :P (the problem is not that simple).
Let's start by rewording the problem. Finding a line that connects all the points can be reformulated as a shortest-path problem in a graph, where (1) the graph nodes are the points in space, (2) each node is connected to its 2 nearest neighbors, and (3) the shortest path passes through each of the nodes only once. That last constraint is a very important (and quite hard) one to optimize. Essentially, the problem is to find a permutation of length N, where the permutation refers to the order of the nodes (N is the total number of nodes) in the path.
Finding all the possible permutations and evaluating their cost is too expensive (there are N! permutations, if I'm not wrong, which is far too many for all but tiny problems). Below I propose an approach that finds the N best permutations (the optimal permutation starting from each of the N points) and then picks the permutation (from those N) that minimizes the error/cost.
1. Create a random problem with unordered points
Now, let's start by creating a sample problem:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()
And here is the unsorted version of the points [x, y], to simulate random points in space connected in a line:
idx = np.random.permutation(x.size)
x = x[idx]
y = y[idx]
plt.plot(x, y)
plt.show()
The problem is then to order those points to recover their original order so that the line is plotted properly.
2. Create 2-NN graph between nodes
We can first rearrange the points into an [N, 2] array:
points = np.c_[x, y]
Then, we can start by creating a nearest neighbour graph to connect each of the nodes to its 2 nearest neighbors:
from sklearn.neighbors import NearestNeighbors
clf = NearestNeighbors(n_neighbors=2).fit(points)
G = clf.kneighbors_graph()
G is a sparse N x N matrix, where each row represents a node and the non-zero elements of the columns give the Euclidean distances to the connected points.
We can then use networkx to construct a graph from this sparse matrix:
import networkx as nx
T = nx.from_scipy_sparse_matrix(G)
3. Find shortest path from source
And here begins the magic: we can extract paths using dfs_preorder_nodes, which will essentially create a path through all the nodes (passing through each of them exactly once) given a starting node (if none is given, node 0 is selected).
order = list(nx.dfs_preorder_nodes(T, 0))
xx = x[order]
yy = y[order]
plt.plot(xx, yy)
plt.show()
Well, it is not too bad, but we can notice that the reconstruction is not optimal. This is because point 0 in the unordered list lies in the middle of the line: the path first goes in one direction, then comes back and finishes in the other direction.
4. Find the path with smallest cost from all sources
So, in order to obtain the optimal order, we can just get the best order starting from each of the nodes:
paths = [list(nx.dfs_preorder_nodes(T, i)) for i in range(len(points))]
Now that we have the optimal path starting from each of the N = 100 nodes, we can compute the cost of each and keep the one that minimizes the distances between consecutive points (an optimization problem):
mindist = np.inf
minidx = 0

for i in range(len(points)):
    p = paths[i]           # order of nodes
    ordered = points[p]    # ordered nodes
    # find the cost of that order as the sum of euclidean distances
    # between consecutive points (i) and (i+1)
    cost = (((ordered[:-1] - ordered[1:])**2).sum(1)).sum()
    if cost < mindist:
        mindist = cost
        minidx = i
The points are reordered for each of the candidate paths, and then a cost is computed (as the sum of squared Euclidean distances between each pair of consecutive points i and i+1). If the path starts at the line's start or end point, it will have the smallest cost, as all the nodes will be consecutive. On the other hand, if the path starts at a node that lies in the middle of the line, the cost will be very high at some point, as the path needs to travel from the end (or beginning) of the line back to the initial position to explore the other direction. The path that minimizes that cost is the one starting at an optimal point.
opt_order = paths[minidx]
Now, we can reconstruct the order properly:
xx = x[opt_order]
yy = y[opt_order]
plt.plot(xx, yy)
plt.show()
One possible solution is to use a nearest-neighbours approach, for example with a KDTree. Scikit-learn has a nice interface for this. It can then be used to build a graph representation using networkx. This will only really work if the line to be drawn goes through the nearest neighbours:
from sklearn.neighbors import KDTree
import numpy as np
import networkx as nx

G = nx.Graph()  # A graph to hold the nearest neighbours
X = [(0, 1), (1, 1), (3, 2), (5, 4)]  # Some list of points in 2D
tree = KDTree(X, leaf_size=2, metric='euclidean')  # Create a distance tree

# Now loop over your points and find the two nearest neighbours
# If the first and last points are also the start and end points of the line you can use X[1:-1]
for p in X:
    dist, ind = tree.query([p], k=3)
    print(ind)
    # ind indexes represent nodes on the graph;
    # the two nearest points are at indexes 1 and 2 (index 0 is p itself).
    # Use these to form edges on the graph; p is the current point in the list.
    G.add_node(p)
    n1, l1 = X[ind[0][1]], dist[0][1]  # The next nearest point
    n2, l2 = X[ind[0][2]], dist[0][2]  # The following nearest point
    G.add_edge(p, n1, weight=l1)
    G.add_edge(p, n2, weight=l2)

print(G.edges())  # A list of all the connections between points
print(nx.shortest_path(G, source=(0, 1), target=(5, 4)))
>>> [(0, 1), (1, 1), (3, 2), (5, 4)]  # A list of ordered points
Update: If the start and end points are unknown and your data is reasonably well separated, you can find the ends by looking for cliques in the graph. The start and end points each form a clique, and if the longest edge of that clique is removed, it creates a free end in the graph that can be used as a start or end point. For example, the start and end points in this list appear in the middle:
X = [(0, 1), (0, 0), (2, 1), (3, 2), (9, 4), (5, 4)]
After building the graph as above, it's now a case of removing the longest edge from each such clique to find the free ends of the graph:
def find_longest_edge(l):
    e1 = G[l[0]][l[1]]['weight']
    e2 = G[l[0]][l[2]]['weight']
    e3 = G[l[1]][l[2]]['weight']
    if e2 < e1 > e3:
        return (l[0], l[1])
    elif e1 < e2 > e3:
        return (l[0], l[2])
    elif e1 < e3 > e2:
        return (l[1], l[2])

end_cliques = [i for i in list(nx.find_cliques(G)) if len(i) == 3]
edge_lengths = [find_longest_edge(i) for i in end_cliques]
G.remove_edges_from(edge_lengths)
edges = G.edges()

start_end = [n for n, nbrs in G.adjacency() if len(nbrs) == 1]
print(nx.shortest_path(G, source=start_end[0], target=start_end[1]))
>>> [(0, 0), (0, 1), (2, 1), (3, 2), (5, 4), (9, 4)]  # The correct path
I had the exact same problem. If you have two arrays of scattered x and y values that are not too curvy, you can transform the points into PCA space, sort them there, and then transform them back. (I've also added some bonus smoothing functionality.)
import numpy as np
from scipy import interpolate
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

def XYclean(x, y):
    xy = np.concatenate((x.reshape(-1, 1), y.reshape(-1, 1)), axis=1)

    # make PCA object
    pca = PCA(2)
    # fit on data
    pca.fit(xy)

    # transform into PCA space
    xypca = pca.transform(xy)
    newx = xypca[:, 0]
    newy = xypca[:, 1]

    # sort along the first principal axis
    indexSort = np.argsort(newx)
    newx = newx[indexSort]
    newy = newy[indexSort]

    # add some more points (optional)
    f = interpolate.interp1d(newx, newy, kind='linear')
    newX = np.linspace(np.min(newx), np.max(newx), 100)
    newY = f(newX)

    # smooth with a filter (optional)
    window = 43
    newY = savgol_filter(newY, window, 2)

    # transform back to the original coordinates
    xyclean = pca.inverse_transform(np.concatenate((newX.reshape(-1, 1), newY.reshape(-1, 1)), axis=1))
    xc = xyclean[:, 0]
    yc = xyclean[:, 1]
    return xc, yc
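A possible usage, assuming x and y are 1-D arrays holding the scattered coordinates:

import matplotlib.pyplot as plt

xc, yc = XYclean(x, y)
plt.plot(xc, yc)
plt.show()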
I agree with Imanol Luengo's solution, but if you know the index of the first point, there is a considerably easier solution that uses only NumPy:
def order_points(points, ind):
    points_new = [points.pop(ind)]  # initialize a new list of points with the known first point
    pcurr = points_new[-1]          # initialize the current point (as the known point)
    while len(points) > 0:
        d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1)  # distances between pcurr and all other remaining points
        ind = d.argmin()                    # index of the closest point
        points_new.append(points.pop(ind))  # append the closest point to points_new
        pcurr = points_new[-1]              # update the current point
    return points_new
This approach appears to work well with the sine curve example, especially because it is easy to define the first point as either the leftmost or rightmost point.
For the img_skeleton data cited in the question, it would be similarly easy to algorithmically obtain the first point, for example as the topmost point.
import numpy as np
import matplotlib.pyplot as plt

# create sine curve:
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# shuffle the order of the x and y coordinates:
idx = np.random.permutation(x.size)
xs, ys = x[idx], y[idx]  # shuffled points

# find the leftmost point:
ind = xs.argmin()

# assemble the x and y coordinates into a list of (x, y) tuples:
points = [(xx, yy) for xx, yy in zip(xs, ys)]

# order the points based on the known first point:
points_new = order_points(points, ind)

# plot:
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
xn, yn = np.array(points_new).T
ax[0].plot(xs, ys)  # original (shuffled) points
ax[1].plot(xn, yn)  # new (ordered) points
ax[0].set_title('Original')
ax[1].set_title('Ordered')
plt.tight_layout()
plt.show()
I am working on a similar problem, but it has an important constraint (much like the example given by the OP): each pixel has either one or two neighboring pixels, in the 8-connected sense. With this constraint, there is a very simple solution.
import numpy as np

def sort_to_form_line(unsorted_list):
    """
    Given a list of neighboring points which forms a line, but in random order,
    sort them to the correct order.
    IMPORTANT: Each point must be a neighbour (8-point sense)
    to at least one other point!
    """
    sorted_list = [unsorted_list.pop(0)]

    while len(unsorted_list) > 0:
        i = 0
        while i < len(unsorted_list):
            if are_neighbours(sorted_list[0], unsorted_list[i]):
                # neighbours at front of list
                sorted_list.insert(0, unsorted_list.pop(i))
            elif are_neighbours(sorted_list[-1], unsorted_list[i]):
                # neighbours at rear of list
                sorted_list.append(unsorted_list.pop(i))
            else:
                i = i + 1

    return sorted_list

def are_neighbours(pt1, pt2):
    """
    Check if pt1 and pt2 are neighbours, in the 8-point sense.
    pt1 and pt2 have integer coordinates.
    """
    return (np.abs(pt1[0] - pt2[0]) < 2) and (np.abs(pt1[1] - pt2[1]) < 2)
Building on Toddp's answer, you can find the end-points of arbitrarily shaped lines using the code below and then order the points as Toddp described. This is much faster than Imanol Luengo's answer; the only constraint is that the line must have exactly 2 end-points:
import cv2
import numpy as np

def order_points(points):
    if isinstance(points, np.ndarray):
        assert points.shape[1] == 2
        points = points.tolist()

    exts = get_end_points(points)
    assert len(exts) == 2
    ind = points.index(exts[0])

    points_new = [points.pop(ind)]  # initialize a new list of points with the known first point
    pcurr = points_new[-1]          # initialize the current point (as the known point)
    while len(points) > 0:
        d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1)  # distances between pcurr and all other remaining points
        ind = d.argmin()                    # index of the closest point
        points_new.append(points.pop(ind))  # append the closest point to points_new
        pcurr = points_new[-1]              # update the current point
    return points_new

def get_end_points(ptsxy):
    # source: https://stackoverflow.com/a/67145008/10998081
    if isinstance(ptsxy, list):
        ptsxy = np.array(ptsxy)
    assert ptsxy.shape[1] == 2

    # translate to (0, 0) for faster execution;
    # pad by 1 so the 8-neighbour window below never wraps around the edges
    xx, yy, w, h = cv2.boundingRect(ptsxy)
    pts_translated = ptsxy - (xx, yy) + 1
    bim = np.zeros((h + 2, w + 2))
    bim[pts_translated[:, 1], pts_translated[:, 0]] = 255

    # an end-point has exactly one neighbour in the 8-connected sense
    extremes = []
    for p in pts_translated:
        x = p[0]
        y = p[1]
        n = 0
        n += bim[y - 1, x]
        n += bim[y - 1, x - 1]
        n += bim[y - 1, x + 1]
        n += bim[y, x - 1]
        n += bim[y, x + 1]
        n += bim[y + 1, x]
        n += bim[y + 1, x - 1]
        n += bim[y + 1, x + 1]
        n /= 255
        if n == 1:
            extremes.append(p)
    extremes = np.array(extremes) + (xx, yy) - 1
    return extremes.tolist()

The area & center of gravity of a polygon having non-uniform density of vertices? (in Python)

I would like to calculate the COG of a polygon shaped exactly like the contour map of my town. However, using the available database of border points would produce a biased result, since some places have a much higher density of border points than others, so the center of gravity would be skewed towards these regions. I tried to equalise the density of vertices by producing this Python code:
import numpy as np

punkty = open("borderpoints.txt", "r", encoding="utf8")

tempp = []
a = []
for line in punkty:
    for c in line:
        if c != " ":
            tempp.append(c)
        else:
            p = "".join(tempp)
            a.append(p)
            tempp = []

i = 0
x = []
y = []
fx = open("outx1.txt", "w")
fy = open("outy1.txt", "w")
while i < len(a) - 1:
    x.append(a[i])
    fx.write(a[i])
    fx.write("\n")
    y.append(a[i + 1])
    fy.write(a[i + 1])
    fy.write("\n")
    i = i + 2

j = 0
jump = 20
newxs = []
newys = []
fnx = open("newxs.txt", "w")
fny = open("newys.txt", "w")
while j < len(x):
    L = np.sqrt(pow((float(y[j + 1]) - float(y[j])), 2) + pow((float(x[j + 1]) - float(x[j])), 2))
    n = jump * L
    interval = (float(y[j + 1]) - float(y[j])) / n
    k = 1
    slope = (float(x[j + 1]) - float(x[j])) / (float(y[j + 1]) - float(y[j]))
    inters = float(x[j + 1]) - slope * float(y[j + 1])
    while k < n + 1:
        g = float(y[j]) + k * interval
        newxs.append(g)
        fnx.write(str(g))
        fnx.write("\n")
        g = (slope * (float(y[j]) + k * interval) + inters)
        newys.append(g)
        fny.write(str(g))
        fny.write("\n")
        k = k + 1
    j = j + 2
    k = 1
newxs.append(x)
newys.append(y)
but in the result, the points were denser everywhere except the places that were previously empty and were supposed to get populated by the algorithm.
The graphs show the map before the application of the algorithm and after (some proportions may vary, but the main problem is the empty spot).
What approach could I use to solve this problem? How can I make the points equally distributed, or is it maybe possible to calculate the COG with some other method?
My aim is that the number of points shouldn't determine the COG; it should rather be determined by the position of the polygon's sides. Those are what matter most here, but obviously there is no database for them, and it's harder to calculate the COG from a set of linear functions and their ranges.
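One density-independent alternative is the standard area-weighted polygon centroid (shoelace formula), which depends only on the boundary geometry, not on how many vertices sample each side. A minimal sketch, under the assumption that pts is an (N, 2) array of border points already ordered along the contour:

import numpy as np

def polygon_centroid(pts):
    # Shoelace-weighted centroid: each edge contributes its signed
    # parallelogram area, so densely sampled stretches get no extra weight.
    x, y = pts[:, 0], pts[:, 1]
    x2, y2 = np.roll(x, -1), np.roll(y, -1)  # next vertex (wraps around)
    cross = x * y2 - x2 * y                  # signed edge areas
    area = cross.sum() / 2.0
    cx = ((x + x2) * cross).sum() / (6.0 * area)
    cy = ((y + y2) * cross).sum() / (6.0 * area)
    return cx, cy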

An algorithm to sort top and bottom slices of curved surfaces

What I am trying to do:
Cut the STL file https://www.dropbox.com/s/pex20yqfgmxgt0w/wing_fish.stl?dl=0 at a Z-coordinate using PyVista (https://docs.pyvista.org/)
Extract the points' X, Y coordinates at the given section Z
Sort the points into upper and lower groups for further manipulation
Here is my code:
import pyvista as pv
import matplotlib.pylab as plt
import numpy as np
import math

mesh = pv.read('wing_fish.stl')
z_slice = [0, 0, 1]  # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200])  # slicing
a = single_slice.points  # choose only points

# p = pv.Plotter()  # show section
# p.add_mesh(single_slice)
# p.show()

a = a[a[:, 0].astype(float).argsort()]  # sort all points by X coord

# X min of all points
x0 = a[0][0]
# Y min of all points
y0 = a[0][1]
# X tail 1 of 2
xn = a[-1][0]
# Y tail 1 of 2
yn = a[-1][1]
# X tail 2 of 2
xn2 = a[-2][0]
# Y tail 2 of 2
yn2 = a[-2][1]

def line_y(x, x0, y0, xn, yn):
    # return the y coord at an arbitrary x coord on the (x0, y0)-(xn, yn) LINE
    return ((x - x0) * (yn - y0)) / (xn - x0) + y0

def line_c(x0, y0, xn, yn):
    # return the x, y middle point of the LINE
    xc = (x0 + xn) / 2
    yc = (y0 + yn) / 2
    return xc, yc

def chord(P1, P2):
    return math.sqrt((P2[1] - P1[1])**2 + (P2[0] - P1[0])**2)

xc_end, yc_end = line_c(xn, yn, xn2, yn2)  # middle point at the trailing edge
midLine = np.array([[x0, y0], [xc_end, yc_end]], dtype='float32')

c_temp_x_d = []
c_temp_y_d = []
c_temp_x_u = []
c_temp_y_u = []
isUp = None
isDown = None

for i in a:
    if i[1] == line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
        continue
    elif i[1] < line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
        c_temp_y_d.append(i[1])
        c_temp_x_d.append(i[0])
        isDown = True
    else:
        c_temp_y_u.append(i[1])
        c_temp_x_u.append(i[0])
        isUp = True

if len(c_temp_y_d) != 0 and len(c_temp_y_u) != 0:
    print(c_temp_y_d[-1])

plt.plot(c_temp_x_d, c_temp_y_d, label='supposed to be down points')
plt.plot(c_temp_x_u, c_temp_y_u, label='supposed to be upper points')
plt.plot(midLine[:, 0], midLine[:, 1], label='Chord')
plt.scatter(a[:, 0], a[:, 1], label='raw points')
plt.legend(); plt.grid(); plt.show()
What I have:
What I want:
I would highly appreciate any help and advice!
Thanks in advance!
You are discarding precious connectivity information that is already there in your STL mesh and in your slice!
I couldn't think of a more idiomatic solution within PyVista, but at worst you can take the cell (line) information from the slice and start walking your shape (that is topologically equivalent to a circle) from its left side to its right, and vice versa. Here's one way:
import numpy as np
import matplotlib.pyplot as plt
import pyvista as pv

mesh = pv.read('../wing_fish.stl')
z_slice = [0, 0, 1]  # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200])  # slicing

# find points with smallest and largest x coordinate
points = single_slice.points
left_ind = points[:, 0].argmin()
right_ind = points[:, 0].argmax()

# sanity checks for what we're about to do:
# 1. all cells are lines
assert single_slice.n_cells == single_slice.n_points
assert (single_slice.lines[::3] == 2).all()
# 2. all points appear exactly once as segment start and end
lines = single_slice.lines.reshape(-1, 3)  # each row: [2, i_from, i_to]
assert len(set(lines[:, 1])) == lines.shape[0]

# create an auxiliary dict with from -> to index mappings
conn = dict(lines[:, 1:])

# and a function that walks this connectivity graph
def walk_connectivity(connectivity, start, end):
    this_ind = start
    path_inds = [this_ind]
    while True:
        next_ind = connectivity[this_ind]
        path_inds.append(next_ind)
        this_ind = next_ind
        if this_ind == end:
            # we're done
            return path_inds

# start walking at point left_ind, walk until right_ind
first_side_inds = walk_connectivity(conn, left_ind, right_ind)

# now walk forward for the other half curve
second_side_inds = walk_connectivity(conn, right_ind, left_ind)

# get the point coordinates for plotting
first_side_points = points[first_side_inds, :-1]
second_side_points = points[second_side_inds, :-1]

# plot the two sides
fig, ax = plt.subplots()
ax.plot(*first_side_points.T)
ax.plot(*second_side_points.T)
plt.show()
In order to avoid an O(n^2) algorithm, I defined an auxiliary dict that maps line-segment start indices to end indices. For this to work we need some sanity checks, namely that the cells are all simple line segments and that every segment has the same orientation (i.e. each start point is unique, and each end point is unique). Once we have this, it's easy to start from the left edge of your wing profile and walk each line segment until we find the right edge.
The nature of this approach implies that we can't know a priori whether the path from left to right runs along the upper or the lower curve. This needs experimentation on your part; name the two paths in whatever way you see fit, for instance as sketched below.
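For instance, one could label the two curves by their mean height; the convention that the upper curve has the larger mean y is an assumption, not part of the slicing logic:

# Hypothetical naming: call the path with the larger mean y 'upper'.
if first_side_points[:, 1].mean() >= second_side_points[:, 1].mean():
    upper_points, lower_points = first_side_points, second_side_points
else:
    upper_points, lower_points = second_side_points, first_side_points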
And of course there's always room for fine-tuning. For instance, the above implementation creates two paths that both start and end with the left- and right-side boundary points of the mesh. If you want the top and bottom curves to share no points, you'll have to adjust the algorithm accordingly. And if the end point is never found on the path, the current implementation will give you an infinite loop, with a list growing beyond all available memory. Consider adding some checks to the implementation to avoid this, for example the guarded variant sketched below.
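A sketch of such a guard, tracking visited indices so that a malformed slice raises an error instead of looping forever:

def walk_connectivity_guarded(connectivity, start, end):
    # Same walk as above, but refuses to revisit a point and stops
    # cleanly if the chain runs out before reaching the end point.
    this_ind = start
    path_inds = [this_ind]
    visited = {this_ind}
    while this_ind in connectivity:
        this_ind = connectivity[this_ind]
        if this_ind in visited:
            raise ValueError("cycle detected before reaching the end point")
        path_inds.append(this_ind)
        visited.add(this_ind)
        if this_ind == end:
            return path_inds
    raise ValueError("walk ended without reaching the end point")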
Anyway, this is what we get from the above:

Calculate distance from one point to all others

I am working with a list of ID, X, and Y data for fire hydrant locations. I am trying to find the three closest fire hydrants for each fire hydrant in the list.
a = [[ID, X, Y], [ID, X, Y]]
I have tried implementing this using a for loop, but I am having trouble because I cannot keep the original point data unchanged while iterating through the list of points.
Is there a straightforward way to calculate the distance from one point to each of the other points, and to iterate this for each point in the list? I am very new to Python and have not seen anything about how to do this online.
Any help would be greatly appreciated.
You do not have to calculate the distances from all points to all others to get the three nearest neighbours of each point.
A k-d tree search will be much more efficient, thanks to its O(log n) complexity, than the brute-force method of calculating all distances, which has O(n**2) time complexity.
Example
import numpy as np
from scipy import spatial

# Create some coordinates and indices
# It is assumed that the coordinates are unique (only one entry per hydrant)
Coords = np.random.rand(1000 * 2).reshape(1000, 2)
Coords *= 100
Indices = np.arange(1000)  # Indices

def get_indices_of_nearest_neighbours(Coords, Indices):
    tree = spatial.cKDTree(Coords)
    # k=4 because the first entry is the nearest neighbour
    # of a point with itself
    res = tree.query(Coords, k=4)[1][:, 1:]
    return Indices[res]
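A possible call, reusing the arrays defined above; row i of the result then holds the indices of the three hydrants nearest to hydrant i:

nearest = get_indices_of_nearest_neighbours(Coords, Indices)
print(nearest[:5])  # the three nearest neighbours of the first five hydrants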
Here you go. Let's say you have an input list with this format: [[ID, X, Y], [ID, X, Y]].
You can simply loop through each hydrant while looping through every other hydrant and calculate the minimum distance between them. You just need a variable to store the minimum distance for each hydrant and the ID of the closest hydrant.
import math  # for sqrt calculation

def distance(p0, p1):
    """Calculate the distance between two hydrants."""
    return math.sqrt((p0[1] - p1[1])**2 + (p0[2] - p1[2])**2)

hydrants = [[0, 1, 2], [1, 2, -3], [2, -3, 5]]  # your input list of hydrants

for current_hydrant in hydrants:  # loop through each hydrant
    min_distance = float("inf")
    closest_hydrant = 0
    for other_hydrant in hydrants:  # loop through every other hydrant
        if current_hydrant != other_hydrant:
            curr_distance = distance(current_hydrant, other_hydrant)  # call the distance function
            if curr_distance < min_distance:  # find the closest hydrant
                min_distance = curr_distance
                closest_hydrant = other_hydrant[0]
    print("Closest fire hydrant to", current_hydrant[0], "is hydrant",
          closest_hydrant, "at a distance of", min_distance)  # print the closest hydrant

Since the distance function is not very complicated, I rewrote it here; you can also use a function from the scipy or numpy libraries to get the distance.
Hope this can help ;)
If you have geolocations, we can perform a simple distance calculation (https://en.m.wikipedia.org/wiki/Haversine_formula) to get the distance in kilometres between two locations. This code is NOT meant to be efficient; if efficiency is what you want, we can use numpy to speed it up:
import math

def distance(lat, lon, lat2, lon2):
    R = 6372.8  # Earth radius in kilometers
    # change lat and lon to radians to find the differences
    rlat = math.radians(lat)
    rlat2 = math.radians(lat2)
    rlon = math.radians(lon)
    rlon2 = math.radians(lon2)
    dlat = math.radians(lat2 - lat)
    dlon = math.radians(lon2 - lon)
    m = math.sin(dlat / 2)**2 + \
        math.cos(rlat) * math.cos(rlat2) * math.sin(dlon / 2)**2
    return 2 * R * math.atan2(math.sqrt(m),
                              math.sqrt(1 - m))

a = [['ID1', 52.5170365, 13.3888599],
     ['ID2', 54.5890365, 12.5865499],
     ['ID3', 50.5170365, 10.3888599],
     ]

b = []
for id, lat, lon in a:
    for id2, lat2, lon2 in a:
        if id != id2:
            d = distance(lat, lon, lat2, lon2)
            b.append([id, id2, d])
print(b)

Generating multiple random (x, y) coordinates, excluding duplicates?

I want to generate a bunch of (x, y) coordinates from 0 to 2500 that excludes points within 200 units of each other, without recursion.
Right now I have it check through a list of all previous values to see whether any are far enough away from all the others. This is really inefficient, and if I need to generate a large number of points it takes forever.
So how would I go about doing this?
This is a variant on Hank Ditton's suggestion that should be more efficient time- and memory-wise, especially if you're selecting relatively few points out of all possible points. The idea is that, whenever a new point is generated, everything within 200 units of it is added to a set of points to exclude, against which all freshly-generated points are checked.
import random

radius = 200
rangeX = (0, 2500)
rangeY = (0, 2500)
qty = 100  # or however many points you want

# Generate a set of all points within 200 of the origin, to be used as offsets later
# There's probably a more efficient way to do this.
deltas = set()
for x in range(-radius, radius + 1):
    for y in range(-radius, radius + 1):
        if x * x + y * y <= radius * radius:
            deltas.add((x, y))

randPoints = []
excluded = set()
i = 0
while i < qty:
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    if (x, y) in excluded:
        continue
    randPoints.append((x, y))
    i += 1
    excluded.update((x + dx, y + dy) for (dx, dy) in deltas)
print(randPoints)
I would overgenerate the points, target_N < input_N, and filter them using a KDTree. For example:
import numpy as np
from scipy.spatial import KDTree
N = 20
pts = 2500*np.random.random((N,2))
tree = KDTree(pts)
print(tree.sparse_distance_matrix(tree, 200))
Would give me points that are "close" to each other. From here it should be simple to apply any filter:
(11, 0) 60.843426339
(0, 11) 60.843426339
(1, 3) 177.853472309
(3, 1) 177.853472309
Some options:
Use your algorithm, but implement it with a k-d tree that speeds up nearest-neighbour look-ups.
Build a regular grid over the [0, 2500]^2 square and 'shake' all points randomly with a bi-dimensional normal distribution centered on each intersection of the grid; a sketch of this option follows below.
Draw a larger number of random points, then apply a k-means algorithm and keep only the centroids. They will be far away from one another, and the algorithm, though iterative, could converge more quickly than your current algorithm.
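A minimal sketch of the grid-and-shake option. The pitch and jitter bound are assumptions: with a 300-unit pitch and per-axis jitter capped at ±45, axis-aligned neighbours can never come closer than 210 units, so the 200-unit constraint holds with certainty. A bounded uniform jitter is used instead of an unbounded normal for exactly that guarantee:

import numpy as np

spacing = 300  # grid pitch; must exceed 200 plus twice the jitter bound
jitter = 45    # maximum shake per axis: minimum gap is 300 - 2*45 = 210

# regular grid over the [0, 2500]^2 square
xs, ys = np.meshgrid(np.arange(0, 2501, spacing), np.arange(0, 2501, spacing))
pts = np.c_[xs.ravel(), ys.ravel()].astype(float)

# shake every grid point independently
pts += np.random.uniform(-jitter, jitter, size=pts.shape)

# clip back into the domain; neighbours still stay at least 210 apart
pts = pts.clip(0, 2500)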
This has already been answered, but it's very tangentially related to my work, so I took a stab at it. I implemented the algorithm described in this note, which I found linked from this blog post. Unfortunately it's not faster than the other proposed methods, but I'm sure there are optimizations to be made.
import numpy as np
import matplotlib.pyplot as plt

def lonely(p, X, r):
    m = X.shape[1]
    x0, y0 = p
    x = y = np.arange(-r, r)
    x = x + x0
    y = y + y0

    u, v = np.meshgrid(x, y)

    u[u < 0] = 0
    u[u >= m] = m - 1
    v[v < 0] = 0
    v[v >= m] = m - 1

    return not np.any(X[u[:], v[:]] > 0)

def generate_samples(m=2500, r=200, k=30):
    # m = extent of sample domain
    # r = minimum distance between points
    # k = samples before rejection
    active_list = []

    # step 0 - initialize n-d background grid
    X = np.ones((m, m)) * -1

    # step 1 - select initial sample
    x0, y0 = np.random.randint(0, m), np.random.randint(0, m)
    active_list.append((x0, y0))
    X[active_list[0]] = 1

    # step 2 - iterate over active list
    while active_list:
        i = np.random.randint(0, len(active_list))
        rad = np.random.rand(k) * r + r
        theta = np.random.rand(k) * 2 * np.pi

        # get a list of random candidates within [r, 2r] from the active point
        candidates = np.round((rad * np.cos(theta) + active_list[i][0],
                               rad * np.sin(theta) + active_list[i][1])).astype(np.int32).T

        # trim the list based on the boundaries of the array
        candidates = [(x, y) for x, y in candidates if x >= 0 and y >= 0 and x < m and y < m]

        for p in candidates:
            if X[p] < 0 and lonely(p, X, r):
                X[p] = 1
                active_list.append(p)
                break
        else:
            del active_list[i]

    return X

X = generate_samples(2500, 200, 10)
s = np.where(X > 0)
plt.plot(s[0], s[1], '.')
And the results:
Per the link, the method from aganders3 is known as Poisson disc sampling. You might be able to find more efficient implementations that use a local grid search to find 'overlaps' (see, for example, Poisson Disc Sampling). Because you are constraining the system, it cannot be completely random: the maximum packing for circles with uniform radii in a plane is ~90%, achieved when the circles are arranged in a perfect hexagonal array. As the number of points you request approaches the theoretical limit, the generated arrangement becomes more hexagonal. In my experience, it is difficult to get above ~60% packing with uniform circles using this approach.
The following method uses a list comprehension. I am generating integers here; you can use a different random generator for other data types:
import random

arr = [[random.randint(-4, 4), random.randint(-4, 4)] for i in range(40)]
