I have trajectory data, where each trajectory consists of a sequence of coordinates(x, y points) and each trajectory is identified by a unique ID.
These trajectories are in x - y plane, and I want to divide the whole plane into equal sized grid (square grid). This grid is obviously invisible but is used to divide trajectories into sub-segments. Whenever a trajectory intersects with a grid line, it is segmented there and becomes a new sub-trajectory with new_id.
I have included a simple handmade graph to make clear what I am expecting.
It can be seen how the trajectory is divided at the intersections of the grid lines, and each of these segments has new unique id.
I am working on Python, and seek some python implementation links, suggestions, algorithms, or even a pseudocode for the same.
Please let me know if anything is unclear.
UPDATE
In order to divide the plane into grid , cell indexing is done as following:
#finding cell id for each coordinate
#cellid = (coord / cellSize).astype(int)
cellid = (coord / 0.5).astype(int)
cellid
Out[] : array([[1, 1],
[3, 1],
[4, 2],
[4, 4],
[5, 5],
[6, 5]])
#Getting x-cell id and y-cell id separately
x_cellid = cellid[:,0]
y_cellid = cellid[:,1]
#finding total number of cells
xmax = df.xcoord.max()
xmin = df.xcoord.min()
ymax = df.ycoord.max()
ymin = df.ycoord.min()
no_of_xcells = math.floor((xmax-xmin)/ 0.5)
no_of_ycells = math.floor((ymax-ymin)/ 0.5)
total_cells = no_of_xcells * no_of_ycells
total_cells
Out[] : 25
Since the plane is now divided into 25 cells each with a cellid. In order to find intersections, maybe I could check the next coordinate in the trajectory, if the cellid remains the same, then that segment of the trajectory is in the same cell and has no intersection with grid. Say, if x_cellid[2] is greater than x_cellid[0], then segment intersects vertical grid lines. Though, I am still unsure how to find the intersections with the grid lines and segment the trajectory on intersections giving them new id.
This can be solved by shapely:
%matplotlib inline
import pylab as pl
from shapely.geometry import MultiLineString, LineString
import numpy as np
from matplotlib.collections import LineCollection
x0, y0, x1, y1 = -10, -10, 10, 10
n = 11
lines = []
for x in np.linspace(x0, x1, n):
lines.append(((x, y0), (x, y1)))
for y in np.linspace(y0, y1, n):
lines.append(((x0, y), (x1, y)))
grid = MultiLineString(lines)
x = np.linspace(-9, 9, 200)
y = np.sin(x)*x
line = LineString(np.c_[x, y])
fig, ax = pl.subplots()
for i, segment in enumerate(line.difference(grid)):
x, y = segment.xy
pl.plot(x, y)
pl.text(np.mean(x), np.mean(y), str(i))
lc = LineCollection(lines, color="gray", lw=1, alpha=0.5)
ax.add_collection(lc);
The result:
To not use shapely, and do it yourself:
import pylab as pl
import numpy as np
from matplotlib.collections import LineCollection
x0, y0, x1, y1 = -10, -10, 10, 10
n = 11
xgrid = np.linspace(x0, x1, n)
ygrid = np.linspace(y0, y1, n)
x = np.linspace(-9, 9, 200)
y = np.sin(x)*x
t = np.arange(len(x))
idx_grid, idx_t = np.where((xgrid[:, None] - x[None, :-1]) * (xgrid[:, None] - x[None, 1:]) <= 0)
tx = idx_t + (xgrid[idx_grid] - x[idx_t]) / (x[idx_t+1] - x[idx_t])
idx_grid, idx_t = np.where((ygrid[:, None] - y[None, :-1]) * (ygrid[:, None] - y[None, 1:]) <= 0)
ty = idx_t + (ygrid[idx_grid] - y[idx_t]) / (y[idx_t+1] - y[idx_t])
t2 = np.sort(np.r_[t, tx, tx, ty, ty])
x2 = np.interp(t2, t, x)
y2 = np.interp(t2, t, y)
loc = np.where(np.diff(t2) == 0)[0] + 1
xlist = np.split(x2, loc)
ylist = np.split(y2, loc)
fig, ax = pl.subplots()
for i, (xp, yp) in enumerate(zip(xlist, ylist)):
pl.plot(xp, yp)
pl.text(np.mean(xp), np.mean(yp), str(i))
lines = []
for x in np.linspace(x0, x1, n):
lines.append(((x, y0), (x, y1)))
for y in np.linspace(y0, y1, n):
lines.append(((x0, y), (x1, y)))
lc = LineCollection(lines, color="gray", lw=1, alpha=0.5)
ax.add_collection(lc);
You're asking a lot. You should attack most of the design and coding yourself, once you have a general approach. Algorithm identification is reasonable for Stack Overflow; asking for design and reference links is not.
I suggest that you put the point coordinates into a list. use the NumPy and SciKit capabilities to interpolate the grid intersections. You can store segments in a list (of whatever defines a segment in your data design). Consider making a dictionary that allows you to retrieve the segments by grid coordinates. For instance, if segments are denoted only by the endpoints, and points are a class of yours, you might have something like this, using the lower-left corner of each square as its defining point:
grid_seg = {
(0.5, 0.5): [p0, p1],
(1.0, 0.5): [p1, p2],
(1.0, 1.0): [p2, p3],
...
}
where p0, p1, etc. are the interpolated crossing points.
Each trajectory is composed of a series of straight line segments. You therefore need a routine to break each line segment into sections that lie completely within a grid cell. The basis for such a routine would be the Digital Differential Analyzer (DDA) algorithm, though you'll need to modify the basic algorithm since you need endpoints of the line within each cell, not just which cells are visited.
A couple of things you have to be careful of:
1) If you're working with floating point numbers, beware of rounding errors in the calculation of the step values, as these can cause the algorithm to fail. For this reason many people choose to convert to an integer grid, obviously with a loss of precision. This is a good discussion of the issues, with some working code (though not python).
2) You'll need to decide which of the 4 grid lines surrounding a cell belong to the cell. One convention would be to use the bottom and left edges. You can see the issue if you consider a horizontal line segment that falls on a grid line - does its segments belong to the cell above or the cell below?
Cheers
data = list of list of coordinates
For point_id, point_coord in enumerate(point_coord_list):
if current point & last point stayed in same cell:
append point's index to last list of data
else:
append a new empty list to data
interpolate the two points and add a new point
that is on the grid lines.
Data stores all trajectories. Each list within the data is a trajectory.
The cell index along x and y axes (x_cell_id, y_cell_id) can be found by dividing coordinate of point by dimension of cell, then round to integer. If the cell indices of current point are same as that of last points, then these two points are in the same cell. list is good for inserting new points but it is not as memory efficient as arrays.
Might be a good idea to create a class for trajectory. Or use a memory buffer and sparse data structure instead of list and list and an array for the x-y coordinates if the list of coordinates wastes too much memory.
Inserting new points into array is slow, so we can use another array for new points.
Warning: I haven't thought too much about the things below. It probably has bugs, and someone needs to fill in the gaps.
# coord n x 2 numpy array.
# columns 0, 1 are x and y coordinate.
# row n is for point n
# cell_size length of one side of the square cell.
# n_ycells number of cells along the y axis
import numpy as np
cell_id_2d = (coord / cell_size).astype(int)
x_cell_id = cell_id_2d[:,0]
y_cell_id = cell_id_2d[:,1]
cell_id_1d = x_cell_id + y_cell_id*n_x_cells
# if the trajectory exits a cell, its cell id changes
# and the delta_cell_id is not zero.
delta_cell_id = cell_id_1d[1:] - cell_id_1d[:-1]
# The nth trajectory should contains the points from
# the (crossing_id[n])th to the (crossing_id[n + 1] - 1)th
w = np.where(delta_cell_id != 0)[0]
crossing_ids = np.empty(w.size + 1)
crossing_ids[1:] = w
crossing_ids[0] = 0
# need to interpolate when the trajectory cross cell boundary.
# probably can replace this loop with numpy functions/indexing
new_points = np.empty((w.size, 2))
for i in range(1, n):
st = coord[crossing_ids[i]]
en = coord[crossing_ids[i+1]]
# 1. check which boundary of the cell is crossed
# 2. interpolate
# 3. put points into new_points
# Each trajectory contains some points from coord array and 2 points
# in the new_points array.
For retrieval, make a sparse array that contains the index of the starting point in the coord array.
Linear interpolation can look bad if the cell size is large.
Further explanation:
Description of the grid
For n_xcells = 4, n_ycells = 3, the grid is:
0 1 2 3 4
0 [ ][ ][ ][ ][ ]
1 [ ][ ][ ][* ][ ]
2 [ ][ ][ ][ ][ ]
[* ] has an x_index of 3 and a y_index of 1.
There are (n_x_cells * n_y_cells) cells in the grid.
Relationship between point and cell
The cell that contains the ith point of the trajectory has an x_index of x_cell_id[i] and a y_index of x_cell_id[i]. I get this by discretization through dividing the xy-coordinates of the points by the length of the cell and then truncate to integers.
The cell_id_1d of the cells are the number in [ ]
0 1 2 3 4
0 [0 ][1 ][2 ][3 ][4 ]
1 [5 ][6 ][7 ][8 ][9 ]
2 [10][11][12][13][14]
cell_id_1d[i] = x_cell_id[i] + y_cell_id[i]*n_x_cells
I converted the pair of cell indices (x_cell_id[i], y_cell_id[i]) for the ith point to a single index called cell_id_1d.
How to find if trajectory exit a cell at the ith point
Now, the ith and (i + 1)th points are in same cell, if and only if (x_cell_id[i], y_cell_id[i]) == (x_cell_id[i + 1], y_cell_id[i + 1]) and also cell_id_1d[i] == cell_id[i + 1], and cell_id[i + 1] - cell_id[i] == 0. delta_cell_ids[i] = cell_id_1d[i + 1] - cell_id[i], which is zero if and only the ith and (i + 1)th points are in the same cell.
Related
I have a list of (x,y)-coordinates that represent a line skeleton.
The list is obtained directly from a binary image:
import numpy as np
list=np.where(img_skeleton>0)
Now the points in the list are sorted according to their position in the image along one of the axes.
I would like to sort the list such that the order represents a smooth path along the line. (This is currently not the case where the line curves back).
Subsequently, I want to fit a spline to these points.
A similar problem has been described and solved using arcPy here. Is there a convenient way to achieve this using python, numpy, scipy, openCV (or another library?)
below is an example image. it results in a list of 59 (x,y)-coordinates.
when I send the list to scipy's spline fitting routine, I am running into a problem because the points aren't 'ordered' on the line:
I apologize for the long answer in advance :P (the problem is not that simple).
Lets start by rewording the problem. Finding a line that connects all the points, can be reformulated as a shortest path problem in a graph, where (1) the graph nodes are the points in the space, (2) each node is connected to its 2 nearest neighbors, and (3) the shortest path passes through each of the nodes only once. That last constrain is a very important (and quite hard one to optimize). Essentially, the problem is to find a permutation of length N, where the permutation refers to the order of each of the nodes (N is the total number of nodes) in the path.
Finding all the possible permutations and evaluating their cost is too expensive (there are N! permutations if I'm not wrong, which is too big for problems). Bellow I propose an approach that finds the N best permutations (the optimal permutation for each of the N points) and then find the permutation (from those N) that minimizes the error/cost.
1. Create a random problem with unordered points
Now, lets start to create a sample problem:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()
And here, the unsorted version of the points [x, y] to simulate a random points in space connected in a line:
idx = np.random.permutation(x.size)
x = x[idx]
y = y[idx]
plt.plot(x, y)
plt.show()
The problem is then to order those points to recover their original order so that the line is plotted properly.
2. Create 2-NN graph between nodes
We can first rearrange the points in a [N, 2] array:
points = np.c_[x, y]
Then, we can start by creating a nearest neighbour graph to connect each of the nodes to its 2 nearest neighbors:
from sklearn.neighbors import NearestNeighbors
clf = NearestNeighbors(2).fit(points)
G = clf.kneighbors_graph()
G is a sparse N x N matrix, where each row represents a node, and the non-zero elements of the columns the euclidean distance to those points.
We can then use networkx to construct a graph from this sparse matrix:
import networkx as nx
T = nx.from_scipy_sparse_matrix(G)
3. Find shortest path from source
And, here begins the magic: we can extract the paths using dfs_preorder_nodes, which will essentially create a path through all the nodes (passing through each of them exactly once) given a starting node (if not given, the 0 node will be selected).
order = list(nx.dfs_preorder_nodes(T, 0))
xx = x[order]
yy = y[order]
plt.plot(xx, yy)
plt.show()
Well, is not too bad, but we can notice that the reconstruction is not optimal. This is because the point 0 in the unordered list lays in the middle of the line, that is way it first goes in one direction, and then comes back and finishes in the other direction.
4. Find the path with smallest cost from all sources
So, in order to obtain the optimal order, we can just get the best order for all the nodes:
paths = [list(nx.dfs_preorder_nodes(T, i)) for i in range(len(points))]
Now that we have the optimal path starting from each of the N = 100 nodes, we can discard them and find the one that minimizes the distances between the connections (optimization problem):
mindist = np.inf
minidx = 0
for i in range(len(points)):
p = paths[i] # order of nodes
ordered = points[p] # ordered nodes
# find cost of that order by the sum of euclidean distances between points (i) and (i+1)
cost = (((ordered[:-1] - ordered[1:])**2).sum(1)).sum()
if cost < mindist:
mindist = cost
minidx = i
The points are ordered for each of the optimal paths, and then a cost is computed (by calculating the euclidean distance between all pairs of points i and i+1). If the path starts at the start or end point, it will have the smallest cost as all the nodes will be consecutive. On the other hand, if the path starts at a node that lies in the middle of the line, the cost will be very high at some point, as it will need to travel from the end (or beginning) of the line to the initial position to explore the other direction. The path that minimizes that cost, is the path starting in an optimal point.
opt_order = paths[minidx]
Now, we can reconstruct the order properly:
xx = x[opt_order]
yy = y[opt_order]
plt.plot(xx, yy)
plt.show()
One possible solution is to use a nearest neighbours approach, possible by using a KDTree. Scikit-learn has an nice interface. This can then be used to build a graph representation using networkx. This will only really work if the line to be drawn should go through the nearest neighbours:
from sklearn.neighbors import KDTree
import numpy as np
import networkx as nx
G = nx.Graph() # A graph to hold the nearest neighbours
X = [(0, 1), (1, 1), (3, 2), (5, 4)] # Some list of points in 2D
tree = KDTree(X, leaf_size=2, metric='euclidean') # Create a distance tree
# Now loop over your points and find the two nearest neighbours
# If the first and last points are also the start and end points of the line you can use X[1:-1]
for p in X
dist, ind = tree.query(p, k=3)
print ind
# ind Indexes represent nodes on a graph
# Two nearest points are at indexes 1 and 2.
# Use these to form edges on graph
# p is the current point in the list
G.add_node(p)
n1, l1 = X[ind[0][1]], dist[0][1] # The next nearest point
n2, l2 = X[ind[0][2]], dist[0][2] # The following nearest point
G.add_edge(p, n1)
G.add_edge(p, n2)
print G.edges() # A list of all the connections between points
print nx.shortest_path(G, source=(0,1), target=(5,4))
>>> [(0, 1), (1, 1), (3, 2), (5, 4)] # A list of ordered points
Update: If the start and end points are unknown and your data is reasonably well separated, you can find the ends by looking for cliques in the graph. The start and end points will form a clique. If the longest edge is removed from the clique it will create a free end in the graph which can be used as a start and end point. For example, the start and end points in this list appear in the middle:
X = [(0, 1), (0, 0), (2, 1), (3, 2), (9, 4), (5, 4)]
After building the graph, now its a case of removing the longest edge from the cliques to find the free ends of the graph:
def find_longest_edge(l):
e1 = G[l[0]][l[1]]['weight']
e2 = G[l[0]][l[2]]['weight']
e3 = G[l[1]][l[2]]['weight']
if e2 < e1 > e3:
return (l[0], l[1])
elif e1 < e2 > e3:
return (l[0], l[2])
elif e1 < e3 > e2:
return (l[1], l[2])
end_cliques = [i for i in list(nx.find_cliques(G)) if len(i) == 3]
edge_lengths = [find_longest_edge(i) for i in end_cliques]
G.remove_edges_from(edge_lengths)
edges = G.edges()
start_end = [n for n,nbrs in G.adjacency_iter() if len(nbrs.keys()) == 1]
print nx.shortest_path(G, source=start_end[0], target=start_end[1])
>>> [(0, 0), (0, 1), (2, 1), (3, 2), (5, 4), (9, 4)] # The correct path
I had the exact same problem. If you have two arrays of scattered x and y values that are not too curvy, then you can transform the points into PCA space, sort them in PCA space, and then transform them back. (I've also added in some bonus smoothing functionality).
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
def XYclean(x,y):
xy = np.concatenate((x.reshape(-1,1), y.reshape(-1,1)), axis=1)
# make PCA object
pca = PCA(2)
# fit on data
pca.fit(xy)
#transform into pca space
xypca = pca.transform(xy)
newx = xypca[:,0]
newy = xypca[:,1]
#sort
indexSort = np.argsort(x)
newx = newx[indexSort]
newy = newy[indexSort]
#add some more points (optional)
f = interpolate.interp1d(newx, newy, kind='linear')
newX=np.linspace(np.min(newx), np.max(newx), 100)
newY = f(newX)
#smooth with a filter (optional)
window = 43
newY = savgol_filter(newY, window, 2)
#return back to old coordinates
xyclean = pca.inverse_transform(np.concatenate((newX.reshape(-1,1), newY.reshape(-1,1)), axis=1) )
xc=xyclean[:,0]
yc = xyclean[:,1]
return xc, yc
I agree with Imanol_Luengo Imanol Luengo's solution, but if you know the index of the first point, then there is a considerably easier solution that uses only NumPy:
def order_points(points, ind):
points_new = [ points.pop(ind) ] # initialize a new list of points with the known first point
pcurr = points_new[-1] # initialize the current point (as the known point)
while len(points)>0:
d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1) # distances between pcurr and all other remaining points
ind = d.argmin() # index of the closest point
points_new.append( points.pop(ind) ) # append the closest point to points_new
pcurr = points_new[-1] # update the current point
return points_new
This approach appears to work well with the sine curve example, especially because it is easy to define the first point as either the leftmost or rightmost point.
For the img_skeleton data cited in the question, it would be similarly easy to algorithmically obtain the first point, for example as the topmost point.
# create sine curve:
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
# shuffle the order of the x and y coordinates:
idx = np.random.permutation(x.size)
xs,ys = x[idx], y[idx] # shuffled points
# find the leftmost point:
ind = xs.argmin()
# assemble the x and y coordinates into a list of (x,y) tuples:
points = [(xx,yy) for xx,yy in zip(xs,ys)]
# order the points based on the known first point:
points_new = order_points(points, ind)
# plot:
fig,ax = plt.subplots(1, 2, figsize=(10,4))
xn,yn = np.array(points_new).T
ax[0].plot(xs, ys) # original (shuffled) points
ax[1].plot(xn, yn) # new (ordered) points
ax[0].set_title('Original')
ax[1].set_title('Ordered')
plt.tight_layout()
plt.show()
I am working on a similar problem, but it has an important constraint (much like the example given by the OP) which is that each pixel has either one or two neighboring pixel, in the 8-connected sense. With this constraint, there is a very simple solution.
def sort_to_form_line(unsorted_list):
"""
Given a list of neighboring points which forms a line, but in random order,
sort them to the correct order.
IMPORTANT: Each point must be a neighbor (8-point sense)
to a least one other point!
"""
sorted_list = [unsorted_list.pop(0)]
while len(unsorted_list) > 0:
i = 0
while i < len(unsorted_list):
if are_neighbours(sorted_list[0], unsorted_list[i]):
#neighbours at front of list
sorted_list.insert(0, unsorted_list.pop(i))
elif are_neighbours(sorted_list[-1], unsorted_list[i]):
#neighbours at rear of list
sorted_list.append(unsorted_list.pop(i))
else:
i = i+1
return sorted_list
def are_neighbours(pt1, pt2):
"""
Check if pt1 and pt2 are neighbours, in the 8-point sense
pt1 and pt2 has integer coordinates
"""
return (np.abs(pt1[0]-pt2[0]) < 2) and (np.abs(pt1[1]-pt2[1]) < 2)
Modifying upon Toddp's answer , you can find end-points of arbitrarily shaped lines using this code and then order the points as Toddp stated, this is much faster than Imanol Luengo's answer, the only constraint is that the line must have only 2 end-points :
def order_points(points):
if isinstance(points,np.ndarray):
assert points.shape[1]==2
points = points.tolist()
exts = get_end_points(points)
assert len(exts) ==2
ind = points.index(exts[0])
points_new = [ points.pop(ind) ] # initialize a new list of points with the known first point
pcurr = points_new[-1] # initialize the current point (as the known point)
while len(points)>0:
d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1) # distances between pcurr and all other remaining points
ind = d.argmin() # index of the closest point
points_new.append( points.pop(ind) ) # append the closest point to points_new
pcurr = points_new[-1] # update the current point
return points_new
def get_end_points(ptsxy):
#source : https://stackoverflow.com/a/67145008/10998081
if isinstance(ptsxy,list): ptsxy = np.array(ptsxy)
assert ptsxy.shape[1]==2
#translate to (0,0)for faster excution
xx,yy,w,h = cv2.boundingRect(ptsxy)
pts_translated = ptsxy -(xx,yy)
bim = np.zeros((h+1,w+1))
bim[[*np.flip(pts_translated).T]]=255
extremes = []
for p in pts_translated:
x = p[0]
y = p[1]
n = 0
n += bim[y - 1,x]
n += bim[y - 1,x - 1]
n += bim[y - 1,x + 1]
n += bim[y,x - 1]
n += bim[y,x + 1]
n += bim[y + 1,x]
n += bim[y + 1,x - 1]
n += bim[y + 1,x + 1]
n /= 255
if n == 1:
extremes.append(p)
extremes = np.array(extremes)+(xx,yy)
return extremes.tolist()
I am trying to make a hexagonal fill by the voronoi diagram. One problem I find is that although the plot it produces is a hexagon diagram, the distances between the points vary.
The first function is to give a voronoi diagram by exact hexagons. Then I am trying to assign a universal initial distance between each cells as a spring rest length between them.
Now my problem is that the initial hexagonal diagram gives non-universal length between cells. We can see it by the printed result given by the line "print(a)" in the code. However, I assigned the coordinates of the points by 'x = (col + (0.5 * (row % 2))) * np.sqrt(3)' and 'y = row * 0.5', which should give exact hexagons. I don't understand how I am getting different distances between points.
The following is my code, and mostly the second function part is about finding neighbors to each cell and computing distances between each cell and its neighbors. I am printing the distances by 'print(a)' line.
import numpy as np
import freud
import matplotlib.pyplot as plt
from scipy.spatial import Delaunay
from collections import defaultdict
import itertools
# Source: https://freud.readthedocs.io/en/v2.10.0/gettingstarted/examples/module_intros/locality.Voronoi.html
def hexagonal_lattice(rows=3, cols=3, noise=.0, seed=None):
if seed is not None:
np.random.seed(seed)
# Assemble a hexagonal lattice
points = []
for row in range(rows * 2):
for col in range(cols):
x = (col + (0.5 * (row % 2))) * np.sqrt(3)
y = row * 0.5 # These x,y are allocated to produce exact hexagons
points.append((x, y, 0))
points = np.asarray(points)
points += np.random.multivariate_normal(
mean=np.zeros(3), cov=np.eye(3) * noise, size=points.shape[0]
)
# Set z=0 again for all points after adding Gaussian noise
# points[:, 2] = 0 # do not see the need. Seems wrap later changes z coordi to 0
# Wrap the points into the box
box = freud.box.Box(Lx=cols * np.sqrt(3), Ly=rows, is2D=True)
points = box.wrap(points) # 주어진 그림박스 안으로 periodic bdy 써서 넣어주는 역할
return box, points
# Compute the Voronoi diagram and plot
box1, pts1 = hexagonal_lattice(rows=12, cols=12, seed=2) # Noise = 0
voro = freud.locality.Voronoi()
voro.compute((box1, pts1))
plt.figure()
ax = plt.gca()
voro.plot(ax=ax, cmap="RdBu")
ax.scatter(pts1[:, 0], pts1[:, 1], s=2, c='k')
plt.show()
# This part is for the stability check of the initial exact hexagons diagram
def cell_movement(box, points, time_length, Lambda=0.01):
time = 1
while time <= time_length:
# 2D projection + neighboring cells
points_2d = []
for point in points:
points_2d.append([point[0], point[1]]) # projection to 2d for neighbor list
points_2d = np.asarray(points_2d)
tri = Delaunay(points_2d)
neiList = defaultdict(set) # Neighbor list for each cell
for p in tri.vertices:
for i, j in itertools.combinations(p, 2):
neiList[i].add(j)
neiList[j].add(i)
neiborList = sorted(neiList.items()) # Sorted neighbor array
spring = np.ones((len(points[:, 0]), len(points[:, 0]))) # Initial spring rest length
rintervec = np.empty((len(points[:, 0]), len(points[:, 0]), 2)) # spring length array
for i in range(len(neiborList)):
for j in list(neiborList[i][1]):
j = int(j)
rintervec[i, j] = points_2d[i] - points_2d[j] # Distance vector between i,j cells
a = np.linalg.norm(rintervec[i, j]) # Distances between neighboring cells
if a != 0:
print(a) # These are the printed numbers
spring[i, j] = np.linalg.norm(rintervec[i, j]) # Assign a as spring rest length
points_2d[i] += Lambda * rintervec[i, j] * ( # moves points by equation (8)
spring[i, j] - np.linalg.norm(rintervec[i, j])) / np.linalg.norm(rintervec[i, j])
points[i] = np.append(points_2d[i], np.array([0]))
# diagram
points = box.wrap(points) # 주어진 그림박스 안으로 periodic bdy 써서 넣어주는 역할
voro.compute((box, points)) # Computing the Voronoi diagram
# figure
plt.figure()
ax = plt.gca()
voro.plot(ax=ax, cmap="RdBu")
ax.scatter(points[:, 0], points[:, 1], s=2, c='k')
plt.savefig("C:\\doit\\pythonPractice\\At time %s.png" % time) # saves diagrams
plt.show()
time = time + 1
cell_movement(box1, pts1, time_length=5)
I try to do:
Cut STL file https://www.dropbox.com/s/pex20yqfgmxgt0w/wing_fish.stl?dl=0 at Z-coordinate using PyVsita https://docs.pyvista.org/ )
Extract point's coordinates X, Y at given section Z
Sort points to Upper and Down groups for further manipulation
Here is my code:
import pyvista as pv
import matplotlib.pylab as plt
import numpy as np
import math
mesh = pv.read('wing_fish.stl')
z_slice = [0, 0, 1] # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200]) # slicing
a = single_slice.points # choose only points
# p = pv.Plotter() #show section
# p.add_mesh(single_slice)
# p.show()
a = a[a[:,0].astype(float).argsort()] # sort all points by Х coord
# X min of all points
x0 = a[0][0]
# Y min of all points
y0 = a[0][1]
# X tail 1 of 2
xn = a[-1][0]
# Y tail 1 of 2
yn = a[-1][1]
# X tail 2 of 2
xn2 = a[-2][0]
# Y tail 2 of 2
yn2 = a[-2][1]
def line_y(x, x0, y0, xn, yn):
# return y coord at arbitary x coord of x0, y0 xn, yn LINE
return ((x - x0)*(yn-y0))/(xn-x0)+y0
def line_c(x0, y0, xn, yn):
# return x, y middle points of LINE
xc = (x0+xn)/2
yc = (y0+yn)/2
return xc, yc
def chord(P1, P2):
return math.sqrt((P2[1] - P1[1])**2 + (P2[0] - P1[0])**2)
xc_end, yc_end = line_c(xn, yn, xn2, yn2) # return midle at trailing edge
midLine = np.array([[x0,y0],[xc_end,yc_end]],dtype='float32')
c_temp_x_d = []
c_temp_y_d = []
c_temp_x_u = []
c_temp_y_u = []
isUp = None
isDown = None
for i in a:
if i[1] == line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
continue
elif i[1] < line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
c_temp_y_d.append(i[1])
c_temp_x_d.append(i[0])
isDown = True
else:
c_temp_y_u.append(i[1])
c_temp_x_u.append(i[0])
isUp = True
if len(c_temp_y_d) != 0 and len(c_temp_y_u) != 0:
print(c_temp_y_d[-1])
plt.plot(c_temp_x_d, c_temp_y_d, label='suppose to be down points')
plt.plot(c_temp_x_u, c_temp_y_u, label='suppose to be upper points')
plt.plot(midLine[:,0], midLine[:,1], label='Chord')
plt.scatter(a[:,0],a[:,1], label='raw points')
plt.legend();plt.grid();plt.show()
What I have:
What I want:
I would highly appreciate for any help and advises!
Thanks in advance!
You are discarding precious connectivity information that is already there in your STL mesh and in your slice!
I couldn't think of a more idiomatic solution within PyVista, but at worst you can take the cell (line) information from the slice and start walking your shape (that is topologically equivalent to a circle) from its left side to its right, and vice versa. Here's one way:
import numpy as np
import matplotlib.pyplot as plt
import pyvista as pv
mesh = pv.read('../wing_fish.stl')
z_slice = [0, 0, 1] # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200]) # slicing
# find points with smallest and largest x coordinate
points = single_slice.points
left_ind = points[:, 0].argmin()
right_ind = points[:, 0].argmax()
# sanity check for what we're about to do:
# 1. all cells are lines
assert single_slice.n_cells == single_slice.n_points
assert (single_slice.lines[::3] == 2).all()
# 2. all points appear exactly once as segment start and end
lines = single_slice.lines.reshape(-1, 3) # each row: [2, i_from, i_to]
assert len(set(lines[:, 1])) == lines.shape[0]
# create an auxiliary dict with from -> to index mappings
conn = dict(lines[:, 1:])
# and a function that walks this connectivity graph
def walk_connectivity(connectivity, start, end):
this_ind = start
path_inds = [this_ind]
while True:
next_ind = connectivity[this_ind]
path_inds.append(next_ind)
this_ind = next_ind
if this_ind == end:
# we're done
return path_inds
# start walking at point left_ind, walk until right_ind
first_side_inds = walk_connectivity(conn, left_ind, right_ind)
# now walk forward for the other half curve
second_side_inds = walk_connectivity(conn, right_ind, left_ind)
# get the point coordinates for plotting
first_side_points = points[first_side_inds, :-1]
second_side_points = points[second_side_inds, :-1]
# plot the two sides
fig, ax = plt.subplots()
ax.plot(*first_side_points.T)
ax.plot(*second_side_points.T)
plt.show()
In order to avoid using an O(n^2) algorithm, I defined an auxiliary dict that maps line segment start indices to end indices. In order for this to work we need some sanity checks, namely that the cells are all simple line segments, and that each segment has the same orientation (i.e. each start point is unique, and each end point is unique). Once we have this it's easy to start from the left edge of your wing profile and walk each line segment until we find the right edge.
The nature of this approach implies that we can't know a priori whether the path from left to right goes on the upper or the lower path. This needs experimentation on your part; name the two paths in whatever way you see fit.
And of course there's always room for fine tuning. For instance, the above implementation creates two paths that both start and end with the left and right-side boundary points of the mesh. If you want the top and bottom curves to share no points, you'll have to adjust the algorithm accordingly. And if the end point is not found on the path then the current implementation will give you an infinite loop with a list growing beyond all available memory. Consider adding some checks in the implementation to avoid this.
Anyway, this is what we get from the above:
I have a list of x and y values for two curves, both having weird shapes, and I don't have a function for any of them. I need to do two things:
Plot it and shade the area between the curves like the image below.
Find the total area of this shaded region between the curves.
I'm able to plot and shade the area between those curves with fill_between and fill_betweenx in matplotlib, but I have no idea on how to calculate the exact area between them, specially because I don't have a function for any of those curves.
Any ideas?
I looked everywhere and can't find a simple solution for this. I'm quite desperate, so any help is much appreciated.
Thank you very much!
EDIT: For future reference (in case anyone runs into the same problem), here is how I've solved this: connected the first and last node/point of each curve together, resulting in a big weird-shaped polygon, then used shapely to calculate the polygon's area automatically, which is the exact area between the curves, no matter which way they go or how nonlinear they are. Works like a charm! :)
Here is my code:
from shapely.geometry import Polygon
x_y_curve1 = [(0.121,0.232),(2.898,4.554),(7.865,9.987)] #these are your points for curve 1 (I just put some random numbers)
x_y_curve2 = [(1.221,1.232),(3.898,5.554),(8.865,7.987)] #these are your points for curve 2 (I just put some random numbers)
polygon_points = [] #creates a empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, to it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
print(area)
EDIT 2: Thank you for the answers. Like Kyle explained, this only works for positive values. If your curves go below 0 (which is not my case, as showed in the example chart), then you would have to work with absolute numbers.
The area calculation is straightforward in blocks where the two curves don't intersect: thats the trapezium as has been pointed out above. If they intersect, then you create two triangles between x[i] and x[i+1], and you should add the area of the two. If you want to do it directly, you should handle the two cases separately. Here's a basic working example to solve your problem. First, I will start with some fake data:
#!/usr/bin/python
import numpy as np
# let us generate fake test data
x = np.arange(10)
y1 = np.random.rand(10) * 20
y2 = np.random.rand(10) * 20
Now, the main code. Based on your plot, looks like you have y1 and y2 defined at the same X points. Then we define,
z = y1-y2
dx = x[1:] - x[:-1]
cross_test = np.sign(z[:-1] * z[1:])
cross_test will be negative whenever the two graphs cross. At these points, we want to calculate the x coordinate of the crossover. For simplicity, I will calculate x coordinates of the intersection of all segments of y. For places where the two curves don't intersect, they will be useless values, and we won't use them anywhere. This just keeps the code easier to understand.
Suppose you have z1 and z2 at x1 and x2, then we are solving for x0 such that z = 0:
# (z2 - z1)/(x2 - x1) = (z0 - z1) / (x0 - x1) = -z1/(x0 - x1)
# x0 = x1 - (x2 - x1) / (z2 - z1) * z1
x_intersect = x[:-1] - dx / (z[1:] - z[:-1]) * z[:-1]
dx_intersect = - dx / (z[1:] - z[:-1]) * z[:-1]
Where the curves don't intersect, area is simply given by:
areas_pos = abs(z[:-1] + z[1:]) * 0.5 * dx # signs of both z are same
Where they intersect, we add areas of both triangles:
areas_neg = 0.5 * dx_intersect * abs(z[:-1]) + 0.5 * (dx - dx_intersect) * abs(z[1:])
Now, the area in each block x[i] to x[i+1] is to be selected, for which I use np.where:
areas = np.where(cross_test < 0, areas_neg, areas_pos)
total_area = np.sum(areas)
That is your desired answer. As has been pointed out above, this will get more complicated if the both the y graphs were defined at different x points. If you want to test this, you can simply plot it (in my test case, y range will be -20 to 20)
negatives = np.where(cross_test < 0)
positives = np.where(cross_test >= 0)
plot(x, y1)
plot(x, y2)
plot(x, z)
plt.vlines(x_intersect[negatives], -20, 20)
Define your two curves as functions f and g that are linear by segment, e.g. between x1 and x2, f(x) = f(x1) + ((x-x1)/(x2-x1))*(f(x2)-f(x1)).
Define h(x)=abs(g(x)-f(x)). Then use scipy.integrate.quad to integrate h.
That way you don't need to bother about the intersections. It will do the "trapeze summing" suggested by ch41rmn automatically.
Your set of data is quite "nice" in the sense that the two sets of data share the same set of x-coordinates. You can therefore calculate the area using a series of trapezoids.
e.g. define the two functions as f(x) and g(x), then, between any two consecutive points in x, you have four points of data:
(x1, f(x1))-->(x2, f(x2))
(x1, g(x1))-->(x2, g(x2))
Then, the area of the trapezoid is
A(x1-->x2) = ( f(x1)-g(x1) + f(x2)-g(x2) ) * (x2-x1)/2 (1)
A complication arises that equation (1) only works for simply-connected regions, i.e. there must not be a cross-over within this region:
|\ |\/|
|_| vs |/\|
The area of the two sides of the intersection must be evaluated separately. You will need to go through your data to find all points of intersections, then insert their coordinates into your list of coordinates. The correct order of x must be maintained. Then, you can loop through your list of simply connected regions and obtain a sum of the area of trapezoids.
EDIT:
For curiosity's sake, if the x-coordinates for the two lists are different, you can instead construct triangles. e.g.
.____.
| / \
| / \
| / \
|/ \
._________.
Overlap between triangles must be avoided, so you will again need to find points of intersections and insert them into your ordered list. The lengths of each side of the triangle can be calculated using Pythagoras' formula, and the area of the triangles can be calculated using Heron's formula.
The area_between_two_curves function in pypi library similaritymeasures (released in 2018) might give you what you need. I tried a trivial example on my side, comparing the area between a function and a constant value and got pretty close tie-back to Excel (within 2%). Not sure why it doesn't give me 100% tie-back, maybe I am doing something wrong. Worth considering though.
I had the same problem.The answer below is based on an attempt by the question author. However, shapely will not directly give the area of the polygon in purple. You need to edit the code to break it up into its component polygons and then get the area of each. After-which you simply add them up.
Area Between two lines
Consider the lines below:
Sample Two lines
If you run the code below you will get zero for area because it takes the clockwise and subtracts the anti clockwise area:
from shapely.geometry import Polygon
x_y_curve1 = [(1,1),(2,1),(3,3),(4,3)] #these are your points for curve 1
x_y_curve2 = [(1,3),(2,3),(3,1),(4,1)] #these are your points for curve 2
polygon_points = [] #creates a empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, to it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
print(area)
The solution is therefore to split the polygon into smaller pieces based on where the lines intersect. Then use a for loop to add these up:
from shapely.geometry import Polygon
x_y_curve1 = [(1,1),(2,1),(3,3),(4,3)] #these are your points for curve 1
x_y_curve2 = [(1,3),(2,3),(3,1),(4,1)] #these are your points for curve 2
polygon_points = [] #creates a empty list where we will append the points to create the polygon
for xyvalue in x_y_curve1:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 1
for xyvalue in x_y_curve2[::-1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append all xy points for curve 2 in the reverse order (from last point to first point)
for xyvalue in x_y_curve1[0:1]:
polygon_points.append([xyvalue[0],xyvalue[1]]) #append the first point in curve 1 again, to it "closes" the polygon
polygon = Polygon(polygon_points)
area = polygon.area
x,y = polygon.exterior.xy
# original data
ls = LineString(np.c_[x, y])
# closed, non-simple
lr = LineString(ls.coords[:] + ls.coords[0:1])
lr.is_simple # False
mls = unary_union(lr)
mls.geom_type # MultiLineString'
Area_cal =[]
for polygon in polygonize(mls):
Area_cal.append(polygon.area)
Area_poly = (np.asarray(Area_cal).sum())
print(Area_poly)
A straightforward application of the area of a general polygon (see Shoelace formula) makes for a super-simple and fast, vectorized calculation:
def area(p):
# for p: 2D vertices of a polygon:
# area = 1/2 abs(sum(p0 ^ p1 + p1 ^ p2 + ... + pn-1 ^ p0))
# where ^ is the cross product
return np.abs(np.cross(p, np.roll(p, 1, axis=0)).sum()) / 2
Application to area between two curves. In this example, we don't even have matching x coordinates!
np.random.seed(0)
n0 = 10
n1 = 15
xy0 = np.c_[np.linspace(0, 10, n0), np.random.uniform(0, 10, n0)]
xy1 = np.c_[np.linspace(0, 10, n1), np.random.uniform(0, 10, n1)]
p = np.r_[xy0, xy1[::-1]]
>>> area(p)
4.9786...
Plot:
plt.plot(*xy0.T, 'b-')
plt.plot(*xy1.T, 'r-')
p = np.r_[xy0, xy1[::-1]]
plt.fill(*p.T, alpha=.2)
Speed
For both curves having 1 million points:
n = 1_000_000
xy0 = np.c_[np.linspace(0, 10, n), np.random.uniform(0, 10, n)]
xy1 = np.c_[np.linspace(0, 10, n), np.random.uniform(0, 10, n)]
%timeit area(np.r_[xy0, xy1[::-1]])
# 42.9 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Simple viz of polygon area calculation
# say:
p = np.array([[0, 3], [1, 0], [3, 3], [1, 3], [1, 2]])
p_closed = np.r_[p, p[:1]]
fig, axes = plt.subplots(ncols=2, figsize=(10, 5), subplot_kw=dict(box_aspect=1), sharex=True)
ax = axes[0]
ax.set_aspect('equal')
ax.plot(*p_closed.T, '.-')
ax.fill(*p_closed.T, alpha=0.6)
center = p.mean(0)
txtkwargs = dict(ha='center', va='center')
ax.text(*center, f'{area(p):.2f}', **txtkwargs)
ax = axes[1]
ax.set_aspect('equal')
for a, b in zip(p_closed, p_closed[1:]):
ar = 1/2 * np.cross(a, b)
pos = ar >= 0
tri = np.c_[(0,0), a, b, (0,0)].T
# shrink a bit to make individual triangles easier to visually identify
center = tri.mean(0)
tri = (tri - center)*0.95 + center
c = 'b' if pos else 'r'
ax.plot(*tri.T, 'k')
ax.fill(*tri.T, c, alpha=0.2, zorder=2 - pos)
t = ax.text(*center, f'{ar:.1f}', color=c, fontsize=8, **txtkwargs)
t.set_bbox(dict(facecolor='white', alpha=0.8, edgecolor='none'))
plt.tight_layout()
I have an array of points in 3D Cartesian space:
P = np.random.random((10, 3))
Now I'd like to find their distances to a given axis and on that given axis
Ax_support = array([3, 2, 1])
Ax_direction = array([1, 2, 3])
I've found a solution that first finds the vector from each point that is perpendicular to the direction vector... I feel however, that it is really complicated and that for such a standard problem there would be a numpy or scipy routine already out there (as there is to find the distance between points in scipy.spatial.distance)
I would be surprised to see such an operation among the standard operations of numpy/scipy. What you are looking for is the projection distance onto your line. Start by subtracting Ax_support:
P_centered = P - Ax_support
The points on the line through 0 with direction Ax_direction with the shortest distance in the L2 sense to each element of P_centered is given by
P_projected = P_centered.dot(np.linalg.pinv(
Ax_direction[np.newaxis, :])).dot(Ax_direction[np.newaxis, :])
Thus the formula you are looking for is
distances = np.sqrt(((P_centered - P_projected) ** 2).sum(axis=1))
Yes, this is exactly what you propose, in a vectorized way of doing things, so it should be pretty fast for reasonably many data points.
Note: If anybody knows a built-in function for this, I'd be very interested!
It's been bothering me that something like this does not exist, so I added haggis.math.segment_distance to the library of utilities I maintain, haggis.
One catch is that this function expects a line defined by two points rather than a support and direction. You can supply as many points and lines as you want, as long as the dimensions broadcast. The usual formats are many points projected on one line (as you have), or one point projected on many lines.
distances = haggis.math.segment_distance(
P, Ax_support, Ax_support + Ax_direction,
axis=1, segment=False)
Here is a reproducible example:
np.random.seed(0xBEEF)
P = np.random.random((10,3))
Ax_support = np.array([3, 2, 1])
Ax_direction = np.array([1, 2, 3])
d, t = haggis.math.segment_distance(
P, Ax_support, Ax_support + Ax_direction,
axis=1, return_t=True, segment=False)
return_t returns the location of the normal points along the line as a ratio of the line length from the support (i.e., Ax_support + t * Ax_direction is the location of the projected points).
>>> d
array([2.08730838, 2.73314321, 2.1075711 , 2.5672012 , 1.96132443,
2.53325436, 2.15278454, 2.77763701, 2.50545181, 2.75187883])
>>> t
array([-0.47585462, -0.60843258, -0.46755277, -0.4273361 , -0.53393468,
-0.58564737, -0.38732655, -0.53212317, -0.54956548, -0.41748691])
This allows you to make the following plot:
fig, ax = plt.subplots(subplot_kw={'projection': '3d'})
ax.plot(*Ax_support, 'ko')
ax.plot(*Ax_support + Ax_direction, 'ko')
start = min(t.min(), 0)
end = max(t.max(), 1)
margin = 0.1 * (end - start)
start -= margin
end += margin
ax.plot(*Ax_support[:, None] + [start, end] * Ax_direction[:, None], 'k--')
for s, te in zip(P, t):
pt = Ax_support + te * Ax_direction
ax.plot(*np.stack((s, pt), axis=-1), 'k-')
ax.plot(*pt, 'ko', markerfacecolor='w')
ax.plot(*P.T, 'ko', markerfacecolor='r')
plt.show()