Packing problem : Fit cuboids into a larger cuboid with constraints - python

I have a 3D numpy array which represents a mask. The "1" elements of this mask are areas where no calculation is done (blue areas in the figure below). The "0" elements of the mask are areas for which a calculation is made.
Each calculation must be performed on a cuboid block whose edges are at least Nmin cells long, so I am trying to find the best distribution of cuboids to fill the "0" regions of this mask.
Here is an example of mask : mask.npy
And its 3D representation :
mask = np.load('mask.npy')
fig = plt.figure()
ax1 = fig.add_subplot(projection='3d')
It is a packing problem. The goal is to pack various cuboids into a larger cuboid with some constraints (the blue areas). There are a lot of possible combinations.
The way I see it, the problem can be solved by first determining the largest "0" area and then repeating this process until the mask is completely filled.
I already did this kind of thing in 2D (see fdgrid), but I'm stuck on this 3D problem. How can I find the largest block (cuboid) of '0'?
I have a draft solution that is a bit brutal and very slow with this 50x50x50 grid.
import itertools
import time
import numpy as np
def cuboid_list(mask, Nmin=3):
    """List all cuboids that could fit in the domain.
    The list is formatted as (volume, (Lx, Ly, Lz))."""
    nx, ny, nz = mask.shape
    size = mask[mask == 0].size
    cuboids = [(i*j*k, (i, j, k))
               for i, j, k in itertools.product(range(Nmin, nx), range(Nmin, ny), range(Nmin, nz))
               if i*j*k <= size]
    return cuboids
def search_biggest(mask, cuboids):
"""Search biggest cuboid. """
t0 = time.perf_counter()
tested = 0
origin, size = None, None
for _, (Lx, Ly, Lz) in cuboids:
for o in itertools.product(range(nx-Lx), range(ny-Ly), range(nz-Lz)):
tested += 1
if (mask[o[0]:o[0]+Lx, o[1]:o[1]+Ly, o[2]:o[2]+Lz] == 0).all():
origin, size = o, (Lx, Ly, Lz)
if origin:
print(f'{tested} configurations tested in {time.perf_counter() - t0:.3f} s.')
return origin, size
# Load mask and read the grid dimensions used by search_biggest()
mask = np.load('mask.npy')
nx, ny, nz = mask.shape
# List of all cuboids
cuboids = cuboid_list(mask)
# Search biggest
sub = search_biggest(mask, cuboids)
In this particular case, this leads to 2795610 configurations tested in 164.984 s ! The largest cuboid has origin=(16, 16, 0) and size=(33, 33, 49).
To make the algorithm usable for larger geometries (e.g. 1024x1024x512 grid), I need to speed up this process.
def search_biggest(mask, cuboids):
"""Search biggest cuboid. """
# List of nonzero indexes
forbid = set([tuple(i) for i in np.argwhere(mask != 0)])
t0 = time.perf_counter()
tested = 0
origin, size = None, None
for _, (Lx, Ly, Lz) in cuboids:
# remove cuboid with origin in another cuboid and iterate
for o in set(itertools.product(range(nx-Lx), range(ny-Ly), range(nz-Lz))).difference(forbid):
tested += 1
if (mask[o[0]:o[0]+Lx, o[1]:o[1]+Ly, o[2]:o[2]+Lz] == 0).all():
origin, size = o, (Lx, Ly, Lz)
if origin:
print(f'{tested} configurations tested in {time.perf_counter() - t0:.3f} s.')
return origin, size
Reducing the possibilities leads to 32806 configurations tested in ~2.6s.
Any ideas ?
def search_biggest(mask, cuboids):
"""Search biggest cuboid. """
# List of nonzero indexes
forbid = set([tuple(i) for i in np.argwhere(mask != 0)])
t0 = time.perf_counter()
tested = 0
origin, size = None, None
for _, (Lx, Ly, Lz) in cuboids:
tested += 1
tmp = [i for i in itertools.product(range(nx-Lx), range(ny-Ly), range(nz-Lz)) if i not in forbid]
tmp = [i for i in tmp if
all(x not in forbid for x in itertools.product(range(i[0], i[0]+Lx), range(i[1], i[1]+Ly), range(i[2], i[2]+Lz)))]
if tmp:
origin, size = tmp[0], (Lx, Ly, Lz)
print(f'{tested} configurations tested in {time.perf_counter() - t0:.3f} s.')
return origin, size
Simple is better than complex ! Alternative implementation of search_biggest leads to 5991 configurations tested in 1.017 s. !
Maybe, I can reduce the list of cuboids to speed up things, but I think this is not the way to go. I tried with a 500x500x500 grid and stopped the script after an hour without a result...
I have two other possible formulations in mind:
Fill the cuboid with elementary cuboids of size Nmin x Nmin x Nmin then merge the cuboids having common faces. The problem is that in some situations, it is not possible to fill an empty region only with cuboids of this size.
Focus the search for empty rectangles on a plane (x, y, z=0) then extend them along z until a constraint is encountered. Repeat this operation for z $\in$ [1, nz-1] then merge cuboids that can be merged. The problem is that I can end up with some empty regions smaller than an elementary cuboid.
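For the second formulation, the per-plane search can be done in time linear in the plane size with the classic histogram-and-stack technique. The sketch below is only an illustration: the function name largest_empty_rectangle is made up here, it works on a plain 2D list (or a 2D slice such as mask[:, :, 0]), and it ignores both the Nmin constraint and the extension along z.

```python
def largest_empty_rectangle(plane):
    """Return (area, (row, col), (height, width)) of the largest all-zero rectangle."""
    n_cols = len(plane[0])
    heights = [0] * n_cols  # consecutive zeros ending at the current row, per column
    best = (0, None, None)
    for r, row in enumerate(plane):
        for c in range(n_cols):
            heights[c] = heights[c] + 1 if row[c] == 0 else 0
        stack = []  # column indices whose heights are strictly increasing
        for c in range(n_cols + 1):
            h = heights[c] if c < n_cols else 0  # sentinel column flushes the stack
            while stack and heights[stack[-1]] >= h:
                top = stack.pop()
                left = stack[-1] + 1 if stack else 0
                height, width = heights[top], c - left
                if height * width > best[0]:
                    best = (height * width, (r - height + 1, left), (height, width))
            stack.append(c)
    return best


plane = [[0, 0, 1, 0],
         [0, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(largest_empty_rectangle(plane))  # (9, (1, 1), (3, 3))
```

Each rectangle found this way would still have to be extended along z and checked against Nmin, which is where the leftover-region problem mentioned above comes from.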
Always in search for a better way to solve this problem...

Given that the constraints are cuboids, we can simplify checking the possibilities by skipping the rows/columns/layers where nothing changes. For example, a 2D mask might look like this, with the redundant rows/columns marked:
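[2D mask diagram with the redundant rows/columns marked by "v"; not preserved]
which can be reduced to
[reduced mask diagram; not preserved]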
We just need to keep track of the true lengths of each reduced cell, and use that to calculate the real volume.
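As a small illustration of the reduction (a sketch on a made-up 6x8 mask with a hypothetical helper reduce_mask, not the solver itself), the split positions can be found with np.diff, and a second np.diff on the splits gives the true sizes:

```python
import numpy as np


def reduce_mask(mask):
    """Collapse runs of identical rows/columns; return (reduced mask, true sizes)."""
    splits = []
    for ax in range(mask.ndim):
        # positions along `ax` where a slice differs from the previous one
        changes = np.where(np.diff(mask, axis=ax))[ax] + 1
        splits.append(np.unique(np.concatenate(([0], changes, [mask.shape[ax]]))))
    # keep one representative cell (the lowest corner) per reduced region
    reduced = mask[np.ix_(*[s[:-1] for s in splits])]
    # true extent of every reduced row/column
    sizes = [np.diff(s) for s in splits]
    return reduced, sizes


mask2d = np.zeros((6, 8), dtype=int)
mask2d[2:4, 3:6] = 1  # one rectangular constraint
reduced, sizes = reduce_mask(mask2d)
print(reduced)  # 3x3 grid with a single 1 in the centre
print(sizes)    # row heights [2 2 2], column widths [3 3 2]
```

The true area of a block of reduced cells is then the product of the summed sizes along each axis.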
However, this means we need to take a different approach to finding the largest cuboid; we can't (easily) pre-generate them all and try in order of volume, since the true sizes vary depending on the point we're testing from.
The code below just tries all possible cuboids and keeps track of the largest one, and seems plenty fast.
There are also some off-by-one errors in the original code, which I've avoided or fixed: range(Nmin, nx) -> range(Nmin, nx+1) in cuboid_list(), and range(nx-Lx) -> range(nx-Lx+1) in search_biggest(). The largest cuboid actually has size=(34, 34, 50).
import itertools
import time
import numpy as np
class ReducedSolver:
    def __init__(self, mask):
        self.splits = tuple(
            # find the edges of the constraint regions in each axis;
            # np.unique sorts them and removes duplicates
            np.unique([0, mask.shape[ax], *np.where(np.diff(mask, axis=ax, prepend=0))[ax]])
            for ax in range(3)
        )
        # extract exactly one value (the lowest corner) for each reduced region
        self.mask = np.zeros([len(split) - 1 for split in self.splits], dtype=mask.dtype)
        for i, x in enumerate(self.splits[0][:-1]):
            for j, y in enumerate(self.splits[1][:-1]):
                for k, z in enumerate(self.splits[2][:-1]):
                    self.mask[i, j, k] = mask[x, y, z]
        # less readable:
        # self.mask = mask[np.ix_(self.splits[0][:-1], self.splits[1][:-1], self.splits[2][:-1])]
        # true sizes of each region
        self.sizes = [np.diff(split) for split in self.splits]
    def solve(self):
        """Return list of cuboids in the format (origin, size), using a greedy approach."""
        nx, ny, nz = self.mask.shape
        mask = self.mask.copy()
        result = []
        while np.any(mask == 0):
            t0 = time.perf_counter()
            tested = 0
            max_volume = 0
            # first corner of the cuboid
            for x1, y1, z1 in itertools.product(range(nx), range(ny), range(nz)):
                if mask[x1, y1, z1]:
                    continue
                # opposite corner of the cuboid
                for x2, y2, z2 in itertools.product(range(x1, nx), range(y1, ny), range(z1, nz)):
                    tested += 1
                    # slices are exclusive on the end point
                    slc = (slice(x1, x2 + 1), slice(y1, y2 + 1), slice(z1, z2 + 1))
                    # np.any doesn't short-circuit
                    if any(np.nditer(mask[slc])):
                        continue
                    true_size = tuple(np.sum(size[s]) for size, s in zip(self.sizes, slc))
                    volume = np.prod(true_size)
                    if volume > max_volume:
                        max_volume = volume
                        origin = (self.splits[0][x1], self.splits[1][y1], self.splits[2][z1])
                        # keep track of the region in the reduced mask with `slc`
                        biggest = ((origin, true_size), slc)
            print(f"{tested} configurations tested in {(time.perf_counter() - t0)*1000:.2f} ms.")
            result.append(biggest[0])
            # mark this cuboid as filled
            mask[biggest[1]] = 1
        return result
# Load mask
mask = np.load("mask.npy")
# Solve on the simplified grid
reduced = ReducedSolver(mask)
sol = reduced.solve()
On my machine, finding the biggest cuboid on the example mask takes ~1s with search_biggest(), but is ~4ms with my code, and the entire solve takes ~20ms for 16 cuboids.


Order 2d points based on distance from each other [duplicate]

I have a list of (x,y)-coordinates that represent a line skeleton.
The list is obtained directly from a binary image:
import numpy as np
Now the points in the list are sorted according to their position in the image along one of the axes.
I would like to sort the list such that the order represents a smooth path along the line. (This is currently not the case where the line curves back).
Subsequently, I want to fit a spline to these points.
A similar problem has been described and solved using arcPy here. Is there a convenient way to achieve this using python, numpy, scipy, openCV (or another library?)
Below is an example image. It results in a list of 59 (x,y)-coordinates.
When I send the list to scipy's spline fitting routine, I run into a problem because the points aren't 'ordered' on the line:
I apologize for the long answer in advance :P (the problem is not that simple).
Let's start by rewording the problem. Finding a line that connects all the points can be reformulated as a shortest path problem in a graph, where (1) the graph nodes are the points in space, (2) each node is connected to its 2 nearest neighbors, and (3) the shortest path passes through each of the nodes only once. That last constraint is very important (and quite hard to optimize). Essentially, the problem is to find a permutation of length N, where the permutation refers to the order of each of the nodes (N is the total number of nodes) in the path.
Finding all the possible permutations and evaluating their cost is too expensive (there are N! permutations, which is far too many for all but the smallest problems). Below I propose an approach that finds N candidate permutations (the best one for each of the N starting points) and then picks the permutation (from those N) that minimizes the error/cost.
1. Create a random problem with unordered points
Now, let's start by creating a sample problem:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
And here is the unsorted version of the points [x, y], to simulate random points in space connected in a line:
idx = np.random.permutation(x.size)
x = x[idx]
y = y[idx]
plt.plot(x, y)
The problem is then to order those points to recover their original order so that the line is plotted properly.
2. Create 2-NN graph between nodes
We can first rearrange the points in a [N, 2] array:
points = np.c_[x, y]
Then, we can start by creating a nearest neighbour graph to connect each of the nodes to its 2 nearest neighbors:
from sklearn.neighbors import NearestNeighbors
clf = NearestNeighbors(n_neighbors=2).fit(points)
G = clf.kneighbors_graph()
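G is a sparse N x N matrix in which each row represents a node and the non-zero columns of that row mark its 2 nearest neighbors. With the default mode='connectivity' the entries are simply 1; pass mode='distance' to kneighbors_graph() to store the euclidean distances instead.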
We can then use networkx to construct a graph from this sparse matrix:
import networkx as nx
T = nx.from_scipy_sparse_array(G)  # networkx >= 2.7; older versions use nx.from_scipy_sparse_matrix(G)
3. Find shortest path from source
And, here begins the magic: we can extract the paths using dfs_preorder_nodes, which will essentially create a path through all the nodes (passing through each of them exactly once) given a starting node (if not given, the 0 node will be selected).
order = list(nx.dfs_preorder_nodes(T, 0))
xx = x[order]
yy = y[order]
plt.plot(xx, yy)
Well, it is not too bad, but we can notice that the reconstruction is not optimal. This is because point 0 in the unordered list lies in the middle of the line; that is why the path first goes in one direction, and then comes back and finishes in the other direction.
4. Find the path with smallest cost from all sources
So, in order to obtain the optimal order, we can just get the best order for all the nodes:
paths = [list(nx.dfs_preorder_nodes(T, i)) for i in range(len(points))]
Now that we have a candidate path starting from each of the N = 100 nodes, we can compare them and keep the one that minimizes the distances between consecutive points (optimization problem):
mindist = np.inf
minidx = 0
for i in range(len(points)):
    p = paths[i]           # order of nodes
    ordered = points[p]    # ordered nodes
    # cost of that order: sum of squared euclidean distances between points (i) and (i+1)
    cost = (((ordered[:-1] - ordered[1:])**2).sum(1)).sum()
    if cost < mindist:
        mindist = cost
        minidx = i
The points are ordered for each of the optimal paths, and then a cost is computed (by calculating the euclidean distance between all pairs of points i and i+1). If the path starts at the start or end point, it will have the smallest cost as all the nodes will be consecutive. On the other hand, if the path starts at a node that lies in the middle of the line, the cost will be very high at some point, as it will need to travel from the end (or beginning) of the line to the initial position to explore the other direction. The path that minimizes that cost, is the path starting in an optimal point.
opt_order = paths[minidx]
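Note that the loop above sums squared segment lengths, which penalizes long jumps even more strongly than the plain path length. If the actual length of each candidate path is wanted, a minimal sketch could look like this (path_length and the four test points are made up for illustration):

```python
import numpy as np


def path_length(points, order):
    """Total Euclidean length of the polyline visiting `points` in `order`."""
    ordered = np.asarray(points)[order]
    return np.linalg.norm(np.diff(ordered, axis=0), axis=1).sum()


# four collinear points, stored out of order
pts = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
good = path_length(pts, [0, 2, 1, 3])  # walks from one end to the other: 3.0
bad = path_length(pts, [2, 0, 1, 3])   # starts in the middle and doubles back: 4.0
print(good, bad)
```

Either cost ranks a path that starts at an end point ahead of one that starts in the middle and has to jump back.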
Now, we can reconstruct the order properly:
xx = x[opt_order]
yy = y[opt_order]
plt.plot(xx, yy)
One possible solution is to use a nearest neighbours approach, possibly by using a KDTree. Scikit-learn has a nice interface. This can then be used to build a graph representation using networkx. This will only really work if the line to be drawn should go through the nearest neighbours:
from sklearn.neighbors import KDTree
import numpy as np
import networkx as nx
G = nx.Graph() # A graph to hold the nearest neighbours
X = [(0, 1), (1, 1), (3, 2), (5, 4)] # Some list of points in 2D
tree = KDTree(X, leaf_size=2, metric='euclidean') # Create a distance tree
# Now loop over your points and find the two nearest neighbours
# If the first and last points are also the start and end points of the line you can use X[1:-1]
for p in X:
    dist, ind = tree.query([p], k=3)  # query expects a 2D array of points
    print(ind)
    # ind Indexes represent nodes on a graph
    # Two nearest points are at indexes 1 and 2.
    # Use these to form edges on graph
    # p is the current point in the list
    n1, l1 = X[ind[0][1]], dist[0][1]  # The next nearest point
    n2, l2 = X[ind[0][2]], dist[0][2]  # The following nearest point
    G.add_edge(p, n1, weight=l1)  # store the length; find_longest_edge() reads it
    G.add_edge(p, n2, weight=l2)
print(G.edges())  # A list of all the connections between points
print(nx.shortest_path(G, source=(0,1), target=(5,4)))
>>> [(0, 1), (1, 1), (3, 2), (5, 4)] # A list of ordered points
Update: If the start and end points are unknown and your data is reasonably well separated, you can find the ends by looking for cliques in the graph. The start and end points will form a clique. If the longest edge is removed from the clique it will create a free end in the graph which can be used as a start and end point. For example, the start and end points in this list appear in the middle:
X = [(0, 1), (0, 0), (2, 1), (3, 2), (9, 4), (5, 4)]
After building the graph, it is now a case of removing the longest edge from the cliques to find the free ends of the graph:
def find_longest_edge(l):
    e1 = G[l[0]][l[1]]['weight']
    e2 = G[l[0]][l[2]]['weight']
    e3 = G[l[1]][l[2]]['weight']
    if e2 < e1 > e3:
        return (l[0], l[1])
    elif e1 < e2 > e3:
        return (l[0], l[2])
    elif e1 < e3 > e2:
        return (l[1], l[2])
end_cliques = [i for i in list(nx.find_cliques(G)) if len(i) == 3]
edge_lengths = [find_longest_edge(i) for i in end_cliques]
G.remove_edges_from(edge_lengths)  # drop the longest edge of each end clique
edges = G.edges()
# nodes with a single neighbour are the free ends (G.adjacency_iter() in networkx < 2.0)
start_end = [n for n, nbrs in G.adjacency() if len(nbrs) == 1]
print(nx.shortest_path(G, source=start_end[0], target=start_end[1]))
>>> [(0, 0), (0, 1), (2, 1), (3, 2), (5, 4), (9, 4)] # The correct path
I had the exact same problem. If you have two arrays of scattered x and y values that are not too curvy, then you can transform the points into PCA space, sort them in PCA space, and then transform them back. (I've also added in some bonus smoothing functionality).
import numpy as np
from scipy import interpolate
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
def XYclean(x, y):
    xy = np.concatenate((x.reshape(-1,1), y.reshape(-1,1)), axis=1)
    # make PCA object
    pca = PCA(2)
    # fit on data
    pca.fit(xy)
    # transform into pca space
    xypca = pca.transform(xy)
    newx = xypca[:,0]
    newy = xypca[:,1]
    # sort along the first principal axis
    indexSort = np.argsort(newx)
    newx = newx[indexSort]
    newy = newy[indexSort]
    # add some more points (optional)
    f = interpolate.interp1d(newx, newy, kind='linear')
    newX = np.linspace(np.min(newx), np.max(newx), 100)
    newY = f(newX)
    # smooth with a filter (optional)
    window = 43
    newY = savgol_filter(newY, window, 2)
    # return back to old coordinates
    xyclean = pca.inverse_transform(np.concatenate((newX.reshape(-1,1), newY.reshape(-1,1)), axis=1))
    xc = xyclean[:,0]
    yc = xyclean[:,1]
    return xc, yc
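If only the ordering is needed (no interpolation or smoothing), the same idea can be sketched with NumPy alone by projecting the points onto their first principal axis. This is an illustrative variant, not the code above: sort_along_principal_axis is a made-up name, and it carries the same limitation that the curve must not fold back along that axis.

```python
import numpy as np


def sort_along_principal_axis(x, y):
    """Return the (x, y) points sorted by their coordinate on the first principal axis."""
    xy = np.column_stack((x, y)).astype(float)
    centered = xy - xy.mean(axis=0)
    # rows of vt are the principal directions, strongest first
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    order = np.argsort(centered @ vt[0])
    return xy[order]


x = np.array([3.0, 0.0, 2.0, 1.0])
y = 2 * x
print(sort_along_principal_axis(x, y))  # points ordered along the line y = 2x
```

The sign of a principal direction is arbitrary, so the result may run from either end of the curve.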
I agree with Imanol Luengo's solution, but if you know the index of the first point, then there is a considerably easier solution that uses only NumPy:
def order_points(points, ind):
    points_new = [ points.pop(ind) ]  # initialize a new list of points with the known first point
    pcurr = points_new[-1]  # initialize the current point (as the known point)
    while len(points)>0:
        d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1)  # distances between pcurr and all other remaining points
        ind = d.argmin()  # index of the closest point
        points_new.append( points.pop(ind) )  # append the closest point to points_new
        pcurr = points_new[-1]  # update the current point
    return points_new
This approach appears to work well with the sine curve example, especially because it is easy to define the first point as either the leftmost or rightmost point.
For the img_skeleton data cited in the question, it would be similarly easy to algorithmically obtain the first point, for example as the topmost point.
# create sine curve:
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
# shuffle the order of the x and y coordinates:
idx = np.random.permutation(x.size)
xs,ys = x[idx], y[idx] # shuffled points
# find the leftmost point:
ind = xs.argmin()
# assemble the x and y coordinates into a list of (x,y) tuples:
points = [(xx,yy) for xx,yy in zip(xs,ys)]
# order the points based on the known first point:
points_new = order_points(points, ind)
# plot:
fig,ax = plt.subplots(1, 2, figsize=(10,4))
xn,yn = np.array(points_new).T
ax[0].plot(xs, ys) # original (shuffled) points
ax[1].plot(xn, yn) # new (ordered) points
I am working on a similar problem, but it has an important constraint (much like the example given by the OP), which is that each pixel has either one or two neighboring pixels, in the 8-connected sense. With this constraint, there is a very simple solution.
def sort_to_form_line(unsorted_list):
    """
    Given a list of neighboring points which forms a line, but in random order,
    sort them to the correct order.
    IMPORTANT: Each point must be a neighbor (8-point sense)
    to at least one other point!
    """
    sorted_list = [unsorted_list.pop(0)]
    while len(unsorted_list) > 0:
        i = 0
        while i < len(unsorted_list):
            if are_neighbours(sorted_list[0], unsorted_list[i]):
                # neighbours at front of list
                sorted_list.insert(0, unsorted_list.pop(i))
            elif are_neighbours(sorted_list[-1], unsorted_list[i]):
                # neighbours at rear of list
                sorted_list.append(unsorted_list.pop(i))
            else:
                i = i+1
    return sorted_list
def are_neighbours(pt1, pt2):
    """
    Check if pt1 and pt2 are neighbours, in the 8-point sense.
    pt1 and pt2 have integer coordinates.
    """
    return (np.abs(pt1[0]-pt2[0]) < 2) and (np.abs(pt1[1]-pt2[1]) < 2)
Building upon Toddp's answer, you can find the end-points of arbitrarily shaped lines using the code below and then order the points as Toddp described. This is much faster than Imanol Luengo's answer; the only constraint is that the line must have exactly 2 end-points:
import cv2
import numpy as np
def order_points(points):
    if isinstance(points, np.ndarray):
        assert points.shape[1] == 2
        points = points.tolist()
    exts = get_end_points(points)
    assert len(exts) == 2
    ind = points.index(exts[0])
    points_new = [ points.pop(ind) ]  # initialize a new list of points with the known first point
    pcurr = points_new[-1]  # initialize the current point (as the known point)
    while len(points)>0:
        d = np.linalg.norm(np.array(points) - np.array(pcurr), axis=1)  # distances between pcurr and all other remaining points
        ind = d.argmin()  # index of the closest point
        points_new.append( points.pop(ind) )  # append the closest point to points_new
        pcurr = points_new[-1]  # update the current point
    return points_new
def get_end_points(ptsxy):
    #source :
    ptsxy = np.asarray(ptsxy, dtype=np.int32)  # cv2.boundingRect expects int32 (or float32) points
    assert ptsxy.shape[1] == 2
    # translate to (0,0) for faster execution
    xx, yy, w, h = cv2.boundingRect(ptsxy)
    pts_translated = ptsxy - (xx, yy)
    # binary image of the line, padded by one empty row and column
    bim = np.zeros((h+1, w+1))
    bim[pts_translated[:, 1], pts_translated[:, 0]] = 255
    extremes = []
    for p in pts_translated:
        x = p[0]
        y = p[1]
        # count the 8-connected neighbours of p that belong to the line
        n = 0
        n += bim[y - 1, x]
        n += bim[y - 1, x - 1]
        n += bim[y - 1, x + 1]
        n += bim[y, x - 1]
        n += bim[y, x + 1]
        n += bim[y + 1, x]
        n += bim[y + 1, x - 1]
        n += bim[y + 1, x + 1]
        n /= 255
        if n == 1:
            # exactly one neighbour: p is an end-point
            extremes.append(p)
    extremes = np.array(extremes) + (xx, yy)
    return extremes.tolist()

An algorithm to sort top and bottom slices of curved surfaces

I am trying to:
Cut an STL file at a Z-coordinate using PyVista
Extract the points' X, Y coordinates at the given section Z
Sort the points into Upper and Down groups for further manipulation
Here is my code:
import pyvista as pv
import matplotlib.pylab as plt
import numpy as np
import math
mesh = pv.read('wing_fish.stl')
z_slice = [0, 0, 1] # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200]) # slicing
a = single_slice.points # choose only points
# p = pv.Plotter() #show section
# p.add_mesh(single_slice)
a = a[a[:,0].astype(float).argsort()] # sort all points by Х coord
# X min of all points
x0 = a[0][0]
# Y min of all points
y0 = a[0][1]
# X tail 1 of 2
xn = a[-1][0]
# Y tail 1 of 2
yn = a[-1][1]
# X tail 2 of 2
xn2 = a[-2][0]
# Y tail 2 of 2
yn2 = a[-2][1]
def line_y(x, x0, y0, xn, yn):
# return y coord at arbitary x coord of x0, y0 xn, yn LINE
return ((x - x0)*(yn-y0))/(xn-x0)+y0
def line_c(x0, y0, xn, yn):
# return x, y middle points of LINE
xc = (x0+xn)/2
yc = (y0+yn)/2
return xc, yc
def chord(P1, P2):
return math.sqrt((P2[1] - P1[1])**2 + (P2[0] - P1[0])**2)
xc_end, yc_end = line_c(xn, yn, xn2, yn2) # return midle at trailing edge
midLine = np.array([[x0,y0],[xc_end,yc_end]],dtype='float32')
c_temp_x_d = []
c_temp_y_d = []
c_temp_x_u = []
c_temp_y_u = []
isUp = None
isDown = None
for i in a:
    if i[1] == line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
        continue  # assumed: points lying exactly on the chord are skipped
    elif i[1] < line_y(i[0], x0=x0, y0=y0, xn=xc_end, yn=yc_end):
        c_temp_x_d.append(i[0])
        c_temp_y_d.append(i[1])
        isDown = True
    else:
        c_temp_x_u.append(i[0])
        c_temp_y_u.append(i[1])
        isUp = True
if len(c_temp_y_d) != 0 and len(c_temp_y_u) != 0:
    plt.plot(c_temp_x_d, c_temp_y_d, label='suppose to be down points')
    plt.plot(c_temp_x_u, c_temp_y_u, label='suppose to be upper points')
plt.plot(midLine[:,0], midLine[:,1], label='Chord')
plt.scatter(a[:,0],a[:,1], label='raw points')
What I have:
What I want:
I would highly appreciate any help and advice!
Thanks in advance!
You are discarding precious connectivity information that is already there in your STL mesh and in your slice!
I couldn't think of a more idiomatic solution within PyVista, but at worst you can take the cell (line) information from the slice and start walking your shape (that is topologically equivalent to a circle) from its left side to its right, and vice versa. Here's one way:
import numpy as np
import matplotlib.pyplot as plt
import pyvista as pv
mesh = pv.read('../wing_fish.stl')
z_slice = [0, 0, 1] # normal to cut at
single_slice = mesh.slice(normal=z_slice, origin=[0, 0, 200]) # slicing
# find points with smallest and largest x coordinate
points = single_slice.points
left_ind = points[:, 0].argmin()
right_ind = points[:, 0].argmax()
# sanity check for what we're about to do:
# 1. all cells are lines
assert single_slice.n_cells == single_slice.n_points
assert (single_slice.lines[::3] == 2).all()
# 2. all points appear exactly once as segment start and end
lines = single_slice.lines.reshape(-1, 3) # each row: [2, i_from, i_to]
assert len(set(lines[:, 1])) == lines.shape[0]
# create an auxiliary dict with from -> to index mappings
conn = dict(lines[:, 1:])
# and a function that walks this connectivity graph
def walk_connectivity(connectivity, start, end):
    this_ind = start
    path_inds = [this_ind]
    while True:
        next_ind = connectivity[this_ind]
        path_inds.append(next_ind)
        this_ind = next_ind
        if this_ind == end:
            # we're done
            return path_inds
# start walking at point left_ind, walk until right_ind
first_side_inds = walk_connectivity(conn, left_ind, right_ind)
# now walk forward for the other half curve
second_side_inds = walk_connectivity(conn, right_ind, left_ind)
# get the point coordinates for plotting
first_side_points = points[first_side_inds, :-1]
second_side_points = points[second_side_inds, :-1]
# plot the two sides
fig, ax = plt.subplots()
ax.plot(*first_side_points.T)
ax.plot(*second_side_points.T)
plt.show()
In order to avoid using an O(n^2) algorithm, I defined an auxiliary dict that maps line segment start indices to end indices. In order for this to work we need some sanity checks, namely that the cells are all simple line segments, and that each segment has the same orientation (i.e. each start point is unique, and each end point is unique). Once we have this it's easy to start from the left edge of your wing profile and walk each line segment until we find the right edge.
The nature of this approach implies that we can't know a priori whether the path from left to right goes on the upper or the lower path. This needs experimentation on your part; name the two paths in whatever way you see fit.
And of course there's always room for fine tuning. For instance, the above implementation creates two paths that both start and end with the left and right-side boundary points of the mesh. If you want the top and bottom curves to share no points, you'll have to adjust the algorithm accordingly. And if the end point is not found on the path then the current implementation will give you an infinite loop with a list growing beyond all available memory. Consider adding some checks in the implementation to avoid this.
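One way to add such checks, as a rough sketch (walk_connectivity_checked is a made-up name, and the connectivity dict below is toy data rather than a real slice): bound the number of steps by the number of segments, and raise if the walk hits a dead end or never reaches the end point.

```python
def walk_connectivity_checked(connectivity, start, end):
    """Walk from `start` to `end`, raising instead of looping forever."""
    path = [start]
    current = start
    # a walk that visits no segment twice cannot take more steps than there are segments
    for _ in range(len(connectivity)):
        if current not in connectivity:
            raise ValueError(f"dead end at index {current}")
        current = connectivity[current]
        path.append(current)
        if current == end:
            return path
    raise ValueError(f"{end} not reachable from {start}")


conn = {0: 1, 1: 2, 2: 3, 3: 0}  # toy closed loop 0 -> 1 -> 2 -> 3 -> 0
print(walk_connectivity_checked(conn, 0, 2))  # [0, 1, 2]
print(walk_connectivity_checked(conn, 2, 0))  # [2, 3, 0]
```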
Anyway, this is what we get from the above:

Find closest point in 2D meshed array

To give y'all some context, I'm doing this inversion technique where I am trying to reproduce a profile using the integrated values. To do that I need to find the value within an array along a certain line(s). To exemplify my issue I have the following code:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize = (10,10))
#Create the grid (different grid spacing):
X = np.arange(0,10.01,0.25)
Y = np.arange(0,10.01,1.00)
#Create the 2D array to be plotted
Z = []
for i in range(np.size(X)):
    Zaux = []
    for j in range(np.size(Y)):
        Zaux.append(i*j + j)
        ax.scatter(X[i],Y[j], color = 'red', s = 0.25)
    Z.append(Zaux)
#Mesh the 1D grids:
Ymesh, Xmesh = np.meshgrid(Y, X)
#Plot the color plot:
ax.pcolor(Y,X, Z, cmap='viridis', vmin=np.nanmin(Z), vmax=np.nanmax(Z))
#Plot the points in the grid of the color plot:
for i in range(np.size(X)):
for j in range(np.size(Y)):
ax.scatter(Y[j],X[i], color = 'red', s = 3)
#Create a set of lines:
for i in np.linspace(0,2,5):
X_line = np.linspace(0,10,256)
Y_line = i*X_line*3.1415-4
#Plot each line:
ax.plot(X_line,Y_line, color = 'blue')
That outputs this graph:
I need to find the closest points in Z that are being crossed by each of the lines. The idea is to integrate the values in Z that are crossed by the blue lines and plot that as a function of slope of the lines. Anyone has a good solution for it? I've tried a set of for loops, but I think it's kind of clunky.
Anyway, thanks for your time...
I am not sure about the closest points thing. That seems "clunky" too. What if the line passes exactly in the middle between two points? I had also already written code for another project that weighs the four neighbor pixels by their closeness, so I am going with that. I also take the liberty of not rescaling the picture.
import itertools
import numpy as np
i,j = np.meshgrid(np.arange(41),np.arange(11))
Z = i*j + j
class Image_knn():
def fit(self, image):
self.image = image.astype('float')
def predict(self, x, y):
image = self.image
weights_x = [1-(x % 1), x % 1]
weights_y = [1-(y % 1), y % 1]
start_x = np.floor(x).astype('int')
start_y = np.floor(y).astype('int')
return sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')] * weights_x[x]*weights_y[y]
for x,y in itertools.product(range(2),range(2))])
And a little sanity check: it returns the picture if we give it its coordinates.
image_model = Image_knn()
image_model.fit(Z)
assert np.allclose(image_model.predict(*np.where(np.ones(Z.shape, dtype='bool'))).reshape((11,41)), Z)
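To see what the weighting does for a single point, here is a stripped-down scalar version of the same idea (bilinear is a name invented for this sketch, and the 2x2 image is toy data). Each of the four surrounding pixels contributes in proportion to how close the query point is to it, with indices clamped at the image border as in the class above.

```python
import math


def bilinear(image, x, y):
    """Interpolate `image` (a list of rows) at fractional position (x, y); x indexes rows."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    total = 0.0
    for dx, wx in ((0, 1 - fx), (1, fx)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            # clamp so that points on the last row/column stay inside the image
            i = min(max(x0 + dx, 0), len(image) - 1)
            j = min(max(y0 + dy, 0), len(image[0]) - 1)
            total += image[i][j] * wx * wy
    return total


img = [[0.0, 10.0],
       [20.0, 30.0]]
center = bilinear(img, 0.5, 0.5)  # equal weight on all four pixels: 15.0
corner = bilinear(img, 1.0, 1.0)  # exactly on a pixel: 30.0
print(center, corner)
```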
I generate m=100 lines and scale the points on them so that they are evenly spaced. Here is a plot of every 10th of them.
n = 1000
m = 100
slopes = np.linspace(1e-10,10,m)
t, slope = np.meshgrid(np.linspace(0,1,n), slopes)
x_max, y_max = Z.shape[0]-1, Z.shape[1]-1
lines_x = t
lines_y = t*slope
scales = np.broadcast_to(np.stack([x_max/lines_x[:,-1], y_max/lines_y[:,-1]]).min(axis=0), (n,m)).T
lines_x *= scales
lines_y *= scales
And finally I can get the "points" consisting of slope and "integral" and draw them. You should probably take a closer look at the "integral"; it's just a rough guess of mine.
points = np.array([(slope, np.mean(image_model.predict(lines_x[i],lines_y[i])))
                   for i,slope in enumerate(slopes)])
The last block was timed with %%timeit: it takes ~38.3 ms on my machine, so I did not optimize it. As Donald Knuth puts it, "premature optimization is the root of all evil". If you were to optimize this, you would remove the for loop, push the coordinates of all line points through the model at once by reshaping and reshaping back, and then organize them with the slopes. But I saw no reason to put myself through that for a few ms.
And finally we get a nice cusp as a reward. Notice that it makes sense that the maximum is at 4 since the diagonal is at a slope of 4 for our 40 by 10 picture. The intuition for the cusp is a bit harder to explain but I guess you probably have that already. For the length it comes down to the function (x,y) -> sqrt(x^2+y^2) having different directional differentials when going up and when going left on the rectangle.

Generating multiple random (x, y) coordinates, excluding duplicates?

I want to generate a bunch (x, y) coordinates from 0 to 2500 that excludes points that are within 200 of each other without recursion.
Right now I have it check through a list of all previous values to see if any are far enough from all the others. This is really inefficient and if I need to generate a large number of points it takes forever.
So how would I go about doing this?
This is a variant on Hank Ditton's suggestion that should be more efficient time- and memory-wise, especially if you're selecting relatively few points out of all possible points. The idea is that, whenever a new point is generated, everything within 200 units of it is added to a set of points to exclude, against which all freshly-generated points are checked.
import random
radius = 200
rangeX = (0, 2500)
rangeY = (0, 2500)
qty = 100 # or however many points you want
# Generate a set of all points within 200 of the origin, to be used as offsets later
# There's probably a more efficient way to do this.
deltas = set()
for x in range(-radius, radius+1):
    for y in range(-radius, radius+1):
        if x*x + y*y <= radius*radius:
            deltas.add((x,y))
randPoints = []
excluded = set()
i = 0
while i<qty:
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    if (x,y) in excluded: continue
    randPoints.append((x,y))
    i += 1
    excluded.update((x+dx, y+dy) for (dx,dy) in deltas)
print(randPoints)
I would overgenerate the points, target_N < input_N, and filter them using a KDTree. For example:
import numpy as np
from scipy.spatial import KDTree
N = 20
pts = 2500*np.random.random((N,2))
tree = KDTree(pts)
print(tree.sparse_distance_matrix(tree, 200))
Would give me points that are "close" to each other. From here it should be simple to apply any filter:
(11, 0) 60.843426339
(0, 11) 60.843426339
(1, 3) 177.853472309
(3, 1) 177.853472309
Some options:
Use your algorithm but implement it with a kd-tree that would speed up nearest neighbours look-up
Build a regular grid over the [0, 2500]^2 square and 'shake' all points randomly with a bi-dimensional normal distribution centered on each intersection in the grid
Draw a larger number of random points then apply a k-means algorithm and only keep the centroids. They will be far away from one another and the algorithm, though iterative, could converge more quickly than your algorithm.
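The regular-grid option can be sketched as follows. Note the swap: instead of the normal distribution suggested above, this sketch uses a bounded uniform jitter, because a bounded shift makes the minimum distance a guarantee rather than a likelihood. The function name jittered_grid and the jitter value of 50 are assumptions chosen for illustration.

```python
import random


def jittered_grid(extent=2500, radius=200, jitter=50, seed=None):
    """Regular grid over [0, extent]^2, each node shifted by at most `jitter` per axis."""
    rng = random.Random(seed)
    # two neighbouring nodes can approach each other by at most 2 * jitter per axis
    spacing = radius + 2 * jitter
    n = int((extent - 2 * jitter) // spacing) + 1
    pts = []
    for i in range(n):
        for j in range(n):
            x = jitter + i * spacing + rng.uniform(-jitter, jitter)
            y = jitter + j * spacing + rng.uniform(-jitter, jitter)
            pts.append((x, y))
    return pts


pts = jittered_grid(seed=1)
print(len(pts))  # 81 points on a 9 x 9 grid
```

The price is that the result still looks grid-like, and the number of points is fixed by the spacing rather than chosen freely.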
This has been answered, but it's very tangentially related to my work so I took a stab at it. I implemented the algorithm described in this note which I found linked from this blog post. Unfortunately it's not faster than the other proposed methods, but I'm sure there are optimizations to be made.
import numpy as np
import matplotlib.pyplot as plt

def lonely(p,X,r):
    # True if no existing sample lies in the (clipped) square window around p
    m = X.shape[1]
    x0,y0 = p
    x = y = np.arange(-r,r)
    x = x + x0
    y = y + y0

    u,v = np.meshgrid(x,y)

    u[u < 0] = 0
    u[u >= m] = m-1
    v[v < 0] = 0
    v[v >= m] = m-1

    return not np.any(X[u[:],v[:]] > 0)

def generate_samples(m=2500,r=200,k=30):
    # m = extent of sample domain
    # r = minimum distance between points
    # k = samples before rejection
    active_list = []

    # step 0 - initialize n-d background grid
    X = np.ones((m,m))*-1

    # step 1 - select initial sample
    x0,y0 = np.random.randint(0,m), np.random.randint(0,m)
    active_list.append((x0,y0))
    X[active_list[0]] = 1

    # step 2 - iterate over active list
    while active_list:
        i = np.random.randint(0,len(active_list))
        rad = np.random.rand(k)*r+r
        theta = np.random.rand(k)*2*np.pi

        # get a list of random candidates within [r,2r] from the active point
        candidates = np.round((rad*np.cos(theta)+active_list[i][0], rad*np.sin(theta)+active_list[i][1])).astype(np.int32).T

        # trim the list based on boundaries of the array
        candidates = [(x,y) for x,y in candidates if x >= 0 and y >= 0 and x < m and y < m]

        for p in candidates:
            if X[p] < 0 and lonely(p,X,r):
                X[p] = 1
                active_list.append(p)
                break
        else:
            # none of the k candidates fit: retire the active point
            del active_list[i]

    return X

X = generate_samples(2500, 200, 10)
s = np.where(X>0)
plt.plot(s[0],s[1],'.')
plt.show()
And the results:
Per the link, the method from aganders3 is known as Poisson Disc Sampling. You might be able to find more efficient implementations that use a local grid search to find 'overlaps' (for example, Poisson Disc Sampling). Because you are constraining the system, it cannot be completely random. The maximum packing density for circles of uniform radius in a plane is ~90.7% (π/(2√3)) and is achieved when the circles are arranged in a perfect hexagonal array. As the number of points you request approaches the theoretical limit, the generated arrangement will become more hexagonal. In my experience, it is difficult to get above ~60% packing with uniform circles using this approach.
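As a rough illustration of the "local grid search" idea, here is a minimal pure-Python sketch of Poisson disc sampling with a background grid whose cells have side r/√2, so each cell holds at most one sample and a candidate only has to be compared against cells within two steps of its own. The function name `poisson_disc`, its parameters and the seeded generator are assumptions for the example; it is not the implementation from the linked page.

```python
import math
import random

def poisson_disc(width, height, r, k=30, seed=None):
    """Poisson disc sampling using a background grid for neighbour lookups."""
    rng = random.Random(seed)
    cell = r / math.sqrt(2)  # cell diagonal equals r: one sample per cell
    cols = int(math.ceil(width / cell))
    rows = int(math.ceil(height / cell))
    grid = [[None] * rows for _ in range(cols)]

    def cell_of(p):
        return (min(int(p[0] // cell), cols - 1),
                min(int(p[1] // cell), rows - 1))

    def fits(p):
        cx, cy = cell_of(p)
        # Any sample closer than r must sit within 2 cells of the candidate.
        for gx in range(max(cx - 2, 0), min(cx + 3, cols)):
            for gy in range(max(cy - 2, 0), min(cy + 3, rows)):
                q = grid[gx][gy]
                if q is not None and math.hypot(p[0] - q[0], p[1] - q[1]) < r:
                    return False
        return True

    first = (rng.random() * width, rng.random() * height)
    samples, active = [first], [first]
    gx, gy = cell_of(first)
    grid[gx][gy] = first
    while active:
        idx = rng.randrange(len(active))
        ox, oy = active[idx]
        for _ in range(k):
            # Candidate in the annulus [r, 2r] around the active point.
            rad = rng.uniform(r, 2 * r)
            theta = rng.uniform(0, 2 * math.pi)
            p = (ox + rad * math.cos(theta), oy + rad * math.sin(theta))
            if 0 <= p[0] < width and 0 <= p[1] < height and fits(p):
                samples.append(p)
                active.append(p)
                gx, gy = cell_of(p)
                grid[gx][gy] = p
                break
        else:
            active.pop(idx)  # no candidate fit: retire this point
    return samples

samples = poisson_disc(2500, 2500, 200, k=30, seed=0)
print(len(samples))
```

Each candidate is compared against at most 25 grid cells, regardless of how many samples already exist, which is where the speed-up over scanning a full window of the domain comes from.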
The following method uses a list comprehension. I am generating integers here, but you can use different random generators for different datatypes.
import random
arr = [[random.randint(-4, 4), random.randint(-4, 4)] for i in range(40)]

Fast 3 to 7 D interpolation on non uniform & non rectangular grid

I'm looking for a way of interpolating a large set of data in dimensions ranging from 3 to 7.
The data is, by nature, on a non rectangular grid and non uniformly spaced.
I looked at every option I could think of (griddata, KDTree + magic, linear interpolation, reworked map_coordinates...): the fastest and most usable tool seems to be Scipy's LinearNDInterpolator class. Linear interpolation in such a high-dimensional space is fine and should be precise enough.
However, there is one big shortcoming with this class: data with gaps or "concave regions" will produce extrapolated results when I want only interpolation.
This is best seen with some pictures (2-D test). In the following I produce some randomly generated data for X, and VALUE, while Y is upper-bounded by a function of X (just so I create gaps).
After rescaling the data (mostly done using pieces of code from the LinearNDInterpolator in the master, i.e. development, branch), the Delaunay triangulation produces a convex hull that includes the gap, and will "extrapolate" in this region. The term "extrapolate" is not really correct here in a technical sense, but I think it is appropriate, given that the original data is assumed to be sufficiently well sampled that big gaps mean "no data allowed" (not physical).
To start handling the problem, I "tagged" every Delaunay (hyper-)triangle whose (hyper-)volume is higher than a user-defined threshold (by default, the volume equivalent to 5% of the data extent in each dimension).
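To make the tagging criterion concrete: the volume of an N-simplex is |det| / N!, where the determinant is taken over the N edge vectors from one vertex to the others, and the threshold is the volume of a box spanning a fraction p of the extent in each dimension. The helper names `simplex_volume` and `max_volume` below are illustrative and are not the names used in the module.

```python
import math
import numpy as np

def simplex_volume(coords):
    """Volume of an N-simplex given its (N+1, N) vertex coordinates."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[1]
    # Edge vectors from the last vertex to every other vertex: shape (N, N).
    edges = coords[:-1] - coords[-1]
    return abs(np.linalg.det(edges)) / math.factorial(n)

def max_volume(points, p=0.05):
    """Threshold: volume of a box spanning a fraction p of each extent."""
    return np.prod(np.ptp(points, axis=0) * p)

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]        # area 1/2
tetra = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]   # volume 1/6
print(simplex_volume(triangle), simplex_volume(tetra))

pts = np.array([(0.0, 0.0), (10.0, 0.0), (0.0, 4.0), (10.0, 4.0)])
threshold = max_volume(pts)  # (10 * 0.05) * (4 * 0.05)
print(simplex_volume(triangle) > threshold)  # this triangle would be tagged
```

Because the test is on volume alone, a long but very thin simplex spanning the gap can still pass it.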
Generating random data, and evaluating the values using this technique would produce the following figure:
Black dots (with red or white rings) are the randomly generated data to be evaluated. Red rings indicate points that are being rejected (i.e. value = NaN) by my custom class based on LinearNDInterpolator, and white rings show accepted points.
For clarity I've plotted triangles that have been rejected from the original Delaunay triangulation.
As you can see, there are still some white-ringed points that fall in the gap, which I do not want. This is because the simplex to which they belong has a volume less than the authorized maximum volume (some of these triangles even appear as lines on the figure, so they are hard to see).
My question is: how could I improve from here? What could be done?
I was thinking of taking a small ball around each evaluated point and checking whether any data points fall inside it. But this is not a good solution, since it would be resource-consuming and not precise enough (e.g. what about points very close to the bottom of the gap, but still outside the upper envelope?).
Here is the custom interpolation module I used:
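For reference, the ball test mentioned above could look like the brute-force sketch below. The function name `has_neighbour_within` and the tiny data set are made up for illustration; for large data sets a KD-tree lookup would be the natural replacement for the dense distance matrix.

```python
import numpy as np

def has_neighbour_within(data, queries, radius):
    """For each query point, True if some data point lies within `radius`."""
    data = np.asarray(data, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Pairwise squared distances, shape (n_queries, n_data). Brute force:
    # memory grows as n_queries * n_data, so this only suits small sets.
    diff = queries[:, None, :] - data[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    return np.min(d2, axis=1) <= radius ** 2

data = np.array([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
queries = np.array([(0.1, 0.1), (5.0, 5.0)])
mask = has_neighbour_within(data, queries, radius=0.5)
print(mask)
```

This shows the weakness described above: a query just inside the gap, but within `radius` of the data along its edge, is still accepted.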
#!/usr/bin/env python
"""
Custom N-D linear interpolation class, based on scipy's LinearNDInterpolator.
The main differences are:
  - auto-scaling
  - interpolation: inside convex hull (normal behavior), and "close enough" to original data.
    This rejects points that would normally be interpolated by LinearNDInterpolator.
"""
# ================
#  Python modules
# ================
try:
    import cPickle               # Python 2
except ImportError:
    import pickle as cPickle     # Python 3
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator
from scipy.special import factorial  # scipy.misc.factorial was removed in newer SciPy releases
# =======================
#  Convenience functions
# =======================
def _inv_log10(x):
    return 10**x


def det(coords):  #, n):
    """
    Return the determinant of the given coordinates (not the usual determinant, but the one used to compute
    the hyper-volume of a hyper-triangle)

    From a Delaunay triangulation, the coordinates of one simplex (ie. hyper-triangle) are given by:
        tri = Delaunay(points)
        simplex_i = tri.simplices[i]
        coords_i = tri.points[simplex_i]

    In an N-dimensional space, the simplex will have N+1 points, each one of them of dimension N.
    Eg. in 3D, a point i has coordinates pi = (xi, yi, zi). Therefore p1 = point 1 = (x1, y1, z1)

              |x1 x2 x3 x4|
              |y1 y2 y3 y4|   |(x1-x4) (x2-x4) (x3-x4)|
        det = |z1 z2 z3 z4| = |(y1-y4) (y2-y4) (y3-y4)|
              |1  1  1  1 |   |(z1-z4) (z2-z4) (z3-z4)|
    """
    # assert n == len(coords[0]), 'number of dimensions of coordinates (%d) != %d' % (len(coords[0]), n)
    q = coords[:-1, :] - coords[-1, None, :]
    sign, logdet = np.linalg.slogdet(q)
    return sign * np.exp(logdet)
# ==============================
#  LinearNDInterpolator wrapper
# ==============================
class Interp(object):
    """
    Simple wrapper around LinearNDInterpolator.
    """
    def __init__(self, points, values, **kwargs):
        """
        :param points: list of coordinates (eg. [(0, 1), (0, 3), (4, 4.5)] for 3 points in 2-D)
        :param values: list of associated value(s) for each point (eg. [1, 2, 3] for 3 points of single value)
        :keyword rescale: rescale data points so that the final extents is [0, 1] in every dimensions
        :keyword transform: transform data points (prior to rescaling). If True, automatically transform dimension coordinates
                            if extents span more than 2 order of magnitudes. It can also be a list of tuples of
                            (transformation function, inverse function), that will be applied whenever needed.
        :keyword fill_value: outside bounds interpolation values (default: np.nan)
        """
        try:
            points = np.asanyarray(points, dtype=np.float64)
            values = np.asanyarray(values, dtype=np.float64)
        except ValueError:
            raise ValueError('Cannot convert input points to an array of floats')

        # dimensions / number of points and values
        self.ndim = points.shape[1]
        self.nvalues = values.shape[1]
        self.npoints = points.shape[0]

        # locals
        self._idims = range(self.ndim)

        # extents
        self.minis = np.min(points, axis=0)
        self.maxis = np.max(points, axis=0)
        self.ranges = self.maxis - self.minis
        self.magnitudes = self.maxis / self.minis

        # options
        rescale = kwargs.pop('rescale', True)
        transform = kwargs.pop('transform', True)
        fill_value = kwargs.pop('fill_value', np.nan)

        # transformation
        if transform:
            transforms = []
            if transform is True:
                # automatic transformation -> if extent >= 2 order of magnitudes: f(x) = log10(x)
                for i, e in enumerate(self.magnitudes):
                    if e >= 100.:
                        transforms.append((np.log10, _inv_log10))
                    else:
                        transforms.append(None)
                if not transforms:
                    transforms = None
            else:
                err_msg = 'transform: both the transformation function and its inverse must be given in a tuple'
                if not isinstance(transform, (tuple, list)):
                    raise ValueError(err_msg)
                if (self.ndim > 1) and (len(transform) != self.ndim):
                    raise ValueError('transform: None or transformations tuple must be given for every dimension')
                for t in transform:
                    # None (no transformation for this dimension) must be tested
                    # before the tuple check, otherwise it could never be accepted
                    if t is None:
                        transforms.append(None)
                    elif not isinstance(t, (tuple, list)):
                        raise ValueError(err_msg)
                    else:
                        transforms.append(t)
            self.transforms = transforms
        else:
            self.transforms = None
        points = self._transform(points)

        # scaling
        self.offset = 0.
        self.scale = 1.
        self.rescale = rescale
        if rescale:
            self.offset = np.mean(points, axis=0)
            self.scale = np.ptp(points - self.offset, axis=0)
            self.scale[~(self.scale > 0)] = 1.0  # avoid division by 0
            points = self._rescale(points)

        # triangulation
        self.tri = self._triangulate(points)

        # volumes
        self.fact = 1. / factorial(self.ndim)
        self.volume_max = np.prod(np.ptp(self.tri.points, axis=0) * 0.05)  # 5% peak-to-peak in each dimension
        self.rej_idx = None
        self.rej_vol = None
        self.cached_rej = False

        # linear interpolation
        self.fill_value = fill_value
        self.func = LinearNDInterpolator(self.tri, values, fill_value=fill_value)
    def _triangulate(self, points, **kwargs):
        """
        Delaunay triangulation
        """
        return Delaunay(points, **kwargs)

    def _get_volume_simplex(self, point):
        """
        Compute the simplex volume of the given point
        """
        i = self.tri.find_simplex(point)
        idx = self.tri.simplices[i]
        return np.abs(self.fact * det(self.tri.points[idx]))

    def cache_rejected_triangles(self, p=None, check_min=False):
        """
        Cache the indexes of rejected triangles.

        p -- peak-to-peak percentage in each dimension for the maximum volume calculation
             Default: None (default at __init__: p = 0.05)
             Type: float (0 < p <= 1)
             Type: list of floats (length = # dimensions)
        check_min -- check that the minimum spacing in each dimension is at least equal to p * extent
             Default: False
             Warning: *p* must be given
        """
        self.cached_rej = True
        if p is not None:
            p = np.array(p)
            # update the maximum hyper-triangle volume (p % of the extent in each dimension)
            self.volume_max = np.prod(np.ptp(self.tri.points, axis=0) * p)
        if check_min:
            assert p is not None, 'You must give *p* parameter for checking minimum volume of hyper-triangle'
            ptps = np.ptp(self.tri.points, axis=0)
            ps = np.ones(self.ndim) * p
            n_up = 0
            for i in self._idims:
                _x = np.unique(self.tri.points[:, i])
                mini = np.min(_x[1:] - _x[:-1])
                if mini > (ptps[i] * ps[i]):
                    n_up += 1
                    print('WARNING: changed max. volume axis of dim. %d from %.3g to %.3g' % (i+1, ps[i], mini))
                    ps[i] = mini
            if n_up:
                new_vol = np.prod(ptps * ps)
                print('CHANGE: old volume was = %.3g, and is now = %.3g' % (self.volume_max, new_vol))
                self.volume_max = new_vol

        # collect the simplices whose volume exceeds the maximum
        rej_idx = []
        rej_vol = []
        for i, simplex in enumerate(self.tri.simplices):
            vol = np.abs(self.fact * det(self.tri.points[simplex]))
            if vol > self.volume_max:
                rej_idx.append(i)
                rej_vol.append(vol)
        self.rej_idx = np.array(rej_idx)
        self.rej_vol = np.array(rej_vol)
    def _transform(self, points, inverse=False):
        """
        Transform point coordinates using functions. Set 'inverse' to True to transform back.
        """
        if self.transforms is not None:
            # each entry is (transformation function, inverse function):
            # index 0 is the forward transform, index 1 its inverse
            j = int(inverse)
            for i in self._idims:
                t = self.transforms[i]
                if t is None:
                    continue
                points[:, i] = t[j](points[:, i])
        return points

    def _rescale(self, points, inverse=False):
        """
        Rescale point coordinates so that extents in each dimensions span [0, 1]. Set 'inverse' to True to scale back.
        """
        if self.rescale:
            if inverse:
                points = points * self.scale + self.offset
            else:
                points = (points - self.offset) / self.scale
        return points

    def _check(self, x, res):
        """
        Check that interpolation results are close enough to real data and have not been extrapolated.
        """
        points = np.asanyarray(x)
        # the results hold one column per interpolated value (nvalues), not per dimension
        if points.ndim == 1:
            # only 1 point
            values = np.asanyarray(res).reshape(1, self.nvalues)
        else:
            # more than 1 point
            values = np.asanyarray(res).reshape(points.shape[0], self.nvalues)
        if self.cached_rej:
            idx = np.unique(np.where(np.isfinite(values))[0])
            ui_tri, uii = np.unique(self.tri.find_simplex(points[idx]), return_inverse=True)
            umask = np.in1d(ui_tri, self.rej_idx, assume_unique=True)
            mask = umask[uii]
            values[idx[mask], :] = self.fill_value
        else:
            for i, v in enumerate(values):
                if not np.isnan(v[0]):
                    vol = self._get_volume_simplex(points[i])
                    if vol > self.volume_max:
                        # reject
                        values[i][:] = self.fill_value
        return values.reshape(res.shape)

    def __call__(self, x, check=False):
        """
        Interpolate. If 'check' is True, check that interpolated points are close enough to real data.
        """
        _x = self._rescale(self._transform(x))
        res = self.func(_x)
        if check:
            res = self._check(_x, res)
        return res

    def ev(self, x, check=False):
        """
        Alias for __call__
        """
        return self.__call__(x, check=check)

    def get_original_points(self):
        """
        Return original points
        """
        return self._transform(self._rescale(self.func.points, inverse=True), inverse=True)

    def get_original_values(self):
        """
        Return original values
        """
        return self.func.values
# ===========================
#  Save / load interpolation
# ===========================
def save(filename, interp):
    """
    Dump the Interp instance to a binary file with cPickle (protocol 2)
    """
    with open(filename, 'wb') as f:
        cPickle.dump(interp, f, protocol=2)


def load(filename):
    """
    Load a previously saved (cPickled with the save function) Interp instance
    """
    with open(filename, 'rb') as f:
        interp = cPickle.load(f)
    return interp
And the test script:
#!/usr/bin/env python
"""
Test the custom interpolation class (see
"""
import sys
import numpy as np
from interp import Interp
import matplotlib.pyplot as plt

# generate random data
n = 2000  # number of generated points
x = np.random.random(n)

def f(v):
    # random value below an upper envelope that depends on v (creates the gap)
    maxi = v ** (1/(v+1e-5)) * (v - 5.) ** 2 - np.exp(v-7) + 1
    return np.random.random() * maxi

y = [f(v) for v in x * 10]
z = np.random.random(n)
points = np.array((x, y)).T
values = np.random.random(points.shape)

# create interpolation function
func = Interp(points, values, transform=False)
func.cache_rejected_triangles(p=0.05, check_min=True)

# generate random data + evaluate
pts = np.random.random((500, points.shape[1]))
pts *= np.ptp(points, axis=0)
pts += points.min(0)
res = func(pts, check=True)

# rejected points indexes
idx_rej = np.unique(np.where(np.isnan(res))[0])
n_rej = len(idx_rej)
print('%d points (%.0f%%) have been rejected' % (n_rej, 100.*n_rej/pts.shape[0]))

# plot rejected triangles
fig = plt.figure()
ax = plt.gca()
for i in func.rej_idx:
    # close each triangle by repeating its first vertex
    _x = [p for p in points[func.tri.simplices[i], 0]]
    _x += [points[func.tri.simplices[i][0], 0]]
    _y = [p for p in points[func.tri.simplices[i], 1]]
    _y += [points[func.tri.simplices[i][0], 1]]
    ax.plot(_x, _y, c='k', ls='-', zorder=100)

# plot original data
ax.scatter(points[:, 0], points[:, 1], c='b', linewidths=0, s=20, zorder=50)
# plot all points (both accepted and rejected): in white
ax.scatter(pts[:, 0], pts[:, 1], c='k', edgecolors='w', linewidths=1, zorder=150, s=30)
# re-plot rejected points: in red
ax.scatter(pts[idx_rej, 0], pts[idx_rej, 1], c='k', edgecolors='r', linewidths=1, zorder=200, s=30)
fig.savefig('img_tri.png', transparent=True, dpi=300)
