Speed up computation for Distance Transform on Image in Python - python

I would like to find the distance transform of a binary image as fast as possible without using the scipy function distance_transform_edt(). The image is 256 by 256. The reason I don't want to use scipy is that it is difficult to use in tensorflow: every time I want to use this package I need to start a new session, and this takes a lot of time. So I would like to write a custom function that only uses numpy.
My approach is as follows: find the coordinates of all the ones and all the zeros in the image. Compute the Euclidean distance between each zero pixel (a) and each one pixel (b); the value at each (a) position is then the minimum distance to a (b) pixel. I do this for each 0 pixel. The resulting image has the same dimensions as the original binary map. My attempt at doing this is shown below.
I tried to do this as fast as possible using no loops and only vectorization, but my function still isn't as fast as the scipy function. When I timed the code, the assignment to the variable "a" took the longest. I do not know if there is a way to speed this up.
If anyone has suggestions for different algorithms to solve this distance transform problem, or can direct me to other implementations in Python, it would be greatly appreciated.
def get_dst_transform_img(og): #og is a numpy array of original image
    ones_loc = np.where(og == 1)
    ones = np.asarray(ones_loc).T # coords of all ones in og
    zeros_loc = np.where(og == 0)
    zeros = np.asarray(zeros_loc).T # coords of all zeros in og
    a = -2 * np.dot(zeros, ones.T)
    b = np.sum(np.square(ones), axis=1)
    c = np.sum(np.square(zeros), axis=1)[:,np.newaxis]
    dists = a + b + c
    dists = np.sqrt(dists.min(axis=1)) # min dist of each zero pixel to one pixel
    x = og.shape[0]
    y = og.shape[1]
    dist_transform = np.zeros((x,y))
    dist_transform[zeros[:,0], zeros[:,1]] = dists
    plt.figure()
    plt.imshow(dist_transform)

The implementation in the OP is a brute-force approach to the distance transform. This algorithm is O(n^2) in the number of pixels n, as it computes the distance from each background pixel to each foreground pixel. Furthermore, because of the way it is vectorized, it requires a lot of memory. On my computer it couldn't compute the distance transform of a 256x256 image without thrashing. Many other algorithms are described in the literature; below I'll discuss two O(n) algorithms.
Note: Typically, the distance transform is computed for object pixels (value 1) to the nearest background pixel (value 0). The code in the OP does the reverse, and so the code I've pasted below follows OP's convention, not the more common convention.
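To make the memory issue concrete: with 65,536 pixels split evenly between zeros and ones, the full zeros-by-ones matrix has 32,768 x 32,768 entries, which is about 8 GiB at 8 bytes per entry. The sketch below keeps the brute-force idea but processes the zero pixels in chunks, so only a slice of that matrix exists at any time. The function name and the `chunk` parameter are made up for illustration, and it assumes the image contains at least one pixel with value 1. It is still O(n^2) in time.

```python
import numpy as np

def brute_force_dt_chunked(og, chunk=1024):
    """Distance from every 0 pixel to the nearest 1 pixel (OP's convention)."""
    ones = np.argwhere(og == 1).astype(np.float64)   # coords of all ones
    zeros = np.argwhere(og == 0)                     # coords of all zeros
    ones_sq = (ones ** 2).sum(axis=1)
    out = np.zeros(og.shape)
    for s in range(0, len(zeros), chunk):
        idx = zeros[s:s + chunk]
        z = idx.astype(np.float64)
        # |z - o|^2 = |z|^2 - 2 z.o + |o|^2, for this chunk of zeros only
        d2 = (z ** 2).sum(axis=1)[:, None] - 2.0 * z @ ones.T + ones_sq
        out[idx[:, 0], idx[:, 1]] = np.sqrt(np.maximum(d2.min(axis=1), 0.0))
    return out
```

Peak memory is roughly `chunk` times the number of one pixels, so `chunk` trades memory for the number of loop iterations.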
The easiest to implement, IMO, is the chamfer distance algorithm. This is a recursive algorithm that does two passes over the image: one left to right and top to bottom, and one right to left and bottom to top. In each pass, the distance computed for previous pixels is propagated. This algorithm can be implemented using integer distances or floating-point distances between neighbors. The latter yields smaller errors, of course. But in both cases the errors can be reduced significantly by increasing the number of neighbors queried in this propagation. The algorithm is older, but G. Borgefors analyzed it and proposed suitable neighbor distances (G. Borgefors, Distance Transformations in Digital Images, Computer Vision, Graphics, and Image Processing 34:344-371, 1986).
Here is an implementation using 3-4 distance (distance to edge-connected neighbors is 3, distance to vertex-connected neighbors is 4):
def chamfer_distance(img):
    w, h = img.shape
    dt = np.zeros((w,h), np.uint32)
    # Forward pass
    x = 0
    y = 0
    if img[x,y] == 0:
        dt[x,y] = 65535 # some large value
    for x in range(1, w):
        if img[x,y] == 0:
            dt[x,y] = 3 + dt[x-1,y]
    for y in range(1, h):
        x = 0
        if img[x,y] == 0:
            dt[x,y] = min(3 + dt[x,y-1], 4 + dt[x+1,y-1])
        for x in range(1, w-1):
            if img[x,y] == 0:
                dt[x,y] = min(4 + dt[x-1,y-1], 3 + dt[x,y-1], 4 + dt[x+1,y-1], 3 + dt[x-1,y])
        x = w-1
        if img[x,y] == 0:
            dt[x,y] = min(4 + dt[x-1,y-1], 3 + dt[x,y-1], 3 + dt[x-1,y])
    # Backward pass
    for x in range(w-2, -1, -1):
        y = h-1
        if img[x,y] == 0:
            dt[x,y] = min(dt[x,y], 3 + dt[x+1,y])
    for y in range(h-2, -1, -1):
        x = w-1
        if img[x,y] == 0:
            dt[x,y] = min(dt[x,y], 3 + dt[x,y+1], 4 + dt[x-1,y+1])
        # x must run downwards here so that dt[x+1,y] has already been updated in this pass
        for x in range(w-2, 0, -1):
            if img[x,y] == 0:
                dt[x,y] = min(dt[x,y], 4 + dt[x+1,y+1], 3 + dt[x,y+1], 4 + dt[x-1,y+1], 3 + dt[x+1,y])
        x = 0
        if img[x,y] == 0:
            dt[x,y] = min(dt[x,y], 4 + dt[x+1,y+1], 3 + dt[x,y+1], 3 + dt[x+1,y])
    return dt
Note that a lot of the complication here is to avoid indexing out of bounds, but still computing distances all the way to the edges of the image. If we simply skip the pixels around the border of the image, the code becomes much simpler.
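As a rough illustration of that simplification, the sketch below pads the distance array with a one-pixel border holding a large value, so every real pixel has all eight neighbours and no special cases are needed. This is a variant written for illustration, not the code above; the function name and the `large` parameter are assumptions. It uses the same 3-4 weights and the OP's convention (distance from 0 pixels to the nearest non-zero pixel).

```python
import numpy as np

def chamfer_distance_padded(img, large=65535):
    """3-4 chamfer distance from each 0 pixel to the nearest non-zero pixel."""
    img = np.asarray(img)
    rows, cols = img.shape
    # Border cells keep the value `large` and are never updated.
    dt = np.full((rows + 2, cols + 2), large, dtype=np.int64)
    dt[1:-1, 1:-1][img != 0] = 0
    # Forward pass: propagate from the neighbours already visited.
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            if dt[r, c]:
                dt[r, c] = min(dt[r, c], 4 + dt[r-1, c-1], 3 + dt[r-1, c],
                               4 + dt[r-1, c+1], 3 + dt[r, c-1])
    # Backward pass: same idea, in the opposite scan order.
    for r in range(rows, 0, -1):
        for c in range(cols, 0, -1):
            if dt[r, c]:
                dt[r, c] = min(dt[r, c], 4 + dt[r+1, c+1], 3 + dt[r+1, c],
                               4 + dt[r+1, c-1], 3 + dt[r, c+1])
    return dt[1:-1, 1:-1]
```

Dividing the result by 3 gives an approximation of the Euclidean distance in pixels.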
Because it is a recursive algorithm, it is not possible to vectorize its implementation, so the Python code will not be very efficient. Programmed in C or the like, however, it is a very fast algorithm that yields a fairly good approximation to the Euclidean distance.
OpenCV's cv.distanceTransform implements this algorithm.
Another very efficient algorithm computes the square of the distance transform. The square distance is separable (i.e. can be computed independently for each axis and added). This leads to an algorithm that is easy to parallelize. For each image row, the algorithm does a forward and a backward pass. For each column in the result, the algorithm then does another forward and backward pass. This process leads to an exact Euclidean distance transform.
This algorithm was first proposed by R. van den Boomgaard in his Ph.D. thesis in 1992. Unfortunately this went unnoticed. The algorithm was then again proposed by A. Meijster, J.B.T.M. Roerdink and W.H. Hesselink (A General Algorithm for Computing Distance Transforms in Linear Time, Mathematical Morphology and its Applications to Image and Signal Processing, pp 331-340, 2002), and again by P. Felzenszwalb and D. Huttenlocher (Distance transforms of sampled functions, Technical report, Cornell University, 2004).
This is the most efficient algorithm known, in part because it is the only one that can be easily and efficiently parallelized (computation on each image row, and later on each image column, is independent of other rows/columns).
Unfortunately I don't have any Python code for this one to share, but you can find implementations online. For example OpenCV's cv.distanceTransform implements this algorithm, and DIPlib's dip.EuclideanDistanceTransform does too.
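To show the separability idea in NumPy, here is a minimal sketch. It is not the linear-time algorithm from the papers cited above: the second pass is replaced by a brute-force minimum over each column, so the cost is O(rows^2 * cols) and it allocates a rows x rows x cols temporary (about 128 MB of int64 for a 256x256 image). The function name is made up, and it follows the OP's convention (distance from 0 pixels to the nearest 1 pixel), assuming at least one 1 pixel exists.

```python
import numpy as np

def separable_edt(img):
    """Exact Euclidean distance from each 0 pixel to the nearest 1 pixel."""
    img = np.asarray(img)
    rows, cols = img.shape
    big = rows + cols  # larger than any distance inside the image
    # Pass 1: along each row, 1D distance to the nearest 1 pixel in that row.
    g = np.where(img == 1, 0, big).astype(np.int64)
    for c in range(1, cols):              # forward sweep
        g[:, c] = np.minimum(g[:, c], g[:, c - 1] + 1)
    for c in range(cols - 2, -1, -1):     # backward sweep
        g[:, c] = np.minimum(g[:, c], g[:, c + 1] + 1)
    # Pass 2: along each column, d2[i, c] = min over j of (i - j)^2 + g[j, c]^2.
    idx = np.arange(rows)
    offset2 = (idx[:, None] - idx[None, :]) ** 2          # shape (i, j)
    d2 = (offset2[:, :, None] + (g ** 2)[None, :, :]).min(axis=1)
    return np.sqrt(d2)
```

The linear-time versions replace the brute-force minimum in pass 2 with a sweep over the lower envelope of parabolas, but the two-pass structure is the same.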

Related

Is there a faster way to perform this neighbour finding operation

I'm trying to calculate Moran's I in Python (This is the underlying equation). My inputs are a coords Nx3 array containing the coordinates of each point and a Nx3 array z which contains the values minus the overall mean. The operation requires each value of z to be multiplied with every point within a set distance (here set to 1.99). My problem is that in my case N=~2 Million and so the find_neighbours operation is very slow. Is there a way I could speed this up?
def find_neighbours(coords,idx,k):
    distances = np.sqrt(np.power(coords - coords[idx], 2).sum(axis=1))
    distances[idx] = np.inf
    return np.argwhere(distances<=k)

z = x - np.mean(x)
n = len(coords)
A = 0
B = np.sum([z[idx]**2 for idx,coord in enumerate(coords)])
S_0 = 0
for idx in range(len(coords)):
    neighbours = find_neighbours(coords,idx,1.99)
    S_0 += len(neighbours)
    A += np.sum([(z[neighbour]*z[idx]) for neighbour in neighbours])
I = (n/S_0)*(A/B)
This is a classical problem with plenty of literature about it. It's called radius neighbor search in three-dimensional point clouds. You need to store your points in a better data structure to do the search faster. I would suggest an octree.
Check python code here and adapt to your case.
For explanations, check this paper.
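As one possible sketch of the "better data structure" idea, the code below uses a k-d tree from SciPy (scipy.spatial.cKDTree) rather than an octree; that swap is a choice made here, not what the linked code does. It assumes `values` is a 1-D array with one value per point, and the function name `morans_i` is made up. `query_pairs` returns each unordered pair once, so both S_0 and the cross-product sum are doubled to match the per-point loop in the question.

```python
import numpy as np
from scipy.spatial import cKDTree

def morans_i(coords, values, radius):
    """Moran's I with binary weights: 1 if two points are within `radius`."""
    z = values - values.mean()
    tree = cKDTree(coords)
    # All pairs (i, j), i < j, whose distance is at most `radius`.
    pairs = tree.query_pairs(r=radius, output_type="ndarray")
    s0 = 2 * len(pairs)                       # each pair counts in both directions
    a = 2.0 * np.sum(z[pairs[:, 0]] * z[pairs[:, 1]])
    b = np.sum(z ** 2)
    return (len(coords) / s0) * (a / b)
```

If no two points lie within `radius`, `s0` is zero and the result is undefined, as in the original loop. The memory needed for `pairs` grows with the number of neighbour pairs, so for very dense data it may be necessary to process the points in batches.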

Finding a vector that is approximately equally distant from all vectors in a set

I have a set of 3 million vectors (300 dimensions each), and I'm looking for a new point in this 300 dim space that is approximately equally distant from all the other points(vectors)
What I could do is initialize a random vector v, and run an optimization over v with the objective of minimizing
$\sum_{x<y} (d_{vx} - d_{vy})^2$
where d_xy is the distance between vector x and vector y and the sum runs over all pairs of vectors in the set, but this would be very computationally expensive.
I'm looking for an approximate solution vector for this problem that can be found quickly over very large sets of vectors. (Or any libraries that will do something like this for me- any language)
I agree that in general this is a pretty tough optimization problem, especially at the scale you're describing. Each objective function evaluation requires O(nm + n^2) work for n points of dimension m -- O(nm) to compute distances from each point to the new point and O(n^2) to compute the objective given the distances. This is pretty scary when m=300 and n=3M. Thus even one function evaluation is probably intractable, not to mention solving the full optimization problem.
One approach that has been mentioned in the other answer is to take the centroid of the points, which can be computed efficiently -- O(nm). A downside of this approach is that it could do terribly at the proposed objective. For instance, consider a situation in 1-dimensional space with 3 million points with value 1 and 1 point with value 0. By inspection, the optimal solution is v=0.5 with objective value 0 (it's equidistant from every point), but the centroid will select v=1 (well, a tiny bit smaller than that) with objective value 3 million.
An approach that I think will do better than the centroid is to optimize each dimension separately (ignoring the existence of the other dimensions). While the objective function is still expensive to compute in this case, a bit of algebra shows that the derivative of the objective is quite easy to compute. It is the sum over all pairs (i, j) where i < v and j > v of the value 4*((v-i)+(v-j)). Remember we're optimizing a single dimension so the points i and j are 1-dimensional, as is v. For each dimension we therefore can sort the data (O(n lg n)) and then compute the derivative for a value v in O(n) time using a binary search and basic algebra. We can then use scipy.optimize.newton to find the zero of the derivative, which will be the optimal value for that dimension. Iterating over all dimensions, we'll have an approximate solution to our problem.
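The derivative quoted above can be checked by writing out the one-dimensional objective. The notation below ($x_i$ for the data points, $L$ and $H$ for the sets of points below and above $v$) is introduced here for illustration; the objective is the pairwise form computed by fulldist in the code that follows.

```latex
f(v) = \sum_{i<j} \bigl(|v - x_i| - |v - x_j|\bigr)^2

% Pairs on the same side of v contribute a constant, (x_i - x_j)^2, so their derivative is 0.
% Pairs with x_i < v < x_j contribute \bigl((v - x_i) - (x_j - v)\bigr)^2 = (2v - x_i - x_j)^2, so

f'(v) = \sum_{i \in L} \sum_{j \in H} 4\bigl((v - x_i) + (v - x_j)\bigr)
      = 4\Bigl[\,|H|\Bigl(|L|\,v - \sum_{i \in L} x_i\Bigr)
              + |L|\Bigl(|H|\,v - \sum_{j \in H} x_j\Bigr)\Bigr]
```

Once the data are sorted, $|L|$, $|H|$ and the two partial sums follow from the position of $v$ in the sorted list. For the data {0, 3, 3} and $0 < v < 3$ this gives $f'(v) = 4\,[\,2v + (2v - 6)\,] = 4(4v - 6)$, which is zero at $v = 1.5$.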
First consider the proposed approach versus the centroid method in a simple setting, with 1-dimensional data points {0, 3, 3}:
import bisect
import scipy.optimize

def fulldist(x, data):
    # Objective: sum over all pairs of the squared difference of their distances to x
    dists = [sum([(x[i]-d[i])*(x[i]-d[i]) for i in range(len(x))])**0.5 for d in data]
    obj = 0.0
    for i in range(len(data)-1):
        for j in range(i+1, len(data)):
            obj += (dists[i]-dists[j]) * (dists[i]-dists[j])
    return obj

def f1p(x, d):
    # Derivative of the 1-dimensional objective at x; d must be sorted
    lownum = bisect.bisect_left(d, x)
    highnum = len(d) - lownum
    lowsum = highnum * (x*lownum - sum([d[i] for i in range(lownum)]))
    highsum = lownum * (x*highnum - sum([d[i] for i in range(lownum, len(d))]))
    return 4.0 * (lowsum + highsum)

data = [(0.0,), (3.0,), (3.0,)]
opt = []
centroid = []
for d in range(len(data[0])):
    thisdim = [x[d] for x in data]
    meanval = sum(thisdim) / len(thisdim)
    centroid.append(meanval)
    thisdim.sort()
    opt.append(scipy.optimize.newton(f1p, meanval, args=(thisdim,)))
print("Proposed", opt, "objective", fulldist(opt, data))
# Proposed [1.5] objective 0.0
print("Centroid", centroid, "objective", fulldist(centroid, data))
# Centroid [2.0] objective 2.0
The proposed approach finds the exact optimal solution, while the centroid method misses by a bit.
Consider a slightly larger example with 1000 points of dimension 300, with each point drawn from a gaussian mixture. Each point's value is normally distributed with mean 0 and variance 1 with probability 0.1 and normally distributed with mean 100 and variance 1 with probability 0.9:
import random

data = []
for n in range(1000):
    d = []
    for m in range(300):
        if random.random() <= 0.1:
            d.append(random.normalvariate(0.0, 1.0))
        else:
            d.append(random.normalvariate(100.0, 1.0))
    data.append(d)
The resulting objective values were 1.1e6 for the proposed approach and 1.6e9 for the centroid approach, meaning the proposed approach decreased the objective by more than 99.9%. Obviously the differences in the objective value are heavily affected by the distribution of the points.
Finally, to test the scaling (removing the final objective value calculations, since they're in general intractable), I get the following scaling with m=300: 0.9 seconds for 1,000 points, 7.1 seconds for 10,000 points, and 122.3 seconds for 100,000 points. Therefore I expect this should take about 1-2 hours for your full dataset with 3 million points.
From this question on the Math StackExchange:
There is no point that is equidistant from 4 or more points in general
position in the plane, or n+2 points in n dimensions.
Criteria for representing a collection of points by one point are
considered in statistics, machine learning, and computer science. The
centroid is the optimal choice in the least-squares sense, but there
are many other possibilities.
The centroid is the point C in the plane for which the sum of
squared distances $\sum |CP_i|^2$ is minimum. One could also optimize
a different measure of centrality, or insist that the representative
be one of the points (such as a graph-theoretic center of a weighted
spanning tree), or assign weights to the points in some fashion and
take the centroid of those.
Note, specifically, "the centroid is the optimal choice in the least-squares sense", so the optimal solution to your cost function (which is a least-squares cost) is simply to average all the coordinates of your points (which will give you the centroid).

Generate random points on a surface of the cylinder

I want to generate random points on the surface of cylinder such that distance between the points fall in a range of 230 and 250. I used the following code to generate random points on surface of cylinder:
import random,math
H=300
R=20
s=random.random()
#theta = random.random()*2*math.pi
for i in range(0,300):
    theta = random.random()*2*math.pi
    z = random.random()*H
    r=math.sqrt(s)*R
    x=r*math.cos(theta)
    y=r*math.sin(theta)
    z=z
    print 'C' , x,y,z
How can I generate random points such that the distances between them fall within that range (on the surface of the cylinder)?
This is not a complete solution, but an insight that should help. If you "unroll" the surface of the cylinder into a rectangle of width w=2*pi*r and height h, the task of finding the distance between points is simplified. You have not explained how to measure "distance along the surface" between points on the top of the cylinder and points on the side; this is a slightly tricky bit of geometry.
As for computing the distance along the surface across the artificial "seam" we created, just consider both |x1-x2| and w-|x1-x2|; whichever is shorter is the horizontal distance you want.
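A minimal sketch of that unrolled distance, for two points on the side of the cylinder only (the caps are not handled). The function name and the (theta, z) point representation are assumptions made for illustration.

```python
import math

def cylinder_surface_distance(p1, p2, radius):
    """Shortest distance along the lateral surface between two (theta, z) points."""
    w = 2 * math.pi * radius                 # width of the unrolled rectangle
    x1, z1 = p1[0] * radius, p1[1]           # arc length along the circumference
    x2, z2 = p2[0] * radius, p2[1]
    dx = abs(x1 - x2) % w
    dx = min(dx, w - dx)                     # go around the seam if that is shorter
    return math.hypot(dx, z1 - z2)
```

With r=20 the horizontal term can never exceed half the circumference, about 62.8, so distances in the 230 to 250 range have to come almost entirely from the height difference.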
I do think that @VincentNivoliers' suggestion to use Poisson disk sampling is very good, but with the constraints of h=300 and r=20 you will get terrible results no matter what.
The basic way of creating a set of random points with constraints on the distances between them is to have a function that modulates the probability of points being placed at a certain location. This function starts out being a constant, and whenever a point is placed, forbidden areas surrounding the point are set to zero. That is difficult to do with continuous variables, but reasonably easy if you discretize your problem.
The other thing to be careful about is the fact that the points lie on a cylinder. It may be easier to think of it as random points on a rectangular area that repeats periodically. This can be handled in two different ways:
The simplest is to take into consideration not only the rectangular tile where you are placing the points, but also its neighbouring ones. Whenever you place a point in your main tile, you also place one in the neighbouring ones and compute their effect on the probability function inside your tile.
A more sophisticated approach considers the probability function to be the convolution of a kernel that encodes forbidden areas with a sum of delta functions corresponding to the points already placed. If this is computed using FFTs, the periodicity is a natural byproduct.
The first approach can be coded as follows:
from __future__ import division
import numpy as np

r, h = 20, 300
w = 2*np.pi*r
int_w = int(np.rint(w))
mult = 10
pdf = np.ones((h*mult, int_w*mult), bool)  # np.bool was removed from recent NumPy
points = []
min_d, max_d = 230, 250

available_locs = pdf.sum()
while available_locs:
    # Pick one of the still-allowed grid cells at random
    new_idx = np.random.randint(available_locs)
    new_idx = np.nonzero(pdf.ravel())[0][new_idx]
    new_point = np.array(np.unravel_index(new_idx, pdf.shape))
    points += [new_point]
    min_mask = np.ones_like(pdf)
    if max_d is not None:
        max_mask = np.zeros_like(pdf)
    else:
        max_mask = True
    # The point itself plus its copies in the two neighbouring tiles
    for p in [new_point - [0, int_w*mult], new_point +[0, int_w*mult],
              new_point]:
        rows = ((np.arange(pdf.shape[0]) - p[0]) / mult)**2
        cols = ((np.arange(pdf.shape[1]) - p[1]) * 2*np.pi*r/int_w/mult)**2
        dist2 = rows[:, None] + cols[None, :]
        min_mask &= dist2 > min_d*min_d
        if max_d is not None:
            max_mask |= dist2 < max_d*max_d
    pdf &= min_mask & max_mask
    available_locs = pdf.sum()
points = np.array(points) / [mult, mult*int_w/(2*np.pi*r)]
If you run it with your values, the output is usually just one or two points, as the large minimum distance forbids all others. But if you run it with more reasonable values, e.g.
min_d, max_d = 50, 200
Here's how the probability function looks after placing each of the first 5 points:
Note that the points are returned as pairs of coordinates, the first being the height, the second the distance along the cylinder's circumference.

Efficient processing of pixel + neighborhood in numpy image

I have a range image of a scene. I traverse the image and calculate the average change in depth under the detection window. The detection window changes size based on the average depth of the pixels surrounding the current location. I accumulate the average change to produce a simple response image.
Most of the time is spent in the for loop; it takes about 40+ s for a 512x52 image on my machine. I was hoping for some speed-up. Is there a more efficient/faster way to traverse the image? Is there a better pythonic/numpy/scipy way to visit each pixel? Or shall I go learn cython?
EDIT: I have reduced running time to about 18s by using scipy.misc.imread() instead of skimage.io.imread(). Not sure what the difference is, I will try to investigate.
Here is a simplified version of the code:
import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0,0,0,0)
    r0 = max(a[0],b[0])
    c0 = max(a[1],b[1])
    r1 = min(a[2],b[2])
    c1 = min(a[3],b[3])
    # Do we have a valid intersection?
    if r1 > r0 and c1 > c0:
        rect = (r0,c0,r1,c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src) # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0) # integrate to find num points which have depth
boundary = (0,0,depth_src.shape[0]-1,depth_src.shape[1]-1) # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602,2.044) # width, height

start_time = time.time()
for (r,c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel
    r0, c0, r1, c1 = intersect((r-1, c-1, r+1, c+1), boundary)
    # Calculate average of depth of points around current pixel
    scale = integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0
    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale/2)
    c0 = c - (model[1] * scale/2)
    r1 = r + (model[0] * scale/2)
    c1 = c + (model[1] * scale/2)
    # Used scale optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0,c0,r1,c1), boundary)
    depth_count = integrate(depth_pts,r0,c0,r1,c1)
    if depth_count:
        depth_sum = integrate(depth_intg,r0,c0,r1,c1)
        avg_change = depth_sum / depth_count
        # Accumulate response
        out_img[r0:r1,c0:c1] += avg_change
print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()
Michael, interesting question. It seems that the main performance problem you have is that each pixel in the image has two integrate() functions computed on it, one of size 3x3 and the other of a size which is not known in advance. Calculating individual integrals in this way is extremely inefficient, regardless of what numpy functions you use; it's an algorithmic issue, not an implementation issue. Consider an image of size N*N. You can calculate all integrals of any size K*K in that image using only approximately 4*N*N operations, not (as one might naively expect) N*N*K*K. The way you do that is to first calculate an image of sliding sums over a window K in each row, and then sliding sums over the result in each column. Updating each sliding sum to move to the next pixel requires only adding the newest pixel in the current window and subtracting the oldest pixel in the previous window, thus two operations per pixel regardless of window size. We do have to do that twice (for rows and columns), therefore 4 operations per pixel.
I am not sure if there is a sliding window sum built into numpy, but this answer suggests a couple of ways to do it, using stride tricks: https://stackoverflow.com/a/12713297/1828289. You can certainly accomplish the same with one loop over columns and one loop over rows (taking slices to extract a row/column).
Example:
import numpy

# img is a 2D ndarray (its dtype must be wide enough to hold the sums)
# K is the size of sums to calculate using sliding window
row_sums = numpy.zeros_like(img)
for i in range( img.shape[0] ):
    if i >= K:
        # drop the row that left the window, add the row that entered it
        row_sums[i,:] = row_sums[i-1,:] - img[i-K,:] + img[i,:]
    elif i > 0:
        row_sums[i,:] = row_sums[i-1,:] + img[i,:]
    else: # i == 0
        row_sums[i,:] = img[i,:]

col_sums = numpy.zeros_like(img)
for j in range( img.shape[1] ):
    if j >= K:
        col_sums[:,j] = col_sums[:,j-1] - row_sums[:,j-K] + row_sums[:,j]
    elif j > 0:
        col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j]
    else: # j == 0
        col_sums[:,j] = row_sums[:,j]

# here col_sums[i,j] should be equal to numpy.sum(img[i-K+1:i+1, j-K+1:j+1]) if i >= K-1 and j >= K-1
# first K-1 rows and columns in col_sums contain partial sums and can be ignored
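The same K x K sums can also be obtained without Python loops from cumulative sums (an integral image). The sketch below is one way to do it; the function name `box_sums` is made up, and it assumes 1 <= k <= min(img.shape).

```python
import numpy as np

def box_sums(img, k):
    """Sum of every k x k window: out[i, j] == img[i:i+k, j:j+k].sum()."""
    img = np.asarray(img, dtype=np.float64)
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # Four-corner difference gives each window sum in constant time.
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
```

The output has shape (rows - k + 1, cols - k + 1), one entry per fully contained window, so there are no partial sums to discard.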
How do you best apply that to your case? I think you might want to pre-compute the integrals for 3x3 (average depth) and also for several larger sizes, and use the value of the 3x3 to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need might be limited, or artificially limiting it might still work acceptably well, just pick the nearest size. Calculating all integrals together using sliding sums is so much more efficient that I am almost certain it is worth calculating them for a lot of sizes you would never use at a particular pixel, especially if some of the sizes are large.
P.S. This is a minor addition, but you may want to avoid calling intersect() for every pixel: either (a) only process pixels which are farther from the edge than the max integral size, or (b) add margins to the image of the max integral size on all sides, filling the margins with either zeros or nans, or (c) (best approach) use slices to take care of this automatically: a slice index outside the boundary of an ndarray is automatically limited to the boundary, except of course negative indexes are wrapped around.
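A small sketch of option (c): only the lower bounds need explicit clamping, because negative slice indices wrap around while upper bounds past the edge are cut off automatically. The helper name `window` and its `half` parameter are made up for illustration.

```python
import numpy as np

def window(img, r, c, half):
    """Pixels within `half` of (r, c), cut off at the image boundary."""
    r0 = max(r - half, 0)   # clamp: a negative start would wrap around
    c0 = max(c - half, 0)
    # Stops beyond the array size are limited to the boundary by slicing itself.
    return img[r0:r + half + 1, c0:c + half + 1]
```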
EDIT: added example of sliding window sums

Representing and solving a maze given an image

What is the best way to represent and solve a maze given an image?
Given a JPEG image (as seen above), what's the best way to read it in, parse it into some data structure and solve the maze? My first instinct is to read the image in pixel by pixel and store it in a list (array) of boolean values: True for a white pixel, and False for a non-white pixel (the colours can be discarded). The issue with this method is that the image may not be "pixel perfect". By that I simply mean that if there is a white pixel somewhere on a wall it may create an unintended path.
Another method (which came to me after a bit of thought) is to convert the image to an SVG file - which is a list of paths drawn on a canvas. This way, the paths could be read into the same sort of list (boolean values) where True indicates a path or wall, False indicating a travel-able space. An issue with this method arises if the conversion is not 100% accurate, and does not fully connect all of the walls, creating gaps.
Another issue with converting to SVG is that the lines are not "perfectly" straight. This results in the paths being cubic bezier curves. With a list (array) of boolean values indexed by integers, the curves would not transfer easily, and all the points that lie on the curve would have to be calculated, but won't exactly match list indices.
I assume that while one of these methods may work (though probably not) that they are woefully inefficient given such a large image, and that there exists a better way. How is this best (most efficiently and/or with the least complexity) done? Is there even a best way?
Then comes the solving of the maze. If I use either of the first two methods, I will essentially end up with a matrix. According to this answer, a good way to represent a maze is using a tree, and a good way to solve it is using the A* algorithm. How would one create a tree from the image? Any ideas?
TL;DR
Best way to parse? Into what data structure? How would said structure help/hinder solving?
UPDATE
I've tried my hand at implementing what @Mikhail has written in Python, using numpy, as @Thomas recommended. I feel that the algorithm is correct, but it's not working as hoped. (Code below.) The PNG library is PyPNG.
import png, numpy, Queue, operator, itertools

def is_white(coord, image):
    """ Returns whether (x, y) is approx. a white pixel."""
    a = True
    for i in xrange(3):
        if not a: break
        a = image[coord[1]][coord[0] * 3 + i] > 240
    return a

def bfs(s, e, i, visited):
    """ Perform a breadth-first search. """
    frontier = Queue.Queue()
    while s != e:
        for d in [(-1, 0), (0, -1), (1, 0), (0, 1)]:
            np = tuple(map(operator.add, s, d))
            if is_white(np, i) and np not in visited:
                frontier.put(np)
        visited.append(s)
        s = frontier.get()
    return visited

def main():
    r = png.Reader(filename = "thescope-134.png")
    rows, cols, pixels, meta = r.asDirect()
    assert meta['planes'] == 3 # ensure the file is RGB
    image2d = numpy.vstack(itertools.imap(numpy.uint8, pixels))
    start, end = (402, 985), (398, 27)
    print bfs(start, end, image2d, [])
Here is a solution.
1. Convert the image to grayscale (not yet binary), adjusting weights for the colors so that the final grayscale image is approximately uniform. You can do it simply by controlling sliders in Photoshop in Image -> Adjustments -> Black & White.
2. Convert the image to binary by setting an appropriate threshold in Photoshop in Image -> Adjustments -> Threshold.
3. Make sure the threshold is selected right. Use the Magic Wand Tool with 0 tolerance, point sample, contiguous, no anti-aliasing. Check that edges at which the selection breaks are not false edges introduced by a wrong threshold. In fact, all interior points of this maze are accessible from the start.
4. Add artificial borders on the maze to make sure the virtual traveler will not walk around it :)
5. Implement breadth-first search (BFS) in your favorite language and run it from the start. I prefer MATLAB for this task. As @Thomas already mentioned, there is no need to mess with a regular representation of graphs. You can work with the binarized image directly.
Here is the MATLAB code for BFS:
function path = solve_maze(img_file)
  %% Init data
  img = imread(img_file);
  img = rgb2gray(img);
  maze = img > 0;
  start = [985 398];
  finish = [26 399];

  %% Init BFS
  n = numel(maze);
  Q = zeros(n, 2);
  M = zeros([size(maze) 2]);
  front = 0;
  back = 1;

  function push(p, d)
    q = p + d;
    if maze(q(1), q(2)) && M(q(1), q(2), 1) == 0
      front = front + 1;
      Q(front, :) = q;
      M(q(1), q(2), :) = reshape(p, [1 1 2]);
    end
  end

  push(start, [0 0]);

  d = [0 1; 0 -1; 1 0; -1 0];

  %% Run BFS
  while back <= front
    p = Q(back, :);
    back = back + 1;
    for i = 1:4
      push(p, d(i, :));
    end
  end

  %% Extracting path
  path = finish;
  while true
    q = path(end, :);
    p = reshape(M(q(1), q(2), :), 1, 2);
    path(end + 1, :) = p;
    if isequal(p, start)
      break;
    end
  end
end
It is really very simple and standard, there should not be difficulties on implementing this in Python or whatever.
And here is the answer:
This solution is written in Python. Thanks Mikhail for the pointers on the image preparation.
An animated Breadth-First Search:
The Completed Maze:
#!/usr/bin/env python3
import sys
from queue import Queue
from PIL import Image

start = (400,984)
end = (398,25)

def iswhite(value):
    return value == (255,255,255)

def getadjacent(n):
    x,y = n
    return [(x-1,y),(x,y-1),(x+1,y),(x,y+1)]

def BFS(start, end, pixels):
    queue = Queue()
    queue.put([start]) # Wrapping the start tuple in a list
    while not queue.empty():
        path = queue.get()
        pixel = path[-1]
        if pixel == end:
            return path
        for adjacent in getadjacent(pixel):
            x,y = adjacent
            if iswhite(pixels[x,y]):
                pixels[x,y] = (127,127,127) # see note
                new_path = list(path)
                new_path.append(adjacent)
                queue.put(new_path)
    print("Queue has been exhausted. No answer was found.")

if __name__ == '__main__':
    # invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
    base_img = Image.open(sys.argv[1])
    base_pixels = base_img.load()
    path = BFS(start, end, base_pixels)
    path_img = Image.open(sys.argv[1])
    path_pixels = path_img.load()
    for position in path:
        x,y = position
        path_pixels[x,y] = (255,0,0) # red
    path_img.save(sys.argv[2])
Note: Marks a white visited pixel grey. This removes the need for a visited list, but this requires a second load of the image file from disk before drawing a path (if you don't want a composite image of the final path and ALL paths taken).
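The BFS above stores a complete path in every queue entry, which costs a lot of memory on a large maze. A common alternative, similar in spirit to the predecessor array M in the MATLAB answer, is to record only each cell's predecessor and rebuild the path at the end. The sketch below is written for illustration: the function name is made up, and it assumes the maze is a 2D grid of truthy (open) and falsy (wall) values indexed as [row][col].

```python
from collections import deque

def bfs_shortest_path(open_cells, start, end):
    """Shortest 4-connected path from start to end, or None if unreachable."""
    rows, cols = len(open_cells), len(open_cells[0])
    parent = {start: None}          # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            path = []
            while cell is not None:  # walk the predecessors back to start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r, c - 1), (r + 1, c), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and open_cells[nr][nc] and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Because the bounds are checked explicitly, this version does not rely on the maze having a closed border.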
A blank version of the maze I used.
I tried implementing A-Star search for this problem myself. I closely followed the implementation by Joseph Kern for the framework, and the algorithm pseudocode given here:
def AStar(start, goal, neighbor_nodes, distance, cost_estimate):
    def reconstruct_path(came_from, current_node):
        path = []
        while current_node is not None:
            path.append(current_node)
            current_node = came_from[current_node]
        return list(reversed(path))

    g_score = {start: 0}
    f_score = {start: g_score[start] + cost_estimate(start, goal)}
    openset = {start}
    closedset = set()
    came_from = {start: None}

    while openset:
        current = min(openset, key=lambda x: f_score[x])
        if current == goal:
            return reconstruct_path(came_from, goal)
        openset.remove(current)
        closedset.add(current)
        for neighbor in neighbor_nodes(current):
            if neighbor in closedset:
                continue
            if neighbor not in openset:
                openset.add(neighbor)
            tentative_g_score = g_score[current] + distance(current, neighbor)
            if tentative_g_score >= g_score.get(neighbor, float('inf')):
                continue
            came_from[neighbor] = current
            g_score[neighbor] = tentative_g_score
            f_score[neighbor] = tentative_g_score + cost_estimate(neighbor, goal)
    return []
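One note on the code above: `min(openset, ...)` scans the whole open set on every iteration. A common variant keeps the open set in a binary heap instead. The sketch below is such a variant, written for illustration with the same call signature; the name `astar_heap` is made up. Outdated heap entries are not removed but skipped when popped.

```python
import heapq

def astar_heap(start, goal, neighbor_nodes, distance, cost_estimate):
    """A-Star with a heap-based open set; returns [] if the goal is unreachable."""
    g_score = {start: 0}
    came_from = {start: None}
    heap = [(cost_estimate(start, goal), start)]   # entries are (f_score, node)
    closed = set()
    while heap:
        _, current = heapq.heappop(heap)
        if current in closed:
            continue                                # stale entry, already expanded
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        closed.add(current)
        for neighbor in neighbor_nodes(current):
            if neighbor in closed:
                continue
            tentative = g_score[current] + distance(current, neighbor)
            if tentative < g_score.get(neighbor, float('inf')):
                g_score[neighbor] = tentative
                came_from[neighbor] = current
                heapq.heappush(heap, (tentative + cost_estimate(neighbor, goal), neighbor))
    return []
```

Nodes must be orderable (tuples of ints are) so that ties in f_score can be broken by the heap.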
As A-Star is a heuristic search algorithm, you need to come up with a function that estimates the remaining cost (here: distance) until the goal is reached. Unless you're comfortable with a suboptimal solution, it should not overestimate the cost. A conservative choice here would be the Manhattan (or taxicab) distance, as this represents the straight-line distance between two points on the grid for the Von Neumann neighborhood used. (In this case, it never overestimates the cost.)
This would, however, significantly underestimate the actual cost for the maze at hand. Therefore I've added two other distance metrics for comparison: the squared Euclidean distance and the Manhattan distance multiplied by four. These, however, might overestimate the actual cost, and might therefore yield suboptimal results.
Here's the code:
import sys
from PIL import Image
def is_blocked(p):
    x,y = p
    pixel = path_pixels[x,y]
    if any(c < 225 for c in pixel):
        return True
def von_neumann_neighbors(p):
    x, y = p
    neighbors = [(x-1, y), (x, y-1), (x+1, y), (x, y+1)]
    return [p for p in neighbors if not is_blocked(p)]
def manhattan(p1, p2):
    return abs(p1[0]-p2[0]) + abs(p1[1]-p2[1])
def squared_euclidean(p1, p2):
    return (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
start = (400, 984)
goal = (398, 25)
# invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
path_img = Image.open(sys.argv[1])
path_pixels = path_img.load()
distance = manhattan
heuristic = manhattan
path = AStar(start, goal, von_neumann_neighbors, distance, heuristic)
for position in path:
    x,y = position
    path_pixels[x,y] = (255,0,0) # red
path_img.save(sys.argv[2])
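The point about overestimating heuristics can be checked on a toy case. The sketch below assumes an obstacle-free grid with unit-cost 4-connected moves, where the true remaining cost equals the Manhattan distance; the helper `manhattan_times_four` and the sample points are made up for illustration and are not part of the script above.

```python
def manhattan(p1, p2):
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])

def squared_euclidean(p1, p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

def manhattan_times_four(p1, p2):
    # hypothetical helper: the "multiplied by four" variant mentioned in the text
    return 4 * manhattan(p1, p2)

# Assumption: empty grid, unit step cost, 4-connected moves, so the true
# shortest path length between two cells is their Manhattan distance.
a, b = (0, 0), (3, 4)
true_cost = manhattan(a, b)

estimates = {h.__name__: h(a, b)
             for h in (manhattan, squared_euclidean, manhattan_times_four)}
for name, estimate in estimates.items():
    verdict = "overestimates" if estimate > true_cost else "does not overestimate"
    print(f"{name}: {estimate} ({verdict} the true cost {true_cost})")
```

Even without walls, the last two estimates exceed the true cost for these points, which is why A-Star may return a suboptimal path when using them.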
Here are some images for a visualization of the results (inspired by the one posted by Joseph Kern). The animations each show a new frame after every 10000 iterations of the main while-loop.
Breadth-First Search:
A-Star Manhattan Distance:
A-Star Squared Euclidean Distance:
A-Star Manhattan Distance multiplied by four:
The results show that the explored regions of the maze differ considerably depending on the heuristic used. Squared Euclidean distance even produces a different (suboptimal) path than the other metrics.
Concerning the runtime of the A-Star algorithm until termination, note that the many evaluations of the distance and cost functions add up compared to Breadth-First Search (BFS), which only needs to evaluate the "goaliness" of each candidate position. Whether the cost of these additional function evaluations (A-Star) outweighs the cost of the larger number of nodes to check (BFS), and especially whether performance is an issue for your application at all, is a matter of individual perception and cannot be answered in general.
What can be said in general about whether an informed search algorithm (such as A-Star) is the better choice compared to an exhaustive search (e.g., BFS) is the following: as the number of dimensions of the maze, i.e., the branching factor of the search tree, increases, the disadvantage of searching exhaustively grows exponentially. With growing complexity it becomes less and less feasible to do so, and at some point you are pretty much happy with any result path, be it (approximately) optimal or not.
Tree search is too much. The maze is inherently separable along the solution path(s).
(Thanks to rainman002 from Reddit for pointing this out to me.)
Because of this, you can quickly use connected components to identify the connected sections of maze wall. This iterates over the pixels twice.
If you want to turn that into a nice diagram of the solution path(s), you can then use binary operations with structuring elements to fill in the "dead end" pathways for each connected region.
Demo code for MATLAB follows. It could use tweaking to clean up the result better, make it more generalizable, and make it run faster. (Sometime when it's not 2:30 AM.)
% read in and invert the image
im = 255 - imread('maze.jpg');
% sharpen it to address small fuzzy channels
% threshold to binary 15%
% run connected components
result = bwlabel(im2bw(imfilter(im,fspecial('unsharp')),0.15));
% purge small components (e.g. letters)
for i = 1:max(reshape(result,1,1002*800))
    [count,~] = size(find(result==i));
    if count < 500
        result(result==i) = 0;
    end
end
% close dead-end channels
closed = zeros(1002,800);
for i = 1:max(reshape(result,1,1002*800))
    k = zeros(1002,800);
    k(result==i) = 1; k = imclose(k,strel('square',8));
    closed(k==1) = i;
end
% do output
out = 255 - im;
for x = 1:1002
    for y = 1:800
        if closed(x,y) == 0
            out(x,y,:) = 0;
        end
    end
end
imshow(out);
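For readers without MATLAB, the connected-components step can be sketched in plain Python. This is only an illustration of the idea on a tiny made-up maze, not a port of the demo above: the function `label_components` and the text maze are assumptions, and 8-connectivity is used for the wall pixels. It does not include the morphological closing of dead ends.

```python
from collections import deque

def label_components(wall):
    """Label 8-connected groups of True cells; returns (labels, count)."""
    rows, cols = len(wall), len(wall[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not wall[r][c] or labels[r][c]:
                continue
            # found an unlabelled wall cell: flood its whole component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and wall[nr][nc] and not labels[nr][nc]):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count

# '#' = wall, '.' = corridor; entrance at the top, exit at the bottom
maze = [
    "#.#####",
    "#.#...#",
    "#.#.#.#",
    "#...#.#",
    "#####.#",
]
wall = [[ch == "#" for ch in row] for row in maze]
labels, count = label_components(wall)
print(count)  # number of separate wall sections
```

In this toy maze the walls fall into two components, and the solution corridor is exactly the gap that separates them.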
Here you go: maze-solver-python (GitHub)
I had fun playing around with this and extended on Joseph Kern's answer. Not to detract from it; I just made some minor additions for anyone else who may be interested in playing around with this.
It's a python-based solver which uses BFS to find the shortest path. My main additions, at the time, are:
The image is cleaned before the search (i.e., converted to pure black & white).
Automatically generate a GIF.
Automatically generate an AVI.
As it stands, the start/end-points are hard-coded for this sample maze, but I plan on extending it such that you can pick the appropriate pixels.
Uses a queue for a threshold continuous fill. Pushes the pixel left of the entrance onto the queue and then starts the loop. If a queued pixel is dark enough, it's colored light gray (above threshold), and all the neighbors are pushed onto the queue.
from PIL import Image
img = Image.open("/tmp/in.jpg")
(w,h) = img.size
scan = [(394,23)]
while(len(scan) > 0):
    (i,j) = scan.pop()
    (r,g,b) = img.getpixel((i,j))
    if(r*g*b < 9000000):
        img.putpixel((i,j),(210,210,210))
        for x in [i-1,i,i+1]:
            for y in [j-1,j,j+1]:
                # skip neighbors that fall outside the image
                if 0 <= x < w and 0 <= y < h:
                    scan.append((x,y))
img.save("/tmp/out.png")
Solution is the corridor between gray wall and colored wall. Note this maze has multiple solutions. Also, this merely appears to work.
I'd go for the matrix-of-bools option. If you find that standard Python lists are too inefficient for this, you could use a NumPy array with dtype bool instead. Storage for a 1000x1000 pixel maze is then just 1 MB.
Don't bother with creating any tree or graph data structures. That's just a way of thinking about it, but not necessarily a good way to represent it in memory; a boolean matrix is both easier to code and more efficient.
Then use the A* algorithm to solve it. For the distance heuristic, use the Manhattan distance (distance_x + distance_y).
Represent nodes by a tuple of (row, column) coordinates. Whenever the algorithm (Wikipedia pseudocode) calls for "neighbours", it's a simple matter of looping over the four possible neighbours (mind the edges of the image!).
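A minimal sketch of this approach, assuming the maze is a list of lists of booleans (True = open corridor) and every step costs 1. The function names and the tiny example maze are made up for illustration, and a `heapq` priority queue is used for the open set as one possible choice.

```python
import heapq

def neighbours(maze, node):
    """Yield the open 4-connected neighbours of (row, col), minding the edges."""
    row, col = node
    for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
        if 0 <= r < len(maze) and 0 <= c < len(maze[0]) and maze[r][c]:
            yield (r, c)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(maze, start, goal):
    """Return a shortest path as a list of (row, col) tuples, or [] if none exists."""
    open_heap = [(manhattan(start, goal), 0, start)]
    came_from = {start: None}
    g_score = {start: 0}
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > g_score[current]:
            continue  # stale heap entry; a cheaper route was already found
        for nxt in neighbours(maze, current):
            new_g = g + 1
            if new_g < g_score.get(nxt, float("inf")):
                g_score[nxt] = new_g
                came_from[nxt] = current
                heapq.heappush(open_heap, (new_g + manhattan(nxt, goal), new_g, nxt))
    return []

# True = open corridor, False = wall (hypothetical 4x4 example)
maze = [
    [True,  True,  True,  False],
    [False, False, True,  False],
    [True,  True,  True,  True],
    [True,  False, False, True],
]
path = astar(maze, (0, 0), (3, 3))
print(path)
```

The bounds check in `neighbours` is what "mind the edges of the image" amounts to in practice.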
If you find that it's still too slow, you could try downscaling the image before you load it. Be careful not to lose any narrow paths in the process.
Maybe it's possible to do a 1:2 downscaling in Python as well, checking that you don't actually lose any possible paths. An interesting option, but it needs a bit more thought.
Here are some ideas.
(1. Image Processing:)
1.1 Load the image as RGB pixel map. In C# it is trivial using System.Drawing.Bitmap. In languages with no simple support for imaging, just convert the image to portable pixmap format (PPM) (a Unix text representation, produces large files) or some simple binary file format you can easily read, such as BMP or TGA. ImageMagick on Unix or IrfanView on Windows can do the conversion.
1.2 You may, as mentioned earlier, simplify the data by taking the (R+G+B)/3 for each pixel as an indicator of gray tone and then threshold the value to produce a black and white table. Something close to 200 assuming 0=black and 255=white will take out the JPEG artifacts.
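Step 1.2 could look roughly like this in Python. It is a sketch that assumes the pixels are already available as rows of (R, G, B) tuples with values 0 to 255; the function name `to_black_and_white` and the sample pixels are made up for illustration.

```python
def to_black_and_white(pixels, threshold=200):
    """Map rows of (R, G, B) tuples to rows of booleans.

    True means white (open), False means black (wall). The gray tone is
    approximated as (R + G + B) / 3, as described in step 1.2.
    """
    return [[(r + g + b) / 3 >= threshold for (r, g, b) in row] for row in pixels]

# hypothetical sample: near-white, near-black and mid-gray (JPEG-artifact-like) pixels
pixels = [
    [(255, 255, 255), (250, 248, 252), (12, 9, 14)],
    [(30, 28, 35), (190, 185, 180), (240, 243, 239)],
]
table = to_black_and_white(pixels)
print(table)
```

With a threshold of 200, the mid-gray pixel is treated as wall; lowering the threshold would make the corridors wider at the risk of opening gaps in thin walls.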
(2. Solutions:)
2.1 Depth-First Search: Init a stack with the starting location, collect the available follow-up moves, pick one at random and push it onto the stack, and proceed until the end or a dead end is reached. On a dead end, backtrack by popping the stack. You need to keep track of which positions were visited on the map, so that when you collect available moves you never take the same path twice. Very interesting to animate.
2.2 Breadth-First Search: Mentioned before, similar to the above but using a queue instead. Also interesting to animate. This works like flood-fill in image editing software. I think you may be able to solve a maze in Photoshop using this trick.
2.3 Wall Follower: Geometrically speaking, a maze is a folded/convoluted tube. If you keep your hand on the wall you will eventually find the exit ;) This does not always work. It relies on certain assumptions (perfect mazes, etc.); for instance, certain mazes contain islands. Do look it up; it is fascinating.
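Options 2.1 and 2.2 differ mainly in whether the frontier is treated as a stack or as a queue, which the following sketch tries to show. It assumes a small text grid with '.' for open cells and '#' for walls; the function `search` and the grid are made up for illustration. Unlike the description in 2.1, moves are tried in a fixed order rather than at random, and cells are marked as visited when they are pushed.

```python
from collections import deque

def search(grid, start, goal, breadth_first=True):
    """Explore a grid of '.' (open) and '#' (wall) cells; return a path or None."""
    frontier = deque([start])
    came_from = {start: None}  # doubles as the record of visited cells
    while frontier:
        # queue (FIFO) gives breadth-first order, stack (LIFO) gives depth-first style
        current = frontier.popleft() if breadth_first else frontier.pop()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        row, col = current
        for nxt in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            r, c = nxt
            if (0 <= r < len(grid) and 0 <= c < len(grid[0])
                    and grid[r][c] == "." and nxt not in came_from):
                came_from[nxt] = current
                frontier.append(nxt)
    return None

grid = [
    "....",
    ".##.",
    "....",
]
start, goal = (0, 0), (2, 3)
bfs_path = search(grid, start, goal, breadth_first=True)
dfs_path = search(grid, start, goal, breadth_first=False)
print(len(bfs_path), len(dfs_path))
```

With unit step costs, the breadth-first variant returns a shortest path, while the stack-based variant returns some valid path that may be longer on larger mazes.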
(3. Comments:)
This is the tricky one. It is easy to solve mazes if they are represented in some simple array format, with each element being a cell type with north, east, south and west walls and a visited flag field. However, given that you are trying to do this from a hand-drawn sketch, it becomes messy. I honestly think that trying to rationalize the sketch will drive you nuts. This is akin to computer vision problems, which are fairly involved. Perhaps working directly on the image map may be easier, yet more wasteful.
Here's a solution using R.
### download the image, read it into R, converting to something we can play with...
library(jpeg)
url <- "https://i.stack.imgur.com/TqKCM.jpg"
download.file(url, "./maze.jpg", mode = "wb")
jpg <- readJPEG("./maze.jpg")
### reshape array into data.frame
library(reshape2)
img3 <- melt(jpg, varnames = c("y","x","rgb"))
img3$rgb <- as.character(factor(img3$rgb, levels = c(1,2,3), labels=c("r","g","b")))
## split out rgb values into separate columns
img3 <- dcast(img3, x + y ~ rgb)
RGB to greyscale, see: https://stackoverflow.com/a/27491947/2371031
# convert rgb to greyscale (0, 1)
img3$v <- img3$r*.21 + img3$g*.72 + img3$b*.07
# v: values closer to 1 are white, closer to 0 are black
## strategically fill in some border pixels so the solver doesn't "go around":
img3$v2 <- img3$v
img3[(img3$x == 300 | img3$x == 500) & (img3$y %in% c(0:23,988:1002)),"v2"] = 0
# define some start/end point coordinates
pts_df <- data.frame(x = c(398, 399),
y = c(985, 26))
# set a reference value as the mean of the start and end point greyscale "v"s
ref_val <- mean(c(subset(img3, x==pts_df[1,1] & y==pts_df[1,2])$v,
subset(img3, x==pts_df[2,1] & y==pts_df[2,2])$v))
library(sp)
library(gdistance)
spdf3 <- SpatialPixelsDataFrame(points = img3[c("x","y")], data = img3["v2"])
r3 <- rasterFromXYZ(spdf3)
# transition layer defines a "conductance" function between any two points, and the number of connections (4 = Manhattan distances)
# x in the function represents the greyscale values ("v2") of two adjacent points (pixels), i.e., = (x1$v2, x2$v2)
# function(x) encourages transitions between cells with small changes in greyscale compared to the reference values, such that:
# when v2 is closer to 0 (black) = poor conductance
# when v2 is closer to 1 (white) = good conductance
tl3 <- transition(r3, function(x) (1/max( abs( (x/ref_val)-1 ) )^2)-1, 4)
## get the shortest path between start, end points
sPath3 <- shortestPath(tl3, as.numeric(pts_df[1,]), as.numeric(pts_df[2,]), output = "SpatialLines")
## fortify for ggplot
library(ggplot2)
sldf3 <- fortify(SpatialLinesDataFrame(sPath3, data = data.frame(ID = 1)))
# plot the image greyscale with start/end points (red) and shortest path (green)
ggplot(img3) +
  geom_raster(aes(x, y, fill=v2)) +
  scale_fill_continuous(high="white", low="black") +
  scale_y_reverse() +
  geom_point(data=pts_df, aes(x, y), color="red") +
  geom_path(data=sldf3, aes(x=long, y=lat), color="green")
Voila!
This is what happens if you don't fill in some border pixels (Ha!)...
Full disclosure: I asked and answered a very similar question myself before I found this one. Then through the magic of SO, found this one as one of the top "Related Questions". I thought I'd use this maze as an additional test case... I was very pleased to find that my answer there also works for this application with very little modification.
A good solution would be to find the neighbors by cell instead of by pixel. A corridor can be 15 px wide, so a pixel-based search can take actions such as left or right while staying within the same corridor. If each move instead covered a whole cell, the only actions would be UP, DOWN, LEFT or RIGHT.
