I am trying to triangulate a number of polygons such that the triangles do not add extra points. To keep the question short I will use two circles, one inside the other; in reality these will be OpenCV contours, but the translation between the two is quite complex and the circles show the same problem.
So I have the following code (based on the example) to first build the circles and then triangulate them with the triangle project:
import matplotlib.pyplot as plt
import numpy as np
import triangle as tr
def circle(N, R):
    i = np.arange(N)
    theta = i * 2 * np.pi / N
    pts = np.stack([np.cos(theta), np.sin(theta)], axis=1) * R
    seg = np.stack([i, i + 1], axis=1) % N
    return pts, seg
pts0, seg0 = circle(30, 1.4)
pts1, seg1 = circle(16, 0.6)
pts = np.vstack([pts0, pts1])
seg = np.vstack([seg0, seg1 + seg0.shape[0]])
print(pts)
print(seg)
A = dict(vertices=pts, segments=seg, holes=[[0, 0]])
print(seg)
B = tr.triangulate(A)  # note that the original example uses 'qpa0.05' here
tr.compare(plt, A, B)
plt.show()
Now this causes both the outer and inner circles to get triangulated, as shown here, clearly ignoring the hole. By setting the 'qpa0.05' flag we can make the triangulation respect the hole, as seen here. However, doing this splits the triangles, adding many extra ones; increasing the qpa value reduces the number of triangles somewhat, but they remain.
Note that I want to be able to handle multiple holes in the same shape, and that shapes might end up being concave.
Anybody know how to get the triangulation to use the holes without adding extra triangles?
You can connect the hole (or holes) to the exterior perimeter so that you get a single "degenerate polygon" defined by a single point sequence that connects all points without self-intersection.
You go in and out through the same segment. If you follow the outer perimeter clockwise, you need to follow the hole perimeter counterclockwise or vice-versa. Otherwise, it would self-intersect.
I figured it out: the 'qpa0.05' should have been just 'p'. The 'p' flag makes the code factor in the holes, while the 'a' flag sets the maximum area for the triangles, which is what causes the extra points to be added.
B = tr.triangulate(A,'p')
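For reference, a minimal sketch of how the flags interact (same A dict as above; this is an illustration, not part of the original question): 'p' alone performs a constrained triangulation that respects the segments and the hole without inserting extra points, while adding the quality/area switches from the question brings the extra points back.
B_plain = tr.triangulate(A, 'p')          # segments and hole respected, no extra points
B_refined = tr.triangulate(A, 'qpa0.05')  # quality/area mesh: hole respected, but Steiner points added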
I want to calculate the intersection over union (IoU) between two rectangles whose axes are not aligned, but where the angle between the axes is smaller than 30 degrees. An approximate value would also be acceptable.
One possible solution is to check whether the angle between the two rectangles is less than 30 degrees and then rotate them so their axes align. From there it is easy to calculate the IoU.
Another possibility is to use Monte Carlo methods for the intersection (generate a point, check whether it lies below some line of one rectangle and above some line of the other), but this seems expensive because I need to do this calculation a large number of times.
I was hoping that there is something better out there; maybe a geometry library, or an algorithm from the computer vision folks.
I am trying to learn grasping positions using deep neural networks. My algorithm should predict a bounding box (rectangle) for an object in an RGB image. For each image I also have the ground-truth bounding box (another rectangle). From these two rectangles I need the IoU.
Any idea?
Since you're working in Python, I think the Shapely package would serve your needs.
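For example, a minimal sketch (the corner coordinates below are made up; any two overlapping polygons work the same way):
from shapely.geometry import Polygon

rect1 = Polygon([(0, 0), (4, 0), (4, 2), (0, 2)])
rect2 = Polygon([(1, -0.5), (4.8, 0.9), (4.1, 2.8), (0.3, 1.4)])

inter = rect1.intersection(rect2).area
union = rect1.union(rect2).area      # same as rect1.area + rect2.area - inter
print(inter / union)                 # the IoU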
There is a quite effective algorithm for computing the intersection of two convex polygons, described in O'Rourke's book "Computational Geometry in C".
C code is available at the book page (convconv).
The algorithm traverses the edges of the first polygon, checking the orientations of the second polygon's vertices in order to detect intersections. When two consecutive vertices lie on different sides of an edge, an intersection occurs (there are a lot of tricky cases). An algorithm outline is here.
You can consider a number of numerical approaches: practically "rendering" the rectangles into some canvas (or canvases) and traversing the pixels to gather your statistics. The canvas should be the size of the bounding box of the entire scene; you can find that by taking the minimum and maximum coordinates on each axis.
1) "most CG" approach: really get a rendering library, render one rectangle in red and the other in transparent blue. Then visit each pixel: if it has a non-zero red component it belongs to the first rectangle, if it has a non-zero blue component it belongs to the second, and if it has both it belongs to the intersection too. This approach is cheap to code, but requires both writing and reading the canvas even in the rendering phase, which is slower than just writing. It could even be done on a GPU, though I am not sure the setup cost and reading back the result would not outweigh the benefit for such a simple scene.
2) another CG approach would be rendering into 2 arrays, preferably some 1-byte-per-pixel variant for the sake of speed (you may have to go back in time a bit to find such dedicated rendering libraries). This way the renderer only writes, into one array per rectangle, and you read from both when building the statistics.
3) as writing and reading pixels take time, you can take a shortcut, but it needs more coding: convex shapes can be rendered by collecting the minimum and maximum x coordinate per scanline and filling between the two. If you do this yourself, you can skip the filling part and also the read-and-check-every-pixel step at the end. Build such a min-max list for both rectangles, and then you "just" have to check their relation/order on each scanline to recognize overlaps.
And then there is the mathematical way (this is not really useful, see the EDIT below): while it is unlikely that you will find a sane algorithm for calculating the intersection area specifically for rectangles, if you find such an algorithm for triangles, which is more probable, that would be enough. Both rectangles can be split into two triangles, 1A+1B and 2A+2B respectively, and then you just have to run the algorithm 4 times: 1A-2A, 1A-2B, 1B-2A, 1B-2B, and sum the results to get the area of your intersection.
EDIT: for the maths approach (though it also comes from graphics), I think https://en.wikipedia.org/wiki/Sutherland%E2%80%93Hodgman_algorithm can be applied here (as both rectangles are convex polygons, A-B and B-A should produce the same result) to find the intersection polygon; the remaining task is then to calculate the area of that polygon (it will be convex here, so that is really easy).
I ended up using the Sutherland-Hodgman algorithm, implemented as these functions:
def clip(subjectPolygon, clipPolygon):
    def inside(p):
        return (cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])

    def computeIntersection():
        dc = [cp1[0] - cp2[0], cp1[1] - cp2[1]]
        dp = [s[0] - e[0], s[1] - e[1]]
        n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
        n2 = s[0] * e[1] - s[1] * e[0]
        n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
        return [(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3]

    outputList = subjectPolygon
    cp1 = clipPolygon[-1]
    for clipVertex in clipPolygon:
        cp2 = clipVertex
        inputList = outputList
        outputList = []
        s = inputList[-1]
        for subjectVertex in inputList:
            e = subjectVertex
            if inside(e):
                if not inside(s):
                    outputList.append(computeIntersection())
                outputList.append(e)
            elif inside(s):
                outputList.append(computeIntersection())
            s = e
        cp1 = cp2
    return outputList
def PolygonArea(corners):
    n = len(corners)  # number of corners
    area = 0.0
    for i in range(n):
        j = (i + 1) % n
        area += corners[i][0] * corners[j][1]
        area -= corners[j][0] * corners[i][1]
    area = abs(area) / 2.0
    return area
intersection = clip(rec1, rec2)
intersection_area = PolygonArea(intersection)
iou = intersection_area/(PolygonArea(rec1)+PolygonArea(rec2)-intersection_area)
Another, slower method (I don't know which algorithm it uses) could be:
from shapely.geometry import Polygon
p1 = Polygon(rec1)
p2 = Polygon(rec2)
inter_sec_area = p1.intersection(p2).area
iou = inter_sec_area/(p1.area + p2.area - inter_sec_area)
It is worth mentioning that in just one case with bigger polygons (not my case) the shapely module gave an area twice as large as the first method. I didn't test both methods intensively.
This might help
What about using the Pythagorean theorem? Since you have two rectangles, when they intersect you will have one or more triangles, each with one 90° angle.
Since you also know the angle between the two rectangles (20° in my example), and the coordinates of each rectangle, you can use the appropriate function (cos/sin/tan) to find the lengths of all the edges of the triangles.
I hope this can help
Consider the following stabilized video frame, where stabilization is done by rotation and translation only (no scaling):
As seen in the image, the right-hand side of the frame is a mirrored copy of the preceding pixels, i.e. the black region left after rotation is filled in by symmetry. I added a red line to indicate it more clearly.
I'd like to find the rotation angle, which I will use later on. I could have done this via SURF or SIFT features; however, in the real scenario I won't have the original frame.
I could probably find the angle by brute force, but I wonder if there is a better and more elegant solution. Note that the intensity values of the symmetric part are not precisely the same as in the original part. I've checked some values; for example, the upper-right pixel of the V character on the keyboard is [51 49 47] in the original part but [50 50 47] in the symmetric copy, which means corresponding pixels are not guaranteed to have the same RGB values.
I'll implement this in Matlab or Python, and the video stabilization is done using ffmpeg.
EDIT: I only have the stabilized video; I don't have access to the original video or the files produced by ffmpeg.
Any help/suggestion is appreciated,
A pixel (probably) lies on the searched symmetry line if:
Its (first/second/third/...) left and right points are equal (=> dG, Figure 1 left)
Its (first/second/third/...) left (or right) value is different (=> dGs, Figure 1 middle)
So, the points of interest are characterised by high values of |dGs| - |dG| (=> dGs_dG, Figure 1 right)
As can be seen on the right image of Figure 1, a lot of false positives still exist. Therefore, the Hough transform (Figure 2 left) will be used to detect all the points corresponding to the strongest line (Figure 2 right). The green line is indeed the searched line.
Tuning
Changing n: Higher values will discard more false positives, but also excludes n border pixels. This can be avoided by using a lower n for the border pixels.
Changing thresholds: A higher threshold on dGs_dG will discard more false positives. Discarding high values of dG may also be interesting to discard edge locations in the original image.
A priori knowledge of symmetry line: using the definition of the hough transform, you can discard all lines passing through the center part of the image.
The matlab code used to generate the images is:
I = imread('bnuqb.png');
G = int16(rgb2gray(I));
n = 3; % use the first, second and third left/right point
dG = int16(zeros(size(G) - [0 2*n+2]));
dGs = int16(zeros(size(G) - [0 2*n+2]));
for i=0:n
    dG = dG + abs(G(:, 1+n-i:end-2-n-i) - G(:, 3+n+i:end-n+i));
    dGs = dGs + abs(G(:, 1+n-i:end-2-n-i) - G(:, 2+n:end-n-1));
end
dGs_dG = dGs - dG;
dGs_dG(dGs_dG < 0) = 0;
figure
subplot(1,3,1);
imshow(dG, [])
subplot(1,3,2);
imshow(dGs, [])
subplot(1,3,3);
imshow(dGs_dG, [])
BW = dGs_dG > 0;
[H,theta,rho] = hough(BW);
P = houghpeaks(H,1);
lines = houghlines(BW,theta,rho,P,'FillGap',50000,'MinLength',7);
figure
subplot(1,2,1);
imshow(H, [])
hold on
plot(P(:, 2),P(:, 1),'r.');
subplot(1,2,2);
imshow(I(:, n+2:end-n-1, :))
hold on
max_len = 0;
for k = 1:length(lines)
    xy = [lines(k).point1; lines(k).point2];
    plot(xy(:,1),xy(:,2),'g');
end
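If you end up doing this in Python instead, the final step of the approach above can be reproduced with OpenCV's Hough transform. A minimal sketch, assuming the candidate-pixel image (dGs_dG above) has been saved as a mask; the file name and threshold are placeholders:
import cv2
import numpy as np

mask = cv2.imread('dgs_dg_mask.png', cv2.IMREAD_GRAYSCALE)
mask = ((mask > 0) * 255).astype(np.uint8)

lines = cv2.HoughLines(mask, 1, np.pi / 180, 150)
if lines is not None:
    rho, theta = lines[0][0]   # strongest line; theta is the angle of its normal (radians)
    print('symmetry line angle (deg):', np.degrees(theta))
The tilt of this line away from vertical then gives an estimate of the rotation angle used during stabilization.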
I have a historical time sequence of seafloor images scanned from film that need registration.
from pylab import *
import cv2
import urllib
urllib.urlretrieve('http://geoport.whoi.edu/images/frame014.png','frame014.png');
urllib.urlretrieve('http://geoport.whoi.edu/images/frame015.png','frame015.png');
gray1=cv2.imread('frame014.png',0)
gray2=cv2.imread('frame015.png',0)
figure(figsize=(14,6))
subplot(121);imshow(gray1,cmap=cm.gray);
subplot(122);imshow(gray2,cmap=cm.gray);
I want to use the black region on the left of each image to do the registration, since that region was inside the camera and should be fixed in time. So I just need to compute the affine transformation between the black regions.
I determined these regions by thresholding and finding the largest contour:
def find_biggest_contour(gray,threshold=40):
    # threshold a grayscale image
    ret,thresh = cv2.threshold(gray,threshold,255,1)
    # find the contours
    contours,h = cv2.findContours(thresh,mode=cv2.RETR_LIST,method=cv2.CHAIN_APPROX_NONE)
    # measure the perimeter
    perim = [cv2.arcLength(cnt,True) for cnt in contours]
    # find contour with largest perimeter
    i=perim.index(max(perim))
    return contours[i]
c1=find_biggest_contour(gray1)
c2=find_biggest_contour(gray2)
x1=c1[:,0,0];y1=c1[:,0,1]
x2=c2[:,0,0];y2=c2[:,0,1]
figure(figsize=(8,8))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1,y1,'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2,y2,'g-')
axis([0,1500,1000,0]);
The blue is the longest contour from the 1st frame, the green is the longest contour from the 2nd frame.
What is the best way to determine the rotation and offset between the blue and green contours?
I only want to use the right side of the contours in some region surrounding the step, something like the region between the arrows.
Of course, if there is a better way to register these images, I'd love to hear it. I already tried a standard feature matching approach on the raw images, and it didn't work well enough.
Following Shambool's suggested approach, here's what I've come up with. I used a Ramer-Douglas-Peucker algorithm to simplify the contour in the region of interest and identified the two turning points. I was going to use the two turning points to get my three unknowns (xoffset, yoffset and angle of rotation), but the 2nd turning point is a bit too far toward the right because RDP simplified away the smoother curve in this region. So instead I used the angle of the line segment leading up to the 1st turning point. Differencing this angle between image1 and image2 gives me the rotation angle. I'm still not completely happy with this solution. It worked well enough for these two images, but I'm not sure it will work well on the entire image sequence. We'll see.
It would really be better to fit the contour to the known shape of the black border.
# select region of interest from largest contour
ind1=where((x1>190.) & (y1>200.) & (y1<900.))[0]
ind2=where((x2>190.) & (y2>200.) & (y2<900.))[0]
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2,cmap=cm.gray, alpha=0.5);plot(x2[ind2],y2[ind2],'g-')
axis([0,1500,1000,0])
def angle(x1,y1):
    # Returns angle of each segment along an (x,y) track
    return array([math.atan2(y,x) for (y,x) in zip(diff(y1),diff(x1))])

def simplify(x,y, tolerance=40, min_angle = 60.*pi/180.):
    """
    Use the Ramer-Douglas-Peucker algorithm to simplify the path
    http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
    Python implementation: https://github.com/sebleier/RDP/
    """
    from RDP import rdp
    points=vstack((x,y)).T
    simplified = array(rdp(points.tolist(), tolerance))
    sx, sy = simplified.T
    theta=abs(diff(angle(sx,sy)))
    # Select the index of the points with the greatest theta
    # Large theta is associated with greatest change in direction.
    idx = where(theta>min_angle)[0]+1
    return sx,sy,idx
sx1,sy1,i1 = simplify(x1[ind1],y1[ind1])
sx2,sy2,i2 = simplify(x2[ind2],y2[ind2])
fig = plt.figure(figsize=(10,6))
ax =fig.add_subplot(111)
ax.plot(x1, y1, 'b-', x2, y2, 'g-',label='original path')
ax.plot(sx1, sy1, 'ko-', sx2, sy2, 'ko-',lw=2, label='simplified path')
ax.plot(sx1[i1], sy1[i1], 'ro', sx2[i2], sy2[i2], 'ro',
markersize = 10, label='turning points')
ax.invert_yaxis()
plt.legend(loc='best')
# determine x,y offset between 1st turning points, and
# angle from difference in slopes of line segments approaching 1st turning point
xoff = sx2[i2[0]] - sx1[i1[0]]
yoff = sy2[i2[0]] - sy1[i1[0]]
iseg1 = [i1[0]-1, i1[0]]
iseg2 = [i2[0]-1, i2[0]]
ang1 = angle(sx1[iseg1], sy1[iseg1])
ang2 = angle(sx2[iseg2], sy2[iseg2])
ang = -(ang2[0] - ang1[0])
print xoff, yoff, ang*180.*pi
-28 14 5.07775871644
# 2x3 affine matrix M
M=array([cos(ang),sin(ang),xoff,-sin(ang),cos(ang),yoff]).reshape(2,3)
print M
[[ 9.99959685e-01 8.97932821e-03 -2.80000000e+01]
[ -8.97932821e-03 9.99959685e-01 1.40000000e+01]]
# warp 2nd image into coordinate frame of 1st
Minv = cv2.invertAffineTransform(M)
gray2b = cv2.warpAffine(gray2,Minv,shape(gray2.T))
figure(figsize=(10,10))
imshow(gray1,cmap=cm.gray, alpha=0.5);plot(x1[ind1],y1[ind1],'b-')
imshow(gray2b,cmap=cm.gray, alpha=0.5);
axis([0,1500,1000,0]);
title('image1 and transformed image2 overlain with 50% transparency');
Good question.
One approach is to represent contours as 2d point clouds and then do registration.
Simpler and clearer Matlab code that can give you the affine transform.
And more complex C++ code (using the VXL lib) with Python and Matlab wrappers included.
Or you can use a modified ICP (iterative closest point) algorithm that is robust to noise and can handle an affine transform.
Also, your contours don't seem to be very accurate, which may be a problem.
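For completeness, once you have (or can establish) point correspondences between the two contours, the rigid part of the registration is the standard Kabsch/Procrustes solution; a minimal sketch with made-up data (finding the correspondences is the hard part, and exactly what ICP iterates on):
import numpy as np

def rigid_align(P, Q):
    # best rotation R and translation t mapping point set P onto Q (both (N, 2), already matched)
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = Q.mean(axis=0) - R @ P.mean(axis=0)
    return R, t

# toy demo: rotate a point set by 5 degrees, shift it, and recover the transform
rng = np.random.default_rng(0)
P = rng.random((50, 2)) * 100
a = np.radians(5.0)
R_true = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
Q = P @ R_true.T + np.array([-28.0, 14.0])
R, t = rigid_align(P, Q)
print(np.degrees(np.arctan2(R[1, 0], R[0, 0])), t)  # ~5.0 and ~(-28, 14)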
Another approach is to use some kind of registration that works on pixel values.
Matlab code (I think it uses some kind of minimizer + cross-correlation metric).
Also maybe there is some kind of optical flow registration (or some other kind) as used in medical imaging.
You can also use point features such as SIFT (SURF).
You can try it quickly in FIJI (ImageJ),
also this link.
Open 2 images
Plugins->feature extraction-> sift (or other)
Set expected transformation to affine
Look at the estimated transformation model [3,3] homography matrix in the ImageJ log.
If it works well, you can implement it in Python using OpenCV, or maybe using Jython with ImageJ.
And it would be better if you posted the original images and described all conditions (it seems that the image is changing between frames).
You can represent these contours with their respective ellipses. These ellipses are centered on the centroid of the contour and oriented along the main density axis. You can then compare the centroids and the orientation angles.
1) Fill the contours => drawContours with thickness=CV_FILLED
2) Find moments => cvMoments()
3) And use them.
Centroid: { x, y } = {M10/M00, M01/M00 }
Orientation (theta): computed from the second-order central moments, as in the code below.
EDIT: I customized the sample code from legacy (enteringblobdetection.cpp) for your case.
/* Image moments */
double M00,X,Y,XX,YY,XY;
CvMoments m;
CvRect r = ((CvContour*)cnt)->rect;
CvMat mat;
cvMoments( cvGetSubRect(pImgFG,&mat,r), &m, 0 );
M00 = cvGetSpatialMoment( &m, 0, 0 );
X = cvGetSpatialMoment( &m, 1, 0 )/M00;
Y = cvGetSpatialMoment( &m, 0, 1 )/M00;
XX = (cvGetSpatialMoment( &m, 2, 0 )/M00) - X*X;
YY = (cvGetSpatialMoment( &m, 0, 2 )/M00) - Y*Y;
XY = (cvGetSpatialMoment( &m, 1, 1 )/M00) - X*Y;
/* Contour description */
CvPoint myCentroid(r.x+(float)X,r.y+(float)Y);
double myTheta = atan( 2*XY/(XX-YY) );
Also, check this with OpenCV 2.0 examples.
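Since the question is in Python, here is a rough modern-Python analogue of the idea above (a sketch, not the original author's code; the contour below is a made-up rotated rectangle, and in practice you would pass one of the contours returned by cv2.findContours):
import cv2
import numpy as np

cnt = np.array([[[100, 120]], [[300, 160]], [[280, 260]], [[80, 220]]], dtype=np.int32)

m = cv2.moments(cnt)
cx, cy = m['m10'] / m['m00'], m['m01'] / m['m00']   # centroid
mu20 = m['mu20'] / m['m00']
mu02 = m['mu02'] / m['m00']
mu11 = m['mu11'] / m['m00']
# conventional orientation formula (note the factor 1/2 in front of the arctangent)
theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
print((cx, cy), np.degrees(theta))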
If you don't want to find the homography between the two images and instead want the affine transformation, you have three unknowns: the rotation angle (R) and the displacements in x and y (X, Y). Therefore a minimum of two points (with two known values each) is needed to find the unknowns. Either two points should be matched between the two images, or two lines, each of which has two known values: the intercept and the slope. If you go with the point-matching approach, the further the points are from each other, the more robust the found transform is to noise (this is very simple if you remember error propagation rules).
In the two point matching method:
find two points (A and B) in the first image I1 and their corresponding points (A',B') in the second image I2
find the middle point between A and B: C, and the middle point between A' and B': C'
the difference C and C' (C-C') gives the translation between the images (X and Y)
using the dot product of C-A and C'-A' you can find the rotation angle (R)
To detect robust points, I would find the points along the side of the contour you found with the highest absolute value of the second derivative (Hessian) and then try to match them. Since you mentioned this is video footage, you can easily make the assumption that the transformation between any two frames is small, in order to reject outliers.
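A minimal sketch of the two-point method above (the point pairs are made up; the signed angle uses the arctangent of cross over dot, which is a robust version of the dot-product idea):
import numpy as np

A,  B  = np.array([100.0, 200.0]), np.array([400.0, 250.0])   # matched points in image 1
A2, B2 = np.array([95.0, 210.0]),  np.array([396.0, 262.0])   # their matches in image 2

C, C2 = (A + B) / 2.0, (A2 + B2) / 2.0
tx, ty = C2 - C                                   # translation (X, Y)

v1, v2 = A - C, A2 - C2
cross = v1[0] * v2[1] - v1[1] * v2[0]
R = np.arctan2(cross, float(np.dot(v1, v2)))      # signed rotation angle in radians
print(tx, ty, np.degrees(R))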
What is the best way to represent and solve a maze given an image?
Given a JPEG image (as seen above), what's the best way to read it in, parse it into some data structure, and solve the maze? My first instinct is to read the image in pixel by pixel and store it in a list (array) of boolean values: True for a white pixel and False for a non-white pixel (the colours can be discarded). The issue with this method is that the image may not be "pixel perfect". By that I simply mean that if there is a white pixel somewhere on a wall, it may create an unintended path.
Another method (which came to me after a bit of thought) is to convert the image to an SVG file - which is a list of paths drawn on a canvas. This way, the paths could be read into the same sort of list (boolean values) where True indicates a path or wall and False indicates travelable space. An issue with this method arises if the conversion is not 100% accurate and does not fully connect all of the walls, creating gaps.
Also an issue with converting to SVG is that the lines are not "perfectly" straight. This results in the paths being cubic Bezier curves. With a list (array) of boolean values indexed by integers, the curves would not transfer easily, and all the points that lie on the curve would have to be calculated, but won't exactly match list indices.
I assume that while one of these methods might work (though probably not), they are woefully inefficient given such a large image, and that there exists a better way. How is this best (most efficiently and/or with the least complexity) done? Is there even a best way?
Then comes the solving of the maze. If I use either of the first two methods, I will essentially end up with a matrix. According to this answer, a good way to represent a maze is using a tree, and a good way to solve it is using the A* algorithm. How would one create a tree from the image? Any ideas?
TL;DR
Best way to parse? Into what data structure? How would said structure help/hinder solving?
UPDATE
I've tried my hand at implementing what @Mikhail has written in Python, using numpy, as @Thomas recommended. I feel that the algorithm is correct, but it's not working as hoped. (Code below.) The PNG library is PyPNG.
import png, numpy, Queue, operator, itertools

def is_white(coord, image):
    """ Returns whether (x, y) is approx. a white pixel."""
    a = True
    for i in xrange(3):
        if not a: break
        a = image[coord[1]][coord[0] * 3 + i] > 240
    return a

def bfs(s, e, i, visited):
    """ Perform a breadth-first search. """
    frontier = Queue.Queue()
    while s != e:
        for d in [(-1, 0), (0, -1), (1, 0), (0, 1)]:
            np = tuple(map(operator.add, s, d))
            if is_white(np, i) and np not in visited:
                frontier.put(np)
        visited.append(s)
        s = frontier.get()
    return visited

def main():
    r = png.Reader(filename = "thescope-134.png")
    rows, cols, pixels, meta = r.asDirect()
    assert meta['planes'] == 3 # ensure the file is RGB
    image2d = numpy.vstack(itertools.imap(numpy.uint8, pixels))
    start, end = (402, 985), (398, 27)
    print bfs(start, end, image2d, [])
Here is a solution.
Convert image to grayscale (not yet binary), adjusting weights for the colors so that final grayscale image is approximately uniform. You can do it simply by controlling sliders in Photoshop in Image -> Adjustments -> Black & White.
Convert image to binary by setting appropriate threshold in Photoshop in Image -> Adjustments -> Threshold.
Make sure threshold is selected right. Use the Magic Wand Tool with 0 tolerance, point sample, contiguous, no anti-aliasing. Check that edges at which selection breaks are not false edges introduced by wrong threshold. In fact, all interior points of this maze are accessible from the start.
Add artificial borders on the maze to make sure virtual traveler will not walk around it :)
Implement breadth-first search (BFS) in your favorite language and run it from the start. I prefer MATLAB for this task. As @Thomas already mentioned, there is no need to mess with a regular graph representation. You can work with the binarized image directly.
Here is the MATLAB code for BFS:
function path = solve_maze(img_file)
  %% Init data
  img = imread(img_file);
  img = rgb2gray(img);
  maze = img > 0;
  start = [985 398];
  finish = [26 399];

  %% Init BFS
  n = numel(maze);
  Q = zeros(n, 2);
  M = zeros([size(maze) 2]);
  front = 0;
  back = 1;

  function push(p, d)
    q = p + d;
    if maze(q(1), q(2)) && M(q(1), q(2), 1) == 0
      front = front + 1;
      Q(front, :) = q;
      M(q(1), q(2), :) = reshape(p, [1 1 2]);
    end
  end

  push(start, [0 0]);
  d = [0 1; 0 -1; 1 0; -1 0];

  %% Run BFS
  while back <= front
    p = Q(back, :);
    back = back + 1;
    for i = 1:4
      push(p, d(i, :));
    end
  end

  %% Extracting path
  path = finish;
  while true
    q = path(end, :);
    p = reshape(M(q(1), q(2), :), 1, 2);
    path(end + 1, :) = p;
    if isequal(p, start)
      break;
    end
  end
end
It is really very simple and standard; there should be no difficulty in implementing this in Python or whatever.
And here is the answer:
This solution is written in Python. Thanks Mikhail for the pointers on the image preparation.
An animated Breadth-First Search:
The Completed Maze:
#!/usr/bin/env python
import sys
from Queue import Queue
from PIL import Image

start = (400,984)
end = (398,25)

def iswhite(value):
    if value == (255,255,255):
        return True

def getadjacent(n):
    x,y = n
    return [(x-1,y),(x,y-1),(x+1,y),(x,y+1)]

def BFS(start, end, pixels):
    queue = Queue()
    queue.put([start]) # Wrapping the start tuple in a list
    while not queue.empty():
        path = queue.get()
        pixel = path[-1]
        if pixel == end:
            return path
        for adjacent in getadjacent(pixel):
            x,y = adjacent
            if iswhite(pixels[x,y]):
                pixels[x,y] = (127,127,127) # see note
                new_path = list(path)
                new_path.append(adjacent)
                queue.put(new_path)
    print "Queue has been exhausted. No answer was found."

if __name__ == '__main__':
    # invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
    base_img = Image.open(sys.argv[1])
    base_pixels = base_img.load()
    path = BFS(start, end, base_pixels)

    path_img = Image.open(sys.argv[1])
    path_pixels = path_img.load()
    for position in path:
        x,y = position
        path_pixels[x,y] = (255,0,0) # red
    path_img.save(sys.argv[2])
Note: Marks a white visited pixel grey. This removes the need for a visited list, but this requires a second load of the image file from disk before drawing a path (if you don't want a composite image of the final path and ALL paths taken).
A blank version of the maze I used.
I tried implementing A-Star search for this problem myself. I closely followed the implementation by Joseph Kern for the framework, and the algorithm pseudocode given here:
def AStar(start, goal, neighbor_nodes, distance, cost_estimate):
    def reconstruct_path(came_from, current_node):
        path = []
        while current_node is not None:
            path.append(current_node)
            current_node = came_from[current_node]
        return list(reversed(path))

    g_score = {start: 0}
    f_score = {start: g_score[start] + cost_estimate(start, goal)}
    openset = {start}
    closedset = set()
    came_from = {start: None}

    while openset:
        current = min(openset, key=lambda x: f_score[x])
        if current == goal:
            return reconstruct_path(came_from, goal)
        openset.remove(current)
        closedset.add(current)
        for neighbor in neighbor_nodes(current):
            if neighbor in closedset:
                continue
            if neighbor not in openset:
                openset.add(neighbor)
            tentative_g_score = g_score[current] + distance(current, neighbor)
            if tentative_g_score >= g_score.get(neighbor, float('inf')):
                continue
            came_from[neighbor] = current
            g_score[neighbor] = tentative_g_score
            f_score[neighbor] = tentative_g_score + cost_estimate(neighbor, goal)
    return []
As A-Star is a heuristic search algorithm, you need to come up with a function that estimates the remaining cost (here: distance) until the goal is reached. Unless you're comfortable with a suboptimal solution, it should not overestimate the cost. A conservative choice here would be the Manhattan (or taxicab) distance, as this represents the number of grid steps between two points for the von Neumann neighborhood used. (Which, in this case, would never overestimate the cost.)
This would, however, significantly underestimate the actual cost for the given maze at hand. Therefore I've added two other distance metrics for comparison: squared Euclidean distance and the Manhattan distance multiplied by four. These, however, might overestimate the actual cost and might therefore yield suboptimal results.
Here's the code:
import sys
from PIL import Image

def is_blocked(p):
    x,y = p
    pixel = path_pixels[x,y]
    if any(c < 225 for c in pixel):
        return True

def von_neumann_neighbors(p):
    x, y = p
    neighbors = [(x-1, y), (x, y-1), (x+1, y), (x, y+1)]
    return [p for p in neighbors if not is_blocked(p)]

def manhattan(p1, p2):
    return abs(p1[0]-p2[0]) + abs(p1[1]-p2[1])

def squared_euclidean(p1, p2):
    return (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2

start = (400, 984)
goal = (398, 25)

# invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
path_img = Image.open(sys.argv[1])
path_pixels = path_img.load()

distance = manhattan
heuristic = manhattan

path = AStar(start, goal, von_neumann_neighbors, distance, heuristic)

for position in path:
    x,y = position
    path_pixels[x,y] = (255,0,0) # red

path_img.save(sys.argv[2])
Here are some images for a visualization of the results (inspired by the one posted by Joseph Kern). The animations show a new frame each after 10000 iterations of the main while-loop.
Breadth-First Search:
A-Star Manhattan Distance:
A-Star Squared Euclidean Distance:
A-Star Manhattan Distance multiplied by four:
The results show that the explored regions of the maze differ considerably depending on the heuristic being used. As such, squared Euclidean distance even produces a different (suboptimal) path than the other metrics.
Concerning the performance of the A-Star algorithm in terms of runtime until termination, note that a lot of evaluations of the distance and cost functions add up compared to Breadth-First Search (BFS), which only needs to evaluate the "goaliness" of each candidate position. Whether or not the cost of these additional function evaluations (A-Star) outweighs the cost of the larger number of nodes to check (BFS), and especially whether or not performance is an issue for your application at all, is a matter of individual perception and can of course not be answered in general.
A thing that can be said in general about whether an informed search algorithm (such as A-Star) could be the better choice compared to an exhaustive search (e.g., BFS) is the following: with the number of dimensions of the maze, i.e., the branching factor of the search tree, the disadvantage of an exhaustive search grows exponentially. With growing complexity it becomes less and less feasible, and at some point you are pretty much happy with any result path, be it (approximately) optimal or not.
Tree search is too much. The maze is inherently separable along the solution path(s).
(Thanks to rainman002 from Reddit for pointing this out to me.)
Because of this, you can quickly use connected components to identify the connected sections of maze wall. This iterates over the pixels twice.
If you want to turn that into a nice diagram of the solution path(s), you can then use binary operations with structuring elements to fill in the "dead end" pathways for each connected region.
Demo code for MATLAB follows. It could use tweaking to clean up the result better, make it more generalizable, and make it run faster. (Sometime when it's not 2:30 AM.)
% read in and invert the image
im = 255 - imread('maze.jpg');
% sharpen it to address small fuzzy channels
% threshold to binary 15%
% run connected components
result = bwlabel(im2bw(imfilter(im,fspecial('unsharp')),0.15));
% purge small components (e.g. letters)
for i = 1:max(reshape(result,1,1002*800))
    [count,~] = size(find(result==i));
    if count < 500
        result(result==i) = 0;
    end
end
% close dead-end channels
closed = zeros(1002,800);
for i = 1:max(reshape(result,1,1002*800))
    k = zeros(1002,800);
    k(result==i) = 1; k = imclose(k,strel('square',8));
    closed(k==1) = i;
end
% do output
out = 255 - im;
for x = 1:1002
    for y = 1:800
        if closed(x,y) == 0
            out(x,y,:) = 0;
        end
    end
end
imshow(out);
Here you go: maze-solver-python (GitHub)
I had fun playing around with this and extended on Joseph Kern's answer. Not to detract from it; I just made some minor additions for anyone else who may be interested in playing around with this.
It's a python-based solver which uses BFS to find the shortest path. My main additions, at the time, are:
The image is cleaned before the search (ie. convert to pure black & white)
Automatically generate a GIF.
Automatically generate an AVI.
As it stands, the start/end-points are hard-coded for this sample maze, but I plan on extending it such that you can pick the appropriate pixels.
This uses a queue for a threshold-based continuous fill. It pushes the pixel to the left of the entrance onto the queue and then starts the loop. If a queued pixel is dark enough, it is colored light gray (above the threshold), and all its neighbors are pushed onto the queue.
from PIL import Image
img = Image.open("/tmp/in.jpg")
(w,h) = img.size
scan = [(394,23)]
while(len(scan) > 0):
    (i,j) = scan.pop()
    (r,g,b) = img.getpixel((i,j))
    if(r*g*b < 9000000):
        img.putpixel((i,j),(210,210,210))
        for x in [i-1,i,i+1]:
            for y in [j-1,j,j+1]:
                scan.append((x,y))
img.save("/tmp/out.png")
Solution is the corridor between gray wall and colored wall. Note this maze has multiple solutions. Also, this merely appears to work.
I'd go for the matrix-of-bools option. If you find that standard Python lists are too inefficient for this, you could use a numpy.bool array instead. Storage for a 1000x1000 pixel maze is then just 1 MB.
Don't bother with creating any tree or graph data structures. That's just a way of thinking about it, but not necessarily a good way to represent it in memory; a boolean matrix is both easier to code and more efficient.
Then use the A* algorithm to solve it. For the distance heuristic, use the Manhattan distance (distance_x + distance_y).
Represent nodes by a tuple of (row, column) coordinates. Whenever the algorithm (Wikipedia pseudocode) calls for "neighbours", it's a simple matter of looping over the four possible neighbours (mind the edges of the image!).
If you find that it's still too slow, you could try downscaling the image before you load it. Be careful not to lose any narrow paths in the process.
Maybe it's possible to do a 1:2 downscaling in Python as well, checking that you don't actually lose any possible paths. An interesting option, but it needs a bit more thought.
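A minimal sketch of the boolean-matrix representation and neighbour loop suggested above (the file name and threshold are placeholders; True means a walkable pixel):
import numpy as np
from PIL import Image

img = np.asarray(Image.open('maze.png').convert('L'))
walkable = img > 128

def neighbours(node):
    # the four von Neumann neighbours, minding the edges of the image
    r, c = node
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < walkable.shape[0] and 0 <= nc < walkable.shape[1] and walkable[nr, nc]:
            yield (nr, nc)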
Here are some ideas.
(1. Image Processing:)
1.1 Load the image as RGB pixel map. In C# it is trivial using system.drawing.bitmap. In languages with no simple support for imaging, just convert the image to portable pixmap format (PPM) (a Unix text representation, produces large files) or some simple binary file format you can easily read, such as BMP or TGA. ImageMagick in Unix or IrfanView in Windows.
1.2 You may, as mentioned earlier, simplify the data by taking the (R+G+B)/3 for each pixel as an indicator of gray tone and then threshold the value to produce a black and white table. Something close to 200 assuming 0=black and 255=white will take out the JPEG artifacts.
(2. Solutions:)
2.1 Depth-First Search: initialize an empty stack with the starting location, collect the available follow-up moves, pick one at random and push it onto the stack, and proceed until the end is reached or you hit a dead end. On a dead end, backtrack by popping the stack; you need to keep track of which positions have been visited on the map, so that when you collect available moves you never take the same path twice. Very interesting to animate (a minimal sketch follows after this list).
2.2 Breadth-First Search: mentioned before; similar to the above, but using queues. Also interesting to animate. This works like flood fill in image editing software. I think you may be able to solve a maze in Photoshop using this trick.
2.3 Wall Follower: geometrically speaking, a maze is a folded/convoluted tube. If you keep your hand on the wall you will eventually find the exit ;) This does not always work. There are certain assumptions re: perfect mazes, etc.; for instance, certain mazes contain islands. Do look it up; it is fascinating.
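A minimal sketch of idea 2.1 (an iterative DFS; grid is assumed to be a 2D array of booleans with True for walkable cells, and start/end are (row, col) tuples; neighbours are pushed in a fixed order rather than at random):
def dfs(grid, start, end):
    stack, visited = [start], {start}
    parent = {start: None}
    while stack:
        cell = stack.pop()
        if cell == end:
            # walk back through the parents to rebuild the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] and nxt not in visited):
                visited.add(nxt)
                parent[nxt] = cell
                stack.append(nxt)
    return None  # no path found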
(3. Comments:)
This is the tricky one. It is easy to solve mazes if they are represented in some simple array format, with each element being a cell type with north, east, south and west walls and a visited flag field. However, given that you are trying to do this from a hand-drawn sketch, it becomes messy. I honestly think that trying to rationalize the sketch will drive you nuts. This is akin to computer vision problems, which are fairly involved. Perhaps going directly onto the image map may be easier, yet more wasteful.
Here's a solution using R.
### download the image, read it into R, converting to something we can play with...
library(jpeg)
url <- "https://i.stack.imgur.com/TqKCM.jpg"
download.file(url, "./maze.jpg", mode = "wb")
jpg <- readJPEG("./maze.jpg")
### reshape array into data.frame
library(reshape2)
img3 <- melt(jpg, varnames = c("y","x","rgb"))
img3$rgb <- as.character(factor(img3$rgb, levels = c(1,2,3), labels=c("r","g","b")))
## split out rgb values into separate columns
img3 <- dcast(img3, x + y ~ rgb)
RGB to greyscale, see: https://stackoverflow.com/a/27491947/2371031
# convert rgb to greyscale (0, 1)
img3$v <- img3$r*.21 + img3$g*.72 + img3$b*.07
# v: values closer to 1 are white, closer to 0 are black
## strategically fill in some border pixels so the solver doesn't "go around":
img3$v2 <- img3$v
img3[(img3$x == 300 | img3$x == 500) & (img3$y %in% c(0:23,988:1002)),"v2"] = 0
# define some start/end point coordinates
pts_df <- data.frame(x = c(398, 399),
y = c(985, 26))
# set a reference value as the mean of the start and end point greyscale "v"s
ref_val <- mean(c(subset(img3, x==pts_df[1,1] & y==pts_df[1,2])$v,
subset(img3, x==pts_df[2,1] & y==pts_df[2,2])$v))
library(sp)
library(gdistance)
spdf3 <- SpatialPixelsDataFrame(points = img3[c("x","y")], data = img3["v2"])
r3 <- rasterFromXYZ(spdf3)
# transition layer defines a "conductance" function between any two points, and the number of connections (4 = Manhatten distances)
# x in the function represents the greyscale values ("v2") of two adjacent points (pixels), i.e., = (x1$v2, x2$v2)
# make function(x) encourages transitions between cells with small changes in greyscale compared to the reference values, such that:
# when v2 is closer to 0 (black) = poor conductance
# when v2 is closer to 1 (white) = good conductance
tl3 <- transition(r3, function(x) (1/max( abs( (x/ref_val)-1 ) )^2)-1, 4)
## get the shortest path between start, end points
sPath3 <- shortestPath(tl3, as.numeric(pts_df[1,]), as.numeric(pts_df[2,]), output = "SpatialLines")
## fortify for ggplot
sldf3 <- fortify(SpatialLinesDataFrame(sPath3, data = data.frame(ID = 1)))
# plot the image greyscale with start/end points (red) and shortest path (green)
ggplot(img3) +
geom_raster(aes(x, y, fill=v2)) +
scale_fill_continuous(high="white", low="black") +
scale_y_reverse() +
geom_point(data=pts_df, aes(x, y), color="red") +
geom_path(data=sldf3, aes(x=long, y=lat), color="green")
Voila!
This is what happens if you don't fill in some border pixels (Ha!)...
Full disclosure: I asked and answered a very similar question myself before I found this one. Then through the magic of SO, found this one as one of the top "Related Questions". I thought I'd use this maze as an additional test case... I was very pleased to find that my answer there also works for this application with very little modification.
A good solution would be, instead of finding the neighbors pixel by pixel, to do it cell by cell, because a corridor can be 15 px wide, so within the same corridor the solver can take actions like left or right, whereas if the displacement were treated as one cell per move it would be a simple action like UP, DOWN, LEFT or RIGHT.
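A minimal sketch of that idea (cell_size is a guess at the corridor width, e.g. around 15 px, and free_pixels is assumed to be a boolean array with True for walkable pixels): collapse the pixel grid into a coarser cell grid and run the search on that, so one move really means one corridor step.
import numpy as np

def to_cells(free_pixels, cell_size=15):
    h, w = free_pixels.shape
    cells = np.zeros((h // cell_size, w // cell_size), dtype=bool)
    for r in range(cells.shape[0]):
        for c in range(cells.shape[1]):
            block = free_pixels[r * cell_size:(r + 1) * cell_size,
                                c * cell_size:(c + 1) * cell_size]
            cells[r, c] = block.mean() > 0.5   # a mostly-white block counts as open
    return cells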