Representing and solving a maze given an image - python
What is the best way to represent and solve a maze given an image?
Given a JPEG image (as seen above), what's the best way to read it in, parse it into some data structure, and solve the maze? My first instinct is to read the image in pixel by pixel and store it in a list (array) of boolean values: True for a white pixel, and False for a non-white pixel (the colours can be discarded). The issue with this method is that the image may not be "pixel perfect". By that I simply mean that if there is a white pixel somewhere on a wall, it may create an unintended path.
Another method (which came to me after a bit of thought) is to convert the image to an SVG file, which is a list of paths drawn on a canvas. This way, the paths could be read into the same sort of list (of boolean values), with True indicating a drawn path (wall) and False indicating traversable space. An issue with this method arises if the conversion is not 100% accurate and does not fully connect all of the walls, creating gaps.
Also an issue with converting to SVG is that the lines are not "perfectly" straight, so the paths end up as cubic Bézier curves. With a list (array) of boolean values indexed by integers, the curves would not transfer easily: all the points that lie on a curve would have to be calculated, and they won't exactly match list indices.
I assume that while one of these methods may work (though probably not), they are woefully inefficient given such a large image, and that there exists a better way. How is this best done (most efficiently and/or with the least complexity)? Is there even a best way?
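For illustration, here is a minimal sketch of that pixel-by-pixel reading (assuming Pillow and numpy; "maze.png" is a placeholder filename):

from PIL import Image
import numpy as np

img = np.asarray(Image.open("maze.png").convert("RGB"))
walkable = (img > 240).all(axis=2)  # True for (approximately) white pixels

Using an approximate threshold rather than exact equality to white absorbs some of the JPEG noise mentioned above.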
Then comes the solving of the maze. If I use either of the first two methods, I will essentially end up with a matrix. According to this answer, a good way to represent a maze is using a tree, and a good way to solve it is using the A* algorithm. How would one create a tree from the image? Any ideas?
TL;DR
Best way to parse? Into what data structure? How would said structure help/hinder solving?
UPDATE
I've tried my hand at implementing what @Mikhail has written in Python, using numpy, as @Thomas recommended. I feel that the algorithm is correct, but it's not working as hoped. (Code below.) The PNG library is PyPNG.
import png, numpy, Queue, operator, itertools

def is_white(coord, image):
    """ Returns whether (x, y) is approx. a white pixel."""
    a = True
    for i in xrange(3):
        if not a: break
        a = image[coord[1]][coord[0] * 3 + i] > 240
    return a

def bfs(s, e, i, visited):
    """ Perform a breadth-first search. """
    frontier = Queue.Queue()
    while s != e:
        for d in [(-1, 0), (0, -1), (1, 0), (0, 1)]:
            np = tuple(map(operator.add, s, d))
            if is_white(np, i) and np not in visited:
                frontier.put(np)
        visited.append(s)
        s = frontier.get()
    return visited

def main():
    r = png.Reader(filename = "thescope-134.png")
    rows, cols, pixels, meta = r.asDirect()
    assert meta['planes'] == 3 # ensure the file is RGB
    image2d = numpy.vstack(itertools.imap(numpy.uint8, pixels))
    start, end = (402, 985), (398, 27)
    print bfs(start, end, image2d, [])
Here is a solution.
Convert the image to grayscale (not yet binary), adjusting the weights for the colors so that the final grayscale image is approximately uniform. You can do this simply by controlling the sliders in Photoshop in Image -> Adjustments -> Black & White.
Convert the image to binary by setting an appropriate threshold in Photoshop in Image -> Adjustments -> Threshold.
Make sure the threshold is selected correctly. Use the Magic Wand Tool with 0 tolerance, point sample, contiguous, no anti-aliasing. Check that the edges at which the selection breaks are not false edges introduced by a wrong threshold. In fact, all interior points of this maze are accessible from the start.
Add artificial borders to the maze to make sure the virtual traveler will not walk around it :)
Implement breadth-first search (BFS) in your favorite language and run it from the start. I prefer MATLAB for this task. As @Thomas already mentioned, there is no need to mess with a regular representation of graphs; you can work with the binarized image directly.
Here is the MATLAB code for BFS:
function path = solve_maze(img_file)
  %% Init data
  img = imread(img_file);
  img = rgb2gray(img);
  maze = img > 0;
  start = [985 398];
  finish = [26 399];

  %% Init BFS
  n = numel(maze);
  Q = zeros(n, 2);
  M = zeros([size(maze) 2]);
  front = 0;
  back = 1;

  function push(p, d)
    q = p + d;
    if maze(q(1), q(2)) && M(q(1), q(2), 1) == 0
      front = front + 1;
      Q(front, :) = q;
      M(q(1), q(2), :) = reshape(p, [1 1 2]);
    end
  end

  push(start, [0 0]);
  d = [0 1; 0 -1; 1 0; -1 0];

  %% Run BFS
  while back <= front
    p = Q(back, :);
    back = back + 1;
    for i = 1:4
      push(p, d(i, :));
    end
  end

  %% Extracting path
  path = finish;
  while true
    q = path(end, :);
    p = reshape(M(q(1), q(2), :), 1, 2);
    path(end + 1, :) = p;
    if isequal(p, start)
      break;
    end
  end
end
It is really very simple and standard; there should be no difficulty implementing this in Python or whatever.
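For reference, a rough Python translation of the MATLAB above (a sketch, assuming numpy and Pillow; it presumes the image is already binarized and, like the MATLAB, that the maze has a solid wall border so the search never leaves the image):

from collections import deque
from PIL import Image
import numpy as np

img = np.asarray(Image.open("maze.png").convert("L"))
maze = img > 0                        # binarized image: True = walkable
start, finish = (985, 398), (26, 399) # (row, col), as in the MATLAB listing

parent = {start: None}
q = deque([start])
while q:
    r, c = q.popleft()
    for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        nxt = (r + dr, c + dc)
        if maze[nxt] and nxt not in parent:
            parent[nxt] = (r, c)
            q.append(nxt)

path = [finish]                       # walk the parent pointers back to start
while path[-1] != start:
    path.append(parent[path[-1]])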
And here is the answer:
This solution is written in Python. Thanks Mikhail for the pointers on the image preparation.
An animated Breadth-First Search:
The Completed Maze:
#!/usr/bin/env python
import sys
from Queue import Queue
from PIL import Image

start = (400, 984)
end = (398, 25)

def iswhite(value):
    if value == (255, 255, 255):
        return True

def getadjacent(n):
    x, y = n
    return [(x-1, y), (x, y-1), (x+1, y), (x, y+1)]

def BFS(start, end, pixels):
    queue = Queue()
    queue.put([start]) # Wrapping the start tuple in a list
    while not queue.empty():
        path = queue.get()
        pixel = path[-1]
        if pixel == end:
            return path
        for adjacent in getadjacent(pixel):
            x, y = adjacent
            if iswhite(pixels[x, y]):
                pixels[x, y] = (127, 127, 127) # see note
                new_path = list(path)
                new_path.append(adjacent)
                queue.put(new_path)
    print "Queue has been exhausted. No answer was found."

if __name__ == '__main__':
    # invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
    base_img = Image.open(sys.argv[1])
    base_pixels = base_img.load()
    path = BFS(start, end, base_pixels)
    path_img = Image.open(sys.argv[1])
    path_pixels = path_img.load()
    for position in path:
        x, y = position
        path_pixels[x, y] = (255, 0, 0) # red
    path_img.save(sys.argv[2])
Note: This marks a visited white pixel grey. It removes the need for a visited list, but it requires a second load of the image file from disk before drawing the path (if you don't want a composite image of the final path and ALL the paths taken).
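If you would rather not mutate the image at all, a visited set gives the same de-duplication; here is a sketch reusing the helpers from the listing above:

def BFS_with_visited(start, end, pixels):
    # Same search as above, but tracks visited pixels in a set instead of
    # recoloring them, so the image only needs to be loaded once.
    queue = Queue()
    queue.put([start])
    visited = {start}
    while not queue.empty():
        path = queue.get()
        if path[-1] == end:
            return path
        for adjacent in getadjacent(path[-1]):
            x, y = adjacent
            if adjacent not in visited and iswhite(pixels[x, y]):
                visited.add(adjacent)
                queue.put(path + [adjacent])
    print "Queue has been exhausted. No answer was found."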
A blank version of the maze I used.
I tried implementing A-Star search for this problem myself. I closely followed the implementation by Joseph Kern for the framework, together with the algorithm pseudocode given here:
def AStar(start, goal, neighbor_nodes, distance, cost_estimate):
    def reconstruct_path(came_from, current_node):
        path = []
        while current_node is not None:
            path.append(current_node)
            current_node = came_from[current_node]
        return list(reversed(path))

    g_score = {start: 0}
    f_score = {start: g_score[start] + cost_estimate(start, goal)}
    openset = {start}
    closedset = set()
    came_from = {start: None}

    while openset:
        current = min(openset, key=lambda x: f_score[x])
        if current == goal:
            return reconstruct_path(came_from, goal)
        openset.remove(current)
        closedset.add(current)
        for neighbor in neighbor_nodes(current):
            if neighbor in closedset:
                continue
            if neighbor not in openset:
                openset.add(neighbor)
            tentative_g_score = g_score[current] + distance(current, neighbor)
            if tentative_g_score >= g_score.get(neighbor, float('inf')):
                continue
            came_from[neighbor] = current
            g_score[neighbor] = tentative_g_score
            f_score[neighbor] = tentative_g_score + cost_estimate(neighbor, goal)
    return []
As A-Star is a heuristic search algorithm, you need to come up with a function that estimates the remaining cost (here: distance) to the goal. Unless you're comfortable with a suboptimal solution, it should not overestimate the cost. A conservative choice here would be the Manhattan (or taxicab) distance, as this gives exactly the distance between two points on the grid for the Von Neumann (4-connected) neighborhood used, and therefore never overestimates the cost.
This would, however, significantly underestimate the actual cost for the given maze at hand. Therefore I've added two other distance metrics for comparison: squared Euclidean distance, and the Manhattan distance multiplied by four. These may overestimate the actual cost and might therefore yield suboptimal results.
Here's the code:
import sys
from PIL import Image

def is_blocked(p):
    x, y = p
    pixel = path_pixels[x, y]
    if any(c < 225 for c in pixel):
        return True

def von_neumann_neighbors(p):
    x, y = p
    neighbors = [(x-1, y), (x, y-1), (x+1, y), (x, y+1)]
    return [p for p in neighbors if not is_blocked(p)]

def manhattan(p1, p2):
    return abs(p1[0]-p2[0]) + abs(p1[1]-p2[1])

def squared_euclidean(p1, p2):
    return (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2

start = (400, 984)
goal = (398, 25)

# invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
path_img = Image.open(sys.argv[1])
path_pixels = path_img.load()

distance = manhattan
heuristic = manhattan

path = AStar(start, goal, von_neumann_neighbors, distance, heuristic)

for position in path:
    x, y = position
    path_pixels[x, y] = (255, 0, 0) # red

path_img.save(sys.argv[2])
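The "Manhattan distance multiplied by four" metric compared below does not appear in the listing; it is just a scaled variant (manhattan_times_four is a name I've made up):

def manhattan_times_four(p1, p2):
    return 4 * manhattan(p1, p2)  # inadmissible: may overestimate the cost

Select it with heuristic = manhattan_times_four before calling AStar.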
Here are some images for a visualization of the results (inspired by the one posted by Joseph Kern). The animations show a new frame after every 10,000 iterations of the main while-loop.
Breadth-First Search:
A-Star Manhattan Distance:
A-Star Squared Euclidean Distance:
A-Star Manhattan Distance multiplied by four:
The results show that the regions of the maze explored differ considerably depending on the heuristic being used. Indeed, squared Euclidean distance even produces a different (suboptimal) path than the other metrics.
Concerning the performance of the A-Star algorithm in terms of runtime until termination, note that the many evaluations of the distance and cost functions add up compared to Breadth-First Search (BFS), which only needs to evaluate the "goaliness" of each candidate position. Whether the cost of these additional function evaluations (A-Star) outweighs the cost of the larger number of nodes to check (BFS), and especially whether performance is an issue for your application at all, is a matter of individual perception and of course cannot be answered in general.
One thing that can be said in general about whether an informed search algorithm (such as A-Star) is a better choice than an exhaustive search (e.g., BFS) is the following: with the number of dimensions of the maze, i.e., the branching factor of the search tree, the disadvantage of an exhaustive search grows exponentially. With growing complexity it becomes less and less feasible, and at some point you are pretty much happy with any result path, be it (approximately) optimal or not.
Tree search is too much. The maze is inherently separable along the solution path(s).
(Thanks to rainman002 from Reddit for pointing this out to me.)
Because of this, you can quickly use connected components to identify the connected sections of maze wall. This iterates over the pixels twice.
If you want to turn that into a nice diagram of the solution path(s), you can then use binary operations with structuring elements to fill in the "dead end" pathways for each connected region.
Demo code for MATLAB follows. It could use tweaking to clean up the result better, make it more generalizable, and make it run faster. (Sometime when it's not 2:30 AM.)
% read in and invert the image
im = 255 - imread('maze.jpg');

% sharpen it to address small fuzzy channels
% threshold to binary 15%
% run connected components
result = bwlabel(im2bw(imfilter(im, fspecial('unsharp')), 0.15));

% purge small components (e.g. letters)
for i = 1:max(reshape(result, 1, 1002*800))
  [count, ~] = size(find(result == i));
  if count < 500
    result(result == i) = 0;
  end
end

% close dead-end channels
closed = zeros(1002, 800);
for i = 1:max(reshape(result, 1, 1002*800))
  k = zeros(1002, 800);
  k(result == i) = 1;
  k = imclose(k, strel('square', 8));
  closed(k == 1) = i;
end

% do output
out = 255 - im;
for x = 1:1002
  for y = 1:800
    if closed(x, y) == 0
      out(x, y, :) = 0;
    end
  end
end
imshow(out);
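For those following along in Python, here is a rough analogue of the pipeline above (a sketch, assuming numpy, Pillow and scipy; the unsharp-mask step is omitted, and the threshold and size cutoff are carried over from the MATLAB as guesses):

import numpy as np
from scipy import ndimage
from PIL import Image

im = 255 - np.asarray(Image.open("maze.jpg").convert("L"))
binary = im > 0.15 * 255                   # threshold to binary 15%
labels, n = ndimage.label(binary)          # run connected components

for i in range(1, n + 1):                  # purge small components
    if (labels == i).sum() < 500:
        labels[labels == i] = 0

closed = np.zeros_like(labels)             # close dead-end channels
for i in np.unique(labels)[1:]:
    k = ndimage.binary_closing(labels == i, structure=np.ones((8, 8)))
    closed[k] = i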
Here you go: maze-solver-python (GitHub)
I had fun playing around with this and extended Joseph Kern's answer. Not to detract from it; I just made some minor additions for anyone else who may be interested in playing around with this.
It's a python-based solver which uses BFS to find the shortest path. My main additions, at the time, are:
The image is cleaned up before the search (i.e. converted to pure black & white).
Automatically generate a GIF.
Automatically generate an AVI.
As it stands, the start/end-points are hard-coded for this sample maze, but I plan on extending it such that you can pick the appropriate pixels.
This uses a queue for a threshold-based continuous fill. It pushes the pixel left of the entrance onto the queue and then starts the loop. If a queued pixel is dark enough, it's colored light gray (above the threshold), and all of its neighbors are pushed onto the queue.
from PIL import Image

img = Image.open("/tmp/in.jpg")
(w, h) = img.size
scan = [(394, 23)]
while(len(scan) > 0):
    (i, j) = scan.pop()
    (r, g, b) = img.getpixel((i, j))
    if(r * g * b < 9000000):
        img.putpixel((i, j), (210, 210, 210))
        for x in [i-1, i, i+1]:
            for y in [j-1, j, j+1]:
                scan.append((x, y))
img.save("/tmp/out.png")
The solution is the corridor between the gray wall and the colored wall. Note this maze has multiple solutions. Also, this merely appears to work.
I'd go for the matrix-of-bools option. If you find that standard Python lists are too inefficient for this, you could use a numpy.bool array instead. Storage for a 1000x1000 pixel maze is then just 1 MB.
Don't bother with creating any tree or graph data structures. That's just a way of thinking about it, but not necessarily a good way to represent it in memory; a boolean matrix is both easier to code and more efficient.
Then use the A* algorithm to solve it. For the distance heuristic, use the Manhattan distance (distance_x + distance_y).
Represent nodes by a tuple of (row, column) coordinates. Whenever the algorithm (Wikipedia pseudocode) calls for "neighbours", it's a simple matter of looping over the four possible neighbours (mind the edges of the image!).
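A sketch of that neighbour enumeration, with `maze` as the boolean matrix described above:

def neighbours(node, maze):
    # yield the four walkable neighbours, minding the edges of the image
    row, col = node
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < maze.shape[0] and 0 <= c < maze.shape[1] and maze[r, c]:
            yield (r, c)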
If you find that it's still too slow, you could try downscaling the image before you load it. Be careful not to lose any narrow paths in the process.
Maybe it's possible to do a 1:2 downscaling in Python as well, checking that you don't actually lose any possible paths. An interesting option, but it needs a bit more thought.
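One way to sketch such a 1:2 downscale (assuming `maze` is a boolean numpy array with even dimensions): OR-reducing each 2x2 block keeps narrow paths alive, but it can also erase one-pixel walls, so the result should be checked against the original.

import numpy as np

def downscale(maze):
    h, w = maze.shape
    blocks = maze.reshape(h // 2, 2, w // 2, 2)
    return blocks.any(axis=(1, 3))  # a cell is walkable if any source pixel is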
Here are some ideas.
1. Image Processing:
1.1 Load the image as an RGB pixel map. In C# this is trivial using System.Drawing.Bitmap. In languages with no simple support for imaging, just convert the image to portable pixmap format (PPM) (a Unix text representation that produces large files) or some simple binary file format you can easily read, such as BMP or TGA, using ImageMagick on Unix or IrfanView on Windows.
1.2 You may, as mentioned earlier, simplify the data by taking (R+G+B)/3 for each pixel as an indicator of gray tone and then thresholding the value to produce a black-and-white table. Something close to 200, assuming 0=black and 255=white, will take out the JPEG artifacts.
2. Solutions:
2.1 Depth-First Search: Initialize an empty stack with the starting location, collect the available follow-up moves, pick one at random and push it onto the stack; proceed until the end is reached or you hit a dead end. On a dead end, backtrack by popping the stack. You need to keep track of which positions have been visited on the map, so that when you collect available moves you never take the same path twice. Very interesting to animate. (A sketch follows at the end of this answer.)
2.2 Breadth-First Search: Mentioned before; similar to the above but using queues. Also interesting to animate. This works like flood-fill in image-editing software. I think you may be able to solve a maze in Photoshop using this trick.
2.3 Wall Follower: Geometrically speaking, a maze is a folded/convoluted tube. If you keep your hand on the wall, you will eventually find the exit ;) This does not always work. There are certain assumptions regarding perfect mazes, etc.; for instance, certain mazes contain islands. Do look it up; it is fascinating.
3. Comments:
This is the tricky one. It is easy to solve mazes if they are represented in some simple array format, with each element being a cell type with north, east, south and west walls and a visited flag field. However, given that you are trying to do this from a hand-drawn sketch, it becomes messy. I honestly think that trying to rationalize the sketch will drive you nuts. This is akin to computer vision problems, which are fairly involved. Going directly onto the image map may be easier, yet more wasteful.
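A sketch of idea 2.1 (depth-first search with an explicit stack); `walkable(p)` is a placeholder for whatever pixel test you end up with:

import random

def dfs(start, end, walkable):
    stack, visited = [start], {start}
    while stack:
        x, y = stack[-1]
        if (x, y) == end:
            return list(stack)                       # the stack holds the path
        moves = [p for p in ((x-1, y), (x+1, y), (x, y-1), (x, y+1))
                 if walkable(p) and p not in visited]
        if moves:
            nxt = random.choice(moves)               # pick one at random
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()                              # dead end: backtrack
    return None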
Here's a solution using R.
### download the image, read it into R, converting to something we can play with...
library(jpeg)
url <- "https://i.stack.imgur.com/TqKCM.jpg"
download.file(url, "./maze.jpg", mode = "wb")
jpg <- readJPEG("./maze.jpg")
### reshape array into data.frame
library(reshape2)
img3 <- melt(jpg, varnames = c("y","x","rgb"))
img3$rgb <- as.character(factor(img3$rgb, levels = c(1,2,3), labels=c("r","g","b")))
## split out rgb values into separate columns
img3 <- dcast(img3, x + y ~ rgb)
For RGB to greyscale conversion, see: https://stackoverflow.com/a/27491947/2371031
# convert rgb to greyscale (0, 1)
img3$v <- img3$r*.21 + img3$g*.72 + img3$b*.07
# v: values closer to 1 are white, closer to 0 are black
## strategically fill in some border pixels so the solver doesn't "go around":
img3$v2 <- img3$v
img3[(img3$x == 300 | img3$x == 500) & (img3$y %in% c(0:23,988:1002)),"v2"] = 0
# define some start/end point coordinates
pts_df <- data.frame(x = c(398, 399),
                     y = c(985, 26))
# set a reference value as the mean of the start and end point greyscale "v"s
ref_val <- mean(c(subset(img3, x==pts_df[1,1] & y==pts_df[1,2])$v,
                  subset(img3, x==pts_df[2,1] & y==pts_df[2,2])$v))
library(sp)
library(gdistance)
library(ggplot2)  # needed for fortify() and the plotting below
spdf3 <- SpatialPixelsDataFrame(points = img3[c("x","y")], data = img3["v2"])
r3 <- rasterFromXYZ(spdf3)
# transition layer defines a "conductance" function between any two points, and the number of connections (4 = Manhattan distances)
# x in the function represents the greyscale values ("v2") of two adjacent points (pixels), i.e., = (x1$v2, x2$v2)
# the function(x) below encourages transitions between cells with small changes in greyscale compared to the reference value, such that:
# when v2 is closer to 0 (black) = poor conductance
# when v2 is closer to 1 (white) = good conductance
tl3 <- transition(r3, function(x) (1/max( abs( (x/ref_val)-1 ) )^2)-1, 4)
## get the shortest path between start, end points
sPath3 <- shortestPath(tl3, as.numeric(pts_df[1,]), as.numeric(pts_df[2,]), output = "SpatialLines")
## fortify for ggplot
sldf3 <- fortify(SpatialLinesDataFrame(sPath3, data = data.frame(ID = 1)))
# plot the image greyscale with start/end points (red) and shortest path (green)
ggplot(img3) +
  geom_raster(aes(x, y, fill=v2)) +
  scale_fill_continuous(high="white", low="black") +
  scale_y_reverse() +
  geom_point(data=pts_df, aes(x, y), color="red") +
  geom_path(data=sldf3, aes(x=long, y=lat), color="green")
Voila!
This is what happens if you don't fill in some border pixels (Ha!)...
Full disclosure: I asked (and answered) a very similar question myself before I found this one. Then, through the magic of SO, I found this one as one of the top "Related Questions". I thought I'd use this maze as an additional test case, and was very pleased to find that my answer there also works for this application with very little modification.
A better solution would be to find neighbors by cell instead of by pixel. A corridor can be 15 px wide, so within the same corridor the agent can take actions like left or right, whereas if displacement were done cell by cell, each move would be a single simple action such as UP, DOWN, LEFT or RIGHT (a sketch follows below).
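A sketch of this cell idea, assuming the corridor width in pixels is known (CELL below is a guess based on the 15 px figure); a cell is taken as walkable if its centre pixel is walkable:

CELL = 15  # assumed corridor width in pixels

def to_cells(maze):
    # sample the centre pixel of each CELL x CELL block of the boolean maze
    h, w = maze.shape
    return maze[CELL // 2 : h : CELL, CELL // 2 : w : CELL]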
Related
Speed up computation for Distance Transform on Image in Python
I would like to find the distance transform of a binary image in the fastest way possible without using the scipy function distance_transform_edt(). The image is 256 by 256. The reason I don't want to use scipy is that using it is difficult in TensorFlow: every time I want to use this package I need to start a new session, and this takes a lot of time. So I would like to make a custom function that only utilizes numpy.
My approach is as follows: find the coordinates of all the ones and all the zeros in the image. Find the Euclidean distance between each of the zero pixels (a) and the one pixels (b); the value at each (a) position is then the minimum distance to a (b) pixel. I do this for each 0 pixel. The resultant image has the same dimensions as the original binary map. My attempt at doing this is shown below. I tried to do this as fast as possible, using no loops and only vectorization, but my function still can't work as fast as the scipy package can. When I timed the code, it looks like the assignment to the variable "a" is taking the longest, but I do not know if there is a way to speed this up. If anyone has any other suggestions for different algorithms to solve this problem of distance transforms, or can direct me to other implementations in Python, it would be very appreciated.

def get_dst_transform_img(og): # og is a numpy array of the original image
    ones_loc = np.where(og == 1)
    ones = np.asarray(ones_loc).T       # coords of all ones in og
    zeros_loc = np.where(og == 0)
    zeros = np.asarray(zeros_loc).T     # coords of all zeros in og

    a = -2 * np.dot(zeros, ones.T)
    b = np.sum(np.square(ones), axis=1)
    c = np.sum(np.square(zeros), axis=1)[:, np.newaxis]
    dists = a + b + c
    dists = np.sqrt(dists.min(axis=1))  # min dist of each zero pixel to a one pixel

    x = og.shape[0]
    y = og.shape[1]
    dist_transform = np.zeros((x, y))
    dist_transform[zeros[:, 0], zeros[:, 1]] = dists

    plt.figure()
    plt.imshow(dist_transform)
The implementation in the OP is a brute-force approach to the distance transform. This algorithm is O(n2), as it computes the distance from each background pixel to each foreground pixel. Furthermore, because of the way it is vectorized, it requires a lot of memory. On my computer it couldn't compute the distance transform of a 256x256 image without thrashing. Many other algorithms are described in the literature; below I'll discuss two O(n) algorithms.
Note: Typically, the distance transform is computed for object pixels (value 1) to the nearest background pixel (value 0). The code in the OP does the reverse, and so the code I've pasted below follows OP's convention, not the more common convention.
The easiest to implement, IMO, is the chamfer distance algorithm. This is a recursive algorithm that does two passes over the image: one left to right and top to bottom, and one right to left and bottom to top. In each pass, the distance computed for previous pixels is propagated. This algorithm can be implemented using integer distances or floating-point distances between neighbors. The latter yields smaller errors, of course. But in both cases the errors can be reduced significantly by increasing the number of neighbors queried in this propagation. The algorithm is older, but G. Borgefors analyzed it and proposed suitable neighbor distances (G. Borgefors, "Distance Transformations in Digital Images", Computer Vision, Graphics, and Image Processing 34:344-371, 1986).
Here is an implementation using 3-4 distance (the distance to edge-connected neighbors is 3, the distance to vertex-connected neighbors is 4):

def chamfer_distance(img):
    w, h = img.shape
    dt = np.zeros((w, h), np.uint32)
    # Forward pass
    x = 0
    y = 0
    if img[x, y] == 0:
        dt[x, y] = 65535  # some large value
    for x in range(1, w):
        if img[x, y] == 0:
            dt[x, y] = 3 + dt[x-1, y]
    for y in range(1, h):
        x = 0
        if img[x, y] == 0:
            dt[x, y] = min(3 + dt[x, y-1], 4 + dt[x+1, y-1])
        for x in range(1, w-1):
            if img[x, y] == 0:
                dt[x, y] = min(4 + dt[x-1, y-1], 3 + dt[x, y-1], 4 + dt[x+1, y-1], 3 + dt[x-1, y])
        x = w-1
        if img[x, y] == 0:
            dt[x, y] = min(4 + dt[x-1, y-1], 3 + dt[x, y-1], 3 + dt[x-1, y])
    # Backward pass
    for x in range(w-2, -1, -1):
        y = h-1
        if img[x, y] == 0:
            dt[x, y] = min(dt[x, y], 3 + dt[x+1, y])
    for y in range(h-2, -1, -1):
        x = w-1
        if img[x, y] == 0:
            dt[x, y] = min(dt[x, y], 3 + dt[x, y+1], 4 + dt[x-1, y+1])
        for x in range(1, w-1):
            if img[x, y] == 0:
                dt[x, y] = min(dt[x, y], 4 + dt[x+1, y+1], 3 + dt[x, y+1], 4 + dt[x-1, y+1], 3 + dt[x+1, y])
        x = 0
        if img[x, y] == 0:
            dt[x, y] = min(dt[x, y], 4 + dt[x+1, y+1], 3 + dt[x, y+1], 3 + dt[x+1, y])
    return dt

Note that a lot of the complication here is to avoid indexing out of bounds while still computing distances all the way to the edges of the image. If we simply skip the pixels around the border of the image, the code becomes much simpler. Because it is a recursive algorithm, it is not possible to vectorize its implementation. The Python code will not be very efficient, but the same thing programmed in C or the like will yield a very fast algorithm that gives a fairly good approximation to the Euclidean distance. OpenCV's cv.distanceTransform implements this algorithm.
Another very efficient algorithm computes the square of the distance transform. The square distance is separable (i.e. it can be computed independently for each axis and added). This leads to an algorithm that is easy to parallelize. For each image row, the algorithm does a forward and a backward pass. For each column in the result, the algorithm then does another forward and backward pass.
This process leads to an exact Euclidean distance transform. This algorithm was first proposed by R. van den Boomgaard in his Ph.D. thesis in 1992. Unfortunately this went unnoticed. The algorithm was then again proposed by A. Meijster, J.B.T.M. Roerdink and W.H. Hesselink (A General Algorithm for Computing Distance Transforms in Linear Time, Mathematical Morphology and its Applications to Image and Signal Processing, pp 331-340, 2002), and again by P. Felzenszwalb and D. Huttenlocher (Distance transforms of sampled functions, Technical report, Cornell University, 2004). This is the most efficient algorithm known, in part because it is the only one that can be easily and efficiently parallelized (computation on each image row, and later on each image column, is independent of other rows/columns). Unfortunately I don't have any Python code for this one to share, but you can find implementations online. For example OpenCV's cv.distanceTransform implements this algorithm, and DIPlib's dip.EuclideanDistanceTransform does too.
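For reference, here is a compact sketch of the separable algorithm, following Felzenszwalb and Huttenlocher's published pseudocode and keeping the OP's convention (distances of the zero pixels to the nearest one pixel). This is an illustration, not tested against the implementations cited above:

import numpy as np

def dt1d(f):
    # exact 1D squared-distance transform of a sampled function f
    # (lower envelope of parabolas, Felzenszwalb & Huttenlocher)
    n = len(f)
    d = np.zeros(n)
    v = np.zeros(n, dtype=int)  # locations of parabolas in the lower envelope
    z = np.zeros(n + 1)         # boundaries between parabolas
    k = 0
    z[0], z[1] = -np.inf, np.inf
    for q in range(1, n):
        s = ((f[q] + q*q) - (f[v[k]] + v[k]*v[k])) / (2*q - 2*v[k])
        while s <= z[k]:
            k -= 1
            s = ((f[q] + q*q) - (f[v[k]] + v[k]*v[k])) / (2*q - 2*v[k])
        k += 1
        v[k] = q
        z[k], z[k+1] = s, np.inf
    k = 0
    for q in range(n):
        while z[k+1] < q:
            k += 1
        d[q] = (q - v[k])**2 + f[v[k]]
    return d

def edt2d(binary):
    # squared Euclidean distance of each 0 pixel to the nearest 1 pixel
    BIG = 1e18  # finite stand-in for infinity to keep the arithmetic valid
    f = np.where(binary == 1, 0.0, BIG)
    for x in range(f.shape[0]):   # pass over every row...
        f[x, :] = dt1d(f[x, :])
    for y in range(f.shape[1]):   # ...then over every column of the result
        f[:, y] = dt1d(f[:, y])
    return f  # squared distances; np.sqrt(f) gives the Euclidean transform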
Calculating the intersection area between two rectangles with axes not aligned
I want to calculate the intersection over union (IoU) between two rectangles whose axes are not aligned, but where the angle between the axes is smaller than 30 degrees. An approximate value is also sought. One possible solution is to check whether the angle between the two rectangles is less than 30 degrees and then rotate them so the axes are aligned; from there it is easy to calculate the IoU. Another possibility is to use Monte Carlo methods for the intersection (generate a point, find if the point is under some line of one rectangle and above some line of the other), but this seems expensive because I need to run this calculation a large number of times. I was hoping there is something better out there; maybe a geometry library, or maybe an algorithm from the computer vision folks. I am trying to learn grasping positions using deep neural networks. My algorithm should predict a bounding box (rectangle) for an object in an RGB image. For every image I also have the ground-truth bounding box (another rectangle). From these two rectangles I need the IoU. Any idea?
Since you're working in Python, I think the Shapely package would serve your needs.
There is quite an effective algorithm for calculating the intersection of two convex polygons, described in O'Rourke's book "Computational Geometry in C". C code is available at the book page (convconv). The algorithm traverses the edges of the first polygon, checking the orientations of the second polygon's vertices in order to detect intersections. When two consecutive vertices lie on different sides of an edge, an intersection occurs (there are a lot of tricky cases). An algorithm outline is here.
You can consider a number of numerical approaches, practically "rendering" the rectangles into some canvas(es) and traversing the pixels to compile your statistics. The size of the canvas should be the size of the bounding box for the entire scene; practically, you can find that by picking the minimum and maximum coordinates occurring on each axis.
1) The "most CG" approach: get a rendering library, render one rectangle in red and the other in transparent blue. Then visit each pixel: if it has a non-0 red component, it belongs to the first rectangle; if it has a non-0 blue component, it belongs to the second rectangle; and if it has both, it belongs to the intersection too. This approach is cheap to code, but requires both writing and reading the canvas even in the rendering phase, which is slower than just writing. This might even be done on a GPU, though I am not sure the setup costs and reading back the result would not outweigh the benefit for such a simple scene.
2) Another CG approach would be rendering into two arrays, preferably some 1-byte-per-pixel variant, for the sake of speed (you may have to go back in time a bit to find such dedicated rendering libraries). This way the renderer only writes, into one array per rectangle, and you read from the two arrays when compiling the statistics.
3) As writing and reading pixels takes time, you can take a shortcut, at the cost of more coding: convex shapes can be rendered by collecting the minimum and maximum coordinates per scanline and just filling between the two. If you do this yourself, you can spare the filling part and also the read-and-check-every-pixel step at the end. Build such a min-max list for both rectangles, and then you "just" have to check their relation/order for each scanline to recognize overlaps.
And then there is the mathematical way (this is not really useful; see the EDIT below): while it is unlikely that you would find a sane algorithm for calculating the intersection area specifically for rectangles, if you find such an algorithm for triangles, which is more probable, that is enough. Both rectangles can be split into two triangles, 1A+1B and 2A+2B respectively, and then you just have to run the algorithm 4 times: 1A-2A, 1A-2B, 1B-2A, 1B-2B, and sum the results; that is the area of your intersection.
EDIT: for the maths approach (though this also comes from graphics), I think https://en.wikipedia.org/wiki/Sutherland%E2%80%93Hodgman_algorithm can be applied here (as both rectangles are convex polygons, A-B and B-A should produce the same result) for finding the intersection polygon; the remaining task is then to calculate the area of that polygon (which is going to be convex, and then it is really easy).
I ended up using the Sutherland-Hodgman algorithm, implemented as these functions:

def clip(subjectPolygon, clipPolygon):
    def inside(p):
        return (cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])

    def computeIntersection():
        dc = [cp1[0] - cp2[0], cp1[1] - cp2[1]]
        dp = [s[0] - e[0], s[1] - e[1]]
        n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
        n2 = s[0] * e[1] - s[1] * e[0]
        n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
        return [(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3]

    outputList = subjectPolygon
    cp1 = clipPolygon[-1]
    for clipVertex in clipPolygon:
        cp2 = clipVertex
        inputList = outputList
        outputList = []
        s = inputList[-1]
        for subjectVertex in inputList:
            e = subjectVertex
            if inside(e):
                if not inside(s):
                    outputList.append(computeIntersection())
                outputList.append(e)
            elif inside(s):
                outputList.append(computeIntersection())
            s = e
        cp1 = cp2
    return outputList

def PolygonArea(corners):
    n = len(corners)  # number of corners
    area = 0.0
    for i in range(n):
        j = (i + 1) % n
        area += corners[i][0] * corners[j][1]
        area -= corners[j][0] * corners[i][1]
    area = abs(area) / 2.0
    return area

intersection = clip(rec1, rec2)
intersection_area = PolygonArea(intersection)
iou = intersection_area / (PolygonArea(rec1) + PolygonArea(rec2) - intersection_area)

Another, slower method (I don't know which algorithm it uses) could be:

from shapely.geometry import Polygon

p1 = Polygon(rec1)
p2 = Polygon(rec2)
inter_sec_area = p1.intersection(p2).area
iou = inter_sec_area / (p1.area + p2.area - inter_sec_area)

It is worth mentioning that in just one case with bigger polygons (not my case) the shapely module gave an area twice as large as the first method. I didn't test both methods intensively.
This might help: what about using the Pythagorean theorem? Since you have two rectangles, when they intersect you will have one or more triangles, each with one angle of 90°. Since you also know the angle between the two rectangles (20° in my example) and the coordinates of each rectangle, you can use the appropriate function (cos/sin/tan) to find the lengths of all the edges of the triangles. I hope this can help.
Optimizations for Polygon Intersection Finding
I have ~1 billion 2D polygons, each with between 5 and 2000 vertices, laid out on a geographical map. I want to generate cutout masks of these polygons in the bounded range of a quadrangle, which is a square portion of the map. I have a naively-coded solution which collects all the polygons that overlap the quadrangle bounds, then creates a mask and draws each of the overlapping polygons onto it. The relevant code:

def generate_alteration_layers_by_extent(alterations, poly_locators, extents, output_res=1000):
    left, right, upper, lower = extents
    query_rectangle = pysal.cg.shapes.Rectangle(left, lower, right, upper)
    num_layers = len(shapes_by_alterations)
    boxed_alteration_layers = []
    for i, (alteration, poly_locator) in enumerate(zip(alterations, poly_locators)):
        print "computing overlapping polygons"
        overlapping_polys = poly_locator.overlapping(query_rectangle)
        print "drawing polygons onto mask"
        img = Image.new('L', (output_res, output_res), 0)
        polys = [np.array(map(list, poly.vertices)) for poly in overlapping_polys]
        for poly in polys:
            # normalize to the dimensions of img
            poly[:,0] -= left; poly[:,0] *= (right-left); poly[:,0] *= output_res
            poly[:,1] -= lower; poly[:,1] *= (upper-lower); poly[:,1] *= output_res
            ImageDraw.Draw(img).polygon(poly, outline=1, fill=1)
        mask = np.array(img)
        boxed_alteration_layers.append(mask)
    return np.dstack(boxed_alteration_layers)

The problem is that this computation takes far too long: I have run the program for three hours for one mask on a large-memory machine and it has still not finished. The most expensive operation is computing the overlapping polygons. I think the code could be sped up by looking over a smaller domain of polygons, excluding from the search those that definitely cannot overlap. Note that the poly_locator variables are pysal.cg.locators.PolygonLocators. Any advice would be appreciated. I'm not hell-bent on using Python and think that C++ might have the functionality I need.
I strongly recommend the Boost.Polygon library by Lucanus Simonson. It handles all sorts of polygon interactions, with a surprising reduction in computational complexity. Simonson's innovative calculus for reducing the dimensionality of the 2-D problems is really neat; it left my jaw hanging open for the last 20 minutes or so of his initial proposal.
How to find rotation angle of a stabilized video frame on Matlab
Consider the following stabilized video frame, where stabilization is done by rotation and translation only (no scaling): As seen in the image, the right-hand side of the image is a mirror of the preceding pixels, i.e. the black region left after rotation is filled by symmetry. I added a red line to indicate it more clearly. I'd like to find the rotation angle, which I will use later on. I could have done this via SURF or SIFT features; however, in the real scenario I won't have the original frame. I could probably find the angle by brute force, but I wonder if there is a better and more elegant solution. Note that the intensity values of the symmetric part are not precisely the same as the original part. I've checked some values: for example, the upper-right pixel of the V character on the keyboard is [51 49 47] in the original part but [50 50 47] in the symmetric copy, which means corresponding pixels are not guaranteed to have the same RGB values. I'll implement this in Matlab or Python, and the video stabilization is done using ffmpeg. EDIT: I only have the stabilized video; I don't have access to the original video or the files produced by ffmpeg. Any help/suggestion is appreciated.
A pixel (probably) lies on the searched symmetry line if:
its (first/second/third/...) left and right points are equal (=> dG, Figure 1 left), and
its (first/second/third/...) left (or right) value is different (=> dGs, Figure 1 middle).
So, the points of interest are characterised by high values of |dGs| - |dG| (=> dGs_dG, Figure 1 right). As can be seen in the right image of Figure 1, a lot of false positives still exist. Therefore, the Hough transform (Figure 2 left) is used to detect all the points corresponding to the strongest line (Figure 2 right). The green line is indeed the searched line.
Tuning:
Changing n: higher values will discard more false positives, but also exclude n border pixels. This can be avoided by using a lower n for the border pixels.
Changing thresholds: a higher threshold on dGs_dG will discard more false positives. Discarding high values of dG may also be interesting, to discard edge locations in the original image.
A priori knowledge of the symmetry line: using the definition of the Hough transform, you can discard all lines passing through the centre part of the image.
The Matlab code used to generate the images is:

I = imread('bnuqb.png');
G = int16(rgb2gray(I));

n = 3; % use the first, second and third left/right point
dG = int16(zeros(size(G) - [0 2*n+2]));
dGs = int16(zeros(size(G) - [0 2*n+2]));
for i=0:n
  dG = dG + abs(G(:, 1+n-i:end-2-n-i) - G(:, 3+n+i:end-n+i));
  dGs = dGs + abs(G(:, 1+n-i:end-2-n-i) - G(:, 2+n:end-n-1));
end
dGs_dG = dGs - dG;
dGs_dG(dGs_dG < 0) = 0;

figure
subplot(1,3,1); imshow(dG, [])
subplot(1,3,2); imshow(dGs, [])
subplot(1,3,3); imshow(dGs_dG, [])

BW = dGs_dG > 0;
[H,theta,rho] = hough(BW);
P = houghpeaks(H,1);
lines = houghlines(BW,theta,rho,P,'FillGap',50000,'MinLength',7);

figure
subplot(1,2,1); imshow(H, [])
hold on
plot(P(:, 2),P(:, 1),'r.');
subplot(1,2,2); imshow(I(:, n+2:end-n-1, :))
hold on
max_len = 0;
for k = 1:length(lines)
  xy = [lines(k).point1; lines(k).point2];
  plot(xy(:,1),xy(:,2),'g');
end
Peak detection in a 2D array
I'm helping a veterinary clinic measure the pressure under a dog's paw. I use Python for my data analysis, and now I'm stuck trying to divide the paws into (anatomical) subregions. I made a 2D array of each paw, consisting of the maximal values for each sensor that was loaded by the paw over time. Here's an example of one paw, where I used Excel to draw the areas I want to 'detect': 2-by-2 boxes around the sensors with local maxima that together have the largest sum.
So I tried some experimenting and decided to simply look for the maxima of each column and row (I can't look in one direction due to the shape of the paw). This seems to 'detect' the location of the separate toes fairly well, but it also marks neighboring sensors. So what would be the best way to tell Python which of these maxima are the ones I want? Note: the 2x2 squares can't overlap, since they have to be separate toes! Also, I took 2x2 as a convenience; any more advanced solution is welcome, but I'm simply a human movement scientist, so I'm neither a real programmer nor a mathematician, so please keep it 'simple'. Here's a version that can be loaded with np.loadtxt.
Results: So I tried @jextee's solution (see the results below). As you can see, it works very well on the front paws, but less well on the hind legs. More specifically, it can't recognize the small peak that's the fourth toe. This is obviously inherent to the fact that the loop looks top-down towards the lowest value, without taking into account where it is. Would anyone know how to tweak @jextee's algorithm so that it might be able to find the 4th toe too? Since I haven't processed any other trials yet, I can't supply any other samples. But the data I gave before were the averages of each paw. This file is an array with the maximal data of 9 paws in the order they made contact with the plate. This image shows how they were spatially spread out over the plate.
Update: I have set up a blog for anyone interested, and I have set up a OneDrive with all the raw measurements. So to anyone requesting more data: more power to you!
New update: So after the help I got with my questions regarding paw detection and paw sorting, I was finally able to check the toe detection for every paw! Turns out, it doesn't work so well on anything but paws the size of the one in my own example. Of course, in hindsight, it's my own fault for choosing 2x2 so arbitrarily. Here's a nice example of where it goes wrong: a nail is being recognized as a toe, and the 'heel' is so wide it gets recognized twice! The paw is too large, so taking a 2x2 size with no overlap causes some toes to be detected twice. The other way around, in small dogs it often fails to find a 5th toe, which I suspect is caused by the 2x2 area being too large. After trying the current solution on all my measurements, I came to the staggering conclusion that for nearly all my small dogs it didn't find a 5th toe, and that in over 50% of the impacts for the large dogs it would find more! So clearly I need to change it. My own guess was changing the size of the neighborhood to something smaller for small dogs and larger for large dogs, but generate_binary_structure wouldn't let me change the size of the array. Therefore, I'm hoping that anyone else has a better suggestion for locating the toes, perhaps having the toe area scale with the paw size?
I detected the peaks using a local maximum filter. Here is the result on your first dataset of 4 paws: I also ran it on the second dataset of 9 paws and it worked as well. Here is how you do it:

import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp

# for some reason I had to reshape. Numpy ignored the shape header.
paws_data = np.loadtxt("paws.txt").reshape(4, 11, 14)

# getting a list of images
paws = [p.squeeze() for p in np.vsplit(paws_data, 4)]

def detect_peaks(image):
    """
    Takes an image and detects the peaks using the local maximum filter.
    Returns a boolean mask of the peaks (i.e. 1 when the pixel's value
    is the neighborhood maximum, 0 otherwise)
    """
    # define an 8-connected neighborhood
    neighborhood = generate_binary_structure(2, 2)

    # apply the local maximum filter; all pixels of maximal value
    # in their neighborhood are set to 1
    local_max = maximum_filter(image, footprint=neighborhood) == image
    # local_max is a mask that contains the peaks we are
    # looking for, but also the background.
    # In order to isolate the peaks we must remove the background from the mask.

    # we create the mask of the background
    background = (image == 0)

    # a little technicality: we must erode the background in order to
    # successfully subtract it from local_max, otherwise a line will
    # appear along the background border (artifact of the local maximum filter)
    eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)

    # we obtain the final mask, containing only peaks,
    # by removing the background from the local_max mask (xor operation)
    detected_peaks = local_max ^ eroded_background

    return detected_peaks

# applying the detection and plotting results
for i, paw in enumerate(paws):
    detected_peaks = detect_peaks(paw)
    pp.subplot(4, 2, (2*i+1))
    pp.imshow(paw)
    pp.subplot(4, 2, (2*i+2))
    pp.imshow(detected_peaks)

pp.show()

All you need to do afterwards is use scipy.ndimage.measurements.label on the mask to label all distinct objects; then you'll be able to play with them individually. Note that the method works well because the background is not noisy; if it were, you would detect a bunch of other unwanted peaks in the background. Another important factor is the size of the neighborhood: you will need to adjust it if the peak size changes (it should remain roughly proportional).
Solution
Data file: paw.txt. Source code:

from scipy import *
from operator import itemgetter

n = 5 # how many fingers are we looking for

d = loadtxt("paw.txt")
width, height = d.shape

# Create an array where every element is a sum of 2x2 squares.
fourSums = d[:-1,:-1] + d[1:,:-1] + d[1:,1:] + d[:-1,1:]

# Find positions of the fingers.
# Pair each sum with its position number (from 0 to width*height-1),
pairs = zip(arange(width*height), fourSums.flatten())

# Sort by descending sum value, filter overlapping squares
def drop_overlapping(pairs):
    no_overlaps = []
    def does_not_overlap(p1, p2):
        i1, i2 = p1[0], p2[0]
        r1, col1 = i1 / (width-1), i1 % (width-1)
        r2, col2 = i2 / (width-1), i2 % (width-1)
        return (max(abs(r1-r2), abs(col1-col2)) >= 2)
    for p in pairs:
        if all(map(lambda prev: does_not_overlap(p, prev), no_overlaps)):
            no_overlaps.append(p)
    return no_overlaps

pairs2 = drop_overlapping(sorted(pairs, key=itemgetter(1), reverse=True))

# Take the first n with the highest values
positions = pairs2[:n]

# Print results
print d, "\n"
for i, val in positions:
    row = i / (width-1)
    column = i % (width-1)
    print "sum = %f @ %d,%d (%d)" % (val, row, column, i)
    print d[row:row+2, column:column+2], "\n"

Output, without overlapping squares. It seems that the same areas are selected as in your example.
Some comments
The tricky part is to calculate the sums of all 2x2 squares. I assumed you need all of them, so there might be some overlap. I used slices to cut the first/last columns and rows from the original 2D array, and then overlaid them all together and calculated the sums. To understand it better, imagine a 3x3 array:

>>> a = arange(9).reshape(3,3) ; a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Then you can take its slices:

>>> a[:-1,:-1]
array([[0, 1],
       [3, 4]])
>>> a[1:,:-1]
array([[3, 4],
       [6, 7]])
>>> a[:-1,1:]
array([[1, 2],
       [4, 5]])
>>> a[1:,1:]
array([[4, 5],
       [7, 8]])

Now imagine you stack them one above the other and sum elements at the same positions. These sums will be exactly the sums over the 2x2 squares with the top-left corner in the corresponding position:

>>> sums = a[:-1,:-1] + a[1:,:-1] + a[:-1,1:] + a[1:,1:]; sums
array([[ 8, 12],
       [20, 24]])

When you have the sums over the 2x2 squares, you can use max to find the maximum, or sort, or sorted to find the peaks. To remember the positions of the peaks, I couple every value (the sum) with its ordinal position in the flattened array (see zip). Then I calculate the row/column position again when I print the results.
Notes
I allowed the 2x2 squares to overlap. The edited version filters out some of them, such that only non-overlapping squares appear in the results.
Choosing fingers (an idea)
Another problem is how to choose what is likely to be a finger out of all the peaks. I have an idea which may or may not work. I don't have time to implement it right now, so just pseudo-code. I noticed that if the front fingers stay on an almost perfect circle, the rear finger should be inside that circle. Also, the front fingers are more or less equally spaced. We may try to use these heuristic properties to detect the fingers.
Pseudo-code:

select the top N finger candidates (not too many, 10 or 12)
consider all possible combinations of 5 out of N (use itertools.combinations)
for each combination of 5 fingers:
    for each finger out of 5:
        fit the best circle to the remaining 4
        => position of the center, radius
        check if the selected finger is inside of the circle
        check if the remaining four are evenly spread
        (for example, consider angles from the center of the circle)
        assign some cost (penalty) to this selection of 4 peaks + a rear finger
        (consider, probably weighted: circle fitting error, whether the rear
        finger is inside, variance in the spreading of the front fingers,
        total intensity of the 5 peaks)
choose the combination of 4 peaks + a rear peak with the lowest penalty

This is a brute-force approach. If N is relatively small, then I think it is doable. For N=12, there are C(12,5) = 792 combinations, times 5 ways to select a rear finger, so 3960 cases to evaluate for every paw.
This is an image registration problem. The general strategy is:
Have a known example, or some kind of prior on the data.
Fit your data to the example, or fit the example to your data.
It helps if your data is roughly aligned in the first place.
Here's a rough and ready approach, "the dumbest thing that could possibly work": start with five toe coordinates in roughly the places you expect. With each one, iteratively climb to the top of the hill, i.e. given the current position, move to the maximum neighbouring pixel if its value is greater than the current pixel's. Stop when your toe coordinates have stopped moving. To counteract the orientation problem, you could have 8 or so initial settings for the basic directions (north, north-east, etc.). Run each one individually and throw away any results where two or more toes end up at the same pixel. I'll think about this some more, but this kind of thing is still being researched in image processing; there are no right answers!
Slightly more complex idea: (weighted) K-means clustering. It's not that bad. Start with five toe coordinates, but now these are "cluster centres". Then iterate until convergence:
Assign each pixel to the closest cluster (just make a list for each cluster).
Calculate the centre of mass of each cluster. For each cluster, this is: Sum(coordinate * intensity value) / Sum(intensity value).
Move each cluster to the new centre of mass.
This method will almost certainly give much better results, and you get the mass of each cluster, which may help in identifying the toes. (Again, you've specified the number of clusters up front. With clustering you have to specify the density one way or another: either choose the number of clusters, appropriate in this case, or choose a cluster radius and see how many you end up with. An example of the latter is mean-shift.)
Sorry about the lack of implementation details or other specifics. I would code this up but I've got a deadline. If nothing else has worked by next week, let me know and I'll give it a shot.
Using persistent homology to analyze your data set, I get the following result (click to enlarge): This is the 2D version of the peak detection method described in this SO answer. The above figure simply shows the 0-dimensional persistent homology classes sorted by persistence. I did upscale the original dataset by a factor of 2 using scipy.misc.imresize(). However, note that I considered the four paws as one dataset; splitting it into four would make the problem easier.
Methodology. The idea behind this is quite simple: consider the function graph of the function that assigns to each pixel its level. It looks like this: Now consider a water level at height 255 that continuously descends to lower levels. At local maxima islands pop up (birth). At saddle points two islands merge; we consider the lower island to be merged into the higher island (death). The so-called persistence diagram (of the 0-th dimensional homology classes, our islands) plots the death values of all islands over their birth values. The persistence of an island is then the difference between its birth and death level, i.e. the vertical distance of a dot from the grey main diagonal. The figure labels the islands by decreasing persistence. The very first picture shows the locations of the islands' births. This method not only gives the local maxima but also quantifies their "significance" by the above-mentioned persistence. One would then filter out all islands with too low a persistence. However, in your example every island (i.e., every local maximum) is a peak you are looking for. Python code can be found here.
This problem has been studied in some depth by physicists. There is a good implementation in ROOT. Look at the TSpectrum classes (especially TSpectrum2 for your case) and their documentation. References: M. Morhac et al.: Background elimination methods for multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 401 (1997) 113-132. M. Morhac et al.: Efficient one- and two-dimensional Gold deconvolution and its application to gamma-ray spectra decomposition. Nuclear Instruments and Methods in Physics Research A 401 (1997) 385-408. M. Morhac et al.: Identification of peaks in multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 443 (2000) 108-125. ...and for those who don't have access to a subscription to NIM: Spectrum.doc SpectrumDec.ps.gz SpectrumSrc.ps.gz SpectrumBck.ps.gz
I'm sure you have enough to go on by now, but I can't help but suggest using the k-means clustering method. k-means is an unsupervised clustering algorithm which will take your data (in any number of dimensions; I happen to do this in 3D) and arrange it into k clusters with distinct boundaries. It's nice here because you know exactly how many toes these canines (should) have. Additionally, it's implemented in Scipy, which is really nice (http://docs.scipy.org/doc/scipy/reference/cluster.vq.html). Here's an example of what it can do to spatially resolve 3D clusters: What you want to do is a bit different (2D and includes pressure values), but I still think you could give it a shot.
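A minimal sketch of that, assuming `paw` is one of the 2D pressure arrays; repeating each coordinate by its (integer) intensity is one crude way to weight the clustering so that high-pressure pixels pull the centres harder:

import numpy as np
from scipy.cluster.vq import kmeans2

ys, xs = np.nonzero(paw)
weights = paw[ys, xs].astype(int)
points = np.repeat(np.column_stack([ys, xs]).astype(float), weights, axis=0)
centres, labels = kmeans2(points, 5, minit='points')  # 5 toes expected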
Here is an idea: you calculate the (discrete) Laplacian of the image. I would expect it to be (negative and) large at maxima, in a way that is more dramatic than in the original images; thus, maxima could be easier to find. Here is another idea: if you know the typical size of the high-pressure spots, you can first smooth your image by convolving it with a Gaussian of the same size. This may give you simpler images to process.
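Both ideas are one-liners with scipy.ndimage (a sketch; the sigma values are guesses to be tuned to the spot size):

from scipy import ndimage

log = ndimage.gaussian_laplace(paw.astype(float), sigma=1.5)    # strongly negative at peaks
smoothed = ndimage.gaussian_filter(paw.astype(float), sigma=2.0) # pre-smoothed copy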
Just a couple of ideas off the top of my head: take the gradient (derivative) of the scan, see if that eliminates the false calls take the maximum of the local maxima You might also want to take a look at OpenCV, it's got a fairly decent Python API and might have some functions you'd find useful.
Thanks for the raw data! I'm on the train and this is as far as I've gotten (my stop is coming up). I massaged your txt file with regexps and have plopped it into an HTML page with some JavaScript for visualization. I'm sharing it here because some, like myself, might find it more readily hackable than Python. I think a good approach will be scale- and rotation-invariant, and my next step will be to investigate mixtures of Gaussians (each paw pad being the center of a Gaussian).

<html>
<head>
<script type="text/javascript" src="http://vis.stanford.edu/protovis/protovis-r3.2.js"></script>
<script type="text/javascript">
var heatmap = [[
[0,0,0,0,0,0,0,4,4,0,0,0,0],
[0,0,0,0,0,7,14,22,18,7,0,0,0],
[0,0,0,0,11,40,65,43,18,7,0,0,0],
[0,0,0,0,14,61,72,32,7,4,11,14,4],
[0,7,14,11,7,22,25,11,4,14,65,72,14],
[4,29,79,54,14,7,4,11,18,29,79,83,18],
[0,18,54,32,18,43,36,29,61,76,25,18,4],
[0,4,7,7,25,90,79,36,79,90,22,0,0],
[0,0,0,0,11,47,40,14,29,36,7,0,0],
[0,0,0,0,4,7,7,4,4,4,0,0,0]
],[
[0,0,0,4,4,0,0,0,0,0,0,0,0],
[0,0,11,18,18,7,0,0,0,0,0,0,0],
[0,4,29,47,29,7,0,4,4,0,0,0,0],
[0,0,11,29,29,7,7,22,25,7,0,0,0],
[0,0,0,4,4,4,14,61,83,22,0,0,0],
[4,7,4,4,4,4,14,32,25,7,0,0,0],
[4,11,7,14,25,25,47,79,32,4,0,0,0],
[0,4,4,22,58,40,29,86,36,4,0,0,0],
[0,0,0,7,18,14,7,18,7,0,0,0,0],
[0,0,0,0,4,4,0,0,0,0,0,0,0],
],[
[0,0,0,4,11,11,7,4,0,0,0,0,0],
[0,0,0,4,22,36,32,22,11,4,0,0,0],
[4,11,7,4,11,29,54,50,22,4,0,0,0],
[11,58,43,11,4,11,25,22,11,11,18,7,0],
[11,50,43,18,11,4,4,7,18,61,86,29,4],
[0,11,18,54,58,25,32,50,32,47,54,14,0],
[0,0,14,72,76,40,86,101,32,11,7,4,0],
[0,0,4,22,22,18,47,65,18,0,0,0,0],
[0,0,0,0,4,4,7,11,4,0,0,0,0],
],[
[0,0,0,0,4,4,4,0,0,0,0,0,0],
[0,0,0,4,14,14,18,7,0,0,0,0,0],
[0,0,0,4,14,40,54,22,4,0,0,0,0],
[0,7,11,4,11,32,36,11,0,0,0,0,0],
[4,29,36,11,4,7,7,4,4,0,0,0,0],
[4,25,32,18,7,4,4,4,14,7,0,0,0],
[0,7,36,58,29,14,22,14,18,11,0,0,0],
[0,11,50,68,32,40,61,18,4,4,0,0,0],
[0,4,11,18,18,43,32,7,0,0,0,0,0],
[0,0,0,0,4,7,4,0,0,0,0,0,0],
],[
[0,0,0,0,0,0,4,7,4,0,0,0,0],
[0,0,0,0,4,18,25,32,25,7,0,0,0],
[0,0,0,4,18,65,68,29,11,0,0,0,0],
[0,4,4,4,18,65,54,18,4,7,14,11,0],
[4,22,36,14,4,14,11,7,7,29,79,47,7],
[7,54,76,36,18,14,11,36,40,32,72,36,4],
[4,11,18,18,61,79,36,54,97,40,14,7,0],
[0,0,0,11,58,101,40,47,108,50,7,0,0],
[0,0,0,4,11,25,7,11,22,11,0,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
],[
[0,0,4,7,4,0,0,0,0,0,0,0,0],
[0,0,11,22,14,4,0,4,0,0,0,0,0],
[0,0,7,18,14,4,4,14,18,4,0,0,0],
[0,4,0,4,4,0,4,32,54,18,0,0,0],
[4,11,7,4,7,7,18,29,22,4,0,0,0],
[7,18,7,22,40,25,50,76,25,4,0,0,0],
[0,4,4,22,61,32,25,54,18,0,0,0,0],
[0,0,0,4,11,7,4,11,4,0,0,0,0],
],[
[0,0,0,0,7,14,11,4,0,0,0,0,0],
[0,0,0,4,18,43,50,32,14,4,0,0,0],
[0,4,11,4,7,29,61,65,43,11,0,0,0],
[4,18,54,25,7,11,32,40,25,7,11,4,0],
[4,36,86,40,11,7,7,7,7,25,58,25,4],
[0,7,18,25,65,40,18,25,22,22,47,18,0],
[0,0,4,32,79,47,43,86,54,11,7,4,0],
[0,0,0,14,32,14,25,61,40,7,0,0,0],
[0,0,0,0,4,4,4,11,7,0,0,0,0],
],[
[0,0,0,0,4,7,11,4,0,0,0,0,0],
[0,4,4,0,4,11,18,11,0,0,0,0,0],
[4,11,11,4,0,4,4,4,0,0,0,0,0],
[4,18,14,7,4,0,0,4,7,7,0,0,0],
[0,7,18,29,14,11,11,7,18,18,4,0,0],
[0,11,43,50,29,43,40,11,4,4,0,0,0],
[0,4,18,25,22,54,40,7,0,0,0,0,0],
[0,0,4,4,4,11,7,0,0,0,0,0,0],
],[
[0,0,0,0,0,7,7,7,7,0,0,0,0],
[0,0,0,0,7,32,32,18,4,0,0,0,0],
[0,0,0,0,11,54,40,14,4,4,22,11,0],
[0,7,14,11,4,14,11,4,4,25,94,50,7],
[4,25,65,43,11,7,4,7,22,25,54,36,7],
[0,7,25,22,29,58,32,25,72,61,14,7,0],
[0,0,4,4,40,115,68,29,83,72,11,0,0],
[0,0,0,0,11,29,18,7,18,14,4,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
]
];
</script>
</head>
<body>
<script type="text/javascript+protovis">
for (var a = 0; a < heatmap.length; a++) {
  var w = heatmap[a][0].length,
      h = heatmap[a].length;
  var vis = new pv.Panel()
      .width(w * 6)
      .height(h * 6)
      .strokeStyle("#aaa")
      .lineWidth(4)
      .antialias(true);
  vis.add(pv.Image)
      .imageWidth(w)
      .imageHeight(h)
      .image(pv.Scale.linear()
          .domain(0, 99, 100)
          .range("#000", "#fff", '#ff0a0a')
          .by(function(i, j) heatmap[a][j][i]));
  vis.render();
}
</script>
</body>
</html>
Physicist's solution: Define 5 paw markers identified by their positions X_i and initialize them with random positions. Define some energy function combining a reward for locating markers at the paws' positions with a penalty for overlap of markers; let's say:

E(X_i; S) = -Sum_i S(X_i) + alpha * Sum_{i<j} (|X_i - X_j| <= 2*sqrt(2) ? 1 : 0)

where S(X_i) is the mean force in the 2x2 square around X_i and alpha is a parameter to be tuned experimentally.

Now it's time to do some Metropolis-Hastings magic:
1. Select a random marker and move it by one pixel in a random direction.
2. Calculate dE, the difference in energy this move caused.
3. Get a uniform random number from 0-1 and call it r.
4. If dE < 0 or exp(-beta*dE) > r, accept the move and go to 1; if not, undo the move and go to 1.

This should be repeated until the markers converge on the paws. Beta controls the scanning-versus-optimizing tradeoff, so it should also be tuned experimentally; it can also be increased steadily with the time of the simulation (simulated annealing).
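A minimal sketch of this scheme, assuming S is a placeholder pressure frame and that alpha, beta, and the step count are all values to be tuned:

import numpy as np

rng = np.random.default_rng(0)
S = rng.random((10, 13))               # placeholder for the pressure frame
n_markers, alpha, beta = 5, 1.0, 5.0   # assumed tuning parameters

def mean_force(p):
    y, x = p
    return S[y:y+2, x:x+2].mean()      # mean force in the 2x2 square at X_i

def energy(X):
    reward = sum(mean_force(p) for p in X)
    overlap = sum(np.hypot(a[0]-b[0], a[1]-b[1]) <= 2*np.sqrt(2)
                  for i, a in enumerate(X) for b in X[i+1:])
    return -reward + alpha * overlap

moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# random initial marker positions (top-left corners of 2x2 squares)
X = [(int(rng.integers(S.shape[0]-1)), int(rng.integers(S.shape[1]-1)))
     for _ in range(n_markers)]
E = energy(X)
for _ in range(20000):
    i = int(rng.integers(n_markers))
    dy, dx = moves[int(rng.integers(4))]
    y, x = X[i][0] + dy, X[i][1] + dx
    if not (0 <= y < S.shape[0]-1 and 0 <= x < S.shape[1]-1):
        continue                        # reject moves that leave the frame
    old, X[i] = X[i], (y, x)
    dE = energy(X) - E
    if dE < 0 or np.exp(-beta*dE) > rng.random():
        E += dE                         # accept the move
    else:
        X[i] = old                      # undo the move
print(X)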
Just want to mention that there is a nice option to find local maxima in images with Python: from skimage.feature import peak_local_max, or for skimage 0.8.0: from skimage.feature.peak import peak_local_max. http://scikit-image.org/docs/0.8.0/api/skimage.feature.peak.html
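For instance, a minimal usage sketch (the min_distance and num_peaks values are assumptions to tune per data set):

import numpy as np
from skimage.feature import peak_local_max

frame = np.random.rand(10, 13)             # placeholder for a paw frame
# keep maxima at least min_distance pixels apart, at most num_peaks of them
coords = peak_local_max(frame, min_distance=2, num_peaks=4)
print(coords)                              # (row, col) of each detected peak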
It's probably worth trying neural networks if you are able to create some training data... but that needs many samples annotated by hand.
Here's another approach that I used when doing something similar for a large telescope:
1) Search for the highest pixel. Once you have that, search around it for the best-fit 2x2 position (maybe maximizing the 2x2 sum), or do a 2D Gaussian fit inside a sub-region of, say, 4x4 centered on the highest pixel.
2) Set those 2x2 pixels you have found (or maybe 3x3 around the peak center) to zero.
3) Go back to 1) and repeat until the highest peak falls below a noise threshold, or you have all the toes you need.
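A minimal sketch of that loop, assuming frame is a NumPy array and noise_floor is a threshold chosen for the data:

import numpy as np

frame = np.random.rand(10, 13)   # placeholder for a real frame
noise_floor = 0.9                # assumed noise threshold

work = frame.copy()
toes = []
while work.max() > noise_floor:
    # step 1: the highest remaining pixel
    y, x = np.unravel_index(np.argmax(work), work.shape)
    toes.append((y, x))
    # step 2: zero a 3x3 patch around the peak so it isn't found again
    work[max(0, y-1):y+2, max(0, x-1):x+2] = 0
print(toes)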
A rough outline: you'd probably want to use a connected components algorithm to isolate each paw region. Wikipedia has a decent description of this (with some code) here: http://en.wikipedia.org/wiki/Connected_Component_Labeling

You'll have to make a decision about whether to use 4- or 8-connectedness; personally, for most problems I prefer 6-connectedness. Anyway, once you've separated out each "paw print" as a connected region, it should be easy enough to iterate through the region and find the maxima. Once you've found the maxima, you could iteratively enlarge the region until you reach a predetermined threshold in order to identify it as a given "toe". (A sketch of the labeling step follows this list.)

One subtle problem here is that as soon as you start using computer vision techniques to identify something as a right/left/front/rear paw and you start looking at individual toes, you have to start taking rotations, skews, and translations into account. This is accomplished through the analysis of so-called "moments". There are a few different moments to consider in vision applications:
- central moments: translation invariant
- normalized moments: scaling and translation invariant
- Hu moments: translation, scale, and rotation invariant

More information about moments can be found by searching "image moments" on Wikipedia.
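A sketch of that outline with SciPy's labeling (the binarization threshold and the connectivity choice are assumptions):

import numpy as np
from scipy import ndimage

frame = np.random.rand(10, 13)           # placeholder for a paw frame
mask = frame > 0.5                       # assumed binarization threshold

# label() uses 4-connectedness by default; pass
# ndimage.generate_binary_structure(2, 2) for 8-connectedness
labels, n = ndimage.label(mask)

# one (row, col) maximum per connected region
peaks = ndimage.maximum_position(frame, labels, index=range(1, n + 1))
print(peaks)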
Perhaps you can use something like Gaussian Mixture Models. Here's a Python package for doing GMMs (just did a Google search) http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
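The linked package is quite old; as a sketch of the same idea with scikit-learn instead (the threshold, the intensity weighting, and n_components=5 are all assumptions):

import numpy as np
from sklearn.mixture import GaussianMixture

frame = np.random.rand(10, 13)                 # placeholder for a paw frame
# turn the intensity image into a weighted point cloud by repeating each
# pixel's coordinates proportionally to its pressure value
ys, xs = np.nonzero(frame > 0.3)
weights = np.maximum((frame[ys, xs] * 10).astype(int), 1)
points = np.repeat(np.column_stack([ys, xs]), weights, axis=0)

gmm = GaussianMixture(n_components=5, random_state=0).fit(points)
print(gmm.means_)                              # one center per assumed pad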
Interesting problem. The solution I would try is the following:
1. Apply a low-pass filter, such as convolution with a 2D Gaussian mask. This will give you a bunch of (probably, but not necessarily, floating-point) values.
2. Perform a 2D non-maximal suppression using the known approximate radius of each paw pad (or toe). This should give you the maximal positions without multiple candidates close together.

Just to clarify, the radius of the mask in step 1 should be similar to the radius used in step 2. This radius could be selectable, or the vet could explicitly measure it beforehand (it will vary with age/breed/etc.).

Some of the solutions suggested (mean shift, neural nets, and so on) probably will work to some degree, but are overly complicated and probably not ideal.
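A sketch of those two steps with SciPy (the radius is an assumption; thresholding on the smoothed image's mean is one way to discard weak maxima):

import numpy as np
from scipy import ndimage

frame = np.random.rand(10, 13)          # placeholder for a paw frame
radius = 2                              # assumed pad/toe radius in pixels

# step 1: low-pass filter (2D Gaussian convolution)
smooth = ndimage.gaussian_filter(frame, sigma=radius)

# step 2: non-maximal suppression over a window matching that radius
window = 2 * radius + 1
is_max = smooth == ndimage.maximum_filter(smooth, size=window)
peaks = np.argwhere(is_max & (smooth > smooth.mean()))
print(peaks)                            # (row, col) of surviving maxima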
It seems you can cheat a bit using jetxee's algorithm. He finds the first three toes fine, and you should be able to guess where the fourth is based on that.
Well, here's some simple and not terribly efficient code, but for this size of data set it is fine.

import numpy as np

grid = np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                 [0,0,0,0,0,0,0,0,0.4,0.4,0.4,0,0,0],
                 [0,0,0,0,0.4,1.4,1.4,1.8,0.7,0,0,0,0,0],
                 [0,0,0,0,0.4,1.4,4,5.4,2.2,0.4,0,0,0,0],
                 [0,0,0.7,1.1,0.4,1.1,3.2,3.6,1.1,0,0,0,0,0],
                 [0,0.4,2.9,3.6,1.1,0.4,0.7,0.7,0.4,0.4,0,0,0,0],
                 [0,0.4,2.5,3.2,1.8,0.7,0.4,0.4,0.4,1.4,0.7,0,0,0],
                 [0,0,0.7,3.6,5.8,2.9,1.4,2.2,1.4,1.8,1.1,0,0,0],
                 [0,0,1.1,5,6.8,3.2,4,6.1,1.8,0.4,0.4,0,0,0],
                 [0,0,0.4,1.1,1.8,1.8,4.3,3.2,0.7,0,0,0,0,0],
                 [0,0,0,0,0,0.4,0.7,0.4,0,0,0,0,0,0]])

# sum of every 2x2 square, keyed by its upper-left corner
arr = []
for i in xrange(grid.shape[0] - 1):
    for j in xrange(grid.shape[1] - 1):
        tot = grid[i][j] + grid[i+1][j] + grid[i][j+1] + grid[i+1][j+1]
        arr.append([(i, j), tot])

best = []
arr.sort(key=lambda x: x[1])
for i in xrange(5):
    # take the remaining square with the highest sum...
    best.append(arr.pop())
    # ...and discard every square that would overlap it
    badpos = set([(best[-1][0][0]+x, best[-1][0][1]+y)
                  for x in [-1, 0, 1] for y in [-1, 0, 1]
                  if x != 0 or y != 0])
    for j in xrange(len(arr)-1, -1, -1):
        if arr[j][0] in badpos:
            arr.pop(j)

for item in best:
    print grid[item[0][0]:item[0][0]+2, item[0][1]:item[0][1]+2]

I basically just make an array with the position of the upper-left corner and the sum of each 2x2 square, and sort it by the sum. I then take the 2x2 square with the highest sum out of contention, put it in the best array, and remove all other 2x2 squares that used any part of this just-removed square. It seems to work fine except with the last paw (the one with the smallest sum, on the far right in your first picture): it turns out that there are two other eligible 2x2 squares with a larger sum (and they have an equal sum to each other). One of them still selects a square from your intended 2x2 square, but the other is off to the left. Fortunately, by luck we seem to be choosing more of the one that you would want, but this may require some other ideas to be used to get what you actually want all of the time.
I am not sure this answers the question, but it seems like you can just look for the n highest peaks that don't have neighbors. Here is the gist. Note that it's in Ruby, but the idea should be clear.

require 'pp'

NUM_PEAKS = 5
NEIGHBOR_DISTANCE = 1

data = [[1,2,3,4,5],
        [2,6,4,4,6],
        [3,6,7,4,3],
       ]

def tuples(matrix)
  tuples = []
  matrix.each_with_index { |row, ri|
    row.each_with_index { |value, ci|
      tuples << [value, ri, ci]
    }
  }
  tuples
end

def neighbor?(t1, t2, distance = 1)
  [1, 2].each { |axis|
    return false if (t1[axis] - t2[axis]).abs > distance
  }
  true
end

# convert the matrix into a sorted list of tuples (value, row, col), highest peaks first
sorted = tuples(data).sort_by { |tuple| tuple.first }.reverse

# the list of peaks that don't have neighbors
non_neighboring_peaks = []
sorted.each { |candidate|
  # always take the highest peak
  if non_neighboring_peaks.empty?
    non_neighboring_peaks << candidate
    puts "took the first peak: #{candidate}"
  else
    # check that this candidate doesn't have any accepted neighbors
    is_ok = true
    non_neighboring_peaks.each { |accepted|
      if neighbor?(candidate, accepted, NEIGHBOR_DISTANCE)
        is_ok = false
        break
      end
    }
    if is_ok
      non_neighboring_peaks << candidate
      puts "took #{candidate}"
    else
      puts "denied #{candidate}"
    end
  end
}

pp non_neighboring_peaks
Maybe a naive approach is sufficient here: Build a list of all 2x2 squares on your plane, order them by their sum (in descending order). First, select the highest-valued square into your "paw list". Then, iteratively pick 4 of the next-best squares that don't intersect with any of the previously found squares.
What if you proceed step by step: first locate the global maximum, process the surrounding points if needed given their value, then set the found region to zero, and repeat for the next one.