Calculating the intersection area between two rectangles with axes not aligned - python

I want to calculate the intersection over union IoU between two rectangles with axes not aligned, but with an angle of the axes smaller than 30 degrees. An approximate value is also seeked.
One possible solution is to check if the angle between the two rectangles is less than 30 degree and than rotate them parallel to aligne the axis. From here it is easy to calculate the IoU.
Another possibility is to use monte carlo methods for the intersection ( generate a point, find if the point is under some line of one rectangle and above some line of the other), but this seems expensive because I need to use this calculation a large number of times.
I was hopping that there is something better out there; maybe a geometry library, or maybe an algorithm from the computer vision folks.
I am trying to learn grasping positions using deep neural networks. My algorithem should predict a bounding box (rectangle) for an object in an rgb image. For any image I have also the ground truth (another rectangle) bounding box. From this two rectangles I need the IoU.
Any idea?

Since you're working in Python, I think the Shapely package would serve your needs.

There is quite effective algorithm for calculation of intersection between two convex polygons, described in O'Rourke book "Computational Geometry in C".
C code is available at the book page (convconv).
Algorithm traverse edges of the first polygon, checking orientations of the second polygon vertices in order to detect intersections. When two consequent vertices lie on the different sides of the edge, intersection occurs (there is a lot of trick cases). Algorithm outline is here

You can consider a number of numerical approaches, practically "rendering" the rectangles into some "canvas"/canvases, and traverse the pixels for making your statistics. The size of the canvas should be the size of the bounding box for the entire scene, practically you can find that via picking the minimum and maximum coordinates occurring for each axis.
1) "most CG" approach: really get a rendering library, render one rectangle with red, other rectangle with transparent blue. Then visit each pixel and if it has a non-0 red component, it belongs to the first rectangle, if it has a non-0 blue component, it belongs to the second rectangle. And if it has both, it belongs to the intersection too. This approach is cheap for coding, but requires both writing and reading the canvas even in the rendering phase, which is slower than just writing. This might be even done on GPU too, though I am not sure if setup costs and getting back the result do not weight out the benefit for such a simple scene.
2) another CG-approach would be rendering into 2 arrays, preferably some 1-byte-per-pixel variant, for the sake of speed (you may have to go back in time a bit in order to find such dedicated rendering libraries). This way the renderer only writes, into one array per rectangle, and you read from two when creating the statistics
3) as writing and reading pixels take time, you can do some shortcut, but it needs more coding: convex shapes can be rendered via collecting the minimum and maximum coordinates per scanline, and just filling between the two. If you do it yourself, you can spare the filling part and also the read-and-check-every-pixel step at the end. Build such min-max list for both rectangles, and then you "just" have to check their relation/order for each scanline, to recognize overlaps
And then there is the mathematical way: this is not really useful, see EDIT below while it is unlikely that you would find some sane algorithm for calculating intersection area, specifically for the case of rectangles, if you find such algorithm for triangles, which is more probable, that would be enough. Both rectangles can be split into two triangles, 1A+1B and 2A+2B respectively, and then you just have to run such algorithm 4 times: 1A-2A, 1A-2B, 1B-2A, 1B-2B, sum the results and that is the area of your intersection.
EDIT: for the maths approach (though this also comes from graphics), I think https://en.wikipedia.org/wiki/Sutherland%E2%80%93Hodgman_algorithm can be applied here (as both rectangles are convex polygons, A-B and B-A should produce the same result) for finding the intersection polygon, and then the remaining task is to calculate the area of that polygon (here and now I think it is going to be convex, and then it is really easy).

I ended up using Sutherland-Hodgman algorithm implemented as this functions:
def clip(subjectPolygon, clipPolygon):
def inside(p):
return(cp2[0]-cp1[0])*(p[1]-cp1[1]) > (cp2[1]-cp1[1])*(p[0]-cp1[0])
def computeIntersection():
dc = [ cp1[0] - cp2[0], cp1[1] - cp2[1] ]
dp = [ s[0] - e[0], s[1] - e[1] ]
n1 = cp1[0] * cp2[1] - cp1[1] * cp2[0]
n2 = s[0] * e[1] - s[1] * e[0]
n3 = 1.0 / (dc[0] * dp[1] - dc[1] * dp[0])
return [(n1*dp[0] - n2*dc[0]) * n3, (n1*dp[1] - n2*dc[1]) * n3]
outputList = subjectPolygon
cp1 = clipPolygon[-1]
for clipVertex in clipPolygon:
cp2 = clipVertex
inputList = outputList
outputList = []
s = inputList[-1]
for subjectVertex in inputList:
e = subjectVertex
if inside(e):
if not inside(s):
outputList.append(computeIntersection())
outputList.append(e)
elif inside(s):
outputList.append(computeIntersection())
s = e
cp1 = cp2
return(outputList)
def PolygonArea(corners):
n = len(corners) # of corners
area = 0.0
for i in range(n):
j = (i + 1) % n
area += corners[i][0] * corners[j][1]
area -= corners[j][0] * corners[i][1]
area = abs(area) / 2.0
return area
intersection = clip(rec1, rec2)
intersection_area = PolygonArea(intersection)
iou = intersection_area/(PolygonArea(rec1)+PolygonArea(rec2)-intersection_area)
Another slower method (don't know what algorithm) could be:
from shapely.geometry import Polygon
p1 = Polygon(rec1)
p2 = Polygon(rec2)
inter_sec_area = p1.intersection(rec2).area
iou = inter_sec_area/(p1.area + p2.area - inter_sec_area)
It is worth mentioning that in just one case with bigger polygons (not my case) the shapely module had an area twice greater than the first method. I didn't test both methods intensively.

This might help
What about using Pythagorean theorem ? Since you have two rectangles, when they intersect, you will have one or more triangles, each with one angle of 90°.
Since you also know the angle between the two rectangles (20° in my example), and the coordinates of each rectangle, you can use the the appropriate function (cos/sin/tan) to know the length of all the edges of the triangles.
I hope this can help

Related

Algorithm to check if cylinders are overlapping in 3D

I am creating a script to generate cylinders in a 3D space, however, I would like for them to not occupy the same region in space (avoid overlapping).
The cylinders are defined by a start and end point, and all have a fixed radius.
I am storing the existing cylinder in an array called listOfCylinders which is an nDim array of shape (nCylinders, 2Points [start, end], {x,y,z} coordinates of each point)
I was able to cook up:
def detect_overlap(new_start, new_end, listOfCylinders):
starts = listOfCylinders[:, 0]
ends = listOfCylinders[:, 1]
radius = 0.1
# Calculate the distance between the new cylinder and all the existing cylinders
dists = np.linalg.norm(np.cross(new_end - new_start, starts - new_start), axis=1) / np.linalg.norm(new_end - new_start)
# Check if any of the distances are less than the sum of the radii
if np.any(dists < (2*radius)):
return True
# If no overlap or intersection is found, return False
return False
But this is not accountting for situations where there is lateral overlaping.
Does anyone have a good algorithm for this?
Best Regards
WLOG one of the cylinders is vertical (otherwise rotate space). If you look at the projections of the apparent outline onto XY, you see a circle and a rectangle ended with ellipses. (For simplicity of the equations, you can also make the second cylindre parallel to XZ.)
If these 2D shapes do not overlap, your are done. Anyway, the intersection of a circle and an ellipse leads to a quartic equation.
You can repeat this process, exchanging the roles of the two cylinders. This gives a sufficient condition of non-overlap. Unfortunately, I am not sure it is necessary, though there is a direct connection to the plane separation theorem.
For a numerical approach, you can proceed as follows:
move the cylindre in the canonical position;
generate rectangles on the oblique cylindre, by rotation around the axis and using an angular parameter;
for all sides of the rectangles, detect interference with the cylindre (this involves a system of a quadratic inequation and two linear ones, which is quite tractable);
sample the angular parameter densely enough to check for no valid intersection.
I guess that a complete analytical solution is possible, but complex, and might anyway lead to equations that need to be solved numerically.

Implementing collisions using the Separating axis theorem and Pygame [duplicate]

This is what I am currently doing:
Creating 4 axis that are perpendicular to 4 edges of 2 rectangles. Since they are rectangles I do not need to generate an axis (normal) per edge.
I then loop over my 4 axes.
So for each axis:
I get the projection of every corner of a rectangle on to the axis.
There are 2 lists (arrays) containing those projections. One for each rectangle.
I then get the dot product of each projection and the axis. This returns a scalar value
that can be used to to determine the min and max.
Now the 2 lists contain scalars and not vectors. I sort the lists so I can easily select the min and max values. If the min of box B >= the max of box A OR the max of box B <= the min of box A then there is no collision on that axis and no collision between the objects.
At this point the function finishes and the loop breaks.
If those conditions are never met for all the axis then we have a collision
I hope this was the correct way of doing it.
The python code itself can be found here http://pastebin.com/vNFP3mAb
Also:
http://www.gamedev.net/page/reference/index.html/_/reference/programming/game-programming/collision-detection/2d-rotated-rectangle-collision-r2604
The problem i was having is that the code above does not work. It always detects a a collision even where there is not a collision. What i typed out is exactly what the code is doing. If I am missing any steps or just not understanding how SAT works please let me know.
In general it is necessary to carry out the steps outlined in the Question to determine if the rectangles "collide" (intersect), noting as the OP does that we can break (with a conclusion of non-intersection) as soon as a separating axis is found.
There are a couple of simple ways to "optimize" in the sense of providing chances for earlier exits. The practical value of these depends on the distribution of rectangles being checked, but both are easily incorporated in the existing framework.
(1) Bounding Circle Check
One quick way to prove non-intersection is by showing the bounding circles of the two rectangles do not intersect. The bounding circle of a rectangle shares its center, the midpoint of either diagonal, and has diameter equal to the length of either diagonal. If the distance between the two centers exceeds the sum of the two circles' radii, then the circles do not intersect. Thus the rectangles also cannot intersect. If the purpose was to find an axis of separation, we haven't accomplished that yet. However if we only want to know if the rectangles "collide", this allows an early exit.
(2) Vertex of one rectangle inside the other
The projection of a vertex of one rectangle on axes parallel to the other rectangle's edges provides enough information to detect when that vertex is inside the other rectangle. This check is especially easy when the latter rectangle has been translated and unrotated to the origin (with edges parallel to the ordinary axes). If it happens that a vertex of one rectangle is inside the other, the rectangles obviously intersect. Of course this is a sufficient condition for intersection, not a necessary one. But it allows for an early exit with a conclusion of intersection (and of course without finding an axis of separation because none will exist).
I see two things wrong. First, the projection should simply be the dot product of a vertex with the axis. What you're doing is way too complicated. Second, the way you get your axis is incorrect. You write:
Axis1 = [ -(A_TR[0] - A_TL[0]),
A_TR[1] - A_TL[1] ]
Where it should read:
Axis1 = [ -(A_TR[1] - A_TL[1]),
A_TR[0] - A_TL[0] ]
The difference is coordinates does give you a vector, but to get the perpendicular you need to exchange the x and y values and negate one of them.
Hope that helps.
EDIT Found another bug
In this code:
if not ( B_Scalars[0] <= A_Scalars[3] or B_Scalars[3] >= A_Scalars[0] ):
#no overlap so no collision
return 0
That should read:
if not ( B_Scalars[3] <= A_Scalars[0] or A_Scalars[3] <= B_Scalars[0] ):
Sort gives you a list increasing in value. So [1,2,3,4] and [10,11,12,13] do not overlap because the minimum of the later is greater than the maximum of the former. The second comparison is for when the input sets are swapped.

Merging image regions (bboxes) in linear time

I have a set of regions (bounding boxes) for some image, example python code:
im = Image.open("single.png")
pix = np.array(im)
gray = rgb2grey(pix)
thresh = threshold_otsu(gray)
bw = closing(gray > thresh, square(1))
cleared = bw.copy()
clear_border(cleared)
borders = np.logical_xor(bw, cleared)
label_image = label(borders)
for region in regionprops(label_image, ['Area', 'BoundingBox']):
#now i have bounding boxes in hand
What I would like to do is to merge regions which overlap or the distance between bbox edges is less than X. Naive approach would be checking distances between all regions, which has O(n2) complexity. I can write something smarter but I have impression that this kind of algorithm already exists and I don't want to reinvent the wheel. Any help is appreciated.
Is this your question "There is n boxes (not necessarily // to x-y axis), and you want to find all overlapping boxes and merge them if they exist?"
I cannot think of a linear algorithm yet but I have a rough idea faster than O(n^2), maybe O(n lg n) describes as follow:
Give each box an id, also for each edge, mark it's belonging box
Use sweeping line algorithm to find all intersections
In the sweeping line algorithm, once an intersection is reported, you know which 2 boxes are overlapping, use something like disjoint-set to group them.
Lastly linearly scan the disjoint-set, for each set, keep updating the leftmost, rightmost, topmost, bottommost point for making a larger box to bound them all
(merging done here, note that if a box has no overlapping with others, the set will only contain itself)
I hope this method will work and it should be faster than O(n^2), but even if it does works, it still have some problems at step 4, where the larger merged box must be // to x-y axis, which is not a must.
Edit: Sorry I just go through OP again, and understand the above solution does not solve the "merge boxes with distance < x", it even only solves partly of the overlapping boxes problem.
Moreover, the merging box procedure is not a 1-pass job, it is kind of recursive, for example a box A and box B merged become box C, then box C may overlap / distance < x with box D..and so on.
To solve this task in linear time is quite impossible for me, as pre-computing the distance between all pair-wise box is already hardly be done in O(n)...

How do I calculate a 3D centroid?

Is there even such a thing as a 3D centroid? Let me be perfectly clear—I've been reading and reading about centroids for the last 2 days both on this site and across the web, so I'm perfectly aware at the existing posts on the topic, including Wikipedia.
That said, let me explain what I'm trying to do. Basically, I want to take a selection of edges and/or vertices, but NOT faces. Then, I want to place an object at the 3D centroid position.
I'll tell you what I don't want:
The vertices average, which would pull too far in any direction that has a more high-detailed mesh.
The bounding box center, because I already have something working for this scenario.
I'm open to suggestions about center of mass, but I don't see how this would work, because vertices or edges alone don't define any sort of mass, especially when I just have an edge loop selected.
For kicks, I'll show you some PyMEL that I worked up, using #Emile's code as reference, but I don't think it's working the way it should:
from pymel.core import ls, spaceLocator
from pymel.core.datatypes import Vector
from pymel.core.nodetypes import NurbsCurve
def get_centroid(node):
if not isinstance(node, NurbsCurve):
raise TypeError("Requires NurbsCurve.")
centroid = Vector(0, 0, 0)
signed_area = 0.0
cvs = node.getCVs(space='world')
v0 = cvs[len(cvs) - 1]
for i, cv in enumerate(cvs[:-1]):
v1 = cv
a = v0.x * v1.y - v1.x * v0.y
signed_area += a
centroid += sum([v0, v1]) * a
v0 = v1
signed_area *= 0.5
centroid /= 6 * signed_area
return centroid
texas = ls(selection=True)[0]
centroid = get_centroid(texas)
print(centroid)
spaceLocator(position=centroid)
In theory centroid = SUM(pos*volume)/SUM(volume) when you split the part into finite volumes each with a location pos and volume value volume.
This is precisely the calculation done for finding the center of gravity of a composite part.
There is not just a 3D centroid, there is an n-dimensional centroid, and the formula for it is given in the "By integral formula" section of the Wikipedia article you cite.
Perhaps you are having trouble setting up this integral? You have not defined your shape.
[Edit] I'll beef up this answer in response to your comment. Since you have described your shape in terms of edges and vertices, then I'll assume it is a polyhedron. You can partition a polyedron into pyramids, find the centroids of the pyramids, and then the centroid of your shape is the centroid of the centroids (this last calculation is done using ja72's formula).
I'll assume your shape is convex (no hollow parts---if this is not the case then break it into convex chunks). You can partition it into pyramids (triangulate it) by picking a point in the interior and drawing edges to the vertices. Then each face of your shape is the base of a pyramid. There are formulas for the centroid of a pyramid (you can look this up, it's 1/4 the way from the centroid of the face to your interior point). Then as was said, the centroid of your shape is the centroid of the centroids---ja72's finite calculation, not an integral---as given in the other answer.
This is the same algorithm as in Hugh Bothwell's answer, however I believe that 1/4 is correct instead of 1/3. Perhaps you can find some code for it lurking around somewhere using the search terms in this description.
I like the question. Centre of mass sounds right, but the question then becomes, what mass for each vertex?
Why not use the average length of each edge that includes the vertex? This should compensate nicely areas with a dense mesh.
You will have to recreate face information from the vertices (essentially a Delauney triangulation).
If your vertices define a convex hull, you can pick any arbitrary point A inside the object. Treat your object as a collection of pyramidal prisms having apex A and each face as a base.
For each face, find the area Fa and the 2d centroid Fc; then the prism's mass is proportional to the volume (== 1/3 base * height (component of Fc-A perpendicular to the face)) and you can disregard the constant of proportionality so long as you do the same for all prisms; the center of mass is (2/3 A + 1/3 Fc), or a third of the way from the apex to the 2d centroid of the base.
You can then do a mass-weighted average of the center-of-mass points to find the 3d centroid of the object as a whole.
The same process should work for non-convex hulls - or even for A outside the hull - but the face-calculation may be a problem; you will need to be careful about the handedness of your faces.

Peak detection in a 2D array

I'm helping a veterinary clinic measuring pressure under a dogs paw. I use Python for my data analysis and now I'm stuck trying to divide the paws into (anatomical) subregions.
I made a 2D array of each paw, that consists of the maximal values for each sensor that has been loaded by the paw over time. Here's an example of one paw, where I used Excel to draw the areas I want to 'detect'. These are 2 by 2 boxes around the sensor with local maxima's, that together have the largest sum.
So I tried some experimenting and decide to simply look for the maximums of each column and row (can't look in one direction due to the shape of the paw). This seems to 'detect' the location of the separate toes fairly well, but it also marks neighboring sensors.
So what would be the best way to tell Python which of these maximums are the ones I want?
Note: The 2x2 squares can't overlap, since they have to be separate toes!
Also I took 2x2 as a convenience, any more advanced solution is welcome, but I'm simply a human movement scientist, so I'm neither a real programmer or a mathematician, so please keep it 'simple'.
Here's a version that can be loaded with np.loadtxt
Results
So I tried #jextee's solution (see the results below). As you can see, it works very on the front paws, but it works less well for the hind legs.
More specifically, it can't recognize the small peak that's the fourth toe. This is obviously inherent to the fact that the loop looks top down towards the lowest value, without taking into account where this is.
Would anyone know how to tweak #jextee's algorithm, so that it might be able to find the 4th toe too?
Since I haven't processed any other trials yet, I can't supply any other samples. But the data I gave before were the averages of each paw. This file is an array with the maximal data of 9 paws in the order they made contact with the plate.
This image shows how they were spatially spread out over the plate.
Update:
I have set up a blog for anyone interested and I have setup a OneDrive with all the raw measurements. So to anyone requesting more data: more power to you!
New update:
So after the help I got with my questions regarding paw detection and paw sorting, I was finally able to check the toe detection for every paw! Turns out, it doesn't work so well in anything but paws sized like the one in my own example. Off course in hindsight, it's my own fault for choosing the 2x2 so arbitrarily.
Here's a nice example of where it goes wrong: a nail is being recognized as a toe and the 'heel' is so wide, it gets recognized twice!
The paw is too large, so taking a 2x2 size with no overlap, causes some toes to be detected twice. The other way around, in small dogs it often fails to find a 5th toe, which I suspect is being caused by the 2x2 area being too large.
After trying the current solution on all my measurements I came to the staggering conclusion that for nearly all my small dogs it didn't find a 5th toe and that in over 50% of the impacts for the large dogs it would find more!
So clearly I need to change it. My own guess was changing the size of the neighborhood to something smaller for small dogs and larger for large dogs. But generate_binary_structure wouldn't let me change the size of the array.
Therefore, I'm hoping that anyone else has a better suggestion for locating the toes, perhaps having the toe area scale with the paw size?
I detected the peaks using a local maximum filter. Here is the result on your first dataset of 4 paws:
I also ran it on the second dataset of 9 paws and it worked as well.
Here is how you do it:
import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp
#for some reason I had to reshape. Numpy ignored the shape header.
paws_data = np.loadtxt("paws.txt").reshape(4,11,14)
#getting a list of images
paws = [p.squeeze() for p in np.vsplit(paws_data,4)]
def detect_peaks(image):
"""
Takes an image and detect the peaks usingthe local maximum filter.
Returns a boolean mask of the peaks (i.e. 1 when
the pixel's value is the neighborhood maximum, 0 otherwise)
"""
# define an 8-connected neighborhood
neighborhood = generate_binary_structure(2,2)
#apply the local maximum filter; all pixel of maximal value
#in their neighborhood are set to 1
local_max = maximum_filter(image, footprint=neighborhood)==image
#local_max is a mask that contains the peaks we are
#looking for, but also the background.
#In order to isolate the peaks we must remove the background from the mask.
#we create the mask of the background
background = (image==0)
#a little technicality: we must erode the background in order to
#successfully subtract it form local_max, otherwise a line will
#appear along the background border (artifact of the local maximum filter)
eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)
#we obtain the final mask, containing only peaks,
#by removing the background from the local_max mask (xor operation)
detected_peaks = local_max ^ eroded_background
return detected_peaks
#applying the detection and plotting results
for i, paw in enumerate(paws):
detected_peaks = detect_peaks(paw)
pp.subplot(4,2,(2*i+1))
pp.imshow(paw)
pp.subplot(4,2,(2*i+2) )
pp.imshow(detected_peaks)
pp.show()
All you need to do after is use scipy.ndimage.measurements.label on the mask to label all distinct objects. Then you'll be able to play with them individually.
Note that the method works well because the background is not noisy. If it were, you would detect a bunch of other unwanted peaks in the background. Another important factor is the size of the neighborhood. You will need to adjust it if the peak size changes (the should remain roughly proportional).
Solution
Data file: paw.txt. Source code:
from scipy import *
from operator import itemgetter
n = 5 # how many fingers are we looking for
d = loadtxt("paw.txt")
width, height = d.shape
# Create an array where every element is a sum of 2x2 squares.
fourSums = d[:-1,:-1] + d[1:,:-1] + d[1:,1:] + d[:-1,1:]
# Find positions of the fingers.
# Pair each sum with its position number (from 0 to width*height-1),
pairs = zip(arange(width*height), fourSums.flatten())
# Sort by descending sum value, filter overlapping squares
def drop_overlapping(pairs):
no_overlaps = []
def does_not_overlap(p1, p2):
i1, i2 = p1[0], p2[0]
r1, col1 = i1 / (width-1), i1 % (width-1)
r2, col2 = i2 / (width-1), i2 % (width-1)
return (max(abs(r1-r2),abs(col1-col2)) >= 2)
for p in pairs:
if all(map(lambda prev: does_not_overlap(p,prev), no_overlaps)):
no_overlaps.append(p)
return no_overlaps
pairs2 = drop_overlapping(sorted(pairs, key=itemgetter(1), reverse=True))
# Take the first n with the heighest values
positions = pairs2[:n]
# Print results
print d, "\n"
for i, val in positions:
row = i / (width-1)
column = i % (width-1)
print "sum = %f # %d,%d (%d)" % (val, row, column, i)
print d[row:row+2,column:column+2], "\n"
Output without overlapping squares. It seems that the same areas are selected as in your example.
Some comments
The tricky part is to calculate sums of all 2x2 squares. I assumed you need all of them, so there might be some overlapping. I used slices to cut the first/last columns and rows from the original 2D array, and then overlapping them all together and calculating sums.
To understand it better, imaging a 3x3 array:
>>> a = arange(9).reshape(3,3) ; a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Then you can take its slices:
>>> a[:-1,:-1]
array([[0, 1],
[3, 4]])
>>> a[1:,:-1]
array([[3, 4],
[6, 7]])
>>> a[:-1,1:]
array([[1, 2],
[4, 5]])
>>> a[1:,1:]
array([[4, 5],
[7, 8]])
Now imagine you stack them one above the other and sum elements at the same positions. These sums will be exactly the same sums over the 2x2 squares with the top-left corner in the same position:
>>> sums = a[:-1,:-1] + a[1:,:-1] + a[:-1,1:] + a[1:,1:]; sums
array([[ 8, 12],
[20, 24]])
When you have the sums over 2x2 squares, you can use max to find the maximum, or sort, or sorted to find the peaks.
To remember positions of the peaks I couple every value (the sum) with its ordinal position in a flattened array (see zip). Then I calculate row/column position again when I print the results.
Notes
I allowed for the 2x2 squares to overlap. Edited version filters out some of them such that only non-overlapping squares appear in the results.
Choosing fingers (an idea)
Another problem is how to choose what is likely to be fingers out of all the peaks. I have an idea which may or may not work. I don't have time to implement it right now, so just pseudo-code.
I noticed that if the front fingers stay on almost a perfect circle, the rear finger should be inside of that circle. Also, the front fingers are more or less equally spaced. We may try to use these heuristic properties to detect the fingers.
Pseudo code:
select the top N finger candidates (not too many, 10 or 12)
consider all possible combinations of 5 out of N (use itertools.combinations)
for each combination of 5 fingers:
for each finger out of 5:
fit the best circle to the remaining 4
=> position of the center, radius
check if the selected finger is inside of the circle
check if the remaining four are evenly spread
(for example, consider angles from the center of the circle)
assign some cost (penalty) to this selection of 4 peaks + a rear finger
(consider, probably weighted:
circle fitting error,
if the rear finger is inside,
variance in the spreading of the front fingers,
total intensity of 5 peaks)
choose a combination of 4 peaks + a rear peak with the lowest penalty
This is a brute-force approach. If N is relatively small, then I think it is doable. For N=12, there are C_12^5 = 792 combinations, times 5 ways to select a rear finger, so 3960 cases to evaluate for every paw.
This is an image registration problem. The general strategy is:
Have a known example, or some kind of prior on the data.
Fit your data to the example, or fit the example to your data.
It helps if your data is roughly aligned in the first place.
Here's a rough and ready approach, "the dumbest thing that could possibly work":
Start with five toe coordinates in roughly the place you expect.
With each one, iteratively climb to the top of the hill. i.e. given current position, move to maximum neighbouring pixel, if its value is greater than current pixel. Stop when your toe coordinates have stopped moving.
To counteract the orientation problem, you could have 8 or so initial settings for the basic directions (North, North East, etc). Run each one individually and throw away any results where two or more toes end up at the same pixel. I'll think about this some more, but this kind of thing is still being researched in image processing - there are no right answers!
Slightly more complex idea: (weighted) K-means clustering. It's not that bad.
Start with five toe coordinates, but now these are "cluster centres".
Then iterate until convergence:
Assign each pixel to the closest cluster (just make a list for each cluster).
Calculate the center of mass of each cluster. For each cluster, this is: Sum(coordinate * intensity value)/Sum(coordinate)
Move each cluster to the new centre of mass.
This method will almost certainly give much better results, and you get the mass of each cluster which may help in identifying the toes.
(Again, you've specified the number of clusters up front. With clustering you have to specify the density one way or another: Either choose the number of clusters, appropriate in this case, or choose a cluster radius and see how many you end up with. An example of the latter is mean-shift.)
Sorry about the lack of implementation details or other specifics. I would code this up but I've got a deadline. If nothing else has worked by next week let me know and I'll give it a shot.
Using persistent homology to analyze your data set I get the following result (click to enlarge):
This is the 2D-version of the peak detection method described in this SO answer. The above figure simply shows 0-dimensional persistent homology classes sorted by persistence.
I did upscale the original dataset by a factor of 2 using scipy.misc.imresize(). However, note that I did consider the four paws as one dataset; splitting it into four would make the problem easier.
Methodology.
The idea behind this quite simple: Consider the function graph of the function that assigns each pixel its level. It looks like this:
Now consider a water level at height 255 that continuously descents to lower levels. At local maxima islands pop up (birth). At saddle points two islands merge; we consider the lower island to be merged to the higher island (death). The so-called persistence diagram (of the 0-th dimensional homology classes, our islands) depicts death- over birth-values of all islands:
The persistence of an island is then the difference between the birth- and death-level; the vertical distance of a dot to the grey main diagonal. The figure labels the islands by decreasing persistence.
The very first picture shows the locations of births of the islands. This method not only gives the local maxima but also quantifies their "significance" by the above mentioned persistence. One would then filter out all islands with a too low persistence. However, in your example every island (i.e., every local maximum) is a peak you look for.
Python code can be found here.
This problem has been studied in some depth by physicists. There is a good implementation in ROOT. Look at the TSpectrum classes (especially TSpectrum2 for your case) and the documentation for them.
References:
M.Morhac et al.: Background elimination methods for multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 401 (1997) 113-132.
M.Morhac et al.: Efficient one- and two-dimensional Gold deconvolution and its application to gamma-ray spectra decomposition. Nuclear Instruments and Methods in Physics Research A 401 (1997) 385-408.
M.Morhac et al.: Identification of peaks in multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Research Physics A 443(2000), 108-125.
...and for those who don't have access to a subscription to NIM:
Spectrum.doc
SpectrumDec.ps.gz
SpectrumSrc.ps.gz
SpectrumBck.ps.gz
I'm sure you have enough to go on by now, but I can't help but suggest using the k-means clustering method. k-means is an unsupervised clustering algorithm which will take you data (in any number of dimensions - I happen to do this in 3D) and arrange it into k clusters with distinct boundaries. It's nice here because you know exactly how many toes these canines (should) have.
Additionally, it's implemented in Scipy which is really nice (http://docs.scipy.org/doc/scipy/reference/cluster.vq.html).
Here's an example of what it can do to spatially resolve 3D clusters:
What you want to do is a bit different (2D and includes pressure values), but I still think you could give it a shot.
Here is an idea: you calculate the (discrete) Laplacian of the image. I would expect it to be (negative and) large at maxima, in a way that is more dramatic than in the original images. Thus, maxima could be easier to find.
Here is another idea: if you know the typical size of the high-pressure spots, you can first smooth your image by convoluting it with a Gaussian of the same size. This may give you simpler images to process.
Just a couple of ideas off the top of my head:
take the gradient (derivative) of the scan, see if that eliminates the false calls
take the maximum of the local maxima
You might also want to take a look at OpenCV, it's got a fairly decent Python API and might have some functions you'd find useful.
thanks for the raw data. I'm on the train and this is as far as I've gotten (my stop is coming up). I massaged your txt file with regexps and have plopped it into a html page with some javascript for visualization. I'm sharing it here because some, like myself, might find it more readily hackable than python.
I think a good approach will be scale and rotation invariant, and my next step will be to investigate mixtures of gaussians. (each paw pad being the center of a gaussian).
<html>
<head>
<script type="text/javascript" src="http://vis.stanford.edu/protovis/protovis-r3.2.js"></script>
<script type="text/javascript">
var heatmap = [[[0,0,0,0,0,0,0,4,4,0,0,0,0],
[0,0,0,0,0,7,14,22,18,7,0,0,0],
[0,0,0,0,11,40,65,43,18,7,0,0,0],
[0,0,0,0,14,61,72,32,7,4,11,14,4],
[0,7,14,11,7,22,25,11,4,14,65,72,14],
[4,29,79,54,14,7,4,11,18,29,79,83,18],
[0,18,54,32,18,43,36,29,61,76,25,18,4],
[0,4,7,7,25,90,79,36,79,90,22,0,0],
[0,0,0,0,11,47,40,14,29,36,7,0,0],
[0,0,0,0,4,7,7,4,4,4,0,0,0]
],[
[0,0,0,4,4,0,0,0,0,0,0,0,0],
[0,0,11,18,18,7,0,0,0,0,0,0,0],
[0,4,29,47,29,7,0,4,4,0,0,0,0],
[0,0,11,29,29,7,7,22,25,7,0,0,0],
[0,0,0,4,4,4,14,61,83,22,0,0,0],
[4,7,4,4,4,4,14,32,25,7,0,0,0],
[4,11,7,14,25,25,47,79,32,4,0,0,0],
[0,4,4,22,58,40,29,86,36,4,0,0,0],
[0,0,0,7,18,14,7,18,7,0,0,0,0],
[0,0,0,0,4,4,0,0,0,0,0,0,0],
],[
[0,0,0,4,11,11,7,4,0,0,0,0,0],
[0,0,0,4,22,36,32,22,11,4,0,0,0],
[4,11,7,4,11,29,54,50,22,4,0,0,0],
[11,58,43,11,4,11,25,22,11,11,18,7,0],
[11,50,43,18,11,4,4,7,18,61,86,29,4],
[0,11,18,54,58,25,32,50,32,47,54,14,0],
[0,0,14,72,76,40,86,101,32,11,7,4,0],
[0,0,4,22,22,18,47,65,18,0,0,0,0],
[0,0,0,0,4,4,7,11,4,0,0,0,0],
],[
[0,0,0,0,4,4,4,0,0,0,0,0,0],
[0,0,0,4,14,14,18,7,0,0,0,0,0],
[0,0,0,4,14,40,54,22,4,0,0,0,0],
[0,7,11,4,11,32,36,11,0,0,0,0,0],
[4,29,36,11,4,7,7,4,4,0,0,0,0],
[4,25,32,18,7,4,4,4,14,7,0,0,0],
[0,7,36,58,29,14,22,14,18,11,0,0,0],
[0,11,50,68,32,40,61,18,4,4,0,0,0],
[0,4,11,18,18,43,32,7,0,0,0,0,0],
[0,0,0,0,4,7,4,0,0,0,0,0,0],
],[
[0,0,0,0,0,0,4,7,4,0,0,0,0],
[0,0,0,0,4,18,25,32,25,7,0,0,0],
[0,0,0,4,18,65,68,29,11,0,0,0,0],
[0,4,4,4,18,65,54,18,4,7,14,11,0],
[4,22,36,14,4,14,11,7,7,29,79,47,7],
[7,54,76,36,18,14,11,36,40,32,72,36,4],
[4,11,18,18,61,79,36,54,97,40,14,7,0],
[0,0,0,11,58,101,40,47,108,50,7,0,0],
[0,0,0,4,11,25,7,11,22,11,0,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
],[
[0,0,4,7,4,0,0,0,0,0,0,0,0],
[0,0,11,22,14,4,0,4,0,0,0,0,0],
[0,0,7,18,14,4,4,14,18,4,0,0,0],
[0,4,0,4,4,0,4,32,54,18,0,0,0],
[4,11,7,4,7,7,18,29,22,4,0,0,0],
[7,18,7,22,40,25,50,76,25,4,0,0,0],
[0,4,4,22,61,32,25,54,18,0,0,0,0],
[0,0,0,4,11,7,4,11,4,0,0,0,0],
],[
[0,0,0,0,7,14,11,4,0,0,0,0,0],
[0,0,0,4,18,43,50,32,14,4,0,0,0],
[0,4,11,4,7,29,61,65,43,11,0,0,0],
[4,18,54,25,7,11,32,40,25,7,11,4,0],
[4,36,86,40,11,7,7,7,7,25,58,25,4],
[0,7,18,25,65,40,18,25,22,22,47,18,0],
[0,0,4,32,79,47,43,86,54,11,7,4,0],
[0,0,0,14,32,14,25,61,40,7,0,0,0],
[0,0,0,0,4,4,4,11,7,0,0,0,0],
],[
[0,0,0,0,4,7,11,4,0,0,0,0,0],
[0,4,4,0,4,11,18,11,0,0,0,0,0],
[4,11,11,4,0,4,4,4,0,0,0,0,0],
[4,18,14,7,4,0,0,4,7,7,0,0,0],
[0,7,18,29,14,11,11,7,18,18,4,0,0],
[0,11,43,50,29,43,40,11,4,4,0,0,0],
[0,4,18,25,22,54,40,7,0,0,0,0,0],
[0,0,4,4,4,11,7,0,0,0,0,0,0],
],[
[0,0,0,0,0,7,7,7,7,0,0,0,0],
[0,0,0,0,7,32,32,18,4,0,0,0,0],
[0,0,0,0,11,54,40,14,4,4,22,11,0],
[0,7,14,11,4,14,11,4,4,25,94,50,7],
[4,25,65,43,11,7,4,7,22,25,54,36,7],
[0,7,25,22,29,58,32,25,72,61,14,7,0],
[0,0,4,4,40,115,68,29,83,72,11,0,0],
[0,0,0,0,11,29,18,7,18,14,4,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
]
];
</script>
</head>
<body>
<script type="text/javascript+protovis">
for (var a=0; a < heatmap.length; a++) {
var w = heatmap[a][0].length,
h = heatmap[a].length;
var vis = new pv.Panel()
.width(w * 6)
.height(h * 6)
.strokeStyle("#aaa")
.lineWidth(4)
.antialias(true);
vis.add(pv.Image)
.imageWidth(w)
.imageHeight(h)
.image(pv.Scale.linear()
.domain(0, 99, 100)
.range("#000", "#fff", '#ff0a0a')
.by(function(i, j) heatmap[a][j][i]));
vis.render();
}
</script>
</body>
</html>
Physicist's solution:
Define 5 paw-markers identified by their positions X_i and init them with random positions.
Define some energy function combining some award for location of markers in paws' positions with some punishment for overlap of markers; let's say:
E(X_i;S)=-Sum_i(S(X_i))+alfa*Sum_ij (|X_i-Xj|<=2*sqrt(2)?1:0)
(S(X_i) is the mean force in 2x2 square around X_i, alfa is a parameter to be peaked experimentally)
Now time to do some Metropolis-Hastings magic:
1. Select random marker and move it by one pixel in random direction.
2. Calculate dE, the difference of energy this move caused.
3. Get an uniform random number from 0-1 and call it r.
4. If dE<0 or exp(-beta*dE)>r, accept the move and go to 1; if not, undo the move and go to 1.
This should be repeated until the markers will converge to paws. Beta controls the scanning to optimizing tradeoff, so it should be also optimized experimentally; it can be also constantly increased with the time of simulation (simulated annealing).
Just wanna tell you guys there is a nice option to find local maxima in images with python:
from skimage.feature import peak_local_max
or for skimage 0.8.0:
from skimage.feature.peak import peak_local_max
http://scikit-image.org/docs/0.8.0/api/skimage.feature.peak.html
It's probably worth to try with neural networks if you are able to create some training data... but this needs many samples annotated by hand.
Heres another approach that I used when doing something similar for a large telescope:
1) Search for the highest pixel.
Once you have that, search around that for the best fit for 2x2 (maybe maximizing the 2x2 sum), or do a 2d gaussian fit inside the sub region of say 4x4 centered on the highest pixel.
Then set those 2x2 pixels you have found to zero (or maybe 3x3) around the peak center
go back to 1) and repeat till the highest peak falls below a noise threshold, or you have all the toes you need
a rough outline...
you'd probably want to use a connected components algorithm to isolate each paw region. wiki has a decent description of this (with some code) here: http://en.wikipedia.org/wiki/Connected_Component_Labeling
you'll have to make a decision about whether to use 4 or 8 connectedness. personally, for most problems i prefer 6-connectedness. anyway, once you've separated out each "paw print" as a connected region, it should be easy enough to iterate through the region and find the maxima. once you've found the maxima, you could iteratively enlarge the region until you reach a predetermined threshold in order to identify it as a given "toe".
one subtle problem here is that as soon as you start using computer vision techniques to identify something as a right/left/front/rear paw and you start looking at individual toes, you have to start taking rotations, skews, and translations into account. this is accomplished through the analysis of so-called "moments". there are a few different moments to consider in vision applications:
central moments: translation invariant
normalized moments: scaling and translation invariant
hu moments: translation, scale, and rotation invariant
more information about moments can be found by searching "image moments" on wiki.
Perhaps you can use something like Gaussian Mixture Models. Here's a Python package for doing GMMs (just did a Google search)
http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
Interesting problem. The solution I would try is the following.
Apply a low pass filter, such as convolution with a 2D gaussian mask. This will give you a bunch of (probably, but not necessarily floating point) values.
Perform a 2D non-maximal suppression using the known approximate radius of each paw pad (or toe).
This should give you the maximal positions without having multiple candidates which are close together. Just to clarify, the radius of the mask in step 1 should also be similar to the radius used in step 2. This radius could be selectable, or the vet could explicitly measure it beforehand (it will vary with age/breed/etc).
Some of the solutions suggested (mean shift, neural nets, and so on) probably will work to some degree, but are overly complicated and probably not ideal.
It seems you can cheat a bit using jetxee's algorithm. He is finding the first three toes fine, and you should be able to guess where the fourth is based off that.
Well, here's some simple and not terribly efficient code, but for this size of a data set it is fine.
import numpy as np
grid = np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0.4,0.4,0.4,0,0,0],
[0,0,0,0,0.4,1.4,1.4,1.8,0.7,0,0,0,0,0],
[0,0,0,0,0.4,1.4,4,5.4,2.2,0.4,0,0,0,0],
[0,0,0.7,1.1,0.4,1.1,3.2,3.6,1.1,0,0,0,0,0],
[0,0.4,2.9,3.6,1.1,0.4,0.7,0.7,0.4,0.4,0,0,0,0],
[0,0.4,2.5,3.2,1.8,0.7,0.4,0.4,0.4,1.4,0.7,0,0,0],
[0,0,0.7,3.6,5.8,2.9,1.4,2.2,1.4,1.8,1.1,0,0,0],
[0,0,1.1,5,6.8,3.2,4,6.1,1.8,0.4,0.4,0,0,0],
[0,0,0.4,1.1,1.8,1.8,4.3,3.2,0.7,0,0,0,0,0],
[0,0,0,0,0,0.4,0.7,0.4,0,0,0,0,0,0]])
arr = []
for i in xrange(grid.shape[0] - 1):
for j in xrange(grid.shape[1] - 1):
tot = grid[i][j] + grid[i+1][j] + grid[i][j+1] + grid[i+1][j+1]
arr.append([(i,j),tot])
best = []
arr.sort(key = lambda x: x[1])
for i in xrange(5):
best.append(arr.pop())
badpos = set([(best[-1][0][0]+x,best[-1][0][1]+y)
for x in [-1,0,1] for y in [-1,0,1] if x != 0 or y != 0])
for j in xrange(len(arr)-1,-1,-1):
if arr[j][0] in badpos:
arr.pop(j)
for item in best:
print grid[item[0][0]:item[0][0]+2,item[0][1]:item[0][1]+2]
I basically just make an array with the position of the upper-left and the sum of each 2x2 square and sort it by the sum. I then take the 2x2 square with the highest sum out of contention, put it in the best array, and remove all other 2x2 squares that used any part of this just removed 2x2 square.
It seems to work fine except with the last paw (the one with the smallest sum on the far right in your first picture), it turns out that there are two other eligible 2x2 squares with a larger sum (and they have an equal sum to each other). One of them is still selects one square from your 2x2 square, but the other is off to the left. Fortunately, by luck we see to be choosing more of the one that you would want, but this may require some other ideas to be used to get what you actually want all of the time.
I am not sure this answers the question, but it seems like you can just look for the n highest peaks that don't have neighbors.
Here is the gist. Note that it's in Ruby, but the idea should be clear.
require 'pp'
NUM_PEAKS = 5
NEIGHBOR_DISTANCE = 1
data = [[1,2,3,4,5],
[2,6,4,4,6],
[3,6,7,4,3],
]
def tuples(matrix)
tuples = []
matrix.each_with_index { |row, ri|
row.each_with_index { |value, ci|
tuples << [value, ri, ci]
}
}
tuples
end
def neighbor?(t1, t2, distance = 1)
[1,2].each { |axis|
return false if (t1[axis] - t2[axis]).abs > distance
}
true
end
# convert the matrix into a sorted list of tuples (value, row, col), highest peaks first
sorted = tuples(data).sort_by { |tuple| tuple.first }.reverse
# the list of peaks that don't have neighbors
non_neighboring_peaks = []
sorted.each { |candidate|
# always take the highest peak
if non_neighboring_peaks.empty?
non_neighboring_peaks << candidate
puts "took the first peak: #{candidate}"
else
# check that this candidate doesn't have any accepted neighbors
is_ok = true
non_neighboring_peaks.each { |accepted|
if neighbor?(candidate, accepted, NEIGHBOR_DISTANCE)
is_ok = false
break
end
}
if is_ok
non_neighboring_peaks << candidate
puts "took #{candidate}"
else
puts "denied #{candidate}"
end
end
}
pp non_neighboring_peaks
Maybe a naive approach is sufficient here: Build a list of all 2x2 squares on your plane, order them by their sum (in descending order).
First, select the highest-valued square into your "paw list". Then, iteratively pick 4 of the next-best squares that don't intersect with any of the previously found squares.
What if you proceed step by step: you first locate the global maximum, process if needed the surrounding points given their value, then set the found region to zero, and repeat for the next one.

Categories