Measuring shift between two images along one direction only - python
I have to measure shifts between two monochromatic images.
These images are actually spectra before calibration, which are very noisy and full of unwanted features, but they basically look like following
I know that between different images, they have shifts along x-direction, but not along y-direction. And I want to know the amount of the shift along x-direction between them.
Luckily I found a function in skimage, register_translation, which can be used for arbitrary subpixel precision. But the problem is, I want to know shift along x-direction only, and I want resulting y-direction shift to be 0, but the program finds the shift to x and y at the same time, presumably along the direction perpendicular to the features. (marked as blue arrow in the figure)
So, I am wondering :
is there any function or package in python that measures the shift between two images along one direction only, or even with any prior knowledge?
what is a correct way of finding shifts between two noisy images? Would finding maximum cross-correlation value in FFT space would do the job?
Some simple maths should do in this situation if register_translation gives you the xy shift, be it in vector or component form. You can calculate the movement in x that would be required if the y shift was non-existent, which is what you want. I am travelling so unfortunately can't give you the graph right now, would recommend drawing the triangles out.
The extra x shift required (x_extra) is defined by:
x_extra = y * tan[arctan(y_shift/x_shift)]
Which is simplified to:
x_extra = y_shift^2 / x_shift
Therefore, the total shift in x is:
x_shift_total = x_shift + x_extra
Where the x_shift is given to you by register_translation.
If you then move imageA by x_shift_total, it should be aligned with imageB, assuming the x_shift given by register_translation is correct.
#jni I would be keen to implement this as an option in register_translation!
I'm not positive it will work, but: one of the benefits of open source is that you can look at the implementation details of register_translation, then try to adapt it to your case. In your case, I would replace the fftn with fftn(..., axis=1), so that you only compute the fft along the columns axis. Then, multiply the two FFT signals together (this is equivalent to the convolution of each line, as suggested by #CypherX). Finally, you have to find a way to "coalesce" the shifts found along each line into a single measurement. One idea would be to take each shift (the maximum along that line) and plot a histogram. One would hope that you get a sharp peak around the true x shift.
If it works, it would be a pretty great contribution to scikit-image to add an "axis" keyword argument to register_translation. You can read the how to contribute guide and propose a change accordingly!
Another, much faster and simpler, approach would be to calculate the horizontal profile at the same location in both images. That would give you a 1D profile for each image horizontally. Simple peak finding will then give you the location of the lines, and the difference between the peak indexes will tell you the shift solely in the x-axis.
I use this approach routinely to do shift detection similar to your problem, and it is very very fast, very simple, and very robust.
# pick a row to use
row = 10
x_profile1 = np.mean(image1[row, :], axis=0)
x_profiel2 = np.mean(image2[row, :], axis=0)
# 'get_peaks' is a function to return indices of found peaks - several
# around
peaks1 = get_peaks(x_profile1)
peaks2 = get_peaks(x_profile2)
x_shift = peaks1[0] - peaks2[0]
Method-1
You could use convolution between the two images to find where you get a maximum. You could envision this as sliding the non-shifted images over the shifted image from left to right, and the convolution will produce maxima corresponding to the scenario when the identical sections of each image lies on top of one-another. Take a look at scipy.ndimage.convolution and scipy.signal.convolve and see which one suits your needs better.
Method-2
On the other hand, you could take a horizontal slice from each image and find the position of the peaks (assuming black strips are 1's and white regions are 0's).
Calculate the centroids of these peaks in each image. Find the difference between the positions of these centroids and that is the shift your are looking for.
For robustness, you could then apply this to various rows of the image-pairs and the average of all the such differences would be a more statistically viable result for a measure of horizontal shift.
Related
3D rotations to connect balls and cylinders
I've been tasked with writing a python based plugin for a graph drawing program that generates an STL model of a graph. A graph being an object made up of vertices and edges, where a vertex is represented by a 3D ball (a tessellated icosahedron), and an edge is represented with a cylinder that connects with two balls at either end. The end result of the 3D model is that it will get dumped out to an STL file for 3D printing. I'm able to generate the 3D models for the balls and cylinders without any issues, but I'm having some issues generating the overall model, and getting the balls and cylinders to connect properly. My original idea was to create tessellated icosahedrons at the origin, then translate them out to the positions of the vertices. This works fine. I then, for each edge, I would create a cylinder at the origin, rotate it to the correct angle so that it points in the correct direction, then translate it to the midpoint between the two vertices so that the ends of the cylinders are embedded in the icosahedrons. This is where things are going wrong. I'm having some difficulties getting the rotations correct. To calculate the rotations, I'm doing the following: First, I find the angle between the two points as follows (where source and target are both vertices in the graph, belonging to the edge that I'm currently processing): deltaX = source.x - target.x deltaY = source.y - target.y deltaZ = source.z - target.z xyAngle = math.atan2(deltaX, deltaY) xzAngle = math.atan2(deltaX, deltaZ) yzAngle = math.atan2(deltaY, deltaZ) The angles being calculated seem reasonable, and as far as I can tell, do actually represent the angle between the vertices. For example, if I have a vertex at (1, 1, 0) and another vertex at (3, 3, 0), the angle edge connecting them does show up as a 45 degree angle between the two vertices. (That, or -135 degrees, depending which vertex is the source and which is the target). Once I have the angles calculated, I create a cylinder and rotate it by the angles that have been calculated, like so, using some other classes that I've created: c = cylinder() c.createCylinder(edgeThickness, edgeLength) c.rotateX(-yzAngle) c.rotateY(xzAngle) c.rotateZ(-xyAngle) c.translate(edgePosition.x, edgePosition.y, edgePosition.z) (Where edgePosition is the midpoint between the two vertices in the graph, edgeThickness is the radius of the cylinder being created, and edgeLength is the distance between the two vertices). As mentioned, its the rotating of the cylinders that doesn't work as expected. It seems to do the correct rotation on the x/y plane, but as soon as an edge has vertices that differ in all three components (x, y, and z), the rotation fails. Here's an example of a graph that differs in the x, and y components, but not in the z component: And here's the resulting STL file, as seen in Makerware (which is used to send the 3D models to the 3D printer): (The extra cylinder looking bit in the bottom left is something I've currently left in for testing purposes - a cylinder that points in the direction of the z axis, located at the origin). If I take that same graph and move the middle vertex out in the z axis, so now all the edges involve angles in all three axis, I get a result something like the following: As show in the app: The resulting STL file, as show in Makerware: ...and that same model as viewed from the side: As you can see, the cylinders definitely aren't meeting up with the balls like I thought they would. My question is this: Is my approach to doing this flawed, or is it some small but critical mistake that I'm making somewhere in my rotations? I'm pretty sure it isn't a problem with the rotation functions themselves, as I've been able to independently verify that they work as expected. I also tried creating a rotate function that takes in a yaw, pitch, and roll and does all three at once, and it seemed to generate the same result, like so: c.rotateYawPitchRoll(xzAngle, -yzAngle, -xyAngle) So... anyone have any ideas on what I might be doing wrong? UPDATE: As joojaa pointed out, it was a combination of calculating the correct angles as well as the order that they were applied. In order to get things working, I first calculate the rotation on the x axis, as follows: zyAngle = math.atan2(deltaVector.z, deltaVector.y) where deltaVector is the difference between the target and source vectors. This rotation is not yet applied though! The next step is to calculate the rotation on the y axis, as follows: angle = vector.angleBetweenVectors(vector(target.x - source.x, target.y - source.y, target.z - source.z), vector(target.x - source.x, target.y - source.y, 0.0)) Once both rotations are calculated, they are then applied... in the reverse order! First, the x, then the y: c.rotateY(angle) c.rotateX(-zyAngle) #... where c is a cylinder object There still seems to be a few bugs, but this seems to at least work for a simple test case.
Rotation happens in successive order, so the angles affect each other. It is not possible to use a Euler model to rotate them at once. This is why you can not just calculate the rotations based on the first static situation. Just imagine turning a cube so that it is standing on its corner upright. Yes the first rotation is 45 but the second is not since the cube is already turned by that time (draw a each step of the sequence and see what happens). Space rotations aren't trivial. So you need to rotate one angle then re calculate the second angle and so forth. This is also why your first rotation works right. You only need 2 rotations unless your interested in making sure the rotation around the shaft has a certain direction. I would suggest you use axis angles or matrices instead to do this. Mainly because in axis angles this is trivial the angle is the dot between the along tube start and end vectors and the axis is the cross between those 2. You can then convert those to Euler angles if you need. But probably you can just use the matrix directly. For ideas on how conversions and how the rotation could directly be calculated see: transformations.py by Christoph Gohlke. Also see the accompanying c source. I think i need to expand this answer a bit There is a really easy way out for this question that sidesteps all your and many other persons problems. The answer is do not use Euler angle rotation. Ive used a lot of brainpower to try to explain Euler rotations to problems that are ultimately solved more easily without Euler rotations. To justify i will leave just one reason for this if you want more think up of some more answers. The reason most to use Euler rotation sequences is that you probably don't understand Euler angles. There are in fact only a handful of situations where they are good. No self respecting programmer uses Euler rotations to solve this issue. What you do is you use vector math instead. So you have the direction vector from the source to target which is usually calculated: along = normalize(target-source) this is simply one of your matrix rows (or column notation is up to model maker), the one that corresponds to your cylinders original direction (the rows are just x y z w), then you need another vector perpendicular to this one. Choose a arbitrary vector like up (or left if your along is pointing close to up). cross product this up vector by your along for the second row direction. and finally put your source as the last row with 1 in the last column. Done fully formed affine matrix describing the cylinders prition. Much easier to understand since you can draw the vectors. There are shorter ways but this one is easy to understand.
Selecting best range of values from histogram curve
Scenario : I am trying to track two different colored objects. At the beginning, user is prompted to hold the first colored object (say, may be a RED) at a particular position in front of camera (marked on screen by a rectangle) and press any key, then my program takes that portion of frame (ROI) and analyze the color in it, to find what color to track. Similarly for second object also. Then as usual, use cv.inRange function in HSV color plane and track the object. What is done : I took the ROI of object to be tracked, converted it to HSV and checked the Hue histogram. I got two cases as below : ( here there is only one major central peak. But in some cases, I get two such peaks, One a bigger peak with some pixel cluster around it, and second peak, smaller than first one, but significant size with small cluster around it also. I don't have an sample image of it now. But it almost look like below (created in paint)) Question : How can I get best range of hue values from these histograms? By best range I mean, may be around 80-90% of the pixels in ROI lie in that range. Or is there any better method than this to track different colored objects ?
If I understand right, the only thing you need here is to find a maximum in a graph, where the maximum is not necessarily the highest peak, but the area with largest density. Here's a very simple not too scientific but fast O(n) approach. Run the histogram trough a low pass filter. E.g. a moving average. The length of your average can be let's say 20. In that case the 10th value of your new modified histogram would be: mh10 = (h1 + h2 + ... + h20) / 20 where h1, h2... are values from your histogram. The next value: mh11 = (h2 + h3 + ... + h21) / 20 which can be calculated much easier using the previously calculated mh10, by dropping it's first component and adding a new one to the end: mh11 = mh10 - h1/20 + h21/20 Your only problem is how you handle numbers at the edge of your histogram. You could shrink your moving average's length to the length available, or you could add values before and after what you already have. But either way, you couldn't handle peaks right at the edge. And finally, when you have this modified histogram, just get the maximum. This works, because now every value in your histogram contains not only himself but it's neighbors as well. A more sophisticated approach is to weight your average for example with a Gaussian curve. But that's not linear any more. It would be O(k*n), where k is the length of your average which is also the length of the Gaussian.
Resizing image algorithm in python
So, I'm learning my self python by this tutorial and I'm stuck with exercise number 13 which says: Write a function to uniformly shrink or enlarge an image. Your function should take an image along with a scaling factor. To shrink the image the scale factor should be between 0 and 1 to enlarge the image the scaling factor should be greater than 1. This is not meant as a question about PIL, but to ask which algorithm to use so I can code it myself. I've found some similar questions like this, but I dunno how to translate this into python. Any help would be appreciated. I've come to this: import image win = image.ImageWin() img = image.Image("cy.png") factor = 2 W = img.getWidth() H = img.getHeight() newW = int(W*factor) newH = int(H*factor) newImage = image.EmptyImage(newW, newH) for col in range(newW): for row in range(newH): p = img.getPixel(col,row) newImage.setPixel(col*factor,row*factor,p) newImage.draw(win) win.exitonclick() I should do this in a function, but this doesn't matter right now. Arguments for function would be (image, factor). You can try it on OP tutorial in ActiveCode. It makes a stretched image with empty columns :.
Your code as shown is simple and effective for what's known as a Nearest Neighbor resize, except for one little bug: p = img.getPixel(col/factor,row/factor) newImage.setPixel(col,row,p) Edit: since you're sending a floating point coordinate into getPixel you're not limited to Nearest Neighbor - you can implement any interpolation algorithm you want inside. The simplest thing to do is simply truncate the coordinates to int which will cause pixels to be replicated when factor is greater than 1, or skipped when factor is less than 1.
Mark has the correct approach. To get a smoother result, you replace: p = img.getPixel(col/factor,row/factor) with a function that takes floating point coordinates and returns a pixel interpolated from several neighboring points in the source image. For linear interpolation it takes the four nearest neigbors; for higher-order interpolation it takes a larger number of surrounding pixels. For example, if col/factor = 3.75 and row/factor = 1.9, a linear interpolation would take the source pixels at (3,1), (3,2), (4,1), and (4,2) and give a result between those 4 rgb values, weighted most heavily to the pixel at (4,2).
You can do that using the Python Imaging Library. Image.resize() should do what you want. See http://effbot.org/imagingbook/image.htm EDIT Since you want to program this yourself without using a module, I have added an extra solution. You will have to use the following algorithm. load your image extract it's size calculate the desired size (height * factor, width * factor) create a new EmptyImage with the desired size Using a nested loop through the pixels (row by column) in your image. Then (for shrinking) you remove some pixels every once in while, or for (enlarging) you duplicate some pixels in your image. If you want you want to get fancy, you could smooth the added, or removed pixels, by averaging the rgb values with their neighbours.
These spectrum bands used to be judged by eye, how to do it programmatically?
Operators used to examine the spectrum, knowing the location and width of each peak and judge the piece the spectrum belongs to. In the new way, the image is captured by a camera to a screen. And the width of each band must be computed programatically. Old system: spectroscope -> human eye New system: spectroscope -> camera -> program What is a good method to compute the width of each band, given their approximate X-axis positions; given that this task used to be performed perfectly by eye, and must now be performed by program? Sorry if I am short of details, but they are scarce. Program listing that generated the previous graph; I hope it is relevant: import Image from scipy import * from scipy.optimize import leastsq # Load the picture with PIL, process if needed pic = asarray(Image.open("spectrum.jpg")) # Average the pixel values along vertical axis pic_avg = pic.mean(axis=2) projection = pic_avg.sum(axis=0) # Set the min value to zero for a nice fit projection /= projection.mean() projection -= projection.min() #print projection # Fit function, two gaussians, adjust as needed def fitfunc(p,x): return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \ p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2)) errfunc = lambda p, x, y: fitfunc(p,x)-y # Use scipy to fit, p0 is inital guess p0 = array([0,20,1,0,75,10]) X = xrange(len(projection)) p1, success = leastsq(errfunc, p0, args=(X,projection)) Y = fitfunc(p1,X) # Output the result print "Mean values at: ", p1[1], p1[4] # Plot the result from pylab import * #subplot(211) #imshow(pic) #subplot(223) #plot(projection) #subplot(224) #plot(X,Y,'r',lw=5) #show() subplot(311) imshow(pic) subplot(312) plot(projection) subplot(313) plot(X,Y,'r',lw=5) show()
Given an approximate starting point, you could use a simple algorithm that finds a local maxima closest to this point. Your fitting code may be doing that already (I wasn't sure whether you were using it successfully or not). Here's some code that demonstrates simple peak finding from a user-given starting point: #!/usr/bin/env python from __future__ import division import numpy as np from matplotlib import pyplot as plt # Sample data with two peaks: small one at t=0.4, large one at t=0.8 ts = np.arange(0, 1, 0.01) xs = np.exp(-((ts-0.4)/0.1)**2) + 2*np.exp(-((ts-0.8)/0.1)**2) # Say we have an approximate starting point of 0.35 start_point = 0.35 # Nearest index in "ts" to this starting point is... start_index = np.argmin(np.abs(ts - start_point)) # Find the local maxima in our data by looking for a sign change in # the first difference # From http://stackoverflow.com/a/9667121/188535 maxes = (np.diff(np.sign(np.diff(xs))) < 0).nonzero()[0] + 1 # Find which of these peaks is closest to our starting point index_of_peak = maxes[np.argmin(np.abs(maxes - start_index))] print "Peak centre at: %.3f" % ts[index_of_peak] # Quick plot showing the results: blue line is data, green dot is # starting point, red dot is peak location plt.plot(ts, xs, '-b') plt.plot(ts[start_index], xs[start_index], 'og') plt.plot(ts[index_of_peak], xs[index_of_peak], 'or') plt.show() This method will only work if the ascent up the peak is perfectly smooth from your starting point. If this needs to be more resilient to noise, I have not used it, but PyDSTool seems like it might help. This SciPy post details how to use it for detecting 1D peaks in a noisy data set. So assume at this point you've found the centre of the peak. Now for the width: there are several methods you could use, but the easiest is probably the "full width at half maximum" (FWHM). Again, this is simple and therefore fragile. It will break for close double-peaks, or for noisy data. The FWHM is exactly what its name suggests: you find the width of the peak were it's halfway to the maximum. Here's some code that does that (it just continues on from above): # FWHM... half_max = xs[index_of_peak]/2 # This finds where in the data we cross over the halfway point to our peak. Note # that this is global, so we need an extra step to refine these results to find # the closest crossovers to our peak. # Same sign-change-in-first-diff technique as above hm_left_indices = (np.diff(np.sign(np.diff(np.abs(xs[:index_of_peak] - half_max)))) > 0).nonzero()[0] + 1 # Add "index_of_peak" to result because we cut off the left side of the data! hm_right_indices = (np.diff(np.sign(np.diff(np.abs(xs[index_of_peak:] - half_max)))) > 0).nonzero()[0] + 1 + index_of_peak # Find closest half-max index to peak hm_left_index = hm_left_indices[np.argmin(np.abs(hm_left_indices - index_of_peak))] hm_right_index = hm_right_indices[np.argmin(np.abs(hm_right_indices - index_of_peak))] # And the width is... fwhm = ts[hm_right_index] - ts[hm_left_index] print "Width: %.3f" % fwhm # Plot to illustrate FWHM: blue line is data, red circle is peak, red line # shows FWHM plt.plot(ts, xs, '-b') plt.plot(ts[index_of_peak], xs[index_of_peak], 'or') plt.plot( [ts[hm_left_index], ts[hm_right_index]], [xs[hm_left_index], xs[hm_right_index]], '-r') plt.show() It doesn't have to be the full width at half maximum — as one commenter points out, you can try to figure out where your operators' normal threshold for peak detection is, and turn that into an algorithm for this step of the process. A more robust way might be to fit a Gaussian curve (or your own model) to a subset of the data centred around the peak — say, from a local minima on one side to a local minima on the other — and use one of the parameters of that curve (eg. sigma) to calculate the width. I realise this is a lot of code, but I've deliberately avoided factoring out the index-finding functions to "show my working" a bit more, and of course the plotting functions are there just to demonstrate. Hopefully this gives you at least a good starting point to come up with something more suitable to your particular set.
Late to the party, but for anyone coming across this question in the future... Eye movement data looks very similar to this; I'd base an approach off that used by Nystrom + Holmqvist, 2010. Smooth the data using a Savitsky-Golay filter (scipy.signal.savgol_filter in scipy v0.14+) to get rid of some of the low-level noise while keeping the large peaks intact - the authors recommend using an order of 2 and a window size of about twice the width of the smallest peak you want to be able to detect. You can find where the bands are by arbitrarily removing all values above a certain y value (set them to numpy.nan). Then take the (nan)mean and (nan)standard deviation of the remainder, and remove all values greater than the mean + [parameter]*std (I think they use 6 in the paper). Iterate until you're not removing any data points - but depending on your data, certain values of [parameter] may not stabilise. Then use numpy.isnan() to find events vs non-events, and numpy.diff() to find the start and end of each event (values of -1 and 1 respectively). To get even more accurate start and end points, you can scan along the data backward from each start and forward from each end to find the nearest local minimum which has value smaller than mean + [another parameter]*std (I think they use 3 in the paper). Then you just need to count the data points between each start and end. This won't work for that double peak; you'd have to do some extrapolation for that.
The best method might be to statistically compare a bunch of methods with human results. You would take a large variety data and a large variety of measurement estimates (widths at various thresholds, area above various thresholds, different threshold selection methods, 2nd moments, polynomial curve fits of various degrees, pattern matching, and etc.) and compare these estimates to human measurements of the same data set. Pick the estimate method that correlates best with expert human results. Or maybe pick several methods, the best one for each of various heights, for various separations from other peaks, and etc.
Peak detection in a 2D array
I'm helping a veterinary clinic measuring pressure under a dogs paw. I use Python for my data analysis and now I'm stuck trying to divide the paws into (anatomical) subregions. I made a 2D array of each paw, that consists of the maximal values for each sensor that has been loaded by the paw over time. Here's an example of one paw, where I used Excel to draw the areas I want to 'detect'. These are 2 by 2 boxes around the sensor with local maxima's, that together have the largest sum. So I tried some experimenting and decide to simply look for the maximums of each column and row (can't look in one direction due to the shape of the paw). This seems to 'detect' the location of the separate toes fairly well, but it also marks neighboring sensors. So what would be the best way to tell Python which of these maximums are the ones I want? Note: The 2x2 squares can't overlap, since they have to be separate toes! Also I took 2x2 as a convenience, any more advanced solution is welcome, but I'm simply a human movement scientist, so I'm neither a real programmer or a mathematician, so please keep it 'simple'. Here's a version that can be loaded with np.loadtxt Results So I tried #jextee's solution (see the results below). As you can see, it works very on the front paws, but it works less well for the hind legs. More specifically, it can't recognize the small peak that's the fourth toe. This is obviously inherent to the fact that the loop looks top down towards the lowest value, without taking into account where this is. Would anyone know how to tweak #jextee's algorithm, so that it might be able to find the 4th toe too? Since I haven't processed any other trials yet, I can't supply any other samples. But the data I gave before were the averages of each paw. This file is an array with the maximal data of 9 paws in the order they made contact with the plate. This image shows how they were spatially spread out over the plate. Update: I have set up a blog for anyone interested and I have setup a OneDrive with all the raw measurements. So to anyone requesting more data: more power to you! New update: So after the help I got with my questions regarding paw detection and paw sorting, I was finally able to check the toe detection for every paw! Turns out, it doesn't work so well in anything but paws sized like the one in my own example. Off course in hindsight, it's my own fault for choosing the 2x2 so arbitrarily. Here's a nice example of where it goes wrong: a nail is being recognized as a toe and the 'heel' is so wide, it gets recognized twice! The paw is too large, so taking a 2x2 size with no overlap, causes some toes to be detected twice. The other way around, in small dogs it often fails to find a 5th toe, which I suspect is being caused by the 2x2 area being too large. After trying the current solution on all my measurements I came to the staggering conclusion that for nearly all my small dogs it didn't find a 5th toe and that in over 50% of the impacts for the large dogs it would find more! So clearly I need to change it. My own guess was changing the size of the neighborhood to something smaller for small dogs and larger for large dogs. But generate_binary_structure wouldn't let me change the size of the array. Therefore, I'm hoping that anyone else has a better suggestion for locating the toes, perhaps having the toe area scale with the paw size?
I detected the peaks using a local maximum filter. Here is the result on your first dataset of 4 paws: I also ran it on the second dataset of 9 paws and it worked as well. Here is how you do it: import numpy as np from scipy.ndimage.filters import maximum_filter from scipy.ndimage.morphology import generate_binary_structure, binary_erosion import matplotlib.pyplot as pp #for some reason I had to reshape. Numpy ignored the shape header. paws_data = np.loadtxt("paws.txt").reshape(4,11,14) #getting a list of images paws = [p.squeeze() for p in np.vsplit(paws_data,4)] def detect_peaks(image): """ Takes an image and detect the peaks usingthe local maximum filter. Returns a boolean mask of the peaks (i.e. 1 when the pixel's value is the neighborhood maximum, 0 otherwise) """ # define an 8-connected neighborhood neighborhood = generate_binary_structure(2,2) #apply the local maximum filter; all pixel of maximal value #in their neighborhood are set to 1 local_max = maximum_filter(image, footprint=neighborhood)==image #local_max is a mask that contains the peaks we are #looking for, but also the background. #In order to isolate the peaks we must remove the background from the mask. #we create the mask of the background background = (image==0) #a little technicality: we must erode the background in order to #successfully subtract it form local_max, otherwise a line will #appear along the background border (artifact of the local maximum filter) eroded_background = binary_erosion(background, structure=neighborhood, border_value=1) #we obtain the final mask, containing only peaks, #by removing the background from the local_max mask (xor operation) detected_peaks = local_max ^ eroded_background return detected_peaks #applying the detection and plotting results for i, paw in enumerate(paws): detected_peaks = detect_peaks(paw) pp.subplot(4,2,(2*i+1)) pp.imshow(paw) pp.subplot(4,2,(2*i+2) ) pp.imshow(detected_peaks) pp.show() All you need to do after is use scipy.ndimage.measurements.label on the mask to label all distinct objects. Then you'll be able to play with them individually. Note that the method works well because the background is not noisy. If it were, you would detect a bunch of other unwanted peaks in the background. Another important factor is the size of the neighborhood. You will need to adjust it if the peak size changes (the should remain roughly proportional).
Solution Data file: paw.txt. Source code: from scipy import * from operator import itemgetter n = 5 # how many fingers are we looking for d = loadtxt("paw.txt") width, height = d.shape # Create an array where every element is a sum of 2x2 squares. fourSums = d[:-1,:-1] + d[1:,:-1] + d[1:,1:] + d[:-1,1:] # Find positions of the fingers. # Pair each sum with its position number (from 0 to width*height-1), pairs = zip(arange(width*height), fourSums.flatten()) # Sort by descending sum value, filter overlapping squares def drop_overlapping(pairs): no_overlaps = [] def does_not_overlap(p1, p2): i1, i2 = p1[0], p2[0] r1, col1 = i1 / (width-1), i1 % (width-1) r2, col2 = i2 / (width-1), i2 % (width-1) return (max(abs(r1-r2),abs(col1-col2)) >= 2) for p in pairs: if all(map(lambda prev: does_not_overlap(p,prev), no_overlaps)): no_overlaps.append(p) return no_overlaps pairs2 = drop_overlapping(sorted(pairs, key=itemgetter(1), reverse=True)) # Take the first n with the heighest values positions = pairs2[:n] # Print results print d, "\n" for i, val in positions: row = i / (width-1) column = i % (width-1) print "sum = %f # %d,%d (%d)" % (val, row, column, i) print d[row:row+2,column:column+2], "\n" Output without overlapping squares. It seems that the same areas are selected as in your example. Some comments The tricky part is to calculate sums of all 2x2 squares. I assumed you need all of them, so there might be some overlapping. I used slices to cut the first/last columns and rows from the original 2D array, and then overlapping them all together and calculating sums. To understand it better, imaging a 3x3 array: >>> a = arange(9).reshape(3,3) ; a array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) Then you can take its slices: >>> a[:-1,:-1] array([[0, 1], [3, 4]]) >>> a[1:,:-1] array([[3, 4], [6, 7]]) >>> a[:-1,1:] array([[1, 2], [4, 5]]) >>> a[1:,1:] array([[4, 5], [7, 8]]) Now imagine you stack them one above the other and sum elements at the same positions. These sums will be exactly the same sums over the 2x2 squares with the top-left corner in the same position: >>> sums = a[:-1,:-1] + a[1:,:-1] + a[:-1,1:] + a[1:,1:]; sums array([[ 8, 12], [20, 24]]) When you have the sums over 2x2 squares, you can use max to find the maximum, or sort, or sorted to find the peaks. To remember positions of the peaks I couple every value (the sum) with its ordinal position in a flattened array (see zip). Then I calculate row/column position again when I print the results. Notes I allowed for the 2x2 squares to overlap. Edited version filters out some of them such that only non-overlapping squares appear in the results. Choosing fingers (an idea) Another problem is how to choose what is likely to be fingers out of all the peaks. I have an idea which may or may not work. I don't have time to implement it right now, so just pseudo-code. I noticed that if the front fingers stay on almost a perfect circle, the rear finger should be inside of that circle. Also, the front fingers are more or less equally spaced. We may try to use these heuristic properties to detect the fingers. Pseudo code: select the top N finger candidates (not too many, 10 or 12) consider all possible combinations of 5 out of N (use itertools.combinations) for each combination of 5 fingers: for each finger out of 5: fit the best circle to the remaining 4 => position of the center, radius check if the selected finger is inside of the circle check if the remaining four are evenly spread (for example, consider angles from the center of the circle) assign some cost (penalty) to this selection of 4 peaks + a rear finger (consider, probably weighted: circle fitting error, if the rear finger is inside, variance in the spreading of the front fingers, total intensity of 5 peaks) choose a combination of 4 peaks + a rear peak with the lowest penalty This is a brute-force approach. If N is relatively small, then I think it is doable. For N=12, there are C_12^5 = 792 combinations, times 5 ways to select a rear finger, so 3960 cases to evaluate for every paw.
This is an image registration problem. The general strategy is: Have a known example, or some kind of prior on the data. Fit your data to the example, or fit the example to your data. It helps if your data is roughly aligned in the first place. Here's a rough and ready approach, "the dumbest thing that could possibly work": Start with five toe coordinates in roughly the place you expect. With each one, iteratively climb to the top of the hill. i.e. given current position, move to maximum neighbouring pixel, if its value is greater than current pixel. Stop when your toe coordinates have stopped moving. To counteract the orientation problem, you could have 8 or so initial settings for the basic directions (North, North East, etc). Run each one individually and throw away any results where two or more toes end up at the same pixel. I'll think about this some more, but this kind of thing is still being researched in image processing - there are no right answers! Slightly more complex idea: (weighted) K-means clustering. It's not that bad. Start with five toe coordinates, but now these are "cluster centres". Then iterate until convergence: Assign each pixel to the closest cluster (just make a list for each cluster). Calculate the center of mass of each cluster. For each cluster, this is: Sum(coordinate * intensity value)/Sum(coordinate) Move each cluster to the new centre of mass. This method will almost certainly give much better results, and you get the mass of each cluster which may help in identifying the toes. (Again, you've specified the number of clusters up front. With clustering you have to specify the density one way or another: Either choose the number of clusters, appropriate in this case, or choose a cluster radius and see how many you end up with. An example of the latter is mean-shift.) Sorry about the lack of implementation details or other specifics. I would code this up but I've got a deadline. If nothing else has worked by next week let me know and I'll give it a shot.
Using persistent homology to analyze your data set I get the following result (click to enlarge): This is the 2D-version of the peak detection method described in this SO answer. The above figure simply shows 0-dimensional persistent homology classes sorted by persistence. I did upscale the original dataset by a factor of 2 using scipy.misc.imresize(). However, note that I did consider the four paws as one dataset; splitting it into four would make the problem easier. Methodology. The idea behind this quite simple: Consider the function graph of the function that assigns each pixel its level. It looks like this: Now consider a water level at height 255 that continuously descents to lower levels. At local maxima islands pop up (birth). At saddle points two islands merge; we consider the lower island to be merged to the higher island (death). The so-called persistence diagram (of the 0-th dimensional homology classes, our islands) depicts death- over birth-values of all islands: The persistence of an island is then the difference between the birth- and death-level; the vertical distance of a dot to the grey main diagonal. The figure labels the islands by decreasing persistence. The very first picture shows the locations of births of the islands. This method not only gives the local maxima but also quantifies their "significance" by the above mentioned persistence. One would then filter out all islands with a too low persistence. However, in your example every island (i.e., every local maximum) is a peak you look for. Python code can be found here.
This problem has been studied in some depth by physicists. There is a good implementation in ROOT. Look at the TSpectrum classes (especially TSpectrum2 for your case) and the documentation for them. References: M.Morhac et al.: Background elimination methods for multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 401 (1997) 113-132. M.Morhac et al.: Efficient one- and two-dimensional Gold deconvolution and its application to gamma-ray spectra decomposition. Nuclear Instruments and Methods in Physics Research A 401 (1997) 385-408. M.Morhac et al.: Identification of peaks in multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Research Physics A 443(2000), 108-125. ...and for those who don't have access to a subscription to NIM: Spectrum.doc SpectrumDec.ps.gz SpectrumSrc.ps.gz SpectrumBck.ps.gz
I'm sure you have enough to go on by now, but I can't help but suggest using the k-means clustering method. k-means is an unsupervised clustering algorithm which will take you data (in any number of dimensions - I happen to do this in 3D) and arrange it into k clusters with distinct boundaries. It's nice here because you know exactly how many toes these canines (should) have. Additionally, it's implemented in Scipy which is really nice (http://docs.scipy.org/doc/scipy/reference/cluster.vq.html). Here's an example of what it can do to spatially resolve 3D clusters: What you want to do is a bit different (2D and includes pressure values), but I still think you could give it a shot.
Here is an idea: you calculate the (discrete) Laplacian of the image. I would expect it to be (negative and) large at maxima, in a way that is more dramatic than in the original images. Thus, maxima could be easier to find. Here is another idea: if you know the typical size of the high-pressure spots, you can first smooth your image by convoluting it with a Gaussian of the same size. This may give you simpler images to process.
Just a couple of ideas off the top of my head: take the gradient (derivative) of the scan, see if that eliminates the false calls take the maximum of the local maxima You might also want to take a look at OpenCV, it's got a fairly decent Python API and might have some functions you'd find useful.
thanks for the raw data. I'm on the train and this is as far as I've gotten (my stop is coming up). I massaged your txt file with regexps and have plopped it into a html page with some javascript for visualization. I'm sharing it here because some, like myself, might find it more readily hackable than python. I think a good approach will be scale and rotation invariant, and my next step will be to investigate mixtures of gaussians. (each paw pad being the center of a gaussian). <html> <head> <script type="text/javascript" src="http://vis.stanford.edu/protovis/protovis-r3.2.js"></script> <script type="text/javascript"> var heatmap = [[[0,0,0,0,0,0,0,4,4,0,0,0,0], [0,0,0,0,0,7,14,22,18,7,0,0,0], [0,0,0,0,11,40,65,43,18,7,0,0,0], [0,0,0,0,14,61,72,32,7,4,11,14,4], [0,7,14,11,7,22,25,11,4,14,65,72,14], [4,29,79,54,14,7,4,11,18,29,79,83,18], [0,18,54,32,18,43,36,29,61,76,25,18,4], [0,4,7,7,25,90,79,36,79,90,22,0,0], [0,0,0,0,11,47,40,14,29,36,7,0,0], [0,0,0,0,4,7,7,4,4,4,0,0,0] ],[ [0,0,0,4,4,0,0,0,0,0,0,0,0], [0,0,11,18,18,7,0,0,0,0,0,0,0], [0,4,29,47,29,7,0,4,4,0,0,0,0], [0,0,11,29,29,7,7,22,25,7,0,0,0], [0,0,0,4,4,4,14,61,83,22,0,0,0], [4,7,4,4,4,4,14,32,25,7,0,0,0], [4,11,7,14,25,25,47,79,32,4,0,0,0], [0,4,4,22,58,40,29,86,36,4,0,0,0], [0,0,0,7,18,14,7,18,7,0,0,0,0], [0,0,0,0,4,4,0,0,0,0,0,0,0], ],[ [0,0,0,4,11,11,7,4,0,0,0,0,0], [0,0,0,4,22,36,32,22,11,4,0,0,0], [4,11,7,4,11,29,54,50,22,4,0,0,0], [11,58,43,11,4,11,25,22,11,11,18,7,0], [11,50,43,18,11,4,4,7,18,61,86,29,4], [0,11,18,54,58,25,32,50,32,47,54,14,0], [0,0,14,72,76,40,86,101,32,11,7,4,0], [0,0,4,22,22,18,47,65,18,0,0,0,0], [0,0,0,0,4,4,7,11,4,0,0,0,0], ],[ [0,0,0,0,4,4,4,0,0,0,0,0,0], [0,0,0,4,14,14,18,7,0,0,0,0,0], [0,0,0,4,14,40,54,22,4,0,0,0,0], [0,7,11,4,11,32,36,11,0,0,0,0,0], [4,29,36,11,4,7,7,4,4,0,0,0,0], [4,25,32,18,7,4,4,4,14,7,0,0,0], [0,7,36,58,29,14,22,14,18,11,0,0,0], [0,11,50,68,32,40,61,18,4,4,0,0,0], [0,4,11,18,18,43,32,7,0,0,0,0,0], [0,0,0,0,4,7,4,0,0,0,0,0,0], ],[ [0,0,0,0,0,0,4,7,4,0,0,0,0], [0,0,0,0,4,18,25,32,25,7,0,0,0], [0,0,0,4,18,65,68,29,11,0,0,0,0], [0,4,4,4,18,65,54,18,4,7,14,11,0], [4,22,36,14,4,14,11,7,7,29,79,47,7], [7,54,76,36,18,14,11,36,40,32,72,36,4], [4,11,18,18,61,79,36,54,97,40,14,7,0], [0,0,0,11,58,101,40,47,108,50,7,0,0], [0,0,0,4,11,25,7,11,22,11,0,0,0], [0,0,0,0,0,4,0,0,0,0,0,0,0], ],[ [0,0,4,7,4,0,0,0,0,0,0,0,0], [0,0,11,22,14,4,0,4,0,0,0,0,0], [0,0,7,18,14,4,4,14,18,4,0,0,0], [0,4,0,4,4,0,4,32,54,18,0,0,0], [4,11,7,4,7,7,18,29,22,4,0,0,0], [7,18,7,22,40,25,50,76,25,4,0,0,0], [0,4,4,22,61,32,25,54,18,0,0,0,0], [0,0,0,4,11,7,4,11,4,0,0,0,0], ],[ [0,0,0,0,7,14,11,4,0,0,0,0,0], [0,0,0,4,18,43,50,32,14,4,0,0,0], [0,4,11,4,7,29,61,65,43,11,0,0,0], [4,18,54,25,7,11,32,40,25,7,11,4,0], [4,36,86,40,11,7,7,7,7,25,58,25,4], [0,7,18,25,65,40,18,25,22,22,47,18,0], [0,0,4,32,79,47,43,86,54,11,7,4,0], [0,0,0,14,32,14,25,61,40,7,0,0,0], [0,0,0,0,4,4,4,11,7,0,0,0,0], ],[ [0,0,0,0,4,7,11,4,0,0,0,0,0], [0,4,4,0,4,11,18,11,0,0,0,0,0], [4,11,11,4,0,4,4,4,0,0,0,0,0], [4,18,14,7,4,0,0,4,7,7,0,0,0], [0,7,18,29,14,11,11,7,18,18,4,0,0], [0,11,43,50,29,43,40,11,4,4,0,0,0], [0,4,18,25,22,54,40,7,0,0,0,0,0], [0,0,4,4,4,11,7,0,0,0,0,0,0], ],[ [0,0,0,0,0,7,7,7,7,0,0,0,0], [0,0,0,0,7,32,32,18,4,0,0,0,0], [0,0,0,0,11,54,40,14,4,4,22,11,0], [0,7,14,11,4,14,11,4,4,25,94,50,7], [4,25,65,43,11,7,4,7,22,25,54,36,7], [0,7,25,22,29,58,32,25,72,61,14,7,0], [0,0,4,4,40,115,68,29,83,72,11,0,0], [0,0,0,0,11,29,18,7,18,14,4,0,0], [0,0,0,0,0,4,0,0,0,0,0,0,0], ] ]; </script> </head> <body> <script type="text/javascript+protovis"> for (var a=0; a < heatmap.length; a++) { var w = heatmap[a][0].length, h = heatmap[a].length; var vis = new pv.Panel() .width(w * 6) .height(h * 6) .strokeStyle("#aaa") .lineWidth(4) .antialias(true); vis.add(pv.Image) .imageWidth(w) .imageHeight(h) .image(pv.Scale.linear() .domain(0, 99, 100) .range("#000", "#fff", '#ff0a0a') .by(function(i, j) heatmap[a][j][i])); vis.render(); } </script> </body> </html>
Physicist's solution: Define 5 paw-markers identified by their positions X_i and init them with random positions. Define some energy function combining some award for location of markers in paws' positions with some punishment for overlap of markers; let's say: E(X_i;S)=-Sum_i(S(X_i))+alfa*Sum_ij (|X_i-Xj|<=2*sqrt(2)?1:0) (S(X_i) is the mean force in 2x2 square around X_i, alfa is a parameter to be peaked experimentally) Now time to do some Metropolis-Hastings magic: 1. Select random marker and move it by one pixel in random direction. 2. Calculate dE, the difference of energy this move caused. 3. Get an uniform random number from 0-1 and call it r. 4. If dE<0 or exp(-beta*dE)>r, accept the move and go to 1; if not, undo the move and go to 1. This should be repeated until the markers will converge to paws. Beta controls the scanning to optimizing tradeoff, so it should be also optimized experimentally; it can be also constantly increased with the time of simulation (simulated annealing).
Just wanna tell you guys there is a nice option to find local maxima in images with python: from skimage.feature import peak_local_max or for skimage 0.8.0: from skimage.feature.peak import peak_local_max http://scikit-image.org/docs/0.8.0/api/skimage.feature.peak.html
It's probably worth to try with neural networks if you are able to create some training data... but this needs many samples annotated by hand.
Heres another approach that I used when doing something similar for a large telescope: 1) Search for the highest pixel. Once you have that, search around that for the best fit for 2x2 (maybe maximizing the 2x2 sum), or do a 2d gaussian fit inside the sub region of say 4x4 centered on the highest pixel. Then set those 2x2 pixels you have found to zero (or maybe 3x3) around the peak center go back to 1) and repeat till the highest peak falls below a noise threshold, or you have all the toes you need
a rough outline... you'd probably want to use a connected components algorithm to isolate each paw region. wiki has a decent description of this (with some code) here: http://en.wikipedia.org/wiki/Connected_Component_Labeling you'll have to make a decision about whether to use 4 or 8 connectedness. personally, for most problems i prefer 6-connectedness. anyway, once you've separated out each "paw print" as a connected region, it should be easy enough to iterate through the region and find the maxima. once you've found the maxima, you could iteratively enlarge the region until you reach a predetermined threshold in order to identify it as a given "toe". one subtle problem here is that as soon as you start using computer vision techniques to identify something as a right/left/front/rear paw and you start looking at individual toes, you have to start taking rotations, skews, and translations into account. this is accomplished through the analysis of so-called "moments". there are a few different moments to consider in vision applications: central moments: translation invariant normalized moments: scaling and translation invariant hu moments: translation, scale, and rotation invariant more information about moments can be found by searching "image moments" on wiki.
Perhaps you can use something like Gaussian Mixture Models. Here's a Python package for doing GMMs (just did a Google search) http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
Interesting problem. The solution I would try is the following. Apply a low pass filter, such as convolution with a 2D gaussian mask. This will give you a bunch of (probably, but not necessarily floating point) values. Perform a 2D non-maximal suppression using the known approximate radius of each paw pad (or toe). This should give you the maximal positions without having multiple candidates which are close together. Just to clarify, the radius of the mask in step 1 should also be similar to the radius used in step 2. This radius could be selectable, or the vet could explicitly measure it beforehand (it will vary with age/breed/etc). Some of the solutions suggested (mean shift, neural nets, and so on) probably will work to some degree, but are overly complicated and probably not ideal.
It seems you can cheat a bit using jetxee's algorithm. He is finding the first three toes fine, and you should be able to guess where the fourth is based off that.
Well, here's some simple and not terribly efficient code, but for this size of a data set it is fine. import numpy as np grid = np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0,0.4,0.4,0.4,0,0,0], [0,0,0,0,0.4,1.4,1.4,1.8,0.7,0,0,0,0,0], [0,0,0,0,0.4,1.4,4,5.4,2.2,0.4,0,0,0,0], [0,0,0.7,1.1,0.4,1.1,3.2,3.6,1.1,0,0,0,0,0], [0,0.4,2.9,3.6,1.1,0.4,0.7,0.7,0.4,0.4,0,0,0,0], [0,0.4,2.5,3.2,1.8,0.7,0.4,0.4,0.4,1.4,0.7,0,0,0], [0,0,0.7,3.6,5.8,2.9,1.4,2.2,1.4,1.8,1.1,0,0,0], [0,0,1.1,5,6.8,3.2,4,6.1,1.8,0.4,0.4,0,0,0], [0,0,0.4,1.1,1.8,1.8,4.3,3.2,0.7,0,0,0,0,0], [0,0,0,0,0,0.4,0.7,0.4,0,0,0,0,0,0]]) arr = [] for i in xrange(grid.shape[0] - 1): for j in xrange(grid.shape[1] - 1): tot = grid[i][j] + grid[i+1][j] + grid[i][j+1] + grid[i+1][j+1] arr.append([(i,j),tot]) best = [] arr.sort(key = lambda x: x[1]) for i in xrange(5): best.append(arr.pop()) badpos = set([(best[-1][0][0]+x,best[-1][0][1]+y) for x in [-1,0,1] for y in [-1,0,1] if x != 0 or y != 0]) for j in xrange(len(arr)-1,-1,-1): if arr[j][0] in badpos: arr.pop(j) for item in best: print grid[item[0][0]:item[0][0]+2,item[0][1]:item[0][1]+2] I basically just make an array with the position of the upper-left and the sum of each 2x2 square and sort it by the sum. I then take the 2x2 square with the highest sum out of contention, put it in the best array, and remove all other 2x2 squares that used any part of this just removed 2x2 square. It seems to work fine except with the last paw (the one with the smallest sum on the far right in your first picture), it turns out that there are two other eligible 2x2 squares with a larger sum (and they have an equal sum to each other). One of them is still selects one square from your 2x2 square, but the other is off to the left. Fortunately, by luck we see to be choosing more of the one that you would want, but this may require some other ideas to be used to get what you actually want all of the time.
I am not sure this answers the question, but it seems like you can just look for the n highest peaks that don't have neighbors. Here is the gist. Note that it's in Ruby, but the idea should be clear. require 'pp' NUM_PEAKS = 5 NEIGHBOR_DISTANCE = 1 data = [[1,2,3,4,5], [2,6,4,4,6], [3,6,7,4,3], ] def tuples(matrix) tuples = [] matrix.each_with_index { |row, ri| row.each_with_index { |value, ci| tuples << [value, ri, ci] } } tuples end def neighbor?(t1, t2, distance = 1) [1,2].each { |axis| return false if (t1[axis] - t2[axis]).abs > distance } true end # convert the matrix into a sorted list of tuples (value, row, col), highest peaks first sorted = tuples(data).sort_by { |tuple| tuple.first }.reverse # the list of peaks that don't have neighbors non_neighboring_peaks = [] sorted.each { |candidate| # always take the highest peak if non_neighboring_peaks.empty? non_neighboring_peaks << candidate puts "took the first peak: #{candidate}" else # check that this candidate doesn't have any accepted neighbors is_ok = true non_neighboring_peaks.each { |accepted| if neighbor?(candidate, accepted, NEIGHBOR_DISTANCE) is_ok = false break end } if is_ok non_neighboring_peaks << candidate puts "took #{candidate}" else puts "denied #{candidate}" end end } pp non_neighboring_peaks
Maybe a naive approach is sufficient here: Build a list of all 2x2 squares on your plane, order them by their sum (in descending order). First, select the highest-valued square into your "paw list". Then, iteratively pick 4 of the next-best squares that don't intersect with any of the previously found squares.
What if you proceed step by step: you first locate the global maximum, process if needed the surrounding points given their value, then set the found region to zero, and repeat for the next one.