Finding the transformation between two matched shapes - python

Given two images with similar blobs, is there a simple way to find the transformation between them? As an example, I have two images like the following:
The right image is the output of a neural network, while the left is an approximate ground truth (from a shape perspective only). I'm looking for the transformation that moves the left image to best match the position and orientation of the right one. In this case, that is a rotation of some 150-160 degrees counterclockwise and a translation up and to the right.
This seems to be a shape matching problem with some added constraints, but I'm wondering if there is a way to do it without having to perform a bunch of test transformations/sliding window. Most of the examples I've found have been for classification, and the positional ones are not rotation tolerant.
Ideas I have had so far: I've looked at Hu moments and OpenCV's matchShapes, which seem like they would give me the similarity (and handle mirroring, which is a possibility in the data and thus desirable), but I'm not sure how to use them without still using some sort of window. Another option would be SIFT or another feature-based approach, but I don't think it would work particularly well given the low information content of the data and the less similar shapes (Hough transform as a base?). Another brute-force method might be to calculate the difference between the centroids, move the left image over the right one, and then rotate until I find the orientation with the maximum Jaccard index (or use the moments to find the rotation?), but that's the same kind of thing I'm trying to avoid (and it would always be a bit off given the inaccuracy of the NN predictions).
My first instinct is just to make a neural network to do it, but I feel like there is a better answer that I'm just missing.
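For what it's worth, the "use the moments to find the rotation" idea can be sketched without any search. This is only a rough sketch under assumptions: both inputs are binary masks containing a single elongated blob, the helper names `centroid_and_angle` and `estimate_rigid` are made up for illustration, and the principal-axis angle is only defined up to 180 degrees, so a flip (or mirroring) still has to be resolved separately, e.g. by comparing the Jaccard index of the two candidates.

```python
import numpy as np

def centroid_and_angle(mask):
    """Centroid (x, y) and principal-axis angle (radians) of a binary mask."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    x, y = xs - cx, ys - cy
    # Second-order central moments of the blob
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return np.array([cx, cy]), angle

def estimate_rigid(src_mask, dst_mask):
    """Translation (dx, dy) and rotation in degrees, wrapped to [-90, 90)."""
    c_src, a_src = centroid_and_angle(src_mask)
    c_dst, a_dst = centroid_and_angle(dst_mask)
    rotation = np.degrees(a_dst - a_src)
    rotation = (rotation + 90.0) % 180.0 - 90.0  # axis has no direction
    return c_dst - c_src, rotation
```

Angles are measured in image coordinates (x to the right, y downward), so a positive value looks clockwise on screen. Because the estimate comes from whole-blob statistics, it should be fairly tolerant of ragged network output, but it degrades for nearly round blobs where the principal axis is ill-defined.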


Python photo mosaic with abstractly shaped mosaics

Image mosaics use a set of predefined square images to build a larger image (example here).
There are a lot of solutions and it's quite trivial to achieve this effect. However, it becomes much harder with the following constraints:
The shape of the original mosaics is abstract. Any convex polygon could do.
Each mosaic can only be used once.
There is no need for the mosaics to be absolutely packed (i.e. occupying 100% of the canvas), but they should be as packed as possible without overlapping.
I'm trying to automate the ancient art of tessellation, specifically the Opus palladianum technique.
My idea is to use simulated annealing or some other heuristic to optimize the position and rotation of each irregular mosaic, swapping two in each iteration, trying to minimize some energy function that reflects both the similarity to the target image and how tightly the tiles are packed.
I'm trying to achieve this in Python. Any ideas and help would be greatly appreciated.
You could probably use a GA (genetic algorithm) with a "non-overlapping" constraint to do this job.
Parameters for individual (each convex polygon) are:
initial position
(size ?)
Your fitness function should be built to give the best score to each individual whose polygon does not overlap the others (and is close to other individuals).
You may see this video and this one as examples.
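The swap-based annealing loop from the question could look roughly like the sketch below. It is deliberately reduced, and everything in it is an assumption: each tile and each target slot is described by a single gray value, geometry (position, rotation, overlap) is left out entirely, and the function name and cooling schedule are arbitrary. A real energy function would add a packing or overlap penalty to the colour term.

```python
import math
import random

def anneal_assignment(tile_colors, target_colors, steps=5000,
                      t_start=1.0, t_end=1e-3, seed=0):
    """Assign each tile to one slot (each tile used once) by swapping pairs."""
    rng = random.Random(seed)
    order = list(range(len(tile_colors)))  # order[slot] = tile index

    def energy(o):
        # Colour mismatch only; a packing/overlap term would be added here
        return sum(abs(tile_colors[t] - target_colors[s]) for s, t in enumerate(o))

    current = energy(order)
    best_order, best = order[:], current
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        candidate = energy(order)
        accept = candidate <= current or rng.random() < math.exp((current - candidate) / temp)
        if accept:
            current = candidate
            if current < best:
                best_order, best = order[:], current
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return best_order, best
```

Recomputing the full energy on every swap is wasteful; for larger tile sets only the two affected slots need to be re-evaluated.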

calculate particle size distribution from AFM measurements

I am trying to obtain a radius and diameter distribution from some AFM (atomic force microscopy) measurements. So far I have been trying out Gwyddion, ImageJ and different workflows in Matlab.
At the moment the best result I have found comes from using Gwyddion: taking the phase image, high-pass filtering it and then trying an edge detection with 'Laplacian of Gaussian'. The result is shown in figure 3. However, this image is still too noisy and doesn't really capture the edges of all the particles (some are merged together, others do not have a clear perimeter).
In the end I need an image which segments each of the spherical particles which I can use for blob detection/analysis to obtain size/radius information.
Can anyone recommend a different method?
I would definitely try granulometry; it was designed for something really similar. There is a good explanation of granulometry here, starting on page 158.
The granulometry performs consecutive openings of increasing size that erase the different patterns according to their dimensions. The bigger the pattern, the later it will be erased. It gives you a curve that represents the distribution of pattern dimensions in your image, so exactly what you want.
However, it will not give you any information about position inside the image. If you want a rough model of the blobs present in your image, you can take a look at the Ultimate Opening.
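A granulometry of this kind can be sketched with scipy.ndimage. This assumes you already have a binary mask of the particles (thresholding the AFM image is a separate step) and uses disk-shaped structuring elements; the helper names `disk` and `granulometry` are just for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

def disk(radius):
    """Boolean disk-shaped structuring element of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def granulometry(mask, radii):
    """Foreground area that survives an opening with each disk radius."""
    return np.array([ndi.binary_opening(mask, structure=disk(r)).sum()
                     for r in radii])
```

The drop in surviving area between consecutive radii, e.g. `-np.diff(granulometry(mask, radii))`, gives a rough size distribution: area that disappears between radius r and the next radius belongs to particles of about that size. Touching particles are not separated by this, so it measures local width rather than counting individual blobs.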
Maybe you can use Avizo. It's powerful software for image analysis, especially for 3D data (CT).

Using external pose estimates to improve stationary marker contour tracking

Suppose that I have an array of sensors that allows me to come up with an estimate of my pose relative to some fixed rectangular marker. I thus have an estimate as to what the contour of the marker will look like in the image from the camera. How might I use this to better detect contours?
The problem that I'm trying to overcome is that sometimes, the marker is occluded, perhaps by a line cutting across it. As such, I'm left with two contours that if merged, would yield the marker. I've tried opening and closing to try and fix the problem, but it isn't robust to the different types of lighting.
One approach that I'm considering is to use the predicted contour, and perform a local convolution with the gradient of the image, to find my true pose.
Any thoughts or advice?
The obvious advantage of having a pose estimate is that it restricts the image region for searching your target.
Next, if your problem is occlusion, you then need to model that explicitly, rather than just try to paper it over with image processing tricks: add to your detector objective function a term that expresses what your target may look like when partially occluded. This can be either an explicit "occluded appearance" model, or implicit - e.g. using an algorithm that is able to recognize visible portions of the targets independently of the whole of it.
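As a small illustration of the first point, the predicted marker corners can be turned into a padded search window. This is only a sketch: `predicted_corners` is assumed to be the four projected (x, y) corners from your pose estimate, and `margin` is a hypothetical parameter that should reflect how uncertain that estimate is.

```python
import numpy as np

def search_roi(predicted_corners, image_shape, margin):
    """Padded bounding box (x0, y0, x1, y1) around the predicted contour."""
    corners = np.asarray(predicted_corners, dtype=float)
    height, width = image_shape[:2]
    x0, y0 = np.floor(corners.min(axis=0) - margin)
    x1, y1 = np.ceil(corners.max(axis=0) + margin)
    # Clip to the image so the box can be used directly for slicing
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1), width - 1), min(int(y1), height - 1)
    return x0, y0, x1, y1
```

The detector then only needs to look at `image[y0:y1 + 1, x0:x1 + 1]`, and contour fragments found inside the window can be scored against the predicted edges instead of being merged blindly.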

Image registration using python and cross-correlation

I have two images showing exactly the same content: 2D Gaussian-shaped spots. I call these two 16-bit PNG files "left.png" and "right.png". But as they are obtained through a slightly different optical setup, the corresponding spots (physically the same) appear at slightly different positions, meaning the right image is slightly stretched or distorted in a non-linear way. Therefore I would like to get the transformation from left to right.
So for every pixel on the left side with its x- and y-coordinate I want a function giving me the components of the displacement-vector that points to the corresponding pixel on the right side.
In a former approach I tried to get the positions of the corresponding spots to obtain the relative distances deltaX and deltaY. I then fitted these distances to the Taylor expansion up to second order of T(x,y), giving me the x- and y-component of the displacement vector for every pixel (x,y) on the left, pointing to the corresponding pixel (x',y') on the right.
To get a more general result I would like to use normalized cross-correlation. For this I multiply every pixel value from left with a corresponding pixel value from right and sum over these products. The transformation I am looking for should connect the pixels that maximize the sum. So when the sum is maximized, I know that I multiplied the corresponding pixels.
I really tried a lot with this, but didn't manage. My question is if somebody of you has an idea or has ever done something similar.
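import numpy as np
from PIL import Image
# load the 16-bit PNGs as float arrays
left = np.array(Image.open('left.png'), dtype=float)
right = np.array(Image.open('right.png'), dtype=float)
# normalize to zero mean and unit variance
left = (left - left.mean()) / left.std()
right = (right - right.mean()) / right.std()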
Please let me know if I can make this question more clear. I still have to check out how to post questions using latex.
Thank you very much for input.
I'm afraid, in most cases 16-bit images appear just black (at least on systems I use) :( but of course there is data in there.
Let me try to clarify my question. I am looking for a vector field of displacement vectors that point from every pixel in left.png to the corresponding pixel in right.png. My problem is that I am not sure about the constraints I have.
That is, r' = r + d(r), where the vector r (components x and y) points to a pixel in left.png and the vector r' (components x' and y') points to the corresponding pixel in right.png. For every r there is a displacement vector d(r).
What I did earlier was to find components of the vector field d manually and fit them to a second-degree polynomial:
So I fitted:
Does this make sense to you? Is it possible to get all the delta-x(x,y) and delta-y(x,y) with cross-correlation? The cross-correlation should be maximized if the corresponding pixels are linked together through the displacement vectors, right?
So the algorithm I was thinking of is as follows:
Deform right.png
Get the value of cross-correlation
Deform right.png further
Get the value of cross-correlation and compare to value before
If it's greater, it was a good deformation; if not, undo the deformation and try something else
Once the cross-correlation value is maximized, I know what the deformation is :)
About the deformation: could one first do a shift along the x- and y-direction to maximize the cross-correlation, then in a second step stretch or compress depending on x and y, and in a third step deform quadratically depending on x and y, and repeat this procedure iteratively? I really have a problem doing this with integer coordinates. Do you think I would have to interpolate the picture to obtain a continuous distribution? I have to think about this again :( Thanks to everybody for taking part :)
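On the interpolation question: one way to get away from integer coordinates is to let a spline interpolator resample the image, so that the shift becomes a continuous parameter an optimizer can work on. The sketch below covers only the first stage of the loop above (a global sub-pixel shift). The Powell method, the cubic spline order, and the function names are assumptions, not the only reasonable choices.

```python
import numpy as np
from scipy import ndimage as ndi
from scipy.optimize import minimize

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def find_subpixel_shift(left, right):
    """(row, col) shift that, applied to `right`, best matches `left`."""
    def cost(shift):
        # Cubic spline interpolation makes non-integer shifts possible
        moved = ndi.shift(right, shift, order=3, mode="constant", cval=0.0)
        return -ncc(left, moved)

    result = minimize(cost, x0=[0.0, 0.0], method="Powell")
    return result.x, float(-result.fun)
```

The same pattern extends to more parameters (scaling, quadratic terms) by replacing `ndi.shift` with a resampling call that takes the full deformation, but the more parameters there are, the more the result depends on a good starting point.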
OpenCV (and with it the python Opencv binding) has a StarDetector class which implements this algorithm.
As an alternative you might have a look at the OpenCV SIFT class, which stands for Scale Invariant Feature Transform.
Regarding your comment, I understand that the "right" transformation will maximize the cross-correlation between the images, but I don't understand how you choose the set of transformations over which to maximize. Maybe if you know the coordinates of three matching points (either by some heuristic or by choosing them by hand), and if you expect the transformation to be affine, you could use something like cv2.getAffineTransform to get a good initial transformation for your maximization process. From there you could use small additional transformations to build a set over which to maximize. But this approach seems to me like re-inventing something that SIFT could take care of.
To actually transform your test image you can use cv2.warpAffine, which also can take care of border values (e.g. pad with 0). To calculate the cross-correlation you could use scipy.signal.correlate2d.
Your latest update did indeed clarify some points for me. But I think that a vector field of displacements is not the most natural thing to look for, and this is also where the misunderstanding came from. I was thinking more along the lines of a global transformation T, which applied to any point (x,y) of the left image gives (x',y')=T(x,y) on the right side, where T has the same analytical form for every pixel. For example, this could be a combination of a displacement, a rotation, a scaling, and maybe some perspective transformation. I cannot say whether it is realistic to hope to find such a transformation, since this depends on your setup, but if the scene is physically the same on both sides I would say it is reasonable to expect some affine transformation. This is why I suggested cv2.getAffineTransform. It is of course trivial to calculate your displacement vector field from such a T, as this is just T(x,y)-(x,y).
The big advantage would be that you have only very few degrees of freedom for your transformation, instead of, I would argue, 2N degrees of freedom in the displacement vector field, where N is the number of bright spots.
If it is indeed an affine transformation, I would suggest some algorithm like this:
identify three bright and well isolated spots on the left
for each of these three spots, define a bounding box so that you can hope to identify the corresponding spot within it in the right image
find the coordinates of the corresponding spots, e.g. with some correlation method as implemented in cv2.matchTemplate or by also just finding the brightest spot within the bounding box.
once you have three matching pairs of coordinates, calculate the affine transformation which transforms one set into the other with cv2.getAffineTransform.
apply this affine transformation to the left image. To check that you found the right one, you could test whether the overall normalized cross-correlation is above some threshold, or whether it drops significantly when you displace one image with respect to the other.
if you wish and still need it, calculate the displacement vector field trivially from your transformation T.
It seems cv2.getAffineTransform expects an awkward input data type 'float32'. Let's assume the source coordinates are (sxi,syi) and destination (dxi,dyi) with i=0,1,2, then what you need is
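import numpy as np
import cv2
# three (x, y) source points and their three destination points, as float32
src = np.array(((sx0, sy0), (sx1, sy1), (sx2, sy2)), dtype='float32')
dst = np.array(((dx0, dy0), (dx1, dy1), (dx2, dy2)), dtype='float32')
result = cv2.getAffineTransform(src, dst)  # 2x3 affine matrix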
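If you have more than three matched spots, a least-squares fit is a possible alternative to cv2.getAffineTransform, and the displacement field T(x,y)-(x,y) then follows directly from the matrix. This is a NumPy-only sketch; the function names are made up, and the returned matrix is assumed to use the same 2x3 layout as the OpenCV result (each row maps [x, y, 1] to one output coordinate).

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src (N, 2) onto dst (N, 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])  # rows [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return coeffs.T

def displacement_field(matrix, shape):
    """Per-pixel (dx, dy) = T(x, y) - (x, y) for an image of the given shape."""
    ys, xs = np.mgrid[:shape[0], :shape[1]]
    x_new = matrix[0, 0] * xs + matrix[0, 1] * ys + matrix[0, 2]
    y_new = matrix[1, 0] * xs + matrix[1, 1] * ys + matrix[1, 2]
    return x_new - xs, y_new - ys
```

With exactly three non-collinear pairs this reproduces the exact solution; with more pairs it averages out localization noise in the spot positions.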
I don't think a cross correlation is going to help here, as it only gives you a single best shift for the whole image. There are three alternatives I would consider:
1. Do a cross-correlation on sub-clusters of dots. Take, for example, the three dots in the top right and find the optimal x-y shift through cross-correlation. This gives you the rough transform for the top right. Repeat for as many clusters as you can to obtain a reasonable map of your transformations. Fit this with your Taylor expansion and you might get reasonably close. However, for your cross-correlation to work at all, the difference in displacement between spots must be less than the extent of a spot, or else you can never get all spots in a cluster to overlap simultaneously with a single displacement. Under these conditions, option 2 might be more suitable.
2. If the displacements are relatively small (which I think is a condition for option 1), then we might assume that for a given spot in the left image, the closest spot in the right image is the corresponding spot. Thus, for every spot in the left image, we find the nearest spot in the right image and use that as the displacement at that location. From the 40-something well-distributed displacement vectors we can obtain a reasonable approximation of the actual displacement by fitting your Taylor expansion.
3. This is probably the slowest method, but it might be the most robust if you have large displacements (so that option 2 doesn't work): use something like an evolutionary algorithm to find the displacement. Apply a random transformation, compute the remaining error (you might need to define this as the sum of the smallest distances between spots in your original and transformed image), and improve your transformation with those results. If your displacements are rather large you might need a very broad search, as you'll probably get lots of local minima in your landscape.
I would try option 2 as it seems your displacements might be small enough to easily associate a spot in the left image with a spot in the right image.
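Option 2 might look roughly like this, assuming the spot centres have already been extracted as (x, y) arrays. The names, the `max_dist` cutoff, and the choice of a full second-order polynomial (terms 1, x, y, x^2, xy, y^2) are illustrative assumptions rather than anything prescribed above.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_nearest(left_pts, right_pts, max_dist):
    """Pair each left spot with its nearest right spot, dropping far matches."""
    left_pts = np.asarray(left_pts, dtype=float)
    right_pts = np.asarray(right_pts, dtype=float)
    dist, idx = cKDTree(right_pts).query(left_pts)
    keep = dist <= max_dist
    return left_pts[keep], right_pts[idx[keep]]

def poly_terms(pts):
    """Design matrix with columns 1, x, y, x^2, xy, y^2."""
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def fit_displacement(left_pts, right_pts):
    """Least-squares coefficients (6, 2) for delta-x and delta-y."""
    coeffs, *_ = np.linalg.lstsq(poly_terms(left_pts), right_pts - left_pts,
                                 rcond=None)
    return coeffs
```

The displacement at any pixel is then `poly_terms(points) @ coeffs`. The nearest-neighbour step is only trustworthy while the displacements stay well below the typical spacing between spots.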
I assume your optics induce non linear distortions and having two separate beampaths (different filters in each?) will make the relationship between the two images even more non-linear. The affine transformation PiQuer suggests might give a reasonable approach but can probably never completely cover the actual distortions.
I think your approach of fitting to a low order Taylor polynomial is fine. This works for all my applications with similar conditions. Highest orders probably should be something like xy^2 and x^2y; anything higher than that you won't notice.
Alternatively, you might be able to calibrate the distortions for each image first, and then do your experiments. This way you are not dependent on the distribution of your dots, but can use a high-resolution reference image to get the best description of your transformation.
Option 2 above still stands as my suggestion for getting the two images to overlap. This can be fully automated and I'm not sure what you mean when you want a more general result.
Update 2
You comment that you have trouble matching dots in the two images. If this is the case, I think your iterative cross-correlation approach may not be very robust either. You have very small dots, so overlap between them will only occur if the difference between the two images is small.
In principle there is nothing wrong with your proposed solution, but whether it works or not strongly depends on the size of your deformations and the robustness of your optimization algorithm. If you start off with very little overlap, then it may be hard to find a good starting point for your optimization. Yet if you have sufficient overlap to begin with, then you should have been able to find the deformation per dot first, but in a comment you indicate that this doesn't work.
Perhaps you can go for a mixed solution: find the cross correlation of clusters of dots to get a starting point for your optimization, and then tweak the deformation using something like the procedure you describe in your update. Thus:
For a NxN pixel segment find the shift between the left and right images
Repeat for, say, 16 of those segments
Compute an approximation of the deformation using those 16 points
Use this as the starting point of your optimization approach
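The per-segment shift in the steps above can be estimated with an FFT-based cross-correlation. A minimal sketch, assuming integer-pixel accuracy is enough for a starting point and that circular wrap-around inside each segment is tolerable; the function names and the non-overlapping tiling are my own choices.

```python
import numpy as np

def block_shift(a, b):
    """Integer (dy, dx) such that b is approximately a shifted by (dy, dx)."""
    a = a - a.mean()
    b = b - b.mean()
    # Circular cross-correlation via the FFT; its peak sits at the shift
    corr = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices in the upper half of the range to negative shifts
    return tuple(int(p - n) if p > n // 2 else int(p)
                 for p, n in zip(peak, corr.shape))

def segment_shifts(left, right, size):
    """Rows of (centre_y, centre_x, dy, dx), one per size x size segment."""
    rows = []
    for y in range(0, left.shape[0] - size + 1, size):
        for x in range(0, left.shape[1] - size + 1, size):
            dy, dx = block_shift(left[y:y + size, x:x + size],
                                 right[y:y + size, x:x + size])
            rows.append((y + size / 2, x + size / 2, dy, dx))
    return np.array(rows)
```

The resulting table of segment centres and shifts can be fed into the same low-order polynomial fit as the spot displacements to give the optimizer its initial deformation. Segments that contain no spots will return meaningless shifts and should be filtered out first.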
You might want to have a look at bUnwarpJ, which already does what you're trying to do. It's not Python, but I use it in exactly this context. You can export a plain-text spline transformation and use it if you wish to do so.

Detect the location of an image within a larger image

How do you detect the location of an image within a larger image? I have an unmodified copy of the image. This image is then changed to an arbitrary resolution and placed randomly within a much larger image which is of an arbitrary size. No other transformations are conducted on the resulting image. Python code would be ideal, and it would probably require libgd. If you know of a good approach to this problem you'll get a +1.
There is a quick and dirty solution, and that's simply sliding a window over the target image and computing some measure of similarity at each location, then picking the location with the highest similarity. Then you compare the similarity to a threshold, if the score is above the threshold, you conclude the image is there and that's the location; if the score is below the threshold, then the image isn't there.
As a similarity measure, you can use normalized correlation or sum of squared differences (aka L2 norm). As people mentioned, this will not deal with scale changes. So you also rescale your original image multiple times and repeat the process above with each scaled version. Depending on the size of your input image and the range of possible scales, this may be good enough, and it's easy to implement.
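The quick and dirty approach could be sketched like this for grayscale arrays. It is a brute-force illustration only: the list of scales is something you would have to choose yourself, the function names are made up, and the window array grows quickly with image size, so real use would call for an optimized routine such as a library template matcher.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy import ndimage as ndi

def ncc_map(image, template):
    """Normalized correlation of the template at every valid position."""
    windows = sliding_window_view(image, template.shape)
    w = windows - windows.mean(axis=(2, 3), keepdims=True)
    t = template - template.mean()
    num = (w * t).sum(axis=(2, 3))
    den = np.sqrt((w ** 2).sum(axis=(2, 3)) * (t ** 2).sum()) + 1e-12
    return num / den

def locate(image, template, scales):
    """Best (score, (row, col), scale) over all tested scales."""
    best = (-np.inf, None, None)
    for scale in scales:
        resized = ndi.zoom(template, scale, order=1)
        if resized.shape[0] > image.shape[0] or resized.shape[1] > image.shape[1]:
            continue  # the rescaled template no longer fits
        scores = ncc_map(image, resized)
        row, col = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[row, col] > best[0]:
            best = (float(scores[row, col]), (int(row), int(col)), scale)
    return best
```

The returned score is what you would compare against the threshold to decide whether the image is present at all.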
A proper solution is to use affine invariants. Try looking up "wide-baseline stereo matching", people looked at that problem in that context. The methods that are used are generally something like this:
Preprocessing of the original image
Run an "interest point detector". This will find a few points in the image that are easy to localize, e.g. corners. There are many detectors; one called "Harris-affine" works well and is pretty popular (so implementations probably exist). Another option is the Difference-of-Gaussians (DoG) detector, which was developed for SIFT and works well too.
At each interest point, extract a small sub-image (e.g. 30x30 pixels)
For each sub-image, compute a "descriptor", some representation of the image content in that window. Again, many descriptors exist. Things to look at are how well the descriptor describes the image content (you want two descriptors to match only if they are similar) and how invariant it is (you want it to be the same even after scaling). In your case, I'd recommend using SIFT. It is not as invariant as some other descriptors, but can cope with scale well, and in your case scale is the only thing that changes.
At the end of this stage, you will have a set of descriptors.
Testing (with the new test image).
First, you run the same interest point detector as in step 1 and get a set of interest points. You compute the same descriptor for each point, as above. Now you have a set of descriptors for the target image as well.
Next, you look for matches. Ideally, to each descriptor from your original image, there will be some pretty similar descriptor in the target image. (Since the target image is larger, there will also be "leftover" descriptors, i.e. points that don't correspond to anything in the original image.) So if enough of the original descriptors match with enough similarity, then you know the target is there. Moreover, since the descriptors are location-specific, you will also know where in the target image the original image is.
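The matching stage can be sketched independently of the detector. Assume the descriptors are already computed as rows of two arrays and the interest point locations are known; since scale is the only change here, a scale plus an offset is fitted to the matched points. The function names and the `max_distance` threshold are hypothetical, and a real pipeline would use a more robust estimator to cope with wrong matches.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, max_distance):
    """Pairs (i, j): nearest descriptor in b for each a, if close enough."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    ok = dists[np.arange(len(desc_a)), nearest] <= max_distance
    return [(int(i), int(nearest[i])) for i in np.flatnonzero(ok)]

def estimate_scale_and_offset(points_a, points_b, matches):
    """Least-squares fit of points_b = scale * points_a + offset."""
    a = np.array([points_a[i] for i, _ in matches], dtype=float)
    b = np.array([points_b[j] for _, j in matches], dtype=float)
    a0, b0 = a - a.mean(axis=0), b - b.mean(axis=0)
    scale = (a0 * b0).sum() / (a0 ** 2).sum()
    offset = b.mean(axis=0) - scale * a.mean(axis=0)
    return scale, offset
```

The number of matches relative to the number of original descriptors plays the role of the detection score, and the fitted offset is where the original image's origin lands in the target image.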
You probably want cross-correlation. (Autocorrelation is correlating a signal with itself; cross correlating is correlating two different signals.)
What correlation does for you, over simply checking for exact matches, is that it will tell you where the best matches are, and how good they are. Flip side is that, for a 2-D picture, it's something like O(N^3), and it's not that simple an algorithm. But it's magic once you get it to work.
EDIT: Aargh, you specified an arbitrary resize. That's going to break any correlation-based algorithm. Sorry, you're outside my experience now and SO won't let me delete this answer. is my first instinct.
Take a look at Scale-Invariant Feature Transforms; there are many different flavors that may be more or less tailored to the type of images you happen to be working with.
