I have been trying Python+OpenCV for quite a long time now and have followed many tutorials in order to identify particles in the following image:
My ultimate goal is to identify every particle, from there I will be able to e.g. count number of particles, calculate a size distribution, etc.
I have already tried to customize many examples from several sites.
I got good hints based on:
How to define the markers for Watershed in OpenCV?
Counting particles using image processing in python
Still, I was not able to achieve decent results.
How can I identify particles in this image using Python and OpenCV?
IMO, the only hope to get meaningful results is to use the fact that the particles are round. By using some homogeneity criterion, you could find candidate particle centers, and from these grow contours in such a way that they remain round and stop at edges. An option could be to draw rays from the seed point, find the closest edge points and use a robust fit of a circle or an ellipse.
Reject the shapes that are too far from roundness. This should allow you to find the unoccluded particles. Then you can continue the game from other seed points, this time growing contours that can be occluded by the already detected particles. (When an edge is hit, if it is known to belong to a particle, ignore it.)
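A minimal sketch of that ray-growing idea, assuming a grayscale image gray and a list of hypothetical candidate seed points seeds; the ray count, search radius, and roundness threshold are guesses, and the circle "fit" here is only a crude median-radius estimate rather than a proper robust fit:

import cv2
import numpy as np

def grow_round_contour(grad, seed, n_rays=32, max_r=80):
    # grad: gradient-magnitude image; seed: (x, y) candidate particle centre
    cx, cy = seed
    pts = []
    for a in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        rs = np.arange(3, max_r)
        xs = np.round(cx + rs * np.cos(a)).astype(int)
        ys = np.round(cy + rs * np.sin(a)).astype(int)
        ok = (xs >= 0) & (xs < grad.shape[1]) & (ys >= 0) & (ys < grad.shape[0])
        if not ok.any():
            continue
        prof = grad[ys[ok], xs[ok]]
        k = int(prof.argmax())                    # strongest edge along this ray
        pts.append((xs[ok][k], ys[ok][k]))
    pts = np.asarray(pts, dtype=np.float64)
    cxf, cyf = pts.mean(axis=0)                   # crude centre estimate
    d = np.hypot(pts[:, 0] - cxf, pts[:, 1] - cyf)
    r = np.median(d)                              # median radius is mildly robust
    roundness = np.median(np.abs(d - r)) / r      # small value = close to a circle
    return (cxf, cyf), r, roundness

# usage sketch: keep only fits that stay round (the 0.15 threshold is a guess)
# gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0); gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
# grad = cv2.magnitude(gx, gy)
# circles = [grow_round_contour(grad, s) for s in seeds]
# particles = [c for c in circles if c[2] < 0.15]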
Let's pretend the goal is to get an estimated number of particles. Also, let's assume those particles are spheres.
With that in mind, it should be possible to build a model based on highlights, shadows, and halftones to make the final result as accurate as it can be.
For now, a simple proof of concept based on highlight segmentation can be verified.
The initial result doesn't seem promising, but a small change to the contrast improves it:
This should be enough to get an estimated number of particles and to apply more advanced models to the identified regions.
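A minimal sketch of that proof of concept, assuming a grayscale photo in a hypothetical file particles.jpg; the contrast boost and the highlight threshold are guesses to be tuned on the actual image:

import cv2
import numpy as np

img = cv2.imread('particles.jpg', cv2.IMREAD_GRAYSCALE)   # hypothetical file name

# small contrast boost, as suggested above (alpha/beta are guesses to tune)
boosted = cv2.convertScaleAbs(img, alpha=1.4, beta=-40)

# keep only the brightest pixels -- the highlights on the spheres
_, highlights = cv2.threshold(boosted, 220, 255, cv2.THRESH_BINARY)

# clean up tiny specks and count the remaining blobs
highlights = cv2.morphologyEx(highlights, cv2.MORPH_OPEN,
                              np.ones((3, 3), np.uint8))
n_labels, _ = cv2.connectedComponents(highlights)
print('estimated number of particles:', n_labels - 1)     # label 0 is the background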
I am new to the Computer Vision field and am looking for your guidance on identifying an approach to tackle the following scenario:
What approach to follow to do Quality Control on small and thin metal rings using Computer Vision
Below is the detailed requirement (this is the best I can share):
To begin with, I have attached a picture of the ring we need to do QC of.
Ring_for_QC
Ring diameter = 3 inch
Following checks we need to do:
1.Surface coating of the ring peeled off
2.Portion of ring chipped off
3.Scratch on the ring's Surface
4.Width of the ring is uneven
5.Dent on the ring
6.Entire surface of the ring is not completely horizontal to the plane;
maybe due to some dent, a part of the ring is resting on the plane surface, creating a 1 or 2 degree angle
(I have marked no.6 as 'uneven surface' in the attached picture)
I have also attached another picture marking the quality issues found on a random ring (elevated view with marked QC issues).
Scenario:
One single ring can have one or more than one of the above mentioned 6 defects
Issue 1 & 3 can occur at either surface of the ring and we need to check both the surfaces
We need to QC on one single ring at a time
Challenge:
- Need to set up a work station to capture image or video of each ring under check
How many cameras will there be in that work station, and what would be the camera angles?
As we need to check both sides of the ring, we need to decide whether:
we will place the ring on a transparent surface and take images
or
we need to flip the ring after an image is taken of one side
The next challenge is which computer vision technique we should employ to identify all these issues.
For the time being we are doing some research around OpenCV's background subtraction methods.
It would be helpful to get some insight from you on what a better/feasible approach would be.
Since this is for a student project I'll emphasize image processing more than other aspects of an application. See the bottom section for considerations for real-world applications.
That aside, a general comment: implementing vision for quality control (QC) is hard to get right. If the product to be inspected is cheap (e.g. a ring, a small plastic thing), and if the result of the vision inspection is a borderline pass/fail, or uncertain, you can reject the part. If the part to be inspected is expensive (e.g. a large assembly for a tractor, individual CPUs, medical devices near the end of the production line), then you have to have very well defined specifications, and the system needs to be made as robust as possible.
In general, you want to optimize imaging for each type of defect. For example, the camera location, lens, and lighting to detect scratches may be quite different than what is needed for dimensional gauging (a.k.a. dimensional measurement).
Machine Vision vs. Computer Vision
When you search online for algorithms, equipment, and techniques specific to vision for industrial automation, including the quality control of parts on production lines, then for English-language websites favor the term "machine vision" instead of "computer vision."
https://en.wikipedia.org/wiki/Machine_vision
Machine vision is the common industry term for image processing (+ cameras + lighting + ...) for industrial use. Although different people may use different terminology, and the terminology isn't as important as learning techniques, you'll find a lot of material by searching for "machine vision." The term "computer vision" tends to be used for non-industrial applications, and for academic research, though in languages other than English the terms "machine vision" and "computer vision" may be the same. By comparison, "medical imaging" is similar to machine vision, but involves application of image processing to medical applications.
Lighting
Most importantly, you must control the lighting. Ambient lighting, such as desk lamps, overhead lights, etc., is not only useless for a vision system inspecting parts in production, but will typically interfere with image processing. You might find some defects some of the time with poorly controlled light, but to generate the most consistent results you'll need to set up lights in specific locations, run the lights at specific, verifiable intensities, and have your vision system detect when something has gone wrong with the lighting.
There are "machine vision lights" designed especially for specific applications such as finding scratches in shiny surfaces, making shiny surfaces look less shiny, to backlight parts (which is useful for dimensional gauging), to illuminate parts from low angles, and so on. Read about different types of lighting.
https://smartvisionlights.com/
https://www.vision-systems.com/content/dam/VSD/solutionsinvision/Resources/lighting_tips_white_paper.pdf
Rather than spend a lot of money on special lights, you can mock them up:
LED flashlight or single LED (as a "point" light source)
Bright light + translucent sheet of plastic (for backlighting)
White tissue paper or some other diffusing material in front of a bright light
...
The importance of lighting cannot be overstated. Controlling lighting conditions improves the chance of success, and is typically necessary to achieve the accuracy of measurement or pass/fail assessment required in real-world environments.
Accuracy, Correctness, Usefulness
At some point you'll probably wonder whether machine learning is useful or necessary for the application. The question to ask yourself (or the customer) is this: what percentage of defects would need to be detected?
For example, if a chip is missing from the ring that could be a fatal defect. Is the ring used in some safety-critical application? If so, vision inspection for QC would have to be extraordinarily robust.
Even if you're familiar with the terms "accuracy" and "precision," make sure they have very clear meanings as you consider image processing problems:
https://en.wikipedia.org/wiki/Accuracy_and_precision
So, what percentage of chip defects needs to be found? 90%? 95%? 98%?
Using the term "accurate" more loosely to mean "the vision system gets the measurement correct and/or finds the defects we know are there," what is the accuracy of the most accurate machine learning algorithm you've read about? Or at least, what would qualify as reasonably impressive accuracy for machine learning? 95%? 98%?
If you're making measurements of machine parts on a production line, then you would typically want the accuracy of dimensional measurements and defect detection to be 99% or better. For high-value products, and products such as electronic components that are highly sensitive to defects, accuracy may need to be 99.999% or better. Think of it this way: if a manufacturer is making thousands or tens of thousands of parts, they don't want garbage parts to make it past your vision system several times a day.
Machine learning for image processing has been around a long time. Processing speeds, memory, and training set sizes have improved, and there have been improvements in algorithms as well, but it's important to note that machine learning is suitable only for some applications, and will fail miserably at other applications.
Techniques
To begin with, I have attached a picture of the ring we need to do QC of.
Ring_for_QC
Ring diameter = 3 inch
Get the exact diameter, including tolerances. If the nominal diameter is 3.000 inches, then the tolerance might be expressed in thousandths of an inch. You may not need to know that for a student project, but if you were proposing a solution for a factory owner you wouldn't want to even suggest a price or timeline for delivery without having complete specs for the part, and numerous samples of the part.
From the one image it's not possible to be too specific about what a defect might look like--the same part can have different defects in different factories, or even on different production lines of the same factory--but we can make some guesses.
1.Surface coating of the ring peeled off
From the one image it's not clear what the surface coating is supposed to look like, or what's underneath. You must provide at least one image of a good part, and at least one image for each type of defect.
What is the surface coating? Anodization? Paint? Enamel? Plastic? Cheese? Whatever the case, knowing what material it is, and how that material degrades, will give some clues about what sort of vision setup may help detect problems with the coating. Changes in coating quality can affect apparent texture (e.g. edge content), brightness/darkness (intensity), color, shininess, and so on.
For the moment, let's assume the coating peeling off changes the brightness or texture of the uncoated surface vs. the remaining coated surface. Then your image processing might look something like the following:
Determine whether a ring is in the image
Segment the ring from the background. That is, use an algorithm such as connected components (OpenCV's findContours()), SIFT, or some other technique to identify the presence and location of a rigid object of known size and shape from the background.
Isolate further processing to just those pixels corresponding to the surface of the part.
Use some technique to find clusters of different texture differences, brightness differences, etc. This is where a better description of the coating is required. If lighting and lens parameters are "fixed," you can consider generating a histogram of brightness values in the image (0 = black, 255 = white) and then comparing the histogram of good parts and bad parts--is there some statistical difference? Or you might use connected components (findContours() again) to cluster pixels of different colors, assuming the lack of coating changes the apparent color of the part: maybe the coating is brown and the part is silvery.
It's hard to guess what technique would be relevant here without photos and/or a much more specific description of the coating. Hopefully this makes it clear why specs are important.
Coatings can be absent in different ways: peeling, small absences (voids), partially scraped away, etc. It can be difficult to predict in advance what the shape and size of missing coating may be.
When the size and shape of a defect is hard to predict, but when the defect is associated with a difference in image intensity (pixel brightness) or color, then explore these ideas:
Generate an "edge image" in which you find brightness/color transitions. You start with the grayscale or color image, then use Sobel or Canny or some other algorithm to generate an image of edge intensities.
Apply statistical methods to determine how "edgy" an image is. Are there more than N pixels (or more than 5% of all pixels) with an edge strength greater than S?
Once you have some basic algorithm that identifies the difference between good parts and parts with some missing coating, then you could consider using machine learning to review lots (lots!) of samples to help determine the best parameterization. For example, how do you know what number of edge pixels or edge pixel strength should be considered "bad"?
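A rough sketch of that edge-statistics idea, assuming the ring has already been segmented into a binary mask ring_mask and imaged in grayscale; the edge-strength threshold and the 5% figure are placeholders to be tuned on real good/bad samples:

import cv2
import numpy as np

def edge_fraction(gray, part_mask, strength=80):
    # Fraction of on-part pixels whose Sobel edge magnitude exceeds a threshold.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    edges = cv2.magnitude(gx, gy)
    on_part = part_mask > 0
    return float(np.count_nonzero(edges[on_part] > strength)) / np.count_nonzero(on_part)

# hypothetical pass/fail rule: more than 5% strong edges on the surface is suspicious
# is_bad = edge_fraction(gray, ring_mask) > 0.05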
2.Portion of ring chipped off
It depends on whether the chip is visible just from the part's outline. For example, if you placed the part on a light table (a.k.a. "backlight"), would you always see a defect considered to be a "chip"? Or could the chip just be on the top surface facing the camera?
To find chips on edges, having the part on a backlight simplifies matters greatly.
Identify the location and orientation of the part (e.g. using connected components, normalized correlation, SIFT, or whatever algorithm is suitable for the part and the accuracy of location required).
Find edges corresponding to the outer and inner rings of the part.
Fit a circle or nearly circular ellipse to the edge points using a Hough circle fit, a RANSAC circle fit, or (meh) a least-squares circle fit, parameterized to the known dimensions (in pixels) of the outer and inner ring diameters.
For the points used for the circle fits, find the point-to-circle (or point-to-ellipse) shortest distance. The larger this distance, the more likely you have a chip or missing chunk.
To ensure you're finding indentations, chips, or whatever, and not just individual "noise" edge points, examine points in order going clockwise or anticlockwise, and only consider a series of perimeter points as a defect if N successive points have a median (or possibly mean) point-to-circle distance greater than some threshold distance.
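A minimal sketch of that idea for a backlit part, using a plain least-squares circle fit for brevity rather than the more robust Hough/RANSAC fits mentioned above; the binary silhouette, the run length, and the 3-pixel depth are assumptions:

import cv2
import numpy as np

def outer_circle_residuals(binary):
    # Fit a circle to the outer contour of a backlit ring and return the signed
    # point-to-circle distances (negative = the contour point lies inside the circle).
    cnts, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)  # OpenCV 4.x signature
    outer = max(cnts, key=cv2.contourArea).reshape(-1, 2).astype(np.float64)
    # algebraic least-squares circle fit: x^2 + y^2 + a*x + b*y + c = 0
    A = np.column_stack([outer[:, 0], outer[:, 1], np.ones(len(outer))])
    rhs = -(outer[:, 0] ** 2 + outer[:, 1] ** 2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = -a / 2, -b / 2
    r = np.sqrt(cx ** 2 + cy ** 2 - c)
    return outer, np.hypot(outer[:, 0] - cx, outer[:, 1] - cy) - r

# hypothetical rule: a run of, say, 10+ consecutive contour points sitting more than
# 3 px inside the fitted circle is treated as a chip
# outer, d = outer_circle_residuals(binary_ring)
# chip_candidates = d < -3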
A simpler approach could be to fit a black-and-white mask--a template--representing a good part to the current location and rotation of the part to be inspected. If the template and sample part are aligned very precisely, and if you perform image subtraction, then you may be fortunate enough to get clusters of pixels where there are defects. But this method is fairly crude, and harder to make robust.
There are machine learning techniques to identify chips on edges, but you'd need lots of part samples to train them. Optionally, if you don't have enough samples, you can use the same samples with slightly modified lighting, at different locations in the image, with manually added defects, etc., to help train the algorithm. But that's another discussion altogether.
3.Scratch on the ring's Surface
See the link above about different types of lighting. You'll need to experiment with a few different lighting configurations to figure out what works for your part.
Generally, though, scratches are likely to have difference in brightness and "edginess" (image edge content) relative to the rest of the part. If you're lucky, a scratch can reveal a different color.
Scratches can vary so much in appearance, area, and shape that it would be hard to parameterize an algorithm to catch them all. Once again, statistical analysis of edge content, brightness, and color tends to be useful.
In general: to achieve the best results for a particular QC inspection, you'll need to engineer a system specifically for the part. Your vision system may be configurable, and there can be different combinations of lights and cameras for different types of QC inspection, but for any particular defect detection you want to control the appearance of the part as much as possible. Relying on software to do all the work yields a less robust system that customers will typically yank out and throw away.
4.Width of the ring is uneven
This is almost an example of dimensional gauging or optical gauging. If you're just looking for unevenness, you don't necessarily need to measure the diameter in engineering units such as millimeters: you can just measure pixels. BUT the effort required to ensure your measurement in pixels is accurate will typically lead you to measuring in millimeters anyway.
Assuming the optical setup is correct and (more or less) calibrated, which I'll describe below, here's a basic process:
Identify the position and location of the part
From the algorithm that finds the part, or from a follow-on algorithm that identifies edge pixels (e.g. Sobel, Canny, ...), find the edge pixels just for the outer diameter of the ring.
Perform a circle/ellipse fit to the edge pixels, and eliminate outlier pixels that don't actually belong to the circle/ellipse.
Have your algorithm start with the 1st pixel in the list of edge pixels corresponding to the outer diameter.
From that 1st pixel, find the edge pixel farthest away. Ideally, this would be the point diametrically opposite.
Cycle through all pixels, finding the distance to the farthest pixel. (This is not optimal in terms of speed, but simpler to code.)
Generate a histogram of all distances.
Make a determination of good/bad based on the histogram of point-to-point distances.
You might call a part "bad" for one or more of the following conditions:
At least N point-to-point distances exceed a distance of P pixels
The standard deviation of point-to-point distances exceeds some threshold T
...
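A small sketch of the pixel-space version of that check, assuming outer_edge_pts is the (K, 2) array of outer-diameter edge pixels from the fit step above; N, P (pixels), and T are placeholders, and the all-pairs search mirrors the simple-but-slow loop described in the list:

import numpy as np

def farthest_point_distances(edge_pts):
    # edge_pts: (K, 2) array of outer-diameter edge pixels (x, y).
    # For every point, find the farthest other point (ideally the point
    # diametrically opposite) and return those K distances.
    diffs = edge_pts[:, None, :] - edge_pts[None, :, :]   # K x K x 2 pairwise differences
    dists = np.hypot(diffs[..., 0], diffs[..., 1])        # K x K distance matrix
    return dists.max(axis=1)

# hypothetical pass/fail rules, with N, P (pixels), and T tuned on good samples:
# d = farthest_point_distances(outer_edge_pts)
# bad = (np.count_nonzero(d > P) >= N) or (d.std() > T)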
Measurement of distance depends on the consistency of point-to-point distances at different locations within the image. If you perform accurate, precise measurements of distance, you'll notice that an object of fixed length appears to vary in length depending on its location in the image: if the object is located in the center of the image it may appear to be 57.5 pixels long, but in one corner of the image it may appear to be 56.2 pixels long.
To correct for these irregularities, you can...
Perform a nonlinear flatness correction. This will also correct for non-normal alignment of the camera to the part, though you want to start with the optical axis of the camera as normal (perpendicular) to the surface of the part as possible.
Make a few quick measurements to estimate how much the measurements vary.
5.Dent on the ring
6.Entire surface of the ring is not completely horizontal to the plane; may be due to some dent a part of the ring is resting on the
plane surface creating some 1 or 2 degree angle (I have marked no.6 as
'uneven surface' in the attached picture)
Use cameras imaging from the sides. Make sure the background is simple.
A 1- to 2-degree difference could be hard to detect using a camera placed directly overhead. If you're lucky you could detect that the outer edge of the part is more elliptical than circular, but the ability to detect this would depend on the color and thickness of the part. Also, you wouldn't necessarily be able to distinguish between a misshapen part and one resting at an angle--but for some inspections that's okay, since both are defects.
HOWEVER, in a real-world application the customer might not be happy if you reject parts that are otherwise good, but happen to be sitting at a slight angle. A mechanical fixture might fix the problem by ensuring parts are lying flat.
I have also attached another picture marking the quality issues found on a random ring (elevated view with marked QC issues).
The image isn't clear enough. Put the part on a simpler background and tinker with lighting to make it more obvious what the differences are between good and bad.
One single ring can have one or more than one of the above mentioned 6 defects
Run one algorithm after the other. You may also have to turn different lights on and off before running each algorithm (or rather, each chain of algorithms).
Issue 1 & 3 can occur at either surface of the ring and we need to check both the surfaces
We need to QC on one single ring at a time
You may have to write an algorithm to detect whether multiple rings happen to be present. Even if you weren't asked to do this specifically, this happens in production, and your professor may surprise you with it. At least have an idea how you would detect the presence of multiple rings.
That's another aspect of vision: you may start thinking of what algorithms and lighting are necessary to solve "the problem," but you'll also spend a lot of time figuring out everything that could go wrong, and writing software to detect those conditions to ensure you don't yield a false result. For example, what happens if the lights turn off? What if two rings are present? What if the ring isn't fully within the field of view? What if dirt gets on the surface the part is resting on? What if the lens gets dirty (which it will)?
A few principles:
Provide the best image for image processing before you consider what algorithm would work best.
Understand what accuracy/success rate is necessary, and measure it.
Get as many samples as you possibly can: hundreds, thousands if possible. Having a chance to measure "online" (in real production) is helpful.
Real-world applications
If it were a real-world application--that is, if you went into the field of vision professionally--there are many more steps that may seem less difficult, but that turn out to be critical:
How rings come into view (or into "station"): on a moving conveyor? placed by a robot? in some container?
What triggers vision inspection of the ring -- a programmable logic controller, a "light curtain" the ring passes through, or whether the vision system itself has to determine when a ring is ready for inspection.
How results are communicated to other equipment. (This can be a huge hassle, and an otherwise good vision system can be rejected by a customer if communications aren't designed and implemented properly.)
Whether you are guaranteed to see only one ring at a time
This isn't to say university isn't the real world: just that you probably won't lose tens or hundreds of thousands of Euros/pounds/dollars if you happen to overlook something.
You can look at how face recognition is done:
Face detection.
Face alignment and normalization.
Features extraction.
Comparing features with pattern.
But in your case, you can skip step 3 and compare the output of step 2 with the reference image. Depending on the conditions, additional filtering may be necessary.
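As a very rough sketch of that comparison step -- a normalized test image checked against a reference image of a good ring -- something like this could work; the file names and threshold are hypothetical, and a real setup would still need the alignment/normalization steps above done properly:

import cv2
import numpy as np

ref = cv2.imread('ring_reference.png', cv2.IMREAD_GRAYSCALE)   # hypothetical good-part image
test = cv2.imread('ring_test.png', cv2.IMREAD_GRAYSCALE)       # hypothetical part under test

# crude normalization: same size and same brightness range
test = cv2.resize(test, (ref.shape[1], ref.shape[0]))
ref_n = cv2.normalize(ref, None, 0, 255, cv2.NORM_MINMAX)
test_n = cv2.normalize(test, None, 0, 255, cv2.NORM_MINMAX)

# compare against the reference; large residual clusters hint at defects
diff = cv2.absdiff(ref_n, test_n)
_, defects = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)    # threshold is a guess
print('suspect pixels:', int(np.count_nonzero(defects)))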
I am an experimental physicist (grad student) that is trying to take an AutoCAD model of the experiment I've built and find the gravitational potential from the whole instrument over a specified volume. Before I find the potential, I'm trying to make a map of the mass density at each point in the model.
What's important is that I already have a model, and in the end I'll have something that says "At (x,y,z) the value is d". If that's a crazy CSV file, a numpy array, an Excel sheet, or... whatever, I'll be happy.
Here's what I've come up with so far:
Step 1: I color code the AutoCAD file so that color associates with material.
Step 2: I send the new drawing/model to a slicer (made for 3D printing). This takes my 3D object and turns it into equally spaced (in z-direction) 2d objects... but then that's all output as g-code. But hey! G-code is a way of telling a motor how to move.
Step 3: This is the 'hard part' and the meat of this question. I'm thinking that I take that g-code, which is in essence just a set of instructions on how to move a nozzle, and use it to populate a numpy array. Basically I have a 3D array; each level corresponds to one position in z, and the remaining grid is my x-y plane. It reads what color is being put where, follows the nozzle, and puts that mass into those spots. It knows the mass because of the color. It follows the path by parsing the g-code.
When it is done with that level, it moves to the next grid and repeats.
Does this sound insane? Better yet, does it sound plausible? Or maybe someone has a smarter way of thinking about this.
Even if you just read all that, thank you. Seriously.
Does this sound insane? Better yet, does it sound plausible?
It's very reasonable and plausible. Using the g-code could do that, but it would require a g-code interpreter that can map the instructions to a 2D path (not 3D, since you mentioned that you're taking fixed z-slices). That could be problematic, but if you found one it could work, though it may require some parser manipulation. There are several of these interpreters in a variety of languages that could be useful.
SUGGESTION
From what you describe, it's akin to doing an MRI scan of the object and trying to determine its constituent mass profile along a given axis. In this case, and unlike MRI, you have multiple colors, so that can be used to your advantage in region selection / identification.
Even if you used a g-code interpreter, it would reproduce an image whose area you'd still have to calculate. Given that, and given that you seek to determine and classify material composition by path (the path defines the boundary of a particular material, which has a unique color), there are a couple of ways to approach this without resorting to g-code:
1) If the colors of your material are easily (or reasonably) distinguishable, you can create a color mask which will quantify the occupied area, from which you can then determine the mass.
That is, if you take a photograph of the slice, load the image into a numpy array, and then search for a specific value (say red), you can identify the area of the region. Then you apply a mask to your array. Once done, you count the occupied elements within the array and divide that count by the array size (i.e. rows times columns), which gives you the relative area occupied. Since you know the mass of the material, and there is a constant z-thickness, this gives you the relative mass. An example of color masking using numpy alone is shown here: http://scikit-image.org/docs/dev/user_guide/numpy_images.html
As such, let's define an example that's analogous to your problem - let's say we have a picture of a red cabbage, and we want to know how much of the picture contains red / purple-like pixels.
To simplify our life, we'll set any pixel above a certain threshold to white (RGB: 255,255,255), and then count how many non-white pixels there are:
from copy import deepcopy
import numpy as np
import matplotlib.pyplot as plt
def plot_image(fname, color=128, replacement=(255, 255, 255), plot=False):
    # 128 is a reasonable guess since most of the pixels in the image that have the
    # purplish hue have RGB values above this value.
    data = plt.imread(fname)          # read the image file into a numpy array
    image_data = deepcopy(data)       # copy the original data (for later use if need be)
    mask = image_data[:, :, 0] < color           # apply the color mask over the image data
    image_data[mask] = np.array(replacement)     # replace the match
    if plot:
        plt.imshow(image_data)
        plt.show()
    return data, image_data
data, image_data = plot_image('cabbage.jpg') # load the image, and apply the mask
# Find the locations of all the pixels that are non-white (i.e. not 255).
# (This returns 3 arrays of the same size.)
indices = np.where(image_data != 255)
# Now, calculate the area: in this case, ~ 62.04 %
effective_area = indices[0].size / float(data.size)
The selected region in question is shown here below:
Note that image_data contains the pixel information that has been masked, and would provide the coordinates (albeit in pixel space) of where each occupied (i.e. non-white) pixel occurs. The issue with this, of course, is that these are pixel coordinates and not physical ones. But since you know the physical dimensions, extrapolating those quantities is easily done.
Furthermore, with the effective area known, and knowledge of the physical dimension, you have a good estimate of the real area occupied. To obtain better results, tweak the value of the color threshold (i.e. color). In your real-life example, since you know the color, search within a pixel range around that value (to offset noise and lighting issues).
The above method is a bit crude--but effective--and it may be worth exploring it in tandem with edge detection, as that could help improve the region identification and area selection (note that this isn't always strictly true!). Also, color deconvolution may be useful: http://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_ihc_color_separation.html#sphx-glr-auto-examples-color-exposure-plot-ihc-color-separation-py
The downside to this is that the analysis requires a high-quality image and good lighting; and, most importantly, it's likely that you'll lose some of the finer details of the edges, which would impact your masses.
2) Instead of resorting to camera work, and given that you have the AutoCAD model, you can use that and the software itself in addition to the above prescribed method.
Since you've colored each material in the model differently, you can use AutoCAD's slicing tool, and can do something similar to what the first method suggests doing physically: slicing the model, and taking pictures of the slice to expose the surface. Then, using a similar method described above of color masking / edge detection / region determination through color selection, you should obtain a much better and (arguably) very accurate result.
The downside to this is that you're also limited by the image quality used. But, as it's software, that shouldn't be much of an issue, and you can get extremely high accuracy--close to the actual result.
The last suggestion to improve these results would be to script numerous random thin slicings of the AutoCAD model along a particular directional vector shared by every subsequent slice, exporting each exposed surface, analyzing each image in the manner described above, and then collecting those results to give you a Monte Carlo-like, statistically quantifiable determination of the mass (to correct for geometry effects due to slicing along one given axis).
I have a grid on pictures (they are from camera). After binarization they look like this (red is 255, blue is 0):
What is the best way to detect grid nodes (crosses) on these pictures?
Note: grid is distorted from cell to cell non-uniformly.
Update:
Some examples of different grids and their distortions before binarization:
In cases like this I first try to find the best starting point.
So, first I thresholded your image (I could also have skeletonized it and only then thresholded, but that way some data is lost irrecoverably):
Then, I tried loads of tools to get the most prominent features emphasized in bulk. Finally, playing with Gimp's G'MIC plugin I found this:
Based on the above I prepared a universal pattern that looks like this:
Then I just got a part of this image:
To help determine the angle I made a local Fourier frequency graph--this way you can obtain the local angle of your pattern:
Then you can use a simple trick that works fast on modern GPUs--compute a difference like this (missed case):
When there is a hit, the difference is minimal; what I had in mind when talking about local maxima refers more or less to how the resulting difference should be treated. It wouldn't be wise to weight the difference outside the pattern circle the same as inside, due to scale-factor sensitivity, so the inside, with the cross, should be weighted more in the algorithm used. Nevertheless, the pattern differenced with the image looks like this:
As you can see, it's possible to differentiate between a hit and a miss. What is crucial is to set a proper tolerance and use the Fourier frequencies to obtain the angle (with thresholded images, the Fourier spectrum usually follows the overall orientation of the image analyzed).
The above approach can later be complemented by Harris corner detection, or Harris detection can be modified using the above patterns to distinguish two to four closely placed corners.
Unfortunately, all of these techniques are scale dependent in this case and should be adjusted to the scale properly.
There are also other approaches to your problem, for instance watershedding the image first, then getting regions, disregarding the foreground, simplifying the curves, and checking whether their corners form a consecutive equidistant pattern. But to my nose that would not produce correct results.
One more thing--libgmic is the G'MIC library, from which you can use the transformations shown above directly or through bindings, or take the algorithms and rewrite them in your app.
I suppose that this can be a potential answer (actually mentioned in comments): http://opencv.itseez.com/2.4/modules/imgproc/doc/feature_detection.html?highlight=hough#houghlinesp
There can also be other ways using skimage tools for feature detection.
But actually I think that instead of the Hough transform, which could contribute to huge bloat and a lack of precision (it works with straight lines), I would suggest trying Harris corner detection - http://docs.opencv.org/2.4/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html .
This can be further adjusted to your specific issue (these are cross corners, so the local maxima should depend on the cross-like distribution). Then some curve approximation can be done based on the points obtained.
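A minimal Harris-based sketch along those lines, assuming the binarized grid is in a hypothetical file grid_binarized.png; the block size, aperture, and response threshold are guesses that need adjusting to the scale of the crosses:

import cv2
import numpy as np

img = cv2.imread('grid_binarized.png', cv2.IMREAD_GRAYSCALE)   # hypothetical file name

corners = cv2.cornerHarris(np.float32(img), blockSize=5, ksize=3, k=0.04)
corners = cv2.dilate(corners, None)                  # make local maxima easier to pick out

# keep strong responses only; the 0.05 factor is a guess to tune per image
nodes = np.argwhere(corners > 0.05 * corners.max())  # (row, col) of candidate grid nodes
print(len(nodes), 'candidate node pixels (cluster them to get one point per cross)')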
Maybe you could calculate Hough lines and determine the intersections. The OpenCV documentation can be found here.
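A rough sketch of that idea with cv2.HoughLinesP; the file name and parameters are assumptions, and since the grid is distorted you'll get many short segments whose pairwise intersections then need to be clustered into one node per cross:

import cv2
import numpy as np

img = cv2.imread('grid_binarized.png', cv2.IMREAD_GRAYSCALE)   # hypothetical file name
lines = cv2.HoughLinesP(img, 1, np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=5)

def intersection(l1, l2):
    # Intersection of two segments treated as infinite lines (None if parallel).
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return px, py

nodes = []
segs = [l[0] for l in lines] if lines is not None else []
for i in range(len(segs)):
    for j in range(i + 1, len(segs)):
        p = intersection(segs[i], segs[j])
        if p is not None:
            nodes.append(p)   # cluster nearby points afterwards to get one node per cross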
I have two images showing exactly the same content: 2D Gaussian-shaped spots. I call these two 16-bit PNG files "left.png" and "right.png". But as they are obtained through slightly different optical setups, the corresponding spots (physically the same) appear at slightly different positions, meaning the right image is slightly stretched or distorted in a non-linear way. Therefore I would like to get the transformation from left to right.
So for every pixel on the left side with its x- and y-coordinate I want a function giving me the components of the displacement-vector that points to the corresponding pixel on the right side.
In a former approach I tried to get the positions of the corresponding spots to obtain the relative distances deltaX and deltaY. I then fitted these distances to a Taylor expansion of T(x,y) up to second order, giving me the x- and y-components of the displacement vector for every pixel (x,y) on the left, pointing to the corresponding pixel (x',y') on the right.
To get a more general result I would like to use normalized cross-correlation. For this I multiply every pixel value from the left with a corresponding pixel value from the right and sum over these products. The transformation I am looking for should connect the pixels that will maximize the sum. So when the sum is maximized, I know that I multiplied the corresponding pixels.
I really tried a lot with this, but didn't manage. My question is if somebody of you has an idea or has ever done something similar.
import numpy as np
from PIL import Image
left = np.array(Image.open('left.png'))
right = np.array(Image.open('right.png'))
# for normalization (http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation)
left = (left - left.mean()) / left.std()
right = (right - right.mean()) / right.std()
Please let me know if I can make this question more clear. I still have to check out how to post questions using latex.
Thank you very much for input.
[left.png] http://i.stack.imgur.com/oSTER.png
[right.png] http://i.stack.imgur.com/Njahj.png
I'm afraid that in most cases 16-bit images appear just black (at least on the systems I use) :( but of course there is data in there.
UPDATE 1
I'll try to clarify my question. I am looking for a vector field of displacement vectors that point from every pixel in left.png to the corresponding pixel in right.png. My problem is that I am not sure about the constraints I have.
The relation is r' = r + d(r), where the vector r (components x and y) points to a pixel in left.png and the vector r' (components x' and y') points to the corresponding pixel in right.png; for every r there is a displacement vector d(r).
What I did earlier was find the components of the vector field d manually and fit them to polynomials of second degree. So I fitted
delta-x(x,y) = a0 + a1*x + a2*y + a3*x^2 + a4*x*y + a5*y^2
and
delta-y(x,y) = b0 + b1*x + b2*y + b3*x^2 + b4*x*y + b5*y^2
Does this make sense to you? Is it possible to get all the delta-x(x,y) and delta-y(x,y) with cross-correlation? The cross-correlation should be maximized if the corresponding pixels are linked together through the displacement vectors, right?
UPDATE 2
So the algorithm I was thinking of is as follows:
Deform right.png
Get the value of cross-correlation
Deform right.png further
Get the value of cross-correlation and compare to value before
If it's greater, good deformation, if not, redo deformation and do something else
After maximizing the cross-correlation value, I know what the deformation is :)
About the deformation: could one first do a shift along the x- and y-directions to maximize the cross-correlation, then in a second step stretch or compress x- and y-dependently, and in a third step deform quadratically in x and y, and repeat this procedure iteratively? I really have a problem doing this with integer coordinates. Do you think I would have to interpolate the picture to obtain a continuous distribution? I have to think about this again :( Thanks to everybody for taking part :)
OpenCV (and with it the Python OpenCV binding) has a StarDetector class which implements this algorithm.
As an alternative you might have a look at the OpenCV SIFT class, which stands for Scale Invariant Feature Transform.
Update
Regarding your comment, I understand that the "right" transformation will maximize the cross-correlation between the images, but I don't understand how you choose the set of transformations over which to maximize. Maybe if you know the coordinates of three matching points (either by some heuristics or by choosing them by hand), and if you expect affinity, you could use something like cv2.getAffineTransform to have a good initial transformation for your maximization process. From there you could use small additional transformations to have a set over which to maximize. But this approach seems to me like re-inventing something which SIFT could take care of.
To actually transform your test image you can use cv2.warpAffine, which also can take care of border values (e.g. pad with 0). To calculate the cross-correlation you could use scipy.signal.correlate2d.
Update
Your latest update did indeed clarify some points for me. But I think that a vector field of displacements is not the most natural thing to look for, and this is also where the misunderstanding came from. I was thinking more along the lines of a global transformation T, which applied to any point (x,y) of the left image gives (x',y')=T(x,y) on the right side, but T has the same analytical form for every pixel. For example, this could be a combination of a displacement, rotation, scaling, maybe some perspective transformation. I cannot say whether it is realistic or not to hope to find such a transformation, this depends on your setup, but if the scene is physically the same on both sides I would say it is reasonable to expect some affine transformation. This is why I suggested cv2.getAffineTransform. It is of course trivial to calculate your displacement vector field from such a T, as this is just T(x,y)-(x,y).
The big advantage would be that you have only very few degrees of freedom for your transformation, instead of, I would argue, 2N degrees of freedom in the displacement vector field, where N is the number of bright spots.
If it is indeed an affine transformation, I would suggest some algorithm like this:
identify three bright and well isolated spots on the left
for each of these three spots, define a bounding box so that you can hope to identify the corresponding spot within it in the right image
find the coordinates of the corresponding spots, e.g. with some correlation method as implemented in cv2.matchTemplate or by also just finding the brightest spot within the bounding box.
once you have three matching pairs of coordinates, calculate the affine transformation which transforms one set into the other with cv2.getAffineTransform.
apply this affine transformation to the left image; as a check that you found the right one, you could calculate whether the overall normalized cross-correlation is above some threshold or drops significantly if you displace one image with respect to the other.
if you wish and still need it, calculate the displacement vector field trivially from your transformation T.
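A minimal sketch of those steps, with three hypothetical hand-picked seed coordinates; for brevity the template is matched against the whole right image rather than a bounding box:

import cv2
import numpy as np

left = cv2.imread('left.png', cv2.IMREAD_UNCHANGED).astype(np.float32)
right = cv2.imread('right.png', cv2.IMREAD_UNCHANGED).astype(np.float32)

def find_in_right(spot_xy, box=40):
    # Cut a small patch around a left-image spot and locate it in the right image.
    x, y = spot_xy
    tmpl = left[y - box // 2:y + box // 2, x - box // 2:x + box // 2]
    res = cv2.matchTemplate(right, tmpl, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)
    return (max_loc[0] + box // 2, max_loc[1] + box // 2)   # centre of the best match

# three bright, well isolated spots picked by hand in the left image (hypothetical coordinates)
src_pts = [(120, 85), (400, 130), (260, 360)]
dst_pts = [find_in_right(p) for p in src_pts]

src = np.array(src_pts, dtype='float32')
dst = np.array(dst_pts, dtype='float32')
T = cv2.getAffineTransform(src, dst)

# sanity check: warp the left image and see how well it now overlaps the right one
warped = cv2.warpAffine(left, T, (right.shape[1], right.shape[0]))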
Update
It seems cv2.getAffineTransform expects an awkward input data type 'float32'. Let's assume the source coordinates are (sxi,syi) and destination (dxi,dyi) with i=0,1,2, then what you need is
src = np.array( ((sx0,sy0),(sx1,sy1),(sx2,sy2)), dtype='float32' )
dst = np.array( ((dx0,dy0),(dx1,dy1),(dx2,dy2)), dtype='float32' )
result = cv2.getAffineTransform(src,dst)
I don't think a cross correlation is going to help here, as it only gives you a single best shift for the whole image. There are three alternatives I would consider:
Do a cross correlation on sub-clusters of dots. Take, for example, the three dots in the top right and find the optimal x-y shift through cross-correlation. This gives you the rough transform for the top right. Repeat for as many clusters as you can to obtain a reasonable map of your transformations. Fit this with your Taylor expansion and you might get reasonably close. However, for your cross-correlation to work at all, the difference in displacement between spots must be less than the extent of a spot, else you can never get all spots in a cluster to overlap simultaneously with a single displacement. Under these conditions, option 2 might be more suitable.
If the displacements are relatively small (which I think is a condition for option 1), then we might assume that for a given spot in the left image, the closest spot in the right image is the corresponding spot. Thus, for every spot in the left image, we find the nearest spot in the right image and use that as the displacement in that location. From the 40-something well distributed displacement vectors we can obtain a reasonable approximation of the actual displacement by fitting your Taylor expansion.
This is probably the slowest method, but might be the most robust if you have large displacements (and option 2 thus doesn't work): use something like an evolutionary algorithm to find the displacement. Apply a random transformation, compute the remaining error (you might need to define this as the sum of the smallest distances between spots in your original and transformed images), and improve your transformation with those results. If your displacements are rather large you might need a very broad search, as you'll probably get lots of local minima in your landscape.
I would try option 2 as it seems your displacements might be small enough to easily associate a spot in the left image with a spot in the right image.
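A small sketch of option 2, assuming the spot centres have already been extracted from both images (for example by thresholding and taking connected-component centroids); the second-order polynomial mirrors the Taylor expansion mentioned in the question:

import numpy as np

def fit_displacement(left_pts, right_pts):
    # left_pts, right_pts: (N, 2) and (M, 2) arrays of spot centres (x, y).
    # Match every left spot to its nearest right spot, then fit the displacement
    # components to second-order polynomials in x and y.
    d = np.hypot(left_pts[:, None, 0] - right_pts[None, :, 0],
                 left_pts[:, None, 1] - right_pts[None, :, 1])
    nearest = right_pts[d.argmin(axis=1)]          # nearest-neighbour matching
    disp = nearest - left_pts                      # per-spot displacement vectors

    x, y = left_pts[:, 0], left_pts[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    cx, *_ = np.linalg.lstsq(A, disp[:, 0], rcond=None)   # coefficients of delta-x(x, y)
    cy, *_ = np.linalg.lstsq(A, disp[:, 1], rcond=None)   # coefficients of delta-y(x, y)
    return cx, cy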
Update
I assume your optics induce non-linear distortions, and having two separate beam paths (different filters in each?) will make the relationship between the two images even more non-linear. The affine transformation PiQuer suggests might give a reasonable approach but can probably never completely cover the actual distortions.
I think your approach of fitting to a low-order Taylor polynomial is fine. This works for all my applications with similar conditions. The highest orders should probably be something like xy^2 and x^2y; anything higher than that you won't notice.
Alternatively, you might be able to calibrate the distortions for each image first, and then do your experiments. This way you are not dependent on the distribution of your dots, but can use a high-resolution reference image to get the best description of your transformation.
Option 2 above still stands as my suggestion for getting the two images to overlap. This can be fully automated and I'm not sure what you mean when you want a more general result.
Update 2
You comment that you have trouble matching dots in the two images. If this is the case, I think your iterative cross-correlation approach may not be very robust either. You have very small dots, so overlap between them will only occur if the difference between the two images is small.
In principle there is nothing wrong with your proposed solution, but whether it works or not strongly depends on the size of your deformations and the robustness of your optimization algorithm. If you start off with very little overlap, then it may be hard to find a good starting point for your optimization. Yet if you have sufficient overlap to begin with, then you should have been able to find the deformation per dot first, but in a comment you indicate that this doesn't work.
Perhaps you can go for a mixed solution: find the cross correlation of clusters of dots to get a starting point for your optimization, and then tweak the deformation using something like the procedure you describe in your update. Thus:
For an NxN pixel segment, find the shift between the left and right images
Repeat for, say, 16 of those segments
Compute an approximation of the deformation using those 16 points
Use this as the starting point of your optimization approach
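A short sketch of the first two steps using cv2.phaseCorrelate as the per-segment shift estimator, assuming 2D grayscale arrays larger than the tile size (the 4x4 grid and 128-pixel tiles are assumptions); the resulting (tile centre, shift) samples can then feed the Taylor-expansion fit in step 3:

import cv2
import numpy as np

def segment_shifts(left, right, n=4, size=128):
    # Split the images into an n x n grid of size x size tiles and estimate
    # the (dx, dy) shift of each tile with phase correlation.
    h, w = left.shape
    shifts = []
    for i in range(n):
        for j in range(n):
            y = min(i * h // n, h - size)
            x = min(j * w // n, w - size)
            a = np.float32(left[y:y + size, x:x + size])
            b = np.float32(right[y:y + size, x:x + size])
            (dx, dy), _ = cv2.phaseCorrelate(a, b)
            shifts.append((x + size / 2, y + size / 2, dx, dy))   # tile centre + its shift
    return shifts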
You might want to have a look at bunwarpj, which already does what you're trying to do. It's not Python, but I use it in exactly this context. You can export a plain-text spline transformation and use it if you wish to do so.