I have an image-finding and "blur-compare" task, and I could not figure out which methods I should use.
The setup is this: a 100x100 box either is mostly filled by an object or not. To the human eye this object is always almost the same, but it might change by blur, slight rescaling, tilting in 3D, moving sideways or up/down by one or two pixels, or other very small graphical changes.
What is a simple, quick, robust and reliable way to check whether the transformed object is there or not? Pointers to Python packages as well as code would be nice.
Not sure I entirely understand your question, but I'll give it a shot.
Assuming:
we just want to know if there is some object in a box.
the empty box is always the same
perfect box alignment etc.
You can do this:
subtract the query image from your empty box image.
sum all pixels
if the value is zero the images are identical, therefore no change, so no object.
Obviously there actually is some difference between the box parts of the two images, but the key thing is that the non-object parts of the images are as similar as possible in both pictures. If this is the case, then we can use the above method, but with a threshold test as the 3rd step. Provided the threshold is set reasonably, it should give a decent prediction of whether the box is empty or not.
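A minimal sketch of that check with OpenCV/NumPy (the file names and the threshold value are placeholders; I use the absolute difference so that positive and negative pixel differences don't cancel out):

import cv2
import numpy as np

# Reference image of the empty box and the query image, e.g. both 100x100 grayscale.
empty_box = cv2.imread("empty_box.png", cv2.IMREAD_GRAYSCALE)
query = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)

# Absolute per-pixel difference, summed into a single score.
diff = cv2.absdiff(empty_box, query)
score = int(np.sum(diff))

# Threshold chosen empirically; tune it on a few known empty/full examples.
THRESHOLD = 50_000
print("object present" if score > THRESHOLD else "box empty")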
I have raw microscopy images like this:
I want to segment the objects; as you can see, some of them are really close together, and there is a wide range of intensity values.
background: 700 a.u.
fluorescent shapes: from 7000 to 32000 a.u.
To segment them I use Otsu binary thresholding via OpenCV (without prior processing of the image):
thresh, imgthresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
The result is pretty good, but it still fails to detect the brightest shapes as individual objects.
I have tried a lot of things: the watershed algorithm, image preprocessing (blurring), erosion, adaptive thresholding, but nothing works properly, since the main problem is the wide spread of fluorescence values in the image.
Any smart idea on how to solve this?
Because your data have such a large range of intensity values, single histogram-based methods applied to the whole image (e.g. Otsu) are going to have trouble accomplishing this task. I think that your best bet is going to be either:
threshold_multiotsu: choose the number of classes based on the number of 'clusters' of intensities. Unfortunately, you will likely need to alter the number of classes on an image-by-image basis, so this isn't super robust.
threshold_local: I know you said that you tried this, but you might revisit it and alter the block_size parameter until you get something that looks reasonable. Based on your example images (and assuming a little bit about why the objects in your example images are green), it looks like objects in close spatial proximity to one another generally have similar intensity values. Furthermore, you likely won't have to go through and alter the parameters as much as you would in option 1. A rough sketch of both options follows right after this list.
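To make those two options concrete, here is a rough scikit-image sketch (the variable image is assumed to be your raw frame, and parameter values like classes=3 and block_size=101 are guesses you would tune per dataset):

import numpy as np
from skimage.filters import threshold_multiotsu, threshold_local

# Option 1: multi-Otsu -- pick the number of classes to match the intensity 'clusters'.
thresholds = threshold_multiotsu(image, classes=3)
regions = np.digitize(image, bins=thresholds)
foreground = regions > 0  # everything above the lowest threshold

# Option 2: local thresholding -- adjust block_size until the result looks reasonable.
local_thresh = threshold_local(image, block_size=101, offset=0)
binary_local = image > local_thresh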
I suspect that these will be the simplest and most straightforward approaches, but you could also delve into identifying the object edges using something from skimage.feature and then filling the objects. Maybe something like the approach outlined here: https://scikit-image.org/docs/stable/auto_examples/features_detection/plot_blob.html. This will be a bit more involved, but these methods should be more robust at identifying objects with largely varied intensity values.
If all else fails you can try a couple of SOTA packages. The main ones that I am thinking of are https://github.com/stardist/stardist and https://github.com/MouseLand/cellpose but these seem like a bit of overkill based on your example data here.
Most neural nets for images are great at detecting objects and labeling them. They can take a picture and label some of the objects in it. -- think yolov5
Template matching, on the other hand, looks for a template that is mostly the same in a larger image. -- opencv2.templateMatching
What I hope to have is something kind of "in between" the two versions. Given a manually entered template image, give me the rectangles in a larger picture where this template occurs -- but it must be scale-invariant and transform-invariant (within reason).
The opencv2 version is too strict in what it counts as matches -- a 10% size change can make matches fail, and slight rotations can cause it to fail. This makes it not robust enough to be useful.
Take for instance the following (below), where we see highlighted pictures of airplanes.
This would be the ideal output.
The input would be 1 of the small green squares, ideally any 1 of them would work.
Are there things out there that can do this already?
Essentially, an opencv2.templateMatching that is more reasonably "fuzzy".
Or if I was doing this with Balls, I would use as template a picture of a ball or even baseball as a clean template, and then highlight 3 balls in the following image.
I don't need image recognition, I need image... similarity with a given template (that is better than opencv2.templateMatching, because that one is terrible).
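One middle ground worth trying before a full detector is keypoint matching plus a RANSAC homography, which tolerates moderate scale and rotation changes. A rough OpenCV sketch with ORB features (the file names, ratio-test value, and minimum match count are guesses; this finds one instance per pass, so for several airplanes you would mask out each hit and repeat, or cluster the matches):

import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and binary descriptors in both images.
orb = cv2.ORB_create(nfeatures=2000)
kp_t, des_t = orb.detectAndCompute(template, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

# Brute-force Hamming matching with a ratio test to drop ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in matcher.knnMatch(des_t, des_s, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

if len(good) >= 10:
    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None:
        # Project the template corners into the scene to get the match rectangle.
        h, w = template.shape
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        box = cv2.perspectiveTransform(corners, H)
        x, y, bw, bh = cv2.boundingRect(box.astype(np.int32))
        print("template found at", (x, y, bw, bh))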
For those interested in the future, I ultimately had to go with a full YOLOv5 network to do a custom training.
I was unable to find a "cheaper" solution.
I'd like to ask if there's a better or faster alternative way to get the largest rectangle inside an almost rectangular contour.
The rectangle should be aligned to both x and y axis and should be completely inside the rectangular contour. That means it would not contain any external white pixels, yet occupy the largest area in the contour.
Test image is here:
I've tried these two, but I'm looking to see if there's a faster and neater way to go about this.
I also tried going through the points of a contour and getting the minimum and maximum points like in here, but of course it just shows similar results to what cv2.boundingRect already does.
Maybe this is a bit of lateral thinking, but looking at your examples and spec, why not fill in the white pixels contiguous with the outside bounding box instead? (Like a 'paint pot' brush in a Paint-type application.)
E.g. (red pixels being the ones you would turn black from white):
You could probably even limit the process to the outer N pixels.
============================
So how might one implement this? It is essentially a version of the "flood fill" algorithm used in pixel graphics programmes, except that you start not from a single seed pixel but by checking every point on the edge of the outside bounding rectangle. You start filling in and build a stack of points you need to come back to, because you can't necessarily follow every area at once and may need to go back on yourself.
You can look that algorithm up, but a 'pure' version will be very stack-heavy if you push every point you can't follow right now, particularly starting with the whole boundary of the shape.
I haven't implemented it this way, but my first thought would be a scan from a boundary inwards, taking a whole line of pixels at a time and marking all the 'white' pixels with a new 3rd colour; then on the next row you fill all the white pixels touching the previously marked pixels, and so on. (It doesn't matter whether you mark the changed pixels with a 3rd colour, a mask, an alpha channel or whatever - but you must be able to tell newly filled-in pixels from the old black ones.)
As you go, you need to check for any 'stranded' areas where you need to work backwards to fill in white areas that are not directly connected to the outside:
Start filling from the edge...
Watch out for stranded areas - if you find one, scan backwards to fill before going back to where you were, to carry on (you may need to recurse if your stranded area turns back on itself again, though in your particular application this shouldn't be a huge issue, unlike in some graphics applications).
And continue, not forgetting to fill in from the other edges if required (see note below) until you come to a row with no further pixels to fill and no more back-filling to do. Then restart at the far side of the image as you need to start a backward pass from the far side to catch anything else on that side.
For a practical implementation there is some thinking to do. Your examples will have a lot of filling at the edge but not much by way of complex internal shapes to follow, which keeps things simple. But you need to work from all 4 sides to do it efficiently - perhaps working inwards as a series of concentric rectangles rather than one side at a time. That means more complexity to work through in the design, but it is massively more efficient in this example.
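If you don't want to hand-roll that scan, OpenCV's flood fill gives the same "fill everything connected to the outside" effect. A rough sketch (the file name is a placeholder; the one-pixel white border guarantees that every exterior white region touches the corner seed):

import cv2
import numpy as np

img = cv2.imread("contour.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Pad with a one-pixel white border so all exterior white pixels are connected
# to the corner, then flood-fill that region to black from the (0, 0) seed.
padded = cv2.copyMakeBorder(binary, 1, 1, 1, 1, cv2.BORDER_CONSTANT, value=255)
mask = np.zeros((padded.shape[0] + 2, padded.shape[1] + 2), np.uint8)
cv2.floodFill(padded, mask, (0, 0), 0)

# Strip the padding again; any white pixels left are strictly inside the contour.
interior = padded[1:-1, 1:-1]
cv2.imwrite("interior.png", interior)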
Food for thought anyhow.
I have an image containing cells. I can't provide it, but it is similar to the image used as an example here: http://blogs.mathworks.com/steve/2006/06/02/cell-segmentation/ but without the characteristic nuclei.
I have done some processing and am now left with a pretty good segmentation, but some cells are close to each other and I need to split them. Most of them consist of more or less overlapping ellipses.
I am certain that a few iterations of simple erosion will split almost all of those regions. But some of the other cells are so small, they will disappear before the others split. Therefore I need an algorithm that erodes the image, allowing region splitting, but does not delete the last pixel of a region.
I want to use watershed afterwards to segment the cells.
I guess I could implement this on my own by searching for connected regions and then tracking that I don't lose any, or something like that, but the implementation seems messy even in my head, and I think there must be an easier way. So my question is basically: what is this called, so I can google an implementation? Or if there is no off-the-shelf solution, what's an elegant way of implementing it without dozens of iterations and for loops, etc.?
(Language is python)
It's a classical problem, and if the overlap between cells is too large, let's say 40% or more, then there is no good solution.
However, if the overlap is not important, here is the solution:
You start from the segmentation you have, let's call it S
You compute the ultimate erosion UE(S). It will give you the center of each cell, something like the red points in this image. (That image uses a distance map; an ultimate erosion will be more stable.) If there are still several red points per cell, then a dilation of UE(S) will fix the problem, as in this example.
You compute the inverse Inv(S) or the Voronoi diagram Voi(S) in order to have a marker in the background.
Run the watershed on the gradient image of S, using UE(S) as the inner marker (perfect, because you have one point per cell) and Inv(S) or Voi(S) as the background/outer marker.
You will get something like this example.
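Not the exact morphological pipeline above, but a rough scikit-image/SciPy equivalent, where a distance transform with local maxima stands in for the ultimate erosion UE(S) (the variable segmentation is assumed to be your boolean mask S, and min_distance is a parameter to tune):

import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Distance transform of the mask: bright peaks at cell centers.
distance = ndi.distance_transform_edt(segmentation)
coords = peak_local_max(distance, min_distance=5, labels=segmentation)

# One integer label per detected center -> inner markers for the watershed.
peak_mask = np.zeros(distance.shape, dtype=bool)
peak_mask[tuple(coords.T)] = True
markers, _ = ndi.label(peak_mask)

# Watershed on the inverted distance map, restricted to the mask, so touching
# cells are split along the valleys between their centers.
labels = watershed(-distance, markers, mask=segmentation)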
How do you detect the location of an image within a larger image? I have an unmodified copy of the image. This image is then changed to an arbitrary resolution and placed randomly within a much larger image which is of an arbitrary size. No other transformations are conducted on the resulting image. Python code would be ideal, and it would probably require libgd. If you know of a good approach to this problem you'll get a +1.
There is a quick and dirty solution, and that's simply sliding a window over the target image and computing some measure of similarity at each location, then picking the location with the highest similarity. Then you compare the similarity to a threshold, if the score is above the threshold, you conclude the image is there and that's the location; if the score is below the threshold, then the image isn't there.
As a similarity measure, you can use normalized correlation or sum of squared differences (aka L2 norm). As people mentioned, this will not deal with scale changes. So you also rescale your original image multiple times and repeat the process above with each scaled version. Depending on the size of your input image and the range of possible scales, this may be good enough, and it's easy to implement.
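A rough sketch of that multi-scale sliding-window search using OpenCV's normalized correlation (the file names, scale range, and the 0.8 acceptance threshold are all placeholders to tune):

import cv2
import numpy as np

template = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)
target = cv2.imread("large.png", cv2.IMREAD_GRAYSCALE)

best = (-1.0, None, None)  # (score, top-left corner, scale)
for scale in np.linspace(0.5, 2.0, 16):
    resized = cv2.resize(template, None, fx=scale, fy=scale)
    if resized.shape[0] > target.shape[0] or resized.shape[1] > target.shape[1]:
        continue
    # Normalized cross-correlation of the rescaled template at every position.
    result = cv2.matchTemplate(target, resized, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val > best[0]:
        best = (max_val, max_loc, scale)

score, loc, scale = best
if score > 0.8:  # acceptance threshold; tune on known positives/negatives
    print(f"found at {loc} (scale {scale:.2f}, score {score:.2f})")
else:
    print("not found")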
A proper solution is to use affine invariants. Try looking up "wide-baseline stereo matching"; people have looked at this problem in that context. The methods that are used are generally something like this:
Preprocessing of the original image
Run an "interest point detector". This will find a few points in the image which are easily localizable, e.g. corners. There are many detectors, a detector called "harris-affine" works well and is pretty popular (so implementations probably exist). Another option is to use the Difference-of-Gaussians (DoG) detector, it was developed for SIFT and works well too.
At each interest point, extract a small sub-image (e.g. 30x30 pixels)
For each sub-image, compute a "descriptor", some representation of the image content in that window. Again, many descriptors exist. Things to look at are how well the descriptor describes the image content (you want two descriptors to match only if they are similar) and how invariant it is (you want it to be the same even after scaling). In your case, I'd recommend using SIFT. It is not as invariant as some other descriptors, but can cope with scale well, and in your case scale is the only thing that changes.
At the end of this stage, you will have a set of descriptors.
Testing (with the new test image).
First, you run the same interest point detector as in step 1 and get a set of interest points. You compute the same descriptor for each point, as above. Now you have a set of descriptors for the target image as well.
Next, you look for matches. Ideally, to each descriptor from your original image, there will be some pretty similar descriptor in the target image. (Since the target image is larger, there will also be "leftover" descriptors, i.e. points that don't correspond to anything in the original image.) So if enough of the original descriptors match with enough similarity, then you know the target is there. Moreover, since the descriptors are location-specific, you will also know where in the target image the original image is.
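A rough sketch of the detector/descriptor steps and the matching stage with OpenCV's SIFT (cv2.SIFT_create is available in recent OpenCV releases); the ratio-test value and the "enough matches" count are guesses, and the location is taken simply from the bounding box of the matched keypoints:

import cv2
import numpy as np

original = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)
target = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)

# Interest points and descriptors for both images.
sift = cv2.SIFT_create()
kp_o, des_o = sift.detectAndCompute(original, None)
kp_t, des_t = sift.detectAndCompute(target, None)

# For each original descriptor, find its two nearest target descriptors and keep
# only unambiguous matches (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des_o, des_t, k=2):
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])

# Enough matching descriptors -> the original is present; its rough location is
# given by where the matched keypoints sit in the target image.
if len(good) > 20:  # "enough" is a guess; tune it
    pts = np.float32([kp_t[m.trainIdx].pt for m in good])
    x, y, w, h = cv2.boundingRect(pts)
    print(f"original found near x={x}, y={y}, w={w}, h={h}")
else:
    print("original not found")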
You probably want cross-correlation. (Autocorrelation is correlating a signal with itself; cross correlating is correlating two different signals.)
What correlation does for you, over simply checking for exact matches, is that it will tell you where the best matches are, and how good they are. Flip side is that, for a 2-D picture, it's something like O(N^3), and it's not that simple an algorithm. But it's magic once you get it to work.
EDIT: Aargh, you specified an arbitrary resize. That's going to break any correlation-based algorithm. Sorry, you're outside my experience now and SO won't let me delete this answer.
http://en.wikipedia.org/wiki/Autocorrelation is my first instinct.
Take a look at Scale-Invariant Feature Transforms; there are many different flavors that may be more or less tailored to the type of images you happen to be working with.