Character segmentation of an image with overlapping parts

Character segmentation of an image with overlapping parts - python

I try to segment captcha with overlapping characters, but nothing works at all.
I have read some articles concerning character segmentation and tried to implement an algorithm of summing over pixels by columns and finding local minima which should constitute to start of different character. However, the algorithm doesn't work as characters are very skewed.
Also I tried to erode away overlaps, but it ends up completely eroding significant part of text.
Here are some examples:
img = cv.imread('captcha.png')
cv.threshold(img, 127, 255, cv.THRESH_BINARY_INV)
gray = FindDividingCols(gray)
### algo for summing over pixels and finding local minima:
col_pix = np.apply_along_axis(lambda row: np.sum(row)//255, 0, img)
loc_min = np.r_[True, lst[1:] < lst[:-1]] & np.r_[lst[:-1] < lst[1:],True]
I would like to know what did I miss, or what other ways there exist to segment?

if you really want and need to segment these heavily distorted letters into different character by character segmented input for the Neural Network to be detected then the best (and i think only way) is via the same Neural Network to segment them into different entity. so you would end up having 2 neural networks
1- for segmentation
2- for detection

These captchas are deliberately distorted to make it extremely hard for OCR algorithms to read them. If it would be reasonably easy to do, the captcha would be pointless. Thus, you probably have a problem that requires a lot of research and work to solve; I don't think Stack Overflow will readily provide an answer (and if it does, the captchas will get harder) ;)

Related

Image segmentation with a wide range of intensity values (Otsu thresholding)

I have raw microscopy images like this:
And I want to segment the objects, as you see some of them are really close and I have a great range of intensity values.
background: 700 a.u.
fluorescent shapes: from 7000 to 32000 a.u.
To segment them I use Otsu binary segmentation from skimage package (without prior processing of the image)
thresh, imgthresh=cv2.threshold(image, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
The result is pretty good, but still fails in detecting the brightest shapes as individual objects.
I have tried a lot of things: watershed algorithm, image preprocessing (blurring), eroding , adaptive thresholding, but nothing works properly since the main problem is the difference in fluorescent values of the image.
Any smart idea on how to solve this?

Because your data have such a large range in intensity values, single histogram based methods on the whole image (e.g. Otsu) are going to have a little trouble accomplishing this task. I think that your best bet is going to be either:
threshold_multiotsu: and choose number of classes based on number of 'clusters' of intensities. Unfortunately, you will likely need to alter the number of classes on an image by image basis so this isn't super robust.
threshold_local: I know you said that you tried this but you might revisit this and alter the block_size parameter until you get something that looks reasonable. Based on your example images (and assuming a little bit about why the objects in your example images are green) it looks like that objects in close spatial proximity to one another generally have similar intensity values. Furthermore, you likely won't have to go through and alter the parameters as much as you would in option 1.
I suspect that these will be the simplest and most straight forward approaches but you could also delve into identifying the object edges using something from skimage.feature and then filling objects. Maybe something like outline here: https://scikit-image.org/docs/stable/auto_examples/features_detection/plot_blob.html. This will be a bit more involved, but these methods should be more robust with identifying objects with largely varied intensity values.
If all else fails you can try a couple of SOTA packages. The main ones that I am thinking of are https://github.com/stardist/stardist and https://github.com/MouseLand/cellpose but these seem like a bit of overkill based on your example data here.

OpenCV find subjective contours like the human eye does

When humans see markers suggesting the form of a shape, they immediately perceive the shape itself, as in https://en.wikipedia.org/wiki/Illusory_contours. I'm trying to accomplish something similar in OpenCV in order to detect the shape of a hand in a depth image with very heavy noise. In this question, assume that skin color based detection is not working (actually it is the best I've achieved so far but it is not robust under changing light conditions, shadows or skin colors. Also various paper shapes (flat and colorful) are on the table, confusing color-based approaches. This is why I'm attempting to use the depth cam instead).
Here's a sample image of the live footage that is already pre-processed for better contrast and with background gradient removed:
I want to isolate the exact shape of the hand from the rest of the picture. For a human eye this is a trivial thing to do. So here are a few attempts I did:
Here's the result with canny edge detection applied. The problem here is that the black shape inside the hand is larger than the actual hand, causing the detected hand to overshoot in size. Also, the lines are not connected and I fail at detecting contours.
Update: Combining Canny and a morphological closing (4x4 px ellipse) makes contour detection possible with the following result. It is still waaay too noisy.
Update 2: The result can be slightly enhanced by drawing that contour to an empty mask, save that in a buffer and re-detect yet another contour on a merge of three buffered images. The line that combines the buffered images is is hand_img = np.array(np.minimum(255, np.multiply.reduce(self.buf)), np.uint8) which is then morphed once again (closing) and finally contour detected. The results are slightly less horrible than in the picture above but laggy instead.
Alternatively I tried to use an existing CNN (https://github.com/victordibia/handtracking) for detecting the approximate position of the hand's center (this step works) and then flood from there. In order to detect contours the result is put into an OTSU filter and then the largest contour is taken, resulting in the following picture (ignore black rectangles in the left). The problem is that some of the noise is flooded as well and the results are mediocre:
Finally, I tried background removers such as MOG2 or GMG. They are confused by the enormous amount of fast-moving noise. Also they cut off the fingertips (which are crucial for this project). Finally, they don't see enough details in the hand (8 bit plus further color reduction via equalizeHist yield a very poor grayscale resolution) to reliably detect small movements.
It's ridiculous how simple it is for a human to see the exact precise shape of the hand in the first picture and how incredibly hard it is for the computer to draw a shape.
What would be your recommended method to achieve an exact hand segmentation?

After two days of desperate testing, the solution was to VERY carefully apply thresholding to an well-preprocessed image.
Here are the steps:
Remove as much noise as you possibly can. In my case, denoising was done using Intel's pyrealsense2 (I'm using an Intel RealSense depth camera and the algorithms were written for that camera family, thus they work very well). I used rs.temporal_filter() and directly after rs.hole_filling_filter() on every frame.
Capture the very first frame. Besides capturing the exact distance to the table (for later thresholding), this step also saves a still picture that is blurred by a 100x100 px kernel. Since the camera is never mounted perfectly but slightly tilted, there's an ugly grayscale gradient going over the picture and making operations impossible. This still picture is then subtracted from every single later frame, eliminating the gradient. BTW: this gradient removal step is already incorporated in the screenshots shown in the question above
Now the picture is almost noise-free. Do not use equalizeHist. This does not simply increase the general contrast regularly but instead empathizes the remaining noise way too much. This was my main error I did in almost all experiments. Instead, apply a threshold (binary with fixed border) directly. The border is extremely thin, setting it at 104 instead of 205 makes a huge difference.
Invert colors (unless you have taken BINARY_INV in the previous step), apply contours, take the largest one and write it to a mask
Voilà!

Erosion without losing regions

I have an image containing cells. I can't provide it, but it is similar to the image used as an example here: http://blogs.mathworks.com/steve/2006/06/02/cell-segmentation/ but without the characteristic nuclei.
I have done some processing and am now left with a pretty good segmentation, but some cells are close to each other and I need to split them. Most of them consist of more or less overlapping ellipses.
I am certain that a few iterations of simple erosion will split almost all of those regions. But some of the other cells are so small, they will disappear before the others split. Therefore I need an algorithm that erodes the image, allowing region splitting, but does not delete the last pixel of a region.
I want to use watershed afterwards to segment the cells.
I guess I could implement this on my own by searching for cennected regions and then tracking that I don't lose any or something like that, but the implementation seems messy even in my head and I think there must be an easier way. So my question is basically, what's the name of this so I can google an implementation? Or if there is no off-the-shelf solution, what's an elegant way of implementing this without dozens of iterations and for loops etc.
(Language is python)

It's a classical problem, and if the overlap between cells is too important, let's say 40% or more, then there is not a good solution.
However, if the overlap is not important, here is the solution:
You start from the segmentation you have, let's call it S
You computer the ultimate eroded UE(S). It will give you the center of each cell. It will give you something like the red points on this image. In this image, they use a distance map, an ultimate eroded will be more stable. If there are still many red points per cell, then a dilation of the UE(S) will fix your problem like this example.
You invert Inv(S) or compute the voronoi diagram Voi(S) in order to have a marker in the background.
Watershed on the gradient image of S, using the UE(S) as inner marker (perfect because you have one point by cell) and Inv(S) or Voi(S) as background/outer marker.
You will get something like this example.

Segment, crop (bounding boxes) and labelling characters with openCV

l have a set of images which represent a sequence of characters. l'm wonderning whether OpenCV or other techniques can segment and crop each character from the image. for instance :
l have as input
l want to get :
is 5
is 0
is 4
is 1
is 9
is 2

You have two problems here for going from your input to your output :
The first is seperating your characters. If your images always look like this, with numbers neatly seperated, then you should have no problem at all seperating them using findContours or connectedComponents, maybe along with a bounding box function like minAreaRect.
The second problem is once you have seperated your digits, how to tell which digit the image represents. This problem has a name : OCR.
If you have a lot of images, it is also possible to train a classification algorithm, as your tagging of this question suggests. The "hot topic" right now is deep learning with neural networks, but for simple applications, regular machine learning classification with hand-designed features might do the trick.

If you want to segment the numbers, I would first try to play with opening operations (because your letters are black on a white background, it would be closing if it was the opposite) in order to fill the holes that you have in your numbers. Then I would project vertically the pixels and analyze the shape that you get. If you find the valley points in this projected shape you will get the vertical limits between characters. You can do the same horizontally to get the upper and bottom limits of your chars. This approach will only work if the text is horizontal.
Then you could use an standard OCR library or go for deep learning. Since these number appear to be from MNIST dataset, you will find a lot of examples to do OCR using deep learning or other techniques with this dataset:
http://yann.lecun.com/exdb/mnist/

How do I find Wally with Python?

Shamelessly jumping on the bandwagon :-)
Inspired by How do I find Waldo with Mathematica and the followup How to find Waldo with R, as a new python user I'd love to see how this could be done. It seems that python would be better suited to this than R, and we don't have to worry about licenses as we would with Mathematica or Matlab.
In an example like the one below obviously simply using stripes wouldn't work. It would be interesting if a simple rule based approach could be made to work for difficult examples such as this.
I've added the [machine-learning] tag as I believe the correct answer will have to use ML techniques, such as the Restricted Boltzmann Machine (RBM) approach advocated by Gregory Klopper in the original thread. There is some RBM code available in python which might be a good place to start, but obviously training data is needed for that approach.
At the 2009 IEEE International Workshop on MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP 2009) they ran a Data Analysis Competition: Where's Wally?. Training data is provided in matlab format. Note that the links on that website are dead, but the data (along with the source of an approach taken by Sean McLoone and colleagues can be found here (see SCM link). Seems like one place to start.

Here's an implementation with mahotas
from pylab import imshow
import numpy as np
import mahotas
wally = mahotas.imread('DepartmentStore.jpg')
wfloat = wally.astype(float)
r,g,b = wfloat.transpose((2,0,1))
Split into red, green, and blue channels. It's better to use floating point arithmetic below, so we convert at the top.
w = wfloat.mean(2)
w is the white channel.
pattern = np.ones((24,16), float)
for i in xrange(2):
pattern[i::4] = -1
Build up a pattern of +1,+1,-1,-1 on the vertical axis. This is wally's shirt.
v = mahotas.convolve(r-w, pattern)
Convolve with red minus white. This will give a strong response where the shirt is.
mask = (v == v.max())
mask = mahotas.dilate(mask, np.ones((48,24)))
Look for the maximum value and dilate it to make it visible. Now, we tone down the whole image, except the region or interest:
wally -= .8*wally * ~mask[:,:,None]
imshow(wally)
And we get !

You could try template matching, and then taking down which produced the highest resemblance, and then using machine learning to narrow it more. That is also very difficult, and with the accuracy of template matching, it may just return every face or face-like image. I am thinking you will need more than just machine learning if you hope to do this consistently.

maybe you should start with breaking the problem into two smaller ones:
create an algorithm that separates people from the background.
train a neural network classifier with as many positive and negative examples as possible.
those are still two very big problems to tackle...
BTW, I would choose c++ and open CV, it seems much more suited for this.

This is not impossible but very difficult because you really have no example of a successful match. There are often multiple states(in this case, more examples of find walleys drawings), you can then feed multiple pictures into an image reconization program and treat it as a hidden markov model and use something like the viterbi algorithm for inference ( http://en.wikipedia.org/wiki/Viterbi_algorithm ).
Thats the way I would approach it, but assuming you have multiple images that you can give it examples of the correct answer so it can learn. If you only have one picture, then I'm sorry there maybe another approach you need to take.

I recognized that there are two main features which are almost always visible:
the red-white striped shirt
dark brown hair under the fancy cap
So I would do it the following way:
search for striped shirts:
filter out red and white color (with thresholds on the HSV converted image). That gives you two mask images.
add them together -> that's the main mask for searching striped shirts.
create a new image with all the filtered out red converted to pure red (#FF0000) and all the filtered out white converted to pure white (#FFFFFF).
now correlate this pure red-white image with a stripe pattern image (i think all the waldo's have quite perfect horizontal stripes, so rotation of the pattern shouldn't be necessary). Do the correlation only inside the above mentioned main mask.
try to group together clusters which could have been resulted from one shirt.
If there are more than one 'shirts', to say, more than one clusters of positive correlation, search for other features, like the dark brown hair:
search for brown hair
filter out the specific brown hair color using the HSV converted image and some thresholds.
search for a certain area in this masked image - not too big and not too small.
now search for a 'hair area' that is just above a (before) detected striped shirt and has a certain distance to the center of the shirt.

Here's a solution using neural networks that works nicely.
The neural network is trained on several solved examples that are marked with bounding boxes indicating where Wally appears in the picture. The goal of the network is to minimize the error between the predicted box and the actual box from training/validation data.
The network above uses Tensorflow Object Detection API to perform training and predictions.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.