I would like to get the coordinates of framed text on an image. The framed paragraphs have thin black borders. The rest of the image contains regular paragraphs and sketches.
Here is an example:
Do you have any idea of what kind of algorithms I should use in Python with an image library to achieve this? Thanks.
A few ideas for detecting framed text, which largely comes down to searching for boxes/rectangles of substantial size:
find contours with OpenCV and analyze the shapes using the cv2.approxPolyDP() polygon approximation (also known as the Ramer–Douglas–Peucker algorithm). You could additionally check the aspect ratio of the bounding box to make sure the shape is a rectangle, as well as check it against the page width, since that seems to be a known metric in your case (a rough sketch of this follows the list). PyImageSearch did this amazing article:
OpenCV shape detection
in a related question, there is also a suggestion to look into Hough Lines to detect a horizontal line, then detect vertical lines the same way. Not 100% sure how reliable this approach would be.
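A rough sketch of the contour-based idea, assuming a scanned page where the frames are thin dark lines; the threshold value and the minimum-area cutoff are placeholders to tune:

import cv2

img = cv2.imread('page.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# invert-threshold so the thin black frame lines become white foreground
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
# OpenCV 4 returns (contours, hierarchy); OpenCV 3 returns an extra value first
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
framed_boxes = []
for cnt in contours:
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    if len(approx) == 4 and cv2.contourArea(cnt) > 5000:   # 4 corners + arbitrary minimum area
        framed_boxes.append(cv2.boundingRect(approx))      # (x, y, w, h) of a candidate frame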
Once you find the box frames, the next step would be to check if there is any text inside them. Detecting text is a broader problem in general and there are many ways of doing it, here are a few examples:
apply EAST text detector
PixelLink
tesseract (e.g. via pytesseract), though I am not sure whether it would produce too many false positives
if it is a simpler case of the boxes being either empty or not, you could check the pixel content inside them - e.g. with cv2.countNonZero() (a short sketch follows the examples below). Examples:
How to identify empty rectangle using OpenCV
Count the black pixels using OpenCV
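A minimal sketch of that check, assuming box is an (x, y, w, h) rectangle found earlier and thresh is the inverted binary image; the names and the 1% cutoff are illustrative:

x, y, w, h = box                                       # bounding rect of a detected frame
roi = thresh[y + 2:y + h - 2, x + 2:x + w - 2]         # look inside the frame, skipping the border itself
ink_ratio = cv2.countNonZero(roi) / float(roi.size)    # fraction of foreground (text) pixels
has_text = ink_ratio > 0.01                            # near-zero ink means the box is empty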
Additional references:
ideas on quadrangle/rectangle detection using convolutional neural networks
I am trying to analyse an image and extract each number, to then process it using a CNN trained with MNIST. The images show garments with a grid-like pattern; at each intersection of the grid there is a number (e.g. 0412). I want to detect which number it is and then store its coordinates. Does anyone have any recommendations on how to preprocess the image, given that it is quite noisy and contains multiple numbers? I have tried using contours and it didn't work. I also binarized the image, but there are areas of it that are unreadable. My initial idea was to isolate each number and then process it.
Thanks in advance!
I am trying to segment a captcha with overlapping characters, but nothing works at all.
I have read some articles about character segmentation and tried to implement an algorithm that sums the pixels in each column and finds local minima, which should correspond to the boundaries between characters. However, the algorithm doesn't work because the characters are very skewed.
I also tried to erode away the overlaps, but that ends up eroding a significant part of the text as well.
Here are some examples:
import cv2 as cv
import numpy as np

def FindDividingCols(binary):
    # sum pixels over each column and mark local minima as candidate character boundaries
    col_pix = np.apply_along_axis(lambda col: np.sum(col) // 255, 0, binary)
    return np.r_[True, col_pix[1:] < col_pix[:-1]] & np.r_[col_pix[:-1] < col_pix[1:], True]

img = cv.imread('captcha.png', cv.IMREAD_GRAYSCALE)
_, gray = cv.threshold(img, 127, 255, cv.THRESH_BINARY_INV)
loc_min = FindDividingCols(gray)
I would like to know what I missed, or what other approaches exist for segmenting these characters.
If you really want and need to segment these heavily distorted letters into separate, character-by-character inputs for a neural network to recognise, then the best (and I think only) way is to use another neural network to do the segmentation into separate entities. So you would end up having 2 neural networks:
1- for segmentation
2- for detection
These captchas are deliberately distorted to make it extremely hard for OCR algorithms to read them. If it were reasonably easy to do, the captcha would be pointless. Thus, you probably have a problem that requires a lot of research and work to solve; I don't think Stack Overflow will readily provide an answer (and if it does, the captchas will get harder) ;)
I am working on a project where I have to read a document from an image. In the initial stage I will read machine-printed documents and then eventually move to images of handwritten documents. However, I am doing this for learning purposes, so I don't intend to use APIs like Tesseract etc.
I intend to do in steps:
Preprocessing (blurring, thresholding, erosion & dilation)
Character Segmentation
OCR (or ICR in later stages)
So I am working on character segmentation right now. I recently tried it using horizontal and vertical histograms, but for some fonts, such as the one in the image shown, I was not able to get good results.
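A minimal sketch of that histogram (projection profile) approach, with an illustrative file name and Otsu thresholding; the zero-gap criterion on columns is exactly what tends to fail on touching or heavily kerned glyphs:

import cv2
import numpy as np

img = cv2.imread('line.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# horizontal projection: rows containing foreground pixels mark the text lines
row_sum = binary.sum(axis=1)
text_rows = np.where(row_sum > 0)[0]

# vertical projection: columns with no foreground are candidate character gaps
col_sum = binary.sum(axis=0)
gaps = np.where(col_sum == 0)[0]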
Is there any other method or algorithm to do the same?
Any help will be appreciated!
Edit 1:
The result I got after detecting blobs using cv2.SimpleBlobDetector.
The result I got after using cv2.findContours.
A first option is deskewing, i.e. measuring and correcting the skew angle. You can achieve this, for instance, by Gaussian filtering or erosion in the horizontal direction, so that the characters widen and come into contact. Then binarize and thin, or find the lower edges of the blobs (or directly the directions of the blobs). You will get slightly oblique line segments which give you the skew direction.
When you know the skew direction, you can counter-rotate to perform deskewing. The vertical histogram will then reliably separate the lines, and you can use a horizontal histogram in each of them.
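One common way to implement the measurement and counter-rotation, assuming binary is the binarized page with white text on black; cv2.minAreaRect over all foreground points is just one of several ways to estimate the angle:

import cv2
import numpy as np

coords = np.column_stack(np.where(binary > 0)).astype(np.float32)  # every foreground pixel
angle = cv2.minAreaRect(coords)[-1]                                 # angle of the tightest rotated box
# minAreaRect's angle convention varies across OpenCV versions; adjust the correction if needed
angle = -(90 + angle) if angle < -45 else -angle
h, w = binary.shape[:2]
M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
deskewed = cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)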
A second option, IMO much better, is to binarize the characters and perform blob detection. Then a proximity analysis of the bounding boxes will allow you to determine chains of characters. These will tell you the lines and, where the spacing is larger, delimit the words.
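A sketch of that second option, again assuming binary holds the page with characters as white blobs; the 30-pixel line band, the speck filter, and the gap factor are placeholders to tune:

import cv2

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 10]   # drop tiny specks
boxes.sort(key=lambda b: (b[1] // 30, b[0]))    # crude sort: 30 px line bands, then left to right

words, current = [], [boxes[0]]
for prev, box in zip(boxes, boxes[1:]):
    gap = box[0] - (prev[0] + prev[2])          # horizontal spacing between consecutive boxes
    if 0 <= gap <= 0.5 * prev[3]:               # small gap relative to glyph height: same word
        current.append(box)
    else:                                       # large or negative gap: new word (or new line)
        words.append(current)
        current = [box]
words.append(current)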
Shamelessly jumping on the bandwagon :-)
Inspired by How do I find Waldo with Mathematica and the followup How to find Waldo with R, as a new python user I'd love to see how this could be done. It seems that python would be better suited to this than R, and we don't have to worry about licenses as we would with Mathematica or Matlab.
In an example like the one below obviously simply using stripes wouldn't work. It would be interesting if a simple rule based approach could be made to work for difficult examples such as this.
I've added the [machine-learning] tag as I believe the correct answer will have to use ML techniques, such as the Restricted Boltzmann Machine (RBM) approach advocated by Gregory Klopper in the original thread. There is some RBM code available in python which might be a good place to start, but obviously training data is needed for that approach.
At the 2009 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2009) they ran a Data Analysis Competition: Where's Wally? Training data is provided in MATLAB format. Note that the links on that website are dead, but the data (along with the source of an approach taken by Sean McLoone and colleagues) can be found here (see SCM link). Seems like one place to start.
Here's an implementation with mahotas
from pylab import imshow
import numpy as np
import mahotas
wally = mahotas.imread('DepartmentStore.jpg')
wfloat = wally.astype(float)
r,g,b = wfloat.transpose((2,0,1))
Split into red, green, and blue channels. It's better to use floating point arithmetic below, so we convert at the top.
w = wfloat.mean(2)
w is the white channel.
pattern = np.ones((24,16), float)
for i in range(2):
pattern[i::4] = -1
Build up a pattern of +1,+1,-1,-1 on the vertical axis. This is wally's shirt.
v = mahotas.convolve(r-w, pattern)
Convolve with red minus white. This will give a strong response where the shirt is.
mask = (v == v.max())
mask = mahotas.dilate(mask, np.ones((48,24)))
Look for the maximum value and dilate it to make it visible. Now, we tone down the whole image, except the region of interest:
wally = (wally - .8*wally * ~mask[:,:,None]).astype(np.uint8)
imshow(wally)
And we get the result:
You could try template matching, noting which location produced the highest resemblance, and then use machine learning to narrow it down further. That is also very difficult, and with the accuracy of template matching, it may just return every face or face-like image. I think you will need more than just machine learning if you hope to do this consistently.
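For what it's worth, the basic template-matching step looks like the sketch below, assuming you have cropped a template of Wally from another puzzle at roughly the same scale (that scale assumption is the main weakness, since matchTemplate is neither scale- nor rotation-invariant); the file names are illustrative:

import cv2

scene = cv2.imread('puzzle.jpg')
template = cv2.imread('waldo_template.png')          # a crop of Wally at roughly the same scale
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)       # best score and its top-left corner
h, w = template.shape[:2]
cv2.rectangle(scene, max_loc, (max_loc[0] + w, max_loc[1] + h), (0, 0, 255), 2)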
Maybe you should start by breaking the problem into two smaller ones:
create an algorithm that separates people from the background.
train a neural network classifier with as many positive and negative examples as possible.
those are still two very big problems to tackle...
BTW, I would choose C++ and OpenCV; it seems much more suited for this.
This is not impossible, but it is very difficult because you really have no example of a successful match. There are often multiple states (in this case, more examples of Where's Wally drawings); you can then feed multiple pictures into an image recognition program, treat it as a hidden Markov model, and use something like the Viterbi algorithm for inference ( http://en.wikipedia.org/wiki/Viterbi_algorithm ).
That's the way I would approach it, but it assumes you have multiple images you can provide as examples of the correct answer so it can learn. If you only have one picture, then I'm sorry, there may be another approach you need to take.
I recognized that there are two main features which are almost always visible:
the red-white striped shirt
dark brown hair under the fancy cap
So I would do it the following way:
search for striped shirts:
filter out the red and white colors (with thresholds on the HSV-converted image). That gives you two mask images (a rough sketch of this filtering step appears after this list).
add them together -> that's the main mask for searching striped shirts.
create a new image with all the filtered out red converted to pure red (#FF0000) and all the filtered out white converted to pure white (#FFFFFF).
now correlate this pure red-white image with a stripe pattern image (I think all the Waldos have quite perfectly horizontal stripes, so rotation of the pattern shouldn't be necessary). Do the correlation only inside the above-mentioned main mask.
try to group together clusters which could have resulted from one shirt.
If there is more than one 'shirt', that is, more than one cluster of positive correlation, search for other features, like the dark brown hair:
search for brown hair
filter out the specific brown hair color using the HSV converted image and some thresholds.
search for a certain area in this masked image - not too big and not too small.
now search for a 'hair area' that is just above a previously detected striped shirt and is at a certain distance from the center of the shirt.
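A minimal sketch of the red/white filtering step from the shirt search above, with illustrative HSV thresholds that would need tuning per image:

import cv2
import numpy as np

img = cv2.imread('wheres_wally.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# red wraps around hue 0 in HSV, so it needs two ranges
red1 = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))
red2 = cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
red_mask = red1 | red2
white_mask = cv2.inRange(hsv, (0, 0, 200), (180, 40, 255))   # low saturation, high value

main_mask = red_mask | white_mask        # main mask for searching striped shirts

# pure red / pure white image for correlating against a stripe pattern
pure = np.zeros_like(img)
pure[red_mask > 0] = (0, 0, 255)         # pure red in BGR
pure[white_mask > 0] = (255, 255, 255)   # pure white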
Here's a solution using neural networks that works nicely.
The neural network is trained on several solved examples that are marked with bounding boxes indicating where Wally appears in the picture. The goal of the network is to minimize the error between the predicted box and the actual box from training/validation data.
The network above uses the TensorFlow Object Detection API to perform training and predictions.