Tesseract can't recognize an image after part of it is erased - python

I want to OCR a two-digit image after erasing the right-hand digit, to get better accuracy.
Example) Original, Modified
The image is a PNG file (52*26 px). The background color is (192,192,192,255) and each digit has a different color.
But surprisingly, after erasing the right-hand digit, tesseract can no longer recognize the number.
Result:
original:60
left:
from PIL import Image
from pytesseract import image_to_string
BACKGROUND = (192, 192, 192, 255)
im = Image.open('NA2WK.png')
src = im.load()
# calculate the far-left x position of each non-background color
color = {}
for i in range(52):
    for j in range(26):
        c = src[i, j]
        if c != BACKGROUND and i < color.get(c, 9999):
            color[c] = i
# get the color of the left character (assumes exactly two digit colors)
keys = list(color)  # list() is needed on Python 3, where dict views cannot be indexed
if color[keys[0]] < color[keys[1]]:
    left, right = keys[0], keys[1]
else:
    left, right = keys[1], keys[0]
# left processing: paint the background and the right digit white, everything else black
imleft = Image.open('test.png')
pix = imleft.load()
for i in range(52):
    for j in range(26):
        if pix[i, j] == BACKGROUND or pix[i, j] == right:
            pix[i, j] = (255, 255, 255, 255)
        else:
            pix[i, j] = (0, 0, 0, 255)
print('original:' + image_to_string(im))
print('left:' + image_to_string(imleft))

Tesseract performs a connected component analysis internally. It tries to group blocks of text together, and that may be what fails here because the image contains so few characters. Tesseract has page segmentation modes, one of which tells it to treat the image as a single character. Try that mode; it may give you the required result.
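A minimal sketch of that suggestion, assuming pytesseract and a Tesseract build that accepts the --psm option (mode 10 treats the image as a single character). The helper names build_config and ocr_single_char are hypothetical, and the digit whitelist is an extra assumption that some Tesseract versions may ignore.

```python
def build_config(psm=10, whitelist="0123456789"):
    """Build a Tesseract config string (hypothetical helper, not part of pytesseract)."""
    config = "--psm {}".format(psm)
    if whitelist:
        # Assumption: restrict recognition to digits, since the image holds a number.
        config += " -c tessedit_char_whitelist={}".format(whitelist)
    return config


def ocr_single_char(image, psm=10):
    """Run Tesseract on an image that contains one character."""
    import pytesseract  # imported lazily so build_config works without Tesseract installed
    return pytesseract.image_to_string(image, config=build_config(psm)).strip()
```

With the question's variables this would be called as ocr_single_char(imleft). If mode 10 does not help, modes 7 (single text line) and 8 (single word) are worth trying in the same way.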

Related

Tesseract OCR fails to detect varying font size and letters that are not horizontally aligned

I am trying to detect the text on these price labels, which are always cleanly preprocessed. Although Tesseract easily reads the text written above the price, it fails to detect the price values. I am using the Python bindings (pytesseract), but it also fails when run from the CLI. Most of the time it tries to recognize the price part as one or two characters.
Sample 1:
tesseract D:\tesseract\tesseract_test_images\test.png output
And the output of the sample image is this.
je Beutel
13
However, if I crop and stretch the price so that the digits look separated and have the same font size, the output is just fine.
Processed image (price cropped and shrunk):
je Beutel
1,89
How do I get Tesseract OCR to work as I intended, given that I will be going over a lot of similar images?
Edit: Added more price tags:
sample5 sample6 sample7
The problem is that the image you are using is small. When tesseract processes it, it either treats '8', '9' and ',' as a single letter and predicts '3', or treats '8' and ',' as one letter and '9' as another, and so produces the wrong output. The image shown below explains it.
A simple solution is to enlarge the image by a factor of 2 or 3 (or more, depending on the size of your original image) before passing it to tesseract, so that it detects each letter individually as shown below. (Here I increased its size by a factor of 2.)
Below is a simple Python script that does this:
import pytesseract
import cv2
img = cv2.imread('dKC6k.png')
img = cv2.resize(img, None, fx=2, fy=2)
data = pytesseract.image_to_string(img)
print(data)
Detected text:
je Beutel
89
1.
Now you can simply extract the required data from the text and format it as per your requirement.
data = data.replace('\n\n', '\n')
data = data.split('\n')
dollars = data[2].strip(',').strip('.')
cents = data[1]
print('{}.{}'.format(dollars, cents))
Desired Format:
1.89
The problem is that the Tesseract engine was not trained to read this kind of text topology.
You can:
train your own model. In particular, you'll need to provide images with variations of topology (position of characters). You can actually use the same image and shuffle the positions of the characters.
reorganize the image into clusters of text and then use tesseract. In particular, I would take the cents part and move it to the right of the comma; in that case you can use tesseract out of the box. A few relevant criteria would be the height of the clusters (to differentiate cents from integers) and the position of the clusters (read from left to right).
In general, computer vision algorithms (including CNNs) give you tools to obtain a higher-level representation of an image (features or descriptors), but they fail to create a logic or an algorithm that processes intermediate results in a certain way.
In your case that would be:
"if the height of those letters is smaller, it's cents",
"if the height and vertical position are the same, the characters belong to the same number, either on the left of the comma or on the right of the comma".
The thing is that it's difficult to reach that through training, while it's extremely simple for a human to write it as an algorithm. Sorry for not giving you an actual implementation, but my text is the pseudocode.
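The two rules above could be sketched roughly as follows. This is only an illustration: it assumes you already have one (character, x position, height) tuple per recognized symbol, for example from per-character OCR on contours, and the function name assemble_price and the 0.75 height ratio are made up for the example.

```python
def assemble_price(boxes, height_ratio=0.75):
    """Rebuild a price string from per-character boxes.

    boxes: list of (char, x, height) tuples (hypothetical input format).
    Digits clearly shorter than the tallest digit are treated as cents.
    """
    digits = [b for b in boxes if b[0].isdigit()]
    if not digits:
        return ""
    tallest = max(height for _, _, height in digits)
    limit = height_ratio * tallest
    # Rule 1: smaller height means cents; rule 2: read each group left to right.
    integers = sorted((b for b in digits if b[2] >= limit), key=lambda b: b[1])
    cents = sorted((b for b in digits if b[2] < limit), key=lambda b: b[1])
    whole = "".join(char for char, _, _ in integers)
    frac = "".join(char for char, _, _ in cents)
    return "{},{}".format(whole, frac) if frac else whole


# Example layout: a tall "1", a comma, and smaller raised "8" and "9".
example = [("8", 40, 20), ("1", 5, 40), (",", 25, 8), ("9", 52, 20)]
price = assemble_price(example)
```

The comma itself is ignored and re-inserted between the two groups, so a misread or missing separator does not affect the result.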
TrainingTesseract2
TrainingTesseract4
Joint Unsupervised Learning of Deep Representations and Image Clusters

Defining color range for histologic image mask within HSV colorspace (Python, OpenCV, Image-Analysis):

In an effort to separate histologic slides into several layers based on color, I modified some widely distributed code (1) available through OpenCV's community. Our staining procedure marks different cell types in tissue cross sections with different colors (B cells are red, macrophages are brown, background nuclei have a bluish color).
I'm interested in selecting only the magenta-colored and brown parts of the image.
Here's my attempt to create a mask for the magenta pigment:
import cv2
import numpy as np
def mask_builder(filename, hl, hh, sl, sh, vl, vh):
    # load image, convert to hsv
    bgr = cv2.imread(filename)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # set lower and upper bounds of range according to arguments
    lower_bound = np.array([hl, sl, vl], dtype=np.uint8)
    upper_bound = np.array([hh, sh, vh], dtype=np.uint8)
    return cv2.inRange(hsv, lower_bound, upper_bound)
mask = mask_builder('sample 20 138 1.jpg', 170,180, 0,200, 0,230)
cv2.imwrite('mask.jpg', mask)
So far a trial and error approach has produced poor results:
Can anyone suggest a smarter method to threshold within the HSV colorspace? I've done my best to search for answers in previous posts, but it seems that these color ranges are particularly difficult to define due to the nature of the image.
References:
Separation with Colorspaces: http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html
python opencv color tracking
BGR separation: http://www.pyimagesearch.com/2014/08/04/opencv-python-color-detection/
UPDATE:
I've found a working solution to my problem. I increased the lower bounds of 'S' and 'V' at regular intervals using a simple for loop, wrote out the result for each test image, and chose the best. I found that my lower bounds for S and V should be set at 100 and 125. This systematic method of trial and error produced better results:
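For anyone who wants to repeat that sweep, here is a rough sketch of the loop. It is not the exact code I ran: in_range is a NumPy stand-in for cv2.inRange so that the example needs no image file, the function names are made up, and the upper bounds of 255 are assumptions.

```python
import numpy as np


def in_range(hsv, lower, upper):
    """NumPy stand-in for cv2.inRange: 255 where every channel lies within the bounds."""
    inside = np.all((hsv >= np.asarray(lower)) & (hsv <= np.asarray(upper)), axis=-1)
    return inside.astype(np.uint8) * 255


def sweep_lower_bounds(hsv, h_range, s_lows, v_lows, s_high=255, v_high=255):
    """Return one mask per (s_low, v_low) pair so the results can be compared by eye."""
    masks = {}
    for s_low in s_lows:
        for v_low in v_lows:
            lower = (h_range[0], s_low, v_low)
            upper = (h_range[1], s_high, v_high)
            masks[(s_low, v_low)] = in_range(hsv, lower, upper)
    return masks


# Tiny synthetic HSV "image": a saturated magenta pixel, a washed-out one, a green one.
hsv_demo = np.array([[[175, 120, 130], [175, 50, 130], [100, 200, 200]]], dtype=np.uint8)
demo_masks = sweep_lower_bounds(hsv_demo, (170, 180), s_lows=[0, 100], v_lows=[0, 125])
```

On a real slide you would pass the array returned by cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV) and write each mask to disk with cv2.imwrite, using a range such as range(0, 201, 25) for the candidate lower bounds.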
I am happy you found your answer.
I will suggest an alternative method that might work. Unfortunately I am not proficient with Python, so you'll need to find out how to code it in Python (it's basic).
If I had the first image you got after the HSV threshold, I would use morphological operations to get the information I want.
I would probably try "closing" first, but if that doesn't work I would dilate, then fill holes, and then erode by the same amount as the dilation.
After this first step you'll probably need to delete the small "noise" blobs around the region, and then you'll have the mask.
This is how it would be in Matlab (showing this mainly so you can see the results):
I=imread('http://i.stack.imgur.com/RlH4V.jpg');
I=I>230; % Create Black and white image (this is because in stackoverflow its a jpg)
ker=strel('square',3); % Create a 3x3 square kernel
I1=imdilate(I,ker); % Dilate
I2=imfill(I1,'holes'); % Fill holes
I3=imerode(I2,ker); % Erode
Ilabel=bwlabel(I3,8); % Get a label per independent blob
% Get maximum area blob (you can do this with a for in python easily)
areas = regionprops(Ilabel,'Centroid','Area','PixelIdxList');
[~,index] = max([areas.Area]); % Get the maximum area
Imask=Ilabel==index; % Get the image with only the max area.
% Plot: This is just matlab code, no relevance
figure;
subplot(131)
title('Dilated')
imshow(I1);
subplot(132)
title('Closed')
imshow(I2);
subplot(133)
title('Eroded')
imshow(I3);
figure;
imshow(imread('http://i.stack.imgur.com/ZqrF9.jpg'))
hold on
h=imshow(bwperim(Imask));
set(h,'alphadata',Imask/2)
Note that I started from the "bad" HSV segmentation. If you try a better one the results may improve. Also, play with the kernel size for the erosion and dilation.
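For Python users, a rough counterpart of the MATLAB steps above might look like this. It assumes SciPy's ndimage module in place of the Image Processing Toolbox, takes a boolean mask rather than a JPEG, and the function name largest_filled_blob is made up; I have not run it on the slide images.

```python
import numpy as np
from scipy import ndimage


def largest_filled_blob(mask, kernel_size=3):
    """Dilate, fill holes, erode, then keep only the largest connected blob."""
    mask = np.asarray(mask, dtype=bool)
    ker = np.ones((kernel_size, kernel_size), dtype=bool)    # square kernel, like strel('square',3)
    dilated = ndimage.binary_dilation(mask, structure=ker)   # imdilate
    filled = ndimage.binary_fill_holes(dilated)              # imfill(...,'holes')
    eroded = ndimage.binary_erosion(filled, structure=ker)   # imerode
    labels, count = ndimage.label(eroded, structure=np.ones((3, 3)))  # 8-connectivity, like bwlabel(...,8)
    if count == 0:
        return np.zeros_like(mask)
    areas = np.bincount(labels.ravel())
    areas[0] = 0                                             # ignore the background label
    return labels == areas.argmax()


# Synthetic check: a ring with a hole plus one isolated noise pixel.
demo = np.zeros((20, 20), dtype=bool)
demo[5:15, 5:15] = True
demo[8:12, 8:12] = False
demo[1, 1] = True
demo_result = largest_filled_blob(demo)
```

In the synthetic check the hole in the ring is filled and the noise pixel is dropped, which is the behavior wanted for the tissue mask.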
Through trial and error (stepping the lower and upper "S" and "V" bounds down and up), I found that my desired colors require a relaxed range for the "S" and "V" values. I'll refrain from sharing the particular values I use, because they are specific to my images and unlikely to be useful to anyone else.
Note that the original code shared works fine once more representative ranges are used.

Removing lines from an image of a notebook for digit detection (Python)

I need to remove the lines from the image below, which shows numbers on a piece of ruled paper, without distorting my digits. Without this, my digit detection algorithm fails because there are artefacts of the ruling lines in the regions of interest.
a cleaner version of the file without any artefacts
Classic task for Fourier-domain transform.
Perform Fourier transform:
import numpy as np
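# scipy.misc.imread/imsave were removed from recent SciPy releases; OpenCV's equivalents are used instead
from cv2 import imread, imwrite as imsave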
img = imread("10XIn.jpg")[:,:,:3]
imggray = np.mean(img, -1)
imfft = np.fft.fft2(imggray)
mags = np.abs(np.fft.fftshift(imfft))
angles = np.angle(np.fft.fftshift(imfft))
visual = np.log(mags)
visual2 = (visual - visual.min()) / (visual.max() - visual.min())*255
visual2 will look like following:
Note the diagonal line across the center - it represents your lines.
Now, I've manually created the mask for this line, but ideally you could filter it out programmatically.
Then we read the mask and paint out the line:
mask = imread("fftimg4_mask.jpg")[:,:,:3]
mask = (np.mean(mask,-1) > 20)
visual[mask] = np.mean(visual)
And then reverse the fft:
newmagsshift = np.exp(visual)
newffts = newmagsshift * np.exp(1j*angles)
newfft = np.fft.ifftshift(newffts)
imrev = np.fft.ifft2(newfft)
newim2 = 255 - np.abs(imrev).astype(np.uint8)
imsave("fftimg2.jpg", newim2 )
Here is newim2
Of course, you could do more accurate patching in fourier space and also you could apply the result back to the original image to keep colors, but I think this post illustrates the idea.
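As a hint at how the masking could be done without a hand-drawn mask, here is a small sketch under a strong simplifying assumption: the ruling lines are perfectly horizontal, so their energy sits on the central column of the shifted spectrum instead of on a diagonal. The function name and the keep parameter are made up, and the notch zeroes the spectrum rather than patching the log-magnitude as above.

```python
import numpy as np


def remove_horizontal_lines(gray, keep=1):
    """Suppress patterns that are constant along each row (ideal horizontal lines).

    keep: number of rows on each side of the DC term that are left untouched,
    so that the overall brightness and very slow vertical gradients survive.
    """
    gray = np.asarray(gray, dtype=float)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    cy, cx = gray.shape[0] // 2, gray.shape[1] // 2
    notch = np.ones(gray.shape[0], dtype=bool)
    notch[cy - keep:cy + keep + 1] = False   # protect the DC neighbourhood
    spectrum[notch, cx] = 0                  # zero the central column elsewhere
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


# Synthetic ruled page: a line every 8 rows, nothing else.
ruled = np.zeros((64, 64))
ruled[::8] = 1.0
cleaned = remove_horizontal_lines(ruled)
```

On the synthetic page the lines collapse to a flat image at the mean brightness. On a real photo the lines are slightly tilted, so the notch would have to follow the tilted ridge in the spectrum, and any digit strokes that are constant along a row will be attenuated as well.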
Okay, this might be a bit complicated, as the color of the notebook lines is quite close to the color of the digits, judging from your example. I presume that the green boxes are your addition and not part of the data itself.
You don't state which framework you use, so I will provide only some general tips how to approach this problem.
The first step would be some thresholding. You can use either binary thresholding or, better, adaptive thresholding with correctly sized windows. You will have to experiment with this. The result of thresholding will be a binary image, still with lines.
The second step will be to use morphological operations to clean the image. If you are not sure what morphology is, look at this morphology tutorial.
Around halfway through, there are some examples of removing lines from images. The biggest problem is that some digits also contain horizontal strokes. So one option is to use a rather small morphology kernel (maybe 3 rows and 1 column), as the notebook lines are thinner, and to update the recognizer so that it recognizes even distorted digits. This should be doable, because all the digits will be distorted in the same way.
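To make the small-kernel idea concrete, here is a sketch that assumes the image has already been thresholded so that ink is True, and that uses SciPy's ndimage (my choice for the example, since no framework was named). The function name is made up.

```python
import numpy as np
from scipy import ndimage


def remove_thin_horizontal_lines(binary, rows=3):
    """Morphological opening with a tall, one-pixel-wide kernel.

    Anything thinner than `rows` pixels vertically (such as a ruling line)
    is removed; strokes at least that tall survive.
    """
    structure = np.ones((rows, 1), dtype=bool)
    return ndimage.binary_opening(np.asarray(binary, dtype=bool), structure=structure)


# Synthetic page: a one-pixel ruling line crossed by a vertical digit stroke.
page = np.zeros((20, 20), dtype=bool)
page[10, :] = True        # ruling line
page[3:17, 8:11] = True   # vertical stroke, 14 px tall and 3 px wide
opened = remove_thin_horizontal_lines(page)
```

The ruling line disappears while the vertical stroke is kept, including where the two cross. Thin horizontal strokes of the digits themselves (the top bar of a 7, for instance) would be removed too, which is the distortion mentioned above.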
Another way to do it is to exploit the known structure.
deskew the image (skew can be found with Hough transform in opencv)
locate peaks in row sums
physically clone pixels above and below lines
I just implemented this for another dataset, example attached. This could be tuned further.
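A simplified sketch of the last two steps (not the code used for the attached example) could look like this. It assumes a grayscale image that is already deskewed, with dark ink on a light page; the function name and both thresholds are made up. Each line row is replaced by the lighter of its nearest clean neighbours above and below, so a pixel stays dark only where a stroke continues through the line.

```python
import numpy as np


def remove_ruled_lines(gray, dark_thresh=128, row_fraction=0.5):
    """Find rows that are mostly dark and overwrite them from neighbouring rows."""
    gray = np.asarray(gray)
    dark_per_row = (gray < dark_thresh).mean(axis=1)   # peaks mark the ruling lines
    is_line = dark_per_row > row_fraction
    clean_rows = np.flatnonzero(~is_line)
    out = gray.copy()
    for r in np.flatnonzero(is_line):
        above = clean_rows[clean_rows < r]
        below = clean_rows[clean_rows > r]
        neighbours = []
        if above.size:
            neighbours.append(gray[above[-1]])
        if below.size:
            neighbours.append(gray[below[0]])
        if neighbours:
            # Lighter of the two neighbours: dark only if both sides are dark.
            out[r] = np.max(neighbours, axis=0)
    return out


# Synthetic page: white background, one ruling line, one vertical stroke crossing it.
sheet = np.full((20, 20), 255, dtype=np.uint8)
sheet[5:15, 5] = 0    # digit stroke
sheet[10, :] = 0      # ruling line
repaired = remove_ruled_lines(sheet)
```

In the synthetic case the line row becomes white except at the column where the stroke crosses it.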

Cases where Morphological Opening and Closing yield the same results?

I would like to know if there are any examples or cases where the Opening and Closing morphology operations on a single image produce the same results.
As an example, let's say we have an image X, and we have done opening operation to produce Y. Similarly, we have done a closing operation on the original X to produce the same Y. I would like to know if there are examples for these type of images X. Programming examples in Python or MATLAB are also appreciated.
Yes there are. As one small example, if you had a binary image where it consists of a bunch of squares that are disconnected and distinct. Provided that you specify a structuring element that is square, and choosing the structuring element so that it is smaller than the smallest square in the image, then doing either operation will give you the same results.
If you did an opening on this image and a closing on this image, you will produce the same results. Remember, an opening is an erosion followed by a dilation where a closing is a dilation followed by an erosion. In terms of analyzing the shapes, erosion slightly shrinks the area of the image while dilation slightly enlarges it.
By doing an erosion followed by a dilation (opening), you're shrinking the object and then growing it again. This will bring the image back to where it was before, provided that you choose the structuring element as described above. Similarly, if you did a dilation followed by an erosion (closing), you're growing the object and then shrinking it again, also bringing the image back to where it was before, again under that same guideline.
If you were to choose a structuring element where it is larger than the smallest object, doing an opening will remove this object from the image, and so you won't get the original image back. Also, you need to make sure that the objects are well far away from each other, and that the size of the structuring element does not overlap any of the objects as you slide over and do the morphology operations. The reason why is because if you were to do a closing, you would join these two objects together and so that won't get you the same results either!
Here's an example image that I generated that is binary:
To generate this image in MATLAB, you can do:
A = false(200,200);
A(30:60,30:60) = true;
A(90:110,90:110) = true;
A(10:30, 135:155) = true;
A(150:180,100:120) = true;
In Python, you can do this with numpy:
import numpy as np
A = np.zeros((200,200), dtype='uint8')
A[29:60,29:60] = 255
A[89:110,89:110] = 255
A[9:30, 134:155] = 255
A[149:180, 99:120] = 255
The reason why I had to create the array as uint8 in numpy is because when we want to show this image, I'm going to use OpenCV and it requires that the image be at least a uint8 type.
Now, let's choose a 5 x 5 square structuring element, and let's perform a closing and an opening with this image. We will display the results in a single figure going from left to right:
se = strel('square', 5);
A_close = imclose(A, se);
A_open = imopen(A, se);
figure;
subplot(1,3,1);
imshow(A);
title('Original');
subplot(1,3,2);
imshow(A_close);
title('Closed');
subplot(1,3,3);
imshow(A_open);
title('Open');
This is the result:
It certainly looks the same! To really show the difference, let's subtract the closed and opened result from the original image. You should get a blank image in the end if they're both equal to the original image.
figure;
subplot(1,2,1);
imshow(abs(double(A) - double(A_close)));
subplot(1,2,2);
imshow(abs(double(A) - double(A_open)));
Bear in mind that I converted the images to double to facilitate subtraction, and I used abs to ensure that negative differences are reflected. This is what I get:
As you can see, both results are totally blank, meaning they're exact copies of the original image after each result.
The equivalent code in Python for the first part is the following:
import cv2
se = np.ones((5,5), dtype='uint8')
A_close = cv2.morphologyEx(A, cv2.MORPH_CLOSE, se)
A_open = cv2.morphologyEx(A, cv2.MORPH_OPEN, se)
cv2.imshow('Original', A)
cv2.imshow('Close', A_close)
cv2.imshow('Open', A_open)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here's what I get:
You'll need to install the OpenCV package for this Python code. I displayed all of the images as three separate figures, then left the windows there until you choose any one of them and push a key. Once you do this, all of the windows will close. If you want to show the subtraction stuff, this is the code in Python:
A_close_diff = A - A_close
A_open_diff = A - A_open
cv2.imshow('Close Diff', A_close_diff)
cv2.imshow('Open Diff', A_open_diff)
cv2.waitKey(0)
cv2.destroyAllWindows()
I didn't name the figures in MATLAB because what we're showing is obvious, but for OpenCV, you need to name the windows, and so I put names that describe what we're showing for each. I also didn't need to take the absolute value, because in numpy, doing arithmetic operations that result in an overflow or underflow will simply wrap around itself, while for MATLAB, the values get clipped. That's why for MATLAB, I needed to convert to double and take the absolute value because imshow doesn't display negative intensities or if we were to have a situation where we did 0 - 1, the output would be 0 and you wouldn't be able to show that this location has a difference. With Python, doing 0 - 1 for uint8, will result in 255, so we can certainly see a difference here.... so there's no need to do any of this abs and casting stuff that we did in MATLAB. Here's what I get:
In general, you can reproduce what I did with any kind of shape and any size shape, so long as you choose a structuring element that mimics the properties of the shape that is in your image, and you choose a structuring element that is smaller than the smallest shape seen in that image. I'm sure there are many more examples that don't have to follow these specific guidelines, but this is the best example that I can think of at this moment.
This should hopefully get you started.
Good luck!
Yes, there are such images. One of the properties of opening (it is mentioned in the Wikipedia article, for example) is that it is an anti-extensive operation, i.e. if Y is the opening of X, then Y ⊆ X. Similarly, closing is an extensive operation, i.e. if Y is the closing of X, then X ⊆ Y. If the same Y is both the opening and the closing of X, then Y ⊆ X ⊆ Y, and therefore X = Y. So any image invariant to both opening and closing will satisfy your requirement (and, as I have just shown, only such images will).
Concrete examples depend on structuring element used when performing erosion or dilation. For example, if it is a square n x n matrix with all elements equal to 1, then any rectangle with both height and width greater than n (and located far enough, i.e. at least n/2 pixels, from image edges) will satisfy this requirement.
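Both claims can be checked numerically. The sketch below uses SciPy's ndimage (my choice for the example; any morphology library would do) on a random binary image kept away from the borders, and on one rectangle larger than the structuring element.

```python
import numpy as np
from scipy import ndimage

se = np.ones((3, 3), dtype=bool)

# Random binary image, kept 5 pixels away from the edges to avoid border effects.
rng = np.random.default_rng(0)
X = np.zeros((40, 40), dtype=bool)
X[5:35, 5:35] = rng.random((30, 30)) > 0.5

opened = ndimage.binary_opening(X, structure=se)
closed = ndimage.binary_closing(X, structure=se)
opening_inside = bool(np.all(opened <= X))   # opening is anti-extensive: opened ⊆ X
closing_outside = bool(np.all(X <= closed))  # closing is extensive: X ⊆ closed

# A rectangle larger than a 5 x 5 structuring element is invariant to both.
rect = np.zeros((40, 40), dtype=bool)
rect[10:20, 8:30] = True
se5 = np.ones((5, 5), dtype=bool)
rect_invariant = (
    np.array_equal(ndimage.binary_opening(rect, structure=se5), rect)
    and np.array_equal(ndimage.binary_closing(rect, structure=se5), rect)
)
```

For the random image the opening and closing generally differ from each other, which is consistent with the argument: they can only coincide when both equal X.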

PIL - Identifying an object with a virtual box

I have an image (sorry cannot link it for copyright purposes) that has a character outlined in a black line. The black line that outlines the character is the darkest thing on the picture (planned on using this fact to help find it). What I need to do is obtain four coordinates that draw a virtual box around the character. The box should be as small as possible while still keeping the outlined character inside its contents. I intend on using the box to help pinpoint what would be the central point of the character's figure by using the center point of the box.
I started with trying to identify parts of the outline. Since it's the darkest line on the image, I used getextrema() to obtain at least one point on the outline, but I can't figure out how to get more points and then combine those points to make a box.
Any insight into this problem is greatly appreciated. Cheers!
EDIT *
This is what I have now:
from PIL import Image
im = Image.open("pic.jpg")
im = im.convert("L")
lo, hi = im.getextrema()
im = im.point(lambda p: p == lo)
rect = im.getbbox()
x = 0.5 * (rect[0] + rect[2])
y = 0.5 * (rect[1] + rect[3])
It lands inside the figure fairly consistently, but it's really not that close to the center. Any idea why?
Find an appropriate threshold that separates the outline from the rest of the image, perhaps using the extrema you already have. If the contrast is big enough this shouldn't be too hard, just add some value to the minimum.
Threshold the image with the value you found, see this question. You want the dark part to become white in the binary thresholded image, so use a smaller-than threshold (lambda p: p < T).
Use thresholdedImage.getbbox() to get the bounding box of the outline
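Put together, the three steps might look like the sketch below. The margin of 30 gray levels and the function name outline_center are assumptions to tune for your picture. The synthetic test image also suggests why p == lo gave an off-center result: only the very darkest pixels are selected, and they may cover just part of the outline.

```python
import numpy as np
from PIL import Image


def outline_center(im, margin=30):
    """Center of the bounding box of all pixels darker than (minimum + margin)."""
    gray = im.convert("L")
    lo, hi = gray.getextrema()
    threshold = lo + margin
    # Dark pixels become white (non-zero) so that getbbox() picks them up.
    binary = gray.point(lambda p: 255 if p < threshold else 0)
    rect = binary.getbbox()
    if rect is None:
        return None
    return 0.5 * (rect[0] + rect[2]), 0.5 * (rect[1] + rect[3])


# Synthetic picture: a dark box outline whose top edge is slightly darker than the rest.
arr = np.full((40, 50), 200, dtype=np.uint8)
arr[25, 10:31] = 25   # bottom edge
arr[5:26, 10] = 25    # left edge
arr[5:26, 30] = 25    # right edge
arr[5, 10:31] = 10    # top edge, the darkest pixels
picture = Image.fromarray(arr)

center = outline_center(picture)
exact_min_bbox = picture.point(lambda p: 255 if p == 10 else 0).getbbox()
```

Here exact_min_bbox covers only the top edge of the box, while the thresholded version covers the whole outline.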
