Tips on performing OCR - not getting desired results - python

So I have the following image:
I'm trying to extract three arrays:
var a = [30,31,32,35,37,40,44];
var b = [6,7,11,15,18,21,22];
var c = [5,11,15,18,23,37,28];
I tried feeding this image into tesseract ~/Desktop/test.png out to no avail:
9 % ooenesew #
5 ‘ 904399
And here is the result from ocrad ~/Desktop/test.ppm:
o
?
28
Can any OCR experts suggest what I might try next? I'm comfortable using Python/OpenCV, but will try anything.

If your images always look like the example, you may need to do some tidy-up to remove anything that is not a number (the black background and the circle). Then the method described in the accepted answer to the linked question might be sufficient for your needs, since it looks like you are not dealing with different fonts and sizes:
Simple Digit Recognition OCR in OpenCV-Python
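As a rough illustration of the tidy-up step, here is a hedged sketch using OpenCV (the file name and threshold value are placeholders, not from the question); the idea is simply to keep the bright digits and drop the dark background before running the OCR engine:
import cv2
img = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)
_, digits = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)  # keep only bright pixels
digits = cv2.medianBlur(digits, 3)                           # drop small specks
cv2.imwrite('test_clean.png', digits)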

Related

Tesseract OCR fails to detect varying font size and letters that are not horizontally aligned

I am trying to detect the text on these price labels, which are always clearly preprocessed. Although Tesseract can easily read the text written above the price, it fails to detect the price values. I am using the Python bindings (pytesseract), but it also fails when run from the CLI. Most of the time it recognizes the price portion as only one or two characters.
Sample 1:
tesseract D:\tesseract\tesseract_test_images\test.png output
And the output of the sample image is this.
je Beutel
13
However, if I crop and stretch the price so the characters look separated and are the same font size, the output is just fine.
Processed image (cropped and shrunk price):
je Beutel
1,89
How do I get Tesseract OCR to work as intended, since I will be processing a lot of similar images?
Edit: Added more price tags:
sample5 sample6 sample7
The problem is that the image you are using is small. When Tesseract processes it, it treats '8', '9' and ',' as a single letter and so predicts '3', or it may treat '8' and ',' as one letter and '9' as another, producing the wrong output. The image shown below illustrates this.
A simple solution is to increase the image size by a factor of 2 or 3 (or more, depending on the size of your original image) before passing it to Tesseract, so that each letter is detected individually, as shown below. (Here I increased the size by a factor of 2.)
Below is a simple Python script that does this:
import pytesseract
import cv2
img = cv2.imread('dKC6k.png')
img = cv2.resize(img, None, fx=2, fy=2)  # upscale by a factor of 2 in x and y
data = pytesseract.image_to_string(img)
print(data)
Detected text:
je Beutel
89
1.
Now you can simply extract the required data from the text and format it as per your requirement.
data = data.replace('\n\n', '\n')
data = data.split('\n')                    # e.g. ['je Beutel', '89', '1.']
dollars = data[2].strip(',').strip('.')    # '1.' -> '1'
cents = data[1]                            # '89'
print('{}.{}'.format(dollars, cents))
Desired Format:
1.89
The problem is that the Tesseract engine was not trained to read this kind of text topology.
You can:
train your own model; in particular, you'll need to provide images with variations in topology (position of characters). You can actually use the same image and shuffle the positions of the characters.
reorganize the image into clusters of text and then use Tesseract; in particular, I would take the cents part and move it to the right of the comma, so that Tesseract can be used out of the box. A few relevant criteria would be the height of the clusters (to differentiate cents from integers) and the position of the clusters (read from left to right). A rough sketch of this idea is given after the links below.
In general, computer vision algorithms (including CNNs) give you tools to obtain a higher-level representation of an image (features or descriptors), but they do not give you the logic to process intermediate results in a particular way.
In your case that would be:
"if the height of those letters are smaller, it's cents",
"if the height, and vertical position is the same, it's about the
same number, either on left of coma, or on the right of coma".
The thing is that it's difficult to achieve this through training, while at the same time it's extremely simple for a human to write as an algorithm. Sorry for not giving you an actual implementation; my text is the pseudocode.
TrainingTesseract2
TrainingTesseract4
Joint Unsupervised Learning of Deep Representations and Image Clusters
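As a hedged sketch of the clustering option above (assuming OpenCV 4 and an already binarised price image; the file name and the 0.7 height ratio are placeholders of my own, not part of the answer):
import cv2
img = cv2.imread('price.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])  # read left to right
max_h = max(h for x, y, w, h in boxes)
integer_part = [b for b in boxes if b[3] > 0.7 * max_h]   # tall glyphs: digits left of the comma
cents_part = [b for b in boxes if b[3] <= 0.7 * max_h]    # short glyphs: the cents
# Crop the cents boxes and paste them to the right of the integer part (same baseline
# and height) before running Tesseract on the rearranged image.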

how to use simpleitk get inverse displacement field

I just moved from matlab to python recently so I can use simpleitk and sorry if this is a dumb question.
I have a transformation tx after demons registration using simpleitk. I wish to get the displacement field and its inverse by doing the following,
disp_field = tx.GetDisplacementField()
disp_field_inv = tx.GetInverseDisplacementField()
It turns out disp_field is exactly what I need --- an image volume of 256*256*176. But disp_field_inv is an empty array. Does anyone know why?
Then I tried the following,
disp_field_inv = sitk.InverseDisplacementField(disp_field,disp_field.GetSize(),disp_field.GetOrigin(),disp_field.GetSpacing(),
subsamplingFactor=16)
But Python just runs seemingly forever. Does anybody know how to do this properly?
The following is the specification for running the InvertDisplacementField procedural interface
Image itk::simple::InvertDisplacementField (const Image & image1,
uint32_t maximumNumberOfIterations = 10u,
double maxErrorToleranceThreshold = 0.1,
double meanErrorToleranceThreshold = 0.001,
bool enforceBoundaryCondition = true)
So I think that by passing disp_field.GetSize(), disp_field.GetOrigin(), disp_field.GetSpacing(), subsamplingFactor=16 as parameters 2 to 5, you are passing arguments that this interface does not expect.
Try just running disp_field_inv = sitk.InverseDisplacementField(disp_field)
and see if it iterates to a result!
For what it's worth after all these years, I just wanted to point out that the original question and the (so far only) answer by g.stevo mix up two different filters available in SimpleITK, namely:
sitk.InverseDisplacementField
sitk.InvertDisplacementField
Each of these procedural APIs and their respective image filters have different Execute function arguments.
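To make the distinction concrete, here is a minimal sketch contrasting the two calls (the argument values are simply the defaults from the signature quoted above and the values used in the question; keyword names may differ slightly between SimpleITK versions):
import SimpleITK as sitk
# disp_field is the displacement field image obtained from the transform
inv_a = sitk.InvertDisplacementField(disp_field,
                                     maximumNumberOfIterations=10,
                                     maxErrorToleranceThreshold=0.1,
                                     meanErrorToleranceThreshold=0.001,
                                     enforceBoundaryCondition=True)
inv_b = sitk.InverseDisplacementField(disp_field,
                                      size=disp_field.GetSize(),
                                      outputOrigin=disp_field.GetOrigin(),
                                      outputSpacing=disp_field.GetSpacing(),
                                      subsamplingFactor=16)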

variable limits when cutting pixels from a fits image with imcopy (IRAF) in python (pyraf)

I am trying to use PyRAF to run IRAF's imcopy task from Python code. My problem is that you have to specify the x and y range you want to cut from the image, but I want those limits to be variables, since everything is inside a loop and I have to copy several regions.
For example, I have this:
img_seg =raw_input('Name of the segmentation image? ')
iraf.imcopy(input=img_seg+'[200:220,300:400]',output='out')
But if I try with for example:
x1 = 200
x2 = 220
y1 = 300
y2 = 400
iraf.imcopy(input=img_seg+'[x1:x2,y1:y2]',output='out'),
that does not work. It gives me a syntax error and the message ERROR (1, "Number of input and output images not the same")
I have been trying for a while but I have not been able to make it work. It would be nice if someone could explain how to do this; thanks in advance!
P.S. My question is similar to this one: How to run a function on a list of objects in python/pyraf?, but the answer there is basically what is not working for me.
OK, so the point is that IRAF needs to see only the final interval string, but you can use Python to build that string in the form IRAF expects.
So all one needs to do is:
iraf.imcopy(input=img_seg+'['+str(x1)+':'+str(x2)+','+str(y1)+':'+str(y2)+']',output='out')
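Equivalently, and arguably easier to read, the section string can be built with str.format (this is plain Python string handling; nothing IRAF-specific beyond the [x1:x2,y1:y2] syntax):
section = '[{}:{},{}:{}]'.format(x1, x2, y1, y2)
iraf.imcopy(input=img_seg + section, output='out')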

How to find a template in an image using a mask (or transparency) with OpenCV and Python?

Let us assume we are looking for this template:
The corners of our template are transparent, so the background will vary, like so:
Assuming we could use the following mask with our template:
It would be very easy to find it.
What I have tried:
I have tried matchTemplate but it doesn't support masks (as far as I know), and using the alpha channel (transparency) in the template does not achieve this, as it compares the alpha channels instead of ignoring those pixels.
I have also looked into "region of interest", which I thought would be the solution, but with it you can only specify a rectangular area. I'm not even sure if it works on the template or not.
I'm sure I could do this by writing my own algorithm, but I was hoping it is possible via standard OpenCV, to avoid reinventing the wheel. Not to mention it would most likely be better optimised than my own.
So, how could I do something like this with OpenCV + Python?
This can be achieved using only the matchTemplate function, but a little workaround is needed.
Let's analyse the default metric (CV_TM_SQDIFF_NORMED). According to the matchTemplate documentation, the default metric looks like this:
R(x, y) = sum (I(x+x', y+y') - T(x', y'))^2
where I is the image matrix, T is the template, and R is the result matrix. Summation is done over the template coordinates x' and y'.
So, let's alter this metric by inserting a weight matrix W, which has the same dimensions as T:
Q(x, y) = sum W(x', y')*(I(x+x', y+y') - T(x', y'))^2
In this case, by setting W(x', y') = 0 you can make a pixel be ignored. So, how do we compute such a metric? With simple math:
Q(x, y) = sum W(x', y')*(I(x+x', y+y') - T(x', y'))^2
= sum W(x', y')*(I(x+x', y+y')^2 - 2*I(x+x', y+y')*T(x', y') + T(x', y')^2)
= sum {W(x', y')*I(x+x', y+y')^2} - sum{W(x', y')*2*I(x+x', y+y')*T(x', y')} + sum{W(x', y')*T(x', y')^2)}
So, we have divided the Q metric into three separate sums, and all of those sums can be calculated with the matchTemplate function (using the CV_TM_CCORR method). Namely:
sum {W(x', y')*I(x+x', y+y')^2} = matchTemplate(I^2, W, method=2)
sum{W(x', y')*2*I(x+x', y+y')*T(x', y')} = matchTemplate(I, 2*W*T, method=2)
sum{W(x', y')*T(x', y')^2)} = matchTemplate(T^2, W, method=2) = sum(W*T^2)
The last element is a constant, so for minimisation it has no effect. On the other hand, it might still be useful to check whether our template has a perfect match (Q approaching zero). Nonetheless, for the last element we do not actually need the matchTemplate function, since it can be calculated directly.
The final pseudocode looks like this:
result = matchTemplate(I^2, W, method=2) - matchTemplate(I, 2*W*T, method=2) + as.scalar(sum(W*T^2))
Does it really do exactly what is defined? Mathematically, yes. Practically, there is some small rounding error, because the matchTemplate function works on 32-bit floating point, but I believe it is not a big problem.
Please note that you can extend this analysis and derive weighted equivalents for any metric offered by matchTemplate.
This actually worked for me. I am sorry I can't give actual code; I am working in R, so I don't have it in Python, but the idea is quite straightforward.
I hope this will help.
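For reference, a rough Python translation of the idea above (not from the original answer, which was in R): it assumes grayscale float32 images and a mask W in [0, 1] with the same size as the template, and it is an untested sketch with placeholder file names.
import cv2
import numpy as np
I = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
T = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
W = cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
# Q(x, y) = sum W*I^2 - sum 2*W*T*I + sum W*T^2, each sum computed via TM_CCORR (method=2)
term1 = cv2.matchTemplate(I * I, W, cv2.TM_CCORR)
term2 = cv2.matchTemplate(I, 2 * W * T, cv2.TM_CCORR)
term3 = np.sum(W * T * T)  # constant term; computed directly, no matchTemplate needed
Q = term1 - term2 + term3
min_val, _, min_loc, _ = cv2.minMaxLoc(Q)  # best match is where Q is smallest
print(min_loc, min_val)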
What worked for me the one time I needed this was to fill the "mask" areas with white noise. Then it gets effectively washed out of the correlation when looking for matches. Otherwise I got, as I presume you did, false matches on the masked areas.
One answer to your question is convolution: use the template as a kernel and filter the image.
The destination Mat will have dense bright areas where your template might be. You'll have to cluster the results (e.g. Mean-shift).
In that way, you'll have a very simplistic implementation of the Generalized Hough Transform or a Template-based convolution matching.
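A minimal sketch of that idea (file names are placeholders; this is plain correlation via filter2D, not a full Generalized Hough Transform, and the clustering step is left out):
import cv2
image = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE).astype('float32')
template = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE).astype('float32')
kernel = template - template.mean()          # zero-mean kernel reduces bias toward bright regions
response = cv2.filter2D(image, -1, kernel)   # filter2D correlates (it does not flip the kernel)
# Bright peaks in `response` are candidate locations; threshold or cluster them (e.g. mean-shift).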
Imagemagick 7.0.3.9 now has a masked compare capability so that you can limit the template matching region. See http://www.imagemagick.org/discourse-server/viewtopic.php?f=4&t=31053
Also, I see that OpenCV 3.0 now has masked template matching. See http://docs.opencv.org/3.0.0/df/dfb/group__imgproc__object.html#ga586ebfb0a7fb604b35a23d85391329be
However, it is only for method == CV_TM_SQDIFF and method == CV_TM_CCORR_NORMED. see python opencv matchTemplate is mask feature implemented?
ImageMagick has logic for finding subimages in other images and it works quite well.
compare -verbose -dissimilarity-threshold 0.1 -subimage-search subimage bigimage
I've used it to find and blur watermarks off some products. Don't ask.
(Sometimes you have to do what you have to do..)
2021 Update: I've been trying to find a solution for transparency in templates throughout the day, and I think I finally found a way to do it. matchTemplate() has a mask parameter, which apparently works exactly like OP wants it to: ignore certain pixels from a template when searching for it in another image. And since my templates already contain transparency in them, I decided to use my template as both a template and mask parameter. Surprisingly, it worked.
I'm using JavaScript with opencv4nodejs, so the following python code snippet might be completely off, but the theory is there and I'm fairly positive it should work.
# Import OpenCV
import cv2 as cv
# Read both the image and the template
image = cv.imread("image.png", cv.IMREAD_COLOR)
template = cv.imread("template.png", cv.IMREAD_COLOR)
# Match with template as both template and mask parameter
result = cv.matchTemplate(image, template, cv.TM_CCORR_NORMED, None, template)
Here's a gist for JavaScript with opencv4nodejs if you're interested.
Now that I think about it, it seems really stupid and way too good to be true, but I've been getting good matches (0.98+) on most tests. Hope this helps!

How to find an image within another image using python

I'm trying to use python to determine if one (small) image is within another (large) image.
Any suggestions before I take myself completely down the wrong path?
/edit: Ok, some ideas: I'm using PIL, and I'm converting each image to the 'P' mode so I can compare each pixel as an integer. I'm trying to implement something like a Boyer–Moore string search or the Knuth–Morris–Pratt algorithm, but in 2 dimensions.
Maybe this will help: instead of searching for ABC in XXXABCXXX (answer=4) we are searching for
ABC
DEF
GHI
in
XXXXX
XABCX
XDEFX
XGHIX
XXXXX
(answer=(2,2))
EDIT: Ok, here is the naive way to do this:
import Image, numpy
def subimg(img1, img2):
    img1 = numpy.asarray(img1)
    img2 = numpy.asarray(img2)
    #img1 = numpy.array([[1,2,3],[4,5,6],[7,8,9]])
    #img2 = numpy.array([[0,0,0,0,0],[0,1,2,3,0],[0,4,5,6,0],[0,7,8,9,0],[0,0,0,0,0]])
    img1y = img1.shape[0]
    img1x = img1.shape[1]
    img2y = img2.shape[0]
    img2x = img2.shape[1]
    stopy = img2y - img1y + 1
    stopx = img2x - img1x + 1
    for x1 in range(0, stopx):
        for y1 in range(0, stopy):
            x2 = x1 + img1x
            y2 = y1 + img1y
            pic = img2[y1:y2, x1:x2]
            test = pic == img1
            if test.all():
                return x1, y1
    return False
small = Image.open('small.tif')
big = Image.open('big.tif')
print subimg(small, big)
It works just fine, but I want to SPEED IT UP. I think the key is in the array 'test' which we might be able to use to skip some positions in the image.
Edit 2: Make sure you use images in a loss-less format to test this.
On a Mac, install Pillow and use from PIL import Image.
Sikuli does it using OpenCV; see here how match_by_template works, and then use the Python OpenCV bindings to do the same. Doing it without OpenCV would be hard; take a look at the OpenCV documentation and search for template matching.
The pyautogui module does the job using the pyautogui.locate(small_image, large_image) method, which returns a 4-integer tuple: (left, top, width, height).
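A minimal usage sketch (the file names are placeholders; depending on the pyautogui/pyscreeze version, a failed search returns None or raises ImageNotFoundException):
import pyautogui
box = pyautogui.locate('small.png', 'big.png')  # Box(left, top, width, height) on success
print(box)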
I know it's a little late, but you can use Boyer-Moore to search for the first line of the small image in each of the lines of the large image. The moment you find a match you have the X and Y position and you just have to check if the remainder of the lines of the smaller image match the remainder of the lines of the larger image starting at position X and Y+1,2,3,... At the first mismatch continue with the search of the first line. I don't think you can get faster than this.
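A hedged sketch of that scheme, using Python's built-in substring search on raw rows as a stand-in for Boyer-Moore (NumPy arrays are assumed and the names are made up):
import numpy as np
def find_subimage(small, big):
    # small, big: 2D uint8 arrays (e.g. numpy.asarray of a PIL 'L'/'P' image)
    h, w = small.shape
    first_row = small[0].tobytes()
    for y in range(big.shape[0] - h + 1):
        row = big[y].tobytes()
        x = row.find(first_row)             # substring search stands in for Boyer-Moore
        while x != -1:
            if np.array_equal(big[y:y + h, x:x + w], small):
                return x, y                 # all remaining rows matched as well
            x = row.find(first_row, x + 1)  # mismatch below: keep scanning this row
    return None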
Have a look at my answer to a similar question for a code example using OpenCV. The conversion from PIL to numpy is straightforward, e.g. just use np.array(pilimage).
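For completeness, the PIL-to-OpenCV route mentioned above might look roughly like this (the grayscale conversion and TM_CCOEFF_NORMED are my own choices, not taken from the linked answer):
import numpy as np
import cv2
from PIL import Image
big = np.array(Image.open('big.tif').convert('L'))
small = np.array(Image.open('small.tif').convert('L'))
res = cv2.matchTemplate(big, small, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(res)
print(max_loc, max_val)  # top-left corner of the best match and its score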
