Python Tesseract OCR isn't properly detecting both numbers

This is the image I'm using. This is the output that was returned with print(pytesseract.image_to_string(img)):
#& [COE] daKizz
6 Grasslands
Q Badlands
#g Swamp
VALUE: # 7,615,; HIRE)
I've tried using the config second parameter (psm 0 through 13, and "digits"), but it doesn't make much difference. I'm considering looking into training tesseract to recognize the type of numbers in an image like that, but I'm hoping there are other options.
I've also tried cleaning up the picture, but I get the same output.
How can I fix this?

I guess you are missing the image processing part.
From the documentation "Improving the quality of the output", you may be interested in the Image Processing section.
One way of reaching the desired numbers is inRange thresholding.
To apply in-range thresholding, we first need to convert the image to the HSV color space, since we want to keep only the pixels that fall within a specific color range.
The result will be:
Now if you read it with "digits" option:
53.8.
53.8.
453.3.
7.615.725.834
One possible question is: Why inRange threshold?
Well, if you try the other methods (simple or adaptive thresholding), you will see that inRange is more accurate for this input image.
Another possible question is: Can we use the values of the cv2.inRange method for other images?
You may not get the desired result. For other images you need to find different range values; see the interactive sketch after the code below.
Code:
import cv2
import pytesseract
import numpy as np

# Load the image and convert it to the HSV color space
bgr_image = cv2.imread("vNYcf.jpg")
hsv_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)

# Binary mask of the bright text pixels
mask = cv2.inRange(hsv_image, np.array([0, 180, 218]), np.array([60, 255, 255]))

# Dilate the mask, AND it back with the original mask, and invert
# so the text ends up dark on a white background for tesseract
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dilate = cv2.dilate(mask, kernel, iterations=1)
thresh = 255 - cv2.bitwise_and(dilate, mask)

# OCR, assuming a single uniform block of text and digit characters
text = pytesseract.image_to_string(thresh, config="--psm 6 digits")
print(text)
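If you need to find range values for a different image, a rough interactive sketch with OpenCV trackbars may help. This is only a helper suggestion, not part of the answer above; it assumes a desktop environment with GUI support, and the file name is a placeholder.
import cv2
import numpy as np

def nothing(_):
    pass

bgr = cv2.imread("your_image.jpg")  # placeholder file name
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

cv2.namedWindow("mask")
for name, maxval in [("H_low", 179), ("S_low", 255), ("V_low", 255),
                     ("H_high", 179), ("S_high", 255), ("V_high", 255)]:
    cv2.createTrackbar(name, "mask", 0 if "low" in name else maxval, maxval, nothing)

while True:
    lo = np.array([cv2.getTrackbarPos(n, "mask") for n in ("H_low", "S_low", "V_low")])
    hi = np.array([cv2.getTrackbarPos(n, "mask") for n in ("H_high", "S_high", "V_high")])
    cv2.imshow("mask", cv2.inRange(hsv, lo, hi))
    if cv2.waitKey(30) & 0xFF == 27:  # press Esc to quit
        break
cv2.destroyAllWindows()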

Related

How to improve python/tesseract Image to Text accuracy?

How can I grab an image from a region and properly use tesseract to translate it to text? This is what I currently have:
img = ImageGrab.grab(bbox =(1341,182, 1778, 213))
tesstr = pytesseract.image_to_string(np.array(img), lang ='eng')
print (tesstr)
The issue is that it translates it incredibly wrong because the region it's getting the text from is red text on a blue background. How can I improve its accuracy? Here is an example of what it's trying to turn from image to text:
You should be familiar with the documentation page Improving the quality of the output. Try each of the suggested methods listed there. If you still can't achieve the desired result, look at these other methods:
Thresholding Operations using inRange
Changing Colorspaces
Image segmentation
To get the desired result, you need the binary mask of the image. Neither simple thresholding nor adaptive thresholding will work for this input image.
To get the binary mask:
Up-sample and convert the input image to the HSV color space.
Set lower and higher color boundaries.
Result:
The OCR output for pytesseract version 0.3.7 will be:
Day 20204, 16:03:12: Your ‘Metal Triangle Foundation’
was destroved!
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("b.png")
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the binary mask
msk = cv2.inRange(hsv, np.array([0, 0, 123]), np.array([179, 255, 255]))
# OCR
txt = pytesseract.image_to_string(msk)
print(txt)
# Display
cv2.imshow("msk", msk)
cv2.waitKey(0)
There is an option in the Tesseract API that lets you increase the DPI at which the image is examined to detect text. The higher the DPI, the higher the precision, until diminishing returns set in, and more processing power is required. The DPI should not exceed the DPI of the original image.
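For example, with pytesseract the DPI can be passed through the config string. A minimal sketch, assuming the same image as above and an arbitrary value of 300:
import cv2
import pytesseract

img = cv2.imread("b.png")
# Tell tesseract to treat the input as a 300 DPI image (300 is an assumed value)
txt = pytesseract.image_to_string(img, config="--dpi 300 --psm 6")
print(txt)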

Pytesseract Image to String issue

Does anyone know how I can get these results better?
Total Kills: 15,230,550
Kill Details: (recorded after 2019/10,/Z3]
993,151 331,129
1,330,450 33,265,533
5,031,168
This is what it returns; however, it is meant to be the same as the image posted below. I am new to Python, so are there any parameters I can add to make it read the image better?
img = cv2.imread("kills.jpeg")
text = pytesseract.image_to_string(img)
print(text)
This is my code to read the image. Is there anything I can add to make it read better? Also, the black boxes cover images that were interfering with the reading; I added the 2 black boxes to see if the images behind them were causing the issue, but I still get the same result.
The missing knowledge is the page segmentation mode (psm). You need to use it when you can't get the desired result with the defaults.
If we look at your image, the only artifacts are the black columns. Other than that, the image looks like a binary image. Suitable for tesseract to recognize the characters and the digits.
Let's try reading the image by setting the psm to 6.
6 Assume a single uniform block of text.
print(pytesseract.image_to_string(img, config="--psm 6"))
The result will be:
Total Kills: 75,230,550
Kill Details: (recorded after 2019/10/23)
993,161 331,129
1,380,450 33,265,533
5,031,168
Update
The second way to solve the problem is getting the binary mask and applying OCR to the mask features.
Binary-mask
Features of the binary-mask
As we can see, the result is slightly different from the input image. Now when we apply OCR, the result will be:
Total Kills: 75,230,550
Kill Details: (recorded after 2019/10/23)
993,161 331,129
1,380,450 33,265,533
5,031,168
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("LuKz3.jpg")
# Convert to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the binary mask
msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 154]))
# Dilate the mask, AND it back with the original mask, and invert for OCR
krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dlt = cv2.dilate(msk, krn, iterations=5)
res = 255 - cv2.bitwise_and(dlt, msk)
# OCR
txt = pytesseract.image_to_string(res, config="--psm 6")
print(txt)
# Display
cv2.imshow("res", res)
cv2.waitKey(0)

Reading numbers using PyTesseract

I am trying to read numbers from images and cannot find a way to get it to work consistently (not all images have numbers). These are the images:
(here is the link to the album in case the images are not working)
This is the command I'm using to run tesseract on the images: pytesseract.image_to_string(image, timeout=2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789'). I have tried multiple configurations, but this seems to work best.
As far as preprocessing goes, this works the best:
gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
# thresh is a manually chosen threshold value (not shown in the snippet)
im_bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)[1]
This works for all images except the 3rd one. To solve the problem of lines in the 3rd image, I tried getting the edges with cv2.Canny and a pretty large threshold, which works, but when drawing them back, even though it captures more than 95% of each number's edges, tesseract does not read them correctly.
I have also tried resizing the image, using cv2.morphologyEx, blurring it, etc. I cannot find a way to get it to work in every case.
Thank you.
cv2.resize has consistently worked for me with INTER_CUBIC interpolation.
Adding this last step to pre-processing would most likely solve your problem.
im_bw_scaled = cv2.resize(im_bw, (0, 0), fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
You could play around with the scale. I have used '4' above.
EDIT:
The following code worked very well with your images, even with special characters. Please try it out with the rest of your dataset. Scaling, Otsu thresholding, and erosion were the best combination.
import cv2
import numpy
import pytesseract
pytesseract.pytesseract.tesseract_cmd = "<path to tesseract.exe>"
# Page segmentation mode, PSM was changed to 6 since each page is a single uniform text block.
custom_config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
# load the image as grayscale
img = cv2.imread("5.png",cv2.IMREAD_GRAYSCALE)
# Change all pixels to black, if they aren't white already (since all characters were white)
img[img != 255] = 0
# Scale it 10x
scaled = cv2.resize(img, (0,0), fx=10, fy=10, interpolation = cv2.INTER_CUBIC)
# Retained your bilateral filter
filtered = cv2.bilateralFilter(scaled, 11, 17, 17)
# Thresholded OTSU method
thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Erode the image to bulk it up for tesseract
kernel = numpy.ones((5,5),numpy.uint8)
eroded = cv2.erode(thresh, kernel, iterations = 2)
pre_processed = eroded
# Feed the pre-processed image to tesseract and print the output.
ocr_text = pytesseract.image_to_string(pre_processed, config=custom_config)
if len(ocr_text) != 0:
    print(ocr_text)
else:
    print("No string detected")

Proper image thresholding to prepare it for OCR in python using opencv

I am really new to OpenCV and a beginner at Python.
I have this image:
I want to somehow apply proper thresholding to keep nothing but the 6 digits.
The bigger picture is that I intend to try to perform manual OCR to the image for each digit separately, using the k-nearest neighbours algorithm on a per digit level (kNearest.findNearest)
The problem is that I cannot clean up the digits sufficiently, especially the '7' digit which has this blue-ish watermark passing through it.
The steps I have tried so far are the following:
I am reading the image from disk
# IMREAD_UNCHANGED is -1
image = cv2.imread(sys.argv[1], cv2.IMREAD_UNCHANGED)
Then I'm keeping only the blue channel to get rid of the blue watermark around digit '7', effectively converting it to a single channel image
image = image[:,:,0]
# opened with -1, which means "as is",
# so the blue channel is the first in BGR
Then I'm multiplying it a bit to increase contrast between the digits and the background:
image = cv2.multiply(image, 1.5)
Finally I perform Binary+Otsu thresholding:
_, threshed1 = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
As you can see, the end result is pretty good except for the digit '7', which has kept a lot of noise.
How can I improve the end result? Please supply an example result image where possible; it is easier to understand than code snippets alone.
You can try applying medianBlur to the gray image with different kernel sizes (such as 3 and 51), divide the two blurred results, and threshold the quotient. Something like this:
#!/usr/bin/python3
# 2018/09/23 17:29 (CST)
# (中秋节快乐)
# (Happy Mid-Autumn Festival)
import cv2
import numpy as np
fname = "color.png"
bgray = cv2.imread(fname)[...,0]
blured1 = cv2.medianBlur(bgray,3)
blured2 = cv2.medianBlur(bgray,51)
divided = np.ma.divide(blured1, blured2).data
normed = np.uint8(255*divided/divided.max())
th, threshed = cv2.threshold(normed, 100, 255, cv2.THRESH_OTSU)
dst = np.vstack((bgray, blured1, blured2, normed, threshed))
cv2.imwrite("dst.png", dst)
The result:
Why not just keep values in the image that are above a certain threshold?
Like this:
import cv2
import numpy as np
img = cv2.imread("./a.png")[:,:,0] # the last readable image
new_img = []
for line in img:
    # Map every pixel below 100 to 0 and everything else to 255
    new_img.append(np.array(list(map(lambda x: 0 if x < 100 else 255, line))))
new_img = np.array(new_img, dtype=np.uint8)
cv2.imwrite("./b.png", new_img)
Looks great:
You could probably play with the threshold even more and get better results.
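The same idea can also be written in a vectorized form; a small sketch assuming the same ./a.png input and a cutoff of 100:
import cv2
import numpy as np

img = cv2.imread("./a.png")[:, :, 0]  # blue channel, as above
# Everything below the cutoff becomes black, the rest becomes white
new_img = np.where(img < 100, 0, 255).astype(np.uint8)
cv2.imwrite("./b.png", new_img)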
It doesn't seem easy to completely remove the annoying stamp.
What you can do is flatten the background intensity by:
computing a lowpass image (Gaussian filter or morphological closing); the filter size should be a little larger than the character size;
dividing the original image by the lowpass image.
Then you can use Otsu thresholding.
As you see, the result isn't perfect.
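A minimal sketch of this background-flattening idea in Python/OpenCV; the file name, the 51-pixel filter size, and the use of the blue channel are my assumptions, not the answerer's exact code:
import cv2
import numpy as np

# Blue channel of the input, as in the question (assumed file name)
gray = cv2.imread("digits.png")[:, :, 0].astype(np.float32)

# Lowpass image: a Gaussian blur whose kernel is a bit larger than the characters
background = cv2.GaussianBlur(gray, (51, 51), 0)

# Divide the original by the background to flatten the illumination
flat = gray / (background + 1e-6)
flat = cv2.normalize(flat, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# Otsu thresholding on the flattened image
_, binary = cv2.threshold(flat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("flattened_binary.png", binary)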
I tried a slightly different approach than Yves on the blue channel:
Apply a median filter (r = 2).
Use edge detection (e.g. the Sobel operator).
Apply automatic thresholding (Otsu).
Apply a morphological closing.
This approach seems to make the output a little less noisy. However, one has to address the holes in the numbers. This can be done by detecting black contours which are completely surrounded by white pixels and simply filling them with white.
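A rough sketch of this pipeline in Python/OpenCV; the kernel sizes, the flood-fill based hole filling, and the file name are assumptions, not the answerer's exact code:
import cv2
import numpy as np

# Blue channel of the input image (assumed file name)
blue = cv2.imread("digits.png")[:, :, 0]

# 1. Median filter (radius 2 -> 5x5 kernel)
med = cv2.medianBlur(blue, 5)

# 2. Edge detection with the Sobel operator (gradient magnitude)
gx = cv2.Sobel(med, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(med, cv2.CV_32F, 0, 1, ksize=3)
mag = cv2.magnitude(gx, gy)
mag = np.uint8(255 * mag / (mag.max() + 1e-6))

# 3. Automatic (Otsu) thresholding
_, edges = cv2.threshold(mag, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 4. Morphological closing to join broken edges
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

# Fill holes: flood-fill the background from a corner, invert, OR with the closed image
flood = closed.copy()
h, w = closed.shape
mask = np.zeros((h + 2, w + 2), np.uint8)
cv2.floodFill(flood, mask, (0, 0), 255)
filled = closed | cv2.bitwise_not(flood)

cv2.imwrite("digits_binary.png", filled)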

Remove features from binarized image

I wrote a little script to transform pictures of chalkboards into a form that I can print off and mark up.
I take an image like this:
Auto-crop it, and binarize it. Here's the output of the script:
I would like to remove the largest connected black regions from the image. Is there a simple way to do this?
I was thinking of eroding the image to eliminate the text and then subtracting the eroded image from the original binarized image, but I can't help thinking that there's a more appropriate method.
Sure, you can just get the connected components (of a certain size) with findContours or floodFill and erase them, leaving some smear. However, if you want to do it right, you should think about why you have the black area in the first place.
You did not use adaptive (locally adaptive) thresholding, and this made your output sensitive to shading. Try to avoid getting the black region in the first place by running something like this:
Mat img = imread("desk.jpg", 0);
Mat img2, dst;
pyrDown(img, img2);
adaptiveThreshold(255 - img2, dst, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 9, 10);
imwrite("adaptiveT.png", dst);
imshow("dst", dst);
waitKey(-1);
In the future, you may want to read about adaptive thresholds and how to sample colors locally. I personally found it useful to sample binary colors orthogonally to the image gradient (that is, on both sides of it). This way the samples of white and black are of equal size, which is a big deal since typically there is more background color, which biases the estimation. Using SWT and MSER may give you even more ideas about text segmentation.
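If you do want the first option (erasing large connected black regions directly), a minimal Python sketch with cv2.connectedComponentsWithStats could look like this; the file name and the area cutoff are assumptions:
import cv2
import numpy as np

# Binarized board: black text and blobs on a white background (assumed file name)
bw = cv2.imread("binarized.png", cv2.IMREAD_GRAYSCALE)

# Connected components are computed on non-zero pixels, so invert first
inv = cv2.bitwise_not(bw)
num, labels, stats, _ = cv2.connectedComponentsWithStats(inv, connectivity=8)

cleaned = bw.copy()
max_area = 5000  # assumed cutoff: larger components are treated as shading, not text
for i in range(1, num):  # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] > max_area:
        cleaned[labels == i] = 255  # erase the large black region

cv2.imwrite("cleaned.png", cleaned)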
I tried this:
import numpy as np
import cv2

im = cv2.imread('image.png')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
# Start from a white canvas; small text-like contours get copied onto it
grayout = 255 * np.ones((im.shape[0], im.shape[1], 1), np.uint8)
blur = cv2.GaussianBlur(gray, (5, 5), 1)
thresh = cv2.adaptiveThreshold(blur, 255, 1, 1, 11, 2)

# Note: in OpenCV 3.x findContours returns three values; unpack accordingly
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

wcnt = 0
for item in contours:
    area = cv2.contourArea(item)
    print(wcnt, area)
    [x, y, w, h] = cv2.boundingRect(item)
    # Keep only small contours whose bounding box is not too dense with black pixels
    if 10 < area < 200:
        cntd = 0
        for i in range(x, x + w):
            for j in range(y, y + h):
                if gray[j, i] == 0:
                    cntd = cntd + 1
        density = cntd / float(h * w)
        if density < 0.5:
            for i in range(x, x + w):
                for j in range(y, y + h):
                    grayout[j, i] = gray[j, i]
            wcnt = wcnt + 1

cv2.imwrite('result.png', grayout)
You have to balance two things: removing the black spots while not losing the content of what is on the board. The output I got is this:
Here is a Python/numpy implementation (using my own mahotas package) of the method from the top answer (almost the same, I think):
import mahotas as mh
import numpy as np
Import mahotas & numpy with standard abbreviations.
im = mh.imread('7Esco.jpg', as_grey=1)
Load the image & convert to gray
im2 = im[::2,::2]
im2 = mh.gaussian_filter(im2, 1.4)
Downsample and blur (for speed and noise removal).
im2 = 255 - im2
Invert the image
mean_filtered = mh.convolve(im2.astype(float), np.ones((9,9))/81.)
Mean filtering is implemented "by hand" with a convolution.
imc = im2 > mean_filtered - 4
You might need to adjust the number 4 here, but it worked well for this image.
mh.imsave('binarized.png', (imc*255).astype(np.uint8))
Convert to 8 bits and save in PNG format.
