Reading numbers using PyTesseract - python

I am trying to read numbers from images and cannot find a way to get it to work consistently (not all images have numbers). These are the images:
(here is the link to the album in case the images are not working)
This is the command I'm using to run tesseract on the images: pytesseract.image_to_string(image, timeout=2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789'). I have tried multiple configurations, but this seems to work best.
As far as preprocessing goes, this works the best:
gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
im_bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)[1]  # thresh is a manually tuned threshold value
This works for all images except the 3rd one. To solve the problem of lines in the 3rd image, I tried getting the edges with cv2.Canny and a pretty large threshold, which works, but when drawing them back, even though more than 95% of each number's edges are captured, tesseract does not read them correctly.
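(For illustration, a sketch of that Canny attempt; the actual thresholds aren't given in the question, so 200/400 are assumptions.)
import cv2
# 'gray' is the filtered grayscale image from the preprocessing above
edges = cv2.Canny(gray, 200, 400)  # "pretty large" thresholds (assumed values)
redrawn = cv2.bitwise_not(edges)   # black edges on white, the polarity tesseract prefers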
I have also tried resizing the image, using cv2.morphologyEx, blurring it etc. I cannot find a way to get it to work for each case.
Thank you.

cv2.resize has consistently worked for me with INTER_CUBIC interpolation.
Adding this last step to pre-processing would most likely solve your problem.
im_bw_scaled = cv2.resize(im_bw, (0, 0), fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
You could play around with the scale. I have used '4' above.
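If you want to compare scale factors quickly, here is a small sketch (my addition; the filename and config are taken from the full example below):
import cv2
import pytesseract

config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
img = cv2.imread("5.png", cv2.IMREAD_GRAYSCALE)
for f in (2, 4, 8, 10):
    scaled = cv2.resize(img, (0, 0), fx=f, fy=f, interpolation=cv2.INTER_CUBIC)
    print(f, repr(pytesseract.image_to_string(scaled, config=config)))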
EDIT:
The following code worked very well with your images, even with special characters. Please try it out with the rest of your dataset. Scaling, OTSU thresholding, and erosion were the best combination.
import cv2
import numpy
import pytesseract
pytesseract.pytesseract.tesseract_cmd = "<path to tesseract.exe>"
# Page segmentation mode, PSM was changed to 6 since each page is a single uniform text block.
custom_config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
# load the image as grayscale
img = cv2.imread("5.png",cv2.IMREAD_GRAYSCALE)
# Change all pixels to black, if they aren't white already (since all characters were white)
img[img != 255] = 0
# Scale it 10x
scaled = cv2.resize(img, (0,0), fx=10, fy=10, interpolation = cv2.INTER_CUBIC)
# Retained your bilateral filter
filtered = cv2.bilateralFilter(scaled, 11, 17, 17)
# Thresholded OTSU method
thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Erode the image to bulk it up for tesseract
kernel = numpy.ones((5,5),numpy.uint8)
eroded = cv2.erode(thresh, kernel, iterations = 2)
pre_processed = eroded
# Feed the pre-processed image to tesseract and print the output.
ocr_text = pytesseract.image_to_string(pre_processed, config=custom_config)
if len(ocr_text) != 0:
    print(ocr_text)
else:
    print("No string detected")

Related

Unable to decode Aztec barcode

I'm trying to decode an Aztec barcode using the following script:
import zxing
reader = zxing.BarCodeReader()
barcode = reader.decode("test.png")
print(barcode)
Here is the input image:
Following is the output:
BarCode(raw=None, parsed=None,
path='/Users/dhiwatdg/Desktop/test.png', format=None, type=None,
points=None)
I'm sure it's a valid Aztec barcode, but the script is not able to decode it.
The barcode is not detected because there is a strong "ringing" artifact around the black bars (possibly the result of a defocus issue).
We may apply a binary threshold to get higher contrast between black and white.
Finding the correct threshold is difficult...
To find the correct threshold, we may start with the automatic threshold value returned by cv2.THRESH_OTSU and increase it until the barcode is detected.
(Note that the need to increase [and not decrease] the threshold is specific to the image above.)
Note:
The suggested solution is a bit of an overfit, and it's not very efficient.
Other solutions I tried, like sharpening, did not work...
Code sample:
import cv2
import zxing

reader = zxing.BarCodeReader()

img = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)  # Read the image as grayscale.
thresh, bw = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)  # Apply automatic binary thresholding.

while thresh < 245:
    print(f'thresh = {thresh}')
    cv2.imshow('bw', bw)  # Show bw image for testing
    cv2.waitKey(1000)  # Wait 1 second (for testing)
    cv2.imwrite('test1.png', bw)  # Save bw as input image to the barcode reader.
    barcode = reader.decode("test1.png", try_harder=True, possible_formats=['AZTEC'], pure_barcode=True)  # Try to read the barcode
    if barcode.type is not None:
        break  # Break the loop when the barcode is detected.
    thresh += 10  # Increase the threshold in steps of 10
    thresh, bw = cv2.threshold(img, thresh, 255, cv2.THRESH_BINARY)  # Apply binary threshold
    cv2.imwrite('bw.png', bw)  # Save bw for debugging

cv2.destroyAllWindows()
print(barcode)
Last value of thresh = 164.
Last bw image:
@Rotem's solution works perfectly. I also found the solution below workable: denoising the image before decoding (with cv2.fastNlMeansDenoisingColored, a non-local-means filter rather than a Wiener filter).
import cv2
import zxing
img = cv2.imread("test.png")
dst = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)
cv2.imwrite("test_tmp.png", dst)
reader = zxing.BarCodeReader()
barcode = reader.decode("test_tmp.png")
print(barcode)
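Since a Wiener filter was the stated intent, here is a minimal sketch of an actual Wiener filter applied before decoding; it assumes SciPy is installed, and the 5x5 window size is a guess to tune:
import cv2
import numpy as np
import zxing
from scipy.signal import wiener

img = cv2.imread("test.png", cv2.IMREAD_GRAYSCALE)
filtered = wiener(img.astype(np.float64), mysize=5)  # 5x5 Wiener window (tunable)
cv2.imwrite("test_wiener.png", np.clip(filtered, 0, 255).astype(np.uint8))

reader = zxing.BarCodeReader()
print(reader.decode("test_wiener.png"))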

Python Tesseract OCR isn't properly detecting both numbers

This is the image I'm using. This is the output that was returned with print(pytesseract.image_to_string(img)):
#& [COE] daKizz
6 Grasslands
Q Badlands
#g Swamp
VALUE: # 7,615,; HIRE)
I've tried the config second parameter (psm 0 through 13, and "digits"), but it doesn't make much difference. I'm considering training tesseract to recognize the kind of numbers in an image like that, but I'm hoping there are other options.
I've also tried cleaning up the picture, but I'm getting the same output.
How can I fix this?
I guess you are missing the image processing part.
From the documentation "Improving the quality of the output", you may be interested in the Image Processing section.
One way of reaching the desired numbers is inRange thresholding.
To apply it, we first need to convert the image to HSV colorspace, since we want to select pixels within a colour range.
The result will be:
Now if you read it with "digits" option:
53.8.
53.8.
453.3.
7.615.725.834
One possible question is: why inRange thresholding?
Well, if you try the other methods (simple or adaptive), you will see that inRange is more accurate for the given input image.
Another possible question is: can we use the same cv2.inRange values for other images?
You may not get the desired result. For other images you need to find other range values, for instance with the trackbar sketch below.
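A minimal sketch (my addition, not from the answer) for tuning the inRange bounds interactively with trackbars; adjust the sliders until only the digits remain, then copy the values into cv2.inRange:
import cv2
import numpy as np

img = cv2.imread("vNYcf.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

cv2.namedWindow("mask")
names = ["H_low", "S_low", "V_low", "H_high", "S_high", "V_high"]
inits = [0, 180, 218, 60, 255, 255]   # start from the answer's values
maxes = [179, 255, 255, 179, 255, 255]
for n, v, m in zip(names, inits, maxes):
    cv2.createTrackbar(n, "mask", v, m, lambda x: None)

while True:
    lo = np.array([cv2.getTrackbarPos(n, "mask") for n in names[:3]])
    hi = np.array([cv2.getTrackbarPos(n, "mask") for n in names[3:]])
    cv2.imshow("mask", cv2.inRange(hsv, lo, hi))
    if cv2.waitKey(30) == 27:  # press Esc to quit
        break
cv2.destroyAllWindows()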
Code:
import cv2
import pytesseract
import numpy as np

# Load the image and convert to HSV
bgr_image = cv2.imread("vNYcf.jpg")
hsv_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)

# Keep only the pixels within the chosen HSV range (tuned for this image)
mask = cv2.inRange(hsv_image, np.array([0, 180, 218]), np.array([60, 255, 255]))

# Dilate the mask to thicken the strokes, then invert to black-on-white for tesseract
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dilate = cv2.dilate(mask, kernel, iterations=1)
thresh = 255 - cv2.bitwise_and(dilate, mask)

text = pytesseract.image_to_string(thresh, config="--psm 6 digits")
print(text)

Pytesseract Image to String issue

Does anyone know how I can get these results better?
Total Kills: 15,230,550
Kill Details: (recorded after 2019/10,/Z3]
993,151 331,129
1,330,450 33,265,533
5,031,168
This is what it returns; however, it is meant to match the image posted below. I am new to Python, so are there any parameters I can add to make it read the image better?
img = cv2.imread("kills.jpeg")
text = pytesseract.image_to_string(img)
print(text)
This is my code to read the image. Is there anything I can add to make it read better? Also, the black boxes cover images that were interfering with the reading; I added them to see if the images behind them were causing the issue, but I still get the same output.
The missing piece is the page segmentation mode (psm). You need to set it when you can't get the desired result.
If we look at your image, the only artifacts are the black columns. Other than that, the image looks like a binary image, suitable for tesseract to recognize the characters and the digits.
Let's try reading the image by setting the psm to 6.
6 Assume a single uniform block of text.
print(pytesseract.image_to_string(img, config="--psm 6"))
The result will be:
Total Kills: 75,230,550
Kill Details: (recorded after 2019/10/23)
993,161 331,129
1,380,450 33,265,533
5,031,168
Update
The second way to solve the problem is to get a binary mask and apply OCR to the mask.
Binary-mask
Features of the binary-mask
As we can see, the mask is slightly different from the input image. Now when we apply OCR, the result will be:
Total Kills: 75,230,550
Kill Details: (recorded after 2019/10/23)
993,161 331,129
1,380,450 33,265,533
5,031,168
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("LuKz3.jpg")
# Convert to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the binary mask
msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 154]))
# Extract
krn = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dlt = cv2.dilate(msk, krn, iterations=5)
res = 255 - cv2.bitwise_and(dlt, msk)
# OCR
txt = pytesseract.image_to_string(res, config="--psm 6")
print(txt)
# Display
cv2.imshow("res", res)
cv2.waitKey(0)

I want to increase brightness and contrast of images in a dynamic way so that the program is applicable to any new images

I have a few images where I need to increase or decrease the contrast and brightness dynamically so that the content is clearly visible, and the program needs to work for new images as well. I also want the characters to be dark.
I was able to increase brightness and contrast, but it does not work properly for every image.
import cv2
import numpy as np
img = cv2.imread(r'D:\Bright.png')
image = cv2.GaussianBlur(img, (5, 5), 0)
#image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY)[1]
#kernel = np.ones((2,1),np.uint8)
#dilation = cv2.dilate(img,kernel)
cv2.imshow('test', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
imghsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
imghsv[:,:,2] = [[max(pixel - 25, 0) if pixel < 190 else min(pixel + 25, 255) for pixel in row] for row in imghsv[:,:,2]]
cv2.imshow('contrast', cv2.cvtColor(imghsv, cv2.COLOR_HSV2BGR))
#cv2.imwrite('D:\\112.png',cv2.cvtColor(imghsv, cv2.COLOR_HSV2BGR))
cv2.waitKey(0)
cv2.destroyAllWindows()
#raw_input()
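(The per-pixel list comprehension above is very slow on large images; a vectorized NumPy equivalent with the same thresholds and offsets would be:)
import numpy as np

v = imghsv[:, :, 2].astype(np.int16)  # avoid uint8 wrap-around
v = np.where(v < 190, np.maximum(v - 25, 0), np.minimum(v + 25, 255))
imghsv[:, :, 2] = v.astype(np.uint8)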
I want a program which works for every image, with the words a little darker so that they are easily visible.
As Tilarion suggested, you could try "Auto Brightness And Contrast" to see if it works well. The theory behind this is explained well in the solution section here. The solution is in C++; I've written a Python version which you can use directly. It works on one channel at a time for colour images:
def auto_brightandcontrast(input_img, channel, clip_percent=1):
    histSize = 256  # must match the number of histogram bins below

    if clip_percent == 0:
        return input_img

    # Cumulative histogram of the selected channel
    hist = cv2.calcHist([input_img], [channel], None, [histSize], [0, 256])
    accumulator = [hist[0]]
    for i in range(1, histSize):
        accumulator.append(accumulator[i - 1] + hist[i])

    maxx = accumulator[histSize - 1]
    clip_amount = clip_percent * (maxx / 100.0) / 2.0

    # Find the lower gray level that clips clip_percent/2 of the pixels
    minGray = 0
    while accumulator[minGray] < clip_amount[0]:
        minGray += 1

    # Find the upper gray level that clips clip_percent/2 of the pixels
    maxGray = histSize - 1
    while accumulator[maxGray] >= (maxx - clip_amount[0]):
        maxGray -= 1

    # Linearly stretch [minGray, maxGray] to [0, histSize-1]
    inputRange = maxGray - minGray
    alpha = (histSize - 1) / inputRange
    beta = -minGray * alpha

    return cv2.convertScaleAbs(input_img, alpha=alpha, beta=beta)
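A quick usage sketch (my addition; the input filename is hypothetical, channel 0 for a grayscale image):
import cv2

img = cv2.imread("Bright.png", cv2.IMREAD_GRAYSCALE)
out = auto_brightandcontrast(img, 0, clip_percent=1)
cv2.imwrite("Bright_auto.png", out)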
It takes only a few lines of code in Python Wand (which is based upon ImageMagick). Here is a script:
#!/bin/python3.7
from wand.image import Image

with Image(filename='task4.jpg') as img:
    img.contrast_stretch(black_point=0.02, white_point=0.99)
    img.save(filename='task4_stretch2_99.jpg')
Input:
Result:
Increase the black point value to make the text darker and/or decrease the white point value to make the lighter parts brighter.
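For example (illustrative values, not from the answer):
from wand.image import Image

with Image(filename='task4.jpg') as img:
    # higher black point -> darker text; lower white point -> brighter light areas
    img.contrast_stretch(black_point=0.05, white_point=0.95)
    img.save(filename='task4_stretch5_95.jpg')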
Thanks to Eric McConville (the Wand developer) for correcting my arguments to make the code work.

How to implement imbinarize in OpenCV

I developed a script in Matlab which analyses engraved text on coloured steel. I'm using a range of morphological techniques to extract the text and read it with OCR. I need to implement it on a Raspberry Pi, so I decided to port my Matlab code to OpenCV (in Python). I managed to transfer some methods, and they work similarly, but how do I implement imreconstruct and imbinarize (shown below) in OpenCV? (The challenge here is appropriately differentiating foreground and background.)
Maybe I should try grabCut, getStructuringElement, morphologyEx, or dilate? I tried them in a range of combinations but have not found a working solution.
I will put the whole script for both; if anyone could give me suggestions on how to generally improve this extraction and the accuracy of the OCR process, I would greatly appreciate it.
Based on the bin values of the grey-scale image, I change some parameters in those functions:
Matlab:
se = strel('disk', 300);
img = imtophat(img, se);
maker = imerode(img, strel('line',100,0)); %for whiter ones
maker = imerode(img, strel('line',85,0)); %for medium
maker = imerode(img, strel('line',5,0));
imgClear = imreconstruct(maker, img);
imgBlur = imgaussfilt(imgClear,1); %less blur for whiter frames
BW = imbinarize(imgBlur,'adaptive','ForegroundPolarity','Bright',...
'Sensitivity',0.7); %process for medium
BW = imbinarize(imgBlur, 'adaptive', 'ForegroundPolarity',...
'Dark', 'Sensitivity', 0.4); % process for black and white
res = ocr(BW, 'CharacterSet', '0123456789', 'TextLayout', 'Block');
res.Text;
OpenCV:
import cv2
import numpy
import pytesseract
from PIL import Image

img = cv2.imread('image2.png', cv2.IMREAD_GRAYSCALE)  # load the input as grayscale
kernel = numpy.ones((5,5), numpy.uint8)
blur = cv2.GaussianBlur(img, (5,5), 0)
erosion = cv2.erode(blur, kernel, iterations=1)
opening = cv2.morphologyEx(erosion, cv2.MORPH_OPEN, kernel)
#bremove = cv2.grabCut(opening, mask, rect, bgdModel, fgdModel, mode=cv2.GC_INIT_WITH_MASK)
#th3 = cv2.adaptiveThreshold(opening, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
ret, thresh = cv2.threshold(opening, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
ocr = pytesseract.image_to_string(Image.open('image2.png'), config='-c tessedit_char_whitelist=0123456789')
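Note that the OpenCV attempt above has no counterpart to imreconstruct. OpenCV has no built-in equivalent, but morphological reconstruction by dilation can be sketched as iterative geodesic dilation (the 3x3 connectivity kernel is an assumption):
import cv2
import numpy as np

def reconstruct_by_dilation(marker, mask, kernel=None):
    # Repeatedly dilate the marker, clipping to the mask, until convergence.
    if kernel is None:
        kernel = np.ones((3, 3), np.uint8)  # 8-connectivity
    while True:
        expanded = cv2.min(cv2.dilate(marker, kernel), mask)
        if (expanded == marker).all():
            return expanded
        marker = expanded
Called as reconstruct_by_dilation(maker, img), this would mirror Matlab's imgClear = imreconstruct(maker, img).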
Here is the input image:
I am surprised at how much difference there is between Matlab and OpenCV when they both appear to use the same algorithm. Why do you run imbinarize twice? What does the sensitivity keyword actually do (mathematically, behind the scenes)? They obviously do several steps more than just bare OTSU.
import cv2
import numpy as np
import matplotlib.pyplot as plt

def show(img):
    plt.imshow(img, cmap="gray")
    plt.show()

img = cv2.imread("letters.jpg", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3,3), np.uint8)
blur = cv2.GaussianBlur(img, (3,3), 0)
erosion = cv2.erode(blur, kernel, iterations=3)
opening = cv2.dilate(erosion, kernel)
th3 = cv2.adaptiveThreshold(opening, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 45, 2)
show(th3)
kernel2 = cv2.getGaussianKernel(6, 2)  # np.ones((6,6))
kernel2 = np.outer(kernel2, kernel2)
th3 = cv2.dilate(th3, kernel2)
th3 = cv2.erode(th3, kernel)
show(th3)
The images that get displayed are:
After a bit of cleaning up:
So all in all it's not the same, and certainly not as nice as Matlab. But the basic principle seems the same; the numbers just need tweaking.
A better approach would probably be to threshold by the mean of the image and then use the output of that as a mask to adaptively threshold the original image. Hopefully the results would then be better than both OpenCV and Matlab.
Try doing it with ADAPTIVE_THRESH_MEAN_C; you can get some really nice results, but there is more trash lying around. Again, if you can use it as a mask to isolate the text and then threshold again, it might turn out better, as in the sketch below. Also, the shape of the erosion and dilation kernels will make a big difference here.
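A minimal sketch of that mask idea (my own; the block size 45 and constant 2 are carried over from the code above):
import cv2
import numpy as np

img = cv2.imread("letters.jpg", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(img, (3, 3), 0)

# Mean-based adaptive threshold, used as a rough text mask
mask = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                             cv2.THRESH_BINARY, 45, 2)

# Isolate the text region with the mask, then threshold again
isolated = cv2.bitwise_and(img, img, mask=mask)
th = cv2.adaptiveThreshold(isolated, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv2.THRESH_BINARY, 45, 2)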
I worked out the code to have a positive result based on your engraved text sample.
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def show(img):
    plt.imshow(img, cmap="gray")
    plt.show()

# load the input image
img = cv2.imread('./imagesStackoverflow/engraved_text.jpg', 0)
show(img)
ret, mask = cv2.threshold(img, 60, 120, cv2.THRESH_BINARY)  # tune 60, 120 for the best OCR results
kernel = np.ones((5,3), np.uint8)
mask = cv2.erode(mask, kernel, iterations=1)
show(mask)
# I used a version of OpenCV with Tesseract; you may use pytesseract and set the modes as:
# OCR Engine Mode (OEM) = 3 (default = 3)
# Page Segmentation Mode (PSM) = 11 (default = 3)
tesser = cv2.text.OCRTesseract_create('C:/Program Files/Tesseract 4.0.0/tessdata/', 'eng', '0123456789', 11, 3)
retval = tesser.run(mask, 0)  # returns a string
print('OCR:' + retval)
Processed image and OCR output:
It would be great if you could feed back your test results with more sample images.
What I can see from your code is that you used tophat filtering in your Matlab code as the first step; however, I couldn't see the same in your Python OpenCV code.
OpenCV has a built-in tophat filter; try applying it to get a similar result:
kernel = np.ones((5,5),np.uint8)
tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
Also, try using CLAHE; it gives better contrast to your image. Then apply blackhat to filter out small details:
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
cl1 = clahe.apply(img)
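The blackhat step isn't shown above; a minimal sketch (the input file and 5x5 kernel are assumptions):
import cv2
import numpy as np

img = cv2.imread("letters.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
cl1 = clahe.apply(img)
kernel = np.ones((5,5), np.uint8)
blackhat = cv2.morphologyEx(cl1, cv2.MORPH_BLACKHAT, kernel)  # small dark details relative to the background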
I have got better results by applying these transformations.
I tried the code below; it works to recognize the lighter engraved text sample. Hope it helps.
import cv2
import numpy as np
import matplotlib.pyplot as plt

def show(img):
    plt.imshow(img, cmap="gray")
    plt.show()

# load the input image
img = cv2.imread('./imagesStackoverflow/engraved_text2.jpg', 0)
show(img)
# apply CLAHE to adjust the contrast
clahe = cv2.createCLAHE(clipLimit=5.1, tileGridSize=(5,3))
cl1 = clahe.apply(img)
img = cl1.copy()
show(img)
img = cv2.GaussianBlur(img, (3,3), 1)
ret, mask = cv2.threshold(img, 125, 150, cv2.THRESH_BINARY)  # tune 125, 150 for the best OCR results
kernel = np.ones((5,3), np.uint8)
mask = cv2.erode(mask, kernel, iterations=1)
show(mask)
# I used a version of OpenCV with Tesseract; you may use pytesseract and set the modes as:
# Page Segmentation Mode (PSM) = 11 (default = 3)
# OCR Engine Mode (OEM) = 3 (default = 3)
tesser = cv2.text.OCRTesseract_create('C:/Program Files/Tesseract 4.0.0/tessdata/', 'eng', '0123456789', 11, 3)
retval = tesser.run(mask, 0)  # returns a string
print('OCR:' + retval)
