I have these two images:
The first one clearly has a higher quality than the second one (even though the second isn't all that bad). I process the two images with OpenCV in order to read the text with Tesseract, like this:
import cv2
import pytesseract
# scr_crop is the cropped screenshot obtained earlier (not shown here)
img = cv2.cvtColor(scr_crop, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(img, 220, 255, cv2.THRESH_BINARY)[1]
# Create custom kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
# Perform closing (dilation followed by erosion)
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# Invert image to use for Tesseract
result = 255 - close
# result = cv2.resize(result, (0, 0), fx=2, fy=2)
text = pytesseract.image_to_string(result, lang="ita")
So I first perform a dilation and then an erosion on the grayscaled versions of the two images, obtaining these two results:
As you can see, for the first image I obtain a great result and Tesseract is able to read the text, while for the second image I obtain a bad result and Tesseract is not able to read the text. How can I improve the quality of the second image in order to obtain a better result from Tesseract?
For the second image, just apply thresholding with a different threshold type.
Instead of cv2.THRESH_BINARY, use cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU.
Image will become:
and if you read:
txt = pytesseract.image_to_string(threshold)
print(txt)
Result will be:
Esiti Positivi: 57
Esiti Negativi: 1512
Numerosita: 1569
Tasso di Conversione: 3.63%
Now, what do cv2.THRESH_BINARY_INV and cv2.THRESH_OTSU mean?
cv2.THRESH_BINARY_INV is the opposite operation of cv2.THRESH_BINARY: if the current pixel value is greater than the threshold, it is set to 0; otherwise it is set to maxval (255 in our case).
source
cv2.THRESH_OTSU finds the optimal threshold value using Otsu's algorithm. [page 3]
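To see the difference, here is a tiny sketch of mine (not part of the original answer) applying both flags to a toy 1x4 grayscale array; the pixel values are made up:
import cv2
import numpy as np
px = np.array([[10, 100, 200, 250]], dtype=np.uint8)
# THRESH_BINARY: pixel > thresh -> maxval, else 0
print(cv2.threshold(px, 127, 255, cv2.THRESH_BINARY)[1])      # [[  0   0 255 255]]
# THRESH_BINARY_INV: pixel > thresh -> 0, else maxval
print(cv2.threshold(px, 127, 255, cv2.THRESH_BINARY_INV)[1])  # [[255 255   0   0]]
# Adding THRESH_OTSU makes OpenCV ignore the passed threshold (127 here) and
# compute one from the histogram; the chosen value is the first return value.
otsu_t, _ = cv2.threshold(px, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
print(otsu_t)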
Code:
import cv2
import pytesseract
img = cv2.imread("c7xq9.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
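# note: with THRESH_OTSU set, the fixed threshold (220) below is ignored;
# Otsu's algorithm computes the threshold from the image histogram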
thr = cv2.threshold(gry, 220, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)
I have been working on a project which involves extracting text from an image. I have researched that Tesseract is one of the best libraries available, and I decided to use it along with OpenCV. OpenCV is needed for image manipulation.
I have been playing a lot with the Tesseract engine, and it does not seem to give me the expected results. I have attached the image as a reference. The output I got is:
1] =501 [
Instead, expected output is
TM10-50%L
What I have done so far:
Removing noise
Applying adaptive thresholding
Sending it to the Tesseract OCR engine
Are there any other suggestions to improve the algorithm?
Thanks in advance.
Snippet of the code:
import cv2
import sys
import pytesseract
import numpy as np
from PIL import Image

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python ocr_simple.py image.jpg')
        sys.exit(1)
    # Read image path from command line
    imPath = sys.argv[1]
    gray = cv2.imread(imPath, 0)
    # Blur
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    # Binarizing
    thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 5, 3)
    text = pytesseract.image_to_string(thresh)
    print(text)
Images attached.
The first image is the original image. Original image
The second image is what has been fed to Tesseract. Input to tesseract
Before performing OCR on an image, it's important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. For this specific image, we need to obtain the ROI before we can OCR.
To do this, we can convert to grayscale, apply a slight Gaussian blur, then adaptive threshold to obtain a binary image. From here, we can apply morphological closing to merge individual letters together. Next we find contours, filter using contour area filtering, and then extract the ROI. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
Detected ROI
Extracted ROI
Result from Pytesseract OCR
TM10=50%L
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale, Gaussian blur, Adaptive threshold
image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5)
# Perform morph close to merge letters together
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)
# Find contours, contour area filtering, extract ROI
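# the [-2:] slice keeps this call compatible with both OpenCV 3, which returns
# (image, contours, hierarchy), and OpenCV 4, which returns (contours, hierarchy)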
cnts, _ = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 1800 and area < 2500:
        x, y, w, h = cv2.boundingRect(c)
        ROI = original[y:y+h, x:x+w]
        cv2.rectangle(image, (x, y), (x + w, y + h), (36, 255, 12), 3)

# Perform text extraction
ROI = cv2.GaussianBlur(ROI, (3, 3), 0)
data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6')
print(data)
cv2.imshow('ROI', ROI)
cv2.imshow('close', close)
cv2.imshow('image', image)
cv2.waitKey()
I am using pytesseract to read numbers from the screen in real time.
The image contains mostly numbers, dots, and two letters (M and R), as below.
In real time the numbers keep changing, but the letters M and R stay in the same place.
The background is always green, with black letters.
As you can see, the numbers in the image are very clear, but pytesseract's reading of them is not really satisfying. Sometimes it reads a 7 as a 1.
I would like to find algorithms that help improve the OCR result.
Currently I am using Pillow to convert the image to grayscale, and I have also tried resizing the image bigger or smaller, but that still doesn't improve the result much. I also applied a filter on the image as below, but the result is still not 100% correct.
import cv2
import pytesseract as tess

scale_factor = 2  # example value; the original snippet left scale_factor undefined
img = cv2.imread('screenshot.png')
img = cv2.resize(img, None, fx=scale_factor, fy=scale_factor, interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
text = tess.image_to_string(img)
Please suggest any algorithms that will help improve this OCR result.
You can easily detect the text by applying simple thresholding.
Threshold
Result
3845.86 M51.31 M 309.12 3860.43 R191.90 R23.44
Thresholding will show the features of the image.
Code:
import cv2
import pytesseract
img = cv2.imread("UEWHj.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)
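Since the question notes the background is always green with black letters, another route is to isolate the text with a color mask instead of grayscale thresholding. This is a sketch of mine, not part of the answer above, and the HSV bounds are rough guesses that would need tuning for the actual screenshot:
import cv2
import numpy as np
import pytesseract
img = cv2.imread("UEWHj.png")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# rough HSV range for green; tune these bounds for the real background color
lower = np.array([35, 40, 40])
upper = np.array([85, 255, 255])
mask = cv2.inRange(hsv, lower, upper)  # green background -> white, black text -> black
txt = pytesseract.image_to_string(mask)
print(txt)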
I have a cropped image, and I am trying to get the numbers on that cropped image.
Here's the code I am using:
import cv2
import pytesseract

image = cv2.imread('Cropped.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
invert = 255 - opening
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
Here's the sample cropped image.
All I got was some of the numbers, not all of them. How can I enhance such an image to be able to extract only the numbers?
I tried the code on this image, but it doesn't return the correct numbers.
You can easily solve this with three main steps:
Upsampling
Applying simple thresholding
Setting the configuration to digits
Upsampling is needed for accurate recognition; otherwise Tesseract may misinterpret the digits.
Thresholding displays only the features of the image.
Setting the configuration will make Tesseract recognize only the digits.
Result
Upsampling
Threshold
Pytesseract
277032200746
Code:
import cv2
import pytesseract
img1 = cv2.imread("kEpyN.png") # "FX2in.png"
gry1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
(h, w) = gry1.shape[:2]
gry1 = cv2.resize(gry1, (w*2, h*2))
thr1 = cv2.threshold(gry1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
txt1 = pytesseract.image_to_string(thr1, config="digits")
print("".join(t for t in txt1 if t.isalnum()))
cv2.imshow("thr1", thr1)
cv2.waitKey(0)
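As a side note, config="digits" points Tesseract at its bundled digits config file. A rough alternative of mine (not part of the answer) is an explicit character whitelist; be aware that tessedit_char_whitelist is ignored by the LSTM engine in Tesseract 4.0 and only honored again from 4.1:
# hypothetical alternative to config="digits": whitelist only digit characters
txt1 = pytesseract.image_to_string(thr1, config="--psm 6 -c tessedit_char_whitelist=0123456789")
print(txt1)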
Update:
Most probably a version mismatch causes the extra words and digits.
One way of solving this is taking a range (crop) of the image.
For instance, from the thresholded image:
(h_thr, w_thr) = thr1.shape[:2]
thr1 = thr1[0:h_thr-10, int(w_thr/2)-400:int(w_thr/2)+200]
Result will be:
Now if you read it, the output should be:
277032200746
I've tried to binarize passport images for OCR using the following steps:
img = cv2.medianBlur(nid_aligned_image, 3)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
This method works well for images with a cleaner background, but not for the given type of image.
Here is the output, and OCR can't read this.
Can anyone suggest a better approach?
My approach for the problem is:
1- Apply adaptive thresholding
2- Apply Morphological Transformation
3- Apply bitwise operation
Step 1: Adaptive Threshold
From the documentation:
if an image has different lighting conditions in different areas. In that case, adaptive thresholding can help. Here, the algorithm determines the threshold for a pixel based on a small region around it. So we get different thresholds for different regions of the same image which gives better results for images with varying illumination.
To summarize: when a single global threshold value is not performing well, use adaptive thresholding.
img2 = cv2.imread("BESFs.png")
gry2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
flt = cv2.adaptiveThreshold(gry2, 100, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 13, 16)
Result:
Step 2: Morphological Transformation
From the documentation:
It needs two inputs, one is our original image, second one is called structuring element or kernel which decides the nature of operation
We need to define a kernel (filter) for processing the image.
krn = np.ones((3, 3), np.uint8)
We will use opening and closing:
Opening is just another name of erosion followed by dilation. It is useful in removing noise
Closing is reverse of Opening, Dilation followed by Erosion. It is useful in closing small holes inside the foreground objects, or small black points on the object.
opn = cv2.morphologyEx(flt, cv2.MORPH_OPEN, krn)
cls = cv2.morphologyEx(opn, cv2.MORPH_CLOSE, krn)
Step 3: Bitwise Operation
From the documentation:
They will be highly useful while extracting any part of the image
gry2 = cv2.bitwise_or(gry2, cls)
Result:
Now, if we use pytesseract to extract the text:
txt = pytesseract.image_to_string(gry2)
txt = txt.rstrip().split('\n\n')[1].split(' ')[1]
print("Passport number: {}".format(txt))
Result:
Passport number: BC0874168
Optional
For future OCR problems, you can try enhancing the image resolution. For instance:
from PIL import Image

img = Image.open("BESFs.png")
w, h = img.size  # PIL's size is (width, height)
fct = max(1, int(1024.0 / h))  # integer factor to bring the height up to ~1024
sz = int(fct * w), int(fct * h)
im_rsz = img.resize(sz, Image.LANCZOS)  # LANCZOS replaces the deprecated ANTIALIAS
im_rsz.save("out_dpi_300.png", dpi=(300, 300))
For this problem it has no effect, but it may help you in the future.
Code for the problem:
import cv2
import pytesseract
import numpy as np
img2 = cv2.imread("BESFs.png")
gry2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
flt = cv2.adaptiveThreshold(gry2, 100, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 13, 16)
krn = np.ones((3, 3), np.uint8)
opn = cv2.morphologyEx(flt, cv2.MORPH_OPEN, krn)
cls = cv2.morphologyEx(opn, cv2.MORPH_CLOSE, krn)
gry2 = cv2.bitwise_or(gry2, cls)
txt = pytesseract.image_to_string(gry2)
txt = txt.rstrip().split('\n\n')[1].split(' ')[1]
print("Passport number: {}".format(txt))
I have the following table area from the original image:
I'm trying to extract the text from this table, but when thresholding, the whole gray region gets darkened. For example, like below.
The threshold type I used:
thresh_value = cv2.threshold(original_gray, 128, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
Is it possible to change the gray background into white while keeping the black text pixels as they are?
You should use adaptive thresholding in Python/OpenCV.
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("text_table.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 11)
# write results to disk
cv2.imwrite("text_table_thresh.jpg", thresh)
# display it
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
Result