I have the situation shown below. I have tried different OpenCV filters, such as grayscale, 3x resizing, Gaussian blur, erosion and unsharp mask, but without any success. With Tesseract I have used PSM 6, 7 and 8.
How would you suggest preprocessing the image so that the correct text, H 25 FT, is detected?
Important things to do are:
Use white for the background and black for characters font color.
Select the desired Tesseract psm mode. In this case I use psm 7 to treat the image as a single text line.
Try the tessedit_char_whitelist config option to restrict recognition to the characters you are searching for. In this case: H, 2, 5, F, T.
With that in mind, here is my code:
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
originalImage = cv2.imread('c.jpg')
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(thresh, blackAndWhiteImageOriginal) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)
blackAndWhiteImage = cv2.erode(blackAndWhiteImageOriginal, np.ones((5,5), np.uint8))
ocr_output_details = pytesseract.image_to_data(blackAndWhiteImage, output_type=pytesseract.Output.DICT, config="--psm 7 -c tessedit_char_whitelist=H25FThft")
rgbImage = cv2.cvtColor(blackAndWhiteImage,cv2.COLOR_GRAY2RGB)
for i in range(len(ocr_output_details['level'])):
    (x, y, w, h) = (ocr_output_details['left'][i], ocr_output_details['top'][i], ocr_output_details['width'][i], ocr_output_details['height'][i])
    cv2.rectangle(rgbImage, (x, y), (x + w, y + h), (0,0,255), 2)
print('Text: ', ocr_output_details['text'])
cv2.imshow('Boxes', rgbImage)
And the result:
Also, you can try to improve the results by following the Tesseract documentation: "Improving the quality of the output".
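The psm and whitelist settings from the list above are plain text options handed to pytesseract through its config argument. As a minimal sketch, the helper below (build_tesseract_config is a hypothetical name, not part of pytesseract) assembles such a string so that the mode and the allowed characters can be changed in one place:

```python
def build_tesseract_config(psm, whitelist=""):
    """Return a string usable as pytesseract's `config=` argument.

    Hypothetical helper for illustration; it only formats text.
    """
    parts = [f"--psm {psm}"]
    if whitelist:
        # Drop duplicate characters (keeping order) and spaces,
        # since a space would split the option into two arguments.
        chars = "".join(dict.fromkeys(whitelist.replace(" ", "")))
        parts.append(f"-c tessedit_char_whitelist={chars}")
    return " ".join(parts)


config = build_tesseract_config(7, "H25FT")
print(config)  # --psm 7 -c tessedit_char_whitelist=H25FT
```

The resulting string can be passed as config=config to pytesseract.image_to_data or pytesseract.image_to_string.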
I have been working on a project which involves extracting text from an image. From my research, Tesseract is one of the best libraries available, so I decided to use it along with OpenCV, which is needed for image manipulation.
I have been experimenting a lot with the Tesseract engine, and it does not seem to be giving me the expected results. I have attached the image as a reference. The output I got is:
1] =501 [
Instead, expected output is
What I have done so far:
Remove noise
Adaptive threshold
Sending it tesseract ocr engine
Are there any other suggestions to improve the algorithm?
Snippet of the code:
import cv2
import sys
import pytesseract
import numpy as np
from PIL import Image
if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python ocr_simple.py image.jpg')
        sys.exit(1)
    # Read image path from command line
    imPath = sys.argv[1]
    gray = cv2.imread(imPath, 0)
    # Blur
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    # Binarizing
    thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 5, 3)
    text = pytesseract.image_to_string(thresh)
Images attached.
The first image is the original image.
The second image is what has been fed to Tesseract.
Before performing OCR on an image, it's important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. For this specific image, we need to obtain the ROI before we can OCR.
To do this, we can convert to grayscale, apply a slight Gaussian blur, then adaptive threshold to obtain a binary image. From here, we can apply morphological closing to merge individual letters together. Next we find contours, filter using contour area filtering, and then extract the ROI. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
Detected ROI
Extracted ROI
Result from Pytesseract OCR
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale, Gaussian blur, Adaptive threshold
image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5)
# Perform morph close to merge letters together
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)
# Find contours, contour area filtering, extract ROI
cnts, _ = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
for c in cnts:
    area = cv2.contourArea(c)
    if 1800 < area < 2500:
        x, y, w, h = cv2.boundingRect(c)
        ROI = original[y:y+h, x:x+w]
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
# Perform text extraction
ROI = cv2.GaussianBlur(ROI, (3,3), 0)
data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6')
cv2.imshow('ROI', ROI)
cv2.imshow('close', close)
cv2.imshow('image', image)
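The area bounds 1800 and 2500 in the code above are tuned to this particular image. One possible variation, sketched here as an assumption rather than as part of the answer, is to express the bounds as a fraction of the image area so they survive a change of resolution. The helper name and the fractions are hypothetical, and it uses the bounding-box area (w * h) as a simplification of cv2.contourArea:

```python
def pick_boxes_by_relative_area(boxes, image_shape, min_frac, max_frac):
    """Keep (x, y, w, h) boxes whose area is a given fraction of the image.

    Hypothetical helper: `boxes` would come from cv2.boundingRect and
    `image_shape` from image.shape.
    """
    img_h, img_w = image_shape[:2]
    total_area = img_h * img_w
    kept = []
    for (x, y, w, h) in boxes:
        frac = (w * h) / total_area
        if min_frac <= frac <= max_frac:
            kept.append((x, y, w, h))
    return kept


# Made-up boxes on a 400x600 image: only the middle-sized one survives.
boxes = [(10, 10, 60, 35), (0, 0, 5, 5), (0, 0, 300, 300)]
print(pick_boxes_by_relative_area(boxes, (400, 600, 3), 0.005, 0.02))
```

The fractions still need tuning per dataset; the only gain is that they do not depend on the pixel dimensions of the input.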
I'm fairly new to OpenCV and Tesseract. I'm currently building a project that uses computer vision to detect door labels. Hopefully it will be beneficial for visually impaired people.
The idea of the program is to preprocess the input image by converting it to a binary image, then use Canny edge detection to find the outlines of the door label, then dilate the Canny result. After that, the image is fed to Tesseract, and the detected text should be shown with boxes.
The expected result is green rectangles drawn around the text, with the text itself printed out.
The issue is that the rectangles are missing and the text detection fails.
I have tried going through these:
Recognize Text in images using Canny Edge detection in Opencv
OpenCv pytesseract for OCR
Image preprocessing with OpenCV before doing character recognition (tesseract)
The questions and solutions are either too simple or not as relevant. Some are not in python as well.
Attached below is my attempt on the code:
import pytesseract as pytess
import cv2 as cv
import numpy as np
from PIL import Image
from pytesseract import Output
img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg", 0)
# edges store the canny version of img
edges = cv.Canny(img, 100, 200)
# ker as in kernel
# (3, 3) is the kernel size while uint8 is the datatype
ker = np.ones((3, 3), np.uint8)
# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)
# setup pytesseract parameters
configs = r'--oem 3 --psm 6'
# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')
boxes = len(result['text'])
# make a new copy of edges
new_item = dil.copy()
for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 30:  # removed constraints
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)
# detect sentence with tesseract
# pending as rectangle not achieved
cv.imshow("original", img)
cv.imshow("canny", edges)
cv.imshow("dilation", dil)
cv.imshow("capturedText", new_item)
#ignore below this line, it is only for testing
#testobj = Image.fromarray(dil)
#testtext = pytess.image_to_string(testobj, lang='eng')
Resultant image:
The testing part of the code return results as shown below:
Which, obviously does not satisfy the objective.
After posting the question, I realized I may have approached it wrong from the beginning. I should use OpenCV to detect the contour of the door label and isolate the part containing text before sending whatever is inside that rectangle to OCR.
Now that I have identified the issue thanks to the Stack Overflow members, I'm attempting to add an image rectification/image warping technique to obtain a straight front view and improve the system's accuracy. Update soon.
After some bug fixing, relaxing the confidence constraint and letting the function draw on the original image, I achieved the results below. The updated code is attached as well.
import cv2 as cv
import numpy as np
import pytesseract as pytess
from pytesseract import Output
# input of img source
img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg")
# necessary image color conversion
img2 = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# edges store the canny version of img
edges = cv.Canny(img2, 100, 200)
# ker as in kernel
# (3, 3) is the kernel size while uint8 is the datatype
ker = np.ones((3, 3), np.uint8)
# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)
# setup pytesseract parameters
configs = r'--oem 3 --psm 6'
# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')
# number of boxes returned by tesseract
boxes = len(result['text'])
# make a new copy of edges
new_item = dil.copy()
for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 0:  # removed constraints
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        # draw rectangle boxes on the original img
        cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)
        # Crop the image
        crp = new_item[y:y + h, x:x + w]
        txt = pytess.image_to_string(crp, config=configs)
        # returns recognised text
        cv.imshow("capturedText", crp)
# cv.imshow("original", img)
# cv.imshow("canny", edges)
# cv.imshow("dilation", dil)
cv.imshow("results", img)
You have found all the detected text in the image:
for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 30:
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)
But you also say the current confidence should be more than 70%.
If we remove the constraint
If we OCR each new item
Result will be:
Now if you read:
txt = pytesseract.image_to_string(new_item, config="--psm 6")
OCR will be:
Meeting Room §
The output of the current pytesseract version 0.3.7
# Load the libraries
import cv2
import pytesseract
# Load the image
img = cv2.imread("fsUSw.png")
# Convert it to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# OCR detection
d = pytesseract.image_to_data(gry, config="--psm 6", output_type=pytesseract.Output.DICT)
# Get ROI part from the detection
n_boxes = len(d['level'])
# For each detected part
for i in range(1, 2):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # Draw rectangle to the detected region
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)
    # Crop the image
    crp = gry[y:y + h, x:x + w]
    txt = pytesseract.image_to_string(crp, config="--psm 6")
    # Display the cropped image
    cv2.imshow("crp", crp)
# Display
cv2.imshow("img", img)
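The confidence check discussed above can be pulled out of the drawing loop. The sketch below assumes a dictionary shaped like the one image_to_data returns with Output.DICT (keys text, conf, left, top, width, height); the helper name and the sample values are made up for illustration:

```python
def confident_words(data, min_conf=30):
    """Return (text, x, y, w, h) for non-empty entries above min_conf.

    Hypothetical helper; `data` is assumed to look like the dict from
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    results = []
    for i, text in enumerate(data["text"]):
        # conf can arrive as int, float or str depending on the version
        conf = float(data["conf"][i])
        if conf > min_conf and text.strip():
            results.append((text, data["left"][i], data["top"][i],
                            data["width"][i], data["height"][i]))
    return results


# Made-up sample data, not real Tesseract output
sample = {
    "text": ["", "Meeting", "Room", "§"],
    "conf": ["-1", 96, 95.5, 12],
    "left": [0, 10, 95, 150],
    "top": [0, 20, 20, 20],
    "width": [200, 80, 50, 10],
    "height": [60, 18, 18, 18],
}
for word in confident_words(sample, min_conf=30):
    print(word)
```

Skipping empty strings also drops the page, block and line level entries, which carry a confidence of -1 and no text.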
I think what you are looking for here is image rectification (warping the image to make it look as if it was taken from another point of view), and there seem to be tools for this in Python. However, the problem is more complicated in your case, since you first need to detect how the image should be rectified. I am not sure how you should go about that.
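One common way to set up such a rectification, sketched here under the assumption that the four corners of the door label have already been found (for example from a contour approximated to a quadrilateral), is to order the corners consistently and derive the size of the straightened output. The function names and the corner coordinates are hypothetical:

```python
import math


def order_corners(points):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    tl = min(points, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    tr = min(points, key=lambda p: p[1] - p[0])  # smallest y - x
    bl = max(points, key=lambda p: p[1] - p[0])  # largest y - x
    return [tl, tr, br, bl]


def target_size(corners):
    """Width and height of the rectified image, from the longer opposite edges."""
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return int(round(width)), int(round(height))


# Made-up corner coordinates of a slightly tilted label
corners = order_corners([(108, 60), (10, 10), (8, 58), (110, 12)])
print(corners)
print(target_size(corners))
```

With the ordered corners as the source points and the rectangle (0, 0) to (width - 1, height - 1) as the destination, cv2.getPerspectiveTransform and cv2.warpPerspective can then produce the front view. Finding the corners reliably remains the hard part mentioned above.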
So I've been trying to detect a number (1-9) inside a yellow cube, but without a solid solution.
These are two of my pictures:
This is one solution I've been trying, but without any luck:
from PIL import Image
from operator import itemgetter
import numpy as np
import easyocr
import cv2
import re
import imutils
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
img = cv2.imread("ROI_0.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 59, 88)
bnt = cv2.bitwise_not(thr)
txt = pytesseract.image_to_string(bnt, config="--psm 6 digits")
txt = txt.strip().split("\n")
cv2.imshow("bnt", bnt)
Is there another way to do this, because it's not working?
Binarize (Otsu's method)
Correct skew using minAreaRect
Find the max-area contour
Crop the number
Pass the cropped image to pytesseract
import cv2
import pytesseract
image = cv2.imread("y6.png")
# image = image_resize(image,width=480,height=640)
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
contours = cv2.findContours(thresh,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[0]
big = max(contours,key=cv2.contourArea)
(x,y),(w,h),angle = cv2.minAreaRect(big)
(h, w) = image.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(thresh, M, (w, h),flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_CONSTANT,borderValue=(0,0,0))
big = cv2.findContours(rotated,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[0]
big = max(big,key=cv2.contourArea)
x,y,w,h = cv2.boundingRect(big)
# cropped = rotated[y:y+h,x:x+w]
cropped = rotated[y:y+h-h//10,w//6:x+w-w//6]
data = pytesseract.image_to_string(cropped,config='--psm 6 digits')# -c tessedit_char_whitelist = 0123456789')
A few values in the cropping step, such as h//10 and w//6, are hardcoded, so they need tuning for other images.
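One way to make those margins easier to tune is to name them as fractions of the bounding box. This is only a sketch: inset_box is a hypothetical helper, and unlike the crop above it trims symmetrically on all four sides.

```python
def inset_box(x, y, w, h, frac_x=0.0, frac_y=0.0):
    """Shrink a bounding box by a fraction of its size on every side.

    Returns (y0, y1, x0, x1), ready for slicing as image[y0:y1, x0:x1].
    Hypothetical helper for illustration.
    """
    dx = round(w * frac_x)
    dy = round(h * frac_y)
    return y + dy, y + h - dy, x + dx, x + w - dx


# Made-up bounding box: trim 25% from left/right and 10% from top/bottom
y0, y1, x0, x1 = inset_box(20, 10, 120, 100, frac_x=0.25, frac_y=0.1)
print(y0, y1, x0, x1)  # 20 100 50 110
```

The fractions are still parameters that depend on how thick the cube's border is relative to the digit.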
You need to remove the black border first for tesseract to work. Just replace the black background with white color and then apply thresholding such that it can remove both white border and yellow color at the same time and then use tesseract to detect the character.
I am using tesseract for OCR, via the pytesseract bindings. Unfortunately, I encounter difficulties when trying to extract text including subscript-style numbers - the subscript number is interpreted as a letter instead.
For example, in the basic image:
I want to extract the text as "CH3", i.e. I am not concerned about knowing that the number 3 was a subscript in the image.
My attempt at this using tesseract is:
import cv2
import pytesseract
img = cv2.imread('test.jpeg')
# Note that I have reduced the region of interest to the known
# text portion of the image
text = pytesseract.image_to_string(
    img[200:300, 200:320], config='-l eng --oem 1 --psm 13'
)
Unfortunately, this will incorrectly output
It's also possible to get 'CHa', depending on the psm parameter.
I suspect that this issue is related to the "baseline" of the text being inconsistent across the line, but I'm not certain.
How can I accurately extract the text from this type of image?
Update - 19th May 2020
After seeing Achintha Ihalage's answer, which doesn't provide any configuration options to tesseract, I explored the psm options.
Since the region of interest is known (in this case, I am using EAST detection to locate the bounding box of the text), the psm config option for tesseract, which in my original code treats the text as a single line, may not be necessary. Running image_to_string against the region of interest given by the bounding box above gives the output
which can, of course, be easily processed to get CH3.
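The exact output is not shown here, so the following is only a sketch of that kind of post-processing, assuming the difference from CH3 is stray whitespace such as a space or line break between the characters. The sample string is hypothetical:

```python
def normalize_ocr_text(text):
    """Remove all whitespace (spaces, newlines, form feeds) from OCR output."""
    return "".join(text.split())


# Hypothetical raw output; the real string from Tesseract may differ
raw = "CH\n3\n\x0c"
print(normalize_ocr_text(raw))  # CH3
```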
This is because the font of subscript is too small. You could resize the image using a python package such as cv2 or PIL and use the resized image for OCR as coded below.
import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, None, fx=2, fy=2) # scaling factor = 2
data = pytesseract.image_to_string(img)
You want to apply pre-processing to your image before feeding it into Tesseract to increase the accuracy of the OCR. I use a combination of PIL and cv2 here because cv2 has good filters for blur/noise removal (dilation, erosion, threshold), PIL makes it easy to enhance the contrast (distinguishing the text from the background), and I wanted to show how pre-processing could be done using either (using both together is not strictly necessary, though, as shown below). You can write this more elegantly; it's just the general idea.
import cv2
import pytesseract
import numpy as np
from PIL import Image, ImageEnhance
img = cv2.imread('test.jpg')
def cv2_preprocess(image_path):
    img = cv2.imread(image_path)
    # convert to grayscale if not already
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # remove noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    # apply a blur
    # gaussian noise
    img = cv2.threshold(cv2.GaussianBlur(img, (9, 9), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    # this can be used for salt and pepper noise (not necessary here)
    #img = cv2.adaptiveThreshold(cv2.medianBlur(img, 7), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
    cv2.imwrite('new.jpg', img)
    return 'new.jpg'
def pil_enhance(image_path):
    image = Image.open(image_path)
    contrast = ImageEnhance.Contrast(image)
    # apply the enhancement and save it; the factor 2 is an assumed value, tune as needed
    contrast.enhance(2).save('new2.jpg')
    return 'new2.jpg'
img = cv2.imread(pil_enhance(cv2_preprocess('test.jpg')))
text = pytesseract.image_to_string(img)
The cv2 pre-process produces an image that looks like this:
The enhancement with PIL gives you:
In this specific example, you can actually stop after the cv2_preprocess step because that is clear enough for the reader:
img = cv2.imread(cv2_preprocess('test.jpg'))
text = pytesseract.image_to_string(img)
But if you are working with things that don't necessarily start with a white background (i.e. grey scaling converts to light grey instead of white)- I have found the PIL step really helps there.
The main point is that the typical ways to increase Tesseract's accuracy are:
fix DPI (rescaling)
fix brightness/noise of image
fix text size/lines
(skewing/warping text)
Doing any one of these, or all of them, will help... but fixing brightness/noise tends to generalize better than the others (at least in my experience).
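For the rescaling item, a rough sketch is to derive the scale factor from the measured height of the smallest text instead of fixing it at 2. The helper name, the target height of 32 pixels and the cap of 4 are assumptions for illustration, not values taken from Tesseract:

```python
def upscale_factor(text_height_px, target_height_px=32, max_factor=4.0):
    """Scale factor that brings small text up to roughly target_height_px.

    Hypothetical helper: never shrinks the image (minimum 1.0) and never
    enlarges it by more than max_factor.
    """
    if text_height_px <= 0:
        raise ValueError("text_height_px must be positive")
    factor = target_height_px / text_height_px
    return min(max(factor, 1.0), max_factor)


print(upscale_factor(16))  # 2.0
print(upscale_factor(40))  # 1.0
print(upscale_factor(4))   # 4.0
```

The result can be used as fx and fy in cv2.resize(img, None, fx=factor, fy=factor), in the same way as the fixed factor of 2 in the earlier answer.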
I think this way can be more suitable for the general situation.
import cv2
import pytesseract
from pathlib import Path
image = cv2.imread('test.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]  # suitable for sharp black-and-white pictures
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]  # 2 return values in OpenCV 2.4 and 4.x, 3 in OpenCV 3.x
result_list = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    if area > 200:
        detect_area = image[y:y + h, x:x + w]
        # detect_area = cv2.GaussianBlur(detect_area, (3, 3), 0)
        predict_char = pytesseract.image_to_string(detect_area, lang='eng', config='--oem 0 --psm 10')
        # strip the trailing newline/form feed so the joined result is clean
        result_list.append((x, predict_char.strip()))
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), thickness=2)
result = ''.join([char for _, char in sorted(result_list, key=lambda _x: _x[0])])
print(result) # CH3
output_dir = Path('./temp')
output_dir.mkdir(parents=True, exist_ok=True)
cv2.imwrite(f"{output_dir/Path('image.png')}", image)
cv2.imwrite(f"{output_dir/Path('clean.png')}", thresh)
I strongly suggest you refer to the following examples, which are useful references for OCR.
Get the location of all text present in image using opencv
Using YOLO or other image recognition techniques to identify all alphanumeric text present in images
I am starting to learn OpenCV and Tesseract, and have trouble with what seems to be a very simple example.
Here is an image that I am trying to OCR, that reads "171 m":
I do some preprocessing. Since blue is the dominant color of the text, I extract the blue channel and apply simple thresholding.
img = cv2.imread('171_m.png')[y, x, 0]
_, thresh = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)
The resulting image looks like this:
Then throw that into Tesseract, with psm 7 for single line:
text = pytesseract.image_to_string(thresh, config='--psm 7')
>>> lim
I also tried to restrict possible characters, and it gets a bit better, but not quite.
text = pytesseract.image_to_string(thresh, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
>>> 17m
OpenCV v4.1.1.
Tesseract v5.0.0-alpha.20190708
Any help appreciated.
Before throwing the image into Pytesseract, preprocessing can help. The desired text should be in black while the background should be in white. Here's an approach:
Convert image to grayscale and enlarge image
Gaussian blur
Otsu's threshold
Invert image
After converting to grayscale, we enlarge the image using imutils.resize() and apply a Gaussian blur. From here we apply Otsu's threshold to get a binary image.
If you have noisy images, an additional step would be to use morphological operations to smooth or remove noise. But since your image is clean enough, we can simply invert the image to get our result.
Output from Pytesseract using --psm 6
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('1.png',0)
image = imutils.resize(image, width=400)
blur = cv2.GaussianBlur(image, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
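The code above inverts unconditionally because, for this image, the thresholded text comes out white on black. As a sketch of how that decision could be made automatically, the hypothetical helper below inverts only when the image is mostly dark. It assumes the text covers less than half of the image, which is an assumption and not something the answer states:

```python
import numpy as np


def ensure_dark_text_on_light(binary):
    """Invert a binary uint8 image if most of its pixels are dark.

    Hypothetical helper; assumes the background covers more area than the text.
    """
    if np.mean(binary) < 127:
        return 255 - binary
    return binary


# Synthetic example: a white bar ("text") on a black background
img = np.zeros((10, 10), dtype=np.uint8)
img[4:6, 2:8] = 255
fixed = ensure_dark_text_on_light(img)
print(fixed[0, 0], fixed[4, 2])  # 255 0
```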
Disclaimer : This is not a solution, just a trial to partially solve this.
This process works only if you have knowledge of the number of the characters present in the image beforehand. Here is the trial code :
img0 = cv2.imread('171_m.png', 0)
adap_thresh = cv2.adaptiveThreshold(img0, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
text_adth = pytesseract.image_to_string(adap_thresh, config='--psm 7')
After adaptive thresholding, the produced image is like this :
Pytesseract gives output as :
171 mi.
Now, if you know, in advance, the number of characters present, you can slice the string read by pytesseract and get the desired output as '171m'.
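A minimal sketch of that slicing, combined with dropping characters that cannot occur: the helper name is hypothetical, and the allowed characters and expected length are things you would have to know in advance, as stated above.

```python
def keep_expected(text, allowed, expected_len):
    """Drop characters outside `allowed`, then keep the first expected_len."""
    filtered = "".join(ch for ch in text if ch in allowed)
    return filtered[:expected_len]


print(keep_expected("171 mi.", allowed="0123456789m", expected_len=4))  # 171m
```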
I thought your image was not sharp enough, hence I applied the process described at How do I increase the contrast of an image in Python OpenCV to first sharpen your image and then proceed by extracting the blue layer and running the tesseract.
I hope this helps.
import cv2
import pytesseract
img = cv2.imread('test.png') #test.png is your original image
s = 128
img = cv2.resize(img, (s, s // 2), interpolation=cv2.INTER_AREA)
def apply_brightness_contrast(input_img, brightness = 0, contrast = 0):
    if brightness != 0:
        if brightness > 0:
            shadow = brightness
            highlight = 255
        else:
            shadow = 0
            highlight = 255 + brightness
        alpha_b = (highlight - shadow)/255
        gamma_b = shadow
        buf = cv2.addWeighted(input_img, alpha_b, input_img, 0, gamma_b)
    else:
        buf = input_img.copy()
    if contrast != 0:
        f = 131*(contrast + 127)/(127*(131-contrast))
        alpha_c = f
        gamma_c = 127*(1-f)
        buf = cv2.addWeighted(buf, alpha_c, buf, 0, gamma_c)
    return buf
out = apply_brightness_contrast(img,0,64)
b, g, r = cv2.split(out) #spliting and using just the blue
pytesseract.image_to_string(255-b, config='--psm 7 -c tessedit_char_whitelist=1234567890m') # 255-b because the image has a black background and white numbers; 255-b swaps the colors