Improving image pre-processing for tesseract (video game screenshot) - python

I am trying to read text for prices in a video game and am experiencing difficulty in pre-processing the image.
The rest of my code is "complete", as in after the text is extracted I am formatting it and outputting into CSV for later use.
This is what I have come up with so far for the following images, and would like input on other thresholds or pre-processing tools that will make the OCR more accurate.
Raw Image Screenshot
After gamma, denoise on left - binary threshold on right
The text detected
As you can see, it is very close but not perfect. I would like to make it more accurate as I will be processing many frames eventually.
Here is my current code:
import cv2
import pytesseract
import pandas as pd
import numpy as np
# Tells pytesseract where the tesseract environment is installed on local computer
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img = cv2.imread("./image_frames/frame0.png")
# gamma to darken text to be same opacity?
def adjust_gamma(crop_img, gamma=1.0):
    # build a lookup table mapping the pixel values [0, 255] to
    # their adjusted gamma values
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
                      for i in np.arange(0, 256)]).astype("uint8")
    # apply gamma correction using the lookup table
    return cv2.LUT(crop_img, table)
adjusted = adjust_gamma(img, gamma=0.15)
# grayscale the image
gray = cv2.cvtColor(adjusted, cv2.COLOR_BGR2GRAY)
# denoising image
dst = cv2.fastNlMeansDenoising(gray, None, 10, 10, 10)
# binary threshold
thresh = cv2.threshold(gray, 35, 255, cv2.THRESH_BINARY_INV)[1]
# OCR configurations (3 is default)
config = "--psm 3"
# Just show the image
cv2.imshow("gray", gray)
cv2.imshow("denoised", dst)
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
# Reads text from the image and prints to console
text = pytesseract.image_to_string(thresh, config=config)
# remove double lines
text = text.replace('\n\n','\n')
# remove unicode character
text = text.replace('', '')
print(text)
Any help is appreciated as I am very new to this!

Step#1: Scale the image
Step#2: Apply adaptive-threshold
Step#3: Set page-segmentation-mode (psm) to 6 (Assume a single uniform block of text.)
1 Scaling the image:
The original image is really small, so up-scaling it makes the text easier to see and to recognize.
img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
2 Apply adaptive-threshold
Usually a simple threshold is applied, but for your image a simple threshold has no effect on the result.
For different images you may need to set different C and block-size values.
For instance for the 1st image:
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 15, 22)
Result:
For instance for the 2nd image:
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 51, 4)
Result:
3 Set psm to 6 which assumes the image as a single uniform block of text.
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
Result for the 1st image:
Dragon Claymore
1,388,888,888 mesos.
Maple Pyrope Spear
288,888,888 mesos.
Element Pierce
488,888,888 mesos.
Purple Adventurer Cape
97,777,777 mesos.
Result for the 2nd image:
Ring of Alchemist
749,999,995 mesos.
Dragon Slash Claw
499,999,995 mesos.
"Stormcaster Gloves
149,999,995 mesos.
Elemental Wand 6
749,999,995 mesos.
Big Money Chalr
1 tor 249,999,985 mesos.|
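Since many frames will be processed, it may help to flag lines that Tesseract misread (such as the last price above) before they reach the CSV. A minimal sketch, assuming every price line has the form "<number> mesos."; the parse_listing helper and its regular expression are hypothetical and not part of pytesseract:

```python
import re

# Assumed price format: digits grouped by commas, followed by "mesos".
PRICE_RE = re.compile(r"(\d{1,3}(?:,\d{3})*)\s+mesos")


def parse_listing(ocr_text):
    """Pair each item name with the price on the following line.

    Returns a list of (name, price, clean) tuples, where `clean` is False
    when the price line contains anything besides "<number> mesos.".
    """
    rows, name = [], None
    for line in (ln.strip() for ln in ocr_text.splitlines()):
        if not line:
            continue
        match = PRICE_RE.search(line)
        if match is None:
            # No price found, so treat the line as an item name.
            name = line
            continue
        price = int(match.group(1).replace(",", ""))
        clean = PRICE_RE.fullmatch(line.rstrip(".")) is not None
        rows.append((name, price, clean))
        name = None
    return rows
```

Rows with clean set to False can be logged and re-checked by hand instead of being written out silently.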
Code for the 1st image:
import pytesseract
import cv2
img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 15, 22)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
Code for the 2nd image:
import pytesseract
import cv2
img = cv2.imread("7Y2yx.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 51, 4)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
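To get a feel for what the block size and C do, here is a rough NumPy sketch of mean adaptive thresholding with inverted output. It is an illustration only, assuming an 8-bit grayscale array and edge-replicated borders; it is not OpenCV's implementation and may differ from cv2.adaptiveThreshold by rounding. The function name is hypothetical.

```python
import numpy as np


def adaptive_mean_threshold_inv(gray, block_size, c):
    """Mark a pixel 255 when it is darker than (local mean - c), else 0."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be odd and >= 3")
    h, w = gray.shape
    pad = block_size // 2
    # Replicate the border so every pixel has a full neighbourhood.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Integral image: integral[i, j] is the sum of padded[:i, :j].
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    integral = integral.cumsum(axis=0).cumsum(axis=1)
    b = block_size
    window_sum = (integral[b:b + h, b:b + w] - integral[:h, b:b + w]
                  - integral[b:b + h, :w] + integral[:h, :w])
    local_mean = window_sum / (b * b)
    return np.where(gray <= local_mean - c, 255, 0).astype(np.uint8)
```

A larger block averages over more of the background, and a larger C demands that a pixel be darker than its surroundings by a wider margin before it counts as text.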
Links
Simple and adaptive-threshold
Page segmentation Modes
Improving quality of the output

Related

How to detect colored text on gradient background with pytesseract

I'm currently working on a small OCR bot. I got pretty much everything to work and am now trying to improve the OCR. Specifically, it has problems with two things: the orange/red-ish text on the same colored gradient and, for some reason, the first 1 of "1/1". Sadly I haven't found anything that worked in my case yet. I've made a small test image, which consists of multiple images, below:
Source Image
Results
Adaptive Threshold
As you can see the gradient results in a blob that is sometimes big enough to overlap with the first word (see "apprentice") resulting in garbage.
I've tried many variations and played around with thresholds, blurs, erode, dilation, box detection with the dilation method, etc. but nothing worked well. The only way I did get rid of the blob is using an adaptive Threshold. But sadly I wasn't able to get good results using the output image.
If anyone knows how to make the OCR more robust, increase accuracy and get rid of the blob I'd appreciate your help. Thanks.
The following code is my 'playground' to figure out a better way:
import cv2
import pytesseract
import numpy as np
pytesseract.pytesseract.tesseract_cmd = YOUR_PATH
def resize(img, scale_percent=300):
    # use this instead?
    # resize = image = imutils.resize(image, width=300)
    # automatically resizes it about 300% by default
    width = int(img.shape[1] * scale_percent / 100)
    height = int(img.shape[0] * scale_percent / 100)
    dim = (width, height)
    resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
    return resized
def preprocessImage(img, scale=300, threshhold=127):
    """ input RGB colour space """
    # makes results more accurate - inspired from https://stackoverflow.com/questions/58103337/how-to-ocr-image-with-tesseract
    # another resource to improve accuracy - https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html
    # converts from rgb to grayscale then enlarges it
    # applies gaussian blur
    # convert to b&w
    # invert black and white colours (white background, black text)
    grayscale = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    cv2.imshow('grayscale', grayscale)
    resized = resize(grayscale, scale)
    cv2.imshow('resized', resized)
    blurred = cv2.medianBlur(resized, 5)
    #cv2.imshow('median', blurred)
    blurred = cv2.GaussianBlur(resized, (5, 5), 5)
    cv2.imshow('1', blurred)
    cv2.waitKey()
    blackAndWhite = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    cv2.imshow('blackAndWhite', blackAndWhite)
    th3 = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
    cv2.floodFill(th3, None, (0, 0), 255)
    cv2.imshow('th3', th3)
    #kernel = np.ones((3, 3), np.uint8)
    #erode = cv2.erode(th3, kernel)
    kernel = np.ones((5, 5), np.uint8)
    #opening = cv2.morphologyEx(blackAndWhite, cv2.MORPH_OPEN, kernel)
    invertedColours = cv2.bitwise_not(blackAndWhite)
    return invertedColours
# excerpt from https://www.youtube.com/watch?v=6DjFscX4I_c
def imageToText(img):
    # returns item name from image, preprocess if needed
    boxes = pytesseract.image_to_data(img)
    num = []
    for count, box in enumerate(boxes.splitlines()):
        if (count != 0):
            box = box.split()
            if (len(box) == 12):
                text = box[11].strip('#®')
                if (text != ''):
                    num.append(text)
    text = ' '.join(num)
    ## Alternate method
    # text = pytesseract.image_to_string(img)
    # print("Name:", text)
    return text
if __name__ == "__main__":
    img = cv2.imread("test.png")
    img = preprocessImage(img, scale=300)
    print(imageToText(img))
    ##############################################
    ##### Detecting Words ######
    ##############################################
    #[ 0 1 2 3 4 5 6 7 8 9 10 11 ]
    #['level', 'page_num', 'block_num', 'par_num', 'line_num', 'word_num', 'left', 'top', 'width', 'height', 'conf', 'text']
    boxes = pytesseract.image_to_data(img)
    # convert back to colored image
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    # draw boxes and text
    for a,b in enumerate(boxes.splitlines()):
        print(b)
        if a!=0:
            b = b.split()
            if len(b)==12:
                x,y,w,h = int(b[6]),int(b[7]),int(b[8]),int(b[9])
                cv2.putText(img,b[11],(x,y-5),cv2.FONT_HERSHEY_SIMPLEX,1,(50,50,255),2)
                cv2.rectangle(img, (x,y), (x+w, y+h), (0, 0, 255), 2)
    cv2.imshow('img', img)
    cv2.waitKey(0)
I couldn't get it perfect but almost...
I got a lot of benefit from CLAHE equalization. See tutorial here. But that wasn't enough. Still needed thresholding. Adaptive techniques didn't work well, but cv2.THRESH_TOZERO gives OK results. See thresholding tutorial here
import cv2
from pytesseract import image_to_string, image_to_data
img = cv2.imread('gradient.png', cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (0,0), fx=2.0, fy=2.0)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
img = clahe.apply(img)
img = 255-img # invert image. tesseract prefers black text on white background
ret, img = cv2.threshold(img, 127, 255, cv2.THRESH_TOZERO)
cv2.imwrite('output.png', img)
ocr = image_to_string(img, config='--psm 6')
print(ocr)
which gives ocr output
Tool Crafting Part
Apprentice Craft Kit
Adept Craft Kit
Expert Craft Kit
=
Master Craft Kit
1/1
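For anyone unsure what cv2.THRESH_TOZERO does compared with a binary threshold: pixels above the threshold keep their gray value and everything else becomes 0. A small NumPy illustration of that rule (the helper name is made up for this sketch; it is not an OpenCV function):

```python
import numpy as np


def threshold_tozero(gray, thresh):
    """Keep pixels strictly above `thresh`; set all other pixels to 0."""
    return np.where(gray > thresh, gray, 0).astype(gray.dtype)


sample = np.array([[10, 127, 128, 250]], dtype=np.uint8)
result = threshold_tozero(sample, 127)
print(result)
```

Because the surviving pixels keep their shading, the anti-aliased edges of the letters are preserved instead of being forced to pure black or white.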

How to improve Tesseract's output

I have an image that looks like this:
And this is the processed image
I have tried pretty much everything. I processed the image like this:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) #Converting to GrayScale
(h, w) = gray.shape[:2]
gray = cv2.resize(gray, (w*2, h*2))
thresh = cv2.threshold(gray, 150, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
gray = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, rectKernel)
blur = cv2.GaussianBlur(gray,(1,1),cv2.BORDER_DEFAULT)
text = pytesseract.image_to_string(blur, config="--oem 1 --psm 6")
But Tesseract doesn't print out anything. I am using this version of Tesseract:
5.0.0-alpha.20201127
How do I improve its performance? It's highly unreliable.
Edit:
The answer below did a wonderful job on the said image.
But when I apply this technique to image like this one I get wrong output
Why is that? They seem roughly the same.
The problem is that the characters are not in the center of the image.
Sometimes Tesseract has difficulty recognizing characters or digits if they are not centered.
Therefore my suggestion is:
Center the characters
Up-sample and convert to gray-scale
Centering the characters:
cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
50 is just a padding variable, you can set to any other value.
The background turns blue because of the value. OpenCV reads images in BGR order, so passing 255 is the same as passing [255, 0, 0]: the blue channel is set to 255 while green and red stay at 0.
You can try with other values. For me it won't matter, since I'll convert it to gray-scale on the next step.
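If you would rather have a white border, every channel needs the value 255. A minimal NumPy sketch of that padding, assuming an 8-bit image; the pad_white helper is hypothetical, and with OpenCV the same effect should come from passing value=[255, 255, 255] to cv2.copyMakeBorder:

```python
import numpy as np


def pad_white(img, pad=50):
    """Surround a grayscale or colour image with a white border."""
    if img.ndim == 2:
        widths = ((pad, pad), (pad, pad))
    else:
        # Pad rows and columns only; leave the channel axis untouched.
        widths = ((pad, pad), (pad, pad), (0, 0))
    return np.pad(img, widths, mode="constant", constant_values=255)


demo = pad_white(np.zeros((10, 20, 3), dtype=np.uint8))
print(demo.shape, demo[0, 0])
```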
Up-sampling and converting to gray-scale:
These are the same steps you have already done: the first three lines of your code.
Now when you read:
MEHVISH MUQADDAS
Code:
import cv2
import pytesseract
# Load the image
img = cv2.imread("onf0D.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# OCR
txt = pytesseract.image_to_string(gry, config="--psm 6")
print(txt)
Read more tesseract-improve-quality.
You don't need to do threshold, GaussianBlur or morphologyEx.
The reasons are:
Simple-Threshold is used to get the features of the image. Input images' features are already available.
You don't have to smooth the image, there is no illumination effect on the image.
You don't need to do segmentation, since background is plain-white.
Update-1
The second image requires pre-processing. However, applying simple-threshold won't work on this image. You need to remove the background using a binary mask, then you can apply OCR.
Result of the binary-mask:
Now, if you apply OCR:
IRUM FEROZ
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("jCMft.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Binary mask: keep the dark pixels (V <= 130)
msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 130]))
# OCR
txt = pytesseract.image_to_string(msk, config="--psm 6")
print(txt)
Q: How do I find the lower and upper bounds of the cv2.inRange method?
A: You can use the following script.
Q: What did you change in the second image?
A: First I converted the image to HSV instead of gray-scale, because I wanted to remove the background. If you experiment with adaptiveThreshold you will see a lot of artifacts in the background, which limit Tesseract's recognition. Then I used cv2.inRange to get a binary mask. Feeding the binary mask to Tesseract gave me the desired result.
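To make the mask step less of a black box: the range test marks a pixel white only when all three channels lie within the bounds. A NumPy sketch of that rule, assuming an HSV array of shape (height, width, 3); the in_range name is just for illustration and this is not OpenCV's code:

```python
import numpy as np


def in_range(hsv, lower, upper):
    """Return 255 where every channel lies within [lower, upper], else 0."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return inside.astype(np.uint8) * 255


pixels = np.array([[[0, 0, 100], [10, 200, 200]]], dtype=np.uint8)
mask = in_range(pixels, [0, 0, 0], [179, 255, 130])
print(mask)
```

With the bounds used above, only the upper limit on V (130) actually restricts anything, so the mask simply selects the dark pixels, i.e. the text.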

Pytesseract read coloured text

I am trying to read coloured (red and orange) text with Pytesseract.
I tried to not grayscale the image, but that didn't work either.
Images, that it CAN read
Images, that it CANNOT read
My current code is:
tesstr = pytesseract.image_to_string(
cv2.cvtColor(nm.array(cap), cv2.COLOR_BGR2GRAY),
config="--psm 7")
This little function (below) will do for any color
ec9Ut.png
Thresh result
x18MN.png
Thresh result
SFr48.png
Thresh result
import cv2
from pytesseract import image_to_string
def getText(filename):
    img = cv2.imread(filename)
    HSV_img = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
    h,s,v = cv2.split(HSV_img)
    thresh = cv2.threshold(v, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    txt = image_to_string(thresh, config="--psm 6 digits")
    return txt
text = getText('ec9Ut.png')
print(text)
text = getText('x18MN.png')
print(text)
text = getText('SFr48.png')
print(text)
Output
46
31
53
You can apply:
Erosion
Adaptive-threshold
Erosion
Erosion will decrease the thickness of the characters, like:
Original Image
Erosion
When we apply erosion to the 53 and 31 images
Original Image
Erosion
For adaptive-threshold:
When blockSize= 27
Erosion
Threshold
When blockSize= 11
Erosion
 Threshold
For each image, we need to apply different thresholding
Code:
import cv2
from pytesseract import image_to_string
img_lst = ["fifty_three.png", "thirty_one.png"]
for img_pth in img_lst:
    img = cv2.imread(img_pth)
    (h, w) = img.shape[:2]
    img = cv2.resize(img, (w*2, h*2))
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    erd = cv2.erode(gry, None, iterations=2)
    if img_pth == "fifty_three.png":
        thr = cv2.adaptiveThreshold(erd, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 27, 5)
    else:
        thr = cv2.adaptiveThreshold(erd, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 5)
    txt = image_to_string(thr, config="--psm 6 digits")
    print(txt)
    cv2.imshow("thr", thr)
    cv2.waitKey(0)
Result:
53
31
Possible Question1: Why two different block size parameters?
Well, the stroke thickness differs between the two images, so two different parameters are required for text recognition.
Possible Question2: Why is None passed as the kernel to the erode method?
Unfortunately, I couldn't find a suitable kernel for erosion, so I set it to None, in which case OpenCV falls back to its default 3x3 rectangular structuring element.
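For reference, grayscale erosion replaces each pixel with the minimum of its neighbourhood, which is why bright strokes get thinner. A NumPy sketch, assuming a 3x3 square neighbourhood, 8-bit input, and a border that never shrinks shapes (padded with 255); it illustrates the idea and is not guaranteed to match cv2.erode pixel for pixel:

```python
import numpy as np


def erode_3x3(gray, iterations=1):
    """Minimum filter over a 3x3 neighbourhood, repeated `iterations` times."""
    out = gray.copy()
    h, w = out.shape
    for _ in range(iterations):
        # Pad with the maximum value so the border never lowers the minimum.
        padded = np.pad(out, 1, mode="constant", constant_values=255)
        shifted = [padded[dy:dy + h, dx:dx + w]
                   for dy in range(3) for dx in range(3)]
        out = np.min(shifted, axis=0)
    return out


square = np.zeros((5, 5), dtype=np.uint8)
square[1:4, 1:4] = 255          # a 3x3 bright square
eroded = erode_3x3(square)      # only its centre pixel survives
```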

opencv thresholding and pytesseract

Currently I am trying to develop some simple computer-vision code to read the number of kills that I have in a Call of Duty game and save it to an array as an integer. The code screenshots my screen every second; using OpenCV I threshold the image and pass it to pytesseract. Although the numbers stay the same, the background noise changes the image a lot and forces a lot of null inputs. I am OK if it misses a few inputs, but it misses 50% or more of all of the digits. If anyone has any tips on thresholding a single-digit image with varying backgrounds, it would be a huge help.
import cv2
import pyautogui
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract'
pyautogui.screenshot('pictures/Kill.png', region = (1822, 48, 30, 23))
img = cv2.imread('pictures/Kill.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh1 = cv2.threshold(img, 255, 255, cv2.THRESH_TRUNC)
cv2.imwrite('pictures/killthresh1.png',thresh1)
ret, thresh1 = cv2.threshold(img, 180, 255, cv2.THRESH_BINARY)
thresh1 = cv2.bitwise_not(thresh1)
cv2.imwrite('pictures/Killthresh2.png', thresh1)
custom_config = r'-l eng --oem 3 --psm 7 -c tessedit_char_whitelist="1234567890" '
killnumber = pytesseract.image_to_string(thresh1, config = custom_config)
Original pyautogui screenshot
TRUNC thresholded
BINARY thresholded
NOTE: These images yielded a 'NULL' result and I don't know why
After you read the image, img = cv2.imread('pictures/Kill.png')
Apply adaptive-threshold on Original pyautogui screenshot:
Now read:
txt = pytesseract.image_to_string(thr, config="--psm 7")
print(txt)
Result:
3
Code:
import cv2
import pytesseract
img = cv2.imread("0wHAy.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 21, 9)
txt = pytesseract.image_to_string(thr, config="--psm 7")
print(txt)
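Since some frames will still come back empty, the string from Tesseract needs a guard before it is stored as an integer. A small sketch, assuming you want to fall back to the last known value on an empty read; the parse_kill_count helper is hypothetical:

```python
import re


def parse_kill_count(ocr_text, previous=None):
    """Return the digits in `ocr_text` as an int, or `previous` if none."""
    digits = re.sub(r"\D", "", ocr_text)
    if not digits:
        return previous
    return int(digits)


kills = []
last = None
for raw in ["3\n", "", "4\n"]:     # stand-ins for successive OCR results
    last = parse_kill_count(raw, previous=last)
    kills.append(last)
print(kills)
```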

Pytesseract with custom font incorrectly classifying numbers

I am trying to detect prices using pytesseract.
However I am having very bad results.
I have one large image with several prices in different locations.
These locations are constant so I am cropping the image down and saving each area as a new image and then trying to detect the text.
I know the text will only contain 0123456789$¢.
I trained my new font using trainyourtesseract.com.
For example, I take this image.
Double its size, and threshold it to get this.
Run it through tesseract and get an output of 8.
Any help would be appreciated.
def getnumber(self, img):
    grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thresh, grey = cv2.threshold(grey, 50, 255, cv2.THRESH_BINARY_INV)
    filename = "{}.png".format(os.getpid())
    cv2.imwrite(filename, grey)
    text = pytesseract.image_to_string(Image.open(filename), lang='Droid',
        config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789.$¢')
    os.remove(filename)
    return(text)
You're on the right track. When preprocessing the image for OCR, you want to get the text in black with the background in white. The idea is to enlarge the image, Otsu's threshold to get a binary image, then perform OCR. We use --psm 6 to tell Pytesseract to assume a single uniform block of text. Look here for more configuration options. Here's the processed image:
Result from OCR:
2¢
Code
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Resize, grayscale, Otsu's threshold
image = cv2.imread('1.png')
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Perform text extraction
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imwrite('thresh.png', thresh)
cv2.waitKey()
Machine specs:
Windows 10
opencv-python==4.2.0.32
pytesseract==0.2.7
numpy==1.14.5
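In case it is unclear why no threshold value has to be chosen by hand: Otsu's method picks the gray level that best separates the histogram into two classes. A rough NumPy sketch of that selection, assuming an 8-bit grayscale array; it illustrates the idea and may not return exactly the same level as cv2.threshold with cv2.THRESH_OTSU:

```python
import numpy as np


def otsu_threshold(gray):
    """Return the level t that maximises the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    weight_bg = np.cumsum(hist)               # pixels with value <= t
    weight_fg = hist.sum() - weight_bg        # pixels with value > t
    sum_bg = np.cumsum(hist * levels)
    sum_fg = sum_bg[-1] - sum_bg
    mean_bg = np.divide(sum_bg, weight_bg,
                        out=np.zeros(256), where=weight_bg > 0)
    mean_fg = np.divide(sum_fg, weight_fg,
                        out=np.zeros(256), where=weight_fg > 0)
    valid = (weight_bg > 0) & (weight_fg > 0)
    between = np.where(valid,
                       weight_bg * weight_fg * (mean_bg - mean_fg) ** 2,
                       0.0)
    return int(np.argmax(between))


# Toy image: dark half (50) and bright half (200).
toy = np.array([[50] * 4 + [200] * 4] * 4, dtype=np.uint8)
level = otsu_threshold(toy)
# Inverted binary image: pixels above the level become 0, the rest 255.
binary_inv = np.where(toy > level, 0, 255).astype(np.uint8)
```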
