opencv thresholding and pytesseract - python

currently I am trying to develop some simple computervision code to read the amount of kills that I have in a call of duty game and save it to an array as an integer. The code is screenshotting my screen every second and using opencv I am thresholding the image and inputting it into pytesseract. Although the numbers stay the same, the background noise changes the image a lot and forces a lot of null inputs. I am ok if it misses a few inputs but it misses %50 or more of all of the digits. If anyone has any tips on thresholding a single digit image with varying backgrounds, it would be a huge help.
'''
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract'
pyautogui.screenshot('pictures/Kill.png', region = (1822, 48, 30, 23))
img = cv2.imread('pictures/Kill.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh1 = cv2.threshold(img, 255, 255, cv2.THRESH_TRUNC)
cv2.imwrite('pictures/killthresh1.png',thresh1)
ret, thresh1 = cv2.threshold(img, 180, 255, cv2.THRESH_BINARY)
thresh1 = cv2.bitwise_not(thresh1)
cv2.imwrite('pictures/Killthresh2.png', thresh1)
custom_config = r'-l eng --oem 3 --psm 7 -c
tessedit_char_whitelist="1234567890" '
killnumber = pytesseract.image_to_string(thresh1, config = custom_config)
'''
Original pyautogui screenshot
TRUNC thresholded
BINARY thresholded
NOTE: These images yieled a 'NULL' result and I dont know why

After you read the image, img = cv2.imread('pictures/Kill.png')
Apply adaptive-threshold on Original pyautogui screenshot:
Now read:
txt = pytesseract.image_to_string(thr, config="--psm 7")
print(txt)
Result:
3
Code:
import cv2
import pytesseract
img = cv2.imread("0wHAy.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 21, 9)
txt = pytesseract.image_to_string(thr, config="--psm 7")
print(txt)

Related

Pytesseract read coloured text

I am trying to read coloured (red and orange) text with Pytesseract.
I tried to not grayscale the image, but that didn't work either.
Images, that it CAN read
Images, that it CANNOT read
My current code is:
tesstr = pytesseract.image_to_string(
cv2.cvtColor(nm.array(cap), cv2.COLOR_BGR2GRAY),
config="--psm 7")
This little function (below) will do for any color
ec9Ut.png
Thresh result
x18MN.png
Thresh result
SFr48.png
Thresh result
import cv2
from pytesseract import image_to_string
def getText(filename):
img = cv2.imread(filename)
HSV_img = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
h,s,v = cv2.split(HSV_img)
thresh = cv2.threshold(v, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
txt = image_to_string(thresh, config="--psm 6 digits")
return txt
text = getText('ec9Ut.png')
print(text)
text = getText('x18MN.png')
print(text)
text = getText('SFr48.png')
print(text)
Output
46
31
53
You can apply:
Erosion
Adaptive-threshold
Erosion
Erosion will decrease the thickness of the image like:
Original Image
Erosion
When we apply erosion to the 53 and 31 images
Original Image
Erosion
For adaptive-threshold:
When blockSize= 27
Erosion
Threshold
When blockSize= 11
Erosion
 Threshold
For each image, we need to apply different threhsolding
Code:
import cv2
from pytesseract import image_to_string
img_lst = ["fifty_three.png", "thirty_one.png"]
for img_pth in img_lst:
img = cv2.imread(img_pth)
(h, w) = img.shape[:2]
img = cv2.resize(img, (w*2, h*2))
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
erd = cv2.erode(gry, None, iterations=2)
if img_pth == "fifty_three.png":
thr = cv2.adaptiveThreshold(erd, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 27, 5)
else:
thr = cv2.adaptiveThreshold(erd, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 5)
txt = image_to_string(thr, config="--psm 6 digits")
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)
Result:
53
31
Possible Question1: Why two different block size parameters?
Well, thickness of each image are different. So two different parameters are required for text-recognition.
Possible Question2: Why None defined as kernel for erode method?
Unfortunately, I couldn't find a suitable kernel for erosion. Therefore I set to None.

Improving image pre-processing for tesseract (video game screenshot)

I am trying to read text for prices in a video game and am experiencing difficulty in pre-processing the image.
The rest of my code is "complete", as in after the text is extracted I am formatting it and outputting into CSV for later use.
This is what I have come up with so far for the following images, and would like input on other thresholds or pre-processing tools that will make the OCR more accurate.
Raw Image Screenshot
After gamma, denoise on left - binary threshold on right
The text detected
As you can see, it is very close but not perfect. I would like to make it more accurate as I will be processing many frames eventually.
Here is my current code:
import cv2
import pytesseract
import pandas as pd
import numpy as np
# Tells pytesseract where the tesseract environment is installed on local computer
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img = cv2.imread("./image_frames/frame0.png")
# gamma to darken text to be same opacity?
def adjust_gamma(crop_img, gamma=1.0):
# build a lookup table mapping the pixel values [0, 255] to
# their adjusted gamma values
invGamma = 1.0 / gamma
table = np.array([((i / 255.0) ** invGamma) * 255
for i in np.arange(0, 256)]).astype("uint8")
# apply gamma correction using the lookup table
return cv2.LUT(crop_img, table)
adjusted = adjust_gamma(crop_img, gamma=0.15)
# grayscale the image
gray = cv2.cvtColor(adjusted, cv2.COLOR_BGR2GRAY)
# denoising image
dst = cv2.fastNlMeansDenoising(gray, None, 10, 10, 10)
# binary threshold
thresh = cv2.threshold(gray, 35, 255, cv2.THRESH_BINARY_INV)[1]
# OCR configurations (3 is default)
config = "--psm 3"
# Just show the image
cv2.imshow("before", gray)
cv2.imshow("before", dst)
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
# Reads text from the image and prints to console
text = pytesseract.image_to_string(thresh, config=config)
# remove double lines
text = text.replace('\n\n','\n')
# remove unicode character
text = text.replace('', '')
print(text)
Any help is appreciated as I am very new to this!
Step#1: Scale the image
Step#2: Apply adaptive-threshold
Step#3: Set page-segmentation-mode (psm) to 6 (Assume a single uniform block of text.)
1 Scaling the image:
The reason is to see the image clearly, since the original image is really small.
img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
2 Apply adaptive-threshold
Generally threshold is applied, but in your image, applying threshold has no effect to the result.
For different images you may need to set different C and block values.
For instance for the 1st image:
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 15, 22)
Result:
For instance for the 2nd image:
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 51, 4)
Result:
3 Set psm to 6 which assumes the image as a single uniform block of text.
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
Result for the 1st image:
Dragon Claymore
1,388,888,888 mesos.
Maple Pyrope Spear
288,888,888 mesos.
Element Pierce
488,888,888 mesos.
Purple Adventurer Cape
97,777,777 mesos.
Result for the 2nd image:
Ring of Alchemist
749,999,995 mesos.
Dragon Slash Claw
499,999,995 mesos.
"Stormcaster Gloves
149,999,995 mesos.
Elemental Wand 6
749,999,995 mesos.
Big Money Chalr
1 tor 249,999,985 mesos.|
Code for the 1st image:
import pytesseract
import cv2
img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 15, 22)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
Code for the 2nd image:
import pytesseract
import cv2
img = cv2.imread("7Y2yx.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
cv2.THRESH_BINARY_INV, 51, 4)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
Links
Simple and adaptive-threhsold
Page segmentation Modes
Improving quality of the output

Can't get numbers from image with python, Tesseract and opencv

i have to get numbers from a water-meter image usign python tesseract and opencv.
I have tried to change the --psm but it's doesn't work.
Here the image without modification :
enter image description here
Here the outpout image :
enter image description here
I need your help guys, i'm starting python and i'm already blocked :'(
My code :
from PIL import Image
import pytesseract
import cv2
import numpy as np
import urllib
import requests
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\Hymed\AppData\Local\Tesseract-OCR\tesseract.exe'
col = Image.open("pts.jpg")
gray = col.convert('L')
bw = gray.point(lambda x: 0 if x<128 else 255, '1')
bw.save("cp19.png")
image = cv2.imread('cp19.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = 255 - cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Blur and perform text extraction
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
img1 = np.array(thresh)
data = pytesseract.image_to_string(img1, config='--psm 11 digits')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()
You have nearly finished the task.
I use the divide operation, after the GaussianBlur.
div = cv2.divide(gray, thresh, scale=192)
Result:
When I read from the image:
data = pytesseract.image_to_string(div, config='--psm 11 digits')
print(data)
Result:
00000161
Code: (Just added div = cv2.divide(gray, thresh, scale=192) rest are your code)
from PIL import Image
import pytesseract
import cv2
import numpy as np
col = Image.open("TOaEW.jpg")
gray = col.convert('L')
bw = gray.point(lambda x: 0 if x < 128 else 255, '1')
bw.save("cp19.png")
image = cv2.imread('cp19.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = 255 - cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Blur and perform text extraction
thresh = cv2.GaussianBlur(thresh, (3, 3), 0)
div = cv2.divide(gray, thresh, scale=192) # added
data = pytesseract.image_to_string(div, config='--psm 11 digits')
print(data)
I tried to read the number from an image using Tesseract. Except the numbers shown in the first line, it also returned an unidentified symbol in the second line. I don't understand what I did wrong. Here is the code and the results
code and output
This is the image I extracted the number from:
Image used for number extraction

How to read digits from some images using Python

I have some pics from which I want to read digits. I used pytesseract as well as cv2 threshold.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
crop = ['crop.png','crop1.png','crop2.png','crop3.png']
for c in crop:
image = cv2.imread(c, 0)
#thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
thresh = cv2.threshold(image, 0, 255,cv2.THRESH_OTSU)[1]
#thresh = cv2.GaussianBlur(thresh, (1,3), 0 )
#thresh = cv2.adaptiveThreshold(thresh,125, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 12)
#thresh = cv2.bilateralFilter(thresh, 15, 80, 80, cv2.BORDER_DEFAULT)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
print('\nnext')
cv2.imshow('thresh', thresh)
but not getting good output
please tell me where I am doing wrong.
here are the pics
https://ibb.co/thgXTSn
https://ibb.co/cYGYL2W
https://ibb.co/R2nbt0g
https://ibb.co/ZgPKy2N
You can try to do image processing before recognition. For example, like this:
image = cv2.imread(c, 0)
se=cv2.getStructuringElement(cv2.MORPH_ELLIPSE , (5,5))
close=cv2.morphologyEx(image, cv2.MORPH_CLOSE, se)
close=cv2.absdiff(close, image)
image=cv2.bitwise_not(close)
Also try upsampling image.
Hope this solve your problem.

Pytesseract with custom font incorrectly classifying numbers

I am trying to detect prices using pytesseract.
However I am having very bad results.
I have one large image with several prices in different locations.
These locations are constant so I am cropping the image down and saving each area as a new image and then trying to detect the text.
I know the text will only contain 0123456789$¢.
I trained my new font using trainyourtesseract.com.
For example, I take this image.
Double it's size, and threshold it to get this.
Run it through tesseract and get an output of 8.
Any help would be appreciated.
def getnumber(self, img):
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh, grey = cv2.threshold(grey, 50, 255, cv2.THRESH_BINARY_INV)
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, grey)
text = pytesseract.image_to_string(Image.open(filename), lang='Droid',
config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789.$¢')
os.remove(filename)
return(text)
You're on the right track. When preprocessing the image for OCR, you want to get the text in black with the background in white. The idea is to enlarge the image, Otsu's threshold to get a binary image, then perform OCR. We use --psm 6 to tell Pytesseract to assume a single uniform block of text. Look here for more configuration options. Here's the processed image:
Result from OCR:
2¢
Code
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Resize, grayscale, Otsu's threshold
image = cv2.imread('1.png')
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Perform text extraction
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imwrite('thresh.png', thresh)
cv2.waitKey()
Machine specs:
Windows 10
opencv-python==4.2.0.32
pytesseract==0.2.7
numpy==1.14.5

Categories