Tesseract image_to_string brings out the wrong numbers - python

Hi, I'm trying to create a program that takes a screenshot of another app, reads the numbers from that image, and outputs them to a txt file and another window. My problem is that pytesseract.image_to_string returns a value different from the original.
example:
Here is the screenshot that my program took, original image; the number is 8258.
Then I applied a grayscale conversion to help Tesseract out, image grayscale.
After that I applied a blur, because apparently this helps Tesseract out, image blur.
And finally I applied a threshold, because it helped to get the numbers correctly, image thresh.
After all of that is done I call pytesseract to make a string from the image:
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
But my results are this for the txt, tesseract txt (it shows twice because I put 2 prints),
and for my window, image window, the value comes out as 68258.
I really don't know why this is happening and I'm all out of ideas to fix it.
I'm using Tesseract v5.0.0.
This is the code I'm using:
from ctypes import resize
import pygetwindow
import pyautogui
import cv2
import pytesseract
from tkinter import *
from tkinter import font
from pygetwindow import PyGetWindowException
import numpy as np
import io
from PIL import Image  # needed for Image.open below

path = r"C:\Image.png"

# this function takes the screenshot from the window of the other app
def getting_image():
    titles = pygetwindow.getAllTitles()
    window = pygetwindow.getWindowsWithTitle('Connect')[0]
    x1 = window.left
    y1 = window.top
    height = window.height
    width = window.width
    x2 = x1 + width
    y2 = y1 + height
    pyautogui.screenshot(path)
    im = Image.open(path)
    im = im.crop((x1 + 640, y1 + 125, x2 - 130, y2 - 670))
    im.save(path, dpi=(600, 600))

# this function takes the saved image and converts it to text
def send_image():
    image = cv2.imread(path)  # path is currently my C: drive, C:\Image.png
    cv2.imshow('image_org', image)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cv2.imshow('image_bckgray', gray)
    blur = cv2.medianBlur(gray, 5)
    cv2.imshow('image_blur', blur)
    thresh = 255 - cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    cv2.imshow('image_thresh', thresh)
    data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
    print(data)
    data = wording(data)  # this just takes out the \x0c at the end of the string
    print(data)
    return data
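The wording helper isn't shown in the question; a minimal sketch of what such a helper might look like, assuming it only strips the trailing form feed that Tesseract appends:

def wording(data):
    # hypothetical helper: drop the trailing \x0c page separator and surrounding whitespace
    return data.replace('\x0c', '').strip()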

Related

Python Tesseract is not detecting my number

I wanted to take a screenshot of my Valorant game and output the remaining time shown in the image.
It all works fine, but it's not detecting the number in the image.
import time
import cv2
import pyautogui
import pytesseract

time.sleep(2)
myScreenshot = pyautogui.screenshot()
myScreenshot.save(r'Path\screenshot.png')
raw = cv2.imread("Path/screenshot.png")
y = 10
x = 1160
h = 100
w = 200
cropped = raw[y:y+h, x:x+w]
cv2.imwrite("Path/Time.png", cropped)
Time = cv2.imread("Path/Time.png")
string = pytesseract.image_to_string(Time, config='--psm 13')
print(string)
Example image of "Time.png"
I tried different psm settings; they didn't help.
You need to preprocess the input image before OCR (e.g. remove background/noise). Something like this should work for the image you provided:
import numpy as np
import pyautogui
import pytesseract
from PIL import Image
y = 10
x = 1160
h = 100
w = 200
cropped = pyautogui.screenshot(region=(x, y, w, h))  # region is (left, top, width, height)
data = np.array(cropped)
color = (255, 255, 255)
mask_cv = np.any(data == [255,255,255], axis = -1)
ocr_area = Image.fromarray(np.invert(mask_cv))
string = pytesseract.image_to_string(ocr_area)
print(string)

How to read numbers on screen efficiently (pytesseract)?

I'm trying to read numbers on the screen, and for that I'm using pytesseract. The thing is, even though it works, it works slowly and doesn't give good results at all. For example, with this image:
I can make this thresholded image:
and it reads 5852 instead of 585, which is understandable, but sometimes it can be way worse with different thresholding. It can read 1 000 000 as 1 aaa eee, for example, or 585 as 5385r (yes, it even adds characters for no reason).
Isn't there any way to force pytesseract to read only numbers, or simply something that works better than pytesseract?
my code:
from PIL import Image
from pytesseract import pytesseract as pyt
import test

pyt.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'

def tti2(location):
    image_file = location
    im = Image.open(image_file)
    text = pyt.image_to_string(im)
    print(text)
    for character in "abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ*^&\n":
        text = text.replace(character, "")
    return text

test.th("C:\\Users\\Utilisateur\\Pictures\\greenshot\\flea market sniper\\TEST.png")
print(tti2("C:\\Users\\Utilisateur\\Pictures\\greenshot\\flea market sniper\\TESTbis.png"))
Code of "test" (it's for the thresholding):
import cv2
from PIL import Image

def th(Path):
    img = cv2.imread(Path)
    # If your image is not already grayscale:
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    threshold = 60  # to be determined
    _, img_binarized = cv2.threshold(img, threshold, 255, cv2.THRESH_BINARY)
    pil_img = Image.fromarray(img_binarized)
    Path = Path.replace(".png", "")
    pil_img.save(Path + "bis.png")
A way to force pytesseract to read only numbers is to use the tessedit_char_whitelist config with only digit values.
You can try to improve results using the Tesseract documentation:
Tesseract - Improving the quality of the output
Also, I suggest you:
Use white for the background and black for the character font color.
Select the desired Tesseract psm mode. In this case I used psm mode 7 to treat the image as a single text line.
Use the tessedit_char_whitelist config to specify only the characters you are searching for.
With that in mind, here is the code:
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
originalImage = cv2.imread('1.png')
grayImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
(_, blackAndWhiteImage) = cv2.threshold(grayImage, 127, 255, cv2.THRESH_BINARY_INV)
text = pytesseract.image_to_string(blackAndWhiteImage, config="--psm 7 --oem 3 -c tessedit_char_whitelist=0123456789")
print('Text: ', text)
cv2.imshow('Image result', blackAndWhiteImage)
cv2.waitKey(0)
cv2.destroyAllWindows()
And the desired result:

I can't read long distance text with pytesseract

I have this image and I want to read the text on it, but pytesseract returns blank.
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import cv2
import numpy as np
import math
from scipy import ndimage
import easyocr
import pytesseract

img = cv2.imread('cikti.jpg')

scale_percent = 220  # percent of original size
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)
# resize image
img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
cv2.imshow('img', img)
cv2.waitKey(0)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
cv2.imshow('edges', edges)
cv2.waitKey(0)

# estimate the skew angle from the Hough lines and deskew the image
angles = []
lines = cv2.HoughLinesP(edges, 1, math.pi / 180.0, 90)
for [[x1, y1, x2, y2]] in lines:
    #cv2.line(img, (x1, y1), (x2, y2), (255, 0, 0), 3)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle != 0:
        angles.append(angle)
print(angles)
median_angle = np.median(angles)
img = ndimage.rotate(img, median_angle)
print(median_angle)

# sharpening kernel (defined but never applied)
filiter = np.array([[-1, -1, -1],
                    [-1, 9, -1],
                    [-1, -1, -1]])
cv2.imshow('filitird', img)
cv2.waitKey(0)

reader = easyocr.Reader(['tr'])
ocr_result = reader.readtext(img)
print(ocr_result)
cv2.imshow('result', img)
k = cv2.waitKey(0)
cv2.destroyAllWindows()
Here is the code I wrote.
It may be because of the long distance, but enlarging the picture did not solve my problem.
What should I do?
I was able to successfully read this image with tesseract by doing the following:
cropping out the pink border
reducing to grayscale (binarising)
running tesseract with --psm 8 (see this question)
I don't know if the cropping is necessary, but I couldn't get any output at all with any page segmentation mode before binarising.
I did the processing manually here, but you will likely want to automate it. A good trick for setting thresholds is to look at the standard deviation of the image in question and use that to scale your thresholds, rather than picking some absolute value and having it fail on you.
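As a rough sketch of that idea (not the exact processing used here), with OpenCV you could scale the threshold from the image's own statistics:

import cv2

gray = cv2.imread('cikti.jpg', cv2.IMREAD_GRAYSCALE)

# derive the threshold from the image statistics instead of a fixed value;
# the 0.5 factor is only an illustrative starting point to tune
thresh_value = gray.mean() - 0.5 * gray.std()
_, binarised = cv2.threshold(gray, thresh_value, 255, cv2.THRESH_BINARY)
cv2.imwrite('binarised.png', binarised)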
Here's the image I got working:
And the run:
$ tesseract img3.png img3 --psm 8 txt
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
$ cat img3.txt
47 F02 43
I've not tried this with pytesseract, but you should be able to set the same options.
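The pytesseract equivalent of that run would presumably look something like this (untested sketch, using the cropped and binarised img3.png from above):

import pytesseract
from PIL import Image

# --psm 8 treats the image as a single word, matching the CLI run above
text = pytesseract.image_to_string(Image.open('img3.png'), config='--psm 8')
print(text)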
EasyOCR was able to read the image immediately, albeit inaccurately, when I tried it with the web service.
Update: grayscaling
This is a whole subject in itself. You might want to start with this tutorial from the OpenCV docs. There are basically two approaches: trying to properly binarise the image (convert it to two-colour pixels, on or off) and just grayscaling it. Somewhere in between is 'posterising', where you reduce the number of tones (binarising is a special case of posterising where the number of tones is 2).
I normally handle grayscaling with the built-in function in PIL (Pillow); I've had good results with a quick-and-dirty sort-of binarisation algorithm where I first normalise the brightness and contrast of an image and then apply a skewing function like:
THRESH = 128  # illustrative placeholder; in practice this is a magic number tuned per image

def filter_point(point: int) -> int:
    if point < THRESH:
        return round(point / 1.2)
    else:
        return round(point * 2)
This drives most pixels to fully white/black but leaves some intermediate values in place. It's a poor solution in that it depends on three magic numbers, but in my application (preparing scanned PDFs for human reading) I got better results than with automated thresholding or posterisation.
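Applied with PIL it might look like this ('scan.png' is a placeholder filename, and ImageOps.autocontrast stands in for the brightness/contrast normalisation step, which isn't spelled out above):

from PIL import Image, ImageOps

im = Image.open('scan.png').convert('L')                    # grayscale
im = ImageOps.autocontrast(im)                              # rough brightness/contrast normalisation
im = im.point(lambda p: max(0, min(255, filter_point(p))))  # skew towards black/white, clamped to 0-255
im.save('scan_binarised.png')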
Thus, sadly, the answer is going to be 'play with it'. I'd suggest you start out with an image editor and see what the bare minimum you can do to the image to get tesseract to work is: perhaps just grayscaling (which you do earlier in the code anyway) will be enough; do you need to crop it, etc. Not drawing that pink box is going to help. I provided a very crude example filter to demonstrate that pixels are just numbers and you can do your image processing that way, but you are much better off using built-in methods if you possibly can.
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import cv2 as cv
import numpy as np
import easyocr
img = cv.imread('result.png',0)
th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C, cv.THRESH_BINARY, 29, 10)
cv.imshow("ADAPTIVE_THRESH_MEAN_C", th2)
cv.waitKey(0)
cv.destroyAllWindows()
reader = easyocr.Reader(['tr'])
ocr_result = reader.readtext(th2,)
print(ocr_result)
It worked like this.
Image before OCR:
Result:

Pytesseract Wrong Number

I have a problem with the recognition: some of my input images that are visibly a 1 turn into a 4 after the .image_to_string() command.
My input image is this:
unedited img
I then run some preprocessing steps over it (grayscale, thresholding with Otsu, and enlarging the picture), leading to this:
preprocessed img
I also tried dilating the picture, with no improvement in the output.
After running:
custom_config = "-c tessedit_char_whitelist=0123456789LV --psm 13"
pytesseract.image_to_string(processed_img, config=custom_config)
The final result is a string displaying 4LV♀, and I don't understand what I can change to get a 1 instead of the 4.
Thanks in advance for your time.
The ♀ or \n\x0c appears because you need custom_config = "-c page_separator=''" in your config; for some reason Tesseract adds that as the page separator. You don't need anything else in your config.
Getting the right number is down to the processing, mainly the size. However, this code I found works best:
import pytesseract
from PIL import Image
import cv2
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
import numpy as np
imagepath = "./Pytesseract Wrong Number/kD3Zy.jpg"
read_img = Image.open(imagepath)
# convert PIL image to cv2 image locally
read_img = read_img.convert('RGB')
level_img = np.array(read_img)
level_img = level_img[:, :, ::-1].copy()
# convert to grayscale
level_img = cv2.cvtColor(level_img, cv2.COLOR_RGB2GRAY)
level_img, img_bin = cv2.threshold(level_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
level_img = cv2.bitwise_not(img_bin)
kernel = np.ones((2, 1), np.uint8)
# make the image bigger, because it needs at least 30 pixels for the height for the characters
level_img = cv2.resize(level_img,(0,0),fx=4,fy=4, interpolation=cv2.INTER_CUBIC)
level_img = cv2.dilate(level_img, kernel, iterations=1)
# --debug--
#cv2.imshow("Debug", level_img)
#cv2.waitKey()
#cv2.destroyAllWindows
#cv2.imwrite("1.png", level_img)
custom_config = "-c page_separator=''"
level = pytesseract.image_to_string(level_img, config=custom_config)
print(level)
If you want to save it, uncomment #cv2.imwrite("1.png", level_img).
Try the settings "--psm 8 --oem 3". The full list of psm and oem options is in the Tesseract documentation, though psm 8 and oem 3 generally work fine.
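In pytesseract those settings would presumably be passed like this (a sketch that combines them with the whitelist from the question; processed_img is the preprocessed image from the question's code):

import pytesseract

custom_config = "--psm 8 --oem 3 -c tessedit_char_whitelist=0123456789LV"
print(pytesseract.image_to_string(processed_img, config=custom_config))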

Pytesseract questions

I'm trying to read numbers from a screenshot I'm taking of a game, but I'm having trouble getting the numbers right.
from pyautogui import *
import pyautogui as pg
import time
import keyboard
import random
import win32api, win32con
import threading
import cv2
import numpy
from pynput.mouse import Button, Controller
from pynput.keyboard import Listener, KeyCode
from PIL import Image
from pytesseract import *
pytesseract.tesseract_cmd = r'D:\Python\Tesseract\tesseract.exe'
#configs
custom_config = r'--dpi 300 --psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
# 1. load the image as grayscale
img = cv2.imread("price.png",cv2.IMREAD_GRAYSCALE)
# Change all pixels to black, if they aren't white already (since all characters were white)
img[img <= 150] = 231
img[img == 199] = 0
cv2.imwrite('resultfirst.png', img)
# 2. Scale it 10x
scaled = cv2.resize(img, (0,0), fx=10, fy=10, interpolation = cv2.INTER_CUBIC)
# 3. Retained your bilateral filter
filtered = cv2.bilateralFilter(scaled, 11, 17, 17)
# 4. Thresholded OTSU method
thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
time.sleep(1)
# 5. Erode the image to bulk it up for tesseract
kernel = numpy.ones((5,5),numpy.uint8)
eroded = cv2.erode(thresh, kernel, iterations = 2)
pre_processed = eroded
output = pytesseract.image_to_string(pre_processed, config=custom_config)
cv2.imwrite('result.png', pre_processed)
print(output)
The image is pretty clear, but it returns either 13500 or 18500; no amount of tinkering returns the 7 correctly. Is there a better way to go at it, or am I forgetting something?
EDIT:
I managed to get better results after I converted the yellow (gray after the grayscale conversion) to black, to fill the numbers. I added the conversion code to the code block above.
Before:
This was the original result before
After:
This is the result now
The problem is that pytesseract still returns that 7 as a 1 every time. I don't think I can make that 7 look any more like a 7 than this... what should I do?
Not sure how general this solution will be, but if all of your pictures are like this one, a threshold of 103 will work:
image = cv2.imread('price.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
threshold = 103
_, img_binarized = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
print(pytesseract.image_to_string(img_binarized, config='--dpi 300 --psm 6 --oem 1 -c tessedit_char_whitelist=0123456789').strip())
gives 78500 on my machine.
