Why doesn't reading text from an image using pytesseract work? - python

Here is my code:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'F:\Installations\tesseract'
print(pytesseract.image_to_string('images/meme1.png', lang='eng'))
And here is the image:
And the output is as follows:
GP.
ed <a
= va
ay Roce Thee .
‘ , Pe ship
RCAC Tm alesy-3
Pein Reg a
years —
? >
ee bs
I see the word "years" in the output, so it does recognize some text, but why doesn't it recognize it fully?

OCR is still a very hard problem in cluttered scenes. You probably won't get better results without preprocessing the image. In this specific case it makes sense to threshold the image first, to extract only the white regions (i.e. the text). You can look into OpenCV for this: https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html
Additionally, in your image, there are only two lines of text in arbitrary positions, so it might make sense to play around with page segmentation modes: https://github.com/tesseract-ocr/tesseract/issues/434

Related

Pytesseract can not recognize even very simple textline

Binary image B2
Binary image Y2
I think these images are quite simple and clear. Still pytesseract does not work. I really wonder why.
Here is my code
from pytesseract import pytesseract as tesseract
import cv2 as cv
binary = cv.imread(filepath)
lang = 'eng'
config = 'tessedit_char_whitelist=RGB123'
print(tesseract.image_to_string(binary, lang=lang, config=config))
The output is just blank string.
To Dennlinger's point, I would definitely rotate it before sending it through PyTess. PyTess should rotate it automatically though. Should.
Alternatively, I see in your configuration that you have whitelisted "RGB123", which, correct me if I'm wrong, may mean that pytesseract is only looking for those specific characters and digits.
I'd try omitting that whitelist from the configuration so that it can pick up the "Y" in there.

Tesseract OCR having trouble detecting numbers

I am trying to detect some numbers with tesseract in python. Below you will find my starting image and what I can get it down to. Here is the code I used to get it there.
import pytesseract
import cv2
import numpy as np
pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\choll\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"
image = cv2.imread(r'64normalwart.png')
lower = np.array([254, 254, 254])
upper = np.array([255, 255, 255])
image = cv2.inRange(image, lower, upper)
image = cv2.bitwise_not(image)
# Use a language trained for Minecraft-style text; I have tried with and without it, no luck
text = pytesseract.image_to_string(image, lang='mc')
print(text)
cv2.imwrite("Wartthreshnew.jpg", image)
cv2.imshow("Image", image)
cv2.waitKey(0)
I end up with black numbers on a white background, which seems pretty good, but tesseract still cannot detect the numbers. I also noticed the numbers are pretty jagged, but I don't know how to fix that. Does anyone have recommendations for how I could make tesseract recognize these numbers?
Starting Image
What I end up with
Your problem is with the page segmentation mode. Tesseract segments every image in a different way. When you don't choose an appropriate PSM, it goes for mode 3, which is automatic and might not be suitable for your case. I've just tried your image and it works perfectly with PSM 6.
text = pytesseract.image_to_string(np.array(image), lang='eng', config='--psm 6')
These are all the PSMs available at the moment:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
Use pytesseract.image_to_string(img, config='--psm 8') or try different configs to see if the image gets recognized. Useful link: Pytesseract OCR multiple config options

EasyOCR not recognizing simple numbers

I am trying to analyze a page footer in a video and retrieve the current page number. I got the frame collection working, but I am struggling to read the page number itself using EasyOCR.
I already tried using pytesseract, but that doesn't work well. I get misinterpreted numbers: 10 is recognized as 113, 6 as 41, and so on. Overall it's very inconsistent, even though I format my input image correctly with grayscale conversion, thresholding, and cropping (only analyzing the page-number area of the footer).
Here is the code:
def getPageNumberTest(path, psm):
    image = cv2.imread(path)
    height = len(image)
    width = len(image[0])
    # the height of the footer
    footerHeight = 90  # int(height / 15.5)
    # retrieve only the footer from the image
    cropped = image[height-footerHeight:height, 0:width]
    results = reader.readtext(cropped)
Which gives me the following output:
Is there a setting I am missing? Is there a way to instruct EasyOCR to look for numbers only?
Any help or hint is appreciated!
EDIT:
After some fiddling around with optimizations of the number images, I am now back at the beginning, not optimizing the images at all. All that's left is the conversion to grayscale and a resize.
This is what a normal input looks like:
But the results are:
Which is weird, because for most numbers (especially for single digits) this works flawlessly, yielding certainties of over 95%...
I tried deblurring, thresholding, denoising with cv2.filter2D(), blurring, ...
When I use thresholding, for example, my output looks like this (ignoring the "1"; the same applies for the single digit "1"):
I had a look into pattern matching, which isn't an option because I don't know the page number's shape beforehand...
txt = pytesseract.image_to_string(final_image, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
According to my tests, PaddleOCR works better than EasyOCR in most scenes.

Python - Improving Tesseract OCR to recognize list of names

I'm working on a project that will recognize teams in a game (Overwatch) and record which players were on which team. It has a predefined list of who is playing; it only needs to recognize which image they are located on. So far I have had success in capturing the images for each team and getting a rough output of the name for each player; however, it is getting several letters confused.
My input images:
And the output I get from OCR:
W THEMIGHTVMRT
ERSVZENVRTTR
ERSVLUCID
ERSVZRRVR
ERSVMEI
EFISVSDMBRR
ERSV RNR
ERSVZENVRTTR
EFISVZHRVR
ERSVMCCREE
ERSVMEI
EHSVRDRDHDG
From this, you can see that the OCR confuses "A" with "R" and "Y" with "V". I was able to get the font file that Overwatch uses and generate a .traineddata file using Train Your Tesseract - I'm aware that there is probably a better way of generating this file, though I'm not sure how.
My code:
from pytesseract import *
import pyscreenshot
pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
team1 = pyscreenshot.grab(bbox=(50,450,530,810)) # X1, Y1, X2, Y2
team1.save("team1screenshot.png")
team1text = pytesseract.image_to_string(team1, config=tessdata_dir_config, lang='owf')
team2 = pyscreenshot.grab(bbox=(800,450,1280,810)) # X1, Y1, X2, Y2
team2.save("team2screenshot.png")
team2text = pytesseract.image_to_string(team2, config=tessdata_dir_config, lang='owf')
print(team1text)
print("------------------")
print(team2text)
How should I improve the recognition of these characters? Do I need a better .traineddata file, or is it regarding better image processing?
Thanks for any help!
As @FlorianBrucker mentioned, doing a similarity test on the strings allows (with some fine-tuning) finding the correct string after the OCR step.
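With Python's standard library this can be a few lines of difflib; the candidate names below are inferred from the OCR output above, not the asker's actual roster:

```python
import difflib

# Hypothetical roster inferred from the OCR output in the question.
roster = ['ZENYATTA', 'LUCIO', 'ZARYA', 'MEI', 'SOMBRA', 'ANA', 'MCCREE', 'ROADHOG']

def best_match(ocr_line, candidates, cutoff=0.5):
    # get_close_matches ranks candidates by SequenceMatcher similarity,
    # so A/R and Y/V confusions still land on the right name.
    matches = difflib.get_close_matches(ocr_line.upper(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(best_match('ZENVRTTR', roster))  # → 'ZENYATTA'
print(best_match('LUCID', roster))     # → 'LUCIO'
```

Raising the cutoff makes the matcher stricter; lines below it return None instead of a wrong guess.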
You could try custom OCR configs to do a sparse text search, "Find as much text as possible in no particular order."
Set psm to 11 in the tesseract config.
See if you can do this:
tessdata_dir_config = "--oem 3 --psm 11"
To see a complete list of supported page segmentation modes (psm), use tesseract -h. Here's the list as of 3.21:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
I'm using python wrapper for Tesseract https://github.com/madmaze/pytesseract
Here you can configure tesseract as:
custom_oem_psm_config = r'--oem 3 --psm 6'
pytesseract.image_to_string(image, config=custom_oem_psm_config)

Image processing 1 bit images

I am trying to understand the basics of image processing for an image of a corridor. I have used PIL to find the edges in an image, then converted it to a 1-bit image. I know what I want to be able to extract: the longest horizontal and diagonal lines that can be found in the image. Any ideas?
from PIL import Image, ImageFilter

im = Image.open(r"c:\Python26\Lib\site-packages\PIL\corridor.jpg")
im = im.filter(ImageFilter.FIND_EDGES).convert("1")  # edge-detect, then 1-bit
imageInfo = list(im.getdata())
print im.size
for i in imageInfo[180:220]:
    if i == 0:
        print "This is a BLACK pixel"
    elif i == 255:
        print "This is a WHITE pixel"
    else:
        print "ERROR"
First, don't call them 1-bit images - that normally refers to images (like icons) where each pixel is 1 bit, so 8 pixels can be packed into a single byte.
Images with only two levels are normally called 'binary' in image processing.
Now you only have to learn the science of image processing!
A good place to start is OpenCV, a free image processing library that also works with Python and interfaces reasonably well with PIL.
You should also read their book - or one of the other good books on image processing.
