I am trying to analyze a page footer in a video and retrieve the current page number. I have the frame collection working, but I am struggling to read the page number itself using EasyOCR.
I already tried pytesseract, but it doesn't work well: numbers get misread (10 is recognized as 113, 6 as 41, and so on). Overall it is very inconsistent, even though I preprocess my input image with grayscale conversion, thresholding, and cropping (analyzing only the page-number area of the footer).
Here is the code:
import cv2
import easyocr

# assuming an English reader; adjust the language list as needed
reader = easyocr.Reader(['en'])

def getPageNumberTest(path, psm):  # psm is a leftover from the pytesseract attempt; unused here
    image = cv2.imread(path)
    height, width = image.shape[:2]
    # the height of the footer
    footerHeight = 90  # int(height / 15.5)
    # retrieve only the footer from the image
    cropped = image[height-footerHeight:height, 0:width]
    results = reader.readtext(cropped)
    return results
Which gives me the following output:
Is there a setting I am missing? Is there a way to instruct EasyOCR to look for numbers only?
Any help or hint is appreciated!
EDIT:
After some fiddling around with optimizations of the number images, I am now back to where I started, not optimizing the images at all. All that's left is the conversion to grayscale and a resize.
This is what a normal input looks like:
But the results are:
Which is weird, because for most numbers (especially for single digits) this works flawlessly, yielding certainties of over 95%...
I tried deblurring, thresholding, denoising with cv2.filter2D(), blurring, ...
When I use thresholding, for example, my output looks like this (ignoring the "1"; the same applies for the single digit "1"):
I had a look into pattern matching, which isn't an option because I don't know the page number's shape beforehand...
txt = pytesseract.image_to_string(final_image, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
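EasyOCR has an equivalent restriction: readtext() accepts an allowlist argument that limits recognition to the given characters. A minimal sketch, assuming a recent EasyOCR version:
import easyocr

reader = easyocr.Reader(['en'])
# restrict recognition to digits only
results = reader.readtext(cropped, allowlist='0123456789')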
According to my tests, PaddleOCR works better than EasyOCR in most cases.
I am trying to detect some numbers with Tesseract in Python. Below you will find my starting image and what I can get it down to. Here is the code I used to get it there.
import pytesseract
import cv2
import numpy as np
pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\choll\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"
image = cv2.imread(r'64normalwart.png')
lower = np.array([254, 254, 254])
upper = np.array([255, 255, 255])
# keep only the (near-)white pixels, producing a binary mask of the text
image = cv2.inRange(image, lower, upper)
# invert so the text is black on a white background
image = cv2.bitwise_not(image)
#Uses a language that should work with minecraft text, I have tried with and without, no luck
text = pytesseract.image_to_string(image, lang='mc')
print(text)
cv2.imwrite("Wartthreshnew.jpg", image)
cv2.imshow("Image", image)
cv2.waitKey(0)
I end up with black numbers on a white background, which seems pretty good, but Tesseract still cannot detect the numbers. I also noticed the numbers are pretty jagged, but I don't know how to fix that. Does anyone have recommendations for how I could make Tesseract recognize these numbers?
Starting Image
What I end up with
Your problem is with the page segmentation mode. Tesseract segments every image differently. When you don't choose an appropriate PSM, it defaults to mode 3, which is fully automatic and might not be suitable for your case. I've just tried your image and it works perfectly with PSM 6.
df = pytesseract.image_to_string(np.array(image), lang='eng', config='--psm 6')
These are all PSMs available at the moment:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
Use pytesseract.image_to_string(img, config='--psm 8') or try different configs to see if the image gets recognized. Useful link: Pytesseract OCR multiple config options
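If it is unclear which mode fits, one quick approach is to loop over several PSM values and compare the raw output. A minimal sketch (the image path and the digit whitelist are just assumptions for this scenario):
import cv2
import pytesseract

img = cv2.imread('digits.png')  # hypothetical input image
for psm in (6, 7, 8, 13):
    config = '--psm {} -c tessedit_char_whitelist=0123456789'.format(psm)
    print(psm, repr(pytesseract.image_to_string(img, config=config)))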
I'm trying to read some entries from a table of data containing a name followed by columns of numbers. Here's the original picture:
Between binarizing, converting to black/white, and just inverting, I found that inverting the image led to the best results.
image = PIL.ImageOps.invert(image)
This lets me process roughly 90%+ of the columns I have as I scroll down to more images, but I'm still failing on a bunch of them. Sometimes, the parenthesis in the columns merge the two numbers I have in each column. Is there any way I can fix issues with parenthesis being mixed with numbers, or maybe remove all of the green text?
Resizing the image seemed to be the option that fixed the problems.
import pyautogui

# grab the table region from the screen (left, top, width, height)
image = pyautogui.screenshot(region=(550, 354, 964, 552))
width, height = image.size
# args.resize is presumably an integer scale factor from the script's CLI options
image = image.resize((args.resize * width, args.resize * height))
I resized to at least 3x the original size. I guess that increased the distance between characters, making it simpler to recognize the end of one digit and the parenthesis that followed.
Alternatively, the following is an even larger improvement:
import cv2

image = cv2.imread(output)
image = cv2.bitwise_not(image)  # invert to black text on a white background
image = cv2.resize(image, None, fx=1.5, fy=1.7,
                   interpolation=cv2.INTER_CUBIC)  # non-uniform upscale
cv2.imwrite(output, image)
The scaling is deliberately non-uniform (fx=1.5, fy=1.7); skewing the aspect ratio a bit worked better than a uniform scale.
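For the green text mentioned in the question, masking it out by color before OCR might also help. A sketch, where the HSV bounds are assumptions that would need tuning to the actual shade of green:
import cv2
import numpy as np

image = cv2.imread('table.png')  # hypothetical screenshot of the table
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
# assumed hue range for green; tune to the screenshot's palette
mask = cv2.inRange(hsv, np.array([40, 50, 50]), np.array([80, 255, 255]))
# paint the green pixels white so the OCR engine ignores them
image[mask > 0] = (255, 255, 255)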
I'm working on a project that will recognize teams in a game (Overwatch) and record which players were on which team. It has a predefined list of who is playing; it only needs to recognize which image each player appears on. So far I have had success in capturing the images for each team and getting a rough output of the name for each player; however, the OCR is confusing several letters.
My input images:
And the output I get from OCR:
W THEMIGHTVMRT
ERSVZENVRTTR
ERSVLUCID
ERSVZRRVR
ERSVMEI
EFISVSDMBRR
ERSV RNR
ERSVZENVRTTR
EFISVZHRVR
ERSVMCCREE
ERSVMEI
EHSVRDRDHDG
From this, you can see that the OCR confuses "A" with "R" and "Y" with "V". I was able to get the font file that Overwatch uses and generate a .traineddata file using Train Your Tesseract - I'm aware that there is probably a better way of generating this file, though I'm not sure how.
My code:
import pytesseract
import pyscreenshot

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'

team1 = pyscreenshot.grab(bbox=(50, 450, 530, 810))  # X1, Y1, X2, Y2
team1.save("team1screenshot.png")
team1text = pytesseract.image_to_string(team1, config=tessdata_dir_config, lang='owf')

team2 = pyscreenshot.grab(bbox=(800, 450, 1280, 810))  # X1, Y1, X2, Y2
team2.save("team2screenshot.png")
team2text = pytesseract.image_to_string(team2, config=tessdata_dir_config, lang='owf')
print(team1text)
print("------------------")
print(team2text)
How should I improve the recognition of these characters? Do I need a better .traineddata file, or is it a matter of better image processing?
Thanks for any help!
As @FlorianBrucker mentioned, running a similarity test on the strings makes it possible (with some fine-tuning) to recover the correct string after the OCR step.
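A minimal sketch of that idea with Python's difflib, assuming a known roster of player names (the roster below is a placeholder):
import difflib

roster = ['ZENYATTA', 'LUCIO', 'ZARYA', 'MCCREE', 'MEI', 'SOMBRA', 'ANA']

def correct(ocr_name):
    # snap the raw OCR output to the closest known name, if any is close enough
    matches = difflib.get_close_matches(ocr_name, roster, n=1, cutoff=0.5)
    return matches[0] if matches else ocr_name

print(correct('ZENVRTTR'))  # -> 'ZENYATTA' with this roster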
You could try custom OCR configs to do a sparse text search, "Find as much text as possible in no particular order."
Set psm to 11 in the Tesseract config and see if that helps:
tessdata_dir_config = "--oem 3 --psm 11"
To see a complete list of supported page segmentation modes (psm), use tesseract -h. Here's the list as of 3.21:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
I'm using the Python wrapper for Tesseract: https://github.com/madmaze/pytesseract
With it, you can configure Tesseract like this:
custom_oem_psm_config = r'--oem 3 --psm 6'
pytesseract.image_to_string(image, config=custom_oem_psm_config)
I am trying to get the Python 2.7 PIL library to work with JPEG images that are only available as a stream coming from an HDD image and are not complete.
I have set the option:
ImageFile.LOAD_TRUNCATED_IMAGES = True
And I load the stream as far as it is available (or better said: as far as I am 100% sure that the data still belongs to the image, not to some other file type). I have tested different things and, as far as I can tell (for JPEGs), PIL only accepts the data as a valid JPEG image if it finds the 0xFFDA (Start of Scan) marker. This is a short example of how I load the data:
from PIL import Image, ImageFile
from StringIO import StringIO  # Python 2.7

ImageFile.LOAD_TRUNCATED_IMAGES = True

with open("/path/to/image.raw", 'rb') as fp:
    fp.seek("""jump to position in image where JPEG starts""")
    data = fp.read("""number of bytes I know that those belong to that jpeg""")

img = Image.open(StringIO(data))  # this throws an exception if the data does
                                  # not contain the 0xFFDA marker
pixel = img.load()                # throws an exception if LOAD_TRUNCATED_IMAGES = False
width, height = img.size          # PIL's size is (width, height)
for i in range(width):
    for j in range(height):
        print pixel[i, j]
On the very last line I expected (or hoped) to see at least the pixel data that was read. But it returns (0, 0, 0) for every pixel.
The Question: Is what I am trying here not possible with PIL?
Some weeks ago I tried the same thing with an image file I truncated myself, simply by cutting data from it with an editor. It worked for the pixel data that was available. As soon as it reached a pixel that I had cut off, the program threw an exception (I will try this again later today to make sure that I am not remembering it wrong).
If somebody is wondering why I am doing this: I need to make sure that the image/picture inside that hdd image is in consecutive blocks/clusters and is not fragmented. To make sure of this I wanted to use pixel matching.
EDIT:
I have tried it again and this is what I have seen.
I opened a truncated image in GIMP and it showed me a few pixel lines in the upper part, but PIL was not able to at least give me the RGB values of those pixels. It always returns (0,0,0).
I made the image slightly bigger such that the lower 4/5 of the image was not visible, but that was enough for PIL to show me the RGB values that were available. Everything else was (0,0,0).
I am still not 100% sure whether PIL can show me the RGB values when only partial pixel data is available.
I would try it with an uncompressed format like TGA. Since JPEG is a compressed format, it may not make sense to extract pixels from an incomplete image: JPEG stores the parameters for equations that describe the image (quantized DCT coefficients), not pixel values. When you query a JPEG for a pixel value, the decoder evaluates those equations at that point and returns the result.
I had the same problem with Pillow==9.2.0. Downgrading to Pillow==8.3.2 made it work.
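For example:
pip install "Pillow==8.3.2"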
I don't really know about streaming, but I think you simply cannot access the RGB values the way you do.
Try:
rgb_im = img.convert('RGB')
r, g, b = rgb_im.getpixel((i, j))
I'm trying to convert images taken from a capture (webcam) and do some processing on them with OpenCV, but I'm having a difficult time.
When trying to convert the image to grayscale, the program crashes. (Python.exe has stopped working)
Here is the main snippet of my code:
newFrameImageGS = cv.CreateImage((320, 240), cv.IPL_DEPTH_8U, 1)
for i in range(0, 5):
    newFrameImage = cv.QueryFrame(ps3eye)
    cv.CvtColor(newFrameImage, newFrameImageGS, cv.CV_BGR2GRAY)
    golfSwing.append(newFrameImageGS)
When I try using cvConvertScale I get the assertion error:
src.size() == dst.size() && src.channels() == dst.channels()
which makes sense, but I'm pretty confused about how to convert the input images from my webcam into images that can be used by functions like cvUpdateMotionHistory() and cvCalcOpticalFlowLK().
Any ideas? Thanks.
UPDATE:
I converted the image to grayscale manually with this:
for row in range(0, newFrameImage.height):
    for col in range(0, newFrameImage.width):
        newFrameImageGS[row, col] = (newFrameImage8U[row, col][0] * 0.114 +  # B
                                     newFrameImage8U[row, col][1] * 0.587 +  # G
                                     newFrameImage8U[row, col][2] * 0.299)   # R
But this takes quite a while, and I still can't figure out why cv.CvtColor causes the program to crash.
For some reason, CvtColor caused the program to crash when the image depths were 8-bit. When I converted them to 32-bit, the program no longer crashed and everything seemed to work. I have no idea why, but at least it works now.
newFrameImage = cv.QueryFrame(ps3eye)
# convert the 8-bit capture to a 32-bit float image
newFrameImage32F = cv.CreateImage((320, 240), cv.IPL_DEPTH_32F, 3)
cv.ConvertScale(newFrameImage, newFrameImage32F)
# grayscale conversion on the 32-bit image
newFrameImageGS_32F = cv.CreateImage((320, 240), cv.IPL_DEPTH_32F, 1)
cv.CvtColor(newFrameImage32F, newFrameImageGS_32F, cv.CV_RGB2GRAY)
# back to 8-bit grayscale
newFrameImageGS = cv.CreateImage((320, 240), cv.IPL_DEPTH_8U, 1)
cv.ConvertScale(newFrameImageGS_32F, newFrameImageGS)
There is a common mistake here:
You're creating a single image in the newFrameImageGS variable before the loop and then overwriting its contents inside the loop, appending it to a list each time. The result will not be what you expect: the list will contain five references to the same image instance, because only the object reference is appended; no copy of the object is made this way. That image will contain the very last frame, so you end up with five of that frame, which is presumably not what you want. (Please review the Python tutorial if this is unclear.) You can fix it by moving the cv.CreateImage call into the body of the for loop, as in the sketch below.
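A minimal sketch of that fix, assuming the same capture setup as in the question:
golfSwing = []
for i in range(0, 5):
    newFrameImage = cv.QueryFrame(ps3eye)
    # create a fresh destination image on every iteration,
    # so each list entry is a distinct object
    newFrameImageGS = cv.CreateImage((320, 240), cv.IPL_DEPTH_8U, 1)
    cv.CvtColor(newFrameImage, newFrameImageGS, cv.CV_BGR2GRAY)
    golfSwing.append(newFrameImageGS)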
Other possibilities, if fixing the above does not help:
The CvtColor function seems to be the correct one for conversion to grayscale, since it can convert to a different number of channels.
According to this manual, the CvtColor function requires a destination image of the same data type as the source. Please double-check that newFrameImage is an IPL_DEPTH_8U image.
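A quick way to check, assuming the legacy cv bindings expose the IplImage header fields:
print newFrameImage.depth, newFrameImage.nChannels  # expect cv.IPL_DEPTH_8U (8) and 3 channels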