I am just starting out with Python and I am trying to write code that does real-time OCR on a portion of my screen. I was certain this code would work, but it just throws a bunch of Tesseract errors. Does the image need to be saved for Tesseract to work? Is there a better OCR library for this task? The OpenCV part works perfectly and displays the image.
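import numpy as np
import cv2
from PIL import ImageGrab
import pytesseract

# box is the screen region to capture as (left, top, right, bottom); it is not defined in this snippet
while True:
    orig_img = ImageGrab.grab(box)
    np_im = np.array(orig_img)
    # ImageGrab returns RGB data, so convert from RGB (not BGR) to grayscale
    img = cv2.cvtColor(np_im, cv2.COLOR_RGB2GRAY)
    text = pytesseract.image_to_string(img)
    cv2.imshow('window', img)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
    print(text)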
I fixed it. I was not aware that I needed to install Tesseract itself on my PC (pytesseract is only a wrapper around it). I also added
from PIL import Image
im = Image.fromarray(img)
im.save("img.png")
to save the image.
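On the question of whether the image has to be saved: as far as I know, pytesseract.image_to_string accepts a NumPy array or a PIL image directly, so writing it to disk should not be required. What it does need is the tesseract executable. Below is a minimal sketch for checking that up front, using only the standard library. find_tesseract and ocr_frame are hypothetical helper names, not part of pytesseract.

```python
import shutil


def find_tesseract(cmd="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(cmd)


def ocr_frame(img):
    """OCR an in-memory image (NumPy array or PIL image) without saving it first."""
    if find_tesseract() is None:
        raise RuntimeError("Tesseract is not installed or not on PATH")
    import pytesseract  # imported lazily so the PATH check works without it
    return pytesseract.image_to_string(img)
```

If Tesseract is installed in a location that is not on PATH, this check will report it as missing; in that case pytesseract.pytesseract.tesseract_cmd can be set to the full path of the executable instead.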
I'm trying to recognize Lego bricks from a video camera using OpenCV. It performs extremely badly compared with just running detect.py in YOLOv5. So I ran some experiments on single images, and I found that going through OpenCV still performs dramatically worse. Is there any clue? Here are the experiments I did.
This is the result from detect.py by just running
python detect.py --weights runs/train/yolo/weights/best.pt --source legos.jpg
This is the result from OpenCV using this code
import torch
import cv2
import numpy as np

model = torch.hub.load('.', 'custom', path='runs/train/yolo/weights/last.pt', source='local')
cap = cv2.VideoCapture('legos.jpg')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        # no more frames to read
        break
    # Make detections
    results = model(frame)
    cv2.imshow('YOLO', np.squeeze(results.render()))
    if cv2.waitKey(0) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
If I simply do this, it gives a pretty good result
import torch
results = model('legos.jpg')
results.show()
Any genius ideas?
Your model was probably trained on RGB images, while OpenCV uses BGR channel order. Try converting the color space accordingly. Example:
import torch
import cv2

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
# read image and convert to RGB
img = cv2.imread('zidane.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# make detections
results = model(img)
# render results (returns a list of annotated images) and convert back to BGR
rendered = results.render()[0]
out = cv2.cvtColor(rendered, cv2.COLOR_RGB2BGR)
cv2.imshow('YOLO', out)
cv2.waitKey(-1)
cv2.destroyAllWindows()
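To see why the channel order matters, here is a small NumPy-only sketch (no model involved). It assumes frames are H x W x 3 uint8 arrays in BGR order, as cv2.imread and cv2.VideoCapture.read return them. bgr_to_rgb is a hypothetical helper name.

```python
import numpy as np


def bgr_to_rgb(frame):
    """Reverse the channel axis of an H x W x 3 array (BGR <-> RGB) and return a contiguous copy."""
    return np.ascontiguousarray(frame[..., ::-1])


# A 1x1 "image" holding one pure-blue pixel in OpenCV's BGR order.
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
blue_rgb = bgr_to_rgb(blue_bgr)
print(blue_rgb[0, 0].tolist())  # [0, 0, 255]
```

A model that expects RGB and is handed the BGR array unchanged would see this blue pixel as red, which can plausibly degrade detections on colorful objects.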
I used ZBar and OpenCV to read the QR code in the image below, but both failed to detect it. For ZBar, I use the pyzbar library as the Python wrapper. The QR code is detected correctly in some images, while very similar images fail. My phone camera can read the QR code in the uploaded image, which means it is a valid one. Below is the code snippet:
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
import cv2
# zbar
results = decode(cv2.imread(image_path), symbols=[ZBarSymbol.QRCODE])
print(results)
# opencv
qr_decoder = cv2.QRCodeDetector()
data, bbox, rectified_image = qr_decoder.detectAndDecode(cv2.imread(image_path))
print(data, bbox)
What type of pre-processing will help to increase the rate of success for detecting QR codes?
zbar, which does some preprocessing, does not detect the QR code, which you can test by running zbarimg image.jpg.
Good binarization is useful here. I got this to work using the kraken.binarization.nlbin() function of the Kraken library. The library is for OCR, but works very well for QR codes, too, by using non-linear processing. The Kraken binarization code is here.
Here is the code for the sample:
from kraken import binarization
from PIL import Image
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
image_path = "image.jpg"
# binarization using kraken
im = Image.open(image_path)
bw_im = binarization.nlbin(im)
# zbar
decode(bw_im, symbols=[ZBarSymbol.QRCODE])
[Decoded(data=b'DE-AAA002065', type='QRCODE', rect=Rect(left=1429, top=361, width=300, height=306), polygon=[Point(x=1429, y=361), Point(x=1429, y=667), Point(x=1729, y=667), Point(x=1723, y=365)])]
The following picture shows the clear image of the QR code after binarization:
I had a similar issue, and Seanpue's answer got me on the right track for this problem. Since I was already using the OpenCV library for image processing rather than PIL, I used it to perform Otsu's Binarization using the directions in an OpenCV tutorial on Image Thresholding. Here's my code:
import cv2
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
image_path = "qr.jpg"
# preprocessing using opencv
im = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(im, (5, 5), 0)
ret, bw_im = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# zbar
decode(bw_im, symbols=[ZBarSymbol.QRCODE])
[Decoded(data=b'DE-AAA002065', type='QRCODE', rect=Rect(left=1429, top=362, width=300, height=305), polygon=[Point(x=1429, y=362), Point(x=1430, y=667), Point(x=1729, y=667), Point(x=1724, y=366)])]
Applying the Gaussian blur is supposed to remove noise from the picture to make the binarization more effective, but for my application it didn't actually make much difference. What was vital was converting the image to grayscale so that the threshold function works (done here by opening the file with the cv2.IMREAD_GRAYSCALE flag).
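For readers wondering what cv2.THRESH_OTSU actually chooses: Otsu's method picks the threshold that maximizes the variance between the dark and the bright pixel classes. The following is a NumPy sketch of that idea for illustration only, assuming an 8-bit grayscale input. It is not OpenCV's implementation, and otsu_threshold is a hypothetical name.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the level t (0..255) maximizing between-class variance; pixels > t form the bright class."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    w0 = np.cumsum(hist)              # number of pixels with value <= t
    w1 = hist.sum() - w0              # number of pixels with value > t
    s0 = np.cumsum(hist * levels)     # intensity sum of the dark class
    s1 = s0[-1] - s0                  # intensity sum of the bright class
    mu0 = np.divide(s0, w0, out=np.zeros(256), where=w0 > 0)
    mu1 = np.divide(s1, w1, out=np.zeros(256), where=w1 > 0)
    # Between-class variance (up to a constant factor); zero where a class is empty.
    between = np.where((w0 > 0) & (w1 > 0), w0 * w1 * (mu0 - mu1) ** 2, 0.0)
    return int(np.argmax(between))
```

Applied to a grayscale array, `gray > otsu_threshold(gray)` gives a boolean mask comparable in spirit to the bw_im image above. This also shows why the input must be single-channel: the histogram is taken over one intensity value per pixel.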
QReader usually works quite well for these cases.
from qreader import QReader
import cv2

if __name__ == '__main__':
    # Initialize QReader
    detector = QReader()
    img = cv2.cvtColor(cv2.imread('92iKG.jpg'), cv2.COLOR_BGR2RGB)
    # Detect and Decode the QR
    print(detector.detect_and_decode(image=img))
Output of this code for this QR code:
DE-AAA002065
I have read mountains of posts on pytesseract, but I cannot get it to read text off a dead simple image; it returns an empty string.
Here is the image:
I have tried scaling it, grayscaling it, adjusting the contrast, thresholding, blurring, everything it says in other posts, but my problem is that I don't know what the OCR needs in order to work better. Does it want blurry text? High contrast?
Code to try:
import pytesseract
from PIL import Image
# IMAGE_FILE is a placeholder for the path to the local image
print(pytesseract.image_to_string(Image.open(IMAGE_FILE)))
As you can see in my code, the image is stored locally on my computer, hence Image.open()
Trying something along the lines of
import pytesseract
from PIL import Image
import requests
import io
response = requests.get('https://i.stack.imgur.com/J2ojU.png')
img = Image.open(io.BytesIO(response.content))
text = pytesseract.image_to_string(img, lang='eng', config='--psm 7')
print(text)
with --psm values equal to or larger than 6 did yield "Gm" for me.
If the image is stored locally (and in your working directory), just drop the response variable and replace the definition of text with the lines
image_name = "J2ojU.png" # or whatever appropriate
text = pytesseract.image_to_string(Image.open(image_name), lang='eng', config='--psm 7')
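Since the result depends on the page segmentation mode, it can help to try several modes on the same image. Here is a sketch, assuming Tesseract's documented meanings for these modes (6: a single uniform block of text, 7: a single text line, 8: a single word, 10: a single character). try_psm_modes and its ocr parameter are hypothetical, added so that the loop can be exercised without Tesseract installed.

```python
def try_psm_modes(image, modes=(6, 7, 8, 10), ocr=None):
    """Run OCR once per page segmentation mode and return {psm: recognized text}."""
    if ocr is None:
        import pytesseract  # only needed when no custom OCR function is supplied
        ocr = pytesseract.image_to_string
    results = {}
    for psm in modes:
        # Same call shape as in the answer above: lang and config passed as keywords.
        results[psm] = ocr(image, lang="eng", config=f"--psm {psm}").strip()
    return results
```

Calling try_psm_modes(img) with the img loaded above returns one string per mode, which makes it easy to see which modes yield an empty result for a given image.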
There are several possible reasons:
1) Edges are not sharp and continuous (by sharp I mean smooth, not jagged)
2) The image is too small; you need to resize it
3) The font is missing (not mandatory, but a font Tesseract has been trained on greatly improves the chance of recognition)
Based on points 1) and 2) I was able to recognize the text.
I resized the image 3x (point 2) and blurred it to make the edges smooth (point 1)
import pytesseract
import cv2
import numpy as np
import urllib.request

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

def url_to_image(url):
    # download the image and decode it into an OpenCV (BGR) array
    resp = urllib.request.urlopen(url)
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    return image
url = 'https://i.stack.imgur.com/J2ojU.png'
img = url_to_image(url)
retval, img = cv2.threshold(img,200,255, cv2.THRESH_BINARY)
img = cv2.resize(img,(0,0),fx=3,fy=3)
img = cv2.GaussianBlur(img,(11,11),0)
img = cv2.medianBlur(img,9)
cv2.imshow('asd',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
txt = pytesseract.image_to_string(img)
print('recognition:', txt)
>> recognition: Gm
Note:
This script is good for testing any image on the web
Note 2:
All processing is based on your posted image
Note 3:
Text recognition is not easy. Every image requires its own processing. If you try these steps with a different image, they may not work at all. It is important to try recognition on a lot of images so you understand what Tesseract wants
I'm trying to read a video file in OpenCV (Python 2.7), and I just copied the example in the OpenCV tutorial, but nothing happens:
import numpy as np
import cv2

cap = cv2.VideoCapture('input.mp4')
while(cap.isOpened()):
    ret, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imshow('frame',gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
The function cap.isOpened() always returns False. I have already tried using an absolute path in the argument of VideoCapture, but I still get the same result. What am I doing wrong?
Maybe your OpenCV installation is not built properly. You can check the build information with print(cv2.getBuildInformation()) and see whether any component looks wrong.
I would suggest rebuilding it, or installing it via Anaconda to make sure no package is missing.
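One thing to look for in that output is the video I/O section: if OpenCV was built without FFMPEG (or another video backend), VideoCapture typically cannot open files such as .mp4. Below is a small sketch that scans the build information text. It assumes the report contains a line of the form "FFMPEG: YES" or "FFMPEG: NO", which is how the builds I have seen print it, and ffmpeg_enabled is a hypothetical helper.

```python
def ffmpeg_enabled(build_info):
    """Return True/False if the build report has an 'FFMPEG:' line, or None if it has none."""
    for line in build_info.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "FFMPEG":
            return value.strip().upper().startswith("YES")
    return None
```

Usage would be ffmpeg_enabled(cv2.getBuildInformation()). A result of False or None is only a hint that the build is the problem, not proof.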
You need to give the location of the video, or move the video to the directory the script is run from.
Use the full path of the video file.
For example:
cap = cv2.VideoCapture("D:\\Video Folder\\input.mp4")
I believe this would solve the issue.
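Before blaming OpenCV, it is worth confirming that the path resolves to a real file, since a relative path is resolved against the current working directory. A standard-library sketch follows; describe_path is a hypothetical helper, and 'input.mp4' is the file name from the question.

```python
import os


def describe_path(path):
    """Report how a (possibly relative) path resolves from the current working directory."""
    abs_path = os.path.abspath(path)
    return {
        "absolute_path": abs_path,
        "exists": os.path.isfile(abs_path),
        "readable": os.access(abs_path, os.R_OK),
    }


print(describe_path("input.mp4"))
```

If "exists" is False, the problem is the path; if it is True and cap.isOpened() still fails, the codec or build is the more likely cause.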
I'm trying to use the Zbar library's QR code detection methods on images I extract with OpenCV's camera methods. Normally the QR code detection methods work with images (jpg, png, etc.) on my computer, but I guess the captured frames of OpenCV are different.
Is there a way of making the captured frame into a PIL Image?
Thank you.
from PIL import Image
import zbar
import cv2.cv as cv
capture = cv.CaptureFromCAM(1)
imgSize = cv.GetSize(cv.QueryFrame(capture))
img = cv.QueryFrame(capture)
#SOMETHING GOES HERE TO TURN FRAME INTO IMAGE
img = img.convert('L')
width, height = img.size
scanner = zbar.ImageScanner()
scanner.parse_config('enable')
zbar_img = zbar.Image(width, height, 'Y800', img.tostring())
# scan the image for barcodes
scanner.scan(zbar_img)
for symbol in zbar_img:
    print symbol.data
With the Python cv2 module, you can also do this:
from PIL import Image
import cv2

cap = cv2.VideoCapture(0) # says we capture an image from a webcam
_, cv2_im = cap.read()
cv2_im = cv2.cvtColor(cv2_im, cv2.COLOR_BGR2RGB)
pil_im = Image.fromarray(cv2_im)
pil_im.show()
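For the zbar step in the question, what is ultimately needed is the frame as raw 8-bit grayscale bytes plus its width and height. Here is a NumPy-only sketch of that conversion, assuming the frame is an H x W x 3 uint8 array in BGR order as cap.read() returns it. frame_to_y800 is a hypothetical name, and the weights are the standard BT.601 luma weights, which I believe match what cv2.cvtColor uses for BGR to gray.

```python
import numpy as np


def frame_to_y800(frame):
    """Convert an H x W x 3 BGR uint8 frame to (width, height, raw 8-bit grayscale bytes)."""
    weights = np.array([0.114, 0.587, 0.299])  # B, G, R luma weights (BT.601)
    gray = np.rint(frame.astype(np.float64) @ weights).astype(np.uint8)
    height, width = gray.shape
    return width, height, gray.tobytes()
```

The three returned values correspond to the arguments of zbar.Image(width, height, 'Y800', raw) in the question, so no PIL round trip is strictly needed.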
I think I may have found the answer. I'll edit later with results.
OpenCV to PIL Image
import Image, cv
cv_im = cv.CreateImage((320,200), cv.IPL_DEPTH_8U, 1)
pi = Image.fromstring("L", cv.GetSize(cv_im), cv_im.tostring())
Source: http://opencv.willowgarage.com/documentation/python/cookbook.html
Are you trying to obtain an RGB image? If that is the case, you need to change your parameters from this:
cv_im = cv.CreateImage((320,200), cv.IPL_DEPTH_8U, 1)
pi = Image.fromstring("L", cv.GetSize(cv_im), cv_im.tostring())
to that:
cv_im = cv.CreateImage((320,200), cv.IPL_DEPTH_8U, 3)
pi = Image.fromstring("RGB", cv.GetSize(cv_im), cv_im.tostring())
It is documented almost nowhere, but the 'L' mode of Image.fromstring is for 8-bit black-and-white (grayscale) images. You also need to change the channel argument of cv.CreateImage from 1 (single-channel image) to 3 (3 channels = RGB).
Hope it works for you.
Cheers
A simple way is to directly swap the channels. Suppose you are trying to convert a 3-channel image file between OpenCV format and PIL format. You can just use:
img[...,[0,2]]=img[...,[2,0]]
This way you don't need cv2.cvtColor, which only works on images of certain depths.
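A quick check of the swap above, using NumPy only. It also shows one case where it is handy: on a 4-channel image (for example BGRA) only the first and third channels are exchanged and alpha is left alone. swap_red_blue is a hypothetical name, and note that the swap modifies the array in place.

```python
import numpy as np


def swap_red_blue(img):
    """Swap channels 0 and 2 in place (BGR <-> RGB, BGRA <-> RGBA) and return the same array."""
    img[..., [0, 2]] = img[..., [2, 0]]
    return img


pixel = np.array([[[10, 20, 30, 40]]], dtype=np.uint8)  # B, G, R, A
print(swap_red_blue(pixel)[0, 0].tolist())  # [30, 20, 10, 40]
```

Because the operation is its own inverse, calling it a second time restores the original order. Pass img.copy() if the original array must stay untouched.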