I am trying to improve an image to make the text more readable for OCR, but the problem is that some images have missing pixels and OCR doesn't recognize them.
Here is my code:
import cv2 as cv
import pytesseract
import numpy as np
img = cv.imread("image1.jpeg")
# cv.imread returns BGR, so convert BGR (not RGB) to grayscale
img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# Adaptive threshold to cope with uneven illumination
threshold = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 57, 13)
# Invert so the text is white on black
x = 255 - threshold
kernel = np.ones((3, 3), np.uint8)
# Morphological closing to fill in the missing pixels
closing = cv.morphologyEx(x, cv.MORPH_CLOSE, kernel)
captcha = pytesseract.image_to_string(closing, config="--psm 3")
print(captcha)
cv.imshow('close', closing)
cv.imshow('thresh', threshold)
cv.waitKey(0)
cv.destroyAllWindows()
This is the original image:
This is the threshold image:
And this is the result after the closing morphology:
For some reason the OCR returns the string le eth g.
Any idea how I can improve my code?
Sometimes preprocessing is not needed for the input images. When I tried the input image you gave, I used this code:
import cv2 as cv
import pytesseract
img = cv.imread("/home/yns/Downloads/t.jpg")
captcha = pytesseract.image_to_string(img, config="--psm 6")
print(captcha)
and the result comes out as:
TTCo7
which is almost correct. Keep in mind that Tesseract is more accurate on aligned text, so even though you may get successful results on some CAPTCHA images, it will not work reliably in general.
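Since alignment matters, one option is to deskew the input before OCR. A minimal sketch (the filename is an assumption, and the angle normalization may need adjusting for your OpenCV version):
import cv2 as cv
import numpy as np
img = cv.imread("t.jpg", cv.IMREAD_GRAYSCALE)  # hypothetical filename
thresh = cv.threshold(img, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)[1]
# Fit a minimum-area rectangle around the ink pixels to estimate the skew
coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
angle = cv.minAreaRect(coords)[-1]
# Recent OpenCV versions return an angle in (0, 90]; fold it into a small rotation
if angle > 45:
    angle -= 90
# Rotate about the center to straighten the text before running Tesseract
h, w = img.shape
M = cv.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv.warpAffine(img, M, (w, h), flags=cv.INTER_CUBIC, borderMode=cv.BORDER_REPLICATE)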
For reference, here is the output of tesseract --version:
tesseract 4.1.3 leptonica-1.78.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 liblz4/1.7.1
I am trying to read this image and do the arithmetic operations in it. For some reason I am not able to read the 7 because of the font it has. I am relatively new to image processing. Can you please help me with a solution? I tried pixelating the image, but that did not help.
import cv2
import pytesseract
img = cv2.imread('modules/visual_basic_math/temp2.png', cv2.IMREAD_GRAYSCALE)
# Otsu threshold, inverted so the digits become white on black
thresh = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Pass the thresholded image (not the raw one) to Tesseract
print(pytesseract.image_to_string(thresh, config='--psm 6'))
The response I am getting is:
+44 849559
+46653% 14
+7776197
+6415995
+*9156346
x4463310
+54Q%433
+1664 20%
Right now, Tesseract is a bit outdated; there are much more powerful libraries. I recommend PaddleOCR. To install it:
pip install paddlepaddle
pip install paddleocr
Then:
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='es')
predictions = ocr.ocr("ietDJ.png")[0]

filtered_text = []
for pred in predictions:
    filtered_text.append(pred[-1][0])
filtered_text = [t.replace(" ", "") for t in filtered_text]  # Remove spaces
['+4487559', '+4665714', '+7776157', ':6415995', ':9156346', 'x4463310', '-54q7433', '+1664207']
The output is not completely correct: the division symbols come out as : and one of them is wrong, and it confuses a 9 with a q. However, the results are better, and the library is as comfortable to use as Tesseract.
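If the symbol set is known in advance, these systematic confusions can be patched with a simple post-processing pass. A minimal sketch, assuming the expressions contain only digits and the four operators:
# Hypothetical cleanup: ':' was read for the division sign and 'q' for '9'
corrections = str.maketrans({":": "÷", "q": "9"})
cleaned = [t.translate(corrections) for t in filtered_text]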
Hope it helps!
I'm a programmer, but I have no prior experience with Python or any of its libraries, or even OCR/ALPR as a whole. I have a script that I made (basically by copying and pasting other scripts from all over the web) that I intend to use to recognize license plates. But the reality is that my code is very bad right now. It can recognize text in images pretty well, but it performs poorly on license plates; very rarely can I capture a plate with it.
So I would like some help on how should I change my code to make it better.
In my code, I simply choose an image, convert it to grayscale and then to binary, and try to read it.
I ONLY NEED the string (image_to_string); I do NOT NEED THE IMAGE.
Also is important to note that, as I said, I have no expertise with this code or the functions I'm using.
My code:
from PIL import Image
import pytesseract
import numpy as np
import cv2
image = cv2.imread('redcar.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Estimate the background with a morphological dilation, then divide it out
# to flatten uneven illumination
se = cv2.getStructuringElement(cv2.MORPH_RECT, (8, 8))
bg = cv2.morphologyEx(image, cv2.MORPH_DILATE, se)
out_gray = cv2.divide(image, bg, scale=255)
out_binary = cv2.threshold(out_gray, 0, 255, cv2.THRESH_OTSU)[1]

# cv2.imshow('binary', out_binary)
cv2.imwrite('binary.png', out_binary)
# cv2.imshow('gray', out_gray)
cv2.imwrite('gray.png', out_gray)

filename = 'gray.png'
img1 = np.array(Image.open(filename))
text = pytesseract.image_to_string(img1, config='--psm 6')
print(text)
The image I'm using:
I hope easyocr will be helpful in such cases. You can install it with pip install easyocr; it works with opencv-python 4.5.4.60.
import easyocr

IMAGE_PATH = 'AQFCB.jpg'
reader = easyocr.Reader(['en'])
result = reader.readtext(IMAGE_PATH)
for detection in result:
    # Keep only detections with confidence above 0.5
    if detection[2] > 0.5:
        print(detection[1])
The output is:
HR.26 BR.9044
My code is:
import cv2
import numpy
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"  # For Windows OS

def scan(image):
    try:
        # The argument may already be a PIL image or an array
        img = cv2.cvtColor(numpy.array(image), cv2.COLOR_RGB2BGR)
    except Exception:
        # Otherwise treat it as a file path
        img = cv2.imread(image)
    # Apply OCR, restricted to digits
    data = pytesseract.image_to_string(
        img, config="-c tessedit_char_whitelist=1234567890 --psm 6")
    return data
And when I make it scan this image, it just gives me ''. Nothing. I don't know what's wrong; it works on every other digit image. What should I change? If you have some Python OCR that works on this image, you can also send it.
Using Tesseract or any OCR can get really tricky. The pictures that worked perfectly for you might have better quality, or be closer to the training data of the Tesseract version you are using on your computer.
Some basic steps you can do to improve this are:
Add a new trained-data file whose font is similar to the font you are trying to detect
Do some preprocessing on the image: sharpen it, change the resolution and color, basically the whole routine until you find the perfect mix (see the sketch after this list)
Try a different OCR
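For the preprocessing item, a minimal sketch of such a routine (the filename, scale factor, and kernel are assumptions to experiment with):
import cv2
import numpy as np
import pytesseract
img = cv2.imread("digits.png", cv2.IMREAD_GRAYSCALE)  # hypothetical filename
# Change the resolution: upscaling often helps Tesseract with small glyphs
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
# Sharpen with a basic kernel
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
img = cv2.filter2D(img, -1, sharpen)
# Binarize with Otsu so the glyph edges are crisp
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
print(pytesseract.image_to_string(img, config="--psm 6"))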
Let me know if this works!
Read the documentation, understand what you are doing, and you will get the correct result. Hint: pretending that a single character is a uniform block of text is not wise.
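The hint refers to Tesseract's page segmentation modes: --psm 6 assumes a uniform block of text, while --psm 10 treats the image as a single character. A sketch of that change, assuming the single-digit image is saved as niner.png:
import cv2
import pytesseract
img = cv2.imread('niner.png')
# --psm 10: treat the image as a single character instead of a text block
data = pytesseract.image_to_string(
    img, config="-c tessedit_char_whitelist=1234567890 --psm 10")
print(data)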
Your picture works for me. My guess is that you didn't successfully read the image. You can debug with print(img.shape) or if img is None: print('None'). Python might be operating in a different directory; os.getcwd() gets Python's current working directory. You can also use os.path.isfile(image) to see whether Python can find the file where you are looking.
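A sketch of those checks together (image is assumed to hold the path you passed in):
import os
import cv2
image = 'niner.png'  # hypothetical path
print(os.getcwd())            # Python's current working directory
print(os.path.isfile(image))  # can Python find the file where you are looking?
img = cv2.imread(image)
if img is None:
    print('None: cv2.imread could not read the file')
else:
    print(img.shape)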
This is what I tried:
import cv2
import numpy
import pytesseract

# ~ pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe" # For Windows OS

img = cv2.imread('niner.png')
# Apply OCR, restricted to digits
data = pytesseract.image_to_string(
    img, config="-c tessedit_char_whitelist=1234567890 --psm 6")
print('tesseract version: ', pytesseract.get_tesseract_version())
print('=============================================')
print(data)
and the result is:
tesseract version: 4.0.0.20181030
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0
=============================================
9
♀
This should recognize the cars in the image and draw boxes around them, but I don't know why it's not working. Could someone give me some hints?
Code and image with no detections:
import os
import cv2
import matplotlib.pyplot as plt
import cvlib as cv
from cvlib.object_detection import draw_bbox
im =cv2.imread("C:\\Users\\gmobi\\PycharmProjects\\ComputerVisionStudent\\imagens\\carros.jpeg")
bbox, label, conf = cv.detect_common_objects(im)
output_image = draw_bbox(im, bbox, label, conf)
plt.imshow(output_image)
plt.show()
print('Number of cars in the image is ' + str(label.count('car')))
Installed packages:
Keras: 2.2.5
cvlib: 0.2.2
opencv-python: 4.1.1.26
tensorflow: 1.14.0
matplotlib: 3.1.1
source code from git:
https://github.com/sabiipoks/blog-posts/blob/master/Count_Number_of_Cars_in_Less_Than_10_Lines_of_Code_Using_Python.ipynb
My settings:
The network that cvlib uses for object detection is YOLOv3. This network expects an image in the RGB color space, not the BGR color space that OpenCV's cv2.imread returns.
So I think you should convert the image's color space from BGR to RGB:
im =cv2.imread("C:\\Users\\gmobi\\PycharmProjects\\ComputerVisionStudent\\imagens\\carros.jpeg")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
Another reason could be the size of the objects in your image. The input size to YOLO is around 416x416 pixels, so your image will be resized before it is fed to the network, and after the resize the objects may look too small. Try cropping the image and then feeding it in:
im = im[:416, :416, :]
I used ZBar and OpenCV to read the QR code in the image below, but both failed to detect it. For ZBar, I use the pyzbar library as the Python wrapper. There are images where the QR code is detected correctly, and images really similar to the successful ones where detection fails. My phone camera can read the QR code in the uploaded image, which means it is a valid one. Below is the code snippet:
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
import cv2
# zbar
results = decode(cv2.imread(image_path), symbols=[ZBarSymbol.QRCODE])
print(results)
# opencv
qr_decoder = cv2.QRCodeDetector()
data, bbox, rectified_image = qr_decoder.detectAndDecode(cv2.imread(image_path))
print(data, bbox)
What type of pre-processing will help to increase the rate of success for detecting QR codes?
zbar, which does some preprocessing, does not detect the QR code here, which you can test by running zbarimg image.jpg.
Good binarization is useful here. I got this to work using the kraken.binarization.nlbin() function of the Kraken library. The library is for OCR, but it works very well for QR codes too, thanks to its non-linear processing. The Kraken binarization code is here.
Here is the code for the sample:
from kraken import binarization
from PIL import Image
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
image_path = "image.jpg"
# binarization using kraken
im = Image.open(image_path)
bw_im = binarization.nlbin(im)
# zbar
print(decode(bw_im, symbols=[ZBarSymbol.QRCODE]))
[Decoded(data=b'DE-AAA002065', type='QRCODE', rect=Rect(left=1429, top=361, width=300, height=306), polygon=[Point(x=1429, y=361), Point(x=1429, y=667), Point(x=1729, y=667), Point(x=1723, y=365)])]
The following picture shows the clear image of the QR code after binarization:
I had a similar issue, and Seanpue's answer got me on the right track for this problem. Since I was already using the OpenCV library for image processing rather than PIL, I used it to perform Otsu's binarization, following the directions in an OpenCV tutorial on image thresholding. Here's my code:
import cv2
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
image_path = "qr.jpg"
# preprocessing using opencv
im = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(im, (5, 5), 0)
ret, bw_im = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# zbar
print(decode(bw_im, symbols=[ZBarSymbol.QRCODE]))
[Decoded(data=b'DE-AAA002065', type='QRCODE', rect=Rect(left=1429, top=362, width=300, height=305), polygon=[Point(x=1429, y=362), Point(x=1430, y=667), Point(x=1729, y=667), Point(x=1724, y=366)])]
Applying the Gaussian blur is supposed to remove noise from the picture to make the binarization more effective, but for my application it didn't actually make much difference. What was vital was converting the image to grayscale so the threshold function works (done here by opening the file with the cv2.IMREAD_GRAYSCALE flag).
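If the image were already loaded in color, a sketch of the equivalent conversion (using the same image_path as above) would be:
im = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)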
QReader usually works quite well for these cases.
from qreader import QReader
import cv2

if __name__ == '__main__':
    # Initialize QReader
    detector = QReader()
    img = cv2.cvtColor(cv2.imread('92iKG.jpg'), cv2.COLOR_BGR2RGB)
    # Detect and decode the QR
    print(detector.detect_and_decode(image=img))
The output of this code for this QR is:
DE-AAA002065