I have read mountains of posts on pytesseract, but I cannot get it to read text off a dead simple image; it returns an empty string.
Here is the image:
I have tried scaling it, grayscaling it, adjusting the contrast, thresholding, and blurring (everything suggested in other posts), but my problem is that I don't know what the OCR needs in order to work better. Does it want blurry text? High contrast?
Code to try:
import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open("IMAGE_FILE")))
As you can see in my code, the image is stored locally on my computer, hence Image.open()
Trying something along the lines of
import pytesseract
from PIL import Image
import requests
import io
response = requests.get('https://i.stack.imgur.com/J2ojU.png')
img = Image.open(io.BytesIO(response.content))
text = pytesseract.image_to_string(img, lang='eng', config='--psm 7')
print(text)
with --psm values equal to or larger than 6 did yield "Gm" for me.
If the image is stored locally (and in your working directory), just drop the response variable and change the definition of text to the lines
image_name = "J2ojU.png" # or whatever appropriate
text = pytesseract.image_to_string(Image.open(image_name), lang='eng', config='--psm 7')
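If you are not sure which page segmentation mode fits a given image, one quick way to check (a small added sketch, reusing the same example file name) is to loop over a few --psm values and compare what each one returns:
import pytesseract
from PIL import Image

image_name = "J2ojU.png"  # or whatever appropriate
img = Image.open(image_name)

# Try a handful of page segmentation modes and print what each one returns
for psm in (6, 7, 8, 13):
    text = pytesseract.image_to_string(img, lang='eng', config='--psm {}'.format(psm))
    print('--psm {}: {!r}'.format(psm, text.strip()))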
There are several reasons:
Edges are not sharp and continuous (by sharp I mean smooth, without jagged teeth)
The image is too small; you need to resize it
The font is missing (not mandatory, but a trained font greatly improves the chance of recognition)
Based on points 1) and 2) I was able to recognize the text.
1) I resized the image 3x and 2) I blurred the image to make the edges smooth.
import pytesseract
import cv2
import numpy as np
import urllib.request

# Point pytesseract at the local Tesseract install (Windows path on my machine)
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

def url_to_image(url):
    # Download the image and decode it into an OpenCV (numpy) array
    resp = urllib.request.urlopen(url)
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    return image

url = 'https://i.stack.imgur.com/J2ojU.png'
img = url_to_image(url)

# 1) threshold, 2) resize 3x, 3) smooth the edges with blur
retval, img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
img = cv2.resize(img, (0, 0), fx=3, fy=3)
img = cv2.GaussianBlur(img, (11, 11), 0)
img = cv2.medianBlur(img, 9)

# Show the preprocessed image for inspection
cv2.imshow('preprocessed', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

txt = pytesseract.image_to_string(img)
print('recognition:', txt)
>> recognition: Gm
Note:
This script is good for testing any image on the web.
Note 2:
All processing is based on your posted image
Note 3:
Text recognition is not easy. Every recognition task requires special processing. If you try these steps with a different image, it may not work at all. The important thing is to try recognition on a lot of images, so you understand what Tesseract wants.
Related
I want to read the text from an image.
I use pytesseract in Python.
Here is my code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = Image.open(r'a.jpg')
image.resize((150, 50),Image.ANTIALIAS).save("pic.jpg")
image = Image.open("pic.jpg")
captcha = pytesseract.image_to_string(image).replace(" ", "").replace("-", "").replace("$", "")
However, it returns empty string.
What should be the correct way?
Thanks.
I agree with @Jon Betts: Tesseract is not very strong at OCR; it is only good in binary cases with the right settings, and CAPTCHAs are meant to fool OCRs!
But if you really need to do it, you need to come up with a manual procedure for it.
I created the code below specifically for the type of CAPTCHA you gave (but it is completely rigid and not generalized/optimized for all cases).
Pseudocode
apply median blur
apply a threshold to get Blue colors only (binary image output from this stage)
apply opening to reduce small white pixels in binary image
give the image to tesseract with options:
limited whitelist of output chars
OEM 3 : tesseract + cube
PSM 8 : one word per image
Code
from PIL import Image
import pytesseract
import numpy as np
import cv2
img = cv2.imread('a.jpg')
img = cv2.medianBlur(img, 3)
# extract blue parts
img2 = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
cond = np.bitwise_and(img[:, :, 0] >= 100, img[:, :, 2] < 100)
img2[np.where(cond)] = 255
img = img2
# delete the noise
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
str1 = pytesseract.image_to_string(Image.fromarray(img),
                                   config='-c tessedit_char_whitelist=abcedfghijklmnopqrtuvwxyz0123456789 -oem 3 -psm 8')
cv2.imwrite("frame.png", img)
print(str1)
Output
f2e4
In order to see the full options of Tesseract, run the command tesseract --help-extra or refer to this_link.
Tesseract is intended for performing OCR on text documents. In my experience it's good but a bit patchy even with very clean data.
In this case it appears you are trying to solve a CAPTCHA which is specifically designed to defeat OCR software. It's very likely you cannot use Tesseract to solve this issue, because:
It's not really designed for that
The scenario is adversarial:
The example is specifically designed to prevent what you are trying to do
If you could get it to work, the other party would likely change it so it breaks again
If you want to proceed I would suggest:
Working on cleaning up the image before attempting to process it (can you get a nice, readable black-and-white image? A rough cleanup sketch follows below)
Train your own recognition network using a lot of examples
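For the first suggestion, here is a minimal cleanup sketch with OpenCV (grayscale, denoise, upscale, Otsu threshold); the file name and parameter values are only placeholders and will need tuning for the actual image:
import cv2

# Rough cleanup pass; 'a.jpg' and the parameters below are placeholders to tune
img = cv2.imread('a.jpg', cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 3)                                   # suppress speckle noise
img = cv2.resize(img, None, fx=3, fy=3,
                 interpolation=cv2.INTER_CUBIC)                # upscale so strokes get wider
_, img = cv2.threshold(img, 0, 255,
                       cv2.THRESH_BINARY | cv2.THRESH_OTSU)    # force black and white
cv2.imwrite('cleaned.png', img)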
I want to extract text from the image:
I have tried using the below code to extract the text:
from PIL import Image
import pytesseract
img = "Offers.png"
tex = pytesseract.image_to_string(Image.open(img))
string = pytesseract.image_to_string(Image.open(img), config='--psm 6')
I could not extract the text: the tex variable returns an empty string, whereas the string variable returns only a line of text.
What can I do to extract the complete text from the pamphlet image?
EDIT 1:
Since the previously provided image was low quality, I'm now providing some random images from google images with comparatively better quality.
Now when I try to implement the same code above to extract the text, I'm unable to extract the complete text.
EDIT 2:
img = cv2.imread('sale-banner-template-design_74379-121.jpg', 0)
# Otsu binarization, then upscale the binarized image before OCR
thresh, im_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
up_image = cv2.resize(im_bw, None, fx=2, fy=3, interpolation=cv2.INTER_LINEAR)
t = pytesseract.image_to_string(up_image)
Removing colour and unnecessary input, and upscaling the image, helps Tesseract a significant amount. You can do all of this with PIL and its various modules.
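As a rough sketch of that PIL-only route (the file name is taken from the edit above; the scale factors and the --psm value are just guesses to tune):
from PIL import Image, ImageOps
import pytesseract

img = Image.open('sale-banner-template-design_74379-121.jpg')
img = ImageOps.grayscale(img)                       # drop colour
img = ImageOps.autocontrast(img)                    # stretch the contrast
img = img.resize((img.width * 2, img.height * 3))   # upscale, mirroring fx=2, fy=3 above
print(pytesseract.image_to_string(img, config='--psm 6'))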
I am using pytesseract, Pillow, and cv2 to OCR an image and get the text present in it. Since my input is a scanned PDF document, I first converted it into an image (JPEG) and then tried extracting the text. I am only halfway there: the input is a table, and the titles are not being picked up, since the titles have a black background. I also tried getStructuringElement but was unable to figure out a way. Here is what I did:
import cv2
import os
import numpy as np
import pytesseract
import PIL.Image
from pdf2image import convert_from_path

# Since a scanned PDF can't be OCR'd directly, convert it into JPEG pages
# with pdf2image first
filename = path
pages = convert_from_path(filename, 500)
for page in pages:
    page.save("dest", 'JPEG')

imgname = "path"
oriimg = cv2.imread(imgname, cv2.IMREAD_COLOR)
cv2.imshow("original image", oriimg)
cv2.waitKey(0)

#img = cv2.resize(oriimg, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_CUBIC)
img = cv2.resize(oriimg, (700, 1500), interpolation=cv2.INTER_AREA)  # (width, height)
cv2.imshow("resized", img)
cv2.waitKey(0)
cv2.imwrite("changed_dimensionsimgpath", img)

image = cv2.imread(imgname, cv2.IMREAD_COLOR)
grayedimg = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
grayedimg = cv2.threshold(grayedimg, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imwrite("H://newim.jpg", grayedimg)

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
text = pytesseract.image_to_string(PIL.Image.open("path"))
print(text)
My input table looks like the one below. The regions with a black background are not being identified by the OCR and are not being extracted as text.
I see 3 possible approaches from an image-analysis perspective:
Splitting
You can split the processing into two parts. The first part is just your normal flow (load the image, detect text on it). In the second flow you first take the negative of the image (255 - img) and then detect text.
The two results will need to be merged afterwards.
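A minimal sketch of the splitting idea (the file name is a placeholder; merging/deduplicating the two results is left out):
import cv2
import pytesseract

img = cv2.imread('table_page.jpg', cv2.IMREAD_GRAYSCALE)

# First pass: normal flow, picks up dark text on a light background
text_normal = pytesseract.image_to_string(img)

# Second pass: negative of the image, picks up light text on a dark background
text_negative = pytesseract.image_to_string(255 - img)

print(text_normal)
print(text_negative)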
Difference filter
You can first apply a difference filter/edge detection; this will highlight everything with high contrast, BUT it can alter the shape of the letters if done to an extreme or if some letters are much bigger than others.
Contour finding + filling
Again an edge detection, but now very thin, followed by contour detection. This will redraw all letters in one color.
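And a rough sketch of the contour idea (OpenCV 4 findContours signature; the file name and Canny thresholds are placeholders to tune):
import cv2
import numpy as np

img = cv2.imread('table_page.jpg', cv2.IMREAD_GRAYSCALE)

edges = cv2.Canny(img, 50, 150)                      # thin edge detection
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Redraw every contour filled in a single colour on a blank canvas, so
# dark-on-light and light-on-dark letters end up looking the same
canvas = np.zeros_like(img)
cv2.drawContours(canvas, contours, -1, 255, thickness=cv2.FILLED)
cv2.imwrite('redrawn.png', canvas)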
I've been using OpenCV methods to get images from my camera. I'd like to decode QR codes from those images using the zbar library, but after I convert the images to PIL to be processed by zbar, it doesn't seem like the decoding is working.
import cv2.cv as cv
import zbar
from PIL import Image

cv.NamedWindow("camera", 1)
capture = cv.CaptureFromCAM(0)

while True:
    img = cv.QueryFrame(capture)
    cv.ShowImage("camera", img)
    if cv.WaitKey(10) == 27:
        break

# create a reader
scanner = zbar.ImageScanner()
# configure the reader
scanner.parse_config('enable')

# obtain image data
pil = Image.fromstring("L", cv.GetSize(img), img.tostring())
width, height = pil.size
raw = pil.tostring()

# wrap image data
image = zbar.Image(width, height, 'Y800', raw)
# scan the image for barcodes
scanner.scan(image)

# extract results
for symbol in image:
    # do something useful with results
    print 'decoded', symbol.type, 'symbol', '"%s"' % symbol.data

cv.DestroyAllWindows()
You don't need to convert the OpenCV image to a PIL image in order to use it with zbar... you can go straight from an OpenCV image to zbar and avoid using PIL completely.
Now I don't know how you do this when the image source is from a camera, but if you load an image from disk, all you have to do is this:
# assumes scanner = zbar.ImageScanner() has already been created and
# configured with scanner.parse_config('enable'), as in your code
cv_img = cv.LoadImageM(image, cv.CV_LOAD_IMAGE_GRAYSCALE)
width = cv_img.width
height = cv_img.height
raw = cv_img.tostring()

# wrap image data
image = zbar.Image(width, height, 'Y800', raw)
# scan the image for barcodes
scanner.scan(image)

# extract results
for symbol in image:
    # do something useful with results
    print 'decoded', symbol.type, 'symbol', '"%s"' % symbol.data
It appears you basically have to do the following for this to work:
Use LoadImageM instead of LoadImage
Make sure the image is grayscale so use cv.CV_LOAD_IMAGE_GRAYSCALE
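For the camera case mentioned above, here is an untested sketch of the same idea: convert each captured frame to a single-channel grayscale image with the legacy cv API and hand its raw bytes straight to zbar, with no PIL involved. This is only an assumption about how the camera frame can be fed in, not something I have verified:
import cv2.cv as cv
import zbar

scanner = zbar.ImageScanner()
scanner.parse_config('enable')

capture = cv.CaptureFromCAM(0)
frame = cv.QueryFrame(capture)

# zbar wants single-channel 'Y800' data, so convert the colour frame to grayscale
gray = cv.CreateImage(cv.GetSize(frame), 8, 1)
cv.CvtColor(frame, gray, cv.CV_BGR2GRAY)

width, height = cv.GetSize(gray)
image = zbar.Image(width, height, 'Y800', gray.tostring())
scanner.scan(image)

for symbol in image:
    print 'decoded', symbol.type, 'symbol', '"%s"' % symbol.data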
If you are working in python I suggest you take a look at SimpleCV. You can either crib our implementation of bar code reading or use the library yourself. Here is the source for pulling barcodes out with zbar.
When loading a png image with PIL and OpenCV, there is a color shift. Black and white remain the same, but brown gets changed to blue.
I can't post the image because this site does not allow newbies to post images.
The code is written as below, rather than using cv.LoadImageM, because in the real case the raw image is received over TCP.
Here is the code:
#! /usr/bin/env python
import sys
import cv
import cv2
import numpy as np
import Image
from cStringIO import StringIO
if __name__ == "__main__":
    # load raw image from file
    f = open('frame_in.png', "rb")
    rawImage = f.read()
    f.close()

    # convert to mat
    pilImage = Image.open(StringIO(rawImage))
    npImage = np.array(pilImage)
    cvImage = cv.fromarray(npImage)

    # show it
    cv.NamedWindow('display')
    cv.MoveWindow('display', 10, 10)
    cv.ShowImage('display', cvImage)
    cv.WaitKey(0)

    cv.SaveImage('frame_out.png', cvImage)
How can the color shift be fixed?
OpenCV's images have their color channels arranged in BGR order, whereas PIL's are RGB. You will need to switch the channels like so:
import PIL.Image
import cv2
...
image = np.array(pilImage)                      # Convert PIL Image to numpy/OpenCV image representation
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
# Use cv2.COLOR_RGBA2BGRA if you are sure you have an alpha channel.
# You will only have an alpha channel if your image format supports transparency.
...
@Krish: Thanks for pointing out the bug. I didn't have time to test the code the last time.
Hope this helps.
Change
pilImage = Image.open(StringIO(rawImage))
to
pilImage = Image.open(StringIO(rawImage)).convert("RGB")
Light alchemist's answer did not work, but it did explain the issue. Wouldn't the reverse be screwed up by the alpha channel, i.e. it changes BGRA to ARGB? I would think Froyo's answer would solve it, but it did not change the displayed image at all. What did work was reversing the colors in OpenCV. I'm too much of a newbie to know why; they seem equivalent to me. Reversing the colors in numpy would be preferred, as additional processing is planned in numpy. But thanks for the help, the answers steered me in the right direction.
pilImage = Image.open(StringIO(rawImage))
bgrImage = np.array(pilImage)
cvBgrImage = cv.fromarray(bgrImage)

# Reverse BGR
cvRgbImage = cv.CreateImage(cv.GetSize(cvBgrImage), 8, 3)
cv.CvtColor(cvBgrImage, cvRgbImage, cv.CV_BGR2RGB)

# show it
cv.ShowImage('display', cvRgbImage)
cv.WaitKey(30)  # ms to allow display
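For reference, a numpy-only channel swap is also possible; this is a sketch reusing bgrImage from the snippet above, with np.ascontiguousarray added because cv.fromarray expects contiguous data:
# Reverse the last axis in numpy: BGR -> RGB (or vice versa)
rgbImage = np.ascontiguousarray(bgrImage[:, :, ::-1])
cvRgbImage = cv.fromarray(rgbImage)
cv.ShowImage('display', cvRgbImage)
cv.WaitKey(30)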