Image Preprocessing for OCR - Tessaract - python

Obviously this image is pretty tough as it is low clarity and is not a real word. However, with this code, I'm detecting nothing close:
import pytesseract
from PIL import Image, ImageEnhance, ImageFilter
image_name = 'NedNoodleArms.jpg'
im = Image.open(image_name)
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
im = im.convert('1')
im.save(image_name)
text = pytesseract.image_to_string(Image.open(image_name))
print(text)
outputs
, Mdfiaodfiamms
Any ideas here? The image my contrasting function produces is:
Which looks decent? I don't have a ton of OCR experience. What preprocessing would you recommend here? I've tried resizing the image larger, which helps a little bit but not enough, along with a bunch of different filters from PIL. Nothing getting particularly close though

You are right, tesseract works better with higher resolutions so sometimes resizing the image helps - but don't convert to 1 bit.
I got good results converting to grayscale, making it 3 times as large and making the letters a bit brighter:
>>> im = Image.open('j78TY.png')\
.convert('L').resize([3 * _ for _ in im.size], Image.BICUBIC)\
.point(lambda p: p > 75 and p + 100)
>>> pytesseract.image_to_string(im)
'NedNoodleArms'
Check this jupyter notebook:

Related

Using **wand** to reduce image filesize for improved OCR performance?

I'm trying to write use the wand simple MagickWand API binding for Python to extract pages from a PDF, stitch them together into a single longer ("taller") image, and pass that image to Google Cloud Vision for OCR Text Detection. I keep running up against Google Cloud Vision's 10MB filesize limit.
I thought a good way to get the filesize down might be to eliminate all color channels and just feed Google a B&W image. I figured out how to get grayscale, but how can I make my color image into a B&W ("bilevel") one? I'm also open to other suggestions for getting the filesize down. Thanks in advance!
from wand.image import Image
selected_pages = [0,1]
imageFromPdf = Image(filename=pdf_filepath+str(selected_pages), resolution=600)
pages = len(imageFromPdf.sequence)
image = Image(
width=imageFromPdf.width,
height=imageFromPdf.height * pages
)
for i in range(pages):
image.composite(
imageFromPdf.sequence[i],
top=imageFromPdf.height * i,
left=0
)
image.colorspace = 'gray'
image.alpha_channel = False
image.format = 'png'
image
The following are several methods of getting a bilevel output from Python Wand (0.5.7). The last needs IM 7 to work. One note in my testing is that in IM 7, the first two results are swapped in terms of dithering or not dithering. But I have reported this to the Python Wand developer.
Input:
from wand.image import Image
from wand.display import display
# Using Wand 0.5.7, all images are not dithered in IM 6 and all images are dithered in IM 7
with Image(filename='lena.jpg') as img:
with img.clone() as img_copy1:
img_copy1.quantize(number_colors=2, colorspace_type='gray', treedepth=0, dither=False, measure_error=False)
img_copy1.auto_level()
img_copy1.save(filename='lena_monochrome_no_dither.jpg')
display(img_copy1)
with img.clone() as img_copy2:
img_copy2.quantize(number_colors=2, colorspace_type='gray', treedepth=0, dither=True, measure_error=False)
img_copy2.auto_level()
img_copy2.save(filename='lena_monochrome_dither.jpg')
display(img_copy2)
with img.clone() as img_copy3:
img_copy3.threshold(threshold=0.5)
img_copy3.save(filename='lena_threshold.jpg')
display(img_copy3)
# only works in IM 7
with img.clone() as img_copy4:
img_copy4.auto_threshold(method='otsu')
img_copy4.save(filename='lena_threshold_otsu.jpg')
display(img_copy4)
First output using IM 6:
Second output using IM 7:

How to get text from image

I want to read the text from an image.
I use pytesseract in Python.
Here is my code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = Image.open(r'a.jpg')
image.resize((150, 50),Image.ANTIALIAS).save("pic.jpg")
image = Image.open("pic.jpg")
captcha = pytesseract.image_to_string(image).replace(" ", "").replace("-", "").replace("$", "")
image
However, it returns empty string.
What should be the correct way?
Thanks.
i agree with #Jon Betts
tesseract is not very strong in OCR, only good in binary cases with right settings
CAPTCHAs ment to fool OCRs!
but if you really need to do it, you need to come up with the manual procedure for it,
i created the code below specifically for the type of CAPTCHAs that you gave (but its completely rigid and is not generalized/optimized for all cases)
psudo code
apply median blur
apply a threshold to get Blue colors only (binary image output from this stage)
apply opening to reduce small white pixels in binary image
give the image to tesseract with options:
limited whitelist of output chars
OEM 3 : tesseract + cube
PSM 8 : one word per image
Code
from PIL import Image
import pytesseract
import numpy as np
import cv2
img = cv2.imread('a.jpg')
img = cv2.medianBlur(img, 3)
# extract blue parts
img2 = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
cond = np.bitwise_and(img[:, :, 0] >= 100, img[:, :, 2] < 100)
img2[np.where(cond)] = 255
img = img2
# delete the noise
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
str1 = pytesseract.image_to_string(Image.fromarray(img),
config='-c tessedit_char_whitelist=abcedfghijklmnopqrtuvwxyz0123456789 -oem 3 -psm 8')
cv2.imwrite("frame.png", img)
print(str1)
output
f2e4
image
in order to see full options of tesseract, type the following command tesseract --help-extra or refere to this_link
Tesseract is intended for performing OCR on text documents. In my experience it's good but a bit patchy even with very clean data.
In this case it appears you are trying to solve a CAPTCHA which is specifically designed to defeat OCR software. It's very likely you cannot use Tesseract to solve this issue, because:
It's not really designed for that
The scenario is adversarial:
The example is specifically designed to prevent what you are trying to do
If you could get it to work, the other party would likely change it to break again
If you want to proceed I would suggest:
Working on cleaning up the image before attempting to process it (can you get a nice readable black and white image?)
Train your own recognition network using a lot of examples

Can someone explain me different type of modes of image?

enter image description here
I am new to the this image processing stuff. Why I am asking this question is because I have a code which works for RGB mode but doesnt for P mode ?
So I came to conclusion that it is something related to modes. I did some basic research on modes.but did not find any simple explanation. Will be helpful if someone can help me understand this.
CODE:
image=Image.open('image.png')
image.load()
image_data = np.asarray(image)
image_data_bw = image_data.max(axis=2)
non_empty_columns = np.where(image_data_bw.max(axis=0)>0)[0]
non_empty_rows = np.where(image_data_bw.max(axis=1)>0)[0]
cropBox = (min(non_empty_rows), max(non_empty_rows), min(non_empty_columns), max(non_empty_columns))
image_data_new = image_data[cropBox[0]:cropBox[1]+1, cropBox[2]:cropBox[3]+1 , :]
new_image = Image.fromarray(image_data_new)
new_image.save('cropped_image.png')
Codesource
Input to the code following Image:
Output should be like the following image(It is cropped to the edges of the picture. Please click on the image for understanding):
This Image is in RGBA mode.so the code is working fine for such images. But not with the image in P mode.
ERROR:
Error I get with P mode:
axis 2 is out of bounds for array of dimension 2
The answer you found greatly overcomplicates the process, by using numpy. The PIL library supports this usecase natively, with the image.getbbox() and image.crop() methods:
cropbox = image.getbbox()
new_image = image.crop(cropbox)
This works for all the different modes, regardless. The cropbox produced by image.getbbox() is exactly the same size as the one produced by the numpy route.
from PIL import Image
img = Image.open('Image.png')
print(x,y)
img.show()
cropbox_1 = img.getbbox()
new_image_1 = img.crop(cropbox_1)
new_image_1.save('Cropped_image,png')
new_image_1.show()
This code completely crops the image to the edges. Only if the images are having alpha channel, you might have to remove that channel by converting it.
ex. If it is a RGBA mode make it RGB and then use getbbox().
img = image.convert('RGB')
cropbox = img.getbbox()
image_1 = img.crop(cropbox)
addition of this should do the task.

Lower the brightness of all RGB pixels by 20% in Python?

I have been trying to teach myself more advanced methods in Python but can't seem to find anything similar to this problem to base my code off of.
First question: Is this only way to display an image in the terminal to install Pillow? I would prefer not to, as I'm trying to then teach what I learn to a very beginner student. My image.show() function doesn't do anything.
Second question: What is the best way to go about lowering the brightness of all RGB pixels in an image by 20%? What I have below doesn't do anything to the alter the brightness, but it also can compile completely. I would prefer the most simple way to go about this as far as importing minimal libraries.
Third Question: How do I made a new picture instead of changing the original? (IE- lower brightness 20%, "image-decreasedBrightness.jpg" is created from "image.jpg")
here is my code - sorry it isn't formatted correctly. Every time i tried to indent it would tab down to the tags bar.
import Image
import ImageEnhance
fileToBeOpened = raw_input("What is the file name? Include file type.")
image = Image.open(fileToBeOpened)
def decreaseBrightness(image):
image.show()
image = image.convert('L')
brightness = ImageEnhance.Brightness(image)
image = brightness.enhance(20)
image.show()
return image
decreaseBrightness(image)
To save the image as a file, there's an example on the documentation:
from PIL import ImageFile
fp = open("lena.pgm", "rb")
p = ImageFile.Parser()
while 1:
s = fp.read(1024)
if not s:
break
p.feed(s)
im = p.close()
im.save("copy.jpg")
The key function is im.save.
For a more in-depth solution, get a nice beverage, find a comfortable place to sit and enjoy your read:
Pillow 3.4.x Documentation.

How do I convert an RGB picture into graysacle using simplecv?

So working with windows, python 2.7 and simplecv I am making a live video with my webcam and want simplecv to give me a grayscale version of the video. Is there any simple way to achieve that?
I found the command
grayscale()
on the opencv page, which should do exactly that but when I run it I get the error:
NameError: name "grayscale" is not defined
I am currently using this prewritten code for object tracking but I don't know whether I should use the command I found, and where in the code I should put it, does anybody have an idea? :
print __doc__
import SimpleCV
display = SimpleCV.Display()
cam = SimpleCV.Camera()
normaldisplay = True
while display.isNotDone():
if display.mouseRight:
normaldisplay = not(normaldisplay)
print "Display Mode:", "Normal" if normaldisplay else "Segmented"
img = cam.getImage().flipHorizontal()
dist = img.colorDistance(SimpleCV.Color.BLACK).dilate(2)
segmented = dist.stretch(200,255)
blobs = segmented.findBlobs()
if blobs:
circles = blobs.filter([b.isCircle(0.2) for b in blobs])
if circles:
img.drawCircle((circles[-1].x, circles[-1].y), circles[-1].radius(),SimpleCV.Color.BLUE,3)
if normaldisplay:
img.show()
else:
segmented.show()
There are multiple ways to do this in SimpleCV.
One way has been already described, it's the toGray() method.
There's also a way you can do this with gaussian blur, which also helps to remove image noise:
from SimpleCV import *
img = Image("simplecv")
img.applyGaussianFilter(grayscale=True)
After the third line, img object contains the image with a lot less high-frequency noise, and converted to grayscale.
You may check out pyimagesearch.com who works with OpenCV, but he explains why applying Gaussian Blur is a good idea.
In simple cv theres a function called toGray() for example:
import SimpleCV as sv
img = img.jpg
sv.img.jpg.toGray()
return gimg.jpg

Categories