How to improve Tesseract's output - python

I have an image that looks like this:
And this is the processed image
I have tried pretty much everything. I processed the image like this:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) #Converting to GrayScale
(h, w) = gray.shape[:2]
gray = cv2.resize(gray, (w*2, h*2))
thresh = cv2.threshold(gray, 150, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
gray = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, rectKernel)
blur = cv2.GaussianBlur(gray,(1,1),cv2.BORDER_DEFAULT)
text = pytesseract.image_to_string(blur, config="--oem 1 --psm 6")
But Tesseract doesnt print out anything. I am using this version of tesseract
5.0.0-alpha.20201127
How do I improve it's performance? Its highly unreliable.
Edit:
The answer below did a wonderful job on the said image.
But when I apply this technique to image like this one I get wrong output
Why is that? They seem roughly the same.

The problem is characters are not in center of the image.
Sometimes, tesseract have difficulty recognizing the characters or digit if they are not on the center.
Therefore my suggestion is:
Center the characters
Up-sample and convert to gray-scale
Centering the characters:
cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
50 is just a padding variable, you can set to any other value.
The background turns blue because of the value. OpenCV read the image in BGR fashion. giving 255 as an input is same as [255, 0, 0] which is display blue channel, but not green and red respectively.
You can try with other values. For me it won't matter, since I'll convert it to gray-scale on the next step.
Up-sampling and converting to gray-scale:
The same steps you have done. The first three-line of your code.
Now when you read:
MEHVISH MUQADDAS
Code:
import cv2
import pytesseract
# Load the image
img = cv2.imread("onf0D.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# OCR
txt = pytesseract.image_to_string(gry, config="--psm 6")
print(txt)
Read more tesseract-improve-quality.
You don't need to do threshold, GaussianBlur or morphologyEx.
The reasons are:
Simple-Threshold is used to get the features of the image. Input images' features are already available.
You don't have to smooth the image, there is no illumination effect on the image.
You don't need to do segmentation, since background is plain-white.
Update-1
The second image requires pre-processing. However, applying simple-threshold won't work on this image. You need to remove the background using a binary mask, then you can apply OCR.
Result of the binary-mask:
Now, if you apply OCR:
IRUM FEROZ
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("jCMft.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Adaptive-Threshold
msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 130]))
# OCR
txt = pytesseract.image_to_string(msk, config="--psm 6")
print(txt)
Q:How do I find the lower and upper bounds of the cv2.inRange method?
A: You can use the following script.
Q: What did you change in the second image?
A: First I converted image to the HSV format, instead of gray-scale. The reason is I wanted remove the background. If you experiment with adaptiveThreshold you will see there are a lot of artifacts on the background limits the tesseract recognition. Then I used cv2.inRange to get a binary mask. Feeding binary-mask to the input gave me the desired result.

Related

tesseract detects only 4 words from image

I have very simple python code:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Tesseract-OCR\\tesseract.exe'
img = cv2.imread('1.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
hImg,wImg,_ = img.shape
#detecting words
boxes = pytesseract.image_to_data(img)
for x,b in enumerate(boxes.splitlines()):
if x!=0:
b = b.split()
if len(b) == 12:
x,y,w,h = int(b[6]), int(b[7]), int(b[8]), int(b[9])
cv2.rectangle(img, (x,y), (w+x,h+y), (0,0,255), 3)
cv2.imshow('result', img)
cv2.waitKey(0)
But result was interesting. It detected only 4 words. what could it be the reason?
You'll have better OCR results if you improve the quality of the image you are giving Tesseract.
While tesseract version 3.05 (and older) handle inverted image (dark background and light text) without problem, for 4.x version use dark text on light background.
Convert from BGR to HLS to later remove background colors from the numbers in the top half of the image. Then, create a "blue" mask with cv2.inRange and replace anything that's not "blue" with the color white.
hls=cv2.cvtColor(img,cv2.COLOR_BGR2HLS)
# Define lower and upper limits for the number colors.
blue_lo=np.array([114, 70, 70])
blue_hi=np.array([154, 225, 225])
# Mask image to only select "blue"
mask=cv2.inRange(hls,blue_lo,blue_hi)
# copy original image
img1 = img.copy()
img1[mask==0]=(255,255,255)
Help pytesseract by converting the image to black and white
This is converting an image to black and white. Tesseract does this internally (Otsu algorithm), but the result can be suboptimal, particularly if the page background is of uneven darkness.
rgb = cv2.cvtColor(img1, cv2.COLOR_HLS2RGB)
gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
_, img1 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imshow('img_to_binary',img1)
Use image_to_data over the previously created img1 and continue applying your existing code.
...
hImg,wImg,_ = img.shape
#detecting words
boxes = pytesseract.image_to_data(img1)
for x,b in enumerate(boxes.splitlines()):
...
...

Improve OCR result from image using pytesseract

I am using pytesseract to read number from the screen in real-time.
The image mostly number, dot and 2 letters (M and R) as below.
In real-time number will keep changing but the letter M and R will stay the same place.
Background will always green with black letters.
As you can see the number on image is very clear but the pytesseract read the number and the result is not really satisfy. Sometime its read 7 become 1.
I would like to find the algorithms that help improce OCR result.
Currently I am using Pillow to convert image to gray scale and also try resize image bigger or smaller but still improve result much. Also applied filter on the image as below but result still not 100% correct.
img = cv2.imread('screenshot.png')
img = cv2.resize(img, None, fx=scale_factor, fy=scale_factor, interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
text = tess.image_to_string(img)
Please help suggest any algorithms that will help improve this OCR result.
You can easily detect applying simple-thresholding
Threshold
Result
3845.86 M51.31 M 309.12 3860.43 R191.90 R23.44
Thresholding will show the features of the image.
Code:
import cv2
import pytesseract
img = cv2.imread("UEWHj.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
txt = pytesseract.image_to_string(thr)
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)

How to OCR image with Tesseract

I am starting to learn OpenCV and Tesseract, and have trouble with what seems to be a very simple example.
Here is an image that I am trying to OCR, that reads "171 m":
I do some preprocessing. Since blue is the dominant color of the text, I extract the blue channel and apply simple thresholding.
img = cv2.imread('171_m.png')[y, x, 0]
_, thresh = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)
The resulting image looks like this:
Then throw that into Tesseract, with psm 7 for single line:
text = pytesseract.image_to_string(thresh, config='--psm 7')
print(text)
>>> lim
I also tried to restrict possible characters, and it gets a bit better, but not quite.
text = pytesseract.image_to_string(thresh, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
print(text)
>>> 17m
OpenCV v4.1.1.
Tesseract v5.0.0-alpha.20190708
Any help appreciated.
Before throwing the image into Pytesseract, preprocessing can help. The desired text should be in black while the background should be in white. Here's an approach
Convert image to grayscale and enlarge image
Gaussian blur
Otsu's threshold
Invert image
After converting to grayscale, we enlarge the image using imutils.resize() and Gaussian blur. From here we Otsu's threshold to get a binary image
If you have noisy images, an additional step would be to use morphological operations to smooth or remove noise. But since your image is clean enough, we can simply invert the image to get our result
Output from Pytesseract using --psm 6
171m
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('1.png',0)
image = imutils.resize(image, width=400)
blur = cv2.GaussianBlur(image, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Disclaimer : This is not a solution, just a trial to partially solve this.
This process works only if you have knowledge of the number of the characters present in the image beforehand. Here is the trial code :
img0 = cv2.imread('171_m.png', 0)
adap_thresh = cv2.adaptiveThreshold(img0, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
text_adth = pytesseract.image_to_string(adap_thresh, config='--psm 7')
After adaptive thresholding, the produced image is like this :
Pytesseract gives output as :
171 mi.
Now, if you know, in advance, the number of characters present, you can slice the string read by pytesseract and get the desired output as '171m'.
I thought your image was not sharp enough, hence I applied the process described at How do I increase the contrast of an image in Python OpenCV to first sharpen your image and then proceed by extracting the blue layer and running the tesseract.
I hope this helps.
import cv2
import pytesseract
img = cv2.imread('test.png') #test.png is your original image
s = 128
img = cv2.resize(img, (s,int(s/2)), 0, 0, cv2.INTER_AREA)
def apply_brightness_contrast(input_img, brightness = 0, contrast = 0):
if brightness != 0:
if brightness > 0:
shadow = brightness
highlight = 255
else:
shadow = 0
highlight = 255 + brightness
alpha_b = (highlight - shadow)/255
gamma_b = shadow
buf = cv2.addWeighted(input_img, alpha_b, input_img, 0, gamma_b)
else:
buf = input_img.copy()
if contrast != 0:
f = 131*(contrast + 127)/(127*(131-contrast))
alpha_c = f
gamma_c = 127*(1-f)
buf = cv2.addWeighted(buf, alpha_c, buf, 0, gamma_c)
return buf
out = apply_brightness_contrast(img,0,64)
b, g, r = cv2.split(out) #spliting and using just the blue
pytesseract.image_to_string(255-b, config='--psm 7 -c tessedit_char_whitelist=1234567890m') # the 255-b here because the image has black backgorund and white numbers, 255-b switches the colors

How do i remove highlight from a printed text book in python?

I'm working on extracting the highlighted text from a text book. I have already done locating the highlight and extracting the text inside. To deal with the highlight, I converted the image to grayscale and used OTSU threshold to remove the background highlighted color.
This works great when the highlight is a light color like yellow or green but when the highlight is a dark color, the thresholding fails and i get black background covering most of the text which hinders the ocr reading.
I have tried normalising the brightness but it does not seem to work.
What I need is some way to determine the foreground and background color and then remove the background color. Or I need some way to dynamically threshold the image to get black text and white background.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
normalized_gray = cv2.equalizeHist(gray)
(thresh, processed_image) = cv2.threshold(normalized_gray, 127, 255, cv2.THRESH_OTSU)
The test image:
https://ibb.co/856YtMx
Some test result:
When i run equalizeHist before thresholding.
https://ibb.co/HT0jpKW
When i run equalizeHist after thresholding.
https://ibb.co/ZXSz97J
When i use a Binary threshold, the text are blown away:
https://ibb.co/DLXywXz
e.g. something like this should work:
import cv2
import numpy as np
image = cv2.imread('photo-2019-08-12-12-44-59.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# adjust contrast
gray_contract = cv2.multiply(gray, 1.5)
# create a kernel for the erode
kernel = np.ones((2, 2), np.uint8)
img_eroded = cv2.erode(gray_contract, kernel, iterations=1)
# binarize with otsu
(thresh, otsu) = cv2.threshold(img_eroded, 127, 255,
cv2.THRESH_BINARY+cv2.THRESH_OTSU)
Also you can have a look at post How to remove shadow from scanned images using OpenCV
Adaptive threshold is what you need here.
My output with code. Can be fine tuned.
import cv2
import numpy as np
img = cv2.imread("high.jpg")
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gaus = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 20)
cv2.imshow("Gaussian", gaus)
cv2.waitKey(0)
cv2.imwrite('output.png', gaus)
UPDATE
Changed parameters to the adaptiveThreshold function, the second image you posted.
gaus = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 8)

removing pixels less than n size(noise) in an image - open CV python

i am trying to remove noise in an image less and am currently running this code
import numpy as np
import argparse
import cv2
from skimage import morphology
# Construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required = True,
help = "Path to the image")
args = vars(ap.parse_args())
# Load the image, convert it to grayscale, and blur it slightly
image = cv2.imread(args["image"])
cv2.imshow("Image", image)
cv2.imwrite("image.jpg", image)
greenLower = np.array([50, 100, 0], dtype = "uint8")
greenUpper = np.array([120, 255, 120], dtype = "uint8")
green = cv2.inRange(image, greenLower, greenUpper)
#green = cv2.GaussianBlur(green, (3, 3), 0)
cv2.imshow("green", green)
cv2.imwrite("green.jpg", green)
cleaned = morphology.remove_small_objects(green, min_size=64, connectivity=2)
cv2.imshow("cleaned", cleaned)
cv2.imwrite("cleaned.jpg", cleaned)
cv2.waitKey(0)
However, the image does not seem to have changed from "green" to "cleaned" despite using the remove_small_objects function. why is this and how do i clean the image up? Ideally i would like to isolate only the image of the cabbage.
My thought process is after thresholding to remove pixels less than 100 in size, then smoothen the image with blur and fill up the black holes surrounded by white - that is what i did in matlab. If anybody could direct me to get the same results as my matlab implementation, that would be greatly appreciated. Thanks for your help.
Edit: made a few mistakes when changing the code, updated to what it currently is now and display the 3 images
image:
green:
clean:
my goal is to get somthing like this picture below from matlab implementation:
Preprocessing
A good idea when you're filtering an image is to lowpass the image or blur it a bit; that way neighboring pixels become a little more uniform in color, so it will ease brighter and darker spots on the image and keep holes out of your mask.
img = cv2.imread('image.jpg')
blur = cv2.GaussianBlur(img, (15, 15), 2)
lower_green = np.array([50, 100, 0])
upper_green = np.array([120, 255, 120])
mask = cv2.inRange(blur, lower_green, upper_green)
masked_img = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow('', masked_img)
cv2.waitKey()
Colorspace
Currently, you're trying to contain an image by a range of colors with different brightness---you want green pixels, regardless of whether they are dark or light. This is much more easily accomplished in the HSV colorspace. Check out my answer here going in-depth on the HSV colorspace.
img = cv2.imread('image.jpg')
blur = cv2.GaussianBlur(img, (15, 15), 2)
hsv = cv2.cvtColor(blur, cv2.COLOR_BGR2HSV)
lower_green = np.array([37, 0, 0])
upper_green = np.array([179, 255, 255])
mask = cv2.inRange(hsv, lower_green, upper_green)
masked_img = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow('', masked_img)
cv2.waitKey()
Removing noise in a binary image/mask
The answer provided by ngalstyan shows how to do this nicely with morphology. What you want to do is called opening, which is the combined process of eroding (which more or less just removes everything within a certain radius) and then dilating (which adds back to any remaining objects however much was removed). In OpenCV, this is accomplished with cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel). The tutorials on that page show how it works nicely.
img = cv2.imread('image.jpg')
blur = cv2.GaussianBlur(img, (15, 15), 2)
hsv = cv2.cvtColor(blur, cv2.COLOR_BGR2HSV)
lower_green = np.array([37, 0, 0])
upper_green = np.array([179, 255, 255])
mask = cv2.inRange(hsv, lower_green, upper_green)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
opened_mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
masked_img = cv2.bitwise_and(img, img, mask=opened_mask)
cv2.imshow('', masked_img)
cv2.waitKey()
Filling in gaps
In the above, opening was shown as the method to remove small bits of white from your binary mask. Closing is the opposite operation---removing chunks of black from your image that are surrounded by white. You can do this with the same idea as above, but using cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel). This isn't even necessary after the above in your case, as the mask doesn't have any holes. But if it did, you could close them up with closing. You'll notice my opening step actually removed a small bit of the plant at the bottom. You could actually fill those gaps with closing first, and then opening to remove the spurious bits elsewhere, but it's probably not necessary for this image.
Trying out new values for thresholding
You might want to get more comfortable playing around with different colorspaces and threshold levels to get a feel for what will work best for a particular image. It's not complete yet and the interface is a bit wonky, but I have a tool you can use online to try out different thresholding values in different colorspaces; check it out here if you'd like. That's how I quickly found values for your image.
Although, the above problem is solved using cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel). But, if somebody wants to use morphology.remove_small_objects to remove area less than a specified size, for those this answer may be helpful.
Code I used to remove noise for above image is:
import numpy as np
import cv2
from skimage import morphology
# Load the image, convert it to grayscale, and blur it slightly
image = cv2.imread('im.jpg')
cv2.imshow("Image", image)
#cv2.imwrite("image.jpg", image)
greenLower = np.array([50, 100, 0], dtype = "uint8")
greenUpper = np.array([120, 255, 120], dtype = "uint8")
green = cv2.inRange(image, greenLower, greenUpper)
#green = cv2.GaussianBlur(green, (3, 3), 0)
cv2.imshow("green", green)
cv2.imwrite("green.jpg", green)
imglab = morphology.label(green) # create labels in segmented image
cleaned = morphology.remove_small_objects(imglab, min_size=64, connectivity=2)
img3 = np.zeros((cleaned.shape)) # create array of size cleaned
img3[cleaned > 0] = 255
img3= np.uint8(img3)
cv2.imshow("cleaned", img3)
cv2.imwrite("cleaned.jpg", img3)
cv2.waitKey(0)
Cleaned image is shown below:
To use morphology.remove_small_objects, first labeling of blobs is essential. For that I use imglab = morphology.label(green). Labeling is done like, all pixels of 1st blob is numbered as 1. similarly, all pixels of 7th blob numbered as 7 and so on. So, after removing small area, remaining blob's pixels values should be set to 255 so that cv2.imshow() can show these blobs. For that I create an array img3 of the same size as of cleaned image. I used img3[cleaned > 0] = 255 line to convert all pixels which value is more than 0 to 255.
It seems what you want to remove is a disconnected group of small blobs.
I think erode() will do a good job remove them with the right kernel.
Given an nxn kernel, erode moves the kernel through the image and replaces the center pixel by the minimum pixel in the kernel.
Then you can dilate() the resulting image to restore eroded edges of the green part.
Another option would be to use fastndenoising
##### option 1
kernel_size = (5,5) # should roughly have the size of the elements you want to remove
kernel_el = cv2.getStructuringElement(cv2.MORPH_RECT, kernel_size)
eroded = cv2.erode(green, kernel_el, (-1, -1))
cleaned = cv2.dilate(eroded, kernel_el, (-1, -1))
##### option 2
cleaned = cv2.fastNlMeansDenoisingColored(green, h=10)

Categories