tesseract detects only 4 words from image - python

I have some very simple Python code:
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = 'C:\\Tesseract-OCR\\tesseract.exe'

img = cv2.imread('1.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
hImg, wImg, _ = img.shape

# detecting words
boxes = pytesseract.image_to_data(img)
for x, b in enumerate(boxes.splitlines()):
    if x != 0:
        b = b.split()
        if len(b) == 12:
            x, y, w, h = int(b[6]), int(b[7]), int(b[8]), int(b[9])
            cv2.rectangle(img, (x, y), (w + x, h + y), (0, 0, 255), 3)

cv2.imshow('result', img)
cv2.waitKey(0)
But the result was interesting: it detected only 4 words. What could be the reason?

You'll have better OCR results if you improve the quality of the image you are giving Tesseract.
While Tesseract version 3.05 (and older) handles inverted images (dark background, light text) without problems, for 4.x versions use dark text on a light background.
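If your input has light text on a dark background and you are on Tesseract 4.x, a quick first step is to invert it; a minimal sketch, assuming a grayscale input:
import cv2

gray = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)
inverted = cv2.bitwise_not(gray)  # dark text on a light background for Tesseract 4.x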
Convert from BGR to HLS so you can later remove the background colors around the numbers in the top half of the image. Then create a "blue" mask with cv2.inRange and replace anything that's not "blue" with white.
import numpy as np

hls = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)

# Define lower and upper limits for the number colors.
blue_lo = np.array([114, 70, 70])
blue_hi = np.array([154, 225, 225])

# Mask image to only select "blue"
mask = cv2.inRange(hls, blue_lo, blue_hi)

# Copy the original image and paint everything that is not "blue" white
img1 = img.copy()
img1[mask == 0] = (255, 255, 255)
Help pytesseract by converting the image to black and white
The next step converts the image to black and white. Tesseract does this internally (Otsu's algorithm), but its result can be suboptimal, particularly if the page background is of uneven darkness.
# img1 is still in RGB (a copy of the original), so convert it directly to gray-scale
gray = cv2.cvtColor(img1, cv2.COLOR_RGB2GRAY)
_, img1 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imshow('img_to_binary', img1)
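If the background darkness is uneven, adaptive thresholding is a possible alternative to a single global Otsu threshold; a sketch, not part of the original answer, with the block size and constant as assumed values:
img1 = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY, 31, 10)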
Use image_to_data over the previously created img1 and continue applying your existing code.
...
hImg,wImg,_ = img.shape
#detecting words
boxes = pytesseract.image_to_data(img1)
for x,b in enumerate(boxes.splitlines()):
...
...


How to improve Tesseract's output

I have an image that looks like this:
And this is the processed image
I have tried pretty much everything. I processed the image like this:
import cv2
import pytesseract

img = cv2.imread('input.jpg')  # hypothetical filename; the original is not shown

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # converting to gray-scale
(h, w) = gray.shape[:2]
gray = cv2.resize(gray, (w * 2, h * 2))
thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))  # assumed; not defined in the original
gray = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, rectKernel)
blur = cv2.GaussianBlur(gray, (1, 1), cv2.BORDER_DEFAULT)  # note: a 1x1 kernel is effectively a no-op
text = pytesseract.image_to_string(blur, config="--oem 1 --psm 6")
But Tesseract doesn't print out anything. I am using this version of Tesseract:
5.0.0-alpha.20201127
How do I improve its performance? It's highly unreliable.
Edit:
The answer below did a wonderful job on that image.
But when I apply the same technique to an image like this one, I get the wrong output.
Why is that? They seem roughly the same.
The problem is that the characters are not centered in the image.
Tesseract sometimes has difficulty recognizing characters or digits when they are not in the center.
Therefore my suggestion is:
Center the characters
Up-sample and convert to gray-scale
Centering the characters:
cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
50 is just a padding value; you can set it to any other value.
The background turns blue because of the value argument: OpenCV stores images in BGR order, so passing [255] is the same as [255, 0, 0], i.e. full blue with zero green and red.
You can try other values. For me it doesn't matter, since I'll convert to gray-scale in the next step.
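If you want the padding to be white in the color image as well, pass all three channels explicitly (a small sketch of the same call):
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255, 255, 255])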
Up-sampling and converting to gray-scale:
The same steps you have already done: the first three lines of your code.
Now when you read:
MEHVISH MUQADDAS
Code:
import cv2
import pytesseract
# Load the image
img = cv2.imread("onf0D.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# OCR
txt = pytesseract.image_to_string(gry, config="--psm 6")
print(txt)
Read more: tesseract-improve-quality.
You don't need thresholding, GaussianBlur, or morphologyEx here.
The reasons are:
Simple thresholding is used to bring out the features of the image; your input's features are already clearly visible.
You don't have to smooth the image, as there is no illumination effect on it.
You don't need to do segmentation, since the background is plain white.
Update-1
The second image requires pre-processing. However, applying a simple threshold won't work on it. You need to remove the background using a binary mask first, then you can apply OCR.
Result of the binary-mask:
Now, if you apply OCR:
IRUM FEROZ
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("jCMft.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Binary mask via cv2.inRange (not an adaptive threshold)
msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 130]))
# OCR
txt = pytesseract.image_to_string(msk, config="--psm 6")
print(txt)
Q: How do I find the lower and upper bounds for the cv2.inRange method?
A: You can use the following script.
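A minimal sketch of such a tuner: it shows the cv2.inRange mask live while you drag HSV trackbars (the window name, filename, and Esc-to-quit key are my assumptions):
import cv2
import numpy as np

def nothing(x):
    pass

cv2.namedWindow('tuner')
for name, maxval in [('H lo', 179), ('S lo', 255), ('V lo', 255),
                     ('H hi', 179), ('S hi', 255), ('V hi', 255)]:
    cv2.createTrackbar(name, 'tuner', 0, maxval, nothing)

img = cv2.imread('jCMft.jpg')  # same input as above
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

while True:
    lo = np.array([cv2.getTrackbarPos(n, 'tuner') for n in ('H lo', 'S lo', 'V lo')])
    hi = np.array([cv2.getTrackbarPos(n, 'tuner') for n in ('H hi', 'S hi', 'V hi')])
    cv2.imshow('tuner', cv2.inRange(hsv, lo, hi))
    if cv2.waitKey(30) & 0xFF == 27:  # press Esc to quit
        break
cv2.destroyAllWindows()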
Q: What did you change in the second image?
A: First I converted the image to the HSV format instead of gray-scale, because I wanted to remove the background. If you experiment with adaptiveThreshold you will see a lot of artifacts in the background that limit Tesseract's recognition. Then I used cv2.inRange to get a binary mask. Feeding the binary mask as input gave me the desired result.

remove the element attached to the image border

I'm using OpenCV image processing to detect pneumonia in chest X-rays, and I need to remove the area attached to the image border so that only the lungs remain. Can anyone help me code this in Python?
This image explains what I want. It shows the image after applying these steps: resizing, histogram equalization, Otsu thresholding, inverse binary thresholding, and morphological processing (opening, then closing).
This is the original image: Original Image
This is how I would approach the problem in Python/OpenCV. Add a white border all around, flood fill it with black to replace the white, then remove the extra border.
Input:
import cv2
import numpy as np
# read image
img = cv2.imread('lungs.jpg')
h, w = img.shape[:2]
# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# add 1 pixel white border all around
pad = cv2.copyMakeBorder(gray, 1,1,1,1, cv2.BORDER_CONSTANT, value=255)
h, w = pad.shape
# create zeros mask 2 pixels larger in each dimension
mask = np.zeros([h + 2, w + 2], np.uint8)
# floodfill outer white border with black
img_floodfill = cv2.floodFill(pad, mask, (0,0), 0, (5), (0), flags=8)[1]
# remove border
img_floodfill = img_floodfill[1:h-1, 1:w-1]
# save cropped image
cv2.imwrite('lungs_floodfilled.png',img_floodfill)
# show the images
cv2.imshow("img_floodfill", img_floodfill)
cv2.waitKey(0)
cv2.destroyAllWindows()
You can try a morphological reconstruction with the image border as the marker. This is an analogue of the imclearborder function from MATLAB or Octave.
import cv2
import numpy as np

img = cv2.imread('5R0Zs.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY)[1]
kernel2 = np.ones((3, 3), np.uint8)

# Marker: only the border pixels of the thresholded image
marker = thresh.copy()
marker[1:-1, 1:-1] = 0

# Morphological reconstruction: repeatedly dilate the marker,
# clipped by the mask (thresh), until it stops changing
while True:
    tmp = marker.copy()
    marker = cv2.dilate(marker, kernel2)
    marker = cv2.min(thresh, marker)
    difference = cv2.subtract(marker, tmp)
    if cv2.countNonZero(difference) == 0:
        break

# Everything reconstructed from the border gets removed
mask = cv2.bitwise_not(marker)
mask_color = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
out = cv2.bitwise_and(img, mask_color)
cv2.imwrite('out.png', out)
cv2.imshow('result', out)
cv2.waitKey(0)  # waits until a key is pressed
cv2.destroyAllWindows()

How to OCR image with Tesseract

I am starting to learn OpenCV and Tesseract, and have trouble with what seems to be a very simple example.
Here is an image that I am trying to OCR, that reads "171 m":
I do some preprocessing. Since blue is the dominant color of the text, I extract the blue channel and apply simple thresholding.
img = cv2.imread('171_m.png')[:, :, 0]  # keep only the blue channel
_, thresh = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY_INV)
The resulting image looks like this:
Then throw that into Tesseract, with psm 7 for single line:
text = pytesseract.image_to_string(thresh, config='--psm 7')
print(text)
>>> lim
I also tried to restrict possible characters, and it gets a bit better, but not quite.
text = pytesseract.image_to_string(thresh, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
print(text)
>>> 17m
OpenCV v4.1.1.
Tesseract v5.0.0-alpha.20190708
Any help appreciated.
Before throwing the image into Pytesseract, preprocessing can help. The desired text should be in black while the background should be in white. Here's an approach:
Convert image to grayscale and enlarge image
Gaussian blur
Otsu's threshold
Invert image
After converting to grayscale, we enlarge the image using imutils.resize() and apply a Gaussian blur. From here we apply Otsu's threshold to get a binary image.
If you have noisy images, an additional step would be to use morphological operations to smooth or remove noise. But since your image is clean enough, we can simply invert the image to get our result.
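For noisier inputs, that extra step could be a small morphological opening; a sketch of the idea, not part of the original answer, assuming thresh is the binary image produced in the code below:
import cv2
import numpy as np

kernel = np.ones((3, 3), np.uint8)
opened = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)  # removes small speckles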
Output from Pytesseract using --psm 6
171m
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('1.png',0)
image = imutils.resize(image, width=400)
blur = cv2.GaussianBlur(image, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Disclaimer: this is not a full solution, just an attempt to partially solve the problem.
This process works only if you know beforehand how many characters are present in the image. Here is the trial code:
import cv2
import pytesseract

img0 = cv2.imread('171_m.png', 0)
adap_thresh = cv2.adaptiveThreshold(img0, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
text_adth = pytesseract.image_to_string(adap_thresh, config='--psm 7')
After adaptive thresholding, the produced image looks like this:
Pytesseract gives the output:
171 mi.
Now, if you know in advance the number of characters present, you can slice or filter the string read by pytesseract to get the desired output '171m'.
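A minimal sketch of that cleanup (the regex whitelist mirrors tessedit_char_whitelist above and is my own addition, not part of the original answer):
import re

raw = text_adth                         # e.g. '171 mi.'
cleaned = re.sub(r'[^0-9m]', '', raw)   # keep only digits and 'm'
print(cleaned)                          # -> '171m'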
I thought your image was not sharp enough, so I applied the process described in How do I increase the contrast of an image in Python OpenCV to first sharpen your image, and then proceeded by extracting the blue layer and running Tesseract.
I hope this helps.
import cv2
import pytesseract

img = cv2.imread('test.png')  # test.png is your original image
s = 128
img = cv2.resize(img, (s, int(s / 2)), interpolation=cv2.INTER_AREA)

def apply_brightness_contrast(input_img, brightness=0, contrast=0):
    if brightness != 0:
        if brightness > 0:
            shadow = brightness
            highlight = 255
        else:
            shadow = 0
            highlight = 255 + brightness
        alpha_b = (highlight - shadow) / 255
        gamma_b = shadow
        buf = cv2.addWeighted(input_img, alpha_b, input_img, 0, gamma_b)
    else:
        buf = input_img.copy()
    if contrast != 0:
        f = 131 * (contrast + 127) / (127 * (131 - contrast))
        alpha_c = f
        gamma_c = 127 * (1 - f)
        buf = cv2.addWeighted(buf, alpha_c, buf, 0, gamma_c)
    return buf

out = apply_brightness_contrast(img, 0, 64)
b, g, r = cv2.split(out)  # splitting and using just the blue channel

# 255 - b because the image has a black background and white digits;
# the subtraction inverts it to dark text on a light background
text = pytesseract.image_to_string(255 - b, config='--psm 7 -c tessedit_char_whitelist=1234567890m')
print(text)

OpenCV - Smoothing borders

I want to stitch multiple image patches onto a new, mainly gray background image. The image patches contain colored elements whose shape and color are diverse and which should not be changed, if possible. Like the new background image, the borders of the image patches are also gray, just slightly different, but you can see strong seams if I simply paste them:
ImgPatch = cv2.imread("C://...//ImagePatch.png")
NewBackground = cv2.imread("C://...//NewBackground.png")
height, width, channels = ImgPatch.shape
NewBackground[y:y+height, x:x+width] = ImgPatch  # x, y: top-left paste position
I tried cv2.seamlessClone() (docs.opencv.org) as explained in this tutorial:
www.learnopencv.com/seamless-cloning-using-opencv-python-cpp
The edges are perfectly smoothed, but unfortunately the colors of the elements are changed far too much. I know the approximate width and height of the gray border of each image patch. If I could specifically smooth that area, it might be a start and would already make the result look better than what I have. I tried different masks with cv2.seamlessClone(), but none of them worked, so unfortunately I couldn't find a way to blend only the border of the patches.
The following images visualize my problem in a very abstract way.
What I have:
Left: Background, Right: Image patch
What I want:
What I currently get by using cv2.seamlessClone():
Any help would be very much appreciated!
EDIT: As I probably was not clear enough: the real images are far more complex, so unfortunately I cannot get reasonable results for all image patches by using cv2.findContours. What I am looking for is a method to merge the borders, so that the exact transition from patch to background is no longer visible.
import cv2
import numpy as np

patch = cv2.imread('patch.png', cv2.IMREAD_UNCHANGED)
image = cv2.imread('image.png', cv2.IMREAD_UNCHANGED)
mask = 255 * np.ones(patch.shape, patch.dtype)
height, width, channels = image.shape
center = (width // 2, height // 2)  # seamlessClone expects an (x, y) point
mixed_clone = cv2.seamlessClone(patch, image, mask, center, cv2.NORMAL_CLONE)
You could try to find the contour of the element in your image patch with cv2.findContours() (the red spot), remove the background around the contour, and save the image. You can then combine the saved image (the red spot without background) with the gray background image using cv2.add(). I have combined some code I once played with and the code from the OpenCV docs (for cv2.add()). Hope it helps a bit. (Note the example adds the image in the upper-left corner; if you want it elsewhere, you should change the code.) Cheers!
Example:
import cv2
import numpy as np
from PIL import Image

img = cv2.imread('background2.png', cv2.IMREAD_UNCHANGED)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, threshold = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY_INV)
height, width = gray.shape
mask = np.zeros((height, width), np.uint8)

# Find the largest contour and draw it filled into the mask
# (OpenCV 4.x: findContours returns (contours, hierarchy))
contours, hierarchy = cv2.findContours(threshold, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
cnt = max(contours, key=cv2.contourArea)
cv2.drawContours(mask, [cnt], -1, (255, 255, 255), thickness=-1)
masked = cv2.bitwise_and(img, img, mask=mask)

# Crop the masked element to its bounding box and save it
_, thresh = cv2.threshold(mask, 1, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(contours[0])
circle = masked[y:y+h, x:x+w]
cv2.imwrite('temp.png', circle)

# Make the black background transparent with PIL
img = Image.open('temp.png')
img = img.convert("RGBA")
datas = img.getdata()
newData = []
for item in datas:
    if item[0] == 0 and item[1] == 0 and item[2] == 0:
        newData.append((255, 255, 255, 0))
    else:
        newData.append(item)
img.putdata(newData)
img.save('background3.png', "PNG")

# Combine the cut-out element with the gray background using cv2.add()
img1 = cv2.imread('background1.png')
img2 = cv2.imread('background3.png')
rows, cols, channels = img2.shape
roi = img1[0:rows, 0:cols]
img2gray = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(img2gray, 110, 255, cv2.THRESH_BINARY_INV)
mask_inv = cv2.bitwise_not(mask)
img1_bg = cv2.bitwise_and(roi, roi, mask=mask_inv)
img2_fg = cv2.bitwise_and(img2, img2, mask=mask)
dst = cv2.add(img1_bg, img2_fg)
img1[0:rows, 0:cols] = dst
cv2.imshow('img', img1)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
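If the contour approach doesn't generalize to your more complex patches, another option is to feather only the gray border band of each patch with an alpha ramp. This is a sketch of my own, not from the answer above; the band width, paste position, and filenames are assumed values:
import cv2
import numpy as np

patch = cv2.imread('ImagePatch.png')
bg = cv2.imread('NewBackground.png')
h, w = patch.shape[:2]
x, y = 100, 100   # assumed top-left paste position
band = 10         # assumed width of the gray border to feather

# Alpha mask: 1.0 in the patch interior, ramping to 0.0 across the border band
mask = np.zeros((h, w), np.float32)
mask[band:h - band, band:w - band] = 1.0
mask = cv2.GaussianBlur(mask, (2 * band + 1, 2 * band + 1), 0)
alpha = cv2.merge([mask, mask, mask])

# Linear blend: patch in the interior, background shining through at the edges
roi = bg[y:y + h, x:x + w].astype(np.float32)
blended = alpha * patch.astype(np.float32) + (1.0 - alpha) * roi
bg[y:y + h, x:x + w] = blended.astype(np.uint8)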

Extracting Walls from Image in OpenCV

I have the following image of a maze:
Input:
The walls are white and the path is black. How can I extract one of the walls into a separate image in OpenCV?
This is the output I want
Output:
You can use the flood-fill algorithm for this problem.
In your input image, notice how the 'walls' are distinct and neighbored by black pixels (the path). When you initialize the algorithm at any pixel inside a 'wall', that wall will be separated from the rest of the image.
Code:
path = r'C:\Users\Desktop'
filename = 'input.png'
img = cv2.imread(os.path.join(path, filename))
cv2.imshow('Original', img)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 100, 255,cv2.THRESH_BINARY_INV)
cv2.imshow('thresh1', thresh)
im_floodfill = thresh.copy()
h, w = thresh.shape[:2]
mask = np.zeros((h+2, w+2), np.uint8)
cv2.floodFill(im_floodfill, mask, (0,0), 255)
cv2.imshow('im_floodfill', im_floodfill)
cv2.imshow('fin', cv2.bitwise_not(cv2.bitwise_not(im_floodfill) + thresh))
There is another example detailing this function on THIS POST
