I'm using the cv2 and pytesseract libraries to extract text from an image. Here is the image (image3_3.png) and the Python code:
import cv2
import numpy as np
import pytesseract
from PIL import Image

def threshold_image(img_src):
    """Grayscale image and apply Otsu's threshold"""
    # Grayscale
    img_gray = cv2.cvtColor(img_src, cv2.COLOR_BGR2GRAY)
    # Binarisation and Otsu's threshold
    img_thresh = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    return img_thresh
img = np.array(Image.open('image3_3.png'))
# Apply dilation and erosion to remove some noise
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
# normalise the image
norm_img = np.zeros((img.shape[0], img.shape[1]))
img = cv2.normalize(img, norm_img, 0, 255, cv2.NORM_MINMAX)
# Apply blur to smooth out the edges
img = cv2.GaussianBlur(img, (5, 5), 0)
string_ocr = pytesseract.image_to_string(threshold_image(img), lang = 'eng', config = '--psm 6')
print(string_ocr)
Here is the result:
Image A3. This is image A3 with more texts.
ISAS Visual Analytics
INow everyone can easily discover and share powerful
Nsights that inspire action
Why am I not getting exactly the same text? Any help is highly appreciated.
I have an image as input, and my aim is to extract a binary mask that shows only the face area, i.e. a fairly simple image segmentation.
My flow:
Input image -> Result
I believe it could be solved with OpenCV's threshold function.
My attempt:
So I tried to implement it with OpenCV:
image = cv2.imread('input.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kernel = np.ones((6, 6), dtype=np.uint8)
gray = cv2.blur(gray, (13, 13))
gray = cv2.erode(gray, kernel, 3)
_, thres = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite('result.png', thres)
But it returns a pretty dirty result: the eyes are not flood-filled and some other features are detected.
import cv2
import numpy as np
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Create rectangular structuring element and dilate
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
dilate = cv2.dilate(thresh, kernel, iterations=4)
cv2.imshow('dilate', dilate)
cv2.waitKey()
I am trying to mask the text in an image using this code, but it works for only one image.
I have a cropped image and I am trying to get the numbers on that cropped image
Here's the code I am using
image = cv2.imread('Cropped.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
invert = 255 - opening
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
Here's the sample cropped image:
All I got was some of the numbers, not all of them. How can I enhance such an image so that only the numbers are extracted?
I tried the code on this image, but it doesn't return the correct numbers.
You can solve this in three main steps:
Upsampling
Applying a simple threshold
Setting the configuration to digits
Upsampling is needed for accurate recognition; otherwise Tesseract may misinterpret the digits.
Thresholding displays only the features of the image.
The configuration setting tells Tesseract to recognize digits.
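For the thresholding step, the code further down passes cv2.THRESH_OTSU, which picks the cut-off value automatically. As a rough illustration of that idea (not OpenCV's actual implementation), here is a NumPy-only sketch. `otsu_threshold` is a hypothetical helper name, and an 8-bit grayscale input is assumed.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the level t that maximises the between-class variance
    when pixels are split into {<= t} and {> t} (hypothetical helper)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)

    w0 = np.cumsum(hist)               # number of pixels with value <= t
    w1 = hist.sum() - w0               # number of pixels with value > t
    sum0 = np.cumsum(hist * levels)    # intensity sum of the lower class
    sum_all = sum0[-1]

    valid = (w0 > 0) & (w1 > 0)        # both classes must be non-empty
    mu0 = np.zeros(256)
    mu1 = np.zeros(256)
    mu0[valid] = sum0[valid] / w0[valid]
    mu1[valid] = (sum_all - sum0[valid]) / w1[valid]

    between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between))

# Toy image: half dark pixels, half bright pixels.
gray = np.array([[50, 50, 200, 200],
                 [50, 50, 200, 200]], dtype=np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(t, binary.tolist())
```

On a real scan the histogram is rarely this clean, so it is worth saving the thresholded image and looking at it before running OCR.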
Result
Upsampling
Threshold
Pytesseract
277032200746
Code:
import cv2
import pytesseract
img1 = cv2.imread("kEpyN.png") # "FX2in.png"
gry1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
(h, w) = gry1.shape[:2]
gry1 = cv2.resize(gry1, (w*2, h*2))
thr1 = cv2.threshold(gry1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
txt1 = pytesseract.image_to_string(thr1, config="digits")
print("".join(t for t in txt1 if t.isalnum()))
cv2.imshow("thr1", thr1)
cv2.waitKey(0)
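One detail about the print line above: str.isalnum() also lets letters through, so any stray characters that Tesseract returns as letters would end up in the output. If only digits are wanted, a stricter filter could look like the sketch below. `keep_digits` is a hypothetical helper, not part of pytesseract.

```python
import string

def keep_digits(ocr_text):
    """Keep only the ASCII digits 0-9 from an OCR result (hypothetical helper)."""
    return "".join(ch for ch in ocr_text if ch in string.digits)

# Made-up OCR output with whitespace, a form feed and stray letters.
raw = "ID: 277032 200746\n\x0c"
print(keep_digits(raw))
```

In the code above this would replace the generator expression, e.g. print(keep_digits(txt1)).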
Update:
Most probably a version mismatch causes the extra words and digits.
One way to solve this is to take a sub-region of the image.
For instance, from the thresholded image:
(h_thr, w_thr) = thr1.shape[:2]
thr1 = thr1[0:h_thr-10, int(w_thr/2)-400:int(w_thr/2)+200]
The result will be:
Now if you read it, the result should match this output:
277032200746
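The fixed offsets in the slice above (400, 200, 10) fit this particular image. On a narrower image the left bound goes negative, and NumPy then counts from the right edge instead of raising an error. A minimal sketch of a clamped version follows. `crop_band` and its parameter names are assumptions made for illustration.

```python
import numpy as np

def crop_band(img, left_of_centre, right_of_centre, bottom_margin):
    """Crop a band around the horizontal centre, clamped to the image bounds.

    Hypothetical helper: keeps rows 0 .. h - bottom_margin and columns
    centre - left_of_centre .. centre + right_of_centre.
    """
    h, w = img.shape[:2]
    centre = w // 2
    x0 = max(0, centre - left_of_centre)    # never negative
    x1 = min(w, centre + right_of_centre)   # never past the right edge
    y1 = max(0, h - bottom_margin)
    return img[0:y1, x0:x1]

wide = np.zeros((100, 1000), dtype=np.uint8)
narrow = np.zeros((100, 300), dtype=np.uint8)
print(crop_band(wide, 400, 200, 10).shape)
print(crop_band(narrow, 400, 200, 10).shape)
```

With the thresholded image from the answer, the call would be crop_band(thr1, 400, 200, 10).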
Pytesseract is unable to extract text when the text appears in different colors. I tried using OpenCV to invert the image, but that doesn't work for dark text colors.
The image:
import cv2
import pytesseract
from PIL import Image
def text(image):
    image = cv2.resize(image, (0, 0), fx=7, fy=7)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cv2.imwrite("gray.png", gray)
    blur = cv2.GaussianBlur(gray, (3, 3), 0)
    cv2.imwrite("gray_blur.png", blur)
    thresh = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    cv2.imwrite("thresh.png", thresh)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
    cv2.imwrite("opening.png", opening)
    invert = 255 - opening
    cv2.imwrite("invert.png", invert)
    data = pytesseract.image_to_string(invert, lang="eng", config="--psm 7")
    return data
Is there a way to extract both texts from the given image: DEADLINE (red) and WHITE HOUSE (white)?
You can use ImageOps to invert the image and then binarize it.
import pytesseract
from PIL import Image,ImageOps
import numpy as np
img = Image.open("OCR.png").convert("L")
img = ImageOps.invert(img)
# img.show()
threshold = 240
table = []
pixelArray = img.load()
for y in range(img.size[1]):  # binarize it
    List = []
    for x in range(img.size[0]):
        if pixelArray[x,y] < threshold:
            List.append(0)
        else:
            List.append(255)
    table.append(List)
img = Image.fromarray(np.array(table, dtype=np.uint8))  # build an 8-bit image from the array
# img.show()
print(pytesseract.image_to_string(img))
The result:
The final image looks like this:
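The pixel-by-pixel loop above can also be written as a single NumPy expression, which is usually much faster on large images. This is only a sketch: `binarize` is a hypothetical helper, and the input is assumed to be an 8-bit grayscale array (for example np.array(img) after the convert("L") and invert steps).

```python
import numpy as np

def binarize(gray, threshold=240):
    """Map pixels below `threshold` to 0 and all others to 255 (hypothetical helper)."""
    gray = np.asarray(gray)
    # uint8 output so the array can be turned back into an 8-bit image
    return np.where(gray < threshold, 0, 255).astype(np.uint8)

sample = np.array([[0, 239, 240, 255]], dtype=np.uint8)
print(binarize(sample).tolist())
```

The uint8 result can be passed to Image.fromarray to get a grayscale PIL image again.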
I am using the script below to remove the black spots in the image and the strike-through line over the number. It removes some noise, but not properly.
def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)
    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=12)
    img = cv2.erode(img, kernel, iterations=12)
    # Write image after removed noise
    cv2.imwrite(src_path + "removed_noise.png", img)
    # Apply threshold to get image with only black and white
    img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
    # Write the image after apply opencv to do some ...
    cv2.imwrite(src_path + "thres.png", img)
    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "vertical_final.jpg"))
    # Remove template file
    #os.remove(temp)
    return result
But it's not working properly.
Input image:
Output image:
I would highly appreciate any help with these problems.
Source code:
def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)
    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1,20), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    #img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
    kernel = np.ones((1, 1), np.uint8)
    #img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
    cv2.imwrite(src_path + "removed_noise.png", img)
    img3 = cv2.subtract(cv2.imread(src_path + "removed_noise.png"),cv2.imread(src_path + "tax_amount.png"))
    cv2.imwrite(src_path + "removed_noise_makes_00.png", img3)
    lower_black = np.array([0,0,0], dtype = "uint16")
    upper_black = np.array([70,70,70], dtype = "uint16")
    black_mask = cv2.inRange(img3, lower_black, upper_black)
    black_mask[np.where((black_mask == [0] ).all(axis = 1))] = [255]
    opening = cv2.morphologyEx(black_mask, cv2.MORPH_CLOSE, kernel)
    cv2.imwrite(src_path + "removed_noise_makes_00_1.png", opening)
    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "removed_noise_makes_00_1.png"))
    # Remove template file
    #os.remove(temp)
    return result
Where you do
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=12)
You apply a dilation 12 times with a 1x1 structuring element (SE). Unless OpenCV does something special with such an SE, this code should not change your image at all.
You should create a larger SE:
kernel = np.ones((7, 7), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
This will first dilate and then erode the result. What this accomplishes is that small (thin) black regions disappear. These are the regions where the SE didn't fit. This is the same as
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
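To check both claims on a toy image, here is a NumPy-only sketch (so it runs without OpenCV). The helpers `dilate`, `erode` and `closing` are illustrative stand-ins for cv2.dilate, cv2.erode and cv2.morphologyEx with a rectangular all-ones kernel, not OpenCV's implementation. The image is assumed to be a 2-D uint8 array with black marks on a white background, and the kernel sizes are assumed to be odd.

```python
import numpy as np

def _window_reduce(img, kh, kw, reduce_fn, pad_value):
    """Apply reduce_fn (np.max or np.min) over every kh x kw window."""
    top, left = kh // 2, kw // 2
    padded = np.pad(img, ((top, kh - 1 - top), (left, kw - 1 - left)),
                    constant_values=pad_value)
    h, w = img.shape
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(kh) for dx in range(kw)]
    return reduce_fn(np.stack(shifted), axis=0)

def dilate(img, kh, kw):
    # Dilation = local maximum; pad with 0 so the border has no influence.
    return _window_reduce(img, kh, kw, np.max, 0)

def erode(img, kh, kw):
    # Erosion = local minimum; pad with 255 so the border has no influence.
    return _window_reduce(img, kh, kw, np.min, 255)

def closing(img, kh, kw):
    return erode(dilate(img, kh, kw), kh, kw)

# White 12x12 image with a 1-pixel black speck and a 5x5 black block.
img = np.full((12, 12), 255, dtype=np.uint8)
img[1, 1] = 0
img[5:10, 5:10] = 0

unchanged = dilate(img, 1, 1)   # 1x1 SE: every window is the pixel itself
closed = closing(img, 3, 3)     # 3x3 SE: the speck does not survive

print(np.array_equal(unchanged, img))
print(closed[1, 1], int(closed[5:10, 5:10].max()))
```

The speck disappears because the 3x3 SE does not fit inside it, while the 5x5 block shrinks during the dilation and grows back to its original size during the erosion.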
To remove the long line, you want to apply a closing with an elongated SE:
kernel = np.ones((1, 30), np.uint8)
line = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
This leaves only the horizontal line. The difference of img and line is the text without the line.
If you think of img as the sum of line and text, then img - line will be text. However, there is a small problem still: img has white background (255), and black foreground. So really, it is img = 255 - text - line, and the line image you found above is really 255 - line, because it also has white background. So directly taking the difference will not produce the desired effect.
The solution is to invert your images first:
img = 255 - img
line = 255 - line
text = img - line
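Here is the whole line-removal idea on a toy image, again as a NumPy-only sketch rather than the OpenCV calls themselves. `close_horizontal` is a hypothetical stand-in for cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel) with a 1 x width kernel of ones; an odd width and a 2-D uint8 image with black foreground on white background are assumed.

```python
import numpy as np

def close_horizontal(img, width):
    """Grey-level closing with a 1 x width structuring element (odd width)."""
    if width % 2 == 0:
        raise ValueError("this sketch assumes an odd width")
    half = width // 2
    w = img.shape[1]

    def slide(a, reduce_fn, pad_value):
        # Reduce over a horizontal window of `width` pixels around each pixel.
        padded = np.pad(a, ((0, 0), (half, half)), constant_values=pad_value)
        return reduce_fn(np.stack([padded[:, dx:dx + w] for dx in range(width)]),
                         axis=0)

    dilated = slide(img, np.max, 0)      # thin dark strokes vanish
    return slide(dilated, np.min, 255)   # the long dark line is restored

# Toy image: two vertical "text" strokes crossed by a full-width line.
img = np.full((7, 20), 255, dtype=np.uint8)
img[1:6, 4:6] = 0
img[1:6, 12:14] = 0
img[3, :] = 0

line = close_horizontal(img, 5)          # only the horizontal line remains

naive = img - line                       # uint8 wraps around: 0 - 255 -> 1
text = (255 - img) - (255 - line)        # inverted images: strokes are 255

print(int(naive.max()), int(text.max()))
```

The result `text` has white strokes on a black background, so 255 - text gives black text on white again. Note that the pixels where the strokes cross the line are removed together with the line.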