So I have applied contouring on a big image and reached the following cropped part of the image:
But now, without using any machine learning model, how do I actually get the image into a text variable? I came to know about template matching, but I do not understand how to proceed from here. I do have images of letters and numbers (named according to their image value) stored in a directory, but how do I match each of them against the cropped image and get the text as a string? I don't want to use any ML model or library like pyTesseract.
I would appreciate any help.
Edit:
The code I have tried for template matching.
def templateMatch(image):
    path = "location"
    for image_path in os.listdir(path + "/characters-images"):
        template = cv2.imread(os.path.join(path, "characters-images", image_path))
        template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
        template = template.astype(np.uint8)
        image = image.astype(np.uint8)
        res = cv2.matchTemplate(template, image, cv2.TM_SQDIFF_NORMED)
        mn, _, mnLoc, _ = cv2.minMaxLoc(res)
        if res is not None:
            return image_path.replace(".bmp", "")

def match(image):
    plate = ""
    # mask = np.zeros(image.shape, dtype=np.uint8)
    # print(image.shape)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # print(image.shape)
    # print(image)
    thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    (cnts, _) = contours.sort_contours(cnts, method="left-to-right")
    for con in cnts:
        area = cv2.contourArea(con)
        if 800 > area > 200:
            x, y, w, h = cv2.boundingRect(con)
            # cv2.drawContours(mask, [c], 1, (255, 0, 0), 2)
            temp = thresh[y:y+h, x:x+w]
            character = templateMatch(temp)
            if character is not None:
                plate += character
    return plate
How do I actually get the image into a text variable? I came to know about template matching but I do not understand how to proceed from here.
Template matching is used to locate an object in an image given a template, not to extract text from an image. Matching a template to the position of an object in the image will not help you get the text as a string. For examples of how to apply dynamic, scale-variant template matching, take a look at How to isolate everything inside of a contour, scale it, and test the similarity to an image? and Python OpenCV line detection to detect X symbol in image. I don't understand why you wouldn't want to use an OCR library. If you want to extract text from the image as a string variable, you should use some type of deep/machine learning. PyTesseract is probably the easiest. Here's a solution using PyTesseract.
The idea is to obtain a binary image using Otsu's threshold, then perform contour area and aspect ratio filtering to extract the letter/number ROIs. From here we use Numpy slicing to copy each ROI onto a blank mask, then apply OCR using Pytesseract. Here's a visualization of each step:
Binary image
Detected ROIs highlighted in green
Isolated ROIs on a blank mask ready for OCR
We use the --psm 6 configuration option to tell Pytesseract to assume a uniform block of text. Look here for more configuration options. Result from Pytesseract:
XS NB 23
Code
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, create mask, grayscale, Otsu's threshold
image = cv2.imread('1.png')
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Filter for ROI using contour area and aspect ratio
cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)
    x,y,w,h = cv2.boundingRect(approx)
    aspect_ratio = w / float(h)
    if area > 2000 and aspect_ratio > .5:
        mask[y:y+h, x:x+w] = image[y:y+h, x:x+w]

# Perform OCR with Pytesseract
data = pytesseract.image_to_string(mask, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('mask', mask)
cv2.waitKey()
An option is to consider the bounding box around the characters and to compute a correlation score between the character at hand and those in the training set, keeping the template with the best score. (The score can be one of SAD, SSD, normalized grayscale correlation, or just the Hamming distance if you work on a binary image.)
You will need to develop a suitable strategy to ensure that the tested characters and the learnt characters have compatible sizes and are properly overlaid.
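As a rough sketch of that idea (not tested against the asker's data; the characters-images folder and the .bmp naming are taken from the question), you could resize each character crop to the template's size and keep the template with the best normalized score:

import os
import cv2

def recognize_character(crop, template_dir="characters-images"):
    # crop: single-channel (binary) image of one character, e.g. thresh[y:y+h, x:x+w]
    best_score, best_label = None, None
    for name in os.listdir(template_dir):
        template = cv2.imread(os.path.join(template_dir, name), cv2.IMREAD_GRAYSCALE)
        if template is None:
            continue  # skip non-image files
        # Resize the crop so both images have compatible sizes before comparing
        resized = cv2.resize(crop, (template.shape[1], template.shape[0]))
        # TM_SQDIFF_NORMED is an SSD-style score: 0 is a perfect match, 1 the worst
        score = cv2.matchTemplate(resized, template, cv2.TM_SQDIFF_NORMED)[0][0]
        if best_score is None or score < best_score:
            best_score, best_label = score, name.replace(".bmp", "")
    return best_label

How well this works depends entirely on the crops and the templates being binarized the same way and on the characters not being rotated or heavily distorted, which is exactly the size/overlay strategy mentioned above.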
Related
I am trying to detect text and remove those thick letters. The goal is the same as the one proposed in the following link: How to detect text on an X-Ray image with OpenCV ("to extract the oriented bounding boxes as a matrix").
Two sample images:
Sample Image #1
Sample Image #2
Here, the characters L, J, C, O (from the 1st image) and L, D, A, N and the circle shape (from the 2nd image) must be detected. I tried the methods proposed in the above link (How to detect text on an X-Ray image with OpenCV); however, they fail to detect the text in the background.
Original image --> Binary image (thresholding)
Morph close --> Detected text (nothing)
As you can see, it fails to detect the text in the background. It just returns a black image. I don't know what happened.
The code:
import cv2
import numpy as np
# Load image, create mask, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('HandImage.png')
original = image.copy()
blank = np.zeros(image.shape[:2], dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Merge text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)
# Find contours
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    # Filter using contour area and aspect ratio
    x,y,w,h = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    ar = w / float(h)
    if (ar > 1.4 and ar < 4) or ar < .85 and area > 10 and area < 500:
        # Find rotated bounding box
        rect = cv2.minAreaRect(c)
        box = cv2.boxPoints(rect)
        box = np.int0(box)
        cv2.drawContours(image,[box],0,(36,255,12),2)
        cv2.drawContours(blank,[box],0,(255,255,255),-1)
# Bitwise operations to isolate text
extract = cv2.bitwise_and(thresh, blank)
extract = cv2.bitwise_and(original, original, mask=extract)
cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.imshow('close', close)
cv2.imshow('extract', extract)
cv2.waitKey()
I am new to computer vision and image manipulation... Please help!
Update:
I have figured out the error in my code. I had to adjust and identify the right size/area of the bounding box that is selected for the text.
The code:
# Find contours
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key = lambda x: cv2.boundingRect(x)[0])
for c in cnts:
    # Filter using contour area and aspect ratio (x1 = width, y1 = height)
    x, y, x1, y1 = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    if (30 < x1 < 200 or 350 < x1 < 500) and 50 < y1 < 350:
        # Using the area above to calculate the rotation
        rect = cv2.minAreaRect(c)
        box = cv2.boxPoints(rect)
        translated_box = box - np.mean(box, axis=0)
        scaled_box = translated_box * 2  # 2 is scale factor
        retranslated_box = scaled_box + np.mean(box, axis=0)
        box = np.int0(retranslated_box)
        cv2.drawContours(image, [box], 0, (36,255,12), 2)
        cv2.drawContours(blank, [box], 0, (255,255,255), -1)
But I also liked the suggestion by @Mikel B, who used a neural network method.
This is not a solution in OpenCV as requested, but I leave it here in case a neural network-based solution is of interest to anybody.
First of all, install the package:
pip install craft-text-detector
Then copy and paste the following:
from craft_text_detector import Craft
# import craft functions
from craft_text_detector import (
    read_image,
    load_craftnet_model,
    load_refinenet_model,
    get_prediction,
    export_detected_regions,
    export_extra_results,
    empty_cuda_cache
)
# set image path and export folder directory
image = '/path/to/JP6x0.png' # can be filepath, PIL image or numpy array
output_dir = 'outputs/'
# read image
image = read_image(image)
# load models
refine_net = load_refinenet_model(cuda=False)
craft_net = load_craftnet_model(cuda=False)
# perform prediction
prediction_result = get_prediction(
    image=image,
    craft_net=craft_net,
    refine_net=refine_net,
    text_threshold=0.5,
    link_threshold=0.1,
    low_text=0.1,
    cuda=False,
    long_size=1280
)
# export detected text regions
exported_file_paths = export_detected_regions(
    image=image,
    regions=prediction_result["boxes"],
    output_dir=output_dir,
    rectify=True
)
# export heatmap, detection points, box visualization
export_extra_results(
    image=image,
    regions=prediction_result["boxes"],
    heatmaps=prediction_result["heatmaps"],
    output_dir=output_dir
)
# unload models from gpu
empty_cuda_cache()
My results with these parameters:
I have the following image to analyze, and I want to extract the porosity (the black dots) from it.
My issue is that my code might be picking up on the scratches (defects from polishing the cross-sections).
I am looking for a way to remove the scratches.
I tried inpainting (https://github.com/Chandrika372/Image-Restoration-by-Inpainting-Algorithm/blob/main/Inpainting.py), but the scratches are too small for detection. I also found a few examples in C and Matlab, but if I use my image as input, the output is still the same; the scratches seem too fine for the algorithms.
Any help would be appreciated.
A simple Otsu's threshold should do it. The idea is to Gaussian blur to remove noise, then use cv2.bitwise_and to extract the dots.
Binary image
Result
You can also optionally filter out large/small dots using contour area filtering with a threshold value.
import cv2
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread("1.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# OPTIONAL to filter out small/large dots using contour area filtering
# Adjust the area to only keep larger dots
'''
DOT_AREA = 10
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < DOT_AREA:
        cv2.drawContours(thresh, [c], -1, 0, -1)
'''
# Bitwise_and to extract dots
result = cv2.bitwise_and(image, image, mask=thresh)
result[thresh==0] = 255
cv2.imshow("thresh", thresh)
cv2.imshow("result", result)
cv2.waitKey()
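If the goal is a porosity figure rather than just the extracted dots, one simple estimate (an assumption on my part, since the question only asks about isolating the dots) is the fraction of dot pixels in the binary mask:

# Fraction of the image area covered by dots, e.g. 0.034 -> 3.4% porosity
porosity = cv2.countNonZero(thresh) / float(thresh.size)
print('Porosity: {:.2%}'.format(porosity))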
I have two pictures of the same dimensions, and I want to detect the white region in the first picture (the black image) and replace the content at the same location in the second picture. Is there any way to do this using OpenCV? I want to replace the blue region in the original image with the white region from the first picture.
First picture
Original image
If I'm understanding you correctly, you want to replace the white ROI on the black image onto the original image. Here's a simple approach:
Obtain binary image. Load image, grayscale, Gaussian blur, then Otsu's threshold
Extract ROI and replace. Find contours with cv2.findContours, then filter using contour approximation with cv2.arcLength and cv2.approxPolyDP. Under the assumption that the region is a rectangle, if the contour approximation has 4 points then we have found our desired region. In addition, we filter using cv2.contourArea to ensure that we don't include noise. We then obtain the bounding box coordinates with cv2.boundingRect and extract the ROI with Numpy slicing. Finally, we paste the ROI into the original image.
Detected region to extract/replace highlighted in green
Extracted ROI
Result
Code
import cv2
# Load images, grayscale, Gaussian blur, Otsu's threshold
original = cv2.imread('1.jpg')
image = cv2.imread('2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Find contours, filter using contour approximation + area, then extract
# ROI using Numpy slicing and replace into original image
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.015 * peri, True)
    area = cv2.contourArea(c)
    if len(approx) == 4 and area > 1000:
        x,y,w,h = cv2.boundingRect(c)
        ROI = image[y:y+h,x:x+w]
        original[y:y+h, x:x+w] = ROI
cv2.imshow('thresh', thresh)
cv2.imshow('ROI', ROI)
cv2.imshow('original', original)
cv2.waitKey()
I want to remove the color from the image below; because of this color, I am unable to extract the text clearly from the image.
I am using the code below, but I am not getting clear text:
import numpy as np
from PIL import Image
im = Image.open('my_file.tif')
im = im.convert('RGBA')
data = np.array(im)
# just use the rgb values for comparison
rgb = data[:,:,:3]
color = [246, 213, 139] # Original value
black = [0,0,0, 255]
white = [255,255,255,255]
mask = np.all(rgb == color, axis = -1)
# change all pixels that match color to white
data[mask] = white
# change all pixels that don't match color to black
##data[np.logical_not(mask)] = black
new_im = Image.fromarray(data)
new_im.save('new_file.tif')
and
def black_and_white(input_image_path,
                    output_image_path):
    color_image = Image.open(input_image_path)
    bw = color_image.convert('L')
    bw.save(output_image_path)
Please help me with this...
Image 2:
I'm assuming you want to extract the quote. To do this, you can perform a series of filtering operations to remove non-text contours. Once you have the processed result, you can use an OCR tool such as Pytesseract for text extraction.
Result from OCR
On behalf of the hundreds of ACLU activists who
called on Governor Walker to veto House Bill
156, we are disappointed that he did not put
students or the Constitution first today.”
—Joshua A. Decker
Executive Director
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image and threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Connect text with a horizontal shaped kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10,3))
dilate = cv2.dilate(thresh, kernel, iterations=3)
# Remove non-text contours using aspect ratio filtering
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    aspect = w/h
    if aspect < 3:
        cv2.drawContours(thresh, [c], -1, (0,0,0), -1)
# Invert image and OCR
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)
cv2.imshow('result', result)
cv2.waitKey()
Try OpenCV conversion, but remember to use 3 channels, otherwise you'll get an error:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
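For reference, a minimal sketch of that (reusing the 'my_file.tif'/'new_file.tif' names from the question): cv2.imread returns a 3-channel BGR array by default, which is what cv2.COLOR_BGR2GRAY expects, whereas a single-channel image (such as the PIL 'L' image above) passed to the same call raises a channel error.

import cv2

# cv2.imread with the default flag returns a 3-channel BGR array
image = cv2.imread('my_file.tif')               # shape: (h, w, 3)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # now single-channel
cv2.imwrite('new_file.tif', gray)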
I have a problem: I have photos of individual registration plates, and now I would like to get the registration number from each photo. Unfortunately, the effectiveness of the code I wrote is very low, and I would like to ask for help in achieving greater accuracy. Any tips?
In the first phase, the photo looks like this:
Then I transform the photo so that only the black color stands out:
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# define range of black color in HSV
lower_val = np.array([0,0,0])
upper_val = np.array([179,100,130])
# Threshold the HSV image to get only black colors
mask = cv2.inRange(hsv, lower_val, upper_val)
and I get:
What can I add or do to improve the effectiveness of the program? Is there a way for the program to read the registrations a bit better? Will this help?
configr = ('-l eng --oem 1 --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
text = pytesseract.image_to_string(mask,lang='eng', config=configr)
print(text)
Here's an approach:
Color threshold to extract black text. We load the image, convert to HSV colorspace, define a lower and upper color range, and use cv2.inRange() to color threshold and obtain a binary mask
Perform morphological operations. Create a kernel and perform morph close to fill holes in the contours.
Filter license plate contours. Find contours and filter using bounding rectangle area. If a contour passes this filter, we extract the ROI and paste it onto a new blank mask.
OCR using Pytesseract. We invert the image so desired text is black and throw it into Pytesseract.
Here's a visualization of each step:
Obtained mask from color thresholding + morph closing
Filter for license plate contours highlighted in green
Pasted plate contours onto a blank mask
Inverted image ready for Tesseract
Result from Tesseract OCR
PZ 689LR
Code
import numpy as np
import pytesseract
import cv2
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, create blank mask, convert to HSV, define thresholds, color threshold
image = cv2.imread('1.png')
result = np.zeros(image.shape, dtype=np.uint8)
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0,0,0])
upper = np.array([179,100,130])
mask = cv2.inRange(hsv, lower, upper)
# Perform morph close and merge for 3-channel ROI extraction
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
extract = cv2.merge([close,close,close])
# Find contours, filter using contour area, and extract using Numpy slicing
cnts = cv2.findContours(close, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    area = w * h
    if area < 5000 and area > 2500:
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
        result[y:y+h, x:x+w] = extract[y:y+h, x:x+w]
# Invert image and throw into Pytesseract
invert = 255 - result
data = pytesseract.image_to_string(invert, lang='eng',config='--psm 6')
print(data)
cv2.imshow('image', image)
cv2.imshow('close', close)
cv2.imshow('result', result)
cv2.imshow('invert', invert)
cv2.waitKey()