If the title isn't clear let's say I have a list of images (10k+), and I have a target image I am searching for.
Here's an example of the target image:
Here's an example of images I will want to be searching to find something 'similar' (ex1, ex2, and ex3):
Here's the matching I do (I use KAZE)
from matplotlib import pyplot as plt
import numpy as np
import cv2
from typing import List
import os
import imutils
def calculate_matches(des1: List[cv2.KeyPoint], des2: List[cv2.KeyPoint]):
"""
does a matching algorithm to match if keypoints 1 and 2 are similar
#param des1: a numpy array of floats that are the descriptors of the keypoints
#param des2: a numpy array of floats that are the descriptors of the keypoints
#return:
"""
# bf matcher with default params
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
topResults = []
for m, n in matches:
if m.distance < 0.7 * n.distance:
topResults.append([m])
return topResults
def compare_images_kaze():
cwd = os.getcwd()
target = os.path.join(cwd, 'opencv_target', 'target.png')
images_list = os.listdir('opencv_images')
for image in images_list:
# get my 2 images
img2 = cv2.imread(target)
img1 = cv2.imread(os.path.join(cwd, 'opencv_images', image))
for i in range(0, 360, int(360 / 8)):
# rotate my image by i
img_target_rotation = imutils.rotate_bound(img2, i)
# Initiate KAZE object with default values
kaze = cv2.KAZE_create()
kp1, des1 = kaze.detectAndCompute(img1, None)
kp2, des2 = kaze.detectAndCompute(img2, None)
matches = calculate_matches(des1, des2)
try:
score = 100 * (len(matches) / min(len(kp1), len(kp2)))
except ZeroDivisionError:
score = 0
print(image, score)
img3 = cv2.drawMatchesKnn(img1, kp1, img_target_rotation, kp2, matches,
None, flags=2)
img3 = cv2.cvtColor(img3, cv2.COLOR_BGR2RGB)
plt.imshow(img3)
plt.show()
plt.clf()
if __name__ == '__main__':
compare_images_kaze()
Here's the result of my code:
ex1.png 21.052631578947366
ex2.png 0.0
ex3.png 42.10526315789473
It does alright! It was able to tell that ex1 is similar and ex2 is not similar, however it states that ex3 is similar (even more similar than ex1). Any extra pre-processing or post-processing (maybe ml, assuming ml is actually useful) or just changes I can do to my method that can be done to keep only ex1 as similar and not ex3?
(Note this score I create is something I found online. Not sure if it's an accurate way to go about it)
ADDED MORE EXAMPLES BELOW
Another set of examples:
Here's what I am searching for
I want the above image to be similar to the middle and bottom images (NOTE: I rotate my target image by 45 degrees and compare it to the images below.)
Feature matching (as stated in answers below) were useful in found similarity with the second image, but not the third image (Even after rotating it properly)
Detecting The Most Similar Image
The Code
You can use template matching, where the image you want to detect if it's in the other images is the template. I have that small image saved in template.png, and the other three images in img1.png, img2.png and img3.png.
I defined a function that utilizes the cv2.matchTemplate to calculate the amount of confidence for if a template is in an image. Using the function on every image, the one that results ion the highest confidence is the image that contains the template:
import cv2
template = cv2.imread("template.png", 0)
files = ["img1.png", "img2.png", "img3.png"]
for name in files:
img = cv2.imread(name, 0)
print(f"Confidence for {name}:")
print(cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED).max())
The Output:
Confidence for img1.png:
0.8906427
Confidence for img2.png:
0.4427919
Confidence for img3.png:
0.5933967
The Explanation:
Import the opencv module, and read in the template image as grayscale by setting the second parameter of the cv2.imread method to 0:
import cv2
template = cv2.imread("template.png", 0)
Define your list of images of which you want to determine which one contains the template:
files = ["img1.png", "img2.png", "img3.png"]
Loop through the filenames and read in each one as a grayscale image:
for name in files:
img = cv2.imread(name, 0)
Finally, you can use the cv2.matchTemplate to detect the template in each image. There are many detection methods you can use, but for this I decided to use the cv2.TM_CCOEFF_NORMED method:
print(f"Confidence for {name}:")
print(cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED).max())
The output of the function ranges from between 0 and 1, and as you can see, it successfully detected that the first image is most likely to contain the template image (it has the highest level of confidence).
The Visualization
The Code
If detecting which image contains the template isn't enough, and you want a visualization, you can try the code below:
import cv2
import numpy as np
def confidence(img, template):
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
conf = res.max()
return np.where(res == conf), conf
files = ["img1.png", "img2.png", "img3.png"]
template = cv2.imread("template.png")
h, w, _ = template.shape
for name in files:
img = cv2.imread(name)
([y], [x]), conf = confidence(img, template)
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
text = f'Confidence: {round(float(conf), 2)}'
cv2.putText(img, text, (x, y), 1, cv2.FONT_HERSHEY_PLAIN, (0, 0, 0), 2)
cv2.imshow(name, img)
cv2.imshow('Template', template)
cv2.waitKey(0)
The Output:
The Explanation:
Import the necessary libraries:
import cv2
import numpy as np
Define a function that will take in a full image and a template image. As the cv2.matchTemplate method requires grayscale images, convert the 2 images into grayscale:
def confidence(img, template):
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
Use the cv2.matchTemplate method to detect the template in the image, and return the position of the point with the highest confidence, and return the highest confidence:
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
conf = res.max()
return np.where(res == conf), conf
Define your list of images you want to determine which one contains the template, and read in the template image:
files = ["img1.png", "img2.png", "img3.png"]
template = cv2.imread("template.png")
Get the size of the template image to later use for drawing a rectangle on the images:
h, w, _ = template.shape
Loop though the filenames and read in each image. Using the confidence function we defined before, get the x y position of the top-left corner of the detected template and the confidence amount for the detection:
for name in files:
img = cv2.imread(name)
([y], [x]), conf = confidence(img, template)
Draw a rectangle on the image at the corner and put the text on the image. Finally, show the image:
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
text = f'Confidence: {round(float(conf), 2)}'
cv2.putText(img, text, (x, y), 1, cv2.FONT_HERSHEY_PLAIN, (0, 0, 0), 2)
cv2.imshow(name, img)
Also, show the template for comparison:
cv2.imshow('Template', template)
cv2.waitKey(0)
I'm not sure, if the given images resemble your actual task or data, but for this kind of images, you could try simple template matching, cf. this OpenCV tutorial.
Basically, I just implemented the tutorial with some modifications:
import cv2
import matplotlib.pyplot as plt
# Read images
examples = [cv2.imread(img) for img in ['ex1.png', 'ex2.png', 'ex3.png']]
target = cv2.imread('target.png')
h, w = target.shape[:2]
# Iterate examples
for i, img in enumerate(examples):
# Template matching
# cf. https://docs.opencv.org/4.5.2/d4/dc6/tutorial_py_template_matching.html
res = cv2.matchTemplate(img, target, cv2.TM_CCOEFF_NORMED)
# Get location of maximum
_, max_val, _, top_left = cv2.minMaxLoc(res)
# Set up threshold for decision target found or not
thr = 0.7
if max_val > thr:
# Show found target in example
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)
# Visualization
plt.figure(i, figsize=(10, 5))
plt.subplot(1, 2, 1), plt.imshow(img[..., [2, 1, 0]]), plt.title('Example')
plt.subplot(1, 2, 2), plt.imshow(res, vmin=0, vmax=1, cmap='gray')
plt.title('Matching result'), plt.colorbar(), plt.tight_layout()
plt.show()
These are the results:
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
PyCharm: 2021.1.1
Matplotlib: 3.4.1
OpenCV: 4.5.1
----------------------------------------
EDIT: To emphasize the information from the different colors, one might use the hue channel from the HSV color space for the template matching:
import cv2
import matplotlib.pyplot as plt
# Read images
examples = [
[cv2.imread(img) for img in ['ex1.png', 'ex2.png', 'ex3.png']],
[cv2.imread(img) for img in ['ex12.png', 'ex22.png', 'ex32.png']]
]
targets = [
cv2.imread('target.png'),
cv2.imread('target2.png')
]
# Iterate examples and targets
for i, (ex, target) in enumerate(zip(examples, targets)):
for j, img in enumerate(ex):
# Rotate last image from second data set
if (i == 1) and (j == 2):
img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
h, w = target.shape[:2]
# Get hue channel from HSV color space
target_h = cv2.cvtColor(target, cv2.COLOR_BGR2HSV)[..., 0]
img_h = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[..., 0]
# Template matching
# cf. https://docs.opencv.org/4.5.2/d4/dc6/tutorial_py_template_matching.html
res = cv2.matchTemplate(img_h, target_h, cv2.TM_CCOEFF_NORMED)
# Get location of maximum
_, max_val, _, top_left = cv2.minMaxLoc(res)
# Set up threshold for decision target found or not
thr = 0.6
if max_val > thr:
# Show found target in example
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)
# Visualization
plt.figure(i * 10 + j, figsize=(10, 5))
plt.subplot(1, 2, 1), plt.imshow(img[..., [2, 1, 0]]), plt.title('Example')
plt.subplot(1, 2, 2), plt.imshow(res, vmin=0, vmax=1, cmap='gray')
plt.title('Matching result'), plt.colorbar(), plt.tight_layout()
plt.savefig('{}.png'.format(i * 10 + j))
plt.show()
New results:
The Concept
We can use the cv2.matchTemplate method to detect where an image is in another image, but for your second set of images you have rotation. Also, we'll need to take the colors into account.
cv2.matchTemplate will take in an image, a template (the other image) and a template detection method, and will return a grayscale array where the brightest point in the grayscale array will be the point with the most confidence that template is at that point.
We can use the template at 4 different angles and use the one that resulted in the highest confidence. When we detected a possible point that matched the template, we use a function (that we will define ourselves) to check if the most frequent colors in the template is present in the patch of the image we detected. If not, then ignore the patch, regardless of the amount of confidence returned.
The Code
import cv2
import numpy as np
def frequent_colors(img, vals=3):
colors, count = np.unique(np.vstack(img), return_counts=True, axis=0)
sorted_by_freq = colors[np.argsort(count)]
return sorted_by_freq[-vals:]
def get_templates(img):
template = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for i in range(3):
yield cv2.rotate(template, i)
def detect(img, template, min_conf=0.45):
colors = frequent_colors(template)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
conf_max = min_conf
shape = 0, 0, 0, 0
for tmp in get_templates(template):
h, w = tmp.shape
res = cv2.matchTemplate(img_gray, tmp, cv2.TM_CCOEFF_NORMED)
for y, x in zip(*np.where(res > conf_max)):
conf = res[y, x]
if conf > conf_max:
seg = img[y:y + h, x:x + w]
if all(np.any(np.all(seg == color, -1)) for color in colors):
conf_max = conf
shape = x, y, w, h
return shape
files = ["img1_2.png", "img2_2.png", "img3_2.png"]
template = cv2.imread("template2.png")
for name in files:
img = cv2.imread(name)
x, y, w, h = detect(img, template)
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imshow(name, img)
cv2.imshow('Template', template)
cv2.waitKey(0)
The Output
The Explanation
Import the necessary libraries:
import cv2
import numpy as np
Define a function, frequent_colors, that will take in an image and return the most frequent colors in the image. An optional parameter, val, is how many colors to return; if val is 3, then the 3 most frequent colors will be returned:
def frequent_colors(img, vals=3):
colors, count = np.unique(np.vstack(img), return_counts=True, axis=0)
sorted_by_freq = colors[np.argsort(count)]
return sorted_by_freq[-vals:]
Define a function, get_templates, that will take in an image, and yield the image (in grayscale) at 4 different angles - original, 90 clockwise, 180, and 90 counterclockwise:
def get_templates(img):
template = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for i in range(3):
yield cv2.rotate(template, i)
Define a function, detect, that will take in an image and a template image, and return the x, y, w, h of the bounding box of the detected template on the image, and for this function we will be utilizing the frequent_colors and get_templates functions defined earlier. The min_conf parameter will be the minimum amount of confidence needed to classify a detection as an actual detection:
def detect(img, template, min_conf=0.45):
Detect the three most frequent colors in the template and store them in a variable, colors. Also, define a grayscale version of the main image:
colors = frequent_colors(template)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Define the initial value for the greatest confidence detected, and initial values for the detected patch:
conf_max = min_conf
shape = 0, 0, 0, 0
Loop though the grayscale templates at 4 angles, get the shape of the grayscale template (as rotation changes the shape), and use the cv2.matchTemplate method to get the grayscale array of detected templates on the image:
for tmp in get_templates(template):
h, w = tmp.shape
res = cv2.matchTemplate(img_gray, tmp, cv2.TM_CCOEFF_NORMED)
Loop though the x, y coordinates of the detected templates where the confidence is greater than conf_min, and store the confidence in a variable, conf. If conf is greater than the initial greatest confidence variable (conf_max), proceed to detect if all three most frequent colors in the template is present in the patch of the image:
for y, x in zip(*np.where(res > conf_max)):
conf = res[y, x]
if conf > conf_max:
seg = img[y:y + h, x:x + w]
if all(np.any(np.all(seg == color, -1)) for color in colors):
conf_max = conf
shape = x, y, w, h
At the end we can return the shape. If no template is detected in the image, the shape will be the initial values defined for it, 0, 0, 0, 0:
return shape
Finally, loop though each image and use the detect function we defined to get the x, y, w, h of the bounding box. Use the cv2.rectangle method to draw the bounding box onto the images:
files = ["img1_2.png", "img2_2.png", "img3_2.png"]
template = cv2.imread("template2.png")
for name in files:
img = cv2.imread(name)
x, y, w, h = detect(img, template)
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imshow(name, img)
cv2.imshow('Template', template)
cv2.waitKey(0)
First, the data appears in graphs, aren't you able to get the overlapping values from their numerical data?
And have you tried performing some edge detection for the change in color from white-blue and then from blue-red, fitting some circles to those edges and then checking if they overlap?
Since the input data is quite controlled (no organic photography or videos), perhaps you won't have to go the ML route.
I have captcha image as attached in this question.
I am trying to extract the text in the image. My following code is able to make all areas except the text and lines in white color
import cv2
from PIL import Image
import numpy as np
image1 = Image.open("E:\\python\\downloaded\\captcha.png").convert('L')
image2 = Image.open("E:\\python\\downloaded\\captcha.png").convert('L')
pix = image1.load()
for column in range(0, image1.height):
for row in range(0, image1.width):
if pix[row, column] >= 90:
pix[row, column] = 255
cv2.imshow("1", np.array(image2))
cv2.imshow("2", np.array(image1))
cv2.waitKey(0)
But I am trying to remove the line crossing the text, but it does not seem to work. I tried with below portion of code which is posted on other question in StackOverflow. But it does not work
def eliminate_zeros(self,vector):
return [(dex,v) for (dex,v) in enumerate(vector) if v!=0 ]
def get_line_position(self,img):
sumx=img.sum(axis=0)
list_without_zeros=self.eliminate_zeros(sumx)
min1,min2=heapq.nsmallest(2,list_without_zeros,key=itemgetter(1))
l=[dex for [dex,val] in enumerate(sumx) if val==min1[1] or val==min2[1]]
mindex=[l[0],l[len(l)-1]]
cols=img[:,mindex[:]]
col1=cols[:,0]
col2=cols[:,1]
col1_without_0=self.eliminate_zeros(col1)
col2_without_0=self.eliminate_zeros(col2)
line_length=len(col1_without_0)
dex1=col1_without_0[round(len(col1_without_0)/2)][0]
dex2=col2_without_0[round(len(col2_without_0)/2)][0]
p1=[dex1,mindex[0]]
p2=[dex2,mindex[1]]
return p1,p2,line_length
def remove_line(self,p1,p2,LL,img):
m=(p2[0]-p1[0])/(p2[1]-p1[1]) if p2[1]!=p1[1] else np.inf
w,h=len(img),len(img[0])
x=list(range(h))
y=list(map(lambda z : int(np.round(p1[0]+m*(z-p1[1]))),x))
img_removed_line=list(img)
for dex in range(h):
i,j=y[dex],x[dex]
i=int(i)
j=int(j)
rlist=[]
while i>=0 and i<len(img_removed_line)-1:
f1=i
if img_removed_line[i][j]==0 and img_removed_line[i-1][j]==0:
break
rlist.append(i)
i=i-1
i,j=y[dex],x[dex]
i=int(i)
j=int(j)
while i>=0 and i<len(img_removed_line)-1:
f2=i
if img_removed_line[i][j]==0 and img_removed_line[i+1][j]==0:
break
rlist.append(i)
i=i+1
if np.abs(f2-f1) in [LL+1,LL,LL-1]:
rlist=list(set(rlist))
for k in rlist:
img_removed_line[k][j]=0
return img_removed_line
I am new to CV and can someone help here to suggest the way?. Original and partially processed image files are attached here.
My approach is based on the fact that the line is thinner than the characters. In this example I used blurring, threshold and morphology to get rid of the line between the characters. The result is this:
import cv2
import numpy as np
image = cv2.imread('captcha.png')
image = cv2.blur(image, (3, 3))
ret, image = cv2.threshold(image, 90, 255, cv2.THRESH_BINARY)
image = cv2.dilate(image, np.ones((3, 1), np.uint8))
image = cv2.erode(image, np.ones((2, 2), np.uint8))
cv2.imshow("1", np.array(image))
cv2.waitKey(0)
You can use CV2 functions like threshold, dilate, bitwise_and and bitwise_not for removing unwanted lines from captcha
import numpy as np
import cv2
img = cv2.imread('captcha.jpg',0)
horizontal_inv = cv2.bitwise_not(img)
masked_img = cv2.bitwise_and(img, img, mask=horizontal_inv)
masked_img_inv = cv2.bitwise_not(masked_img)
kernel = np.ones((5,5),np.uint8)
dilation = cv2.dilate(masked_img_inv,kernel,iterations = 3)
ret,thresh2 = cv2.threshold(dilation,254,255,cv2.THRESH_BINARY_INV)
thresh2=cv2.bitwise_not(thresh2)
cv2.waitKey(0)
cv2.destroyAllWindows()
I am working on a project where I should apply and OCR on some documents.
The first step is to threshold the image and let only the writing (whiten the background).
Example of an input image: (For the GDPR and privacy reasons, this image is from the Internet)
Here is my code:
import cv2
import numpy as np
image = cv2.imread('b.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
h = image.shape[0]
w = image.shape[1]
for y in range(0, h):
for x in range(0, w):
if image[y, x] >= 120:
image[y, x] = 255
else:
image[y, x] = 0
cv2.imwrite('output.jpg', image)
Here is the result that I got:
When I applied pytesseract to the output image, the results were not satisfying (I know that an OCR is not perfect). Although I tried to adjust the threshold value (in this code it is equal to 120), the result was not as clear as I wanted.
Is there a way to make a better threshold in order to only keep the writing in black and whiten the rest?
After digging deep in StackOverflow questions, I found this answer which is about removing watermark using opencv.
I adapted the code to my needs and this is what I got:
import numpy as np
import cv2
image = cv2.imread('a.png')
img = image.copy()
alpha =2.75
beta = -160.0
denoised = alpha * img + beta
denoised = np.clip(denoised, 0, 255).astype(np.uint8)
#denoised = cv2.fastNlMeansDenoising(denoised, None, 31, 7, 21)
img = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
h = img.shape[0]
w = img.shape[1]
for y in range(0, h):
for x in range(0, w):
if img[y, x] >= 220:
img[y, x] = 255
else:
img[y, x] = 0
cv2.imwrite('outpu.jpg', img)
Here is the output image:
The good thing about this code is that it gives good results not only with this image, but also with all the images that I tested.
I hope it helps anyone who had the same problem.
You can use adaptive thresholding. From documentation :
In this, the algorithm calculate the threshold for a small regions of the image. So we get different thresholds for different regions of the same image and it gives us better results for images with varying illumination.
import numpy as np
import cv2
image = cv2.imread('b.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.medianBlur(image ,5)
th1 = cv2.adaptiveThreshold(image,255,cv2.ADAPTIVE_THRESH_MEAN_C,\
cv2.THRESH_BINARY,11,2)
th2 = cv2.adaptiveThreshold(image,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,\
cv2.THRESH_BINARY,11,2)
cv2.imwrite('output1.jpg', th1 )
cv2.imwrite('output2.jpg', th2 )