In case the title isn't clear: I have a list of images (10k+) and a target image that I am searching for.
Here's an example of the target image:
Here's an example of images I will want to be searching to find something 'similar' (ex1, ex2, and ex3):
Here's the matching I do (I use KAZE)
from matplotlib import pyplot as plt
import numpy as np
import cv2
from typing import List
import os
import imutils
def calculate_matches(des1: np.ndarray, des2: np.ndarray):
    """
    Match two sets of descriptors and keep the matches that pass the ratio test
    :param des1: a numpy array of floats that are the descriptors of the keypoints
    :param des2: a numpy array of floats that are the descriptors of the keypoints
    :return: a list of single-element lists of cv2.DMatch (the format drawMatchesKnn expects)
    """
    # bf matcher with default params
    bf = cv2.BFMatcher(cv2.NORM_L2)
    matches = bf.knnMatch(des1, des2, k=2)
    topResults = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            topResults.append([m])
    return topResults


def compare_images_kaze():
    cwd = os.getcwd()
    target = os.path.join(cwd, 'opencv_target', 'target.png')
    images_list = os.listdir('opencv_images')
    for image in images_list:
        # get my 2 images
        img2 = cv2.imread(target)
        img1 = cv2.imread(os.path.join(cwd, 'opencv_images', image))
        for i in range(0, 360, int(360 / 8)):
            # rotate my image by i
            img_target_rotation = imutils.rotate_bound(img2, i)
            # Initiate KAZE object with default values
            kaze = cv2.KAZE_create()
            kp1, des1 = kaze.detectAndCompute(img1, None)
            kp2, des2 = kaze.detectAndCompute(img2, None)
            matches = calculate_matches(des1, des2)
            try:
                score = 100 * (len(matches) / min(len(kp1), len(kp2)))
            except ZeroDivisionError:
                score = 0
            print(image, score)
            img3 = cv2.drawMatchesKnn(img1, kp1, img_target_rotation, kp2, matches,
                                      None, flags=2)
            img3 = cv2.cvtColor(img3, cv2.COLOR_BGR2RGB)
            plt.imshow(img3)
            plt.show()
            plt.clf()


if __name__ == '__main__':
    compare_images_kaze()
Here's the result of my code:
ex1.png 21.052631578947366
ex2.png 0.0
ex3.png 42.10526315789473
It does alright! It was able to tell that ex1 is similar and ex2 is not similar, however it states that ex3 is similar (even more similar than ex1). Any extra pre-processing or post-processing (maybe ml, assuming ml is actually useful) or just changes I can do to my method that can be done to keep only ex1 as similar and not ex3?
(Note this score I create is something I found online. Not sure if it's an accurate way to go about it)
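For reference, the ratio test and the score above boil down to simple arithmetic. The following is a minimal sketch on plain distance pairs rather than real cv2.DMatch objects; the helper names ratio_test_count and match_score and the example distances are hypothetical, and the 0.7 ratio is the one used in the code above.

```python
from typing import List, Tuple


def ratio_test_count(pairs: List[Tuple[float, float]], ratio: float = 0.7) -> int:
    """Count pairs whose best distance is clearly smaller than the second best."""
    return sum(1 for best, second in pairs if best < ratio * second)


def match_score(good: int, n_kp1: int, n_kp2: int) -> float:
    """Percentage of good matches relative to the smaller keypoint set."""
    smallest = min(n_kp1, n_kp2)
    return 100.0 * good / smallest if smallest else 0.0


# (best distance, second-best distance) for four hypothetical keypoints
pairs = [(0.10, 0.50), (0.40, 0.45), (0.20, 0.30), (0.05, 0.90)]
good = ratio_test_count(pairs)
print(good, match_score(good, n_kp1=19, n_kp2=40))
```

Because the denominator is the smaller keypoint count, a handful of matches on an image with few keypoints already gives a high percentage. The scores reported above are consistent with, for example, 4 and 8 good matches out of 19 keypoints (100 * 4 / 19 ≈ 21.05 and 100 * 8 / 19 ≈ 42.11), and the score says nothing about whether those matches are geometrically consistent.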
ADDED MORE EXAMPLES BELOW
Another set of examples:
Here's what I am searching for
I want the above image to be similar to the middle and bottom images (NOTE: I rotate my target image by 45 degrees and compare it to the images below.)
Feature matching (as suggested in the answers below) was useful for finding the similarity with the second image, but not with the third image (even after rotating it properly).
Detecting The Most Similar Image
The Code
You can use template matching, where the image you want to find in the other images is the template. I have that small image saved as template.png, and the other three images as img1.png, img2.png and img3.png.
I use cv2.matchTemplate to calculate a confidence score for whether the template is in an image. Running it on every image, the one that results in the highest confidence is the image that contains the template:
import cv2
template = cv2.imread("template.png", 0)
files = ["img1.png", "img2.png", "img3.png"]
for name in files:
    img = cv2.imread(name, 0)
    print(f"Confidence for {name}:")
    print(cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED).max())
The Output:
Confidence for img1.png:
0.8906427
Confidence for img2.png:
0.4427919
Confidence for img3.png:
0.5933967
The Explanation:
Import the opencv module, and read in the template image as grayscale by setting the second parameter of the cv2.imread method to 0:
import cv2
template = cv2.imread("template.png", 0)
Define your list of images of which you want to determine which one contains the template:
files = ["img1.png", "img2.png", "img3.png"]
Loop through the filenames and read in each one as a grayscale image:
for name in files:
    img = cv2.imread(name, 0)
Finally, you can use cv2.matchTemplate to detect the template in each image. There are several matching methods you can use, but for this I decided to use the cv2.TM_CCOEFF_NORMED method:
    print(f"Confidence for {name}:")
    print(cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED).max())
For cv2.TM_CCOEFF_NORMED the score ranges from -1 to 1, where 1 is a perfect match. As you can see, the first image is the most likely to contain the template (it has the highest confidence).
The Visualization
The Code
If detecting which image contains the template isn't enough, and you want a visualization, you can try the code below:
import cv2
import numpy as np
def confidence(img, template):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
    conf = res.max()
    return np.where(res == conf), conf

files = ["img1.png", "img2.png", "img3.png"]
template = cv2.imread("template.png")
h, w, _ = template.shape
for name in files:
    img = cv2.imread(name)
    ([y], [x]), conf = confidence(img, template)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
    text = f'Confidence: {round(float(conf), 2)}'
    # putText takes the font face first and the font scale second
    cv2.putText(img, text, (x, y), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 0), 2)
    cv2.imshow(name, img)
cv2.imshow('Template', template)
cv2.waitKey(0)
The Output:
The Explanation:
Import the necessary libraries:
import cv2
import numpy as np
Define a function that takes a full image and a template image. Convert both images to grayscale so that the matching runs on a single channel:
def confidence(img, template):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
Use the cv2.matchTemplate method to detect the template in the image, and return the position of the point with the highest confidence together with that confidence:
    res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
    conf = res.max()
    return np.where(res == conf), conf
Define your list of images you want to determine which one contains the template, and read in the template image:
files = ["img1.png", "img2.png", "img3.png"]
template = cv2.imread("template.png")
Get the size of the template image to later use for drawing a rectangle on the images:
h, w, _ = template.shape
Loop through the filenames and read in each image. Using the confidence function we defined before, get the x, y position of the top-left corner of the detected template and the confidence of the detection:
for name in files:
    img = cv2.imread(name)
    ([y], [x]), conf = confidence(img, template)
Draw a rectangle on the image at that corner and put the text on the image. Finally, show the image:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
    text = f'Confidence: {round(float(conf), 2)}'
    cv2.putText(img, text, (x, y), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 0), 2)
    cv2.imshow(name, img)
Also, show the template for comparison:
cv2.imshow('Template', template)
cv2.waitKey(0)
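One caveat about np.where(res == conf): if the maximum score occurs at more than one position, the unpacking ([y], [x]) raises a ValueError. A sketch of an alternative that always returns a single position, using np.unravel_index on a small hand-made response map (the helper name best_location is hypothetical):

```python
import numpy as np


def best_location(res: np.ndarray):
    """Return ((x, y), score) of the first maximum in a response map."""
    y, x = np.unravel_index(np.argmax(res), res.shape)
    return (int(x), int(y)), float(res[y, x])


# Response map with the same maximum at two positions
res = np.array([[0.1, 0.9, 0.2],
                [0.3, 0.4, 0.9]], dtype=np.float32)
loc, score = best_location(res)
print(loc, score)
```

cv2.minMaxLoc(res) is another way to get a single maximum location together with its value.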
I'm not sure whether the given images resemble your actual task or data, but for this kind of image you could try simple template matching, cf. this OpenCV tutorial.
Basically, I just implemented the tutorial with some modifications:
import cv2
import matplotlib.pyplot as plt
# Read images
examples = [cv2.imread(img) for img in ['ex1.png', 'ex2.png', 'ex3.png']]
target = cv2.imread('target.png')
h, w = target.shape[:2]
# Iterate examples
for i, img in enumerate(examples):
    # Template matching
    # cf. https://docs.opencv.org/4.5.2/d4/dc6/tutorial_py_template_matching.html
    res = cv2.matchTemplate(img, target, cv2.TM_CCOEFF_NORMED)
    # Get location of maximum
    _, max_val, _, top_left = cv2.minMaxLoc(res)
    # Set up threshold for decision target found or not
    thr = 0.7
    if max_val > thr:
        # Show found target in example
        bottom_right = (top_left[0] + w, top_left[1] + h)
        cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)
    # Visualization
    plt.figure(i, figsize=(10, 5))
    plt.subplot(1, 2, 1), plt.imshow(img[..., [2, 1, 0]]), plt.title('Example')
    plt.subplot(1, 2, 2), plt.imshow(res, vmin=0, vmax=1, cmap='gray')
    plt.title('Matching result'), plt.colorbar(), plt.tight_layout()
plt.show()
These are the results:
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
PyCharm: 2021.1.1
Matplotlib: 3.4.1
OpenCV: 4.5.1
----------------------------------------
EDIT: To emphasize the information from the different colors, one might use the hue channel from the HSV color space for the template matching:
import cv2
import matplotlib.pyplot as plt
# Read images
examples = [
[cv2.imread(img) for img in ['ex1.png', 'ex2.png', 'ex3.png']],
[cv2.imread(img) for img in ['ex12.png', 'ex22.png', 'ex32.png']]
]
targets = [
cv2.imread('target.png'),
cv2.imread('target2.png')
]
# Iterate examples and targets
for i, (ex, target) in enumerate(zip(examples, targets)):
    for j, img in enumerate(ex):
        # Rotate last image from second data set
        if (i == 1) and (j == 2):
            img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
        h, w = target.shape[:2]
        # Get hue channel from HSV color space
        target_h = cv2.cvtColor(target, cv2.COLOR_BGR2HSV)[..., 0]
        img_h = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[..., 0]
        # Template matching
        # cf. https://docs.opencv.org/4.5.2/d4/dc6/tutorial_py_template_matching.html
        res = cv2.matchTemplate(img_h, target_h, cv2.TM_CCOEFF_NORMED)
        # Get location of maximum
        _, max_val, _, top_left = cv2.minMaxLoc(res)
        # Set up threshold for decision target found or not
        thr = 0.6
        if max_val > thr:
            # Show found target in example
            bottom_right = (top_left[0] + w, top_left[1] + h)
            cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)
        # Visualization
        plt.figure(i * 10 + j, figsize=(10, 5))
        plt.subplot(1, 2, 1), plt.imshow(img[..., [2, 1, 0]]), plt.title('Example')
        plt.subplot(1, 2, 2), plt.imshow(res, vmin=0, vmax=1, cmap='gray')
        plt.title('Matching result'), plt.colorbar(), plt.tight_layout()
        plt.savefig('{}.png'.format(i * 10 + j))
plt.show()
New results:
The Concept
We can use the cv2.matchTemplate method to detect where an image is located in another image, but your second set of images involves rotation. We also need to take the colors into account.
cv2.matchTemplate takes an image, a template (the other image) and a matching method, and returns a grayscale array in which the brightest point is the position with the highest confidence that the template is located there.
We can try the template at 4 different angles and use the one that results in the highest confidence. When we detect a possible position that matches the template, we use a function (that we define ourselves) to check whether the most frequent colors in the template are present in the detected patch of the image. If not, we ignore the patch, regardless of the confidence returned.
The Code
import cv2
import numpy as np
def frequent_colors(img, vals=3):
    colors, count = np.unique(np.vstack(img), return_counts=True, axis=0)
    sorted_by_freq = colors[np.argsort(count)]
    return sorted_by_freq[-vals:]

def get_templates(img):
    template = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    yield template  # original orientation
    # cv2.rotate codes 0, 1, 2: 90 clockwise, 180, 90 counterclockwise
    for i in range(3):
        yield cv2.rotate(template, i)

def detect(img, template, min_conf=0.45):
    colors = frequent_colors(template)
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    conf_max = min_conf
    shape = 0, 0, 0, 0
    for tmp in get_templates(template):
        h, w = tmp.shape
        res = cv2.matchTemplate(img_gray, tmp, cv2.TM_CCOEFF_NORMED)
        for y, x in zip(*np.where(res > conf_max)):
            conf = res[y, x]
            if conf > conf_max:
                seg = img[y:y + h, x:x + w]
                if all(np.any(np.all(seg == color, -1)) for color in colors):
                    conf_max = conf
                    shape = x, y, w, h
    return shape

files = ["img1_2.png", "img2_2.png", "img3_2.png"]
template = cv2.imread("template2.png")
for name in files:
    img = cv2.imread(name)
    x, y, w, h = detect(img, template)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow(name, img)
cv2.imshow('Template', template)
cv2.waitKey(0)
The Output
The Explanation
Import the necessary libraries:
import cv2
import numpy as np
Define a function, frequent_colors, that takes an image and returns the most frequent colors in the image. An optional parameter, vals, is how many colors to return; if vals is 3, then the 3 most frequent colors are returned:
def frequent_colors(img, vals=3):
    colors, count = np.unique(np.vstack(img), return_counts=True, axis=0)
    sorted_by_freq = colors[np.argsort(count)]
    return sorted_by_freq[-vals:]
Define a function, get_templates, that takes an image and yields the image (in grayscale) at 4 different angles: original, 90 clockwise, 180, and 90 counterclockwise. The original is yielded first, because the cv2.rotate codes 0, 1 and 2 only cover the three rotated versions:
def get_templates(img):
    template = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    yield template
    for i in range(3):
        yield cv2.rotate(template, i)
Define a function, detect, that takes an image and a template image, and returns the x, y, w, h of the bounding box of the detected template in the image. This function uses the frequent_colors and get_templates functions defined earlier. The min_conf parameter is the minimum confidence needed to count a detection as an actual detection:
def detect(img, template, min_conf=0.45):
Detect the three most frequent colors in the template and store them in a variable, colors. Also, define a grayscale version of the main image:
    colors = frequent_colors(template)
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Define the initial value for the greatest confidence detected, and initial values for the detected patch:
    conf_max = min_conf
    shape = 0, 0, 0, 0
Loop through the grayscale templates at the 4 angles, get the shape of the grayscale template (as rotation changes the shape), and use the cv2.matchTemplate method to get the grayscale array of detection scores for the image:
    for tmp in get_templates(template):
        h, w = tmp.shape
        res = cv2.matchTemplate(img_gray, tmp, cv2.TM_CCOEFF_NORMED)
Loop through the x, y coordinates where the confidence is greater than the current conf_max, and store the confidence in a variable, conf. If conf is still greater than the greatest confidence so far (conf_max may have been raised earlier in the same loop), check whether all three most frequent colors of the template are present in the patch of the image:
        for y, x in zip(*np.where(res > conf_max)):
            conf = res[y, x]
            if conf > conf_max:
                seg = img[y:y + h, x:x + w]
                if all(np.any(np.all(seg == color, -1)) for color in colors):
                    conf_max = conf
                    shape = x, y, w, h
At the end we can return the shape. If no template is detected in the image, the shape will be the initial values defined for it, 0, 0, 0, 0:
    return shape
Finally, loop through each image and use the detect function we defined to get the x, y, w, h of the bounding box. Use the cv2.rectangle method to draw the bounding box onto the images:
files = ["img1_2.png", "img2_2.png", "img3_2.png"]
template = cv2.imread("template2.png")
for name in files:
    img = cv2.imread(name)
    x, y, w, h = detect(img, template)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow(name, img)
cv2.imshow('Template', template)
cv2.waitKey(0)
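To see what the color check in this answer does, here is a small sketch of frequent_colors on a hand-made 2x5 BGR image. It uses reshape(-1, 3) instead of np.vstack (both flatten the image into a list of pixels); the pixel values are made up for illustration.

```python
import numpy as np


def frequent_colors(img, vals=3):
    # Flatten (H, W, 3) to (H*W, 3) and count identical BGR triples
    colors, count = np.unique(img.reshape(-1, 3), return_counts=True, axis=0)
    return colors[np.argsort(count)][-vals:]


red, green, blue, white = (0, 0, 255), (0, 255, 0), (255, 0, 0), (255, 255, 255)
pixels = [red] * 4 + [green] * 3 + [blue] * 2 + [white]
img = np.array(pixels, dtype=np.uint8).reshape(2, 5, 3)

# The two most frequent colors, least frequent of the two first
top2 = frequent_colors(img, vals=2)
print(top2.tolist())

# The presence test used in detect(): is this exact color anywhere in the patch?
has_red = bool(np.any(np.all(img == top2[-1], axis=-1)))
print(has_red)
```

Note that the presence test compares exact pixel values, so it suits flat-colored graphics like the ones in the question; with lossy compression or anti-aliasing the exact colors may not appear in the patch.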
First, the data appears in graphs, aren't you able to get the overlapping values from their numerical data?
And have you tried performing some edge detection for the change in color from white-blue and then from blue-red, fitting some circles to those edges and then checking if they overlap?
Since the input data is quite controlled (no organic photography or videos), perhaps you won't have to go the ML route.
Related
I am currently working on a small project, but I have an unresolved problem: I want to draw a shape around the desired objects. The first step is to determine the coordinates of the starting and ending points, but I don't know how to do that yet. I hope you can give me some suggestions; I'd be glad to have your help.
I want the result to look like this:
Here is an example that uses OpenCV to draw a rectangle over an object:
Given a template image cropped from the source image, you can draw the shape over each match in the source image:
# importing needed libraries
import cv2
import numpy as np
img_rgb = cv2.imread("source.png")  # opening the source image (placeholder file name)
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)  # converting to grayscale
template = cv2.imread("template.png", 0)  # opening the template image in grayscale (placeholder file name)
w, h = template.shape[::-1]  # width and height of the template image
# matching both images with OpenCV's template matching
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.9
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 1)
cv2.imshow('screen', img_rgb)
cv2.waitKey(0)
Source Image
Template Image
Result Image
This is what I do, I take some images and some of them contain information that I need. Being these the images:
How do I find that information? I use a template that contains two symbols (Euro and Dollar). When these symbols are found in any of the images, I can process the image and try to extract the data that I need.
How do I extract the data? I take the dimensions of the found match, and since I know that the information to extract is always located to the right of the match, I extend a box from the match to the right edge of my image, which ensures that the box contains the data to extract.
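The box described above is a plain array slice. A minimal sketch, assuming the match is given by its top-left corner (x, y) and the template height; crop_right_of_match is a hypothetical helper name, and the array stands in for a grayscale frame:

```python
import numpy as np


def crop_right_of_match(image: np.ndarray, top_left, template_h: int) -> np.ndarray:
    """Crop from the match's top-left corner to the right edge of the image."""
    x, y = top_left
    return image[y:y + template_h, x:]


image = np.arange(6 * 10).reshape(6, 10)   # stand-in for a grayscale frame
box = crop_right_of_match(image, top_left=(4, 2), template_h=3)
print(box.shape)
```

Rows are indexed first (y), then columns (x), which is the opposite order of the (x, y) points that OpenCV returns.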
Here is the code, I will divide it into several sections to explain the process a little better:
1) Initial Settings for the Code (Imports, a list of images which will be processed, a couple of functions to filter the image and finally the configuration set for reading data from Tesseract):
import cv2
import numpy as np
from matplotlib import pyplot as plt
import pytesseract
from pytesseract import Output
imagenes = ["monitor1.jpg", "monitor2.jpg", "monitor3.jpg"]
# get grayscale image
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Noise Removal (This is the filter I am using)
def remove_noise(image):
    return cv2.medianBlur(image, 5)
# The configuration we will use to read the images:
my_config = r"--psm 11 --oem 3"
2) Next, the template with which we will try to match is read and we take its dimensions (w=width and h=height).
We present the methods to find the matches and enter a loop reviewing image by image, trying to find a matching image:
# Reading the Template (Euro and Dollar):
# template_simbolo = cv2.imread('template_euro_dolar.jpg', 0)
template = cv2.imread('template_simbolos.jpg', 0)
w, h = template.shape[::-1]
# The methods we will use to find the matchs (This should be a list of 6 methods
# but working with a big list of images it takes an eternity, so we will only
# use one by now for the tests) :
methods = ['cv2.TM_CCOEFF']
# A loop where we are going to filter every image, find the matchs if there are
# and extract the data:
for img in imagenes:
    print("**************************")
    # Image to read:
    img_rgb = cv2.imread(img)
    # The image filtered in gray:
    gray = get_grayscale(img_rgb)
    img_gray = remove_noise(gray)
    # With res we will find the matches but we only take the accurate ones (80% or more)
    res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
    threshold = 0.8
    loc = np.where(res >= threshold)
    print(loc)
    print(len(loc[0]))
3) In this part of the code, first we enclose the match in a box, we filter the original image, and we look for the coordinates of the match, once this is done, we proceed to enclose in a box the section where the desired information is found.
    # If loc contains values it is because there is a match:
    if len(loc[0]) > 0:
        print("Match Found")
        # We enclose the found matches in a box and save the result:
        for pt in zip(*loc[::-1]):
            cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
        cv2.imwrite('res_monitor.png', img_rgb)
        # A loop of matching methods:
        for meth in methods:
            # We filter the image and change it to a gray color
            gray = get_grayscale(img_rgb)
            img_gray = remove_noise(gray)
            # We evaluate the method to use and according to it we have some
            # default coordinates
            method = eval(meth)
            min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
            print("min_val:", min_val)
            print("max_val:", max_val)
            print("min_loc:", min_loc)
            print("max_loc:", max_loc)
            # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
            if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
                top_left = min_loc
            else:
                top_left = max_loc
            # To know the bottom right coordinate, we respectively take the value
            # of top_left and add the width w and height h to know this coordinate.
            w, h = template.shape[::-1]
            bottom_right = (top_left[0] + w, top_left[1] + h)
            print("top_left:", top_left)
            print("bottom_right:", bottom_right)
            print("x:", top_left[0])
            print("y:", top_left[1])
            # Now, in our original image, which we previously filtered, we will
            # place a box or rectangle according to the dimensions established
            # before. (top_left and bottom_right)
            w, h = img_gray.shape[::-1]
            print("w:", w)
            print("h:", bottom_right[1])
            cv2.rectangle(img_gray, top_left, bottom_right, 255, 2)
            imagen = cv2.rectangle(img_gray, top_left, (w, bottom_right[1]), 255, 2)
            x = top_left[0]
            y = top_left[1]
            w = w
            h = bottom_right[1]
            # Finally we crop this section of the image where we established the area
            # to review and with pytesseract we look for the data that we can obtain
            # from said cropped image.
            crop_image = img_gray[y:h, x:w]
            cv2.imwrite("croped.jpg", crop_image)
            data = pytesseract.image_to_data(crop_image, config=my_config,
                                             output_type=Output.DICT)
            print(data, "\n")
4) Finally we create a dictionary to save the rate of the euro and the dollar, if everything goes well they will be saved correctly.
At the end of this process, a plot is shown to verify that the information was extracted correctly.
            # We create a dictionary to store the values of Euro and Dollar
            i = 0
            currencies = {}
            for value in data["text"]:
                print(value)
                try:
                    currency = value.replace(".", "").replace(",", ".")
                    currency = float(currency)
                    i = i + 1
                    if i == 1:
                        currencies["Euro"] = currency
                    elif i == 2 and currency < currencies["Euro"]:
                        currencies["Dolar"] = currency
                except ValueError:
                    # Not a number (for example an empty OCR token): skip it
                    pass
            # We pass the image to string to obtain the rates of the currencies
            text = pytesseract.image_to_string(crop_image, config=my_config)
            print(text)
            print(currencies)
            # We graph the results and confirm that the data extraction and the
            # demarcated area are correct.
            plt.subplot(121), plt.imshow(res, cmap='gray')
            plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
            plt.subplot(122), plt.imshow(img_gray, cmap='gray')
            plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
            plt.suptitle(meth)
            plt.show()
    else:
        print("DOES NOT MATCH")
The results:
With the code and all the logic presented above, it usually works very well on these images, but for some reason sometimes it doesn't read the image properly, it doesn't save the information in the desired way.
As can be seen, the area that the code takes is the desired one, but the currency dictionary does not record any information:
Which is strange because if I run the code on a longer list of images, that same image is recognized perfectly.
So the problem here is that sometimes it worked and sometimes it didn't, and I'm not quite sure why. Does anyone know what I can polish? What am I doing wrong? Any advice?
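One way to make such intermittent failures easier to diagnose is to separate the number parsing from the bookkeeping, so that every OCR token is either parsed or explicitly rejected. The following is a sketch under the assumption that the rates are written like "4.512,30" (dot as thousands separator, comma as decimal separator); parse_rate is a hypothetical helper and the tokens are made-up examples, not real Tesseract output.

```python
from typing import Optional


def parse_rate(token: str) -> Optional[float]:
    """Parse a number written like '1.234,56'; return None if it is not one."""
    cleaned = token.strip().replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


tokens = ["", "EUR", "4.512,30", "USD", "4.120,75", "n/a"]
rates = [r for r in map(parse_rate, tokens) if r is not None]
print(rates)
```

Printing the rejected tokens next to the parsed ones for a failing image would show whether Tesseract returned no digits at all or returned them in a form the parser does not accept.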
I have a 3D film image that contains star-like figures. I have a template of shape (21, 21, 1) that helps to find the initial point (rectangle) in the image. I have done the block matching part and am able to determine some matched coordinates, which are sometimes correct and sometimes incorrect because of the varying pixel intensity across the image. The image and the template are both grayscale. Below are my code, the results, and the expected results. Any help or ideas for solving this problem would be appreciated.
Code for template matching
import cv2
import numpy as np

image = cv2.imread('45_gray.jpg', 0)
template = cv2.imread('45tmpl.jpg', 0)
(tW, tH) = template.shape[::-1]
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
#print(result)
threshold = 0.8
(yCoords, xCoords) = np.where(result >= threshold)
clone = image.copy()
#print(yCoords, xCoords)
for (x, y) in zip(xCoords, yCoords):
    # draw the bounding box on the image
    cv2.rectangle(clone, (x, y), (x + tW, y + tH), (255, 247, 263), 3)
# show our output image *before* applying non-maxima suppression
#cv2.imshow("Before NMS", clone)
#cv2.waitKey(0)
cv2.imwrite('output match.jpg', clone)
I don't think template matching will work very well here. As you may notice, the input image has a gradient from bottom-left to top-right: the image appears to fade in that direction. So in the faded area the features are not pronounced enough for template matching to work reliably. I would recommend first converting the image to a binary image using adaptive thresholding:
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_adaptive = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, 5)
Once you have a binary image that looks consistent, that is, one without any gradient, you can either continue with template matching or, as I chose to do here, follow an alternative method: detect the straight lines and treat their intersections as the points of interest.
Detecting the lines could be done using cv2.HoughLinesP, but it has its own set of parameters which need to be tuned properly to get it working, so I simply counted the number of white pixels in each row and each column and kept the local maxima. That is, I chose the rows which had a higher white pixel count than their neighbours and did the same for the columns.
import cv2
import numpy as np
from scipy.signal import argrelextrema
img = cv2.imread("/home/anmol/Downloads/x5JQF.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_adaptive = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 5)
cv2.imwrite("./debug.png", img_adaptive)
row_white_pixel_count = np.array([np.count_nonzero(img_adaptive[i,:]) for i in range(img_adaptive.shape[0])])
col_white_pixel_count = np.array([np.count_nonzero(img_adaptive[:,i]) for i in range(img_adaptive.shape[1])])
row_maxima = argrelextrema(row_white_pixel_count, np.greater, order=50)[0]
col_maxima = argrelextrema(col_white_pixel_count, np.greater, order=50)[0]
all_intersection_coords = []
for row_idx in row_maxima:
    for col_idx in col_maxima:
        all_intersection_coords.append((col_idx, row_idx))
for coord in all_intersection_coords:
    img = cv2.circle(img, coord, 10, (0, 240, 0), 2)
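The row and column counting can be checked on a synthetic binary image without any input file. This sketch uses only NumPy: np.count_nonzero with an axis argument replaces the list comprehensions, and a simple strict comparison with the two direct neighbours stands in for argrelextrema (which, with order=50, compares against a much wider window). The helper name local_maxima and the line positions are assumptions for illustration.

```python
import numpy as np


def local_maxima(counts: np.ndarray) -> np.ndarray:
    """Indices whose value is strictly greater than both direct neighbours."""
    inner = (counts[1:-1] > counts[:-2]) & (counts[1:-1] > counts[2:])
    return np.where(inner)[0] + 1


# Synthetic binary image: two white rows and two white columns on black
binary = np.zeros((20, 20), dtype=np.uint8)
binary[[5, 15], :] = 255
binary[:, [3, 12]] = 255

row_counts = np.count_nonzero(binary, axis=1)  # white pixels per row
col_counts = np.count_nonzero(binary, axis=0)  # white pixels per column

rows, cols = local_maxima(row_counts), local_maxima(col_counts)
# Intersections as (x, y) points, the order cv2.circle expects
points = [(int(c), int(r)) for r in rows for c in cols]
print(points)
```

On real images the counts are noisy, which is why the answer uses a wide comparison window instead of direct neighbours.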
I work with logos and other simple graphics, in which there are no gradients or complex patterns. My task is to extract from the logo segments with letters and other elements.
To do this, I define the background color, and then I go through the picture in order to segment the images. Here is my code for more understanding:
import sys

import cv2 as cv
import numpy as np

MAXIMUM_COLOR_TRANSITION_DELTA = 100  # 0 - 765


def expand_segment_recursive(image, unexplored_foreground, segment, point, color):
    height, width, _ = image.shape
    # Unpack coordinates from point
    py, px = point
    # Create list of pixels to check
    neighbourhood_pixels = [(py, px + 1), (py, px - 1), (py + 1, px), (py - 1, px)]
    allowed_zone = unexplored_foreground & np.invert(segment)
    for y, x in neighbourhood_pixels:
        # Add pixel to segment if its coordinates within the image shape and its color differs from segment color no
        # more than MAXIMUM_COLOR_TRANSITION_DELTA
        if y in range(height) and x in range(width) and allowed_zone[y, x]:
            # The np.int alias was removed in NumPy 1.24; the builtin int is equivalent
            color_delta = np.sum(np.abs(image[y, x].astype(int) - color.astype(int)))
            print(color_delta)
            if color_delta <= MAXIMUM_COLOR_TRANSITION_DELTA:
                segment[y, x] = True
                segment = expand_segment_recursive(image, unexplored_foreground, segment, (y, x), color)
                allowed_zone = unexplored_foreground & np.invert(segment)
    return segment


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Pass image as the argument to use the tool")
        exit(-1)
    IMAGE_FILENAME = sys.argv[1]
    print(IMAGE_FILENAME)
    image = cv.imread(IMAGE_FILENAME)
    height, width, _ = image.shape
    # To filter the background I use median value of the image, as background in most cases takes > 50% of image area.
    background_color = np.median(image, axis=(0, 1))
    print("Background color: ", background_color)
    # Create foreground mask to find segments in it (TODO: Optimize this part)
    foreground = np.zeros(shape=(height, width, 1), dtype=bool)
    for y in range(height):
        for x in range(width):
            if not np.array_equal(image[y, x], background_color):
                foreground[y, x] = True
    unexplored_foreground = foreground
    for y in range(height):
        for x in range(width):
            if unexplored_foreground[y, x]:
                segment = np.zeros(foreground.shape, foreground.dtype)
                segment[y, x] = True
                segment = expand_segment_recursive(image, unexplored_foreground, segment, (y, x), image[y, x])
                cv.imshow("segment", segment.astype(np.uint8) * 255)
                while cv.waitKey(0) != 27:
                    continue
Here is the desired result:
At the end of the run I expect 13 separate extracted segments (for this particular image). Instead I get RecursionError: maximum recursion depth exceeded, which is not surprising, as expand_segment_recursive() can be called for every pixel of the image: even for a small 600x500 image that is up to 300K calls.
My question is: how can I get rid of the recursion in this case, and possibly optimize the algorithm with NumPy or OpenCV functions?
You can actually use a thresholded image (binary) and connectedComponents to do this job in a couple of steps. Also, you may use findContours or other methods.
Here is the code:
import numpy as np
import cv2
# load image as greyscale
img = cv2.imread("hp.png", 0)
# puts 0 to the white (background) and 255 in other places (greyscale value < 250)
_, thresholded = cv2.threshold(img, 250, 255, cv2.THRESH_BINARY_INV)
# gets the labels and the amount of labels, label 0 is the background
amount, labels = cv2.connectedComponents(thresholded)
# lets draw it for visualization purposes
# img is grayscale, so it only has shape[0] (height) and shape[1] (width)
preview = np.zeros((img.shape[0], img.shape[1], 3), dtype=np.uint8)
print(amount)  # should be 3 -> two components + background
# draw label 1 blue and label 2 green
preview[labels == 1] = (255, 0, 0)
preview[labels == 2] = (0, 255, 0)
cv2.imshow("frame", preview)
cv2.waitKey(0)
At the end, the thresholded image will look like this:
and the preview image (the one with the colored segments) will look like this:
With the labels mask you can always use NumPy functions to get things like the coordinates of the segments you want, or to color them (as I did with preview).
UPDATE
To get differently colored segments, you can try to create a border between the segments. Since they are plain colors and not gradients, you can run an edge detector such as Canny and then paint the detected edges black in the image:
import numpy as np
import cv2
img = cv2.imread("total.png", 0)
# background to black
img[img>=200] = 0
# get edges
canny = cv2.Canny(img, 60, 180)
# make them thicker
kernel = np.ones((3,3),np.uint8)
canny = cv2.morphologyEx(canny, cv2.MORPH_DILATE, kernel)
# apply edges as border in the image
img[canny==255] = 0
# same as before
amount, labels = cv2.connectedComponents(img)
preview = np.zeros((img.shape[0], img.shape[1], 3), dtype=np.uint8)
print (amount) #should be 14 -> 13 components + background
# color them randomly
for i in range(1, amount):
    preview[labels == i] = np.random.randint(0, 255, size=3, dtype=np.uint8)
cv2.imshow("frame", preview )
cv2.waitKey(0)
The result is:
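If you would rather keep your own region-growing rule (the color delta against the seed pixel), the recursion can be replaced by an explicit queue. This is a minimal sketch of that idea, not a drop-in replacement: grow_segment is a hypothetical name, it compares every candidate with the seed color as the question's code does, and it omits the separate foreground mask for brevity.

```python
from collections import deque

import numpy as np


def grow_segment(image: np.ndarray, seed, max_delta: int = 100) -> np.ndarray:
    """Breadth-first region growing from `seed` (y, x); returns a boolean mask."""
    height, width = image.shape[:2]
    seed_color = image[seed].astype(int)
    segment = np.zeros((height, width), dtype=bool)
    segment[seed] = True
    queue = deque([seed])
    while queue:
        py, px = queue.popleft()
        for y, x in ((py, px + 1), (py, px - 1), (py + 1, px), (py - 1, px)):
            if 0 <= y < height and 0 <= x < width and not segment[y, x]:
                # Sum of absolute channel differences to the seed color
                delta = np.abs(image[y, x].astype(int) - seed_color).sum()
                if delta <= max_delta:
                    segment[y, x] = True
                    queue.append((y, x))
    return segment


# Tiny synthetic image: white background with a 2x3 red block (BGR)
image = np.full((5, 6, 3), 255, dtype=np.uint8)
image[1:3, 2:5] = (0, 0, 255)
mask = grow_segment(image, seed=(1, 2))
print(int(mask.sum()))
```

Each pixel enters the queue at most once, so the work is bounded by the number of pixels and no recursion limit is involved. It is still a Python-level loop, so for large images the connectedComponents approach above is likely to be much faster.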
I want to perform an operation on my region of interest, that is, the central rectangular table which you can see in the image.
I am able to give the coordinates of my region of interest manually and crop that part:
img = cv2.imread('test12.jpg',0)
box = img[753:1915,460:1315]
But I want to crop that part automatically, without giving the pixels or coordinates manually. Can anyone please help me with this?
http://picpaste.com/test12_-_Copy-BXqHMAnd.jpg this is my original image.
http://picpaste.com/boxdemo-zHz57dBM.jpg this is my cropped image.
To do this, I entered the coordinates of the desired region and cropped.
But now I have to deal with many similar images where the coordinates of my region of interest vary slightly. I want a method that will detect the table (my region of interest) and crop it.
Currently I'm using this
img = cv2.imread('test12.jpg',0)
box = img[753:1915,460:1315]
to crop my image.
You could try using the openCV Template Matching to find the coordinates of your rectangular table within the image.
Template Matching
The following is a test program to find the coordinates for images I am trying to find.
from __future__ import print_function
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread(r'new_webcam_image.jpg', 0)
template = cv2.imread(r'table_template.jpg', 0)
if img is None or template is None:
    # cv2.imread returns None (it does not raise) when a file cannot be read
    print("Could not read the image or the template")
else:
    img2 = img.copy()
    w, h = template.shape[::-1]
    # All the 6 methods for comparison in a list
    methods = ['cv2.TM_CCOEFF', 'cv2.TM_CCOEFF_NORMED', 'cv2.TM_CCORR',
               'cv2.TM_CCORR_NORMED', 'cv2.TM_SQDIFF', 'cv2.TM_SQDIFF_NORMED']
    for meth in methods:
        img = img2.copy()
        method = eval(meth)
        # Apply template Matching
        res = cv2.matchTemplate(img, template, method)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
        print("Method:", meth)
        print("min_val: ", min_val)
        print("max_val: ", max_val)
        print("min_loc: ", min_loc)
        print("max_loc: ", max_loc)
        print(" ")
        # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
        if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
            top_left = min_loc
        else:
            top_left = max_loc
        bottom_right = (top_left[0] + w, top_left[1] + h)
        cv2.rectangle(img, top_left, bottom_right, 255, 2)
        plt.subplot(121), plt.imshow(res, cmap='gray')
        plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
        plt.subplot(122), plt.imshow(img, cmap='gray')
        plt.title('Detected Point'), plt.xticks([]), plt.yticks([])
        plt.suptitle(meth)
        plt.show()
    # Crop the region found by the last method in the list (rows are y, columns are x)
    box = img[top_left[1]:bottom_right[1], top_left[0]:bottom_right[0]]
    cv2.imshow("cropped", box)
    cv2.waitKey(0)
I don't have a full solution for you. The code shown was based on some code I was using to fix output from a scanner. The template solution to me sounds like a better approach, but the following should give you something else to work with.
import cv2
imageSrc = cv2.imread("test12.jpg")
# First cut the source down slightly
h = imageSrc.shape[0]
w = imageSrc.shape[1]
cropInitial = 50
imageSrc = imageSrc[100:50+(h-cropInitial*2), 50:50+(w-cropInitial*2)]
# Threshold the image and find edges (to reduce the amount of pixels to count)
ret, imageDest = cv2.threshold(imageSrc, 220, 255, cv2.THRESH_BINARY_INV)
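# Pass the aperture size by keyword: the fourth positional argument of cv2.Canny is the output array
imageDest = cv2.Canny(imageDest, 100, 100, apertureSize=3)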
# Create a list of remaining pixels
points = cv2.findNonZero(imageDest)
# Calculate a bounding rectangle for these points
hull = cv2.convexHull(points)
x,y,w,h = cv2.boundingRect(hull)
# Crop the original image to the bounding rectangle
imageResult = imageSrc[y:y+h,x:x+w]
cv2.imwrite("test12 cropped.jpg", imageResult)
The output does not crop as much as you need. Playing with the various threshold parameters should improve your results.
I suggest using imshow on imageSrc and imageDest at various points so you can see what is happening at each stage in the code. Hopefully this helps you progress.
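To make the last step of the code above concrete: the bounding rectangle is just the smallest upright box that contains every remaining non-zero pixel. A NumPy-only sketch of that computation on a synthetic edge image (bounding_rect is a hypothetical helper written for illustration; in the code above cv2.boundingRect does this job):

```python
import numpy as np


def bounding_rect(mask: np.ndarray):
    """Return (x, y, w, h) of the smallest upright box around non-zero pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # nothing left after thresholding
    x, y = int(xs.min()), int(ys.min())
    return x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1


# Synthetic edge image with four isolated edge pixels
edges = np.zeros((12, 16), dtype=np.uint8)
edges[3, 5] = edges[9, 5] = edges[3, 11] = edges[9, 11] = 255
x, y, w, h = bounding_rect(edges)
print(x, y, w, h)
```

This also shows why stray pixels matter: a single non-zero pixel near the image border stretches the box to include it, which is one reason the crop can come out larger than the table.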