OpenCV pattern matching not working - Python

I need to find all occurrences of one image within another image. For this I found what looked like a great solution:
import cv2
import numpy as np
img_rgb = cv2.imread('source.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('block.png', 0)
w, h = template.shape[::-1]
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF)
threshold = 0.8
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
# cv2.imwrite('res.png', img_rgb)
cv2.imshow('output', img_rgb)
cv2.waitKey(0)
Source data:
https://i.stack.imgur.com/cE5bM.png (source)
https://i.stack.imgur.com/BgzAA.png (template)
I tried to use this code but failed.
What I see now:
What I expected to get:
What's wrong?
I am using Python 3.5 and OpenCV 3.3.0.10.
PS: Interestingly, another solution works perfectly, but it finds only one match (the best one).

I am definitely no expert on OpenCV and its various template matching methods (though coincidentally I had started to play around with them).
However, a couple of things in your example stand out.
You use the cv2.TM_CCOEFF method, which gives results that are universally way above the 0.8 threshold, so everywhere in the image matches and you get one massive red rectangle blob.
If you want to use this method, try cv2.TM_CCOEFF_NORMED to normalise the results to below 1.
But my best 10-minute attempt was using:
method = cv2.TM_CCORR_NORMED
and setting
threshold = 0.512
which gave:
This is fairly unsatisfactory though because the threshold had to be 'tuned' fairly precisely to remove most of the mismatches. There is undoubtedly a better way to get a more reliable stand-out match.
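For reference, here is a minimal sketch of the question's code with the normalised method swapped in (untested; the 0.8 threshold may still need tuning for these particular images):
import cv2
import numpy as np
img_rgb = cv2.imread('source.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('block.png', 0)
w, h = template.shape[::-1]
# TM_CCOEFF_NORMED keeps scores in [-1, 1], so a 0.8 threshold is meaningful
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
loc = np.where(res >= 0.8)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
cv2.imshow('output', img_rgb)
cv2.waitKey(0)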


Rectangle detection inaccuracy using approxPolyDP() in openCV

As part of a program which contains a series of images to be processed, I first need to detect a green-coloured rectangle. I'm trying to write a program that doesn't use colour masking, since the lighting and glare on the images will make it difficult to find the appropriate HSV ranges.
(p.s. I already have two questions based on this program, but this one is unrelated to those. It's not a follow up, I want to address a separate issue.)
I used the standard rectangle detection technique, making use of the findContours() and approxPolyDP() methods. I added some constraints that got rid of unnecessary rectangles (like aspectRatio > 2.5, since my desired rectangle is clearly the "widest", and area > 1500, to discard random small rectangles).
import numpy as np
import cv2 as cv
img = cv.imread("t19.jpeg")
width=0
height=0
start_x=0
start_y=0
end_x=0
end_y=0
output = img.copy()
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
#threshold
th = cv.adaptiveThreshold(gray,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,cv.THRESH_BINARY,9,2)
cv.imshow("th",th)
#rectangle detection
contours, _ = cv.findContours(th, cv.RETR_TREE, cv.CHAIN_APPROX_NONE)
for contour in contours:
    approx = cv.approxPolyDP(contour, 0.01 * cv.arcLength(contour, True), True)
    cv.drawContours(img, [approx], 0, (0, 0, 0), 5)
    x = approx.ravel()[0]
    y = approx.ravel()[1]
    x1, y1, w, h = cv.boundingRect(approx)
    a = w * h
    if len(approx) == 4 and x > 15:
        aspectRatio = float(w) / h
        if aspectRatio >= 2.5 and a > 1500:
            print(x1, y1, w, h)
            width = w
            height = h
            start_x = x1
            start_y = y1
            end_x = start_x + width
            end_y = start_y + height
            cv.rectangle(output, (start_x, start_y), (end_x, end_y), (0, 0, 255), 3)
            cv.putText(output, "rectangle " + str(x1) + " , " + str(y1 - 5), (x1, y1 - 5), cv.FONT_HERSHEY_COMPLEX, 0.5, (0, 0, 0))
cv.imshow("op",output)
print("start",start_x,start_y)
print("end", end_x,end_y)
print("width",width)
print("height",height)
It is working flawlessly for all the images, except one:
I used adaptive thresholding to create the threshold, which was used by the findContours() method.
I tried displaying the threshold and the output, and it looks like this:
The thresholds for the other images also looked similar...so I can't pinpoint what exactly has gone wrong in the rectangle detection procedure.
Some tweaks I have tried:
Changing the last two parameters of the adaptive threshold method. I tried 11,1 and 9,1, and for both of them the rectangle in the threshold looked more prominent, but in this case the output detected no rectangles at all.
I have already disregarded Otsu thresholding, as it is not working for about 4 of my test images.
What exactly can I tweak in the rectangle detection procedure for it to detect this rectangle?
I also request, if possible, only slight modifications to this method and not an entirely new method. As I have mentioned, this method is working perfectly for all of my other test images, and if a newly suggested method works for this image but fails for the others, then I'll find myself back here asking why it failed.
Edit: The method that abss suggested worked for this image, but failed for:
image 4
image 1, far off
Other test images:
image 1, normal
image 2
image 3
image 9, part 1
image 9, part 2
You can easily do it by adding these lines of code after your threshold:
kernel = cv.getStructuringElement(cv.MORPH_RECT,(3,3))
th = cv.morphologyEx(th,cv.MORPH_OPEN,kernel)
This will remove noise within the image. You can see this link for more about morphologyEx: https://docs.opencv.org/master/d9/d61/tutorial_py_morphological_ops.html
The results I got are shown below:
I have made a few modifications to your code so that it works with all of your test images. There are a few false positives that you may have to filter based on the HSV colour range for green (since your target is always a shade of green); a sketch of such a filter follows the code below. Alternatively, you can take into account the fact that one of the child hierarchies of your ROI contour is going to be > 0.4 or so times the size of the outer contour. Here are the modifications:
Used DoG for thresholding useful contours
Changed the arcLength multiplier to 0.05 instead of 0.01, since square corners are not smooth
Used cv2.RETR_CCOMP to get a 2-level hierarchy
Moved approxPolyDP inside the area check to make it more efficient
Contour filter area changed to 600 to filter ROI for all test images
Removed a little bit of unnecessary code
Check with all the other test images that you may have and modify the parameters accordingly.
import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread("/path/to/your_image")
width=0
height=0
start_x=0
start_y=0
end_x=0
end_y=0
output = img.copy()
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gw, gs, gw1, gs1, gw2, gs2 = (3,1.0,7,3.0, 3, 2.0)
img_blur = cv2.GaussianBlur(gray, (gw, gw), gs)
g1 = cv2.GaussianBlur(img_blur, (gw1, gw1), gs1)
g2 = cv2.GaussianBlur(img_blur, (gw2, gw2), gs2)
ret, thg = cv2.threshold(g2-g1, 127, 255, cv2.THRESH_BINARY)
contours, hier = cv2.findContours(thg, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
img_cpy = img.copy()
width=0
height=0
start_x=0
start_y=0
end_x=0
end_y=0
for i in range(len(contours)):
    if hier[0][i][2] == -1:
        continue
    x, y, w, h = cv2.boundingRect(contours[i])
    a = w * h
    aspectRatio = float(w) / h
    if aspectRatio >= 2.5 and a > 600:
        approx = cv2.approxPolyDP(contours[i], 0.05 * cv2.arcLength(contours[i], True), True)
        if len(approx) == 4 and x > 15:
            width = w
            height = h
            start_x = x
            start_y = y
            end_x = start_x + width
            end_y = start_y + height
            cv2.rectangle(img_cpy, (start_x, start_y), (end_x, end_y), (0, 0, 255), 3)
            cv2.putText(img_cpy, "rectangle " + str(x) + " , " + str(y - 5), (x, y - 5), cv2.FONT_HERSHEY_COMPLEX, 0.5, (0, 0, 0))
plt.imshow(img_cpy)
print("start",start_x,start_y)
print("end", end_x,end_y)
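As mentioned above, a minimal sketch of an HSV-based filter for the green target (the HSV bounds here are guesses and would need tuning; it assumes the loop above actually found a rectangle):
roi = img[start_y:end_y, start_x:end_x]
hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
# Rough green range on OpenCV's 0-179 hue scale; adjust to the actual shade
lower_green = np.array([35, 40, 40])
upper_green = np.array([85, 255, 255])
green_mask = cv2.inRange(hsv, lower_green, upper_green)
# Keep the detection only if a reasonable fraction of the ROI is green
green_ratio = cv2.countNonZero(green_mask) / float(green_mask.size)
print("green ratio:", green_ratio)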

Group contours with the same y value

I have been following a tutorial about computer vision and doing a little project to read the time from a game. The game time is formatted h:m. So far I have got the h and m figured out using findContours, but I'm having trouble isolating the colon, as the character shape is not continuous. Because of this, when I try to matchTemplate, the code freaks out and starts to use the dot to match to all the other digits.
Are there ways to group the contours by X?
Here is the simplified code to get the reference digits; the code to get digits from the screen is basically the same.
refCnts = cv2.findContours(ref.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
refCnts = imutils.grab_contours(refCnts)
refCnts = contours.sort_contours(refCnts, method="left-to-right")[0]
digits = {}
# loop over the OCR-A reference contours
for (i, c) in enumerate(refCnts):
    # compute the bounding box for the digit, extract it, and resize
    # it to a fixed size
    (x, y, w, h) = cv2.boundingRect(c)
    roi = ref[y:y + h, x:x + w]
    roi = cv2.resize(roi, (10, 13))
    digits[i] = roi
I'm new to Python and OpenCV. Apologies in advance if this is a dumb question.
Here is the reference image I'm using:
Here is the input image I'm trying to read:
Do you have to use findContours? Because there are better-suited methods for such problems. For instance, you can use template matching as shown below:
These are input, template (cut out from your reference image), and output images:
import cv2
import numpy as np
# Read the input image & convert to grayscale
input_rgb = cv2.imread('input.png')
input_gray = cv2.cvtColor(input_rgb, cv2.COLOR_BGR2GRAY)
# Read the template (Using 0 to read image in grayscale mode)
template = cv2.imread('template.png', 0)
# Perform template matching - more on this here: https://docs.opencv.org/4.0.1/df/dfb/group__imgproc__object.html#ga3a7850640f1fe1f58fe91a2d7583695d
res = cv2.matchTemplate(input_gray,template,cv2.TM_CCOEFF_NORMED)
# Store the coordinates of matched area
# found the threshold value of .56 using trial & error using the input image - might be different in your game
lc = np.where( res >= 0.56)
# Draw a rectangle around the matched region
# I used the width and height of the template image but in practice you need to use a better method to accomplish this
w, h = template.shape[::-1]
for pt in zip(*lc[::-1]):
    cv2.rectangle(input_rgb, pt, (pt[0] + w, pt[1] + h), (0, 255, 255), 1)
# display output
cv2.imshow('Detected',input_rgb)
# cv2.imwrite('output.png', input_rgb)
# cv2.waitKey(0)
# cv2.destroyAllWindows()
You may also look into text detection & recognition using openCV.

Python Opencv: Filter Image for Text Detection

I have this set of images I want to de-noise in order to run OCR on:
I am trying to read the 1973 from the image.
I have tried
import cv2,numpy as np
img=cv2.imread('uxWbP.png',0)
img = cv2.resize(img, (0, 0), fx=2, fy=2)
copy_img=np.copy(img)
#adaptive threshold as the image has different lighting conditions in different areas
thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 2)
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#kill small contours
for i_cnt, cnt in enumerate(sorted(contours, key=lambda x: cv2.boundingRect(x)[0])):
    _area = cv2.contourArea(cnt)
    x, y, w, h = cv2.boundingRect(cnt)
    x_y_area = w * h
    if 10000 < x_y_area and x_y_area < 400000:
        pass
        # cv2.rectangle(copy_img, (x, y), (x + w, y + h), (255, 0, 255), 2)
        # cv2.putText(copy_img, str(int(x_y_area)) + ' , ' + str(w) + ' , ' + str(h), (x, y + 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
        # cv2.drawContours(copy_img, [cnt], 0, (0, 255, 0), 1)
    elif 10000 > x_y_area:
        # write over small contours
        cv2.drawContours(thresh, [cnt], -1, 255, -1)
cv2.imshow('img',copy_img)
cv2.imshow('thresh',thresh)
cv2.waitKey(0)
Which significantly improves the image to:
Any recommendations on how to filter this image sufficiently, either by improving the filtered image or by taking a completely different approach from the start, so that I could run OCR or some ML detection scripts on it? I'd like to split out the numbers for detection, but I'm open to other methods as well.
Another thing to try, either separately from the blurring or in combination with it, is the erosion/dilation game, as hinted at in the comment by @eldesgraciado, to whom I think a good part of the credit for these answers should go.
These two (erosion and dilation) can be applied one after the other, repeatedly. I think the trick is to change the kernel size. Anyway, I know I've used that to reduce noise in the past. Here's one example of dilation:
>>> import cv2
>>> import numpy as np
>>> im_0 = cv2.imread("FWM8b.png")
>>> k_size = 3
>>> kernel = np.ones((k_size, k_size), np.uint8)
>>> im_dilated = cv2.dilate(im_0, kernel, iterations=1)
>>> cv2.imshow("d", im_dilated)
>>> cv2.waitKey(0)
Make whatever kernel you want for erosion, and check out the effects.
>>> im_eroded = cv2.erode(im_0, kernel, iterations=1)
>>> cv2.imshow("erosion", im_eroded)
>>> cv2.waitKey(0)
Edit with possible improvements:
>>> im_blurred = cv2.GaussianBlur(im_dilated, (0, 0), 3)
>>> im_better = cv2.addWeighted(im_0, 0.5, im_blurred, 1.2, 0)
# Getting closer.
^ dilated, blurred, and combined (added) with original, 1st way
# Even better, I think.
>>> im_better2 = cv2.addWeighted(im_0, 0.9, im_blurred, 1.7, 0)
^ dilated, blurred, and combined (added) with original, 2nd way
You could do artifact removal, but be careful not to get rid of the stalk of the 7. If you can keep the 7 together, you can do connected-component analysis and keep the biggest connected components.
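A small sketch of that connected-component idea (assuming a binary image named thresh with the digits as white foreground; the area cut-off is a guess):
>>> num, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)
>>> keep = np.zeros_like(thresh)
>>> for i in range(1, num):  # label 0 is the background
...     if stats[i, cv2.CC_STAT_AREA] > 50:  # guessed minimum area; tune so the 7's stalk survives
...         keep[labels == i] = 255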
You could sum the values of the pixels in each column and each row, which would probably lead to something like this (very approximate - almost time for work). Note that I was much more careful with the green curve - the sums of columns - but the consistency of scaling is probably off.
Note that this is more a sum of (255 - pixel_value). That could find you rectangles where your to-be-found glyphs (digits) should be. You could do a 2-D map of column_pixel_sum + row_pixel_sum, or just do some approximation, as I have done below.
Also feel free to rotate the image (or take pixel sums at different angles), and combine your info for each rotation.
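A minimal sketch of that row/column projection idea (the file name follows the earlier snippets; the 0.5 cut-off is a guess):
>>> import cv2
>>> import numpy as np
>>> im_gray = cv2.imread("FWM8b.png", 0)
>>> inv = 255.0 - im_gray  # dark glyphs now contribute large values
>>> col_sums = inv.sum(axis=0)  # one value per column
>>> row_sums = inv.sum(axis=1)  # one value per row
>>> # Columns/rows whose sum clears a fraction of the peak likely contain glyphs
>>> cols = np.where(col_sums > 0.5 * col_sums.max())[0]
>>> rows = np.where(row_sums > 0.5 * row_sums.max())[0]
>>> print("candidate x-range:", cols.min(), cols.max())
>>> print("candidate y-range:", rows.min(), rows.max())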
Lots of other things to try ... the suggestion by @eldesgraciado of a noise model is especially intriguing.
Another thing you could try out is to create a "noise model" and subtract it from the original image. First, take the image and apply Gaussian Blur with very low parameters, just barely blurring it, next subtract this mask from the image. From here, the steps are experimental: The difference should be again blurred and thresholded. Save this image. You run this pre-processing with various parameters and saving each time the final binary image, then, average the masks obtained so far. The persistent blobs should be the ones you are looking for... like some sort of spatial bandstop, I guess...
Keep experimenting.
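A rough sketch of that noise-model idea, interpreting the description above (all parameters are guesses and would need experimentation):
>>> import cv2
>>> import numpy as np
>>> img = cv2.imread("FWM8b.png", 0)
>>> masks = []
>>> for sigma in (0.5, 1.0, 1.5):
...     barely_blurred = cv2.GaussianBlur(img, (0, 0), sigma)
...     diff = cv2.absdiff(img, barely_blurred)  # subtract the barely-blurred "noise model"
...     diff = cv2.GaussianBlur(diff, (0, 0), 2)  # blur the difference again
...     _, binary = cv2.threshold(diff, 10, 255, cv2.THRESH_BINARY)
...     masks.append(binary.astype(np.float32))
>>> avg_mask = (sum(masks) / len(masks)).astype(np.uint8)  # persistent blobs survive averaging
>>> cv2.imwrite("avg_mask.png", avg_mask)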
Unsharp mask (my other answer) on this result image. More noise gone, but hurts the 7.
My first thought is to put on a Gaussian blur for a sort of "unsharp filter". (I think my second idea is better; it combines this blur-and-add with the erosion/dilation game. I posted it as a separate answer, because I think it is a different-enough strategy to merit that.) @eldesgraciado noted frequency stuff, which is basically what we're doing here. I'll put on some code and explanation. (Here is one answer to an SO post that has a lot about sharpening; the answer linked is a more variable unsharp mask written in Python. Do take the time to look at other answers, including this one, one of many simple implementations that look just like mine, though some are written in different programming languages.) You'll need to mess with the parameters. It's possible this won't work, but it's the first thing I thought of.
>>> import cv2
>>> im_0 = cv2.imread("FWM8b.png")
>>> cv2.imshow("FWM8b.png", im_0)
>>> cv2.waitKey(0)
## Press any key.
>>> ## Here's where we get to frequency. We'll use a Gaussian Blur.
## We want to take out the "frequency" of changes from white to black
## and back to white that are less than the thickness of the "1973"
>>> k_size = 0 ## This is the kernel size - the "width frequency",
## if you will. Using zero gives a width based on sigmas in
## the Gaussian function.
## You'll want to experiment with this and the other
## parameters, perhaps trying to run OCR over the image
## after each combination of parameters.
## Hint, avoid even numbers, and think of it as a radius
>>> gs_border = 3
>>> im_blurred = cv2.GaussianBlur(im_0, (k_size, k_size), gs_border)
>>> cv2.imshow("gauss", im_blurred)
>>> cv2.waitKey(0)
Okay, my parameters probably didn't blur this enough. The parts of the words that you want to get rid of aren't really blurry. I doubt you'll even see much of a difference from the original, but hopefully you'll get the idea.
We're going to multiply the original image by a value, multiply the blurry image by a value, and subtract value*blurry from value*orig. Code will be clearer, I hope.
>>> orig_img_multiplier = 1.5
>>> blur_subtraction_factor = -0.5
>>> gamma = 0
>>> im_better = cv2.addWeighted(im_0, orig_img_multiplier, im_blurred, blur_subtraction_factor, gamma)
>>> cv2.imshow("First shot at fixing", im_better)
Yeah, not too much different. Mess around with the parameters, try to do the blur before you do your adaptive threshold, and try some other methods. I can't guarantee it will work, but hopefully it will get you started going somewhere.
Edit
This is a great question. Responding to the tongue-in-cheek criticism of @eldesgraciado:
Ah, naughty, naughty. Trying to break them CAPTCHA codes, huh? They are difficult to break for a reason. The text segmentation, as you see, is non-trivial. In your particular image, there’s a lot of high-frequency noise, you could try some frequency filtering first and see what result you get.
I submit the following from the Wikipedia article on reCAPTCHA (archived).
reCAPTCHA has completely digitized the archives of The New York Times and books from Google Books, as of 2011. The archive can be searched from the New York Times Article Archive. Through mass collaboration, reCAPTCHA was helping to digitize books that are too illegible to be scanned by computers, as well as translate books to different languages, as of 2015.
Also look at this article (archived).
I don't think this CAPTCHA is part of Massive-scale Online Collaboration, though.
Edit: Some other type of sharpening will be needed. I just realized that I'm applying 1.5 and -0.5 multipliers to pixels which usually have values very close to 0 or 255, meaning I'm probably just recovering the original image after the sharpening. I welcome any feedback on this.
Also, from comments with @eldesgraciado:
Someone probably knows a better sharpening algorithm than the one I used. Blur it enough, and maybe threshold on average values over an n-by-n grid (pixel density); a rough sketch of that idea follows. I don't know too much about the whole adaptive-thresholding-then-contours thing. Maybe that could be re-done after the blurring...
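A sketch of that n-by-n grid idea (block size and cut-off are guesses):
>>> import cv2
>>> im_gray = cv2.imread("FWM8b.png", 0)
>>> n = 8  # guessed block size
>>> h, w = im_gray.shape
>>> # INTER_AREA averages each n-by-n block, giving a coarse "pixel density" map
>>> small = cv2.resize(im_gray, (w // n, h // n), interpolation=cv2.INTER_AREA)
>>> _, density_mask = cv2.threshold(small, 128, 255, cv2.THRESH_BINARY)
>>> # Scale back up if you want to apply it to the original image
>>> density_mask = cv2.resize(density_mask, (w, h), interpolation=cv2.INTER_NEAREST)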
Just to give you some ideas ...
Here's a blur with k_size = 5
Here's a blur with k_size = 25
Note those are the BLURS, not the fixes. You'll likely need to mess with the orig_img_multiplier and blur_subtraction_factor based on the frequency (I can't remember exactly how, so I can't really tell you how it's done.) Don't hesitate to fiddle with gs_border, gamma, and anything else you might find in the documentation for the methods I've shown.
Good luck with it.
By the way, the frequency is more something based on the 2-D Fast Fourier Transform, and possibly based on kernel details. I've just messed around with this stuff myself (definitely not an expert, and definitely happy if someone wants to give more details), but I hope I've given a basic idea. Adding some jitter noise (up and down or side to side blurring, rather than radius-based) might be helpful as well.

Determine if a specific image is contained within another, with a simple True/False

I would like to know if a big image contains a small image. The small image can be semi-transparent (similar to a watermark, so it's not a fully filled photo). I've tried following different SO answers on this topic, but they all match the EXACT photo, whereas what I am looking for is whether the photo exists with about 80% accuracy, as the photo will be a lossy, rendered version of the original one.
This is a procedure of how the images I am searching in will be generated:
Use any photo, put a semi-transparent "watermark" on it within Photoshop and save it. Then I want to check whether the "watermark" exists within the created photo with a certain percentage of accuracy (80% is good enough).
I've tried using the original template matching example provided on their docs page but I'm getting barely any match at all.
This is the code I'm using:
import cv2
import numpy as np
img_rgb = cv2.imread('photo2.jpeg')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('small-image.png', 0)
w, h = template.shape[::-1]
res = cv2.matchTemplate(img_gray,template,cv2.TM_CCOEFF_NORMED)
threshold = 0.7
loc = np.where( res >= threshold)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
cv2.imshow('output', img_rgb)
cv2.waitKey(0)
Here are the photos I've been using for the test, as this is something similar I am trying to make a match on.
small-image.png
photo2.jpeg
I am assuming the whole watermark will have the same RGB values and the text will have slightly different RGB values; otherwise this technique will not work. Based on this we can obtain the RGB values of a pixel of the small image and treat them as a mask by using cv2.inRange to find those pixel values in the large image. Similarly, a mask is also created for the small image using those pixel values.
small = cv2.imread('small_find.png')
large = cv2.imread('large_find.jpg')
pixel = np.reshape(small[3,3], (1,3))
lower =[pixel[0,0]-10,pixel[0,1]-10,pixel[0,2]-10]
lower = np.array(lower, dtype = 'uint8')
upper =[pixel[0,0]+10,pixel[0,1]+10,pixel[0,2]+10]
upper = np.array(upper, dtype = 'uint8')
mask = cv2.inRange(large,lower, upper)
mask2 = cv2.inRange(small, lower, upper)
I had to take a buffer value of 20 because the values were not matching exactly in the large image; otherwise only 1 would be enough for either upper or lower. Then we find contours in mask and find the values of its bounding rectangle, which is cut out and reshaped to the size of mask2.
im, contours, hierarchy = cv2.findContours(mask,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
#cv2.drawContours(large, contours, -1, (0,0,255), 1)
cnt = max(contours, key = cv2.contourArea)
x,y,w,h = cv2.boundingRect(cnt)
wanted_part = mask[y:y+h, x:x+w]
wanted_part = cv2.resize(wanted_part, (mask2.shape[1], mask2.shape[0]), interpolation = cv2.INTER_LINEAR)
The two masks side by side (I inverted them, otherwise they were not visible).
For comparing them you can use any parameter and check whether it satisfies your condition or not. I used mean squared error and got an error of only 6.20, which is very low.
def MSE(img1, img2):
    # cast to float so the uint8 subtraction doesn't wrap around, then square
    squared_diff = (img1.astype(np.float32) - img2.astype(np.float32)) ** 2
    summed = np.sum(squared_diff)
    num_pix = img1.shape[0] * img1.shape[1]  # img1 and img2 should have the same shape
    err = summed / num_pix
    return err
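A hypothetical usage with the two masks computed above (wanted_part and mask2 have the same shape after the resize):
err = MSE(wanted_part, mask2)
print("MSE between the masks:", err)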

Python openCV matchTemplate on grayscale image with masking

I have a project where I want to locate a bunch of arrows in images that look like so: ibb.co/dSCAYQ
with the following template: ibb.co/jpRUtQ
I'm using cv2's template matching feature in Python. My algorithm is to rotate the template 360 degrees and match for each rotation. I get the following result: ibb.co/kDFB7k
As you can see, it works well except for the 2 arrows that are really close, such that another arrow is in the black region of the template.
I am trying to use a mask, but it seems that cv2 is not applying my masks at all, i.e. no matter what values the mask array has, the matching is the same. I have been trying this for two days, but cv2's limited documentation is not helping.
Here is my code:
import numpy as np
import cv2
import os
from scipy import misc, ndimage
STRIPPED_DIR = #Image dir
TMPL_DIR = #Template dir
MATCH_THRESH = 0.9
MATCH_RES = 1 #specifies degree-interval at which to match
def make_templates():
    base = misc.imread(os.path.join(TMPL_DIR, 'base.jpg'))  # The templ that I rotate to make 360 templates
    for deg in range(360):
        print('making template: ' + str(deg))
        tmpl = ndimage.rotate(base, deg)
        misc.imsave(os.path.join(TMPL_DIR, 'tmp' + str(deg) + '.jpg'), tmpl)

def make_masks():
    for deg in range(360):
        tmpl = cv2.imread(os.path.join(TMPL_DIR, 'tmp' + str(deg) + '.jpg'), 0)
        ret2, mask = cv2.threshold(tmpl, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        cv2.imwrite(os.path.join(TMPL_DIR, 'mask' + str(deg) + '.jpg'), mask)

def match(img_name):
    img_rgb = cv2.imread(os.path.join(STRIPPED_DIR, img_name))
    img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
    for deg in range(0, 360, MATCH_RES):
        tmpl = cv2.imread(os.path.join(TMPL_DIR, 'tmp' + str(deg) + '.jpg'), 0)
        mask = cv2.imread(os.path.join(TMPL_DIR, 'mask' + str(deg) + '.jpg'), 0)
        w, h = tmpl.shape[::-1]
        res = cv2.matchTemplate(img_gray, tmpl, cv2.TM_CCORR_NORMED, mask=mask)
        loc = np.where(res >= MATCH_THRESH)
        for pt in zip(*loc[::-1]):
            cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    cv2.imwrite('res.png', img_rgb)
Some things that I think could be wrong, but I'm not sure how to fix:
The number of channels the mask/tmpl/img should have. I have tried an example with colored 4-channel PNGs (a Stack Overflow example), but I'm not sure how it translates to grayscale or 3-channel JPEGs.
The values of the mask array, e.g. should masked-out pixels be 1 or 255?
Any help is greatly appreciated.
UPDATE
I fixed a trivial error in my code; mask=mask must be used in the argument for matchTemplate(). This, combined with using mask values of 255, made the difference. However, now I get a ton of false positives, like so:
http://ibb.co/esfTnk Note that the false positives are more strongly correlated than the true positives.
Any pointers on how to fix my masks to resolve this? Right now I am simply using a black-and-white conversion of my templates.
You've already figured out the first questions, but I'll expand a bit on them:
For a binary mask, it should be of type uint8, where the values are simply zero or non-zero. The locations with zero are ignored, and locations are included in the mask if they are non-zero. You can pass a float32 instead as a mask, in which case it lets you weight the pixels: a value of 0 is ignored, 1 is included, and 0.5 is included but only given half as much weight as another pixel. Note that a mask is only supported for TM_SQDIFF and TM_CCORR_NORMED, but that's fine since you're using the latter. Masks for matchTemplate are single channel only. And as you found out, mask is not a positional argument, so it must be passed with the keyword, mask=your_mask. All of this is pretty explicit in this page of the OpenCV docs.
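A small sketch of both mask forms (file names here are placeholders, not from the question):
import cv2
import numpy as np
img = cv2.imread('scene.png', 0)
tmpl = cv2.imread('template.png', 0)
mask = cv2.imread('mask.png', 0)  # uint8: non-zero pixels are included in the comparison
res_binary = cv2.matchTemplate(img, tmpl, cv2.TM_CCORR_NORMED, mask=mask)
# float32 mask: values between 0 and 1 weight each template pixel
soft_mask = mask.astype(np.float32) / 255.0
res_weighted = cv2.matchTemplate(img, tmpl, cv2.TM_CCORR_NORMED, mask=soft_mask)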
Now to the new issue:
It's related to the method you're using and the fact that you're using jpgs. Have a look at the formulas for the normed methods. Where the image is completely zero, you're going to get faulty results because you'll be dividing by zero. But that's not the exact problem, because that returns nan, and np.nan > value always returns false, so you'll never be drawing a square from nan values.
Instead the problem is right at the edge cases where you get a hint of a non-zero value; and because you're using jpg images, not all black values are exactly 0; in fact, many aren't. Note from the formula that you're dividing by the mean values, and the mean values will be extremely small when you have values like 1, 2, 5, etc. inside your image window, so it will blow up the correlation value. You should use TM_SQDIFF instead (because it's the only other method which allows a mask). Additionally, because you're using jpg, most of your masks are worthless, since any non-zero value (even 1) counts as an inclusion. You should use pngs for the masks. As long as the templates have a proper mask, it shouldn't matter whether you use jpg or png for the templates.
With TM_SQDIFF, instead of looking for the maximum values, you're looking for the minimum: you want the smallest difference between the template and image patch. You know that the difference should be really small; exactly 0 for a pixel-perfect match, which you probably won't get. You can play around with thresholding a little bit. Note that you're always going to get pretty close values for every rotation, because of the nature of your template: the little arrow bar hardly adds that many positive values, and it's not necessarily guaranteed that the one-degree discretization is exactly right (unless you made the image that way). But even an arrow facing the totally wrong direction is still going to be extremely close, since there's a lot of overlap; and an arrow facing close to the right direction will be really close to values with the exactly right direction.
Preview what the result of the square difference is while you're running the code:
res = cv2.matchTemplate(img_gray, tmpl, cv2.TM_SQDIFF, mask=mask)
cv2.imshow("result", res.astype(np.uint8))
if cv2.waitKey(0) & 0xFF == ord('q'):
    break
You can see that basically every orientation of template matches closely.
Anyways, it seems a threshold of 8 nailed it:
The only thing I modified in your code was changing to pngs for all images, switching to TM_SQDIFF, making sure loc looks for values less than the threshold instead of greater than, and using a MATCH_THRESH of 8. At least I think that's all I changed. Have a look just in case:
import numpy as np
import cv2
import os
from scipy import misc, ndimage
STRIPPED_DIR = ...
TMPL_DIR = ...
MATCH_THRESH = 8
MATCH_RES = 1 #specifies degree-interval at which to match
def make_templates():
    base = misc.imread(os.path.join(TMPL_DIR, 'base.jpg'))  # The templ that I rotate to make 360 templates
    for deg in range(360):
        print('making template: ' + str(deg))
        tmpl = ndimage.rotate(base, deg)
        misc.imsave(os.path.join(TMPL_DIR, 'tmp' + str(deg) + '.png'), tmpl)

def make_masks():
    for deg in range(360):
        tmpl = cv2.imread(os.path.join(TMPL_DIR, 'tmp' + str(deg) + '.png'), 0)
        ret2, mask = cv2.threshold(tmpl, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        cv2.imwrite(os.path.join(TMPL_DIR, 'mask' + str(deg) + '.png'), mask)

def match(img_name):
    img_rgb = cv2.imread(os.path.join(STRIPPED_DIR, img_name))
    img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
    for deg in range(0, 360, MATCH_RES):
        tmpl = cv2.imread(os.path.join(TMPL_DIR, 'tmp' + str(deg) + '.png'), 0)
        mask = cv2.imread(os.path.join(TMPL_DIR, 'mask' + str(deg) + '.png'), 0)
        w, h = tmpl.shape[::-1]
        res = cv2.matchTemplate(img_gray, tmpl, cv2.TM_SQDIFF, mask=mask)
        loc = np.where(res < MATCH_THRESH)
        for pt in zip(*loc[::-1]):
            cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)
    cv2.imwrite('res.png', img_rgb)
