I need to determine the locations of yogurts in a supermarket photo. The source photo looks like this:
With this template:
I am using SIFT to extract key points of the template:
import glob
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
from skimage.measure import ransac
from skimage.transform import AffineTransform

img1 = cv.imread('train.jpg')  # query image

# Initiate SIFT detector
sift = cv.SIFT_create()

# find the keypoints and descriptors of the query image with SIFT
kp1, des1 = sift.detectAndCompute(img1, None)

path = glob.glob("template.jpg")
cv_img = []
l = 0
for img in path:
    img2 = cv.imread(img)  # train image (template)
    # find the keypoints and descriptors with SIFT
    kp2, des2 = sift.detectAndCompute(img2, None)
    # FLANN parameters
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)  # or pass an empty dictionary
    flann = cv.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)
    # keep the template that produces the most matches
    if l < len(matches):
        l = len(matches)
        image = img2
        match = matches

h_query, w_query, _ = img2.shape

# Need to draw only good matches, so create a mask
matchesMask = [[0, 0] for i in range(len(match))]
good_matches = []
good_matches_indices = {}
# ratio test as per Lowe's paper
for i, (m, n) in enumerate(match):
    if m.distance < 0.7 * n.distance:
        matchesMask[i] = [1, 0]
        good_matches.append(m)
        good_matches_indices[len(good_matches) - 1] = i

bboxes = []
src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 2)

model, inliers = initialize_ransac(src_pts, dst_pts)  # custom helper, not shown here
n_inliers = np.sum(inliers)
matched_indices = [good_matches_indices[idx] for idx in inliers.nonzero()[0]]
print(len(matched_indices))

model, inliers = ransac(
    (src_pts, dst_pts),
    AffineTransform, min_samples=4,
    residual_threshold=4, max_trials=20000
)
n_inliers = np.sum(inliers)
print(n_inliers)

matched_indices = [good_matches_indices[idx] for idx in inliers.nonzero()[0]]
print(matched_indices)

q_coordinates = np.array([(0, 0), (h_query, w_query)])
coords = model.inverse(q_coordinates)
print(coords)

M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 2)

draw_params = dict(matchColor=(0, 255, 0),
                   singlePointColor=(255, 0, 0),
                   matchesMask=matchesMask,
                   flags=cv.DrawMatchesFlags_DEFAULT)
img3 = cv.drawMatchesKnn(img1, kp1, image, kp2, match, None, **draw_params)
plt.imshow(img3), plt.show()
The result of SIFT matching looks like this:
The question is: what is the best way to cluster the matched points into rectangles, one per yogurt? I tried RANSAC, but that method doesn't work in this case.
I am proposing an approach based on what is discussed in this paper. I have modified it a bit because the use case is not exactly the same, but the authors do use SIFT feature matching to locate multiple objects in video frames. They use PCA to reduce running time, but that may not be required for still images.
Sorry, I could not write code for this as it would take a lot of time, but I believe this should work to locate all occurrences of the template object.
The modified approach is as follows:
Divide the template image into regions: left, middle, right along the horizontal axis and top, bottom along the vertical axis.
Now when you match features between the template and source image, keypoints from these regions will be matched at multiple locations in the source image. You can use these keypoints to identify which region of the template is present at which location(s) in the source image. If regions overlap, i.e. keypoints from different template regions are matched to nearby keypoints in the source image, that indicates a wrong match.
Mark each neighborhood of matched keypoints in the source image as left, center, right, top or bottom, depending on which template region the majority of its matches come from.
Starting from each left region in the source image, move to the right; if a central region is found followed by a right region, the area of the source image between the regions marked left and right can be marked as the location of one template object.
Overlapping objects could produce a left region followed by another left region when moving rightwards; the area between the two left regions can then be marked as one template object.
To further refine the locations, each area of the source image marked as one template object can be cropped and re-matched with the template image. A rough sketch of the region-labeling step is shown below.
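A minimal sketch of that labeling step, assuming SIFT keypoints and ratio-test matches are already computed with knnMatch(des_template, des_source) so that queryIdx indexes the template; the thirds/halves split and the function names are illustrative, not taken from the paper:
def label_template_region(pt, w, h):
    """Assign a template keypoint to a (left/middle/right, top/bottom) region."""
    x, y = pt
    horiz = 'left' if x < w / 3 else ('middle' if x < 2 * w / 3 else 'right')
    vert = 'top' if y < h / 2 else 'bottom'
    return horiz, vert

def label_source_matches(good_matches, kp_template, kp_source, w, h):
    """For every good match, record which template region lands at which source location."""
    labelled = []
    for m in good_matches:
        region = label_template_region(kp_template[m.queryIdx].pt, w, h)
        labelled.append((region, kp_source[m.trainIdx].pt))
    return labelled
Each neighborhood of source points can then be labeled by the majority region among its matches, and the left-to-right scan described above applied to those labels.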
Try working spatially: for each keypoint in img2, take some bounding box around it and consider only the points inside it for your RANSAC homography, then check for the best fit.
You can also work with overlapping windows and later discard similar resulting homographies; a minimal sketch of this is shown below.
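A rough sketch of that windowed idea, assuming src_pts and dst_pts are float32 arrays of shape (N, 2) holding the ratio-test match coordinates in the template and scene respectively; the window size, step and RANSAC parameters are placeholders to tune:
import cv2
import numpy as np

def windowed_homographies(src_pts, dst_pts, win=200, step=100, min_pts=8):
    """Run RANSAC separately inside overlapping windows of the scene image."""
    results = []
    xs, ys = dst_pts[:, 0], dst_pts[:, 1]
    for x0 in np.arange(xs.min(), xs.max(), step):
        for y0 in np.arange(ys.min(), ys.max(), step):
            in_win = (xs >= x0) & (xs < x0 + win) & (ys >= y0) & (ys < y0 + win)
            if in_win.sum() < min_pts:
                continue
            H, mask = cv2.findHomography(src_pts[in_win], dst_pts[in_win], cv2.RANSAC, 4.0)
            if H is not None and mask.sum() >= min_pts:
                results.append((H, int(mask.sum())))
    # similar/overlapping homographies can be merged or deduplicated afterwards
    return results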
Here is what you can do:
Base image = the whole picture of the shelf
Template image = a single product image
Get SIFT keypoints and descriptors from both images (base and template).
Do feature matching.
Get all the matched points in the base image (refer to the figure).
Create clusters based on the size of the template image (here the threshold is 50 px).
Get the bounding box of each cluster.
Crop each cluster's bounding box and check matches with the template image.
Accept all clusters which have at least a minimum percentage of matches (here a minimum of 10% of keypoints).
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

def plot_pts(img, pts):
    img_plot = img.copy()
    for i in range(len(pts)):
        img_plot = cv2.circle(img_plot, (int(pts[i][0]), int(pts[i][1])), radius=7, color=(255, 0, 0), thickness=-1)
    plt.figure(figsize=(20, 10))
    plt.imshow(img_plot)

def plot_bbox(img, bbox_list):
    img_plot = img.copy()
    for i in range(len(bbox_list)):
        start_pt = bbox_list[i][0]
        end_pt = bbox_list[i][2]
        img_plot = cv2.rectangle(img_plot, pt1=start_pt, pt2=end_pt, color=(255, 0, 0), thickness=2)
    plt.figure(figsize=(20, 10))
    plt.imshow(img_plot)

def get_distance(pt1, pt2):
    x1, y1 = pt1
    x2, y2 = pt2
    return np.sqrt(np.square(x1 - x2) + np.square(y1 - y2))

def check_centroid(pt, centroid):
    x, y = pt
    cx, cy = centroid
    distance = get_distance(pt1=(x, y), pt2=(cx, cy))
    if distance < max_distance:  # max_distance is a global set later in the script
        return True
    else:
        return False

def update_centroid(pt, centroids_list):
    new_centroids_list = centroids_list.copy()
    flag_new_centroid = True
    for j, c in enumerate(centroids_list):
        temp_centroid = np.mean(c, axis=0)
        if_close = check_centroid(pt, temp_centroid)
        if if_close:
            new_centroids_list[j].append(pt)
            flag_new_centroid = False
            break
    if flag_new_centroid:
        new_centroids_list.append([pt])
    new_centroids_list = recheck_centroid(new_centroids_list)
    return new_centroids_list

def recheck_centroid(centroids_list):
    new_centroids_list = [list(set(c)) for c in centroids_list]
    return new_centroids_list

def get_bbox(pts):
    minn_x, minn_y = np.min(pts, axis=0)
    maxx_x, maxx_y = np.max(pts, axis=0)
    return [[minn_x, minn_y], [maxx_x, minn_y], [maxx_x, maxx_y], [minn_x, maxx_y]]
class RotateAndTransform:
    def __init__(self, path_img_ref):
        self.path_img_ref = path_img_ref
        self.ref_img = self._read_ref_image()
        # SIFT
        self.sift = cv2.SIFT_create()
        # brute-force feature matching
        self.bf = cv2.BFMatcher()
        # FLANN parameters
        FLANN_INDEX_KDTREE = 1
        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
        search_params = dict(checks=50)  # or pass an empty dictionary
        self.flann = cv2.FlannBasedMatcher(index_params, search_params)

    def _read_ref_image(self):
        ref_img = cv2.imread(self.path_img_ref, cv2.IMREAD_COLOR)
        ref_img = cv2.cvtColor(ref_img, cv2.COLOR_BGR2RGB)
        return ref_img

    def read_src_image(self, path_img_src):
        self.path_img_src = path_img_src
        # read image
        src_img = cv2.imread(path_img_src, cv2.IMREAD_COLOR)
        src_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)
        return src_img

    def convert_bw(self, img):
        img_bw = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        return img_bw

    def get_keypoints_descriptors(self, img_bw):
        keypoints, descriptors = self.sift.detectAndCompute(img_bw, None)
        return keypoints, descriptors

    def get_matches(self, src_descriptors, ref_descriptors, threshold=0.6):
        matches = self.bf.knnMatch(ref_descriptors, src_descriptors, k=2)
        flann_matches = self.flann.knnMatch(ref_descriptors, src_descriptors, k=2)
        good_matches = []
        good_flann_matches = []
        # Apply ratio test for brute force
        for m, n in matches:
            if m.distance < threshold * n.distance:
                good_matches.append([m])
        print(f'Number of BF matches: {len(matches)}, number of good BF matches: {len(good_matches)}')
        # Apply ratio test for FLANN
        for m, n in flann_matches:
            if m.distance < threshold * n.distance:
                good_flann_matches.append([m])
        print(f'Number of FLANN matches: {len(flann_matches)}, number of good FLANN matches: {len(good_flann_matches)}')
        return good_matches, good_flann_matches

    def get_src_dst_pts(self, good_flann_matches, ref_keypoints, src_keypoints):
        pts_src = []
        pts_ref = []
        n = len(good_flann_matches)
        for i in range(n):
            ref_index = good_flann_matches[i][0].queryIdx
            src_index = good_flann_matches[i][0].trainIdx
            pts_src.append(src_keypoints[src_index].pt)
            pts_ref.append(ref_keypoints[ref_index].pt)
        return np.array(pts_src), np.array(pts_ref)

def extend_bbox(bbox, increment=0.1):
    bbox_new = bbox.copy()
    bbox_new[0] = [bbox_new[0][0] - int(bbox_new[0][0] * increment), bbox_new[0][1] - int(bbox_new[0][1] * increment)]
    bbox_new[1] = [bbox_new[1][0] + int(bbox_new[1][0] * increment), bbox_new[1][1] - int(bbox_new[1][1] * increment)]
    bbox_new[2] = [bbox_new[2][0] + int(bbox_new[2][0] * increment), bbox_new[2][1] + int(bbox_new[2][1] * increment)]
    bbox_new[3] = [bbox_new[3][0] - int(bbox_new[3][0] * increment), bbox_new[3][1] + int(bbox_new[3][1] * increment)]
    return bbox_new

def crop_bbox(img, bbox):
    y, x = bbox[0]
    h, w = bbox[1][0] - bbox[0][0], bbox[2][1] - bbox[0][1]
    return img[x: x + w, y: y + h, :]
base_img = cv2.imread(path_img_base)
ref_img = cv2.imread(path_img_ref)

rnt = RotateAndTransform(path_img_ref)
ref_img_bw = rnt.convert_bw(img=rnt.ref_img)
ref_keypoints, ref_descriptors = rnt.get_keypoints_descriptors(ref_img_bw)

base_img = rnt.read_src_image(path_img_src=path_img_base)
base_img_bw = rnt.convert_bw(img=base_img)
base_keypoints, base_descriptors = rnt.get_keypoints_descriptors(base_img_bw)

good_matches, good_flann_matches = rnt.get_matches(src_descriptors=base_descriptors, ref_descriptors=ref_descriptors, threshold=0.6)

ref_points = []
for gm in good_flann_matches:
    x, y = ref_keypoints[gm[0].queryIdx].pt
    x, y = int(x), int(y)
    ref_points.append((x, y))

max_distance = 50
centroids = [[ref_points[0]]]
for i in tqdm(range(len(ref_points))):
    pt = ref_points[i]
    centroids = update_centroid(pt, centroids)

bbox = [get_bbox(c) for c in centroids]
centroids = [np.mean(c, axis=0) for c in centroids]
print(f'Number of Points: {len(good_flann_matches)}, centroids: {len(centroids)}')

data = []
for i in range(len(bbox)):
    temp_crop_img = crop_bbox(ref_img, extend_bbox(bbox[i], 0.01))
    temp_crop_img_bw = rnt.convert_bw(img=temp_crop_img)
    temp_crop_keypoints, temp_crop_descriptors = rnt.get_keypoints_descriptors(temp_crop_img_bw)
    good_matches, good_flann_matches = rnt.get_matches(src_descriptors=base_descriptors, ref_descriptors=temp_crop_descriptors, threshold=0.6)
    temp_data = {'image': temp_crop_img,
                 'num_matched': len(good_flann_matches),
                 'total_keypoints': len(base_keypoints),
                 }
    data.append(temp_data)

filter_data = [{'num_matched': i['num_matched'], 'image': i['image']} for i in data if i['num_matched'] > 25]

for i in range(len(filter_data)):
    temp_num_match = filter_data[i]['num_matched']
    plt.figure()
    plt.title(f'num matched: {temp_num_match}')
    plt.imshow(filter_data[i]['image'])
First you could detect every item on the shelf with a network like this one; it is pre-trained in this exact context and works pretty well. You should also rectify the image before feeding it to the network. You will obtain bounding boxes for every product (perhaps some false positives/negatives, but that is another issue). Then you can match each box against the template using SIFT and compute a score (it is up to you to define which score works), but I suggest another approach such as a Siamese network if you have a consistent dataset.
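A rough sketch of that scoring step, assuming each detection comes as an (x, y, w, h) box and template_des holds the template's SIFT descriptors; the ratio-test match count is just one possible score:
import cv2

def score_box_against_template(base_img, box, template_des, ratio=0.7):
    """Score one detected box by counting ratio-test SIFT matches against the template."""
    x, y, w, h = box  # assumed (x, y, w, h) format
    crop = cv2.cvtColor(base_img[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    kp, des = cv2.SIFT_create().detectAndCompute(crop, None)
    if des is None or len(kp) < 2:
        return 0
    matches = cv2.BFMatcher().knnMatch(template_des, des, k=2)
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)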
Related
I want to develop a face alignment program. There is a video from which the face is extracted and aligned. It happens in the following way: there is a result frame, constructed from the first frame of the video, and the face from every subsequent frame is aligned to it and re-recorded as the result frame. Alignment is performed via homography. So for every frame, I need to find keypoints, match them between the current face and the result face, and compute the homography.
Here is the problem: in my pipeline, keypoints for the current frame must not be computed repeatedly. Instead, the following algorithm is proposed:
1. There are some predefined points in the format of a 2D numpy array. (In general, they could be any points on the image, but for this example, let's imagine these points are some face landmarks.)
2. For the first frame, using the AKAZE feature detector, I search for keypoints in the area close to the initial points from item 1.
3. I use cv2.calcOpticalFlowPyrLK to track those keypoints, so in the next frame I do not detect them again, but use the tracked keypoints from the previous frame.
So here is the code of this:
# Parameters for lucas kanade optical flow
lk_params = dict( winSize = (15,15),
maxLevel = 2,
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
# previous keypoints are the keypoints from the previous frame. It is the list of cv2.Keypoint
# here I cast them to the input format for optical flow
coord_keypoints = np.array(list(map(lambda point: [point.pt[0], point.pt[1]], previous_keypoints)), dtype = np.float32)
p0 = coord_keypoints.copy().reshape((-1, 1, 2))
# oldFace_gray and faceImg1 are the faces from previous and current frame respectively
p1, st, err = cv2.calcOpticalFlowPyrLK(oldFace_gray, faceImg1, p0, None, **lk_params)
indices = np.where(st==1)[0]
good_new = p1[st==1]
good_old = p0[st==1]
# Here I cast tracked points back to the type of cv2.Keypoint for description and matching
keypoints1 = []
for idx, point in zip(indices, good_new):
    keypoint = cv2.KeyPoint(x=point[0], y=point[1],
                            _size=previous_keypoints[idx].size,
                            _class_id=previous_keypoints[idx].class_id,
                            _response=previous_keypoints[idx].response)
    keypoints1.append(keypoint)
# here I create descriptors for keypoints defined above for current face and find and describe keypoints for result face
akaze = cv2.AKAZE_create(threshold = threshold)
keypoints1, descriptors1 = akaze.compute(faceImg1, keypoints1)
keypoints2, descriptors2 = akaze.detectAndCompute(faceImg2, mask=None)
# Then I want to filter keypoints for result face by their distance to points on current face and previous result face
# For that firstly define a function
def landmarkCondition(point, landmarks, eps):
    for borderPoint in landmarks:
        if np.linalg.norm(np.array(point.pt) - np.array(borderPoint)) < eps:
            return True
    return False
# Then use filters. landmarks_result is 2d numpy array of coordinates of keypoints founded on the previous result face.
keypoints_descriptors2 = (filter(lambda x : landmarkCondition(x[0], landmarks_result, eps_result), zip(keypoints2, descriptors2)))
keypoints_descriptors2 = list(filter(lambda x : landmarkCondition(x[0], good_new, eps_initial), keypoints_descriptors2))
keypoints2, descriptors2 = [], []
for keypoint, descriptor in keypoints_descriptors2:
    keypoints2.append(keypoint)
    descriptors2.append(descriptor)
descriptors2 = np.array(descriptors2)
# Match founded keypoints
height, width, channels = coloredFace2.shape
matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_SL2)
matches = matcher.match(descriptors1, descriptors2, None)
# # Sort matches by score
matches.sort(key=lambda x: x.distance, reverse=False)
numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
matches = matches[:numGoodMatches]
# I want to eliminate obviously bad matches. Since the two images are meant to be similar,
# lines connecting two corresponding points on the images should be almost horizontal,
# with length approximately equal to the width of the image.
def correct(point1, point2, width, eps=NOT_ZERO_DIVIDER):
    x1, y1 = point1
    x2, y2 = point2
    angle = abs((y2 - y1) / (x2 - x1 + width + eps))
    length = x2 - x1 + width
    return True if angle < CRITICAL_ANGLE and (length > (1 - RELATIVE_DEVIATION) * width and length < (1 + RELATIVE_DEVIATION) * width) else False
goodMatches = []
for i, match in enumerate(matches):
    if correct(keypoints1[match.queryIdx].pt, keypoints2[match.trainIdx].pt, width):
        goodMatches.append(match)
# Find homography
points1 = np.zeros((len(goodMatches), 2), dtype=np.float32)
points2 = np.zeros((len(goodMatches), 2), dtype=np.float32)
for i, match in enumerate(goodMatches):
    points1[i, :] = keypoints1[match.queryIdx].pt
    points2[i, :] = keypoints2[match.trainIdx].pt
h, mask = cv2.findHomography(points1, points2, method)
height, width, channels = coloredFace2.shape
result = cv2.warpPerspective(coloredFace1, h, (width, height))
resultGray = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
The result of such matching and aligning is very poor. If I compute keypoints for both images at every step without tracking, the result is quite good. Am I making a mistake somewhere?
P.S. I am not sure about posting a minimal reproducible example because there is a lot of preprocessing of the video frames.
I am trying to learn OpenCV in order to improve a script I wrote for comparing engineering drawings. I am using the code (see below) found on this tutorial but I am having zero success with it. In the tutorial the author uses the example of a blank form for the reference image and a photo of the completed form as the image to align. My situation is very similar because I am attempting to use a blank drawing title block as my reference image and a scanned image of a drawing as my image to align.
My goal is to use OpenCV to clean up the scanned engineering drawings so that they are aligned properly but no matter what I try in the MAX_FEATURES and GOOD_MATCH_PERCENT parameters, I get an image that looks like a black and white star burst. Also, when I review the "matches.jpg" file generated by the script, it appears that there are no correct matches. I have tried multiple drawings and I get the same results.
Can anyone see a reason why this script would not work in the way I am trying to use it?
from __future__ import print_function
import cv2
import numpy as np
MAX_FEATURES = 500
GOOD_MATCH_PERCENT = 0.15
def alignImages(im1, im2):
    # Convert images to grayscale
    im1Gray = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
    im2Gray = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
    # Detect ORB features and compute descriptors.
    orb = cv2.ORB_create(MAX_FEATURES)
    keypoints1, descriptors1 = orb.detectAndCompute(im1Gray, None)
    keypoints2, descriptors2 = orb.detectAndCompute(im2Gray, None)
    # Match features.
    matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
    matches = matcher.match(descriptors1, descriptors2, None)
    # Sort matches by score
    matches.sort(key=lambda x: x.distance, reverse=False)
    # Remove not so good matches
    numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
    matches = matches[:numGoodMatches]
    # Draw top matches
    imMatches = cv2.drawMatches(im1, keypoints1, im2, keypoints2, matches, None)
    cv2.imwrite("matches.jpg", imMatches)
    # Extract location of good matches
    points1 = np.zeros((len(matches), 2), dtype=np.float32)
    points2 = np.zeros((len(matches), 2), dtype=np.float32)
    for i, match in enumerate(matches):
        points1[i, :] = keypoints1[match.queryIdx].pt
        points2[i, :] = keypoints2[match.trainIdx].pt
    # Find homography
    h, mask = cv2.findHomography(points1, points2, cv2.RANSAC)
    # Use homography
    height, width, channels = im2.shape
    im1Reg = cv2.warpPerspective(im1, h, (width, height))
    return im1Reg, h

if __name__ == '__main__':
    # Read reference image
    refFilename = "form.jpg"
    print("Reading reference image : ", refFilename)
    imReference = cv2.imread(refFilename, cv2.IMREAD_COLOR)
    # Read image to be aligned
    imFilename = "scanned-form.jpg"
    print("Reading image to align : ", imFilename)
    im = cv2.imread(imFilename, cv2.IMREAD_COLOR)
    print("Aligning images ...")
    # Registered image will be stored in imReg.
    # The estimated homography will be stored in h.
    imReg, h = alignImages(im, imReference)
    # Write aligned image to disk.
    outFilename = "aligned.jpg"
    print("Saving aligned image : ", outFilename)
    cv2.imwrite(outFilename, imReg)
    # Print estimated homography
    print("Estimated homography : \n", h)
Template Image:
Image to Align:
Expected output Image:
Here is one way in Python/OpenCV using a rigid affine transformation (scale, rotation and translation only; no skew or perspective) to warp one image to match the other. It uses findTransformECC() (Enhanced Correlation Coefficient Maximization) to get the rotation matrix and then uses warpAffine() to do the rigid warping.
Template:
Image to be warped:
import cv2
import numpy as np
import math
import sys
# Get the image files from the command line arguments
# These are full paths to the images
# image2 will be warped to match image1
# argv[0] is name of script
image1 = sys.argv[1]
image2 = sys.argv[2]
outfile = sys.argv[3]
# Read the images to be aligned
# im2 is to be warped to match im1
im1 = cv2.imread(image1);
im2 = cv2.imread(image2);
# Convert images to grayscale for computing the rotation via ECC method
im1_gray = cv2.cvtColor(im1,cv2.COLOR_BGR2GRAY)
im2_gray = cv2.cvtColor(im2,cv2.COLOR_BGR2GRAY)
# Find size of image1
sz = im1.shape
# Define the motion model - euclidean is rigid (SRT)
warp_mode = cv2.MOTION_EUCLIDEAN
# Define 2x3 matrix and initialize the matrix to identity matrix I (eye)
warp_matrix = np.eye(2, 3, dtype=np.float32)
# Specify the number of iterations.
number_of_iterations = 5000;
# Specify the threshold of the increment
# in the correlation coefficient between two iterations
termination_eps = 1e-3;
# Define termination criteria
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, number_of_iterations, termination_eps)
# Run the ECC algorithm. The results are stored in warp_matrix.
(cc, warp_matrix) = cv2.findTransformECC (im1_gray, im2_gray, warp_matrix, warp_mode, criteria, None, 1)
# Warp im2 using affine
im2_aligned = cv2.warpAffine(im2, warp_matrix, (sz[1],sz[0]), flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP);
# write output
cv2.imwrite(outfile, im2_aligned)
# Print rotation angle
row1_col0 = warp_matrix[0,1]
angle = math.degrees(math.asin(row1_col0))
print(angle)
Result:
Resulting Angle of Rotation (in deg):
-0.3102187026194794
Note: you can change the background color in warpAffine to white if desired.
You can also make the termination epsilon smaller by an order of magnitude or two for more accuracy, at the cost of longer processing times.
The other Rigid Affine approach that I mentioned in my comments earlier is to use ORB feature matching, filter the key points, then use estimateAffinePartial2D() to get the rigid affine matrix. Then use that to warp the image. For large angles this seems to me to be more reliable than the ECC method. But the ECC method seems more accurate for small rotations.
import cv2
import numpy as np
import math
import sys
MAX_FEATURES = 10000
GOOD_MATCH_PERCENT = 0.15
DIFFY_THRESH = 2
# Get the image files from the command line arguments
# These are full paths to the images
# image[2] will be warped to match image[1]
# argv[0] is name of script
file1 = sys.argv[1]
file2 = sys.argv[2]
outFile = sys.argv[3]
# Read image1
image1 = cv2.imread(file1, cv2.IMREAD_COLOR)
# Read image2 to be warped to match image1
image2 = cv2.imread(file2, cv2.IMREAD_COLOR)
# Convert images to grayscale
image1Gray = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
image2Gray = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
# Detect ORB features and compute descriptors.
orb = cv2.ORB_create(MAX_FEATURES)
keypoints1, descriptors1 = orb.detectAndCompute(image1Gray, None)
keypoints2, descriptors2 = orb.detectAndCompute(image2Gray, None)
# Match features.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(descriptors1, descriptors2, None)
# Sort matches by score
matches.sort(key=lambda x: x.distance, reverse=False)
# Remove not so good matches
numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
matches = matches[:numGoodMatches]
#print('numgood',numGoodMatches)
# Extract location of good matches and filter by diffy if rotation is small
points1 = np.zeros((len(matches), 2), dtype=np.float32)
points2 = np.zeros((len(matches), 2), dtype=np.float32)
for i, match in enumerate(matches):
    points1[i, :] = keypoints1[match.queryIdx].pt
    points2[i, :] = keypoints2[match.trainIdx].pt
# initialize empty arrays for newpoints1 and newpoints2 and mask
newpoints1 = np.empty(shape=[0, 2], dtype=np.float32)
newpoints2 = np.empty(shape=[0, 2], dtype=np.float32)
matches_Mask = [0] * len(matches)
count=0
for i in range(len(matches)):
    pt1 = points1[i]
    pt2 = points2[i]
    pt1x, pt1y = zip(*[pt1])
    pt2x, pt2y = zip(*[pt2])
    diffy = np.float32( np.float32(pt2y) - np.float32(pt1y) )
    if abs(diffy) < DIFFY_THRESH:
        newpoints1 = np.append(newpoints1, [pt1], axis=0).astype(np.uint8)
        newpoints2 = np.append(newpoints2, [pt2], axis=0).astype(np.uint8)
        matches_Mask[i]=1
        count += 1
# Find Affine Transformation
# note swap of order of newpoints here so that image2 is warped to match image1
m, inliers = cv2.estimateAffinePartial2D(newpoints2,newpoints1)
# Use affine transform to warp im2 to match im1
height, width, channels = image1.shape
image2Reg = cv2.warpAffine(image2, m, (width, height))
# Write aligned image to disk.
cv2.imwrite(outFile, image2Reg)
# Print angle
row1_col0 = m[1,0]
print('row1_col0:',row1_col0)
angle = math.degrees(math.asin(row1_col0))
print('angle', angle)
Result Image:
Result Rotation Angle:
-0.6123936361765413
After some trial and error I determined that I don't need to find a homography in order to align my images properly. Since my images only need to be scaled and rotated slightly, my best option is to find the outer most points of the drawing title block and align one image to the other with a transform.
My approach is to use the Harris corner finding function to find all of the corners on the drawing, then do a simple calculation to find the points that are the shortest distance to the corners of the drawing canvas (these are the outside corners of the drawing title block). I then take 3 of the points (top left, top right, and bottom left) and use a transform to scale/rotate one drawing to the other.
Below is the code that I used:
import cv2
import numpy as np
import math
img1 = cv2.imread('reference.jpg')
img2 = cv2.imread('to-be-aligned.jpg')
#Find the corner points of img1
h1,w1,c=img1.shape
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
gray1 = np.float32(gray1)
dst1 = cv2.cornerHarris(gray1,5,3,0.04)
ret1, dst1 = cv2.threshold(dst1,0.1*dst1.max(),255,0)
dst1 = np.uint8(dst1)
ret1, labels1, stats1, centroids1 = cv2.connectedComponentsWithStats(dst1)
criteria1 = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners1 = cv2.cornerSubPix(gray1,np.float32(centroids1),(5,5),(-1,-1),criteria1)
#Find the corner points of img2
h2,w2,c=img2.shape
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
gray2 = np.float32(gray2)
dst2 = cv2.cornerHarris(gray2,5,3,0.04)
ret2, dst2 = cv2.threshold(dst2,0.1*dst2.max(),255,0)
dst2 = np.uint8(dst2)
ret2, labels2, stats2, centroids2 = cv2.connectedComponentsWithStats(dst2)
criteria2 = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners2 = cv2.cornerSubPix(gray2,np.float32(centroids2),(5,5),(-1,-1),criteria2)
#Find the top left, top right, and bottom left outer corners of the drawing frame for img1
a1=[0,0]
b1=[w1,0]
c1=[0,h1]
a1_dist=[]
b1_dist=[]
c1_dist=[]
for i in corners1:
    temp_a1=math.sqrt((i[0]-a1[0])**2+(i[1]-a1[1])**2)
    temp_b1=math.sqrt((i[0]-b1[0])**2+(i[1]-b1[1])**2)
    temp_c1=math.sqrt((i[0]-c1[0])**2+(i[1]-c1[1])**2)
    a1_dist.append(temp_a1)
    b1_dist.append(temp_b1)
    c1_dist.append(temp_c1)
print("Image #1 (reference):")
print("Top Left:")
print(corners1[a1_dist.index(min(a1_dist))])
print("Top Right:")
print(corners1[b1_dist.index(min(b1_dist))])
print("Bottom Left:")
print(corners1[c1_dist.index(min(c1_dist))])
#Find the top left, top right, and bottom left outer corners of the drawing frame for img2
a2=[0,0]
b2=[w2,0]
c2=[0,h2]
a2_dist=[]
b2_dist=[]
c2_dist=[]
for i in corners2:
    temp_a2=math.sqrt((i[0]-a2[0])**2+(i[1]-a2[1])**2)
    temp_b2=math.sqrt((i[0]-b2[0])**2+(i[1]-b2[1])**2)
    temp_c2=math.sqrt((i[0]-c2[0])**2+(i[1]-c2[1])**2)
    a2_dist.append(temp_a2)
    b2_dist.append(temp_b2)
    c2_dist.append(temp_c2)
print("Image #2 (image to align):")
print("Top Left:")
print(corners2[a2_dist.index(min(a2_dist))])
print("Top Right:")
print(corners2[b2_dist.index(min(b2_dist))])
print("Bottom Left:")
print(corners2[c2_dist.index(min(c2_dist))])
#Create the points for img1
point1 = np.zeros((3,2), dtype=np.float32)
point1[0][0]=corners1[a1_dist.index(min(a1_dist))][0]
point1[0][1]=corners1[a1_dist.index(min(a1_dist))][1]
point1[1][0]=corners1[b1_dist.index(min(b1_dist))][0]
point1[1][1]=corners1[b1_dist.index(min(b1_dist))][1]
point1[2][0]=corners1[c1_dist.index(min(c1_dist))][0]
point1[2][1]=corners1[c1_dist.index(min(c1_dist))][1]
#Create the points for img2
point2 = np.zeros((3,2), dtype=np.float32)
point2[0][0]=corners2[a2_dist.index(min(a2_dist))][0]
point2[0][1]=corners2[a2_dist.index(min(a2_dist))][1]
point2[1][0]=corners2[b2_dist.index(min(b2_dist))][0]
point2[1][1]=corners2[b2_dist.index(min(b2_dist))][1]
point2[2][0]=corners2[c2_dist.index(min(c2_dist))][0]
point2[2][1]=corners2[c2_dist.index(min(c2_dist))][1]
#Make sure points look ok:
print(point1)
print(point2)
#Transform the image
m = cv2.getAffineTransform(point2,point1)
image2Reg = cv2.warpAffine(img2, m, (w1, h1), borderValue=(255,255,255))
#Highlight found points in red:
img1[dst1>0.1*dst1.max()]=[0,0,255]
img2[dst2>0.1*dst2.max()]=[0,0,255]
#Output the images:
cv2.imwrite("output-img1-harris.jpg", img1)
cv2.imwrite("output-img2-harris.jpg", img2)
cv2.imwrite("output-harris-transform.jpg",image2Reg)
I am using the following code to overlay images taken using different microscopy. The two images describe the same tissue but with different techniques.
def match_images_using_orb(em_image, confocal_image):
    gray_confocal = cv2.cvtColor(confocal_image, cv2.COLOR_RGB2GRAY)
    # em_image is already gray
    orb = cv2.ORB_create(500)
    keypoints1, descriptors1 = orb.detectAndCompute(em_image, None)
    keypoints2, descriptors2 = orb.detectAndCompute(gray_confocal, None)
    # brute force matcher
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.match(descriptors1, descriptors2)
    matches.sort(key = lambda x: x.distance, reverse = False)
    top_matches = int(len(matches) * 0.1)
    matches = matches[:top_matches]
    imMatches = cv2.drawMatches(em_image, keypoints1, gray_confocal, keypoints2, matches, None)
    points1 = np.zeros((len(matches), 2), dtype = np.float32)
    points2 = np.zeros((len(matches), 2), dtype = np.float32)
    for i, match in enumerate(matches):
        points1[i, :] = keypoints1[match.queryIdx].pt
        points2[i, :] = keypoints2[match.trainIdx].pt
    h, _ = cv2.findHomography(points1, points2, cv2.RANSAC)
    height, width, _ = em_image.shape
    try:
        # exclude negative homography
        if h[h < 0].size == 0:
            em_reg = cv2.warpPerspective(confocal_image, h, (width, height))
        else:
            return False
    except:
        return False
    else:
        return (imMatches, em_reg, h)
Since the two images are different but share common marks, using this algorithm might not be the right approach.
My goal is to find where the green and red colors are located in the larger image. To do so, I need to rotate the image according to a homography based on some landmarks, as shown in the image below (just an example).
So my question is: if I already know some landmarks (the blue marks) in both images, how can I feed these landmark correspondences to the algorithm manually to compute the homography and rotate the small image into the right position, so I can find where the green and red colors (unknown) are located?
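To make the question concrete, here is a minimal sketch of what I mean by feeding the landmarks manually; the file names and pixel coordinates are made up, and I am not sure this is the right way to do it:
import cv2
import numpy as np

# made-up file names; the small image contains the landmarks plus the green/red marks
small_image = cv2.imread('small.png')
large_image = cv2.imread('large.png')

# made-up pixel coordinates of the same blue landmarks in both images
pts_small = np.float32([[12, 34], [250, 40], [245, 300], [15, 290]])
pts_large = np.float32([[410, 520], [660, 515], [655, 790], [405, 780]])

# with exactly four correspondences cv2.getPerspectiveTransform also works;
# cv2.findHomography accepts four or more and can reject outliers with RANSAC
H, mask = cv2.findHomography(pts_small, pts_large, cv2.RANSAC, 5.0)

h, w = large_image.shape[:2]
warped = cv2.warpPerspective(small_image, H, (w, h))

# a point of interest in the small image (e.g. a green/red mark) can then be
# mapped into the large image with the same homography
pts = np.float32([[100, 120]]).reshape(-1, 1, 2)  # made-up coordinates
mapped = cv2.perspectiveTransform(pts, H)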
My goal is to
deskew a scanned image such that its text is perfectly placed on top of the text of the original image. (subtracting the images would remove the text)
prevent any loss of information on the deskewed image
I use SURF features to feed the findHomography function. Then I use the warpPerspective function to transform the scanned image. The resulting image almost perfectly fits onto the original image.
However, the scanned image has content on its corners which is lost after the transformation because the text in the scanned image is smaller and has to be scaled up.
Deskewing an image that has slightly smaller text
Information at the borders of the image is cropped
To avoid any loss of information, I convert the image to RGBA and set the borderValue parameter in warpPerspective such that any added background has transparent color. I remove the transparent pixels after the transformation again. This procedure works but seems highly inefficient.
Question: I'm looking for a working code example (C++ or Python) that shows how to do this more efficiently.
Image has been deskewed and content is preserved. However, the text of the two pictures isn't on top of each other anymore
Text position is off because the warped image has a different size than what warpPerspective expected
After transforming the image the problem is that the two images aren't aligned anymore because the dimensions of the transformed image are different than what the warpPerspective method expected.
Question: How can I realign the two images? It would be great if there was a way to do incorporate this into the previous step already. Again, a working code example would be very helpful.
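One direction I am considering is to size the output canvas by pushing the corners of the skewed image through the homography and composing a translation, which would also give the offset needed to realign the result. A rough, untested sketch of that idea, assuming M maps the skewed image into the original's frame:
import cv2
import numpy as np

def warp_without_cropping(img, M):
    h, w = img.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, M)
    x_min, y_min = np.floor(warped.min(axis=0).ravel()).astype(int)
    x_max, y_max = np.ceil(warped.max(axis=0).ravel()).astype(int)
    # translation that moves all warped content into the visible canvas
    T = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=np.float64)
    out = cv2.warpPerspective(img, T @ M, (int(x_max - x_min), int(y_max - y_min)))
    # pasting `out` at (x_min, y_min) in the original's frame would realign the two images
    return out, (int(x_min), int(y_min))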
Here's the code that I have so far. It deskews the image while preserving its content, however, the text is not on top of the original text anymore.
import math
import cv2
import numpy as np
class Deskewer:
    def __init__(self, hessianTreshold = 5000):
        self.__hessianThresh = hessianTreshold
        self.imgOrigGray, self.imgSkewed, self.imgSkewedGray = None, None, None

    def start(self, imgOrig, imgSkewed):
        self.imgOrigGray = cv2.cvtColor(imgOrig, cv2.COLOR_BGR2GRAY)
        self.imgSkewed = imgSkewed  # final transformation will be performed on the color image
        self.imgSkewedGray = cv2.cvtColor(imgSkewed, cv2.COLOR_BGR2GRAY)  # prior calculation is faster on gray
        kp1, des1, kp2, des2 = self.__detectFeatures()
        goodMatches = self.__flannMatch(des1, des2)
        MIN_MATCH_COUNT = 10
        M = None
        if len(goodMatches) > MIN_MATCH_COUNT:
            M, _ = self.__findHomography(kp1, kp2, goodMatches)
        else:
            print("Not enough matches are found - %d/%d" % (len(goodMatches), MIN_MATCH_COUNT))
            return
        return self.__deskew(M)

    def __detectFeatures(self):
        surf = cv2.xfeatures2d.SURF_create(self.__hessianThresh)
        kp1, des1 = surf.detectAndCompute(self.imgOrigGray, None)
        kp2, des2 = surf.detectAndCompute(self.imgSkewedGray, None)
        return kp1, des1, kp2, des2

    def __flannMatch(self, des1, des2):
        global matches
        FLANN_INDEX_KDTREE = 0
        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
        search_params = dict(checks=50)
        flann = cv2.FlannBasedMatcher(index_params, search_params)
        matches = flann.knnMatch(des1, des2, k=2)
        # store all the good matches as per Lowe's ratio test.
        good = []
        for m, n in matches:
            if m.distance < 0.7 * n.distance:
                good.append(m)
        return good

    def __findHomography(self, kp1, kp2, goodMatches):
        src_pts = np.float32([kp1[m.queryIdx].pt for m in goodMatches]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in goodMatches]).reshape(-1, 1, 2)
        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()
        i = matchesMask.index(1)
        # TODO: This is a matching point before the warpPerspective call. How can I calculate this point AFTER the call?
        print("POINTS: object(", src_pts[i][0][1], ",", src_pts[i][0][0], ") - scene(", dst_pts[i][0][1], ",", dst_pts[i][0][0], ")")
        return M, mask

    def getComponents(self, M):
        # ((translationx, translationy), rotation, (scalex, scaley), shear)
        a = M[0, 0]
        b = M[0, 1]
        c = M[0, 2]
        d = M[1, 0]
        e = M[1, 1]
        f = M[1, 2]
        p = math.sqrt(a * a + b * b)
        r = (a * e - b * d) / (p)
        q = (a * d + b * e) / (a * e - b * d)
        translation = (c, f)
        scale = (p, r)  # p = x-axis, r = y-axis
        shear = q
        theta = math.atan2(b, a)
        degrees = math.atan2(b, a) * 180 / math.pi
        return (translation, theta, degrees, scale, shear)

    def __deskew(self, M):
        # this info might come in handy here for calculating the dsize of warpPerspective?
        translation, theta, degrees, scale, shear = self.getComponents(M)
        # Alpha channel allows me to set a unique feature to pixels that are created during warpPerspective
        imSkewedAlpha = cv2.cvtColor(self.imgSkewed, cv2.COLOR_BGR2BGRA)
        # These sizes have been chosen at random to make sure that all the content fits in the new canvas
        height = 5000
        width = 5000
        shift = -500
        M2 = np.array([[1, 0, shift],
                       [0, 1, shift],
                       [0, 0, 1]])
        M3 = np.dot(M, M2)
        # TODO: How can I calculate the dsize argument?
        # Newly created pixels are set to transparent
        im_out = cv2.warpPerspective(imSkewedAlpha, M3,
                                     (height, width), flags=cv2.WARP_INVERSE_MAP, borderMode=cv2.BORDER_CONSTANT, borderValue=(255, 0, 0, 0))
        # http://codereview.stackexchange.com/a/132933
        # Mask of non-transparent pixels.
        mask = im_out[:, :, 3] == 255
        # Coordinates of non-transparent pixels.
        coords = np.argwhere(mask)
        # Bounding box of non-transparent pixels.
        x0, y0 = coords.min(axis=0)
        x1, y1 = coords.max(axis=0) + 1  # slices are exclusive at the top
        # Get the contents of the bounding box.
        cropped = im_out[x0:x1, y0:y1]
        # TODO: The warped image needs to align nicely on the original image
        return cropped
origImg = cv2.imread("Letter.png")
skewedImg = cv2.imread("A4.png")
deskewed = Deskewer().start(origImg, skewedImg)
cv2.imshow("Original", origImg)
cv2.imshow("Deskewed", deskewed)
cv2.waitKey(0)
Original and skewed image (with additional content) for testing
I have the SIFT keypoints of two images (calculated with Python + OpenCV 3).
I want to filter them by their y-coordinate.
Specifically, I want to remove all matching points whose difference of y-coordinate is higher than the image height divided by 10, for example:
If two matching points are A(x1, y1) and B(x2, y2):
if abs(y2 - y1) > imageHeight / 10, then remove that matching pair.
What I have tested
Here is the code I have tested. I am removing keypoints, but not the ones I want to remove.
# Load the two images
img1 = cv2.imread(PATH + "image1.jpg", -1)
img2 = cv2.imread(PATH + "image2.jpg", -1)
# Get their dimensions
height, width = img1.shape[:2]
# Resize them (they are too big)
img1 = cv2.resize(img1, (width / 4, height / 4))
img2 = cv2.resize(img2, (width / 4, height / 4))
# Get the resized image's dimensions
height, width = img1.shape[:2]
# Initiate SIFT detector
sift = X2D.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1,des2, k=2)
### Here the filtering attempt ###
# Alloc a new vector for filtered matches
filteredMatches = [None] * len(matches)
# Counter that will count how many matches I have at the end
counter = 0
# for each match
for i in range(len(matches)):
    # Get the "img1" keypoint
    leftPoint = kp1[ matches[i][0].queryIdx ].pt   #'left' image
    # Get the "img2" keypoint
    rightPoint = kp2[ matches[i][0].trainIdx ].pt  #'right' image
    # subtract the y-coordinate of both points and compare
    # with height / 10
    if( abs(leftPoint[1] - rightPoint[1]) < height / 10):
        # if the difference is lower than height / 10, add it
        # to the new list and increment the counter:
        filteredMatches[counter] = matches[i]
        counter += 1
# fix the filtered list size
matches = matches[:counter]
I'm not sure if I'm using queryIdx and trainIdx correctly, but according to this post (What is `query` and `train` in openCV features2D) I think so.
I have found the solution. First of all, according to drawMatchesKnn documentation:
keypoints1[i] has a corresponding point in keypoints2[matches[i]]
In my code, 'keypoints1' is kp1, 'keypoints2' is kp2 and 'matches' is matches.
The correspondence between kp1 and kp2 is: kp1[i] matches with kp2[ matches[i].trainIdx ].
Here, finally, is the function which filters the matches, removing all those whose y-coordinate difference is higher than the image's height * n, where n is a given number (between 0 and 1):
def filterMatches(kp1, kp2, matches, imgHeight, thresFactor = 0.4):
    """
    Removes the matches that correspond to a pair of keypoints (kp1, kp2)
    whose y-coordinate difference is greater than imgHeight * thresFactor.

    Args:
        kp1 (array of cv2.KeyPoint): Key Points.
        kp2 (array of cv2.KeyPoint): Key Points.
        matches (array of cv2.DMATCH): Matches between kp1 and kp2.
        imgHeight (Integer): height of the image that has produced kp1 or kp2.
        thresFactor (Float): Used to calculate the threshold. Threshold is
            imgHeight * thresFactor.

    Returns:
        array of cv2.DMATCH: filtered matches.
    """
    filteredMatches = [None] * len(matches)
    counter = 0
    threshold = imgHeight * thresFactor
    for i in range(len(kp1)):
        srcPoint = kp1[ matches[i][0].queryIdx ].pt
        dstPoint = kp2[ matches[i][0].trainIdx ].pt
        diff = abs(srcPoint[1] - dstPoint[1])
        if( diff < threshold):
            filteredMatches[counter] = matches[i]
            counter += 1
    return filteredMatches[:counter]