I need to determine the location of yogurts in a supermarket. The source photo looks like this:
And the template:
I use SIFT to extract the key points of the template:
import glob
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
from skimage.measure import ransac
from skimage.transform import AffineTransform

img1 = cv.imread('train.jpg')  # queryImage
sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
path = glob.glob("template.jpg")
cv_img = []
l=0
for img in path:
img2 = cv.imread(img) # trainImage
# Initiate SIFT detector
# find the keypoints and descriptors with SIFT
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
flann = cv.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
# Need to draw only good matches, so create a mask
# ratio test as per Lowe's paper
if (l < len(matches)):
l = len(matches)
image = img2
match = matches
h_query, w_query, _= img2.shape
matchesMask = [[0,0] for i in range(len(match))]
good_matches = []
good_matches_indices = {}
for i,(m,n) in enumerate(match):
if m.distance < 0.7*n.distance:
matchesMask[i]=[1,0]
good_matches.append(m)
good_matches_indices[len(good_matches) - 1] = i
bboxes = []
src_pts = np.float32([ kp1[m.queryIdx].pt for m in good_matches ]).reshape(-1,2)
dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good_matches ]).reshape(-1,2)
model, inliers = initialize_ransac(src_pts, dst_pts)  # initialize_ransac is a user-defined helper (not shown here)
n_inliers = np.sum(inliers)
matched_indices = [good_matches_indices[idx] for idx in inliers.nonzero()[0]]
print(len(matched_indices))
model, inliers = ransac(
(src_pts, dst_pts),
AffineTransform, min_samples=4,
residual_threshold=4, max_trials=20000
)
n_inliers = np.sum(inliers)
print(n_inliers)
matched_indices = [good_matches_indices[idx] for idx in inliers.nonzero()[0]]
print(matched_indices)
q_coordinates = np.array([(0, 0), (h_query, w_query)])
coords = model.inverse(q_coordinates)
print(coords)
h_query, w_query,_ = img2.shape
q_coordinates = np.array([(0, 0), (h_query, w_query)])
coords = model.inverse(q_coordinates)
print(coords)
# bboxes_list.append((i, coords))
M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 2)
draw_params = dict(matchColor = (0,255,0),
singlePointColor = (255,0,0),
matchesMask = matchesMask,
flags = cv.DrawMatchesFlags_DEFAULT)
img3 = cv.drawMatchesKnn(img1,kp1,image,kp2,match,None,**draw_params)
plt.imshow(img3),plt.show()
The result of the SIFT matching looks like this:
The question is: what is the best way to cluster the matched points into rectangles, one per yogurt? I tried RANSAC, but this method doesn't work in this case.
I am proposing an approach based on what is discussed in this paper. I have modified it a bit because the use case is not exactly the same, but they do use SIFT feature matching to locate multiple objects in video frames. They use PCA to reduce run time, but that may not be required for still images.
Sorry, I could not write code for this as it would take a lot of time, but I believe this should work to locate all occurrences of the template object.
The modified approach is like this (a rough sketch follows the steps):
Divide the template image into regions: left, middle and right along the horizontal axis,
and top and bottom along the vertical axis.
Now, when you match features between the template and the source image, keypoints from some of these regions will be matched at multiple locations in the source image. You can use these keypoints to identify which region of the template is present at which location(s) in the source image. If regions overlap, i.e. keypoints from different template regions are matched with nearby keypoints in the source image, that indicates a wrong match.
Mark each set of matching keypoints within a neighborhood of the source image as left, center, right, top or bottom, depending on which template region the majority of its matches come from.
Starting from each left region in the source image, move towards the right; if you find a central region followed by a right region, then the area of the source image between the regions marked left and right can be marked as the location of one template object.
There could be overlapping objects, which would result in a left region followed by another left region when moving right from the first one. The area between the two left regions can then be marked as one template object.
For a further refined location, each area of the source image marked as one template object can be cropped and re-matched with the template image.
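A minimal sketch of the region-labelling and voting idea might look like the following. The region boundaries, the ratio threshold, the cell size used for the neighborhood voting, and the names template_img/scene_img are assumptions for illustration, not part of the original answer:
import cv2
import numpy as np
from collections import Counter, defaultdict

def region_label(pt, w, h):
    # Label a template keypoint left/middle/right (horizontal thirds)
    # and top/bottom (vertical halves).
    x, y = pt
    horiz = 'left' if x < w / 3 else ('middle' if x < 2 * w / 3 else 'right')
    vert = 'top' if y < h / 2 else 'bottom'
    return horiz, vert

template_img = cv2.imread('template.jpg')   # assumed file names
scene_img = cv2.imread('train.jpg')
sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template_img, None)
kp_s, des_s = sift.detectAndCompute(scene_img, None)

matches = cv2.BFMatcher().knnMatch(des_t, des_s, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

h_t, w_t = template_img.shape[:2]
# For every matched scene keypoint, remember which template region it came from.
votes = []
for m in good:
    horiz, _ = region_label(kp_t[m.queryIdx].pt, w_t, h_t)
    sx, sy = kp_s[m.trainIdx].pt
    votes.append((sx, sy, horiz))

# Majority-vote the label in cells of roughly one third of a template width;
# scanning a row of cells left to right, a 'left' cell followed by 'middle'
# and 'right' cells delimits one instance of the template object.
cells = defaultdict(list)
for sx, sy, horiz in votes:
    cells[(int(sy // h_t), int(sx // (w_t / 3)))].append(horiz)
cell_labels = {c: Counter(lbls).most_common(1)[0][0] for c, lbls in cells.items()}
print(cell_labels)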
Try working spatially: for each keypoint in img2, take a bounding box around it and consider only the points inside it for your RANSAC homography, then check for the best fit.
You can also work with overlapping windows and later discard similar resulting homographies.
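For illustration, a rough sketch of this windowed-RANSAC idea, assuming src_pts/dst_pts are the Nx2 arrays of matched template/scene coordinates from the question and that w_t, h_t hold the template size; the window size, stride and thresholds are guesses that would need tuning:
import cv2
import numpy as np

win, step = 200, 100                     # window size/stride in scene pixels (assumption)
homographies = []
for x0 in range(0, int(dst_pts[:, 0].max()) + 1, step):
    for y0 in range(0, int(dst_pts[:, 1].max()) + 1, step):
        inside = ((dst_pts[:, 0] >= x0) & (dst_pts[:, 0] < x0 + win) &
                  (dst_pts[:, 1] >= y0) & (dst_pts[:, 1] < y0 + win))
        if inside.sum() < 4:
            continue
        H, mask = cv2.findHomography(src_pts[inside], dst_pts[inside], cv2.RANSAC, 3.0)
        if H is not None and mask.sum() >= 10:
            homographies.append(H)

# Overlapping windows produce near-duplicate homographies; keep only those that
# send the template centre to clearly distinct scene locations.
centre = np.float32([[[w_t / 2, h_t / 2]]])
kept = []
for H in homographies:
    c = cv2.perspectiveTransform(centre, H)[0, 0]
    if all(np.linalg.norm(c - k) > 50 for k in kept):
        kept.append(c)
print(len(kept), "distinct template instances")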
Here is what you can do:
Base image = whole picture of the shelf
Template image = single product image
Get SIFT keypoints from both images (base and template image).
Do feature matching.
Get all the points in the base image which are matching (refer to the figure).
Create clusters based on the size of the template image (here the threshold is 50 px).
Get the bounding box of each cluster.
Crop each bounding-box cluster and check its matches with the template image.
Accept all clusters that have at least a minimum percentage of matched keypoints (here a minimum of 10% of the keypoints).
def plot_pts(img, pts):
img_plot = img.copy()
for i in range(len(pts)):
img_plot = cv2.circle(img_plot, (int(pts[i][0]), int(pts[i][1])), radius=7, color=(255, 0, 0), thickness=-1)
plt.figure(figsize=(20, 10))
plt.imshow(img_plot)
def plot_bbox(img, bbox_list):
img_plot = img.copy()
for i in range(len(bbox_list)):
start_pt = bbox_list[i][0]
end_pt = bbox_list[i][2]
img_plot = cv2.rectangle(img_plot, pt1=start_pt, pt2=end_pt, color=(255, 0, 0), thickness=2)
plt.figure(figsize=(20, 10))
plt.imshow(img_plot)
def get_distance(pt1, pt2):
x1, y1 = pt1
x2, y2 = pt2
return np.sqrt(np.square(x1 - x2) + np.square(y1 - y2))
def check_centroid(pt, centroid):
x, y = pt
cx, cy = centroid
distance = get_distance(pt1=(x, y), pt2=(cx, cy))
if distance < max_distance:
return True
else:
return False
def update_centroid(pt, centroids_list):
new_centroids_list = centroids_list.copy()
flag_new_centroid = True
for j, c in enumerate(centroids_list):
temp_centroid = np.mean(c, axis=0)
if_close = check_centroid(pt, temp_centroid)
if if_close:
new_centroids_list[j].append(pt)
flag_new_centroid = False
break
if flag_new_centroid:
new_centroids_list.append([pt])
new_centroids_list = recheck_centroid(new_centroids_list)
return new_centroids_list
def recheck_centroid(centroids_list):
new_centroids_list = [list(set(c)) for c in centroids_list]
return new_centroids_list
def get_bbox(pts):
minn_x, minn_y = np.min(pts, axis=0)
maxx_x, maxx_y = np.max(pts, axis=0)
return [[minn_x, minn_y], [maxx_x, minn_y], [maxx_x, maxx_y], [minn_x, maxx_y]]
class RotateAndTransform:
def __init__(self, path_img_ref):
self.path_img_ref = path_img_ref
self.ref_img = self._read_ref_image()
#sift
self.sift = cv2.SIFT_create()
#feature matching
self.bf = cv2.BFMatcher()
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
self.flann = cv2.FlannBasedMatcher(index_params,search_params)
def _read_ref_image(self):
ref_img = cv2.imread(self.path_img_ref, cv2.IMREAD_COLOR)
ref_img = cv2.cvtColor(ref_img, cv2.COLOR_BGR2RGB)
return ref_img
def read_src_image(self, path_img_src):
self.path_img_src = path_img_src
# read images
# ref_img = cv2.imread(self.path_img_ref, cv2.IMREAD_COLOR)
src_img = cv2.imread(path_img_src, cv2.IMREAD_COLOR)
src_img = cv2.cvtColor(src_img, cv2.COLOR_BGR2RGB)
return src_img
def convert_bw(self, img):
img_bw = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
return img_bw
def get_keypoints_descriptors(self, img_bw):
keypoints, descriptors = self.sift.detectAndCompute(img_bw,None)
return keypoints, descriptors
def get_matches(self, src_descriptors, ref_descriptors, threshold=0.6):
matches = self.bf.knnMatch(ref_descriptors, src_descriptors, k=2)
flann_matches = self.flann.knnMatch(ref_descriptors, src_descriptors,k=2)
good_matches = []
good_flann_matches = []
# Apply ratio test for Brute Force
for m,n in matches:
if m.distance <threshold*n.distance:
good_matches.append([m])
print(f'Number of BF matches: {len(matches)}, number of good BF matches: {len(good_matches)}')
# Apply ratio test for FLANN
for m,n in flann_matches:
if m.distance < threshold*n.distance:
good_flann_matches.append([m])
# matches = sorted(matches, key = lambda x:x.distance)
print(f'Number of FLANN matches: {len(flann_matches)}, number of good FLANN matches: {len(good_flann_matches)}')
return good_matches, good_flann_matches
def get_src_dst_pts(self, good_flann_matches, ref_keypoints, src_keypoints):
pts_src = []
pts_ref = []
n = len(good_flann_matches)
for i in range(n):
ref_index = good_flann_matches[i][0].queryIdx
src_index = good_flann_matches[i][0].trainIdx
pts_src.append(src_keypoints[src_index].pt)
pts_ref.append(ref_keypoints[ref_index].pt)
return np.array(pts_src), np.array(pts_ref)
def extend_bbox(bbox, increment=0.1):
bbox_new = bbox.copy()
bbox_new[0] = [bbox_new[0][0] - int(bbox_new[0][0] * increment), bbox_new[0][1] - int(bbox_new[0][1] * increment)]
bbox_new[1] = [bbox_new[1][0] + int(bbox_new[1][0] * increment), bbox_new[1][1] - int(bbox_new[1][1] * increment)]
bbox_new[2] = [bbox_new[2][0] + int(bbox_new[2][0] * increment), bbox_new[2][1] + int(bbox_new[2][1] * increment)]
bbox_new[3] = [bbox_new[3][0] - int(bbox_new[3][0] * increment), bbox_new[3][1] + int(bbox_new[3][1] * increment)]
return bbox_new
def crop_bbox(img, bbox):
y, x = bbox[0]
h, w = bbox[1][0] - bbox[0][0], bbox[2][1] - bbox[0][1]
return img[x: x + w, y: y + h, :]
base_img = cv2.imread(path_img_base)
ref_img = cv2.imread(path_img_ref)
rnt = RotateAndTransform(path_img_ref)
ref_img_bw = rnt.convert_bw(img=rnt.ref_img)
ref_keypoints, ref_descriptors = rnt.get_keypoints_descriptors(ref_img_bw)
base_img = rnt.read_src_image(path_img_src = path_img_base)
base_img_bw = rnt.convert_bw(img=base_img)
base_keypoints, base_descriptors = rnt.get_keypoints_descriptors(base_img_bw)
good_matches, good_flann_matches = rnt.get_matches(src_descriptors=base_descriptors, ref_descriptors=ref_descriptors, threshold=0.6)
ref_points = []
for gm in good_flann_matches:
x, y = ref_keypoints[gm[0].queryIdx].pt
x, y = int(x), int(y)
ref_points.append((x, y))
max_distance = 50
centroids = [[ref_points[0]]]
for i in tqdm(range(len(ref_points))):
pt = ref_points[i]
centroids = update_centroid(pt, centroids)
bbox = [get_bbox(c) for c in centroids]
centroids = [np.mean(c, axis=0) for c in centroids]
print(f'Number of Points: {len(good_flann_matches)}, centroids: {len(centroids)}')
data = []
for i in range(len(bbox)):
temp_crop_img = crop_bbox(ref_img, extend_bbox(bbox[i], 0.01))
temp_crop_img_bw = rnt.convert_bw(img=temp_crop_img)
temp_crop_keypoints, temp_crop_descriptors = rnt.get_keypoints_descriptors(temp_crop_img_bw)
good_matches, good_flann_matches = rnt.get_matches(src_descriptors=base_descriptors, ref_descriptors=temp_crop_descriptors, threshold=0.6)
temp_data = {'image': temp_crop_img,
'num_matched': len(good_flann_matches),
'total_keypoints' : len(base_keypoints),
}
data.append(temp_data)
filter_data = [{'num_matched' : i['num_matched'], 'image': i['image']} for i in data if i['num_matched'] > 25]
for i in range(len(filter_data)):
temp_num_match = filter_data[i]['num_matched']
plt.figure()
plt.title(f'num matched: {temp_num_match}')
plt.imshow(filter_data[i]['image'])
First, you could detect every item on the shelf with a network like this; it is pre-trained in this exact context and works pretty well. You should also rectify the image before feeding it to the network. You will obtain bounding boxes for every product (maybe with some false positives/negatives, but that's another issue). Then you can match each box against the template using SIFT and compute a score (it's up to you to define which score works best), but I suggest another approach, such as a Siamese network, if you have a consistent dataset.
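If it helps, here is a rough sketch of the SIFT scoring step, assuming the detector already gave you boxes as (x, y, w, h) tuples over the rectified shelf image; the names shelf, template and boxes, the ratio threshold and the 10% acceptance cut-off are all placeholders:
import cv2

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template, None)   # template: single product image
bf = cv2.BFMatcher()

scores = []
for (x, y, w, h) in boxes:                            # boxes: detector output
    crop = shelf[y:y + h, x:x + w]
    kp_c, des_c = sift.detectAndCompute(crop, None)
    if des_c is None or len(kp_c) < 2:
        scores.append(0.0)
        continue
    matches = bf.knnMatch(des_t, des_c, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    scores.append(len(good) / max(len(kp_t), 1))      # fraction of template keypoints matched

accepted = [b for b, s in zip(boxes, scores) if s > 0.10]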
I am stitching multiple images. While stitching two images, a dashed black line appears along the seam, as shown below.
Does anyone know how I can remove or get rid of this black dashed line?
Here is the main part of the stitching code, which stitches two images and then calls the next image with the result of the previously stitched images until all images are processed:
detector = cv2.xfeatures2d.SURF_create(400)
gray1 = cv2.cvtColor(image1,cv2.COLOR_BGR2GRAY)
ret1, mask1 = cv2.threshold(gray1,1,255,cv2.THRESH_BINARY)
kp1, descriptors1 = detector.detectAndCompute(gray1,mask1)
gray2 = cv2.cvtColor(image2,cv2.COLOR_BGR2GRAY)
ret2, mask2 = cv2.threshold(gray2,1,255,cv2.THRESH_BINARY)
kp2, descriptors2 = detector.detectAndCompute(gray2,mask2)
keypoints1Im = cv2.drawKeypoints(image1, kp1, outImage = cv2.DRAW_MATCHES_FLAGS_DEFAULT, color=(0,0,255))
util.display("KEYPOINTS",keypoints1Im)
keypoints2Im = cv2.drawKeypoints(image2, kp2, outImage = cv2.DRAW_MATCHES_FLAGS_DEFAULT, color=(0,0,255))
util.display("KEYPOINTS",keypoints2Im)
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(descriptors2,descriptors1, k=2)
good = []
for m, n in matches:
if m.distance < 0.55 * n.distance:
good.append(m)
print (str(len(good)) + " Matches were Found")
if len(good) <= 10:
return image1
matches = copy.copy(good)
matchDrawing = util.drawMatches(gray2,kp2,gray1,kp1,matches)
util.display("matches",matchDrawing)
src_pts = np.float32([ kp2[m.queryIdx].pt for m in matches ]).reshape(-1,1,2)
dst_pts = np.float32([ kp1[m.trainIdx].pt for m in matches ]).reshape(-1,1,2)
A = cv2.estimateRigidTransform(src_pts,dst_pts,fullAffine=False)
if A is None:
HomogResult = cv2.findHomography(src_pts,dst_pts,method=cv2.RANSAC)
H = HomogResult[0]
height1,width1 = image1.shape[:2]
height2,width2 = image2.shape[:2]
corners1 = np.float32(([0,0],[0,height1],[width1,height1],[width1,0]))
corners2 = np.float32(([0,0],[0,height2],[width2,height2],[width2,0]))
warpedCorners2 = np.zeros((4,2))
for i in range(0,4):
cornerX = corners2[i,0]
cornerY = corners2[i,1]
if A is not None: #check if we're working with affine transform or perspective transform
warpedCorners2[i,0] = A[0,0]*cornerX + A[0,1]*cornerY + A[0,2]
warpedCorners2[i,1] = A[1,0]*cornerX + A[1,1]*cornerY + A[1,2]
else:
warpedCorners2[i,0] = (H[0,0]*cornerX + H[0,1]*cornerY + H[0,2])/(H[2,0]*cornerX + H[2,1]*cornerY + H[2,2])
warpedCorners2[i,1] = (H[1,0]*cornerX + H[1,1]*cornerY + H[1,2])/(H[2,0]*cornerX + H[2,1]*cornerY + H[2,2])
allCorners = np.concatenate((corners1, warpedCorners2), axis=0)
[xMin, yMin] = np.int32(allCorners.min(axis=0).ravel() - 0.5)
[xMax, yMax] = np.int32(allCorners.max(axis=0).ravel() + 0.5)
translation = np.float32(([1,0,-1*xMin],[0,1,-1*yMin],[0,0,1]))
warpedResImg = cv2.warpPerspective(image1, translation, (xMax-xMin, yMax-yMin))
if A is None:
fullTransformation = np.dot(translation,H) #again, images must be translated to be 100% visible in new canvas
warpedImage2 = cv2.warpPerspective(image2, fullTransformation, (xMax-xMin, yMax-yMin))
else:
warpedImageTemp = cv2.warpPerspective(image2, translation, (xMax-xMin, yMax-yMin))
warpedImage2 = cv2.warpAffine(warpedImageTemp, A, (xMax-xMin, yMax-yMin))
result = np.where(warpedImage2 != 0, warpedImage2, warpedResImg)
Please help me out. Thanks.
Edit:
Input image1 (resized)
Input image2 (resized)
Result (resized)
Update:
Result after @fmw42's answer:
The problem arises because, when you do the warping, the border pixels of the image get resampled/interpolated with the black background pixels. This leaves a border of varying non-zero values around your warped image, which shows up as the dashed dark line when merged with the other image, since your merge test is binary (tested with != 0).
So one simple thing you can do is mask the warped image in Python/OpenCV to get its bounds against the black background and then erode the mask. Then use the mask to erode the image boundary. This can be achieved by the following changes to your last lines of code:
if A is None:
fullTransformation = np.dot(translation,H) #again, images must be translated to be 100% visible in new canvas
warpedImage2 = cv2.warpPerspective(image2, fullTransformation, (xMax-xMin, yMax-yMin))
else:
warpedImageTemp = cv2.warpPerspective(image2, translation, (xMax-xMin, yMax-yMin))
warpedImage2 = cv2.warpAffine(warpedImageTemp, A, (xMax-xMin, yMax-yMin))
mask2 = cv2.threshold(warpedImage2, 0, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
mask2 = cv2.morphologyEx(mask2, cv2.MORPH_ERODE, kernel)
warpedImage2[mask2==0] = 0
result = np.where(warpedImage2 != 0, warpedImage2, warpedResImg)
I simply added the following code lines to your code:
mask2 = cv2.threshold(warpedImage2, 0, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
mask2 = cv2.morphologyEx(mask2, cv2.MORPH_ERODE, kernel)
warpedImage2[mask2==0] = 0
You can increase the kernel size if desired to erode more.
Here are the before and after. Note that I did not have SURF and tried to use ORB, which did not align well, so your roads do not align. But the mismatch due to misalignment emphasizes the issue, as it shows the dashed jagged black border line. The fact that ORB did not work, or that I did not have the proper code from above to make it align, is not important. The masking does what I think you want and is extendable to the processing of all your images.
The other thing that can be done, in combination with the above, is to feather the mask and then ramp-blend the two images using it. This is done by blurring the mask (a bit more) and then stretching the values over the inside half of the blurred border, so that the ramp lies only on the outside half of the blurred border. Then blend the two images with the ramped mask and its inverse, as follows, using the same code as above.
if A is None:
fullTransformation = np.dot(translation,H) #again, images must be translated to be 100% visible in new canvas
warpedImage2 = cv2.warpPerspective(image2, fullTransformation, (xMax-xMin, yMax-yMin))
else:
warpedImageTemp = cv2.warpPerspective(image2, translation, (xMax-xMin, yMax-yMin))
warpedImage2 = cv2.warpAffine(warpedImageTemp, A, (xMax-xMin, yMax-yMin))
mask2 = cv2.threshold(warpedImage2, 0, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
mask2 = cv2.morphologyEx(mask2, cv2.MORPH_ERODE, kernel)
warpedImage2[mask2==0] = 0
mask2 = cv2.blur(mask2, (5,5))
mask2 = skimage.exposure.rescale_intensity(mask2, in_range=(127.5,255), out_range=(0,255)).astype(np.float64)
result = (warpedImage2 * mask2 + warpedResImg * (255 - mask2))/255
result = result.clip(0,255).astype(np.uint8)
cv2.imwrite("image1_image2_merged3.png", result)
The result when compared to the original composite is as follows:
ADDITION
I have corrected my ORB code to reverse the use of images and now it aligns. So here are all 3 techniques: the original, the one that only uses a binary mask and the one that uses a ramped mask for blending (all as described above).
ADDITION2
Here are the 3 requested images: original, binary masked, ramped mask blending.
Here is my ORB code for the last version above
I tried to change as little as possible from your code, except I had to use ORB and I had to swap the names image1 and image2 near the end.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import itertools
from scipy.interpolate import UnivariateSpline
from skimage.exposure import rescale_intensity
image1 = cv2.imread("image1.jpg")
image2 = cv2.imread("image2.jpg")
gray1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
# Detect ORB features and compute descriptors.
MAX_FEATURES = 500
GOOD_MATCH_PERCENT = 0.15
orb = cv2.ORB_create(MAX_FEATURES)
keypoints1, descriptors1 = orb.detectAndCompute(gray1, None)
keypoints2, descriptors2 = orb.detectAndCompute(gray2, None)
# Match features.
matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
matches = matcher.match(descriptors1, descriptors2, None)
# Sort matches by score
matches.sort(key=lambda x: x.distance, reverse=False)
# Remove not so good matches
numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
matches = matches[:numGoodMatches]
# Draw top matches
imMatches = cv2.drawMatches(image1, keypoints1, image2, keypoints2, matches, None)
cv2.imwrite("/Users/fred/desktop/image1_image2_matches.png", imMatches)
# Extract location of good matches
points1 = np.zeros((len(matches), 2), dtype=np.float32)
points2 = np.zeros((len(matches), 2), dtype=np.float32)
for i, match in enumerate(matches):
points1[i, :] = keypoints1[match.queryIdx].pt
points2[i, :] = keypoints2[match.trainIdx].pt
print(points1)
print("")
print(points2)
A = cv2.estimateRigidTransform(points1,points2,fullAffine=False)
#print(A)
if A is None:
HomogResult = cv2.findHomography(points1,points2,method=cv2.RANSAC)
H = HomogResult[0]
height1,width1 = image1.shape[:2]
height2,width2 = image2.shape[:2]
corners1 = np.float32(([0,0],[0,height1],[width1,height1],[width1,0]))
corners2 = np.float32(([0,0],[0,height2],[width2,height2],[width2,0]))
warpedCorners2 = np.zeros((4,2))
# project corners2 into domain of image1 from A affine or H homography
for i in range(0,4):
cornerX = corners2[i,0]
cornerY = corners2[i,1]
if A is not None: #check if we're working with affine transform or perspective transform
warpedCorners2[i,0] = A[0,0]*cornerX + A[0,1]*cornerY + A[0,2]
warpedCorners2[i,1] = A[1,0]*cornerX + A[1,1]*cornerY + A[1,2]
else:
warpedCorners2[i,0] = (H[0,0]*cornerX + H[0,1]*cornerY + H[0,2])/(H[2,0]*cornerX + H[2,1]*cornerY + H[2,2])
warpedCorners2[i,1] = (H[1,0]*cornerX + H[1,1]*cornerY + H[1,2])/(H[2,0]*cornerX + H[2,1]*cornerY + H[2,2])
allCorners = np.concatenate((corners1, warpedCorners2), axis=0)
[xMin, yMin] = np.int32(allCorners.min(axis=0).ravel() - 0.5)
[xMax, yMax] = np.int32(allCorners.max(axis=0).ravel() + 0.5)
translation = np.float32(([1,0,-1*xMin],[0,1,-1*yMin],[0,0,1]))
warpedResImg = cv2.warpPerspective(image2, translation, (xMax-xMin, yMax-yMin))
if A is None:
fullTransformation = np.dot(translation,H) #again, images must be translated to be 100% visible in new canvas
warpedImage2 = cv2.warpPerspective(image2, fullTransformation, (xMax-xMin, yMax-yMin))
else:
warpedImageTemp = cv2.warpPerspective(image1, translation, (xMax-xMin, yMax-yMin))
warpedImage2 = cv2.warpAffine(warpedImageTemp, A, (xMax-xMin, yMax-yMin))
mask2 = cv2.threshold(warpedImage2, 0, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
mask2 = cv2.morphologyEx(mask2, cv2.MORPH_ERODE, kernel)
warpedImage2[mask2==0] = 0
mask2 = cv2.blur(mask2, (5,5))
mask2 = rescale_intensity(mask2, in_range=(127.5,255), out_range=(0,255)).astype(np.float64)
result = (warpedImage2 * mask2 + warpedResImg * (255 - mask2))/255
result = result.clip(0,255).astype(np.uint8)
cv2.imwrite("image1_image2_merged2.png", result)
You had the following. Note where the names image1 and image2 are used compared to my code above.
warpedResImg = cv2.warpPerspective(image1, translation, (xMax-xMin, yMax-yMin))
if A is None:
fullTransformation = np.dot(translation,H) #again, images must be translated to be 100% visible in new canvas
warpedImage2 = cv2.warpPerspective(image2, fullTransformation, (xMax-xMin, yMax-yMin))
else:
warpedImageTemp = cv2.warpPerspective(image2, translation, (xMax-xMin, yMax-yMin))
warpedImage2 = cv2.warpAffine(warpedImageTemp, A, (xMax-xMin, yMax-yMin))
Horizontal Gluing
I will focus on one of the cuts as a proof of concept. I agree with the comments that your code is a bit lengthy and hard to work with. So step one is to glue the pictures myself.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import itertools
from scipy.interpolate import UnivariateSpline
upper_image = cv2.cvtColor(cv2.imread('yQv6W.jpg'), cv2.COLOR_BGR2RGB)/255
lower_image = cv2.cvtColor(cv2.imread('zoWJv.jpg'), cv2.COLOR_BGR2RGB)/255
result_image = np.zeros((466+139,700+22,3))
result_image[139:139+lower_image.shape[0],:lower_image.shape[1]] = lower_image
result_image[0:upper_image.shape[0], 22:22+upper_image.shape[1]] = upper_image
plt.imshow(result_image)
OK, no dashed black line, but I admit it is not perfect either. So the next step is to align at least the street and the little way at the very right of the picture. For that I will need to shrink the picture to a non-integer size and turn that back into a grid. I will use a kNN-like method for that.
Edit: As requested in the comments, I'll explain the shrinking in a bit more detail, since it would have to be done by hand again for another stitching. The magic happens in this line (I replaced n by its value):
f = UnivariateSpline([0,290,510,685],[0,310,530,700])
I first tried to scale the lower picture in the x-direction to make the little way on the very right fit the upper image. Unfortunately, then the street wouldn't fit the street. So what I do instead is scale down according to the above function: at pixel 0 I still want pixel zero, at 290 I want what used to be at 310, and so on.
Notice that 290, 510 and 310, 530 are the new and old x-coordinates, respectively, of the street and the way at the height of the gluing.
class Image_knn():
def fit(self, image):
self.image = image.astype('float')
def predict(self, x, y):
image = self.image
weights_x = [(1-(x % 1)).reshape(*x.shape,1), (x % 1).reshape(*x.shape,1)]
weights_y = [(1-(y % 1)).reshape(*x.shape,1), (y % 1).reshape(*x.shape,1)]
start_x = np.floor(x).astype('int')
start_y = np.floor(y).astype('int')
return sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')] * weights_x[x]*weights_y[y]
for x,y in itertools.product(range(2),range(2))])
image_model = Image_knn()
image_model.fit(lower_image)
n = 685
f = UnivariateSpline([0,290,510,n],[0,310,530,700])
np.linspace(0,lower_image.shape[1],n)
yspace = f(np.arange(n))
result_image = np.zeros((466+139,700+22, 3))
a,b = np.meshgrid(np.arange(0,lower_image.shape[0]), yspace)
result_image[139:139+lower_image.shape[0],:n] = np.transpose(image_model.predict(a,b), [1,0,2])
result_image[0:upper_image.shape[0], 22:22+upper_image.shape[1]] = upper_image
plt.imshow(result_image, 'gray')
Much better: no black line, but maybe we can still smooth the cut a bit. I figured that taking convex combinations of the upper and lower images at the cut would look much better.
result_image = np.zeros((466+139,700+22,3))
a,b = np.meshgrid(np.arange(0,lower_image.shape[0]), yspace)
result_image[139:139+lower_image.shape[0],:n] = np.transpose(image_model.predict(a,b), [1,0,2])
transition_range = 10
result_image[0:upper_image.shape[0]-transition_range, 22:22+upper_image.shape[1]] = upper_image[:-transition_range,:]
transition_pixcels = upper_image[-transition_range:,:]*np.linspace(1,0,transition_range).reshape(-1,1,1)
result_image[upper_image.shape[0]-transition_range:upper_image.shape[0], 22:22+upper_image.shape[1]] *= np.linspace(0,1,transition_range).reshape(-1,1,1)
result_image[upper_image.shape[0]-transition_range:upper_image.shape[0], 22:22+upper_image.shape[1]] += transition_pixcels
plt.imshow(result_image)
plt.savefig('text.jpg')
Tilted Gluing
For completeness, here is also a version gluing at the top with a tilted bottom. I attach the pictures at a point and rotate around that fixed point by a few degrees. Again, I correct some very slight misalignments at the end. To get the coordinates for that I am using JupyterLab and %matplotlib widget.
fixed_point_upper = np.array([139,379])
fixed_point_lower = np.array([0,400])
angle = np.deg2rad(2)
down_dir = np.array([np.sin(angle+np.pi/2),np.cos(angle+np.pi/2)])
right_dir = np.array([np.sin(angle),np.cos(angle)])
result_image_height = np.ceil((fixed_point_upper+lower_image.shape[0]*down_dir+(lower_image.shape[1]-fixed_point_lower[1])*right_dir)[0]).astype('int')
right_shift = np.ceil(-(fixed_point_upper+lower_image.shape[0]*down_dir-fixed_point_lower[1]*right_dir)[1]).astype('int')
result_image_width = right_shift+upper_image.shape[1]
result_image = np.zeros([result_image_height, result_image_width,3])
fixed_point_result = np.array([fixed_point_upper[0],fixed_point_upper[1]+right_shift])
lower_top_left = fixed_point_result-fixed_point_lower[1]*right_dir
result_image[:upper_image.shape[0],-upper_image.shape[1]:] = upper_image
# calculate points in lower_image
result_coordinates = np.stack(np.where(np.ones(result_image.shape[:2],dtype='bool')),axis=1)
lower_coordinates = np.stack([(result_coordinates-lower_top_left)@down_dir,(result_coordinates-lower_top_left)@right_dir],axis=1)
mask = (0 <= lower_coordinates[:,0]) & (0 <= lower_coordinates[:,1]) \
& (lower_coordinates[:,0] <= lower_image.shape[0]) & (lower_coordinates[:,1] <= lower_image.shape[1])
result_coordinates = result_coordinates[mask]
lower_coordinates = lower_coordinates[mask]
# COORDINATES ON RESULT IMAGE
# left street 254
# left sides of houses 295, 420, 505
# right small street, both sides big street 590,635,664
# COORDINATES ON LOWER IMAGE
# left street 234
# left sides of houses 280, 399, 486
# right small street, both sides big street 571, 617, 642
def coord_transform(y):
return (y-lower_top_left[1])/right_dir[1]
y = tuple(map(coord_transform, [lower_top_left[1], 254, 295, 420, 505, 589, 635, 664]))
f = UnivariateSpline(y,[0, 234, 280, 399, 486, 571, 617, 642])
result_image[result_coordinates[:,0],result_coordinates[:,1]] = image_model.predict(lower_coordinates[:,0],np.vectorize(f)(lower_coordinates[:,1]))
I want to stitch two panoramic images using a homography matrix in OpenCV. I found the 3x3 homography matrix,
but I can't stitch the two images. I must stitch the two images by hand (no built-in function).
Here is my code:
import cv2
import numpy as np
MIN_MATCH_COUNT = 10
img1 = cv2.imread("pano1/cyl_image00.png")
img2 = cv2.imread("pano1/cyl_image01.png")
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
des1 = np.float32(des1)
des2 = np.float32(des2)
matches = flann.knnMatch(des1, des2, k=2)
goodMatches = []
for m, n in matches:
if m.distance < 0.7 * n.distance:
goodMatches.append(m)
src_pts = 0
dst_pts = 0
if len(goodMatches) > MIN_MATCH_COUNT:
dst_pts = np.float32([kp1[m.queryIdx].pt for m in goodMatches]).reshape(-1, 2)
src_pts = np.float32([kp2[m.trainIdx].pt for m in goodMatches]).reshape(-1, 2)
def generateRandom(src_Pts, dest_Pts, N):
r = np.random.choice(len(src_Pts), N)
src = [src_Pts[i] for i in r]
dest = [dest_Pts[i] for i in r]
return np.asarray(src, dtype=np.float32), np.asarray(dest, dtype=np.float32)
def findH(src, dest, N):
A = []
for i in range(N):
x, y = src[i][0], src[i][1]
xp, yp = dest[i][0], dest[i][1]
A.append([x, y, 1, 0, 0, 0, -x * xp, -xp * y, -xp])
A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
A = np.asarray(A)
U, S, Vh = np.linalg.svd(A)
L = Vh[-1, :] / Vh[-1, -1]
H = L.reshape(3, 3)
return H
def ransacHomography(src_Pts, dst_Pts):
maxI = 0
maxLSrc = []
maxLDest = []
for i in range(70):
srcP, destP = generateRandom(src_Pts, dst_Pts, 4)
H = findH(srcP, destP, 4)
inlines = 0
linesSrc = []
lineDest = []
for p1, p2 in zip(src_Pts, dst_Pts):
p1U = (np.append(p1, 1)).reshape(3, 1)
p2e = H.dot(p1U)
p2e = (p2e / p2e[2])[:2].reshape(1, 2)[0]
if cv2.norm(p2 - p2e) < 10:
inlines += 1
linesSrc.append(p1)
lineDest.append(p2)
if inlines > maxI:
maxI = inlines
maxLSrc = linesSrc.copy()
maxLSrc = np.asarray(maxLSrc, dtype=np.float32)
maxLDest = lineDest.copy()
maxLDest = np.asarray(maxLDest, dtype=np.float32)
Hf = findH(maxLSrc, maxLDest, maxI)
return Hf
H = ransacHomography(src_pts, dst_pts)
So far, so good. I found the homography matrix (H).
Next, I tried to stitch the two panoramic images.
First, I create a big array to stitch the images into (img3).
I copied img1 into the first half of img3.
I tried to find the new coordinates of img2 through the homography matrix and copied the new img2 coordinates into img3.
Here is my code:
height1, width1, rgb1 = img1.shape
height2, width2, rgb2 = img2.shape
img3 = np.empty((height1, width1+width2, 3))
img3[:, 0:width1] = img1/255.0
for i in range(len(img2)):
for j in range(len(img2[0])):
pp = H.dot(np.array([[i], [j], [1]]))
pp = (pp / pp[2]).reshape(1, 3)[0]
img3[int(round(pp[0])), int(round(pp[1]))] = img2[i, j]/255.0
But this part is not working.
How can I solve this problem?
Once you have the Homography matrix you need to transform one of the images to have the same perspective as the other. This is done using the warpPerspective function in OpenCV. Once you've done the transformation, it's time to concatenate the images.
Let's say you want to transform img_1 into the perspective of img_2 and that you already have the Homography matrix H
dst = cv2.warpPerspective(img_1, H, ((img_1.shape[1] + img_2.shape[1]), img_2.shape[0])) # warped image
# now paste them together
dst[0:img_2.shape[0], 0:img_2.shape[1]] = img_2
dst[0:img_1.shape[0], 0:img_1.shape[1]] = img_1
Also note that OpenCV already has a built-in RANSAC homography finder:
H, masked = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
So it can save you a lot of code.
Check out these tutorials for more details:
https://medium.com/#navekshasood/image-stitching-to-create-a-panorama-5e030ecc8f7
https://medium.com/analytics-vidhya/image-stitching-with-opencv-and-python-1ebd9e0a6d78
My goal is to
deskew a scanned image such that its text is perfectly placed on top of the text of the original image. (subtracting the images would remove the text)
prevent any loss of information on the deskewed image
I use SURF features to feed the findHomography function. Then I use the warpPerspective function to transform the scanned image. The resulting image almost perfectly fits onto the original image.
However, the scanned image has content on its corners which is lost after the transformation because the text in the scanned image is smaller and has to be scaled up.
Deskewing an image that has slightly smaller text
Information at the borders of the image is cropped
To avoid any loss of information, I convert the image to RGBA and set the borderValue parameter in warpPerspective so that any added background is transparent. After the transformation I remove the transparent pixels again. This procedure works but seems highly inefficient.
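For reference, a minimal sketch of that (admittedly inefficient) procedure, assuming M is the homography returned by findHomography (mapping the original to the scanned image) and an over-sized 5000x5000 canvas:
import cv2
import numpy as np

skewed_rgba = cv2.cvtColor(imgSkewed, cv2.COLOR_BGR2BGRA)       # imgSkewed: scanned image
warped = cv2.warpPerspective(skewed_rgba, M, (5000, 5000),
                             flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP,
                             borderMode=cv2.BORDER_CONSTANT,
                             borderValue=(0, 0, 0, 0))          # new pixels are transparent
ys, xs = np.where(warped[:, :, 3] > 0)                          # keep only real image content
cropped = warped[ys.min():ys.max() + 1, xs.min():xs.max() + 1]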
Question: I'm looking for a working code example (C++ or Python) that shows how to do this more efficiently.
Image has been deskewed and content is preserved. However, the text of the two pictures isn't on top of each other anymore
Text position is off because the warped image has a different size than what warpPerspective expected
After transforming the image the problem is that the two images aren't aligned anymore because the dimensions of the transformed image are different than what the warpPerspective method expected.
Question: How can I realign the two images? It would be great if there was a way to do incorporate this into the previous step already. Again, a working code example would be very helpful.
Here's the code that I have so far. It deskews the image while preserving its content, however, the text is not on top of the original text anymore.
import math
import cv2
import numpy as np
class Deskewer:
def __init__(self, hessianThreshold = 5000):
self.__hessianThresh = hessianThreshold
self.imgOrigGray, self.imgSkewed, self.imgSkewedGray = None, None, None
def start(self, imgOrig, imgSkewed):
self.imgOrigGray = cv2.cvtColor(imgOrig, cv2.COLOR_BGR2GRAY)
self.imgSkewed = imgSkewed # final transformation will be performed on color image
self.imgSkewedGray = cv2.cvtColor(imgSkewed, cv2.COLOR_BGR2GRAY) # prior calculation is faster on gray
kp1, des1, kp2, des2 = self.__detectFeatures()
goodMatches = self.__flannMatch(des1, des2)
MIN_MATCH_COUNT = 10
M = None
if len(goodMatches) > MIN_MATCH_COUNT:
M, _ = self.__findHomography(kp1, kp2, goodMatches)
else:
print("Not enough matches are found - %d/%d" % (len(goodMatches), MIN_MATCH_COUNT))
return
return self.__deskew(M)
def __detectFeatures(self):
surf = cv2.xfeatures2d.SURF_create(self.__hessianThresh)
kp1, des1 = surf.detectAndCompute(self.imgOrigGray, None)
kp2, des2 = surf.detectAndCompute(self.imgSkewedGray, None)
return kp1, des1, kp2, des2
def __flannMatch(self, des1, des2):
global matches
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)
# store all the good matches as per Lowe's ratio test.
good = []
for m, n in matches:
if m.distance < 0.7 * n.distance:
good.append(m)
return good
def __findHomography(self, kp1, kp2, goodMatches):
src_pts = np.float32([kp1[m.queryIdx].pt for m in goodMatches
]).reshape(-1, 1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in goodMatches
]).reshape(-1, 1, 2)
M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
matchesMask = mask.ravel().tolist()
i = matchesMask.index(1)
# TODO: This is a matching point before the warpPerspective call. How can I calculate this point AFTER the call?
print("POINTS: object(", src_pts[i][0][1], ",", src_pts[i][0][0], ") - scene(", dst_pts[i][0][1], ",", dst_pts[i][0][0], ")")
return M, mask
def getComponents(self, M):
# ((translationx, translationy), rotation, (scalex, scaley), shear)
a = M[0, 0]
b = M[0, 1]
c = M[0, 2]
d = M[1, 0]
e = M[1, 1]
f = M[1, 2]
p = math.sqrt(a * a + b * b)
r = (a * e - b * d) / (p)
q = (a * d + b * e) / (a * e - b * d)
translation = (c, f)
scale = (p, r) # p = x-Axis, r = y-Axis
shear = q
theta = math.atan2(b, a)
degrees = math.atan2(b, a) * 180 / math.pi
return (translation, theta, degrees, scale, shear)
def __deskew(self, M):
# this info might come in handy here for calculating the dsize of warpPerspective?
translation, theta, degrees, scale, shear = self.getComponents(M)
# Alpha channel allows me to set unique feature to pixels that are created during warpPerspective
imSkewedAlpha = cv2.cvtColor(self.imgSkewed, cv2.COLOR_BGR2BGRA)
# These sizes have been chosen arbitrarily to make sure that all the contents fit in the new canvas
height = 5000
width = 5000
shift = -500
M2 = np.array([[1, 0, shift],
[0, 1, shift],
[0, 0, 1]])
M3 = np.dot(M, M2)
# TODO: How can I calculate the dsize argument?
# Newly created pixels are set to transparent
im_out = cv2.warpPerspective(imSkewedAlpha, M3,
(height, width), flags=cv2.WARP_INVERSE_MAP, borderMode=cv2.BORDER_CONSTANT, borderValue=(255, 0, 0, 0))
# http://codereview.stackexchange.com/a/132933
# Mask of non-black pixels (assuming image has a single channel).
mask = im_out[:, :, 3] == 255
# Coordinates of non-black pixels.
coords = np.argwhere(mask)
# Bounding box of non-black pixels.
x0, y0 = coords.min(axis=0)
x1, y1 = coords.max(axis=0) + 1 # slices are exclusive at the top
# Get the contents of the bounding box.
cropped = im_out[x0:x1, y0:y1]
# TODO: The warped image needs to align nicely on the original image
return cropped
origImg = cv2.imread("Letter.png")
skewedImg = cv2.imread("A4.png")
deskewed = Deskewer().start(origImg, skewedImg)
cv2.imshow("Original", origImg)
cv2.imshow("Deskewed", deskewed)
cv2.waitKey(0)
Original and skewed image (with additional content) for testing
Problem statement: An image A is projected through a projector, goes through a microscope, and the projected image is captured via a camera through the same microscope as image B. Due to the optical elements, B is rotated, sheared and distorted with respect to A. Now, I need to transform A into A' before projection such that B is as close to A as possible.
Initial approach: I took a checkerboard pattern, rotated it at various angles (36, 72, 108, ... 324 degrees) and projected it to get a series of A images and B images. I used OpenCV's CalibrateCamera2, InitUndistortMap and Remap functions to convert B into B'. But B' is nowhere near A and rather similar to B (in particular, a significant amount of rotation and shearing is not being corrected).
The code (in Python) is below. I am not sure if I am doing something stupid. Any ideas for the correct approach?
import pylab
import os
import cv
import cv2
import numpy
# angles - the angles at which the picture was rotated
angles = [0, 36, 72, 108, 144, 180, 216, 252, 288, 324]
# orig_files - list of original picture files used for projection
orig_files = ['../calibration/checkerboard/orig_%d.png' % (angle) for angle in angles]
# img_files - projected image captured by camera
img_files = ['../calibration/checkerboard/imag_%d.bmp' % (angle) for angle in angles]
# Load the images
images = [cv.LoadImage(filename) for filename in img_files]
orig_images = [cv.LoadImage(filename) for filename in orig_files]
# Convert to grayscale
gray_images = [cv.CreateImage((src.height, src.width), cv.IPL_DEPTH_8U, 1) for src in images]
for ii in range(len(images)):
cv.CvtColor(images[ii], gray_images[ii], cv.CV_RGB2GRAY)
gray_orig = [cv.CreateImage((src.height, src.width), cv.IPL_DEPTH_8U, 1) for src in orig_images]
for ii in range(len(orig_images)):
cv.CvtColor(orig_images[ii], gray_orig[ii], cv.CV_RGB2GRAY)
# The number of ranks and files in the chessboard. OpenCV considers
# the height and width of the chessboard to be one less than these,
# respectively.
rank_count = 11
file_count = 10
# Try to detect the corners of the chessboard. For each image,
# FindChessboardCorners returns (found, corner_points). found is True
# even if it managed to detect only a subset of the actual corners.
img_corners = [cv.FindChessboardCorners(img, (rank_count-1, file_count-1)) for img in gray_images]
orig_corners = [cv.FindChessboardCorners(img, (rank_count-1,file_count-1)) for img in gray_orig]
# The total number of corners will be (rank_count-1)x(file_count-1),
# but if some parts of the image are too blurred/distorted,
# FindChessboardCorners detects only a subset of the corners. In that
# case, DrawChessboardCorners will raise a TypeError.
orig_corner_success = []
ii = 0
for (found, corners) in orig_corners:
if found and (len(corners) == (rank_count - 1) * (file_count - 1)):
orig_corner_success.append(ii)
else:
print orig_files[ii], ': could not find correct corners: ', len(corners)
ii += 1
ii = 0
img_corner_success = []
for (found, corners) in img_corners:
if found and (len(corners) == (rank_count-1) * (file_count-1)) and (ii in orig_corner_success):
img_corner_success.append(ii)
else:
print img_files[ii], ': Number of corners detected is wrong:', len(corners)
ii += 1
# Here we compile all the corner coordinates into single arrays
image_points = []
obj_points = []
for ii in img_corner_success:
obj_points.extend(orig_corners[ii][1])
image_points.extend(img_corners[ii][1])
image_points = cv.fromarray(numpy.array(image_points, dtype='float32'))
obj_points = numpy.hstack((numpy.array(obj_points, dtype='float32'), numpy.zeros((len(obj_points), 1), dtype='float32')))
obj_points = cv.fromarray(numpy.array(obj_points, order='C'))
point_counts = numpy.ones((len(img_corner_success), 1), dtype='int32') * ((rank_count-1) * (file_count-1))
point_counts = cv.fromarray(point_counts)
# Create the output parameters
cam_mat = cv.CreateMat(3, 3, cv.CV_32FC1)
cv.Set2D(cam_mat, 0, 0, 1.0)
cv.Set2D(cam_mat, 1, 1, 1.0)
dist_mat = cv.CreateMat(5, 1, cv.CV_32FC1)
rot_vecs = cv.CreateMat(len(img_corner_success), 3, cv.CV_32FC1)
tran_vecs = cv.CreateMat(len(img_corner_success), 3, cv.CV_32FC1)
# Do the camera calibration
x = cv.CalibrateCamera2(obj_points, image_points, point_counts, cv.GetSize(gray_images[0]), cam_mat, dist_mat, rot_vecs, tran_vecs)
# Create the undistortion map
xmap = cv.CreateImage(cv.GetSize(images[0]), cv.IPL_DEPTH_32F, 1)
ymap = cv.CreateImage(cv.GetSize(images[0]), cv.IPL_DEPTH_32F, 1)
cv.InitUndistortMap(cam_mat, dist_mat, xmap, ymap)
# Now undistort all the images and save them
ii = 0
for tmp in images:
print img_files[ii]
image = cv.GetImage(tmp)
t = cv.CloneImage(image)
cv.Remap(t, image, xmap, ymap, cv.CV_INTER_LINEAR + cv.CV_WARP_FILL_OUTLIERS, cv.ScalarAll(0))
corrected_file = os.path.join(os.path.dirname(img_files[ii]), 'corrected_%s' % (os.path.basename(img_files[ii])))
cv.SaveImage(corrected_file, image)
print 'Saved corrected image to', corrected_file
ii += 1
Here are the images: A, B and B'. Actually, I don't think the Remap is really doing anything!
I got it resolved finally. There were several issues:
The original images were not all of the same size, nor were the captured images. Hence, the affine transform from one pair was not applicable to another. I resized them all to the same size.
The Undistort step after camera calibration is not sufficient for rotation and shear. The appropriate thing to do is an affine transform. And it is better to take three corners of the chessboard as the points for computing the transformation matrix (less relative error).
Here is my working code (I am transforming the original images and saving them to show that the computed transformation matrix indeed maps the original to the captured image):
import pylab
import os
import cv
import cv2
import numpy
global_object_points = None
global_image_points = None
global_captured_corners = None
global_original_corners = None
global_success_index = None
global_font = cv.InitFont(cv.CV_FONT_HERSHEY_PLAIN, 1.0, 1.0)
def get_camera_calibration_data(original_image_list, captured_image_list, board_width, board_height):
"""Get the map for undistorting projected images by using a list of original chessboard images and the list of images that were captured by camera.
original_image_list - list containing the original images (loaded as OpenCV image).
captured_image_list - list containing the captured images.
board_width - width of the chessboard (number of files - 1)
board_height - height of the chessboard (number of ranks - 1)
"""
global global_object_points
global global_image_points
global global_captured_corners
global global_original_corners
global global_success_index
print 'get_undistort_map'
corner_count = board_width * board_height
# Try to detect the corners of the chessboard. For each image,
# FindChessboardCorners returns (found, corner_points). found is
# True even if it managed to detect only a subset of the actual
# corners. NOTE: according to
# http://opencv.willowgarage.com/wiki/documentation/cpp/calib3d/findChessboardCorners,
# no need for FindCornerSubPix after FindChessBoardCorners
captured_corners = [cv.FindChessboardCorners(img, (board_width, board_height)) for img in captured_image_list]
original_corners = [cv.FindChessboardCorners(img, (board_width, board_height)) for img in original_image_list]
success_captured = [index for index in range(len(captured_image_list))
if captured_corners[index][0] and len(captured_corners[index][1]) == corner_count]
success_original = [index for index in range(len(original_image_list))
if original_corners[index][0] and len(original_corners[index][1]) == corner_count]
success_index = [index for index in success_captured if (len(captured_corners[index][1]) == corner_count) and (index in success_original)]
global_success_index = success_index
print global_success_index
print 'Successfully found corners in image #s.', success_index
cv.NamedWindow('Image', cv.CV_WINDOW_AUTOSIZE)
for index in success_index:
copy = cv.CloneImage(original_image_list[index])
cv.DrawChessboardCorners(copy, (board_width, board_height), original_corners[index][1], corner_count)
cv.ShowImage('Image', copy)
a = cv.WaitKey(0)
copy = cv.CloneImage(captured_image_list[index])
cv.DrawChessboardCorners(copy, (board_width, board_height), captured_corners[index][1], corner_count)
cv.ShowImage('Image', copy)
a = cv.WaitKey(0)
cv.DestroyWindow('Image')
if not success_index:
return
global_captured_corners = [captured_corners[index][1] for index in success_index]
global_original_corners = [original_corners[index][1] for index in success_index]
object_points = cv.CreateMat(len(success_index) * (corner_count), 3, cv.CV_32FC1)
image_points = cv.CreateMat(len(success_index) * (corner_count), 2, cv.CV_32FC1)
global_object_points = object_points
global_image_points = image_points
point_counts = cv.CreateMat(len(success_index), 1, cv.CV_32SC1)
for ii in range(len(success_index)):
for jj in range(corner_count):
cv.Set2D(object_points, ii * corner_count + jj, 0, float(jj/board_width))
cv.Set2D(object_points, ii * corner_count + jj, 1, float(jj%board_width))
cv.Set2D(object_points, ii * corner_count + jj, 2, float(0.0))
cv.Set2D(image_points, ii * corner_count + jj, 0, captured_corners[success_index[ii]][1][jj][0])
cv.Set2D(image_points, ii * corner_count + jj, 1, captured_corners[success_index[ii]][1][jj][1])
cv.Set1D(point_counts, ii, corner_count)
# Create the output parameters
camera_intrinsic_mat = cv.CreateMat(3, 3, cv.CV_32FC1)
cv.Set2D(camera_intrinsic_mat, 0, 0, 1.0)
cv.Set2D(camera_intrinsic_mat, 1, 1, 1.0)
distortion_mat = cv.CreateMat(5, 1, cv.CV_32FC1)
rotation_vecs = cv.CreateMat(len(success_index), 3, cv.CV_32FC1)
translation_vecs = cv.CreateMat(len(success_index), 3, cv.CV_32FC1)
print 'Before camera calibration'
# Do the camera calibration
cv.CalibrateCamera2(object_points, image_points, point_counts, cv.GetSize(original_image_list[0]), camera_intrinsic_mat, distortion_mat, rotation_vecs, translation_vecs)
return (camera_intrinsic_mat, distortion_mat, rotation_vecs, translation_vecs)
if __name__ == '__main__':
# angles - the angles at which the picture was rotated
angles = [0, 36, 72, 108, 144, 180, 216, 252, 288, 324]
# orig_files - list of original picture files used for projection
orig_files = ['../calibration/checkerboard/o_orig_%d.png' % (angle) for angle in angles]
# img_files - projected image captured by camera
img_files = ['../calibration/checkerboard/captured_imag_%d.bmp' % (angle) for angle in angles]
# orig_files = ['o%d.png' % (angle) for angle in range(10, 40, 10)]
# img_files = ['d%d.png' % (angle) for angle in range(10, 40, 10)]
# Load the images
print 'Loading images'
captured_images = [cv.LoadImage(filename) for filename in img_files]
orig_images = [cv.LoadImage(filename) for filename in orig_files]
# Convert to grayscale
gray_images = [cv.CreateImage((src.height, src.width), cv.IPL_DEPTH_8U, 1) for src in captured_images]
for ii in range(len(captured_images)):
cv.CvtColor(captured_images[ii], gray_images[ii], cv.CV_RGB2GRAY)
cv.ShowImage('win', gray_images[ii])
cv.WaitKey(0)
cv.DestroyWindow('win')
gray_orig = [cv.CreateImage((src.height, src.width), cv.IPL_DEPTH_8U, 1) for src in orig_images]
for ii in range(len(orig_images)):
cv.CvtColor(orig_images[ii], gray_orig[ii], cv.CV_RGB2GRAY)
# The number of ranks and files in the chessboard. OpenCV considers
# the height and width of the chessboard to be one less than these,
# respectively.
rank_count = 10
file_count = 11
camera_intrinsic_mat, distortion_mat, rotation_vecs, translation_vecs, = get_camera_calibration_data(gray_orig, gray_images, file_count-1, rank_count-1)
xmap = cv.CreateImage(cv.GetSize(captured_images[0]), cv.IPL_DEPTH_32F, 1)
ymap = cv.CreateImage(cv.GetSize(captured_images[0]), cv.IPL_DEPTH_32F, 1)
cv.InitUndistortMap(camera_intrinsic_mat, distortion_mat, xmap, ymap)
# homography = cv.CreateMat(3, 3, cv.CV_32F)
map_matrix = cv.CreateMat(2, 3, cv.CV_32F)
source_points = (global_original_corners[0][0], global_original_corners[0][file_count-2], global_original_corners[0][(rank_count-1) * (file_count-1) -1])
image_points = (global_captured_corners[0][0], global_captured_corners[0][file_count-2], global_captured_corners[0][(rank_count-1) * (file_count-1) -1])
# cv.GetPerspectiveTransform(source, target, homography)
cv.GetAffineTransform(source_points, image_points, map_matrix)
ii = 0
cv.NamedWindow('OriginalImage', cv.CV_WINDOW_AUTOSIZE)
cv.NamedWindow('CapturedImage', cv.CV_WINDOW_AUTOSIZE)
cv.NamedWindow('FixedImage', cv.CV_WINDOW_AUTOSIZE)
for image in gray_images:
# The affine transform should be ideally calculated once
# outside this loop, but as the transform looks different for
# each image, I'll just calculate it independently to see the
# applicability
try:
# Try to find ii in the list of successful corner
# detection indices and if found, use the corners for
# computing the affine transformation matrix. This is only
# required when the optics changes between two
# projections, which should not happen.
jj = global_success_index.index(ii)
source_points = [global_original_corners[jj][0], global_original_corners[jj][rank_count-1], global_original_corners[jj][-1]]
image_points = [global_captured_corners[jj][0], global_captured_corners[jj][rank_count-1], global_captured_corners[jj][-1]]
cv.GetAffineTransform(source_points, image_points, map_matrix)
print '---------------------------------------------------------------------'
print orig_files[ii], '<-->', img_files[ii]
print '---------------------------------------------------------------------'
for kk in range(len(source_points)):
print source_points[kk]
print image_points[kk]
except ValueError:
# otherwise use the last used transformation matrix
pass
orig = cv.CloneImage(orig_images[ii])
cv.PutText(orig, '%s: original' % (os.path.basename(orig_files[ii])), (100, 100), global_font, 0.0)
cv.ShowImage('OriginalImage', orig)
target = cv.CloneImage(image)
target.origin = image.origin
cv.SetZero(target)
cv.Remap(image, target, xmap, ymap, cv.CV_INTER_LINEAR + cv.CV_WARP_FILL_OUTLIERS, cv.ScalarAll(0))
cv.PutText(target, '%s: remapped' % (os.path.basename(img_files[ii])), (100, 100), global_font, 0.0)
cv.ShowImage('CapturedImage', target)
target = cv.CloneImage(orig_images[ii])
cv.SetZero(target)
cv.WarpAffine(orig_images[ii], target, map_matrix, cv.CV_INTER_LINEAR | cv.CV_WARP_FILL_OUTLIERS)
corrected_file = os.path.join(os.path.dirname(img_files[ii]), 'corrected_%s' % (os.path.basename(img_files[ii])))
cv.SaveImage(corrected_file, target)
print 'Saved corrected image to', corrected_file
# cv.WarpPerspective(image, target, homography, cv.CV_INTER_LINEAR | cv.CV_WARP_INVERSE_MAP | cv.CV_WARP_FILL_OUTLIERS)
cv.PutText(target, '%s: perspective-transformed' % (os.path.basename(img_files[ii])), (100, 100), global_font, 0.0)
cv.ShowImage('FixedImage', target)
print '==================================================================='
cv.WaitKey(0)
ii += 1
cv.DestroyWindow('OriginalImage')
cv.DestroyWindow('CapturedImage')
cv.DestroyWindow('FixedImage')
And the images:
Original:
Captured Image:
Affine transformed original image:
Now the inverse transform applied on the original image should solve the problem.
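For illustration, a minimal sketch of that final step using the modern cv2 API, assuming map_matrix is available as a 2x3 NumPy array mapping original to captured coordinates; the file names are placeholders:
import cv2

orig = cv2.imread('orig_0.png')                   # placeholder file name
inv_map = cv2.invertAffineTransform(map_matrix)   # captured -> original direction
h, w = orig.shape[:2]
prewarped = cv2.warpAffine(orig, inv_map, (w, h), flags=cv2.INTER_LINEAR)
cv2.imwrite('prewarped_0.png', prewarped)         # project this instead of the original A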