I got the output shown below after stitching the result of 24 images to the next (25th) image. Up to that point, the stitching was fine.
Is anyone aware of why or when stitching output comes out like this? What could cause it?
The stitching code follows the standard steps: finding keypoints and descriptors, matching points, calculating the homography, and then warping the images. But I do not understand why this output appears.
The core part of the stitching is below:
detector = cv2.SIFT_create(400)
# find the keypoints and descriptors with SIFT
gray1 = cv2.cvtColor(image1,cv2.COLOR_BGR2GRAY)
ret1, mask1 = cv2.threshold(gray1,1,255,cv2.THRESH_BINARY)
kp1, descriptors1 = detector.detectAndCompute(gray1,mask1)
gray2 = cv2.cvtColor(image2,cv2.COLOR_BGR2GRAY)
ret2, mask2 = cv2.threshold(gray2,1,255,cv2.THRESH_BINARY)
kp2, descriptors2 = detector.detectAndCompute(gray2,mask2)
keypoints1Im = cv2.drawKeypoints(image1, kp1, outImage = cv2.DRAW_MATCHES_FLAGS_DEFAULT, color=(0,0,255))
keypoints2Im = cv2.drawKeypoints(image2, kp2, outImage = cv2.DRAW_MATCHES_FLAGS_DEFAULT, color=(0,0,255))
# BFMatcher with default params
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(descriptors2,descriptors1, k=2)
# Apply ratio test
good = []
for m, n in matches:
    if m.distance < 0.75 * n.distance:
        good.append(m)
print (str(len(good)) + " Matches were Found")
if len(good) <= 10:
    return image1
matches = copy.copy(good)
matchDrawing = util.drawMatches(gray2,kp2,gray1,kp1,matches)
#Aligning the images
src_pts = np.float32([ kp2[m.queryIdx].pt for m in matches ]).reshape(-1,1,2)
dst_pts = np.float32([ kp1[m.trainIdx].pt for m in matches ]).reshape(-1,1,2)
H = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)[0]
h1,w1 = image1.shape[:2]
h2,w2 = image2.shape[:2]
pts1 = np.float32([[0,0],[0,h1],[w1,h1],[w1,0]]).reshape(-1,1,2)
pts2 = np.float32([[0,0],[0,h2],[w2,h2],[w2,0]]).reshape(-1,1,2)
pts2_ = cv2.perspectiveTransform(pts2, H)
pts = np.concatenate((pts1, pts2_), axis=0)
# print("pts:", pts)
[xmin, ymin] = np.int32(pts.min(axis=0).ravel() - 0.5)
[xmax, ymax] = np.int32(pts.max(axis=0).ravel() + 0.5)
t = [-xmin,-ymin]
Ht = np.array([[1,0,t[0]],[0,1,t[1]],[0,0,1]]) # translate
result = cv2.warpPerspective(image2, Ht.dot(H), (xmax-xmin, ymax-ymin))
resizedB = np.zeros((result.shape[0], result.shape[1], 3), np.uint8)
resizedB[t[1]:t[1]+h1,t[0]:w1+t[0]] = image1
# Now create a mask of logo and create its inverse mask also
img2gray = cv2.cvtColor(result,cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(img2gray, 0, 255, cv2.THRESH_BINARY)
kernel = np.ones((5,5),np.uint8)
k1 = (kernel == 1).astype('uint8')
mask = cv2.erode(mask, k1, borderType=cv2.BORDER_CONSTANT)
mask_inv = cv2.bitwise_not(mask)
difference = cv2.bitwise_or(resizedB, resizedB, mask=mask_inv)
result2 = cv2.bitwise_and(result, result, mask=mask)
result = cv2.add(result2, difference)
Edit:
This image shows the match drawing while stitching image 25 to the result of the first 24 images:
And the match drawing before that:
I have 97 images to stitch in total. If I stitch images 24 and 25 separately, they stitch properly. If I start stitching from the 23rd image onwards, the stitching is also good, but it gives me a problem when I start stitching from the 1st image. I am not able to understand the problem.
Result after stitching 23rd image:
Result after stitching 24th image:
The result after stitching the 25th image is the broken one shown at the top.
Strange observation: if I stitch images 23, 24, and 25 separately with the same code, they stitch fine. If I stitch the images from 23 through 97, they also stitch fine. But somehow, if I start stitching from the 1st image, it breaks while stitching the 25th. I don't understand why this happens.
I have tried different combinations: different keypoint detection and extraction methods, matching methods, homography calculations, and warping code, but none of them worked. Something is missing or wrong in the combination of steps, and I am not able to figure it out.
Sorry for the long question. As I am completely new to this, I may not be explaining things properly. Thanks for your help and guidance.
Stitched result of images 23, 24, and 25 separately with the SAME code:
With different code (which leaves black lines between the stitched images), if I stitch all 97 images, the 25th image shifts upward in the stitch as shown below (right corner point):
Firstly, I was not able to recreate your problem and solve it as the images were too big for my system to process. However, I had faced the same problem in my Panorama Stitching project, so I am sharing the reason behind it and my approach to solving my problem. Hope this helps you too.
Here's what my problem looked like when I stitched 4 images together just like you did.
As you can see, the 4th image is distorted a lot, which should not happen. The same thing happened in your case, but to a greater extent.
Now, here's the output when I stitched 8 images after some image pre-processing.
After some pre-processing on the input images, I was able to stitch 8 images together perfectly without any distortion.
To understand the exact reason behind this kind of distortion, watch this video by Joseph Redmon between 50:26 - 1:07:23.
As suggested in the video, we'll first have to project the images onto a cylinder and then unroll them and then stitch these unrolled images together.
Below is the initial input image (left) and the image after projection onto a cylinder and unrolling (right).
For your problem, since you are using satellite images, I guess projection onto a sphere would work better than a cylinder; however, you will have to give it a try.
Sharing below my code for projecting an image onto a cylinder and unrolling it, for reference. The mathematics behind it is the same as given in the video.
def Convert_xy(x, y):
    global center, f

    xt = (f * np.tan((x - center[0]) / f)) + center[0]
    yt = ((y - center[1]) / np.cos((x - center[0]) / f)) + center[1]

    return xt, yt


def ProjectOntoCylinder(InitialImage):
    global w, h, center, f
    h, w = InitialImage.shape[:2]
    center = [w // 2, h // 2]
    f = 1100       # 1100 field; 1000 Sun; 1500 Rainier; 1050 Helens

    # Creating a blank transformed image
    TransformedImage = np.zeros(InitialImage.shape, dtype=np.uint8)

    # Storing all coordinates of the transformed image in 2 arrays (x and y coordinates)
    AllCoordinates_of_ti = np.array([np.array([i, j]) for i in range(w) for j in range(h)])
    ti_x = AllCoordinates_of_ti[:, 0]
    ti_y = AllCoordinates_of_ti[:, 1]

    # Finding corresponding coordinates of the transformed image in the initial image
    ii_x, ii_y = Convert_xy(ti_x, ti_y)

    # Rounding off the coordinate values to get exact pixel values (top-left corner)
    ii_tl_x = ii_x.astype(int)
    ii_tl_y = ii_y.astype(int)

    # Finding transformed image points whose corresponding
    # initial image points lie inside the initial image
    GoodIndices = (ii_tl_x >= 0) * (ii_tl_x <= (w-2)) * \
                  (ii_tl_y >= 0) * (ii_tl_y <= (h-2))

    # Removing all the outside points from everywhere
    ti_x = ti_x[GoodIndices]
    ti_y = ti_y[GoodIndices]
    ii_x = ii_x[GoodIndices]
    ii_y = ii_y[GoodIndices]
    ii_tl_x = ii_tl_x[GoodIndices]
    ii_tl_y = ii_tl_y[GoodIndices]

    # Bilinear interpolation
    dx = ii_x - ii_tl_x
    dy = ii_y - ii_tl_y

    weight_tl = (1.0 - dx) * (1.0 - dy)
    weight_tr = (dx)       * (1.0 - dy)
    weight_bl = (1.0 - dx) * (dy)
    weight_br = (dx)       * (dy)

    TransformedImage[ti_y, ti_x, :] = (weight_tl[:, None] * InitialImage[ii_tl_y, ii_tl_x, :]) + \
                                      (weight_tr[:, None] * InitialImage[ii_tl_y, ii_tl_x + 1, :]) + \
                                      (weight_bl[:, None] * InitialImage[ii_tl_y + 1, ii_tl_x, :]) + \
                                      (weight_br[:, None] * InitialImage[ii_tl_y + 1, ii_tl_x + 1, :])

    # Getting x coordinate to remove the black region from right and left in the transformed image
    min_x = min(ti_x)

    # Cropping out the black region from both sides (using symmetry)
    TransformedImage = TransformedImage[:, min_x : -min_x, :]

    return TransformedImage, ti_x - min_x, ti_y
You just have to call the function ProjectOntoCylinder and pass it an image to get the resultant image and the coordinates of white pixels in the mask image. Use the code below to call this function and get the mask image.
# Applying Cylindrical projection on Image
Image_Cyl, mask_x, mask_y = ProjectOntoCylinder(Image)
# Getting Image Mask
Image_Mask = np.zeros(Image_Cyl.shape, dtype=np.uint8)
Image_Mask[mask_y, mask_x, :] = 255
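The returned mask can also be passed to the feature detector, so that keypoints are only searched inside the valid (non-black) region of the projected image. A minimal sketch (the SIFT detector here is an assumption, not part of the original code):
# restrict keypoint detection to the valid region of the projected image (sketch)
detector = cv2.SIFT_create()
gray = cv2.cvtColor(Image_Cyl, cv2.COLOR_BGR2GRAY)
kp, desc = detector.detectAndCompute(gray, Image_Mask[:, :, 0])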
Here are links to my project and its detailed documentation for reference:
Part 1:
Source Code,
Documentation
Part 2:
Source Code,
Documentation
Related
I want to develop a face alignment program. There is a video from which the face is extracted and aligned. It works as follows: a result frame is constructed from the first frame of the video, and then the face from every subsequent frame is aligned to it and re-recorded as the result frame. Alignment is performed via homography. So for every frame I need to find keypoints, match them between the current face and the result face, and compute the homography.
Here is the problem. In my pipeline, keypoints for the current frame must not be recomputed every time. Instead, the following algorithm is proposed:
1. There are some predefined points in the format of a 2D numpy array (in general, they could be any points on the image, but for example, let's imagine these points are some face landmarks).
2. For the first frame, I use the AKAZE feature detector to search for keypoints in the area close to the initial points from item 1.
3. I use cv2.calcOpticalFlowPyrLK to track those keypoints, so in the next frame I do not detect them again but use the tracked keypoints from the previous frame.
So here is the code for this:
# Parameters for Lucas-Kanade optical flow
lk_params = dict(winSize=(15, 15),
                 maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# previous_keypoints are the keypoints from the previous frame. It is a list of cv2.KeyPoint.
# Here I cast them to the input format for optical flow
coord_keypoints = np.array(list(map(lambda point: [point.pt[0], point.pt[1]], previous_keypoints)), dtype=np.float32)
p0 = coord_keypoints.copy().reshape((-1, 1, 2))

# oldFace_gray and faceImg1 are the faces from the previous and current frame respectively
p1, st, err = cv2.calcOpticalFlowPyrLK(oldFace_gray, faceImg1, p0, None, **lk_params)
indices = np.where(st == 1)[0]
good_new = p1[st == 1]
good_old = p0[st == 1]

# Here I cast the tracked points back to cv2.KeyPoint for description and matching
keypoints1 = []
for idx, point in zip(indices, good_new):
    keypoint = cv2.KeyPoint(x=point[0], y=point[1],
                            _size=previous_keypoints[idx].size,
                            _class_id=previous_keypoints[idx].class_id,
                            _response=previous_keypoints[idx].response)
    keypoints1.append(keypoint)

# Here I create descriptors for the keypoints defined above for the current face,
# and detect and describe keypoints for the result face
akaze = cv2.AKAZE_create(threshold=threshold)
keypoints1, descriptors1 = akaze.compute(faceImg1, keypoints1)
keypoints2, descriptors2 = akaze.detectAndCompute(faceImg2, mask=None)

# Then I want to filter keypoints for the result face by their distance to points
# on the current face and the previous result face.
# For that, first define a function
def landmarkCondition(point, landmarks, eps):
    for borderPoint in landmarks:
        if np.linalg.norm(np.array(point.pt) - np.array(borderPoint)) < eps:
            return True
    return False

# Then apply the filters. landmarks_result is a 2D numpy array of coordinates of
# keypoints found on the previous result face.
keypoints_descriptors2 = (filter(lambda x: landmarkCondition(x[0], landmarks_result, eps_result), zip(keypoints2, descriptors2)))
keypoints_descriptors2 = list(filter(lambda x: landmarkCondition(x[0], good_new, eps_initial), keypoints_descriptors2))

keypoints2, descriptors2 = [], []
for keypoint, descriptor in keypoints_descriptors2:
    keypoints2.append(keypoint)
    descriptors2.append(descriptor)
descriptors2 = np.array(descriptors2)

# Match the found keypoints
height, width, channels = coloredFace2.shape
matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_SL2)
matches = matcher.match(descriptors1, descriptors2, None)

# Sort matches by score
matches.sort(key=lambda x: x.distance, reverse=False)
numGoodMatches = int(len(matches) * GOOD_MATCH_PERCENT)
matches = matches[:numGoodMatches]

# I want to eliminate obviously bad matches. Since the two images are meant to be similar,
# lines connecting two corresponding points on the images should be almost horizontal,
# with a length approximately equal to the width of the image
def correct(point1, point2, width, eps=NOT_ZERO_DIVIDER):
    x1, y1 = point1
    x2, y2 = point2
    angle = abs((y2 - y1) / (x2 - x1 + width + eps))
    length = x2 - x1 + width
    return True if angle < CRITICAL_ANGLE and (length > (1 - RELATIVE_DEVIATION) * width and length < (1 + RELATIVE_DEVIATION) * width) else False

goodMatches = []
for i, match in enumerate(matches):
    if correct(keypoints1[match.queryIdx].pt, keypoints2[match.trainIdx].pt, width):
        goodMatches.append(match)

# Find homography
points1 = np.zeros((len(goodMatches), 2), dtype=np.float32)
points2 = np.zeros((len(goodMatches), 2), dtype=np.float32)

for i, match in enumerate(goodMatches):
    points1[i, :] = keypoints1[match.queryIdx].pt
    points2[i, :] = keypoints2[match.trainIdx].pt

h, mask = cv2.findHomography(points1, points2, method)

height, width, channels = coloredFace2.shape
result = cv2.warpPerspective(coloredFace1, h, (width, height))
resultGray = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
The result of this matching and alignment is very poor. If I compute keypoints for both images at every step without tracking, the result is quite good. Am I making a mistake somewhere?
P.S. I am not sure about posting a minimal reproducible example, because there is a lot of preprocessing of the video frames.
Here is the grayscale uint8 image I'm working with: source grayscale image.
This image is the result of stitching 6 different colorized depth images into one. There are 3 rectangular objects in the image, and my goal is to find the edges of these objects. Obviously, I have no problem finding the external edges of the objects. But separating the objects from each other is a big pain.
Desired rectangles in image:
Input image as numpy array: https://drive.google.com/file/d/1uN9R4MgVQBzjJuMhcqWMUAhWDJCatHSf/view?usp=sharing
First of all, I tried to threshold-binarize the image, followed by some erosion + dilation processing to distinguish all three objects from each other. Then contours + minAreaRect would give me the necessary result. This option isn't robust enough, because objects in the scene can be so close to each other that the edge between them has the same depth as the roughness of the object surfaces. So important edges can be "blended" with deviations of the object surfaces. Consequently, I sometimes get two objects united into one object.
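A minimal sketch of what this first approach looked like (reconstructed for illustration; the threshold, kernel size, and area cutoff are placeholders, not the exact values I used):
import cv2
import numpy as np

img = np.load("output_data.npy")                             # the array linked above
_, binary = cv2.threshold(img, 10, 255, cv2.THRESH_BINARY)   # bright surfaces vs. dark gaps

# morphological clean-up to try to separate the objects
kernel = np.ones((5, 5), np.uint8)
cleaned = cv2.erode(binary, kernel, iterations=2)
cleaned = cv2.dilate(cleaned, kernel, iterations=2)

# fit rotated rectangles to the remaining blobs (OpenCV 4 return signature)
contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for cnt in contours:
    if cv2.contourArea(cnt) < 1000:                          # skip small noise blobs
        continue
    box = cv2.boxPoints(cv2.minAreaRect(cnt)).astype(np.int32)
    cv2.drawContours(vis, [box], 0, (0, 0, 255), 2)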
Using Canny edge detection with automatically calculated coefficients (from the picture median) catches all the unnecessary brightness changes together with the edges. Canny with manually adjusted coefficients works better, but it doesn't give a closed edge result, and it is not reliable (it must be manually tweaked each time).
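The automatically calculated coefficients were derived from the image median, roughly like this (a sketch of that step; sigma is a placeholder):
import cv2
import numpy as np

def auto_canny(image, sigma=0.33):
    # lower/upper Canny thresholds derived from the image median
    v = np.median(image)
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    return cv2.Canny(image, lower, upper)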
Another thing I tried was adjusting the brightness of the image nonlinearly (a power-law transformation) to increase the brightness of the object surfaces while leaving the dark edge cavities unchanged.
p = 0.2
c = input_image.max() / (input_image.max() ** p)
output_image = (c * blur_gray.astype(float) ** p).astype(np.uint8)
Here is a result: brightness adjusted image.
Threshold binarization of this image gives better results in terms of edges. I tried Canny and Laplacian edge detection, but the obtained results give disconnected parts of the contour with some noise in the object surface areas: binarized result of Laplacian filtering. The next step, in my mind, must be some kind of edge estimation/restoration algorithm. I tried the Hough transform to get edge lines, but it didn't give any intelligible result.
It seems to me that I am just going around in circles without achieving any intelligible result. So I am asking for help. Probably my approach is fundamentally wrong, or I am missing something because I do not have sufficient knowledge. Any ideas or suggestions?
P.S. After posting this, I'll continue and try to implement the watershed segmentation algorithm; maybe that will work.
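For reference, the watershed attempt would follow the standard OpenCV recipe, roughly like this (a sketch; binary is assumed to be a thresholded object mask like the one above, and img is the grayscale source):
import cv2
import numpy as np

# sure foreground from the distance transform, sure background from dilation
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
sure_bg = cv2.dilate(binary, np.ones((5, 5), np.uint8), iterations=3)
unknown = cv2.subtract(sure_bg, sure_fg)

# label the sure-foreground regions and mark the unknown band with 0
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0

# watershed expects a 3-channel image; boundaries come back labelled -1
markers = cv2.watershed(cv2.cvtColor(img, cv2.COLOR_GRAY2BGR), markers)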
I tried to come up with a method to emphasize the vertical and horizontal lines separating the shapes.
I started by thresholding the original image (from numpy) and just used a [0, 10] range that seemed reasonable.
I ran a vertical and horizontal line kernel over the image to generate two masks
Vertical Kernel
Horizontal Kernel
I combined the two masks so that we'd have both of the lines separating the boxes
Now we can use findContours to find the boxes. I filtered out small contours to get just the 3 rectangles and used a 4-sided approximation to try and get just their sides.
import cv2
import numpy as np
import random

# approx n-sided shape
def approxSides(contour, numSides, step_size):
    # approx until numSides points
    num_points = 999999;
    percent = step_size;
    while num_points >= numSides:
        # get number of points
        epsilon = percent * cv2.arcLength(contour, True);
        approx = cv2.approxPolyDP(contour, epsilon, True);
        num_points = len(approx);
        # increment
        percent += step_size;

    # step back and get the points
    # there could be more than numSides points if our step size misses it
    percent -= step_size * 2;
    epsilon = percent * cv2.arcLength(contour, True);
    approx = cv2.approxPolyDP(contour, epsilon, True);
    return approx;

# convolve
def conv(mask, kernel, size, half):
    # get res
    h,w = mask.shape[:2];

    # loop
    nmask = np.zeros_like(mask);
    for y in range(half, h - half):
        print("Y: " + str(y) + " || " + str(h));
        for x in range(half, w - half):
            total = np.sum(np.multiply(mask[y-half:y+half+1, x-half:x+half+1], kernel));
            total /= 255;
            if total > half:
                nmask[y][x] = 255;
            else:
                nmask[y][x] = 0;
    return nmask;

# load numpy array
img = np.load("output_data.npy");
mask = cv2.inRange(img, 0, 10);

# resize
h,w = mask.shape[:2];
scale = 0.25;
h = int(h*scale);
w = int(w*scale);
mask = cv2.resize(mask, (w,h));

# use a line filter
size = 31; # size / 2 is max bridge size
half = int(size/2);
vKernel = np.zeros((size,size), np.float32);
for a in range(size):
    vKernel[a][half] = 1/size;
hKernel = np.zeros((size,size), np.float32);
for a in range(size):
    hKernel[half][a] = 1/size;

# run filters
vmask = cv2.filter2D(mask, -1, vKernel);
vmask = cv2.inRange(vmask, (half * 255 / size), 255);
hmask = cv2.filter2D(mask, -1, hKernel);
hmask = cv2.inRange(hmask, (half * 255 / size), 255);
combined = cv2.bitwise_or(vmask, hmask);

# contours OpenCV3.4, if you're using OpenCV 2 or 4, it returns (contours, _)
combined = cv2.bitwise_not(combined);
_, contours, _ = cv2.findContours(combined, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE);

# filter out small contours
cutoff_size = 1000;
big_cons = [];
for con in contours:
    area = cv2.contourArea(con);
    if area > cutoff_size:
        big_cons.append(con);

# do approx for 4-sided shape
colored = cv2.cvtColor(combined, cv2.COLOR_GRAY2BGR);
four_sides = [];
for con in big_cons:
    approx = approxSides(con, 4, 0.01);
    color = [random.randint(0,255) for a in range(3)];
    cv2.drawContours(colored, [approx], -1, color, 2);
    four_sides.append(approx); # not used for anything

# show
cv2.imshow("Image", img);
cv2.imshow("mask", mask);
cv2.imshow("vmask", vmask);
cv2.imshow("hmask", hmask);
cv2.imshow("combined", combined);
cv2.imshow("Color", colored);
cv2.waitKey(0);
I have a stationary camera that rapidly takes photos of a continuously moving product from a fixed position and always at the same angle (pure translation). I need to stitch all the images into a panoramic picture. I've tried using the Stitcher class. It worked, but it took a long time to compute.
I also tried another method using the SIFT detector, a FLANN-based matcher, finding the homography, and then warping the images. This method works fine if I only use two images. For multiple images it still doesn't stitch them properly. Does anyone know the best and fastest image stitching algorithm for this case?
This is my code which uses the Stitcher class.
import time
import cv2
import os
import numpy as np
import sys

def main():
    # read input images
    imgs = []
    path = 'pics_rotated/'
    i = 0
    for (root, dirs, files) in os.walk(path):
        images = [f for f in files]
        print(images)
        for i in range(0, len(images)):
            curImg = cv2.imread(path + images[i])
            imgs.append(curImg)

    stitcher = cv2.Stitcher.create(mode=0)
    status, result = stitcher.stitch(imgs)
    if status != cv2.Stitcher_OK:
        print("Can't stitch images, error code = %d" % status)
        sys.exit(-1)
    cv2.imwrite("imagesout/output.jpg", result)
    cv2.waitKey(0)

if __name__ == '__main__':
    start = time.time()
    main()
    end = time.time()
    print("Time --->>>>>", end - start)
    cv2.destroyAllWindows()
Briefing
Although the OpenCV Stitcher class provides lots of methods and options to perform stitching, I find it hard to use because of its complexity.
Therefore, I will try to provide the minimum and fastest way to perform stitching.
In case you are wondering about more sophisticated approaches such as exposure compensation, I highly recommend looking at the detailed sample code.
As a side note, I will be grateful if someone can convert the following functions to use Stitcher class.
Introduction
In order to combine multiple images into the same perspective, the following operations are needed:
Detect and match features.
Compute homography (perspective transform between frames).
Warp one image onto the other perspective.
Combine the base and warped images while keeping track of the shift in origin.
Given the combination pattern, stitch multiple images.
Feature detection and matching
What are features?
They are distinguishable parts, like corners of a square, that are preserved across images.
There are different algorithms proposed for obtaining these characteristic points, like Harris, ORB, SIFT, SURF, etc.
See cv::Feature2d for the full list.
I will use SIFT because it is accurate and sufficiently fast.
A feature consists of a KeyPoint, which is the location in the image, and a descriptor, which is a set of numbers (e.g. a 128-D vector) that represents the properties of the feature.
After finding distinct points in images, we need to match the corresponding point pairs.
See cv::DescriptorMatcher.
I will use Flann-based descriptor matcher.
First, we initialize the descriptor and matcher classes.
descriptor = cv.SIFT.create()
matcher = cv.DescriptorMatcher.create(cv.DescriptorMatcher.FLANNBASED)
Then, we find the features in each image.
(kps, desc) = descriptor.detectAndCompute(image, mask=None)
Now we find the corresponding point pairs.
if (desc1 is not None and desc2 is not None and len(desc1) >= 2 and len(desc2) >= 2):
    rawMatch = matcher.knnMatch(desc2, desc1, k=2)
matches = []
# ensure the distance is within a certain ratio of each other (i.e. Lowe's ratio test)
ratio = 0.75
for m in rawMatch:
    if len(m) == 2 and m[0].distance < m[1].distance * ratio:
        matches.append((m[0].trainIdx, m[0].queryIdx))
Homography computation
Homography is the perspective transformation from one view to another.
The parallel lines in one view may not be parallel in another, like a road to sunset.
We need to have at least 4 corresponding point pairs.
More pairs mean redundant data that has to be decomposed or eliminated.
The homography matrix transforms a point in the initial view to its warped position.
It is a 3x3 matrix that is computed by Direct Linear Transform algorithm.
There are 8 DoF and the last element in the matrix is 1.
[pt2] = H * [pt1]
Now that we have corresponding point matches, we compute the homography.
The method we use to handle redundant data is RANSAC, which randomly selects 4 point pairs and uses the best fitting result.
See cv::findHomography for more options.
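The pts1 and pts2 arrays used below are not constructed in the snippets above. Since matches holds (trainIdx, queryIdx) pairs, they could be built from the keypoints roughly like this (a sketch; kps1 and kps2 denote the keypoints of the two images, analogous to desc1 and desc2):
# build point arrays from the (trainIdx, queryIdx) pairs collected above (sketch)
pts1 = np.float32([kps1[i].pt for (i, _) in matches])
pts2 = np.float32([kps2[j].pt for (_, j) in matches])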
if len(matches) > 4:
    (H, status) = cv.findHomography(pts1, pts2, cv.RANSAC)
Warping to perspective
By computing homography, we know which point in the source image corresponds to which point in the destination image.
In order not to lose information from the source image, we need to pad the destination image by the amount by which the transformed points fall into negative regions.
At the same time, we need to keep track of the shift amount of the origin for stitching multiple images.
Auxiliary functions
# find the ROI of a transformation result
def warpRect(rect, H):
    x, y, w, h = rect
    corners = [[x, y], [x, y + h - 1], [x + w - 1, y], [x + w - 1, y + h - 1]]
    extremum = cv.transform(corners, H)
    minx, miny = np.min(extremum[:, 0]), np.min(extremum[:, 1])
    maxx, maxy = np.max(extremum[:, 0]), np.max(extremum[:, 1])
    xo = int(np.floor(minx))
    yo = int(np.floor(miny))
    wo = int(np.ceil(maxx - minx))
    ho = int(np.ceil(maxy - miny))
    outrect = (xo, yo, wo, ho)
    return outrect

# homography matrix is translated to fit in the screen
def coverH(rect, H):
    # obtain bounding box of the result
    x, y, _, _ = warpRect(rect, H)
    # shift amount to the first quadrant
    xpos = int(-x if x < 0 else 0)
    ypos = int(-y if y < 0 else 0)
    # correct the homography matrix so that no point is thrown out
    T = np.array([[1, 0, xpos], [0, 1, ypos], [0, 0, 1]])
    H_corr = T.dot(H)
    return (H_corr, (xpos, ypos))

# pad image to cover ROI, return the shift amount of origin
def addBorder(img, rect):
    x, y, w, h = rect
    tl = (x, y)
    br = (x + w, y + h)
    top = int(-tl[1] if tl[1] < 0 else 0)
    bottom = int(br[1] - img.shape[0] if br[1] > img.shape[0] else 0)
    left = int(-tl[0] if tl[0] < 0 else 0)
    right = int(br[0] - img.shape[1] if br[0] > img.shape[1] else 0)
    img = cv.copyMakeBorder(img, top, bottom, left, right, cv.BORDER_CONSTANT, value=[0, 0, 0])
    orig = (left, top)
    return img, orig

def size2rect(size):
    return (0, 0, size[1], size[0])
Warping function
def warpImage(img, H):
    # tweak the homography matrix to move the result to the first quadrant
    H_cover, pos = coverH(size2rect(img.shape), H)
    # find the bounding box of the output
    x, y, w, h = warpRect(size2rect(img.shape), H_cover)
    width, height = x + w, y + h
    # warp the image using the corrected homography matrix
    warped = cv.warpPerspective(img, H_cover, (width, height))
    # make the external boundary solid black, useful for masking
    warped = np.ascontiguousarray(warped, dtype=np.uint8)
    gray = cv.cvtColor(warped, cv.COLOR_RGB2GRAY)
    _, bw = cv.threshold(gray, 1, 255, cv.THRESH_BINARY)
    # https://stackoverflow.com/a/55806272/12447766
    major = cv.__version__.split('.')[0]
    if major == '3':
        _, cnts, _ = cv.findContours(bw, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)
    else:
        cnts, _ = cv.findContours(bw, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)
    warped = cv.drawContours(warped, cnts, 0, [0, 0, 0], lineType=cv.LINE_4)
    return (warped, pos)
Combining warped and destination images
This is the step where image enhancement such as exposure compensation becomes involved.
In order to keep things simple, we will use mean value blending.
The easiest solution would be to overwrite the existing data in the destination image, but the averaging operation is not a burden for us.
# only the non-zero pixels are weighted to the average
def mean_blend(img1, img2):
    assert(img1.shape == img2.shape)
    locs1 = np.where(cv.cvtColor(img1, cv.COLOR_RGB2GRAY) != 0)
    blended1 = np.copy(img2)
    blended1[locs1[0], locs1[1]] = img1[locs1[0], locs1[1]]
    locs2 = np.where(cv.cvtColor(img2, cv.COLOR_RGB2GRAY) != 0)
    blended2 = np.copy(img1)
    blended2[locs2[0], locs2[1]] = img2[locs2[0], locs2[1]]
    blended = cv.addWeighted(blended1, 0.5, blended2, 0.5, 0)
    return blended

def warpPano(prevPano, img, H, orig):
    # correct homography matrix
    T = np.array([[1, 0, -orig[0]], [0, 1, -orig[1]], [0, 0, 1]])
    H_corr = H.dot(T)
    # warp the image and obtain shift amount of origin
    result, pos = warpImage(prevPano, H_corr)
    xpos, ypos = pos
    # zero pad the result
    rect = (xpos, ypos, img.shape[1], img.shape[0])
    result, _ = addBorder(result, rect)
    # mean value blending
    idx = np.s_[ypos : ypos + img.shape[0], xpos : xpos + img.shape[1]]
    result[idx] = mean_blend(result[idx], img)
    # crop extra paddings
    x, y, w, h = cv.boundingRect(cv.cvtColor(result, cv.COLOR_RGB2GRAY))
    result = result[y : y + h, x : x + w]
    # return the resulting image with shift amount
    return (result, (xpos - x, ypos - y))
Stitching multiple images given combination pattern
# base image is the last image in each iteration
def blend_multiple_images(images, homographies):
    N = len(images)
    assert(N >= 2)
    assert(len(homographies) == N - 1)
    pano = np.copy(images[0])
    pos = (0, 0)
    for i in range(N - 1):
        img = images[i + 1]
        # get homography matrix
        H = homographies[i]
        # warp pano onto image
        pano, pos = warpPano(pano, img, H, pos)
    return (pano, pos)
The method above successively warps the previously combined image, called pano, onto the next image.
A pattern, however, may have conjunction points for the best stitching view.
For example
1 2 3
4 5 6
The best pattern to combine these images is
1 -> 2 <- 3
     |
     V
4 -> 5 <- 6
Therefore, we need one last function to combine 1 & 2 with 2 & 3, or 1235 with 456 at node 5.
from operator import sub

# no warping here, useful for combining two different stitched images
# the image at given origin coordinates must be the same
def patchPano(img1, img2, orig1=(0, 0), orig2=(0, 0)):
    # bottom right points
    br1 = (img1.shape[1] - 1, img1.shape[0] - 1)
    br2 = (img2.shape[1] - 1, img2.shape[0] - 1)
    # distance from orig to br
    diag2 = tuple(map(sub, br2, orig2))
    # possible pano corner coordinates based on img1
    extremum = np.array([(0, 0), br1,
                         tuple(map(sum, zip(orig1, diag2))),
                         tuple(map(sub, orig1, orig2))])
    bb = cv.boundingRect(extremum)
    # patch img1 to img2
    pano, shift = addBorder(img1, bb)
    orig = tuple(map(sum, zip(orig1, shift)))
    idx = np.s_[orig[1] : orig[1] + img2.shape[0] - orig2[1],
                orig[0] : orig[0] + img2.shape[1] - orig2[0]]
    subImg = img2[orig2[1] : img2.shape[0], orig2[0] : img2.shape[1]]
    pano[idx] = mean_blend(pano[idx], subImg)
    return (pano, orig)
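For completeness, here is a rough end-to-end sketch of how the pieces above could be chained for a simple left-to-right sequence. The image paths are placeholders and pairwise_homography is just the earlier feature-matching snippets wrapped in a helper; treat it as a sketch, not a tested driver.
import cv2 as cv
import numpy as np

def pairwise_homography(img1, img2):
    # SIFT features + FLANN matching + ratio test, as in the snippets above
    descriptor = cv.SIFT.create()
    matcher = cv.DescriptorMatcher.create(cv.DescriptorMatcher.FLANNBASED)
    kps1, desc1 = descriptor.detectAndCompute(cv.cvtColor(img1, cv.COLOR_BGR2GRAY), mask=None)
    kps2, desc2 = descriptor.detectAndCompute(cv.cvtColor(img2, cv.COLOR_BGR2GRAY), mask=None)
    rawMatch = matcher.knnMatch(desc2, desc1, k=2)
    matches = [(m[0].trainIdx, m[0].queryIdx) for m in rawMatch
               if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    pts1 = np.float32([kps1[i].pt for (i, _) in matches])
    pts2 = np.float32([kps2[j].pt for (_, j) in matches])
    H, _ = cv.findHomography(pts1, pts2, cv.RANSAC)
    return H   # maps img1 coordinates into img2

images = [cv.imread(p) for p in ["1.jpg", "2.jpg", "3.jpg"]]   # placeholder paths
homographies = [pairwise_homography(images[i], images[i + 1])
                for i in range(len(images) - 1)]
pano, pos = blend_multiple_images(images, homographies)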
For a quick demo, you can run the Python code in GitHub.
If you want to use the above methods in C++, you can have a look at Stitch library.
Any PR or edit to this post is welcome.
As an alternative to the last step that @Burak gave, this is the approach I used, since I knew the number of images in each row (chunk); multiStitching is nothing but a function that stitches images horizontally:
def stitchingImagesHV(img_list, size):
    """
    As our multi-stitching algorithm works on the horizontal line, we hack it
    to also handle vertical stitching: each row "stitch_img" is rotated, the
    same technique is applied, and the final result is rotated back to the
    original orientation.
    """
    # Generate row chunks of "size" length from the image list
    chunks = [img_list[i:i + size] for i in range(0, len(img_list), size)]
    list_rotated_images = []
    for i in range(len(chunks)):
        stitch_img = multiStitching(chunks[i])
        stitch_img_rotated = cv2.rotate(stitch_img, cv2.ROTATE_90_COUNTERCLOCKWISE)
        list_rotated_images.append(stitch_img_rotated.astype('uint8'))
    stitch_img2 = multiStitching(list_rotated_images)
    return cv2.rotate(stitch_img2, cv2.ROTATE_90_CLOCKWISE)
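For example, assuming 6 images laid out as 2 rows of 3 in row-major order (the file names are placeholders), the call would look like:
# hypothetical usage: 6 images arranged as 2 rows of 3
img_list = [cv2.imread("img_%d.jpg" % i) for i in range(1, 7)]
panorama = stitchingImagesHV(img_list, size=3)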
I have an image in which I want to threshold the part of the image within a circular region, and then threshold the remainder of the image outside of this region.
Unfortunately, my attempts seem to threshold the image as a whole, ignoring the masks. How can this be achieved properly? See my code attempt below.
def circular_mask(h, w, centre=None, radius=None):
    if centre is None:  # use the middle of the image
        centre = [int(w / 2), int(h / 2)]
    if radius is None:  # use the smallest distance between the centre and image walls
        radius = min(centre[0], centre[1], w - centre[0], h - centre[1])

    Y, X = np.ogrid[:h, :w]
    dist_from_centre = np.sqrt((X - centre[0]) ** 2 + (Y - centre[1]) ** 2)

    mask = dist_from_centre <= radius
    return mask

img = cv2.imread('image.png', 0)  # read image
h, w = img.shape[:2]
mask = circular_mask(h, w, centre=(135, 140), radius=75)  # create a boolean circle mask

mask_img = img.copy()

inside = np.ma.array(mask_img, mask=~mask)
t1 = inside < 50  # threshold part of image within the circle, ignore rest of image
plt.imshow(inside)
plt.imshow(t1, alpha=.25)
plt.show()

outside = np.ma.array(mask_img, mask=mask)
t2 = outside < 20  # threshold image outside circle region, ignoring image in circle
plt.imshow(outside)
plt.imshow(t2, alpha=.25)
plt.show()

fin = np.logical_or(t1, t2)  # combine the results from both thresholds together
plt.imshow(fin)
plt.show()
Working solution:
img = cv2.imread('image.png', 0)
h,w = img.shape[:2]
mask = circular_mask(h,w, centre=(135,140),radius=75)
inside = img.copy()*mask
t1 = inside < 50#get_threshold(inside, 1)
plt.imshow(inside)
plt.show()
outside = img.copy()*~mask
t2 = outside < 70
plt.imshow(outside)
plt.show()
plt.imshow(t1)
plt.show()
plt.imshow(t2)
plt.show()
plt.imshow(np.logical_and(t1,t2))
plt.show()
I assume your image is single-channel (e.g. grayscale).
You can make 2 copies of the image. Multiply (or logical AND) your mask with one of them and the inverse of that mask with the other one. Now apply your desired threshold to each of them. In the end, merge both results using a logical OR operation.
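A minimal sketch of that idea, reusing the circular_mask function from the question (the threshold values are placeholders):
import cv2
import numpy as np

img = cv2.imread('image.png', 0)
h, w = img.shape[:2]
mask = circular_mask(h, w, centre=(135, 140), radius=75)   # boolean mask from the question

inside = img * mask        # copy with everything outside the circle zeroed
outside = img * ~mask      # copy with everything inside the circle zeroed

# threshold each copy, restricting each result to its own region so the
# zeroed-out background does not count as "below threshold"
t_inside = (inside < 50) & mask
t_outside = (outside < 20) & ~mask

combined = np.logical_or(t_inside, t_outside)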
My goal is to
deskew a scanned image such that its text is perfectly placed on top of the text of the original image (subtracting the images would then remove the text)
prevent any loss of information in the deskewed image
I use SURF features to feed the findHomography function. Then I use the warpPerspective function to transform the scanned image. The resulting image almost perfectly fits onto the original image.
However, the scanned image has content on its corners which is lost after the transformation because the text in the scanned image is smaller and has to be scaled up.
Deskewing an image that has slightly smaller text
Information at the borders of the image is cropped
To avoid any loss of information, I convert the image to RGBA and set the borderValue parameter in warpPerspective so that any added background is transparent. After the transformation, I remove the transparent pixels again. This procedure works but seems highly inefficient.
Question: I'm looking for a working code example (C++ or Python) that shows how to do this more efficiently.
Image has been deskewed and content is preserved. However, the text of the two pictures isn't on top of each other anymore
Text position is off because the warped image has a different size than what warpPerspective expected
After transforming the image, the problem is that the two images are no longer aligned, because the dimensions of the transformed image differ from what the warpPerspective method expected.
Question: How can I realign the two images? It would be great if there was a way to do incorporate this into the previous step already. Again, a working code example would be very helpful.
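For reference, the usual way to choose dsize and keep track of the shift is to transform the corners of the image being warped and compose a translation into the homography (the same trick used in the stitching code earlier on this page). A generic sketch, where img is the image being warped and H maps it into the reference image's frame (both names are placeholders):
h, w = img.shape[:2]
corners = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
warped_corners = cv2.perspectiveTransform(corners, H)

xmin, ymin = np.int32(warped_corners.min(axis=0).ravel() - 0.5)
xmax, ymax = np.int32(warped_corners.max(axis=0).ravel() + 0.5)

T = np.array([[1.0, 0, -xmin], [0, 1.0, -ymin], [0, 0, 1.0]])   # shift into the visible canvas
out = cv2.warpPerspective(img, T.dot(H), (xmax - xmin, ymax - ymin))
# the reference image then has to be pasted at offset (-xmin, -ymin) inside `out`
# for the two images to be aligned again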
Here's the code that I have so far. It deskews the image while preserving its content; however, the text is not on top of the original text anymore.
import math
import cv2
import numpy as np


class Deskewer:
    def __init__(self, hessianTreshold=5000):
        self.__hessianThresh = hessianTreshold
        self.imgOrigGray, self.imgSkewed, self.imgSkewedGray = None, None, None

    def start(self, imgOrig, imgSkewed):
        self.imgOrigGray = cv2.cvtColor(imgOrig, cv2.COLOR_BGR2GRAY)
        self.imgSkewed = imgSkewed  # final transformation will be performed on the color image
        self.imgSkewedGray = cv2.cvtColor(imgSkewed, cv2.COLOR_BGR2GRAY)  # prior calculation is faster on gray

        kp1, des1, kp2, des2 = self.__detectFeatures()
        goodMatches = self.__flannMatch(des1, des2)

        MIN_MATCH_COUNT = 10
        M = None
        if len(goodMatches) > MIN_MATCH_COUNT:
            M, _ = self.__findHomography(kp1, kp2, goodMatches)
        else:
            print("Not enough matches are found - %d/%d" % (len(goodMatches), MIN_MATCH_COUNT))
            return

        return self.__deskew(M)

    def __detectFeatures(self):
        surf = cv2.xfeatures2d.SURF_create(self.__hessianThresh)
        kp1, des1 = surf.detectAndCompute(self.imgOrigGray, None)
        kp2, des2 = surf.detectAndCompute(self.imgSkewedGray, None)
        return kp1, des1, kp2, des2

    def __flannMatch(self, des1, des2):
        global matches
        FLANN_INDEX_KDTREE = 0
        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
        search_params = dict(checks=50)
        flann = cv2.FlannBasedMatcher(index_params, search_params)
        matches = flann.knnMatch(des1, des2, k=2)

        # store all the good matches as per Lowe's ratio test.
        good = []
        for m, n in matches:
            if m.distance < 0.7 * n.distance:
                good.append(m)
        return good

    def __findHomography(self, kp1, kp2, goodMatches):
        src_pts = np.float32([kp1[m.queryIdx].pt for m in goodMatches
                              ]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in goodMatches
                              ]).reshape(-1, 1, 2)
        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()
        i = matchesMask.index(1)
        # TODO: This is a matching point before the warpPerspective call. How can I calculate this point AFTER the call?
        print("POINTS: object(", src_pts[i][0][1], ",", src_pts[i][0][0], ") - scene(", dst_pts[i][0][1], ",", dst_pts[i][0][0], ")")
        return M, mask

    def getComponents(self, M):
        # ((translationx, translationy), rotation, (scalex, scaley), shear)
        a = M[0, 0]
        b = M[0, 1]
        c = M[0, 2]
        d = M[1, 0]
        e = M[1, 1]
        f = M[1, 2]

        p = math.sqrt(a * a + b * b)
        r = (a * e - b * d) / (p)
        q = (a * d + b * e) / (a * e - b * d)

        translation = (c, f)
        scale = (p, r)  # p = x-Axis, r = y-Axis
        shear = q
        theta = math.atan2(b, a)
        degrees = math.atan2(b, a) * 180 / math.pi
        return (translation, theta, degrees, scale, shear)

    def __deskew(self, M):
        # this info might come in handy here for calculating the dsize of warpPerspective?
        translation, theta, degrees, scale, shear = self.getComponents(M)

        # The alpha channel allows me to mark pixels that are created during warpPerspective
        imSkewedAlpha = cv2.cvtColor(self.imgSkewed, cv2.COLOR_BGR2BGRA)

        # These sizes have been chosen arbitrarily to make sure that all the contents fit in the new canvas
        height = 5000
        width = 5000
        shift = -500
        M2 = np.array([[1, 0, shift],
                       [0, 1, shift],
                       [0, 0, 1]])
        M3 = np.dot(M, M2)

        # TODO: How can I calculate the dsize argument?
        # Newly created pixels are set to transparent
        im_out = cv2.warpPerspective(imSkewedAlpha, M3,
                                     (height, width), flags=cv2.WARP_INVERSE_MAP,
                                     borderMode=cv2.BORDER_CONSTANT, borderValue=(255, 0, 0, 0))

        # http://codereview.stackexchange.com/a/132933
        # Mask of non-transparent pixels
        mask = im_out[:, :, 3] == 255

        # Coordinates of non-transparent pixels
        coords = np.argwhere(mask)

        # Bounding box of non-transparent pixels
        x0, y0 = coords.min(axis=0)
        x1, y1 = coords.max(axis=0) + 1  # slices are exclusive at the top

        # Get the contents of the bounding box
        cropped = im_out[x0:x1, y0:y1]

        # TODO: The warped image needs to align nicely on the original image
        return cropped


origImg = cv2.imread("Letter.png")
skewedImg = cv2.imread("A4.png")
deskewed = Deskewer().start(origImg, skewedImg)
cv2.imshow("Original", origImg)
cv2.imshow("Deskewed", deskewed)
cv2.waitKey(0)
Original and skewed image (with additional content) for testing