I'm looking at the following image stitching example in the OpenCV documentation: https://raw.githubusercontent.com/opencv/opencv/4.x/samples/python/stitching_detailed.py, trying to wrap my head around how to use bundle adjustment to estimate homographies and warp images. I'm having a hard time following what exactly is going on, partly because I can't seem to find the docs for many of the functions being used. A snippet of the code I think I'm particularly interested in is below.
estimator = ESTIMATOR_CHOICES[args.estimator]()
b, cameras = estimator.apply(features, p, None)
if not b:
    print("Homography estimation failed.")
    exit()
for cam in cameras:
    cam.R = cam.R.astype(np.float32)
adjuster = BA_COST_CHOICES[args.ba]()
adjuster.setConfThresh(1)
refine_mask = np.zeros((3, 3), np.uint8)
if ba_refine_mask[0] == 'x':
    refine_mask[0, 0] = 1
if ba_refine_mask[1] == 'x':
    refine_mask[0, 1] = 1
if ba_refine_mask[2] == 'x':
    refine_mask[0, 2] = 1
if ba_refine_mask[3] == 'x':
    refine_mask[1, 1] = 1
if ba_refine_mask[4] == 'x':
    refine_mask[1, 2] = 1
adjuster.setRefinementMask(refine_mask)
b, cameras = adjuster.apply(features, p, cameras)
if not b:
    print("Camera parameters adjusting failed.")
    exit()
focals = []
for cam in cameras:
    focals.append(cam.focal)
focals.sort()
if len(focals) % 2 == 1:
    warped_image_scale = focals[len(focals) // 2]
else:
    warped_image_scale = (focals[len(focals) // 2] + focals[len(focals) // 2 - 1]) / 2
if wave_correct is not None:
    rmats = []
    for cam in cameras:
        rmats.append(np.copy(cam.R))
    rmats = cv.detail.waveCorrect(rmats, wave_correct)
    for idx, cam in enumerate(cameras):
        cam.R = rmats[idx]
corners = []
masks_warped = []
images_warped = []
sizes = []
masks = []
for i in range(0, num_images):
    um = cv.UMat(255 * np.ones((images[i].shape[0], images[i].shape[1]), np.uint8))
    masks.append(um)
warper = cv.PyRotationWarper(warp_type, warped_image_scale * seam_work_aspect)  # warper could be nullptr?
for idx in range(0, num_images):
    K = cameras[idx].K().astype(np.float32)
    swa = seam_work_aspect
    K[0, 0] *= swa
    K[0, 2] *= swa
    K[1, 1] *= swa
    K[1, 2] *= swa
    corner, image_wp = warper.warp(images[idx], K, cameras[idx].R, cv.INTER_LINEAR, cv.BORDER_REFLECT)
    corners.append(corner)
    sizes.append((image_wp.shape[1], image_wp.shape[0]))
    images_warped.append(image_wp)
    p, mask_wp = warper.warp(masks[idx], K, cameras[idx].R, cv.INTER_NEAREST, cv.BORDER_CONSTANT)
    masks_warped.append(mask_wp.get())
There are several key things I can't seem to find.
estimator.apply(): I can't find the docs for this, so I don't fully understand what the function expects as arguments or what it returns. (Estimator I'm looking at: https://docs.opencv.org/4.x/df/d15/classcv_1_1detail_1_1Estimator.html)
What is the camera object in for cam in cameras: cam.R = cam.R.astype(np.float32)? Is this the correct documentation to look at? https://docs.opencv.org/4.x/dc/d3a/classcv_1_1viz_1_1Camera.html
adjuster.apply() also doesn't seem to be a member function of any of the classes: BundleAdjusterBase, BundleAdjusterReproj, or others... (maybe I just don't understand C++. The adjuster I'm looking at: https://docs.opencv.org/4.x/d5/d56/classcv_1_1detail_1_1BundleAdjusterBase.html)
The PyRotationWarper Class Reference states that PyRotationWarper.warp() takes the camera intrinsics and rotation as parameters. Would I be correct in assuming this is performing the bundle adjustment step, warping images based on 3D points?
Does this snippet more or less represent a minimal working example of image mosaicing using bundle adjustment? I'm not sure what I'm doing. If someone would be willing to provide an example of stitching 4 or 5 images and applying bundle adjustment, I would be eternally grateful.
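For context, here is how far I've gotten. This is just my reading of the call sequence in stitching_detailed.py boiled down to a minimal sketch; the cv.detail class names are taken from the 4.x Python bindings, the image file names are placeholders, and I haven't verified every signature:
import cv2 as cv
import numpy as np

# Placeholder file names -- substitute your own overlapping images
imgs = [cv.imread(p) for p in ["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg"]]

# 1. Detect and describe features in every image
finder = cv.ORB_create()
features = [cv.detail.computeImageFeatures2(finder, img) for img in imgs]

# 2. Match features between image pairs
matcher = cv.detail_BestOf2NearestMatcher(False, 0.3)
p = matcher.apply2(features)
matcher.collectGarbage()

# 3. Rough per-camera estimate from pairwise homographies
estimator = cv.detail_HomographyBasedEstimator()
b, cameras = estimator.apply(features, p, None)  # cameras: one cv.detail.CameraParams per image

# 4. Bundle adjustment refines all camera parameters jointly
for cam in cameras:
    cam.R = cam.R.astype(np.float32)
adjuster = cv.detail_BundleAdjusterRay()
adjuster.setConfThresh(1)
b, cameras = adjuster.apply(features, p, cameras)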
PS. I'm not using createStitcher because I want to learn to do it from scratch and eventually use deep learning to estimate camera params and pose / match feature points.
I'd like to compute the cross correlation using the Fast Fourier Transform, for cloud motion tracking, following the steps shown in the image below.
import cv2 as cv
import numpy as np


def roi_image(image):
    image = cv.imread(image, 0)
    roi = image[700:900, 1900:2100]
    return roi


def FouTransf(image):
    img_f32 = np.float32(image)
    d_ft = cv.dft(img_f32, flags=cv.DFT_COMPLEX_OUTPUT)
    d_ft_shift = np.fft.fftshift(d_ft)

    rows, cols = image.shape
    opt_rows = cv.getOptimalDFTSize(rows)
    opt_cols = cv.getOptimalDFTSize(cols)
    opt_img = np.zeros((opt_rows, opt_cols))
    opt_img[:rows, :cols] = image
    crow, ccol = opt_rows / 2, opt_cols / 2
    mask = np.zeros((opt_rows, opt_cols, 2), np.uint8)
    mask[int(crow-50):int(crow+50), int(ccol-50):int(ccol+50)] = 1

    f_mask = d_ft_shift * mask
    return f_mask


def inv_FouTransf(image):
    f_ishift = np.fft.ifftshift(image)
    img_back = cv.idft(f_ishift)
    img_back = cv.magnitude(img_back[:, :, 0], img_back[:, :, 1])
    return img_back


def rms(sigma):
    rms = np.std(sigma)
    return rms
# Step 1: Import images
a = roi_image(path_a)
b = roi_image(path_b)
# Step 2: Convert the image to frequency domain
G_t0 = FouTransf(a)
G_t0_conj = G_t0.conj()
G_t1 = FouTransf(b)
# Step 3: Compute C(m, v)
C = G_t0_conj * G_t1
# Step 4: Convert the image to space domain to obtain Cov (p, q)
c_w = inv_FouTransf(C)
# Step 5: Compute Cross correlation
R_pq = c_w / (rms(a) * rms(b))
I'm a little confused because I've never used this technique before. Is this application accurate?
HINT: Eq. (1) is R(p,q) = Cov(p,q) / (sigma_t0 * sigma_t1). If more information is required, the paper is: "An Automated Technique for Obtaining Cloud Motion from Geostationary Satellite Data Using Cross Correlation".
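For comparison, here is a compact version of the same computation I put together as a sanity check. It is only a sketch: it assumes plain circular cross-correlation via numpy's FFT and skips the low-pass mask used in FouTransf above (a and b are the ROIs from Step 1):
a0 = a.astype(np.float64) - a.mean()
b0 = b.astype(np.float64) - b.mean()
# Cross-covariance via the FFT: Cov(p, q) = IFFT( conj(FFT(a0)) * FFT(b0) ) / N
cov = np.fft.ifft2(np.conj(np.fft.fft2(a0)) * np.fft.fft2(b0)).real / a0.size
# Eq. (1): R(p, q) = Cov(p, q) / (sigma_t0 * sigma_t1)
R_pq = cov / (a0.std() * b0.std())
# The location of the peak of R_pq is the displacement estimate between the two frames
p_shift, q_shift = np.unravel_index(np.argmax(R_pq), R_pq.shape)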
I found this source, but I don't know if it does what I'm trying to do.
If you are trying to do something similar to cv2.matchTemplate(), a working python implementation of the Normalized Cross-Correlation (NCC) method can be found in this repository:
########################################################################################
# Author: Ujash Joshi, University of Toronto, 2017 #
# Based on Octave implementation by: Benjamin Eltzner, 2014 <b.eltzner#gmx.de> #
# Octave/Matlab normxcorr2 implementation in python 3.5 #
# Details: #
# Normalized cross-correlation. Similiar results upto 3 significant digits. #
# https://github.com/Sabrewarrior/normxcorr2-python/master/norxcorr2.py #
# http://lordsabre.blogspot.ca/2017/09/matlab-normxcorr2-implemented-in-python.html #
########################################################################################
import numpy as np
from scipy.signal import fftconvolve
def normxcorr2(template, image, mode="full"):
    """
    Input arrays should be floating point numbers.
    :param template: N-D array, of template or filter you are using for cross-correlation.
    Must be less or equal dimensions to image.
    Length of each dimension must be less than length of image.
    :param image: N-D array
    :param mode: Options, "full", "valid", "same"
    full (Default): The output of fftconvolve is the full discrete linear convolution of the inputs.
    Output size will be image size + 1/2 template size in each dimension.
    valid: The output consists only of those elements that do not rely on the zero-padding.
    same: The output is the same size as image, centered with respect to the 'full' output.
    :return: N-D array of same dimensions as image. Size depends on mode parameter.
    """
    # If this happens, it is probably a mistake
    if np.ndim(template) > np.ndim(image) or \
            len([i for i in range(np.ndim(template)) if template.shape[i] > image.shape[i]]) > 0:
        print("normxcorr2: TEMPLATE larger than IMG. Arguments may be swapped.")

    template = template - np.mean(template)
    image = image - np.mean(image)

    a1 = np.ones(template.shape)
    # Faster to flip up down and left right then use fftconvolve instead of scipy's correlate
    ar = np.flipud(np.fliplr(template))
    out = fftconvolve(image, ar.conj(), mode=mode)

    image = fftconvolve(np.square(image), a1, mode=mode) - \
            np.square(fftconvolve(image, a1, mode=mode)) / (np.prod(template.shape))

    # Remove small machine precision errors after subtraction
    image[np.where(image < 0)] = 0

    template = np.sum(np.square(template))
    out = out / np.sqrt(image * template)

    # Remove any divisions by 0 or very close to 0
    out[np.where(np.logical_not(np.isfinite(out)))] = 0

    return out
The returned object from normxcorr2() is the cross correlation matrix.
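A short usage sketch (my addition, not part of the repository code), assuming template and image are 2-D float arrays:
corr = normxcorr2(template, image, mode="same")
# With mode="same", corr has the same shape as image and the peak marks the best match position
peak_row, peak_col = np.unravel_index(np.argmax(corr), corr.shape)
print(peak_row, peak_col, corr[peak_row, peak_col])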
I'm currently trying to do video stabilization using OpenCV and Python.
I use the following function to calculate rotation:
def accumulate_rotation(src, theta_x, theta_y, theta_z, timestamps, prev, current, f, gyro_delay=None, gyro_drift=None, shutter_duration=None):
    if prev == current:
        return src
    pts = []
    pts_transformed = []
    for x in range(10):
        current_row = []
        current_row_transformed = []
        pixel_x = x * (src.shape[1] / 10)
        for y in range(10):
            pixel_y = y * (src.shape[0] / 10)
            current_row.append([pixel_x, pixel_y])
            if shutter_duration:
                y_timestamp = current + shutter_duration * (pixel_y - src.shape[0] / 2)
            else:
                y_timestamp = current
            transform = getAccumulatedRotation(src.shape[1], src.shape[0], theta_x, theta_y, theta_z, timestamps, prev,
                                               current, f, gyro_delay, gyro_drift)
            output = cv2.perspectiveTransform(np.array([[pixel_x, pixel_y]], dtype="float32"), transform)
            current_row_transformed.append(output)
        pts.append(current_row)
        pts_transformed.append(current_row_transformed)
    o = utilities.meshwarp(src, pts_transformed)
    return o
I get the following error when it gets to output = cv2.perspectiveTransform(np.array([[pixel_x, pixel_y]], dtype="float32"), transform):
cv2.error: /Users/travis/build/skvark/opencv-python/opencv/modules/core/src/matmul.cpp:2271: error: (-215) scn + 1 == m.cols in function perspectiveTransform
Any help or suggestions would really be appreciated.
This implementation really needs to be changed in a future version, or the docs should be more clear.
From the OpenCV docs for perspectiveTransform():
src – input two-channel (...) floating-point array
Slant emphasis added by me.
>>> A = np.array([[0, 0]], dtype=np.float32)
>>> A.shape
(1, 2)
So we see from here that A is just a single-channel matrix, that is, two-dimensional. One row, two cols. You instead need a two-channel image, i.e., a three-dimensional matrix where the length of the third dimension is 2 or 3 depending on if you're sending in 2D or 3D points.
Long story short, you need to add one more set of brackets to make the set of points you're sending in three-dimensional, where the x values are in the first channel, and the y values are in the second channel.
>>> A = np.array([[[0, 0]]], dtype=np.float32)
>>> A.shape
(1, 1, 2)
Also, as suggested in the comments:
If you have an array points of shape (n_points, dimension) (i.e. dimension is 2 or 3), a nice way to re-format it for this use-case is points[np.newaxis]
It's not intuitive, and though it's documented, it's not very explicit on that point. That's all you need. I've answered an identical question before, but for the cv2.transform() function.
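Applied to the call from the question, the fix looks like this (a sketch; pixel_x, pixel_y and transform as defined there):
# Shape (1, 1, 2): one point, two channels (x in the first channel, y in the second)
pt = np.array([[[pixel_x, pixel_y]]], dtype="float32")
output = cv2.perspectiveTransform(pt, transform)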
I am trying to implement the Matlab function bwmorph(bw,'remove') in Python. This function removes interior pixels by setting a pixel to 0 if all of its 4-connected neighbor pixels are 1. The resulting image should contain only the boundary pixels. I've written some code, but I'm not sure this is the right way to do it.
# neighbors() function returns the values of the 4-connected neighbors
# bwmorph() function returns the input image with only the boundary pixels
def neighbors(input_matrix, input_array):
    indexRow = input_array[0]
    indexCol = input_array[1]
    output_array = []
    output_array.append(input_matrix[indexRow - 1, indexCol])
    output_array.append(input_matrix[indexRow, indexCol + 1])
    output_array.append(input_matrix[indexRow + 1, indexCol])
    output_array.append(input_matrix[indexRow, indexCol - 1])
    return output_array


def bwmorph(input_matrix):
    output_matrix = input_matrix.copy()
    nRows, nCols = input_matrix.shape
    for indexRow in range(0, nRows):
        for indexCol in range(0, nCols):
            center_pixel = [indexRow, indexCol]
            neighbor_array = neighbors(output_matrix, center_pixel)
            if neighbor_array == [1, 1, 1, 1]:
                output_matrix[indexRow, indexCol] = 0
    return output_matrix
Since you are using NumPy arrays, one suggestion I have is to change the if statement to use numpy.all to check whether all of the neighbour values are nonzero. In addition, you should make sure your input is a single-channel image. A grayscale image stored as a colour image has the same values in all channels, so just extract the first channel; your comments indicate a colour image, so make sure you do this. Finally, you are checking against the output matrix, which is being modified inside the loop; you need to check against an unmodified copy. This is also why you're getting a blank output.
def bwmorph(input_matrix):
    output_matrix = input_matrix.copy()
    # Change. Ensure single channel
    if len(output_matrix.shape) == 3:
        output_matrix = output_matrix[:, :, 0]
    nRows, nCols = output_matrix.shape  # Change
    orig = output_matrix.copy()  # Need another one for checking
    for indexRow in range(0, nRows):
        for indexCol in range(0, nCols):
            center_pixel = [indexRow, indexCol]
            neighbor_array = neighbors(orig, center_pixel)  # Change to use unmodified image
            if np.all(neighbor_array):  # Change
                output_matrix[indexRow, indexCol] = 0
    return output_matrix
In addition, a small grievance I have with your code is that you don't check for out-of-boundary conditions when determining the four neighbours. The test image you provided does not throw an error as you don't have any border pixels that are white. If you have a pixel along any of the borders, it isn't possible to check all four neighbours. However, one way to mitigate this would be to perhaps wrap around by using the modulo operator:
def neighbors(input_matrix, input_array):
    (rows, cols) = input_matrix.shape[:2]  # New
    indexRow = input_array[0]
    indexCol = input_array[1]
    output_array = [0] * 4  # New - I like pre-allocating
    # Edit
    output_array[0] = input_matrix[(indexRow - 1) % rows, indexCol]
    output_array[1] = input_matrix[indexRow, (indexCol + 1) % cols]
    output_array[2] = input_matrix[(indexRow + 1) % rows, indexCol]
    output_array[3] = input_matrix[indexRow, (indexCol - 1) % cols]
    return output_array
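A quick usage sketch (my addition) on a tiny binary array, assuming the two functions above are defined:
import numpy as np

bw = np.zeros((5, 5), np.uint8)
bw[1:4, 1:4] = 1          # a 3x3 white square
out = bwmorph(bw)
# Only the centre pixel (2, 2) has all four neighbours equal to 1, so it is set to 0;
# the remaining white pixels form the boundary of the square
print(out)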
I've devised a recursive function to handle a specific problem within the deep learning community. It seems to work quickly and well for most cases, but then takes ~20 minutes for other cases for seemingly no reason. The function, in the simplest case, can be abstracted as simply numpy's "repeat" function on two axes. Here's the code I used to test this function:
import time

import numpy as np


def recursive_upsample(fMap, index, dims):
    if index == 0:
        return fMap
    else:
        start = time.time()
        upscale = np.zeros((dims[index-1][0], dims[index-1][1], fMap.shape[-1]))
        if dims[index-1][0] % 2 == 1 and dims[index-1][1] % 2 == 1:
            crop = fMap[:fMap.shape[0]-1, :fMap.shape[1]-1]
            consX = fMap[-1, :][:-1]
            consY = fMap[:, -1][:-1]
            corner = fMap[-1, -1]
            crop = crop.repeat(2, axis=0).repeat(2, axis=1)
            upscale[:crop.shape[0], :crop.shape[1]] = crop
            upscale[-1, :][:-1] = consX.repeat(2, axis=0)
            upscale[:, -1][:-1] = consY.repeat(2, axis=0)
            upscale[-1, -1] = corner
        elif dims[index-1][0] % 2 == 1:
            crop = fMap[:fMap.shape[0]-1]
            consX = fMap[-1:, ]
            crop = crop.repeat(2, axis=0).repeat(2, axis=1)
            upscale[:crop.shape[0]] = crop
            upscale[-1:, ] = consX.repeat(2, axis=1)
        elif dims[index-1][1] % 2 == 1:
            crop = fMap[:, :fMap.shape[1]-1]
            consY = fMap[:, -1]
            crop = crop.repeat(2, axis=0).repeat(2, axis=1)
            upscale[:, :crop.shape[1]] = crop
            upscale[:, -1] = consY.repeat(2, axis=0)
        else:
            upscale = fMap.repeat(2, axis=0).repeat(2, axis=1)
        print('Upscaling from {} to {} took {} seconds'.format(fMap.shape, upscale.shape, time.time() - start))
        fMap = upscale
        return recursive_upsample(fMap, index-1, dims)


if __name__ == '__main__':
    dims = [(634, 1020, 64), (317, 510, 128), (159, 255, 256), (80, 128, 512), (40, 64, 512)]
    images = []
    for dim in dims:
        image = np.random.rand(dim[0], dim[1], dim[2])
        images.append(image)
    start = time.time()
    upsampled = []
    for index, image in enumerate(images):
        upsampled.append(recursive_upsample(image, index, dims))
    print('Upsampling took {} seconds'.format(time.time() - start))
For some odd reason, the point in the recursion where the feature map of shape (40,64,512) is being upsampled from shape (317,510,512) to (634,1020,512) takes an egregious 941 seconds! I'm starting to rewrite this code with Theano, but should I be looking at some underlying problem with my code? My reasoning right now is that computing this on the CPU is unwieldy, but I'm not sure what the hold-up is with such a simple function. Any tips on how to make this function faster would also be appreciated!
There's no need to do the recursion. E.g. for the (40,64,512) image you can directly do:
upsampled = image.repeat(16, axis=0).repeat(16, axis=1)[:634,:1020]
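More generally (my sketch, following the dims list from the question), each feature map at position index only needs a single repeat by a factor of 2**index followed by a crop to the target size:
def upsample(fMap, index, dims):
    # Repeat once by the total factor instead of doubling recursively
    scale = 2 ** index
    up = fMap.repeat(scale, axis=0).repeat(scale, axis=1)
    return up[:dims[0][0], :dims[0][1]]
Note this is only equivalent to the recursive version up to how the odd-sized intermediate shapes are handled; for the timing question that difference shouldn't matter.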
I'm trying to calibrate a fisheye camera using the OpenCV 3.0.0 Python bindings (with an asymmetric circle grid), but I'm having trouble formatting the object and image point arrays correctly. My current source looks like this:
import cv2
import glob
import numpy as np


def main():
    circle_diameter = 4.5
    circle_radius = circle_diameter/2.0
    pattern_width = 4
    pattern_height = 11
    num_points = pattern_width*pattern_height
    images = glob.glob('*.bmp')
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    imgpoints = []
    objpoints = []
    obj = []
    for i in range(pattern_height):
        for j in range(pattern_width):
            obj.append((
                float(2*j + i % 2)*circle_radius,
                float(i*circle_radius),
                0
            ))
    for name in images:
        image = cv2.imread(name)
        grayimage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        retval, centers = cv2.findCirclesGrid(grayimage, (pattern_width, pattern_height), flags=(cv2.CALIB_CB_ASYMMETRIC_GRID + cv2.CALIB_CB_CLUSTERING))
        imgpoints_tmp = np.zeros((num_points, 2))
        if retval:
            for i in range(num_points):
                imgpoints_tmp[i, 0] = centers[i, 0, 0]
                imgpoints_tmp[i, 1] = centers[i, 0, 1]
            imgpoints.append(imgpoints_tmp)
            objpoints.append(obj)
    # Conversion to numpy arrays
    imgpoints = np.array(imgpoints, dtype=np.float32)
    objpoints = np.array(objpoints, dtype=np.float32)
    K, D = cv2.fisheye.calibrate(objpoints, imgpoints, image_size=(1280, 800), K=None, D=None)


if __name__ == '__main__':
    main()
The error message is:
OpenCV Error: Assertion failed (objectPoints.type() == CV_32FC3 || objectPoints.type() == CV_64FC3) in cv::fisheye::calibrate
objpoints has shape (31,44,3).
So objpoints array needs to be formatted in a different way, but I'm not able to achieve the correct layout. Maybe someone can help here?
In the sample of OpenCV (Camera Calibration) they set the objp to objp2 = np.zeros((8*9,3), np.float32)
However, for an omnidirectional or fisheye camera, it should be:
objp = np.zeros((1,8*9,3), np.float32)
The idea is from here: Calibrate fisheye lens using OpenCV — part 1
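Applied to the asymmetric circle grid from the question, building the object points directly in that (1, N, 3) layout could look something like this (a sketch; pattern_width, pattern_height and circle_radius as defined in the question, and number_of_valid_images is a placeholder for however many views the grid was actually found in):
objp = np.zeros((1, pattern_width * pattern_height, 3), np.float32)
k = 0
for i in range(pattern_height):
    for j in range(pattern_width):
        objp[0, k] = (float(2 * j + i % 2) * circle_radius, float(i * circle_radius), 0)
        k += 1
# One copy per image in which the grid was detected (number_of_valid_images is hypothetical)
objpoints = [objp for _ in range(number_of_valid_images)]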
The correct layout of objpoints is a list of numpy arrays, with len(objpoints) = "number of pictures" and each entry being a numpy array.
Please have a look at the official help. The OpenCV documentation talks about "vectors", which here is the equivalent of a list or numpy.array. In this instance, a "vector of vectors" can be interpreted as a list of numpy.arrays.
The data type is correct, but the shape is not. The expected shape of objpoints is supposed to be (n_observations, 1, n_corners_per_observation, 3). Therefore, the code in your case should be:
objpoints = np.array(objpoints, dtype=np.float32).reshape(
    -1,
    1,
    pattern_width * pattern_height,
    3
)
or, more generally:
objpoints = np.array(objpoints, dtype=np.float32).reshape(
    n_observations,
    1,
    n_corners_per_observation,
    3
)
The error message is slightly misleading.
Didn't find a satisfying answer here so I messed around and eventually got this chunk to work:
calibration_flags = cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC + cv2.fisheye.CALIB_CHECK_COND + cv2.fisheye.CALIB_FIX_SKEW
# lists with each element a [1,n_points,_] array of type float32
obj_points = [np.random.rand(1,10,3).astype(np.float32)]
fisheye_points = [np.random.rand(1,10,2).astype(np.float32)]
# initialize empty variables of correct size and type, where total_num_points is summed across all arrays in each above list
rvecs = [np.zeros((1, 1, 3), dtype=np.float32) for i in range(total_num_points)]
tvecs = [np.zeros((1, 1, 3), dtype=np.float32) for i in range(total_num_points)]
D = np.zeros([4,1]).astype(np.float32)
K = np.zeros([3,3]).astype(np.float32)
outputs = cv2.fisheye.calibrate(obj_points, fisheye_points, (1920, 1080), K, D, rvecs, tvecs)