I am working on a project that attempts to remove the perspective distortion from an image based on the known orientation of the camera. My thinking is that I can create rotation matrices based on the known X, Y, and Z orientations of the camera, then apply them to the image via the WarpPerspective method.
In my script (written in Python) I have created three rotation matrices, each based on an orientation angle. I have gotten to the point where I am stuck on two issues. First, when I load an individual matrix into the WarpPerspective method, it doesn't seem to work correctly: warping the image about a single axis significantly over-warps it, and the contents of the image are only recognizable if I limit the orientation angle to around 1 degree or less.
Secondly, how do I combine the three rotation matrices into a single matrix to be loaded into the WarpPerspective method? Can I pass a 3x3 rotation matrix into that method, or do I have to create a 4x4 projective matrix? Below is the code that I am working on.
Thank you for your help.
CR
from numpy import *
import cv
#Sets angle of camera and converts to radians
x = -14 * (pi/180)
y = 20 * (pi/180)
z = 15 * (pi/180)
#Creates the Rotational Matrices
rX = array([[1, 0, 0], [0, cos(x), -sin(x)], [0, sin(x), cos(x)]])
rY = array([[cos(y), 0, -sin(y)], [0, 1, 0], [sin(y), 0, cos(y)]])
rZ = array([[cos(z), sin(z), 0], [-sin(z), cos(z), 0], [0, 0, 1]])
#Converts to CVMat format
X = cv.fromarray(rX)
Y = cv.fromarray(rY)
Z = cv.fromarray(rZ)
#Imports image file and creates destination filespace
im = cv.LoadImage("reference_image.jpg")
dst = cv.CreateImage(cv.GetSize(im), cv.IPL_DEPTH_8U, 3)
#Warps Image
cv.WarpPerspective(im, dst, X)
#Display
cv.NamedWindow("distorted")
cv.ShowImage("distorted", im)
cv.NamedWindow("corrected")
cv.ShowImage("corrected", dst)
cv.WaitKey(0)
cv.DestroyWindow("distorted")
cv.DestroyWindow("corrected")
You are doing several things wrong. First, you can't rotate about the x or y axis without a camera model. Imagine a camera with an incredibly wide field of view: you could hold it really close to an object and see the entire thing, but if that object rotated, its edges would seem to fly towards you very quickly with a strong perspective distortion. On the other hand, a small field of view (think telescope) has very little perspective distortion. A nice place to start is setting your image plane at least as far from the camera as it is wide and putting your object right on the image plane. That is what I did in this example (C++ OpenCV).
The steps are
construct a rotation matrix
center the image at the origin
rotate the image
move the image down the z axis
multiply by the camera matrix
warp the perspective
//1
float x = -14 * (M_PI/180);
float y = 20 * (M_PI/180);
float z = 15 * (M_PI/180);
cv::Matx31f rot_vec(x,y,z);
cv::Matx33f rot_mat;
cv::Rodrigues(rot_vec, rot_mat); //converts to a rotation matrix
cv::Matx33f translation1(1, 0, -image.cols/2,
                         0, 1, -image.rows/2,
                         0, 0, 1);
rot_mat(0,2) = 0;
rot_mat(1,2) = 0;
rot_mat(2,2) = 1;
//2 and 3
cv::Matx33f trans = rot_mat*translation1;
//4
trans(2,2) += image.rows;
cv::Matx33f camera_mat(image.rows, 0, image.rows/2,
                       0, image.rows, image.rows/2,
                       0, 0, 1);
//5
cv::Matx33f transform = camera_mat*trans;
//6
cv::Mat final;
cv::warpPerspective(image, final, cv::Mat(transform),image.size());
This code gave me this output
I did not see Franco's answer until I posted this. He is completely correct: using FindHomography would save you all these steps. Still, I hope this is useful.
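For a Python reader, a rough cv2/NumPy sketch of the same six steps might look like the following. This is a loose translation rather than the original code, with the image height again used as a stand-in focal length:
import cv2
import numpy as np

image = cv2.imread("reference_image.jpg")
h, w = image.shape[:2]

# 1: rotation matrix from the three angles (Rodrigues of a rotation vector)
rot_vec = np.radians([-14.0, 20.0, 15.0]).reshape(3, 1)
rot_mat, _ = cv2.Rodrigues(rot_vec)

# 2: move the image centre to the origin
translation1 = np.array([[1, 0, -w / 2],
                         [0, 1, -h / 2],
                         [0, 0, 1]], dtype=np.float64)

# replace the third column, mirroring the C++ code above
rot_mat[0, 2] = 0
rot_mat[1, 2] = 0
rot_mat[2, 2] = 1

# 3: rotate, then 4: push the plane further down the z axis
trans = rot_mat @ translation1
trans[2, 2] += h

# 5: simple pinhole camera matrix with focal length ~ image height
camera_mat = np.array([[h, 0, h / 2],
                       [0, h, h / 2],
                       [0, 0, 1]], dtype=np.float64)

# 6: warp
transform = camera_mat @ trans
warped = cv2.warpPerspective(image, transform, (w, h))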
Just knowing the rotation is not enough, unless your images are taken either with a telecentric lens or with a telephoto lens with a very long focal length (in which cases the images are nearly orthographic, and there is no perspective distortion).
Besides, it's not necessary. True, you can undo the perspective foreshortening of one plane in the image by calibrating the camera (i.e. estimating the intrinsic and extrinsic parameters to form the full camera projection matrix).
But you achieve the same result much more simply if you can identify in the image a quadrangle which is the image of a real-world square (or rectangle with known width/height ratio). If you can do that, you can trivially compute the homography matrix that maps the square (rectangle) to the quadrangle, then warp using its inverse.
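As a minimal sketch of that approach, assuming you can supply the four corner points yourself (the pixel coordinates below are made up, and here the homography is computed directly in the quadrangle-to-square direction, so it is passed to warpPerspective as-is rather than inverted):
import cv2
import numpy as np

img = cv2.imread("reference_image.jpg")

# Hypothetical pixel corners of a quadrangle that is the image of a
# real-world square, ordered top-left, top-right, bottom-right, bottom-left.
quad = np.float32([[152, 210], [489, 180], [520, 430], [130, 470]])

# Where those corners should land after rectification: a square whose
# side length in pixels is an arbitrary choice of output scale.
side = 400
square = np.float32([[0, 0], [side, 0], [side, side], [0, side]])

H = cv2.getPerspectiveTransform(quad, square)  # maps quad -> square
rectified = cv2.warpPerspective(img, H, (side, side))
With more than four correspondences (or noisy ones), cv2.findHomography does the same job with a least-squares or RANSAC fit.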
The Wikipedia page on rotation matrices shows how it is possible to combine the three basic rotation matrices into one.
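For the second part of the question, combining the three matrices is just a matrix product; a NumPy sketch using the standard right-handed conventions (which differ in sign from the matrices in the question, and whose composition order is a choice):
import numpy as np

x, y, z = np.radians([-14, 20, 15])  # same angles as in the question

rX = np.array([[1, 0, 0],
               [0, np.cos(x), -np.sin(x)],
               [0, np.sin(x),  np.cos(x)]])
rY = np.array([[ np.cos(y), 0, np.sin(y)],
               [ 0,         1, 0        ],
               [-np.sin(y), 0, np.cos(y)]])
rZ = np.array([[np.cos(z), -np.sin(z), 0],
               [np.sin(z),  np.cos(z), 0],
               [0,          0,         1]])

# One common composition: rotate about X first, then Y, then Z.
R = rZ @ rY @ rX
Note that, as the other answers explain, a bare 3x3 rotation is still not a useful warpPerspective matrix on its own; it has to be combined with a translation and a camera matrix.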
Suppose that I have an input x of size [H, W], and also a mu_x and mu_y (which may be fractional) representing the number of pixels to shift in the x and y directions. Is there any efficient way in PyTorch, without using C++, to shift the tensor x by mu_x and mu_y units with bilinear interpolation?
To be more precise, say mu_x = 5 and mu_y = 3: we want to shift the image so that it moves rightward 5 pixels and downward 3 pixels, with the pixels pushed out of the [H, W] boundary removed and the new pixels introduced at the other end of the boundary set to 0. However, with fractional mu_x and mu_y, we need bilinear interpolation to estimate the resulting image.
Can this be implemented with pure PyTorch tensor operations, or do I need to use C++?
I believe you can achieve this by applying grid sampling on your original input and using a grid to guide the sampling process. If you take a coordinate grid of your image and sample using that the resulting image will be equal to the original image. However you can apply a shift on this grid and therefore sample with the given shift. Grid sampling works with floating-point grids of course, which means you can apply an arbitrary non-round shift to your image and choose a sampling mode (bilinear is the default).
This can be implemented out of the box with F.grid_sample. Given an image tensor img, we first construct a pixel grid of that image using torch.meshgrid. Keep in mind the grid used by the sampler must be normalized to [-1, 1]: pixel (x=0, y=0) should be mapped to (-1, -1), pixel (x=w-1, y=h-1) to (1, 1), and the center pixel ends up at around (0, 0).
Use two torch.arange with a [0,1]-normalization followed by a remapping to [-1,1]:
>>> c,h,w = img.shape
>>> x, y = torch.arange(h)/(h-1), torch.arange(w)/(w-1)
>>> grid = torch.dstack(torch.meshgrid(x, y))*2-1
The resulting grid has a shape of (h, w, 2); its spatial dimensions (h, w) determine the size of the output image produced by the sampling process.
Since we are not working with batched elements, we need to unsqueeze singleton dimensions on both img and grid. Then we can apply F.grid_sample:
>>> sampled = F.grid_sample(img[None], grid[None])
Following this you can apply your arbitrary mu_x, mu_y shift, and this also extends easily to batches of images and shifts. The way you define the sampling is by defining a shifted grid:
>>> x_s, y_s = (torch.arange(h)+mu_y)/(h-1), (torch.arange(w)+mu_x)/(w-1)
where mu_x and mu_y are the (floating-point) values in pixels by which the image is shifted on the horizontal and vertical axes respectively. To acquire the sampled image, apply F.grid_sample on a grid made up of x_s and y_s:
>>> grid_shifted = torch.dstack(torch.meshgrid(x_s, y_s))*2-1
>>> sampled = F.grid_sample(img[None], grid_shifted[None])
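Putting the pieces together, here is a self-contained sketch (my own consolidation rather than the exact snippets above): it flips the sign of the shift so that positive mu_x / mu_y move the content right / down as asked, fills the revealed border with zeros, and assumes a recent PyTorch for the indexing argument of torch.meshgrid.
import torch
import torch.nn.functional as F

def shift_bilinear(img, mu_x, mu_y):
    """Shift a (C, H, W) image by (mu_x, mu_y) pixels (possibly fractional)
    using F.grid_sample; out-of-bounds areas are filled with zeros."""
    c, h, w = img.shape
    # Sampling at (x - mu_x, y - mu_y) moves the content by (+mu_x, +mu_y).
    ys = (torch.arange(h, dtype=img.dtype) - mu_y) / (h - 1)
    xs = (torch.arange(w, dtype=img.dtype) - mu_x) / (w - 1)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack((gx, gy), dim=-1) * 2 - 1   # (H, W, 2), (x, y) order, in [-1, 1]
    return F.grid_sample(img[None], grid[None], mode="bilinear",
                         padding_mode="zeros", align_corners=True)[0]

img = torch.rand(3, 64, 64)
shifted = shift_bilinear(img, mu_x=5.3, mu_y=2.7)  # content moves right and down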
I am trying to change the perspective of the first image according to the second image. For finding the homography matrix I have found the coordinates of a common object (a white notice board) in both images.
Image 1
Image 2
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
img1 = cv2.imread("rgb per.jpg")
img2 = cv2.imread("IR per.jpg")
img_1_coor = np.float32([[1178,425], [1201,425], [1178,439], [1201,439], [1551,778]]) #coordinate of white notice board in rgb image
img_2_coor = np.float32([[370,98], [381,103], [367,107], [380,112], [498,332]]) #coordinate of same object in IR image
for x in range(0, 4):
    cv2.circle(img1, (int(img_1_coor[x][0]), int(img_1_coor[x][1])), 5, (255, 0, 0), 1)  # cv2.circle expects integer coordinates
matplotlib.rcParams['figure.dpi'] = 300
#plt.imshow(img1) #this verified that the found coordinates are correct
#P = cv2.getPerspectiveTransform(img_1_coor,img_2_coor)
H, s = cv2.findHomography(img_1_coor, img_2_coor)
print(s)
perspective = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))  # dsize is (width, height)
plt.imshow(perspective)
#the resulting image
Output image
I want the output to be like image 1 but with the point of view (camera angle) of image 2. If this is not possible, vice versa would also be helpful: image 2 with the point of view of image 1.
Can someone tell me whether it is possible to use the coordinates of an object in the image to change the perspective of the full image, and if so, whether there is any problem with my code?
I recommend a computer vision package that implements the perspective transform (e.g. OpenCV or whatever they use nowadays). It is a fairly common computer vision operation. It is also known less properly as corner-pinning.
If you wish to implement it yourself as an exercise, you may look at the Direct Linear Transform where you solve your matrix problem in homogenous coordinates, possibly by using a singular value decomposition.
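If you do want to try the DLT yourself, a minimal NumPy sketch (without the point normalization that a robust implementation such as cv2.findHomography adds) could look like this:
import numpy as np

def dlt_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from N >= 4 point
    pairs: stack the 2N x 9 linear system and take the right singular
    vector belonging to the smallest singular value."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # assumes the bottom-right entry is not (near) zero

# e.g. with the question's correspondences:
# H = dlt_homography(img_1_coor, img_2_coor)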
We need to detect whether the images produced by our tunable lens are blurred or not.
We want to find a proxy measure for blurriness.
My current thinking is to first apply a Sobel filter along the x direction, because the jumps (the stripes) are mostly along this direction, then compute the x-direction marginal means, and finally compute the standard deviation of these marginal means.
We expect this standard deviation to be larger for a clear image and smaller for a blurred one, because clear images should have larger jumps in pixel values.
But we get the opposite results. How could we improve this blurriness measure?
import cv2
import matplotlib.pyplot as plt

def sobel_image_central_std(PATH):
    # use the blue channel
    img = cv2.imread(PATH)[:, :, 0]
    # extract the central part of the image
    hh, ww = img.shape
    hh2 = hh // 2
    ww2 = ww // 2
    hh4 = hh // 4
    ww4 = ww // 4
    img_center = img[hh4:(hh2 + hh4), ww4:(ww2 + ww4)]
    # Sobel operator along x
    sobelx = cv2.Sobel(img_center, cv2.CV_64F, 1, 0, ksize=3)
    # marginal means of the x-gradient (one mean per column)
    x_marginal = sobelx.mean(axis=0)
    plt.plot(x_marginal)
    return x_marginal.std()
(Example images: Blur #1, Blur #2, Clear #1, Clear #2)
In general:
Is there a way to detect if an image is blurry?
You can combine this calculation with your other question, where you are searching for the central angle.
Once you have the angle (and the center, maybe outside of the image) you can make an axis transformation to remove the circular component of the cone. Instead you get x (radius) and y (angle) where y would run along the circular arcs.
Maybe you can get the center of the image from the camera set-up.
Then you don't need to calculate it using the intersection of the edges from the central angle. Or just do it manually once if it is fixed for all images.
Look at polar coordinate systems.
Due to the shape of the cone the image will be more dense at the peak, but this should be a fixed factor. It will probably bias the result when calculating the blurriness along the transformed image, though.
So what you could do to correct this is create a synthetic cone image with circular lines and apply the transformation to it. Again, this requires some trial and error.
But it should deliver some mask that you could use to correct the "blurriness bias".
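If it helps, here is a rough sketch of that unrolling with OpenCV's warpPolar; the file name, the center (which may lie outside the image) and the maximum radius are placeholders for values from your set-up:
import cv2

img = cv2.imread("cone_image.png")    # hypothetical file name
center = (640.0, -250.0)              # assumed apex of the cone, here above the image
max_radius = 1200.0                   # assumed radius covering the arcs of interest

# In the output, columns run along the radius and rows along the angle,
# so the circular arcs become (approximately) straight horizontal lines.
unrolled = cv2.warpPolar(img, (800, 600), center, max_radius,
                         cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR)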
The goal is to get the Bird's Eye View from KITTI images (dataset), and I have the Projection Matrix (3x4).
There are many ways to generate transformation matrices. For the bird's eye view I have read some mathematical expressions, like:
H12 = H2 * H1^-1 = A * R * A^-1 = P * A^-1 in OpenCV - Projection, homography matrix and bird's eye view
and x = Pi * Tr * X in kitti dataset camera projection matrix
but none of these options worked for my purpose.
PYTHON CODE
import numpy as np
import cv2
image = cv2.imread('Data/RGB/000007.png')
maxHeight, maxWidth = image.shape[:2]
# M has 3x4 dimensions
M = np.array(([721.5377, 0.0, 609.5593, 44.85728], [0.0, 721.5377, 72.854, 0.2163791], [0.0, 0.0, 1.0, .002745884]))
# A 3x3 matrix M would be necessary here
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)
I need to know how to use the projection matrix to obtain the bird's eye view.
So far, everything I've tried throws warped images at me, without information even close to what I need.
This is an example of an image from the KITTI dataset.
This is another example of an image from the KITTI dataset.
On the left, images are shown detecting cars in 3D (above) and 2D (below). On the right is the Bird's Eye View that I want to obtain. Therefore, I need to obtain the transformation matrix to transform the coordinates of the boxes that delimit the cars.
Here is my code to manually build a bird's eye view transform:
cv::Mat1d CameraModel::getInversePerspectiveMapping(double pixelPerMeter, cv::Point const & origin) const {
    double f = pixelPerMeter * cameraPosition()[2];
    cv::Mat1d R(3,3);
    R << 0, 1, 0,
         1, 0, 0,
         0, 0, 1;
    cv::Mat1d K(3,3);
    K << f, 0, origin.x,
         0, f, origin.y,
         0, 0, 1;
    cv::Mat1d transformtoGround = K * R * mCameraToCarMatrix(cv::Range(0,3), cv::Range(0,3));
    return transformtoGround * mIntrinsicMatrix.inv();
}
The member variables/functions used inside the functions are
mCameraToCarMatrix: a 4x4 matrix holding the homogeneous rigid transformation from the camera's coordinate system to the car's coordinate system. The camera's axes are x-right, y-down, z-forward. The car's axes are x-forward, y-left, z-up. Within this function only the rotation part of mCameraToCarMatrix is used.
mIntrinsicMatrix: the 3x3 matrix holding the camera's intrinsic parameters
cameraPosition()[2]: the Z-coordinate (height) of the camera in the car's coordinate frame. It's the same as mCameraToCarMatrix(2,3).
The function parameters:
pixelPerMeter: the resolution of the bird's eye view image. A distance of 1 meter on the XY plane will translate to pixelPerMeter pixels in the bird's eye view image.
origin: the camera's position in the bird's eye view image
You can pass the transform matrix to cv::initUndistortRectifyMap() as newCameraMatrix and then use cv::remap to create the bird's eye view image.
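For a Python/KITTI flavour of the same ground-plane idea, here is a hedged sketch built directly from the question's 3x4 projection matrix; it assumes a flat ground plane at a fixed camera height instead of a full camera-to-car matrix, and the height, pixels-per-metre scale and output size are values you would tune:
import numpy as np
import cv2

def bev_homography(P, cam_height=1.65, ppm=10.0, bev_size=(400, 600)):
    """Homography mapping image pixels to a bird's-eye view of the ground
    plane (assumed flat, cam_height metres below the camera).
    ppm is the output resolution in pixels per metre."""
    bev_w, bev_h = bev_size
    # A ground point (X, cam_height, Z) in camera coordinates projects as
    # p ~ P @ [X, cam_height, Z, 1], so ground (X, Z) -> image is the 3x3:
    H_ground_to_img = np.column_stack((P[:, 0], P[:, 2],
                                       cam_height * P[:, 1] + P[:, 3]))
    # Ground metres -> BEV pixels: X (right) -> u, Z (forward) -> up in the image.
    G = np.array([[ppm,  0.0, bev_w / 2.0],
                  [0.0, -ppm, bev_h],
                  [0.0,  0.0, 1.0]])
    return G @ np.linalg.inv(H_ground_to_img)

image = cv2.imread('Data/RGB/000007.png')
M = np.array([[721.5377, 0.0, 609.5593, 44.85728],
              [0.0, 721.5377, 72.854, 0.2163791],
              [0.0, 0.0, 1.0, 0.002745884]])
H = bev_homography(M)
bev = cv2.warpPerspective(image, H, (400, 600))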
I am trying to scale a set of images in skimage. I am using the following code, which works well, except that the rescaled image (by a factor of 2) is now anchored to the top-left corner (see below). I would like the image to remain centred as in the original. Is there a simple way to achieve this? My aim is for the saved copy of the image (e.g. as a jpg file) to remain centred; my question does not concern the display of the image through imshow. E.g. when I save the image per below, it is anchored to the upper left, which causes issues with subsequent steps in my code.
###Part of the code
import skimage.transform
import matplotlib.pyplot as plt
import scipy.misc

tform = skimage.transform.SimilarityTransform(scale=2, rotation=0, translation=(0, 0))
rotated = skimage.transform.warp(test, tform)
plt.imshow(rotated)
scipy.misc.imsave('rotated.jpg', rotated)
Scaling by itself is one particular subset of affine transformations.
The affine transformation matrix for scaling only is defined as
s_x, 0, 0
0, s_y, 0
0, 0, 1
where s_x and s_y are the scaling factors in the respective dimensions (defined relative to the origin at (0, 0)). If you want your image to be scaled not relative to the origin but relative to another point, you first translate the image so that the center of scaling lies at the origin, then you scale, then you move the image back. You simply do a matrix multiplication of your transform matrices with the scale matrix. I had a similar problem with rotation, which can be found here. The same principle applies to this problem. The result is
s_x, 0, (-s_x*x)+x
0, s_y, (-s_y*y)+y
0, 0, 1
where x and y are half the size of your image in the respective dimensions.
The resulting matrix can be used with:
skimage.transform.AffineTransform(matrix)
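A short sketch of the whole thing, using skimage's bundled camera image as a stand-in for your own data:
import numpy as np
from skimage import data, transform

image = data.camera()                 # stand-in for your own image
s = 2.0                               # scale factor
rows, cols = image.shape[:2]
cx, cy = cols / 2.0, rows / 2.0       # centre in (x, y) = (col, row) order

# translate(-centre) -> scale -> translate(+centre), as derived above
matrix = np.array([[s, 0, (-s * cx) + cx],
                   [0, s, (-s * cy) + cy],
                   [0, 0, 1]])

tform = transform.AffineTransform(matrix=matrix)
# warp() expects the map from output to input coordinates, so pass the inverse
scaled = transform.warp(image, tform.inverse)
The corners that are scaled outside the original frame are clipped; if you need to keep them, enlarge the output via the output_shape argument of warp.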