How to get Bird's Eye View from KITTI by Projection Matrix? - python

The goal is to get the Bird's Eye View from KITTI images (dataset), and I have the Projection Matrix (3x4).
There are many ways to generate transformation matrices. For Bird's Eye View I have read some kind math expressions, like:
H12 = H2 * H1^-1 = A * R * A^-1 = P * A^-1 in "OpenCV - Projection, homography matrix and bird's eye view"
and x = Pi * Tr * X in kitti dataset camera projection matrix
but none of these options worked for my purpose.
PYTHON CODE
import numpy as np
import cv2
image = cv2.imread('Data/RGB/000007.png')
maxHeight, maxWidth = image.shape[:2]
# M is the 3x4 KITTI projection matrix
M = np.array([[721.5377, 0.0, 609.5593, 44.85728],
              [0.0, 721.5377, 172.854, 0.2163791],
              [0.0, 0.0, 1.0, 0.002745884]])
# warpPerspective needs a 3x3 matrix here, but M is 3x4
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)
I need to know how to manage the Projection Matrix for getting Bird's Eye View.
So far, everything I've tried produces warped images containing nothing even close to the information I need.
This is an example image from the KITTI dataset.
This is another example image from the KITTI dataset.
On the left, images are shown detecting cars in 3D (above) and 2D (below). On the right is the Bird's Eye View that I want to obtain. Therefore, I need to obtain the transformation matrix to transform the coordinates of the boxes that delimit the cars.

Here is my code to manually build a bird's eye view transform:
cv::Mat1d CameraModel::getInversePerspectiveMapping(double pixelPerMeter, cv::Point const & origin) const {
    double f = pixelPerMeter * cameraPosition()[2];
    cv::Mat1d R(3,3);
    R << 0, 1, 0,
         1, 0, 0,
         0, 0, 1;
    cv::Mat1d K(3,3);
    K << f, 0, origin.x,
         0, f, origin.y,
         0, 0, 1;
    cv::Mat1d transformtoGround = K * R * mCameraToCarMatrix(cv::Range(0,3), cv::Range(0,3));
    return transformtoGround * mIntrinsicMatrix.inv();
}
The member variables/functions used inside the functions are
mCameraToCarMatrix: a 4x4 matrix holding the homogeneous rigid transformation from the camera's coordinate system to the car's coordinate system. The camera's axes are x-right, y-down, z-forward. The car's axes are x-forward, y-left, z-up. Within this function only the rotation part of mCameraToCarMatrix is used.
mIntrinsicMatrix: the 3x3 matrix holding the camera's intrinsic parameters
cameraPosition()[2]: the Z-coordinate (height) of the camera in the car's coordinate frame. It's the same as mCameraToCarMatrix(2,3).
The function parameters:
pixelPerMeter: the resolution of the bird's eye view image. A distance of 1 meter on the XY plane will translate to pixelPerMeter pixels in the bird's eye view image.
origin: the camera's position in the bird's eye view image
You can pass the transform matrix to cv::initUndistortRectifyMap() as newCameraMatrix and then use cv::remap to create the bird's eye view image.
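For reference, here is a rough Python translation of the same construction, applied directly with cv2.warpPerspective instead of the initUndistortRectifyMap/remap route (which means lens distortion is ignored). All concrete values below (the camera-to-car extrinsics, pixel_per_meter, origin, output size) are placeholders you would fill in from the KITTI calibration files.

import cv2
import numpy as np

def bev_homography(K, cam_to_car, pixel_per_meter, origin):
    # K          : 3x3 camera intrinsic matrix
    # cam_to_car : 4x4 rigid transform, camera frame -> car frame
    # origin     : (ox, oy) pixel where the camera should sit in the BEV image
    f = pixel_per_meter * cam_to_car[2, 3]      # camera height above ground, scaled to pixels
    R = np.array([[0., 1., 0.],
                  [1., 0., 0.],
                  [0., 0., 1.]])                # swap car x/y axes for the image layout
    K_bev = np.array([[f, 0., origin[0]],
                      [0., f, origin[1]],
                      [0., 0., 1.]])
    to_ground = K_bev @ R @ cam_to_car[:3, :3]
    return to_ground @ np.linalg.inv(K)

# Example usage (placeholder values):
# H = bev_homography(K, cam_to_car, pixel_per_meter=10.0, origin=(400, 780))
# bev = cv2.warpPerspective(image, H, (800, 800))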

Related

Project 3D mesh on 2d image using camera intrinsic matrix

I've been trying to use the HOnnotate dataset to extract perspective correct hand and object masks as shown in the images of Task-3 of the Hands-2019 challenge.
The data set comes with the following annotations:
The annotations are provided in pickled files under meta folder for each sequence. The pickle files in the training data contain a dictionary with the following keys:
objTrans: A 3x1 vector representing object translation
objRot: A 3x1 vector representing object rotation in axis-angle representation
handPose: A 48x1 vector representing the 3D rotation of the 16 hand joints including the root joint in axis-angle representation. The ordering of the joints follows the MANO model convention (see joint_order.png) and can be directly fed to the MANO model.
handTrans: A 3x1 vector representing the hand translation
handBeta: A 10x1 vector representing the MANO hand shape parameters
handJoints3D: A 21x3 matrix representing the 21 3D hand joint locations
objCorners3D: A 8x3 matrix representing the 3D bounding box corners of the object
objCorners3DRest: A 8x3 matrix representing the 3D bounding box corners of the object before applying the transformation
objName: Name of the object as given in YCB dataset
objLabel: Object label as given in YCB dataset
camMat: Intrinsic camera parameters
handVertContact: A 778D boolean vector where each element indicates whether the corresponding MANO vertex is in contact with the object. A MANO vertex is in contact if its distance to the object surface is <4mm
handVertDist: A 778D float vector representing the distance of MANO vertices to the object surface.
handVertIntersec: A 778D boolean vector specifying if the MANO vertices are inside the object surface.
handVertObjSurfProj: A 778x3 matrix representing the projection of MANO vertices on the object surface.
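For orientation, here is a minimal sketch of loading one of these pickle files and pulling out a few of the keys above. The file path is hypothetical; adjust it to your sequence/frame layout.

import pickle
import numpy as np

# hypothetical path: <split>/<sequence>/meta/<frame_id>.pkl
with open("train/ABF10/meta/0000.pkl", "rb") as f:
    anno = pickle.load(f)  # add encoding='latin1' if the file was pickled with Python 2

cam_mat = np.asarray(anno["camMat"])            # 3x3 camera intrinsics
hand_joints = np.asarray(anno["handJoints3D"])  # 21x3 hand joint locations
obj_corners = np.asarray(anno["objCorners3D"])  # 8x3 object bounding box corners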
It also comes with a visualization script (https://github.com/shreyashampali/ho3d) that can render the annotations as 3D meshes (using Open3D) or as 2D projections of the object corners and hand points (using Matplotlib):
What I am trying to do is project the visualization created by Open3D back to the original image.
So far I have not been able to do this. What I have been able to do is get the point cloud from the 3D mesh and apply the camera intrinsics to make it perspective-correct. The question now is how to create a mask out of the point cloud for both the hand and the object, like the one from the Open3D rendering.
# code looks as follows
# "mesh" is an Open3D triangle mesh ie "open3d.geometry.TriangleMesh()"
pcd = open3d.geometry.PointCloud()
pcd.points = mesh.vertices
pcd.colors = mesh.vertex_colors
pcd.normals = mesh.vertex_normals
pts3D = np.asarray(pcd.points)
# hand/object along negative z-axis so need to correct perspective when plotting using OpenCV
cord_change_mat = np.array([[1., 0., 0.], [0, -1., 0.], [0., 0., -1.]], dtype=np.float32)
pts3D = pts3D.dot(cord_change_mat.T)
# "anno['camMat']" is camera intrinsic matrix
img_points, _ = cv2.projectPoints(pts3D, (0, 0, 0), (0, 0, 0), anno['camMat'], np.zeros(4, dtype='float32'))
# draw perspective correct point cloud back on the image
for point in img_points:
    p1, p2 = int(point[0][0]), int(point[0][1])
    img[p2, p1] = (255, 255, 255)
Basically, I'm trying to get this segmentation mask out:
PS. Sorry if this doesn't make much sense; I'm very new to 3D meshes, point clouds and their projections, and I don't know all the correct technical terms yet. Leave a comment with a question and I will try to explain as best I can.
Turns out there is an easy way to do this task using Open3D and the camera intrinsic values. Basically we instruct Open3D to render the image from the POV of the camera.
import open3d
import open3d.visualization.rendering as rendering
import numpy as np
import cv2
# Create a renderer with a set image width and height
render = rendering.OffscreenRenderer(img_width, img_height)
# setup camera intrinsic values
pinhole = open3d.camera.PinholeCameraIntrinsic(img_width, img_height, fx, fy, cx, cy)
# Pick a background colour of the rendered image, I set it as black (default is light gray)
render.scene.set_background([0.0, 0.0, 0.0, 1.0]) # RGBA
# now create your mesh
mesh = open3d.geometry.TriangleMesh()
mesh.paint_uniform_color([1.0, 0.0, 0.0]) # set Red color for mesh
# define further mesh properties, shape, vertices etc (omitted here)
# Define a simple unlit Material.
# (The base color does not replace the mesh's own colors.)
mtl = rendering.Material()  # on newer Open3D releases this class may be rendering.MaterialRecord
mtl.base_color = [1.0, 1.0, 1.0, 1.0] # RGBA
mtl.shader = "defaultUnlit"
# add mesh to the scene
render.scene.add_geometry("MyMeshModel", mesh, mtl)
# render the scene with respect to the camera
render.scene.camera.set_projection(camMat, 0.1, 1.0, img_width, img_height)  # match the renderer size
img_o3d = render.render_to_image()
# we can now save the rendered image right at this point
open3d.io.write_image("output.png", img_o3d, 9)
# Optionally, we can convert the image to OpenCV format and play around.
# For my use case I mapped it onto the original image to check quality of
# segmentations and to create masks.
# (Note: OpenCV expects the color in BGR format, so swap red and blue.)
img_cv2 = cv2.cvtColor(np.array(img_o3d), cv2.COLOR_RGBA2BGR)
cv2.imwrite("cv_output.png", img_cv2)
This answer borrows a lot from an earlier Stack Overflow answer.

Image Processing: how to imwarp with simple mask on destination?

Following up on my own question from 4 years ago, this time in Python only:
I am looking for a way to perform texture mapping into a small region in a destination image, defined by 4 corners given as (x, y) pixel coordinates. This region is not necessarily rectangular. It is a perspective projection of some rectangle onto the image plane.
I would like to map some (rectangular) texture into the mask defined by those corners.
Mapping directly by forward-mapping the texture will not work properly, as source pixels will be mapped to non-integer locations in the destination.
This problem is usually solved by inverse-warping from the destination to the source, then coloring according to some interpolation.
Opencv's warpPerspective doesn't work here, as it can't take a mask in.
Inverse-warping the entire destination image and then masking is not acceptable, because most of the computation is redundant.
Is there a built-in OpenCV (or other) function that accomplishes the above requirements?
If not, what is a good way to get the list of pixels inside my ROI defined by the corners, so I can pass them to projectPoints?
Example background image:
I want to fill the area outlined by the red lines (defined by its corners) with some other texture, say this one
Mapping between them can be obtained by mapping the texture's corners to the ROI corners with cv2.getPerspectiveTransform
For future generations, here is how to only back and forward warp pixels within the bbox of the warped corner points, as #Micka suggested.
Here, banner is the grass image and banner_coords_2d are the corners of the red region on image, which is the meme-man picture.
def transform_banner(banner_coords_2d, banner, image):
    # show_points_on_image("banner corners", image, banner_coords_2d)
    banner_height, banner_width, _ = banner.shape
    src_banner_points = np.float32([
        [0, 0],
        [banner_width - 1, 0],
        [0, banner_height - 1],
        [banner_width - 1, banner_height - 1],
    ])
    # only warp to size of bbox of warped corners, not all of the image
    warped_left = np.round(np.min(banner_coords_2d[:, 0])).astype(int)
    warped_right = np.round(np.max(banner_coords_2d[:, 0])).astype(int)
    warped_top = np.round(np.min(banner_coords_2d[:, 1])).astype(int)
    warped_bottom = np.round(np.max(banner_coords_2d[:, 1])).astype(int)
    warped_width = int(warped_right - warped_left)
    warped_height = int(warped_bottom - warped_top)
    # express the destination corners relative to the bbox origin
    dst_banner_points = banner_coords_2d.astype(np.float32)
    dst_banner_points[:, 0] -= warped_left
    dst_banner_points[:, 1] -= warped_top
    tform = cv2.getPerspectiveTransform(src_banner_points, dst_banner_points)
    warped_banner = cv2.warpPerspective(banner, tform, (warped_width, warped_height))
    # cv2.imshow("warped_banner", warped_banner)
    # paste only the non-zero warped pixels into the bbox region of the image
    image_with_banner = image.copy()
    roi = image_with_banner[warped_top:warped_bottom, warped_left:warped_right]
    roi[warped_banner != 0] = warped_banner[warped_banner != 0]
    # cv2.imshow("image_with_banner", image_with_banner)
    return image_with_banner
Likely, this can be done more neatly, I am open to edits.

Perspective transform with whole image

So I have four points in an array A, denoting the corners of a rectangular object (but not a rectangle when projected onto the image plane). I know the size of the rectangle, so I can calculate the perspective transform with
cv2.getPerspectiveTransform(four_corners, np.array([[0, 0], [0, height], [width, height], [width, 0]], dtype=np.float32))
Then I can transform the image with cv2.warpPerspective.
The problem (which can be demonstrated by another person's question, Counting aspect ratio of Perspective Transform destination image) is that the warped result is cropped. Only the region inside the four corners is included in the final image, while I want the whole original image to be warped into the final result.
How do I achieve that?
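One common way to keep the whole image (sketched here under the question's own names A, width and height, plus an image array; this is not from an accepted answer) is to warp the source image's own corners with the same homography, then prepend a translation and enlarge the output canvas so nothing is clipped:

import cv2
import numpy as np

# A: 4x2 float32 array of the object's corners in the source image
M = cv2.getPerspectiveTransform(
    A, np.array([[0, 0], [0, height], [width, height], [width, 0]], dtype=np.float32))

# Warp the image's own corners to find the extent of the result.
h, w = image.shape[:2]
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
warped = cv2.perspectiveTransform(corners, M).reshape(-1, 2)
x_min, y_min = np.floor(warped.min(axis=0)).astype(int)
x_max, y_max = np.ceil(warped.max(axis=0)).astype(int)

# Shift everything so no pixel lands at negative coordinates, then warp onto a big enough canvas.
T = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=np.float64)
full = cv2.warpPerspective(image, T @ M, (x_max - x_min, y_max - y_min))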

Scale and Centre image - Skimage

I am trying to scale a set of images in skimage. I am using the following code, which works well, except that the rescaled image (by a factor of 2) ends up anchored to the top-left corner (see below). I would like the image to stay centred on the original centre. Is there a simple way to achieve this? My aim is for the saved copy of the image (e.g. as a jpg file) to remain centred; my question does not concern displaying the image through imshow. E.g. when I save the image per the code below, it is anchored to the upper left, which causes issues with subsequent steps in my code.
###Part of the code
import skimage.transform
import matplotlib.pyplot as plt
import scipy.misc

tform = skimage.transform.SimilarityTransform(scale=2, rotation=0, translation=(0, 0))
rotated = skimage.transform.warp(test, tform)
plt.imshow(rotated)
scipy.misc.imsave('rotated.jpg', rotated)
Scaling is itself a subset of the affine transformations.
The affine transformation matrix for scaling only is defined as
s_x, 0, 0
0, s_y, 0
0, 0, 1
where s_x and s_y are the scaling factors in the respective dimensions (defined relative to the origin at (0,0)). If you want your image to be scaled relative to another point rather than the origin, you first translate the image so that the centre of scaling lies at the origin, then you scale, then you move the image back. You simply multiply the two translation matrices with the scale matrix. I had a similar problem with rotation, which can be found here. The same principle applies to this problem. The result is
s_x, 0, (-s_x*x)+x
0, s_y, (-s_y*y)+y
0, 0, 1
where x and y are half the size of your image in the respective dimensions.
The resulting matrix can be used with:
skimage.transform.AffineTransform(matrix)
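As a minimal sketch (using skimage's built-in test image as a stand-in for your own), the matrix above can be built directly and passed to AffineTransform. Note that skimage.transform.warp expects the inverse mapping, hence tform.inverse:

import numpy as np
from skimage import data, transform

image = data.camera()
s = 2.0
y, x = image.shape[0] / 2.0, image.shape[1] / 2.0   # half the image size in each dimension

matrix = np.array([[s, 0, (-s * x) + x],
                   [0, s, (-s * y) + y],
                   [0, 0, 1]])
tform = transform.AffineTransform(matrix=matrix)

# warp takes an inverse map, so pass tform.inverse to apply the scaling about the centre
scaled = transform.warp(image, tform.inverse)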

Perspective Warping in OpenCV based on known camera orientation

I am working on a project which attempts to remove the perspective distortion from an image based on the known orientation of the camera. My thinking is that I can create a rotational matrix based on the known X, Y, and Z orientations of the camera. I can then apply those matrices to the image via the WarpPerspective method.
In my script (written in Python) I have created three rotational matrices, each based on an orientation angle. I have gotten to a point where I am stuck on two issues. First, when I load each individual matrix into the WarpPerspective method, it doesn't seem to be working correctly. Whenever I warp an image on one axis it appears to significantly overwarp the image. The contents of the image are only recognizable if I limit the orientation angle to around 1 degree or less.
Secondly, how do I combine the three rotational matrices into a single matrix to be loaded into the WarpPerspective method? Can I pass a 3x3 rotation matrix into that method, or do I have to create a 4x4 projective matrix? Below is the code that I am working on.
Thank you for your help.
CR
from numpy import *
import cv
#Sets angle of camera and converts to radians
x = -14 * (pi/180)
y = 20 * (pi/180)
z = 15 * (pi/180)
#Creates the Rotational Matrices
rX = array([[1, 0, 0], [0, cos(x), -sin(x)], [0, sin(x), cos(x)]])
rY = array([[cos(y), 0, -sin(y)], [0, 1, 0], [sin(y), 0, cos(y)]])
rZ = array([[cos(z), sin(z), 0], [-sin(z), cos(z), 0], [0, 0, 1]])
#Converts to CVMat format
X = cv.fromarray(rX)
Y = cv.fromarray(rY)
Z = cv.fromarray(rZ)
#Imports image file and creates destination filespace
im = cv.LoadImage("reference_image.jpg")
dst = cv.CreateImage(cv.GetSize(im), cv.IPL_DEPTH_8U, 3)
#Warps Image
cv.WarpPerspective(im, dst, X)
#Display
cv.NamedWindow("distorted")
cv.ShowImage("distorted", im)
cv.NamedWindow("corrected")
cv.ShowImage("corrected", dst)
cv.WaitKey(0)
cv.DestroyWindow("distorted")
cv.DestroyWindow("corrected")
You are doing several things wrong. First, you can't rotate about the x or y axis without a camera model. Imagine a camera with an incredibly wide field of view. You could hold it really close to an object and see the entire thing, but if that object rotated, its edges would seem to fly towards you very quickly with strong perspective distortion. On the other hand, a small field of view (think telescope) has very little perspective distortion. A nice place to start is setting your image plane at least as far from the camera as it is wide and putting your object right on the image plane. That is what I did in this example (C++ OpenCV).
The steps are
construct a rotation matrix
center the image at the origin
rotate the image
move the image down the z axis
multiply by the camera matrix
warp the perspective
//1
float x = -14 * (M_PI/180);
float y = 20 * (M_PI/180);
float z = 15 * (M_PI/180);
cv::Matx31f rot_vec(x,y,z);
cv::Matx33f rot_mat;
cv::Rodrigues(rot_vec, rot_mat); //converts to a rotation matrix
cv::Matx33f translation1(1, 0, -image.cols/2,
                         0, 1, -image.rows/2,
                         0, 0, 1);
rot_mat(0,2) = 0;
rot_mat(1,2) = 0;
rot_mat(2,2) = 1;
//2 and 3
cv::Matx33f trans = rot_mat*translation1;
//4
trans(2,2) += image.rows;
cv::Matx33f camera_mat(image.rows, 0, image.rows/2,
                       0, image.rows, image.rows/2,
                       0, 0, 1);
//5
cv::Matx33f transform = camera_mat*trans;
//6
cv::Mat final;
cv::warpPerspective(image, final, cv::Mat(transform),image.size());
This code gave me this output
I did not see Franco's answer until I posted this. He is completely correct, using FindHomography would save you all these steps. Still I hope this is useful.
Just knowing the rotation is not enough, unless your images are taken either with a telecentric lens or with a telephoto lens with a very long focal length (in which case the images are nearly orthographic, and there is no perspective distortion).
Besides, it's not necessary. True, you can undo the perspective foreshortening of one plane in the image by calibrating the camera (i.e. estimating the intrinsic and extrinsic parameters to form the full camera projection matrix).
But you achieve the same result much more simply if you can identify in the image a quadrangle which is the image of a real-world square (or rectangle with known width/height ratio). If you can do that, you can trivially compute the homography matrix that maps the square (rectangle) to the quadrangle, then warp using its inverse.
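A minimal Python sketch of that last approach follows; the quadrangle coordinates below are invented for illustration, and the output size just has to respect the known width/height ratio:

import cv2
import numpy as np

image = cv2.imread("reference_image.jpg")

# Image corners of a real-world rectangle, ordered TL, TR, BR, BL (made-up values).
quad = np.float32([[310, 120], [560, 150], [540, 380], [280, 340]])

# Output size with the rectangle's known width/height ratio (here assumed 4:3).
out_w, out_h = 400, 300
rect = np.float32([[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]])

H = cv2.getPerspectiveTransform(quad, rect)   # maps the quadrangle onto the rectified rectangle
rectified = cv2.warpPerspective(image, H, (out_w, out_h))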
The Wikipedia page on rotation matrices shows how it is possible to combine the three basic rotation matrices into one.
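For the second part of the question, here is a short numpy sketch of composing the three basic rotations into one matrix (the multiplication order is a convention; Rz·Ry·Rx, i.e. X applied first, is assumed here):

import numpy as np

def rotation_matrix(x, y, z):
    # x, y, z are rotation angles in radians about the X, Y and Z axes
    rx = np.array([[1, 0, 0],
                   [0, np.cos(x), -np.sin(x)],
                   [0, np.sin(x),  np.cos(x)]])
    ry = np.array([[ np.cos(y), 0, np.sin(y)],
                   [0, 1, 0],
                   [-np.sin(y), 0, np.cos(y)]])
    rz = np.array([[np.cos(z), -np.sin(z), 0],
                   [np.sin(z),  np.cos(z), 0],
                   [0, 0, 1]])
    return rz @ ry @ rx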
