I am working on a Python project that involves stitching high-resolution images, most of which are greyscale. Feature-based methods are not viable for this project, as the images do not contain enough keypoints/descriptors for feature extraction with algorithms such as SIFT/SURF.
I have attempted to predict a pixel's location between two images by using the camera's intrinsic matrix to create a rotation matrix, but this was unsuccessful.
Is it possible to project pixels from one image into another in this way at all, or should I be looking at something else, possibly working with the extrinsic matrix?
The camera uses a 600mm lens with a full-frame sensor.
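For reference, when a camera only rotates about its optical centre (no translation, or a scene far enough away that translation is negligible), pixels in one image map to the other through the homography H = K R K^-1. The sketch below illustrates that relation only. The image size of 6000x4000 pixels, the centred principal point, the square pixels and the 0.5 degree yaw are all assumptions made for illustration, not values from the question; only the 600mm lens and the 36mm-wide full-frame sensor come from the post.

```python
import numpy as np

def intrinsic_matrix(focal_mm, sensor_width_mm, width_px, height_px):
    """Pinhole intrinsics, ASSUMING square pixels and a centred principal point."""
    f_px = focal_mm / sensor_width_mm * width_px  # focal length in pixels
    return np.array([[f_px, 0.0, width_px / 2.0],
                     [0.0, f_px, height_px / 2.0],
                     [0.0, 0.0, 1.0]])

def rotation_y(angle_rad):
    """Rotation about the camera's vertical (y) axis, i.e. a yaw."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project_pixel(K, R, u, v):
    """Map pixel (u, v) of image 1 into image 2 for a purely rotating camera.

    R takes camera-1 coordinates to camera-2 coordinates.
    """
    H = K @ R @ np.linalg.inv(K)
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]  # back from homogeneous coordinates

# Hypothetical 6000x4000 image; 600mm lens on a 36mm-wide sensor.
K = intrinsic_matrix(600.0, 36.0, 6000, 4000)
R = rotation_y(np.deg2rad(0.5))
u2, v2 = project_pixel(K, R, 3000.0, 2000.0)
```

With such a long lens the focal length in pixels is very large, so even a small error in the assumed rotation moves the predicted pixel by a large amount.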
I have a setup where a (2D) camera is mounted on the end-effector of a robot arm - similar to the OpenCV documentation:
I want to calibrate the camera and find the transformation from camera to end-effector.
I have already calibrated the camera using this OpenCV guide, Camera Calibration, with a checkerboard where the undistorted images are obtained.
My problem is about finding the transformation from camera to end-effector. I can see that OpenCV has a function, calibrateHandEye(), which supposedly should achieve this. I already have the "gripper2base" vectors and am missing the "target2cam" vectors. Should these be based on the size of the checkerboard squares, or what am I missing?
Any guidance in the right direction will be appreciated.
You are close to the answer.
Yes, it is based on the size of the checkerboard. But instead of taking those parameters and an image directly, this function takes target2cam. How do you get target2cam? Simply move your robot arm above the chessboard so that the camera can see it, and take a picture. From the picture of the chessboard and the camera intrinsics, you can find target2cam. Calculating the extrinsics from a chessboard is already provided in OpenCV.
Repeat this a few times at different robot poses and collect multiple target2cam transforms, each paired with the gripper2base pose at which the picture was taken. Pass them to calibrateHandEye() and you will get what you need.
I have pairs of images - an image (blurred intentionally) and its depth map (given as PNG).
For example:
However, there seems to be a shift between the depth map and the real image as can be seen in this example:
All I know is that these images were shot with a RealSense LiDAR Camera L515 (I do not have knowledge of the underlying camera characteristics or the distance between the RGB and infrared sensors).
Is there a way to align both images? I searched the internet for possible solutions. However, all of them rely on data that I do not have, such as the intrinsic matrix, the camera's SDK and more.
Since the two imaging systems are physically very close, a homography between them would likely be a good approximation. You can find the homography using 4 corresponding points that you choose manually.
You can use the OpenCV implementation.
I am using Python and OpenCV to perform spatial and temporal denoising on RGB-D images. Until now I have been using data physically captured with a Kinect in my lab. However, I realised that after hole filling (spatial denoising) I will need a pixel-perfect depth image, meaning one without holes or temporal noise (i.e. a ground truth), to compare the hole-filled output against, so that I can evaluate the efficiency of my approach. I have tried searching online datasets such as KITTI, Middlebury, TUM and the Darmstadt dataset, but none of them has a perfect .png image I can use as my ground truth. Is there any such dataset available that could be useful for my application, or will I have to create a ground truth myself, either by using tools/plugins to create artificial depth images from computer-generated 3D scenes or by reproducing depth noise?
I've been doing a lot of image processing recently in Python using OpenCV, and so far I've only worked with 2-D images in the usual BGR format.
Now, I'm trying to figure out how to incorporate depth and work with depth information as well.
I've seen the documentation on creating simple point clouds using the Left and Right images of a Stereocamera, but I was hoping to gain some intuition on Depth-based cameras themselves like Kinect.
What kind of camera should I use for this purpose, and more importantly: how do I process these images in Python - as I can't find a lot of documentation on handling RGBD images in OpenCV.
If you want to work with depth-based cameras, you can go for Time of Flight (ToF) cameras like the picoflexx and picomonstar cameras. They will give you X, Y and Z values, where the X and Y values are the distances of a point from the camera centre (like in 2D space) and Z will give you the direct distance of that point (not perpendicular) from the camera centre.
For this camera and this 3D data processing, you can use the Point Cloud Library for processing.
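If you would rather stay in Python, a depth image can be turned into a point cloud with plain NumPy. The sketch below is only an illustration under stated assumptions: the depth map stores distance along the optical axis in millimetres (the common Kinect-style convention, which is not the radial distance described above), and the intrinsics `fx`, `fy`, `cx`, `cy` are known from calibration or the camera's documentation. The function name is hypothetical.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image into an (N, 3) array of XYZ points.

    ASSUMES depth is measured along the optical axis and that
    depth * depth_scale is in metres (0.001 converts millimetres).
    Pixels with zero depth (holes) are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    z = depth.astype(np.float32) * depth_scale
    x = (u - cx) * z / fx  # pinhole model: x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]
```

A 16-bit depth PNG can be loaded unchanged with `cv2.imread(path, cv2.IMREAD_UNCHANGED)` before being passed to a function like this.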
Imagine someone taking a burst shot with a camera. They will have multiple images, but since no tripod or stand was used, the images will be slightly different.
How can I align them so that they overlay neatly, and crop out the edges?
I have searched a lot, but most of the solutions were either doing a 3D reconstruction or using MATLAB.
e.g. https://github.com/royshil/SfM-Toy-Library
Since I'm very new to OpenCV, I would prefer an easy-to-implement solution.
I have generated many datasets by manually rotating and cropping images in MS Paint, but any link to corresponding datasets (slightly rotated and translated images) would also be helpful.
EDIT: I found a solution here
http://www.codeproject.com/Articles/24809/Image-Alignment-Algorithms
which gives close approximations to rotation and translation vectors.
How can I do better than this?
It depends on what you mean by "better" (accuracy, speed, low memory requirements, etc.). One classic approach is to align each frame #i (with i ≥ 2) with the first frame, as follows:
1. Local feature detection, for instance via SIFT or SURF
2. Descriptor extraction
3. Descriptor matching
4. Alignment estimation via perspective transformation
5. Transform image #i to match image 1 using the estimated transformation