I have pairs of images: an RGB image (blurred intentionally) and its depth map (given as a PNG).
For example:
However, there seems to be a shift between the depth map and the RGB image, as can be seen in this example:
All I know is that these images were shot with a RealSense LiDAR Camera L515 (I have no knowledge of the underlying camera characteristics or the distance between the RGB and infrared sensors).
Is there a way to align both images? I searched the internet for possible solutions, but they all rely on data that I do not have, such as the intrinsic matrix, the camera's SDK and more.
Since the two imaging systems are very close physically, the homography between them would likely be a good approximation. You can find the homography using 4 corresponding points that you choose manually.
You can use the OpenCV implementation.
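Here is a minimal sketch of that approach in Python, assuming you have picked the four correspondences by hand (all point values and file names below are placeholders):

import cv2
import numpy as np

# Four manually chosen corresponding points (placeholder values):
# pixel coordinates in the RGB image and their matches in the depth map.
pts_rgb = np.float32([[110, 60], [520, 60], [520, 400], [110, 400]])
pts_depth = np.float32([[100, 50], [510, 55], [515, 395], [105, 398]])

# Estimate the homography and warp the depth map onto the RGB image.
H, _ = cv2.findHomography(pts_depth, pts_rgb)
depth = cv2.imread("depth_map.png", -1)  # -1 keeps the 16-bit data
rgb = cv2.imread("rgb.png")
aligned_depth = cv2.warpPerspective(depth, H, (rgb.shape[1], rgb.shape[0]))
cv2.imwrite("aligned_depth.png", aligned_depth)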
Related
I am working on a Python project which involves stitching high-resolution images which are mostly greyscale. Feature extraction is not viable for this project, as the images do not contain enough keypoints/descriptors for algorithms such as SIFT/SURF.
I have attempted to predict a pixel's location between two images by using the camera's intrinsic matrix to create a rotation matrix, but this was unsuccessful.
Is it possible to project pixels from one image into another in this way at all, or should I be looking at something else, possibly the extrinsic matrix?
The camera uses a 600 mm lens with a full-frame sensor.
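For reference, the pure-rotation mapping I was attempting looks roughly like this (all values are placeholders; under a pure-rotation assumption, pixels map between images via H = K R K^-1):

import cv2
import numpy as np

# Placeholder intrinsics for a 600 mm lens on a full-frame sensor with
# ~6 um pixels: fx = fy = 600 mm / 0.006 mm = 100000 px (assumption).
K = np.array([[100000.0, 0.0, 3000.0],
              [0.0, 100000.0, 2000.0],
              [0.0, 0.0, 1.0]])

# Placeholder rotation between the two shots (small yaw, in radians).
R, _ = cv2.Rodrigues(np.array([[0.0], [0.001], [0.0]]))

# Pure rotation: the inter-image homography is H = K R K^-1.
H = K @ R @ np.linalg.inv(K)

# Predict where pixel (2500, 1800) of image 1 lands in image 2.
p1 = np.array([2500.0, 1800.0, 1.0])  # homogeneous pixel coordinates
p2 = H @ p1
p2 /= p2[2]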
When calculating a homography, information about the camera usually has to be provided. Is there any straightforward technique to achieve perspective correction without actually knowing the camera's properties?
Are there papers on that?
A standard technique is calibration with a target.
To identify a (planar) homography, four points suffice. Take an image of the viewed plane on which you have placed a high-contrast rectangle, and locate its corners in the image (pixel coordinates). You could do this by image processing or just manually. Then choose the pixel coordinates where you would like the corners to map after correction.
This will allow you to write a system of eight equations in the eight unknown parameters of the homography. Fortunately, this system is easily linearized and the solution is unique.
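A minimal sketch of this in OpenCV, assuming the four corner coordinates have already been located (all values are placeholders):

import cv2
import numpy as np

# Corners of the rectangle as found in the image (placeholder values).
src = np.float32([[105, 210], [480, 190], [495, 520], [90, 540]])
# Where those corners should map after correction: a true rectangle.
dst = np.float32([[100, 200], [500, 200], [500, 550], [100, 550]])

# Exactly four correspondences determine the eight homography parameters.
H = cv2.getPerspectiveTransform(src, dst)
img = cv2.imread("view.png")
corrected = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))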
I've been doing a lot of image processing recently in Python using OpenCV, and I've worked all this while with 2-D images in the generic BGR format.
Now, I'm trying to figure out how to incorporate depth and work with depth information as well.
I've seen the documentation on creating simple point clouds using the left and right images from a stereo camera, but I was hoping to gain some intuition about depth-based cameras themselves, like the Kinect.
What kind of camera should I use for this purpose, and more importantly, how do I process these images in Python? I can't find much documentation on handling RGB-D images in OpenCV.
If you want to work with depth-based cameras, you can go for Time-of-Flight (ToF) cameras like the picoflexx and picomonstar. They will give you X, Y and Z values, where X and Y are the distances of a point from the camera centre (as in 2-D space) and Z gives you the direct distance (not the perpendicular distance) of that point from the camera centre.
For this camera, and for 3-D data processing in general, you can use the Point Cloud Library (PCL).
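If you end up working with Kinect-style RGB-D images in Python, one option is Open3D rather than OpenCV. A minimal sketch, assuming a 16-bit depth PNG in millimetres and placeholder (Kinect-v1-like) intrinsics:

import open3d as o3d

color = o3d.io.read_image("rgb.png")
depth = o3d.io.read_image("depth.png")  # 16-bit PNG, depth in millimetres

# depth_scale=1000 converts millimetres to metres (assumption about the data).
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color, depth, depth_scale=1000.0, convert_rgb_to_intensity=False)

# Placeholder pinhole intrinsics; replace with your camera's values.
intrinsics = o3d.camera.PinholeCameraIntrinsic(
    640, 480, 525.0, 525.0, 319.5, 239.5)

cloud = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsics)
o3d.visualization.draw_geometries([cloud])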
I am using an object segmentation dataset with the following information:
Introduced: IROS 2012
Device: Kinect v1
Description: 111 RGBD images of stacked and occluding objects on table.
Labelling: Per-pixel segmentation into objects.
Link to the page: http://www.acin.tuwien.ac.at/?id=289
I am trying to use the depth map provided by the dataset. However, it seems the depth map is completely black.
Original image for the above depth map
I tried to do some preprocessing and normalised the image so that the depth map could be visualised as a grayscale image.
import cv2
import numpy as np

img_depth = cv2.imread("depth_map.png", -1)  # depth_map.png has uint16 data; -1 keeps it unchanged
depth_array = np.array(img_depth, dtype=np.float32)
frame = cv2.normalize(depth_array, depth_array, 0, 1, cv2.NORM_MINMAX)  # rescale to [0, 1]
cv2.imwrite('capture_depth.png', frame * 255)
The result of doing this preprocessing is:
In one of the posts on Stack Overflow, I read that these black patches are the regions where the depth map is not defined.
If I have to use this depth map, what is the best way to fill these undefined regions? (I am thinking of filling them with k-nearest neighbours, but feel there could be better ways.)
Are there any RGB-D datasets that do not have such problems, or do these kinds of problems always exist? What are the best ways to tackle them?
Thanks in advance!
Pretty much every 3-D imaging technology will produce data with invalid or missing points: lack of texture, too-steep slopes, occlusion, transparency, reflections... you name it.
There is no magic solution to filling these holes. You'll need some sort of interpolation, or you may replace missing points based on some model.
The internet is full of methods for filling holes. Most techniques for intensity images can be successfully applied to depth images.
It will depend on your application, your requirements and what you know about your objects.
Data quality in 3-D is a question of time, money and the right combination of object and technology.
Areas that absorb or scatter the Kinect's IR (like glossy surfaces or sharp edges) are filled with a zero pixel value (indicating depth that could not be calculated). A method to approximately fill the non-captured data around these areas is to use the statistical median of a 5x5 window. This method works just fine for Kinect depth images. Example implementations for MATLAB and C# can be seen in the links.
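For reference, here is a rough Python sketch of that idea (my own take on the technique the links implement, not a port of them): each zero pixel is replaced by the median of the valid depths in its 5x5 neighbourhood.

import numpy as np

def fill_depth_holes(depth, ksize=5):
    # Hypothetical helper: replace zero (invalid) pixels with the median
    # of the valid depths in a ksize x ksize neighbourhood.
    half = ksize // 2
    padded = np.pad(depth, half, mode='edge')
    filled = depth.copy()
    for r, c in zip(*np.nonzero(depth == 0)):
        window = padded[r:r + ksize, c:c + ksize]
        valid = window[window > 0]
        if valid.size:
            filled[r, c] = np.median(valid)
    return filled

# Usage: filled = fill_depth_holes(cv2.imread("depth_map.png", -1))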
Imagine someone taking a burst shot with a camera: they will have multiple images, but since no tripod or stand was used, the images will be slightly different.
How can I align them so that they overlay neatly, and crop out the edges?
I have searched a lot, but most of the solutions involve either a full 3-D reconstruction or MATLAB.
e.g. https://github.com/royshil/SfM-Toy-Library
Since I'm very new to OpenCV, I would prefer an easy-to-implement solution.
I have generated many datasets by manually rotating and cropping images in MS Paint, but any link to corresponding datasets (slightly rotated and translated images) would also be helpful.
EDIT: I found a solution here:
http://www.codeproject.com/Articles/24809/Image-Alignment-Algorithms
which gives close approximations to rotation and translation vectors.
How can I do better than this?
It depends on what you mean by "better" (accuracy, speed, low memory requirements, etc.). One classic approach is to align each frame #i (with i > 1) with the first frame, as follows (see the sketch after this list):
Local feature detection, for instance via SIFT or SURF (link)
Descriptor extraction (link)
Descriptor matching (link)
Alignment estimation via perspective transformation (link)
Transform image #i to match image 1 using the estimated transformation (link)
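Put together in OpenCV, those five steps might look roughly like this (SIFT plus RANSAC is just one possible choice at each step; file names are placeholders):

import cv2
import numpy as np

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
imgi = cv2.imread("frame_i.png", cv2.IMREAD_GRAYSCALE)

# 1-2. Detect local features and extract descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kpi, desi = sift.detectAndCompute(imgi, None)

# 3. Match descriptors (Lowe's ratio test drops ambiguous matches).
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(desi, des1, k=2)
        if m.distance < 0.75 * n.distance]

# 4. Estimate the perspective transform robustly with RANSAC.
src = np.float32([kpi[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# 5. Warp frame #i into frame 1's coordinate system.
aligned = cv2.warpPerspective(imgi, H, (img1.shape[1], img1.shape[0]))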