Applying Homographies to Remove Perspective Distortion - python

When calculating a homography, information about the camera usually has to be provided. Is there any straightforward technique to achieve perspective correction without actually knowing the camera's properties?
Are there papers on that?

A standard technique is calibration with a target.
To identify a (planar) homography, four points suffice. Take an image of the viewed plane in which you have placed a high-contrast rectangle, and locate its corners in the image (pixel coordinates). You can do this by image processing or just manually. Then choose the pixel coordinates where you would like the corners to map after correction.
This will allow you to write a system of eight equations in the eight unknown parameters of the homography. Fortunately, this system is easily linearized and the solution is unique.
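For instance, with OpenCV this is only a few lines; a minimal sketch follows, where the corner coordinates and file names are hypothetical and the four rectangle corners are assumed to be already located. getPerspectiveTransform solves exactly that linear system.

import cv2
import numpy as np

# Pixel coordinates of the rectangle's corners in the image
# (top-left, top-right, bottom-right, bottom-left) - hypothetical values.
src = np.float32([[112, 208], [541, 195], [570, 462], [95, 480]])

# Pixel coordinates where the corners should map after correction,
# e.g. an axis-aligned 400 x 300 rectangle.
dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

# Four correspondences give 8 equations in the 8 unknown parameters;
# getPerspectiveTransform solves that system directly.
H = cv2.getPerspectiveTransform(src, dst)

img = cv2.imread("view.jpg")  # hypothetical file name
corrected = cv2.warpPerspective(img, H, (400, 300))
cv2.imwrite("corrected.jpg", corrected)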

Related

How to detect when an image needs perspective transform?

I have a set of images in which I need to detect which ones need a perspective transform. The images might be plain documents or photos taken with phone cameras that show perspective distortion, and I need to perform a perspective transform on the latter. How can I detect which images need a perspective transform in OpenCV?
I can do the perspective transform itself; however, I'm not able to detect when an image needs one.
This could be a possible approach:
Take a reference picture (which does not require a perspective transform).
Define four points of interest, (x1,y1), (x2,y2), (x3,y3), (x4,y4), in your reference image. Consider these points your destination points.
Now, in every other image that you want to check, detect the same points of interest. Let's call them source points.
Next, check whether the source points match your destination points, and whether the image dimensions (width & height) match.
If either of the two (the points or the dimensions) does not match, a perspective transform is needed.
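A rough sketch of that check follows; the tolerance value, the point arrays and the image shapes are placeholders, and the detection of the points themselves is left out.

import cv2
import numpy as np

def needs_perspective_transform(src_pts, dst_pts, src_shape, dst_shape, tol=5.0):
    # The dimensions (width & height) must match...
    if src_shape[:2] != dst_shape[:2]:
        return True
    # ...and the detected points must lie close to the reference points.
    err = np.linalg.norm(np.float32(src_pts) - np.float32(dst_pts), axis=1).mean()
    return err > tol

# Hypothetical usage, once dst_pts have been defined on the reference image
# and src_pts detected in the image being checked:
# if needs_perspective_transform(src_pts, dst_pts, img.shape, ref.shape):
#     H = cv2.getPerspectiveTransform(np.float32(src_pts), np.float32(dst_pts))
#     img = cv2.warpPerspective(img, H, (ref.shape[1], ref.shape[0]))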

Calculating positions of objects as (x,y) on a known platform (opencv-python)

I have a platform whose dimensions I know. I would like to get the positions of objects placed on it as (x, y) coordinates while viewing it through a webcam, with the origin at the top-left corner of the platform. However, I can only view it from a low angle: example
I detect the objects using Otsu thresholding. My best idea is to use the bottom edge of the bounding rectangles and then scale it proportionally with respect to the corners, but I don't know how to implement it. I tried a warp perspective, but it enlarges the objects too much. image with threshold // attempt of warp perspective
Any help or suggestion would be appreciated.
Don't use warp perspective to transform the image to make the table cover the complete image as you did here.
While performing perspective transformations in image processing, try not to transform the image too much.
Below is the image with your table marked with red trapezium that you transformed.
Now transform it into a rectangle, but without distorting it as much as you did. One way is to map the trapezium to a rectangle by simply moving the vertices of the shorter (upper) edge so that they sit directly above the vertices of the lower edge, as shown in green in the image below.
This way, objects far from the camera are skewed in width only a little, which gives better results. An even better approach is to shrink the lower edge a little and widen the upper edge a little, which spreads the skew evenly over objects on the table, as shown below.
Now, since you know both the real dimensions of the table and the dimensions of the rectangle in the image, you can do the mapping and determine the exact position of the objects on the table.
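One way to do the mapping is sketched below; it short-circuits the intermediate warp by mapping image points directly to platform coordinates with a single homography. The table corners, the 60 x 40 cm table size and the bounding box are hypothetical and must come from your own detection and platform.

import cv2
import numpy as np

# Four corners of the table in the image (top-left, top-right, bottom-right,
# bottom-left), hypothetical pixel values.
table_img = np.float32([[210, 140], [430, 140], [600, 470], [40, 470]])

# The same corners in platform coordinates, origin at the top-left corner
# of the platform (here assuming a 60 x 40 cm table, units in cm).
table_real = np.float32([[0, 0], [60, 0], [60, 40], [0, 40]])

H = cv2.getPerspectiveTransform(table_img, table_real)

# Map the bottom-centre of a detected bounding rectangle (the point that
# actually touches the table) to real-world (x, y) coordinates.
x, y, w, h = 250, 300, 40, 60               # hypothetical cv2.boundingRect output
pt = np.float32([[[x + w / 2.0, y + h]]])   # shape (1, 1, 2) for perspectiveTransform
real_x, real_y = cv2.perspectiveTransform(pt, H)[0, 0]
print("object at (%.1f, %.1f) cm" % (real_x, real_y))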

Align RGB Depth Map with RGB image without intrinsic matrix

I have pairs of images: an image (blurred intentionally) and its depth map (given as a PNG).
For example:
However, there seems to be a shift between the depth map and the real image as can be seen in this example:
All I know is that these images were shot with a RealSense LiDAR Camera L515 (I have no knowledge of the underlying camera characteristics or of the distance between the RGB and infrared sensors).
Is there a way to align both images? I searched the internet for possible solutions, but they all rely on data that I do not have, such as the intrinsic matrix, the camera's SDK and more.
Since the two imaging systems are very close physically, the homography between them would likely be a good approximation. You can find the homography using 4 corresponding points that you choose manually.
You can use the OpenCV implementation.
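A minimal sketch of that approach; the file names and the four hand-picked correspondences below are placeholders.

import cv2
import numpy as np

# Four corresponding points picked manually in the depth map and in the RGB
# image (hypothetical values).
pts_depth = np.float32([[102, 80], [512, 90], [498, 400], [95, 410]])
pts_rgb   = np.float32([[110, 75], [520, 88], [505, 395], [103, 404]])

# With exactly four correspondences findHomography solves the system directly;
# with more points, adding cv2.RANSAC makes it robust to bad picks.
H, _ = cv2.findHomography(pts_depth, pts_rgb)

rgb   = cv2.imread("rgb.png")                          # hypothetical files
depth = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED)

# Warp the depth map into the RGB image's pixel grid (nearest neighbour so
# depth values are not blended).
aligned = cv2.warpPerspective(depth, H, (rgb.shape[1], rgb.shape[0]),
                              flags=cv2.INTER_NEAREST)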

How to find a four point polygon approximation with OpenCV?

So I have a problem that requires me to get a perspective transform on a series of numbers. However, in order to get the four point transform, I need the correct points to pass as parameters to the function. I couldn't find any method that solves this problem; I've tried convex hull (it returns more than four points) and minAreaRect (it returns a rectangle).
I don't have a lot of experience with OCR, but I would hope all the text segments live on the same perspective plane.
If so, how about using a simplified convex hull (e.g. convexHull() then approxPolyDP()) on one of the seven connected components to get the points and compute the perspective, then applying the same unwarp to a scaled quad that encloses all the components? (Probably not perfect, but close.)
Hopefully the snippets in this answer will help:
I really hope the same perspective transformation can be applied to each yellow text connected component.
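A rough sketch of that idea follows; the file name, the yellow HSV range and the epsilon search are assumptions that will need tuning, and the scaled enclosing quad is left out.

import cv2
import numpy as np

img = cv2.imread("numbers.png")                        # hypothetical file
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (20, 80, 80), (35, 255, 255))  # rough yellow range

# Pick one of the connected components (here simply the largest contour)
# and simplify its convex hull down to four points.
# (OpenCV 4.x return signature assumed for findContours.)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnt = max(contours, key=cv2.contourArea)
hull = cv2.convexHull(cnt)

quad = None
# Increase approxPolyDP's epsilon until the polygon collapses to four points.
for frac in np.linspace(0.01, 0.10, 20):
    approx = cv2.approxPolyDP(hull, frac * cv2.arcLength(hull, True), True)
    if len(approx) == 4:
        quad = approx.reshape(4, 2).astype(np.float32)
        break

# quad (if found) can be fed to cv2.getPerspectiveTransform together with the
# desired upright rectangle, and the same warp then applied to a scaled quad
# enclosing all the components.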

opencv: reprojectImageTo3d what is the metric unit of the (X,Y,Z) point?

Firstly, I wanted to know the metric unit of the 3D point we get from the OpenCV reprojectImageTo3D() function.
Secondly, I have calibrated each camera individually with a chessboard using "mm" as the metric unit, and then used the OpenCV functions to calibrate the stereo system, rectify the stereo pair and compute the disparity map.
Basically, I want the distance of the center of a bounding box.
So I compute the disparity map, reproject it to 3D with the reprojectImageTo3D() function, and then take from those 3D points the one that corresponds to the center of the bbox (x, y).
But which image should I use to get the center of the bbox? The rectified one or the original?
Also, is it better to use the same camera model for a stereo system?
Thank you
During the calibration process (calibrateCamera) you have to provide the point grid of your calibration target. The unit you use there then defines the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case, I guess you get mm.
reprojectImageTo3D has to use the rectified image, since the disparity is calculated from the rectified images (it wouldn't be properly aligned otherwise). Also, the disparity is calculated relative to the first image given (the left one in the docs). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
I have never used two different camera models, but it makes sense to use identical ones. I think that if the two images differ strongly in color, edges or any other characteristic, that could degrade the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!
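A sketch of the pipeline described above; it assumes the calibration results (cameraMatrix1/2, distCoeffs1/2, R, T, image_size) and a left/right image pair are already available from your calibrateCamera / stereoCalibrate step, and the matcher settings are placeholders.

import cv2
import numpy as np

# Rectification from the individual calibrations; since the chessboard grid was
# given in mm, Q (and therefore X, Y, Z below) is in mm as well.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2, image_size, R, T)

map1x, map1y = cv2.initUndistortRectifyMap(cameraMatrix1, distCoeffs1,
                                           R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(cameraMatrix2, distCoeffs2,
                                           R2, P2, image_size, cv2.CV_32FC1)
left_rect  = cv2.remap(left,  map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)

# Disparity is computed relative to the first (left) image, so everything
# below lives in the left rectified image's pixel grid.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0

points_3d = cv2.reprojectImageTo3D(disparity, Q)

# The bounding box centre must therefore be taken from the LEFT RECTIFIED image.
cx, cy = 320, 240                 # hypothetical bbox centre (left rectified image)
X, Y, Z = points_3d[cy, cx]       # same unit as the chessboard grid, i.e. mm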
