How to detect when an image needs perspective transform?

How to detect when an image needs perspective transform? - python

I have a set of images in which I need to detect which of them needs a perspective transform. The images might be plain documents or photos taken with phone cameras with perspective and I need to perform perspective transform on those. How can I detect which need perspective transform in opencv?
I can do perspective transform, however, I'm not capable of detecting when an image needs to suffer a perspective transform.

This could be a possible approach:
Take a reference picture (which does not require a perspective transform).
Define four points of interest- (x1,y1) (x2,y2) (x3,y3) (x4,y4) in your reference image. Consider these points as your destination points.
Now in every other image that you want to check if a perspective transform is necessary, you will detect the same points of interest in those images. Lets call them source points.
Next you have to check if the source points match your destination points. Also you will have to check if the dimensions(width & height) match.
If neither of the two matches(the points or the dimension), there's a need for perspective transform.

Related

Calculating positions of objects as (x,y) on a known platform (opencv-python)

I have a platform which I know the sizes. I would like to get the positions of objects placed on it as (x,y) while looking through the webcam, the origin being the top-left corner of the platform. However, I can only look through from a low angle: example
I detect the objects using the otsu threshold. I want to use the bottom edge of the bounding rectangles, then proportion it accordingly concerning the corners (the best I can think of), but I don't know how to implement it. I tried warp perspective but it enlarges the objects too much. image with threshold // attempt of warp perspective
Any help or suggestion would be appreciated.

Don't use warp perspective to transform the image to make the table cover the complete image as you did here.
While performing perspective transformations in image processing, try not to transform the image too much.
Below is the image with your table marked with red trapezium that you transformed.
Now try to transform it into a perfect rectangle but you do not want to transform it too much as you did. One way is to transform the trapezium to a rectangle by simply adjusting the shorter edge's vertices to come directly above the lower edge's vertices as shown in the image below with green.
This way, things far from the camera will be skewed wrt width only a little. This will give better results. Another even better way would be to decrease the size of the lower edge a little and increase the size of the upper edge a little. This will evenly skew objects kept over the table as shown below.
Now, as you know the real dimensions of the table and the dimensions of the rectangle in the image, you can do the mapping. Using this, you can determine the exact position of the objects kept on the table.

OpenCV - How to remove convexity defects in a cam scanner?

I get in trouble by finding an algorithm to remove the convexity of my photos. As you can see the photos are captured from book pages, and I wanna remove the convexity. My question is similar to this but what I have is just page boundaries as input and neither I have grid nor am able to find by processing algorithms.
I wanna output as the right one in the below photo.
Obviously, the perspective transformation is the first thing comes in mind. However, as you can see the result is not promising:

Here's a possible pipeline to solve your problem. The main idea is to identify the text, create a super blob of it with some morphology, locate the 4 corners of this super blob and feed the points to a perspective "unwarper" (or rectifier, or whatever you wish to call that perspective correction method).
Start by converting your image to grayscale and apply adaptive thresholding to it. Try the Gaussian or Mean methods with parameters that better fit your tests. This is the result I obtain after fiddling with the values for a bit:
Now, the idea is to isolate just the text. The solution I applied is: obtain the biggest blobs and subtract them from the original image. You're going to need a method to calculate the area of each binary blob. Check this previous post for suggestions on how to implement one.
These are the biggest blobs from the image:
Subtract the largest blobs from the original image. This is the result:
As you can see, the text is almost isolated. Let me clean up the little bits of pixels by applying, again, an area filter. This time to eliminate the small blobs. This is the result:
Very good, some characters are lost during the operation, but that’s ok. We need a nice continuous block of text, because we are gonna dilate the hell of it. I tried applying a rectangular structuring element of size 5 and 5 Op iterations. Erode the output with 5 more iterations afterward, so you end up with this nice - isolated - super blob were the text used to be:
Check it out. The 3 markers you see are the centroids of the biggest blobs that I detected on the image. We need to find the 4 corners of the super blob. The biggest blob in the image is what we are after. I decided to re-use the area filter and look for the blob with the biggest area. This is the isolated super blob:
From here, the operations are pretty straightforward. Again, the goal is to get the four corners of this blob. You can fit a rectangle or apply an edge detector followed by Hough transform, to get the straight lines that follow the edges of the super blob.
I decided to apply a Canny Edge detector followed by Hough transform. Of course, I tuned the transform to filter only the possible lines I’m interested in – straight lines above a certain length. This is the result of the line detection:
There's some extra info plotted on the image. The markers you see (red and yellow) are the start/endpoints of the lines. My idea here was to find a bunch of these lines and compute the mean of these points. The idea is that we have a cluster of points that are separated in "quadrants". If we compute the mean of the start and endpoints of each line per quadrant, we will end up with 4 means – and these are the approximate values of the super blob’s corners!
I applied K-means to the start and endpoints of the lines, but you very well prefer other methods of processing. That's ok. My approximate corners are identified by the big red O markers in the above image.
As I suggested, try giving a fixed output position for these corners. I defined the red rectangle for the corners to be mapped on. For this test, I pretty much adjusted the rectangle manually. The perspective correction yields this result:
Some suggestions:
Depending on the resolution of the input image, you could downsize it
for a faster and better result, as your input seems big enough for
that.
Tune Hough Line Detection to yield larger lines. My current
configuration detects some smaller lines and that can hinder the
corner approximation.
I choose a somewhat robust method for calculating the 4 corners of
the super blob that I’ve personally used before (Edge detection +
Hough Line Transform + K-means) but whatever processing chain you
chose to obtain the data is entirely up to you!

Applying Homographies to Remove Perspective Distortion

When calculating Homography, usually the information of the camera should be provided. Is there any straightforward technique to achieve perspective correction without actually having camera's properties?
are there papers for that?

A standard technique is calibration with a target.
To identify a (planar) homography, four points suffice. Take an image of the viewed plane where you place a contrasted rectangle and locate the corners in the image (pixel coordinates). You could do this by image processing or just manually. Then choose the pixel coordinates where you would like the corners to map after correction.
This will allow you to write a system of eight equations in the eight unknown parameters of the homography. Fortunately, this system is easily linearized and the solution is unique.

opencv: reprojectImageTo3d what is the metric unit of the (X,Y,Z) point?

firstly, I wanted to know the metric unit of the 3d point we got from the opencv reprojectImageTo3D() function.
secondly, I have calibrated each camera individually with a chessboard with "mm" as metric unit and then use the opencv functions to calibrate the stereo system, rectify the stereo pair and then compute the disparity map.
Basically i want the distance of a center of a bounding box.
so i compute the disparity map and reproject it to 3D with the reprojectImageTo3D() function and then i take from those 3D points, the one which correspond to the center of the bbox (x, y).
But which image should i use to get the center of bbox? the rectified or the original?
Secondly, is it better to use the same camera model for a stereo system?
Thank you

During the calibration process (calibrateCamera) you have to give the points grid of your calibration target. The unit that you give there will then define the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
reprojectImageTo3D has to use the rectified image, since the disparity is calculated using the rectified image (It wouldn't be properly aligned otherwise). Also, when calculating the disparity, it is calculated relative to the first image given (left one in the doc). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
I never had two different cameras, but it makes sense to use the same ones. I think that if you have very different color images, edges or any image difference, that could potentially influence the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!

Image Segmentation based on Pixel Density

I need some help developing some code that segments a binary image into components of a certain pixel density. I've been doing some research in OpenCV algorithms, but before developing my own algorithm to do this, I wanted to ask around to make sure it hasn't been made already.
For instance, in this picture, I have code that imports it as a binary image. However, is there a way to segment objects in the objects from the lines? I would need to segment nodes (corners) and objects (the circle in this case). However, the object does not necessarily have to be a shape.
The solution I thought was to use pixel density. Most of the picture will made up of lines, and the objects have a greater pixel density than that of the line. Is there a way to segment it out?
Below is a working example of the task.
Original Picture:
Resulting Images after Segmentation of Nodes (intersection of multiple lines) and Components (Electronic components like the Resistor or the Voltage Source in the picture)

You can use an integral image to quickly compute the density of black pixels in a rectangular region. Detection of regions with high density can then be performed with a moving window in varying scales. This would be very similar to how face detection works but using only one super-simple feature.
It might be beneficial to make all edges narrow with something like skeletonizing before computing the integral image to make the result insensitive to wide lines.

OpenCV has some functionality for finding contours that is able to put the contours in a hierarchy. It might be what you are looking for. If not, please add some more information about your expected output!

If I understand correctly, you want to detect the lines and the circle in your image, right?
If it is the case, have a look at the Hough line transform and Hough circle transform.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.