What I'm doing
I'm trying to process (badly taken) photos of receipts and I'm stuck at warping perspective. My first attempt was to find the corners of the receipt using contour which worked pretty well.
But then I have images like this which part of the receipt was not captured (perhaps blocked by another piece of paper, etc.) so using the corners would yield bad result.
What I tried
I then moved on to line detection using Hough transform. The idea is that receipts usually have a few horizontal lines across. This is what I have so far.
My first thought was to use findHomography using points on two sides as source. To calculate the y-coordinate of the destination points, I'd find the distance between that point and some reference line.
The problem
But then I realized that this is not the correct way, as a line that's exactly halfway between top and bottom in the real receipt wouldn't be half way in the warped image.
Question
So I don't know the locations of the "destination" points, but what I do know is that all these angles between the white and red lines should be 90 degrees. How do I find the transformation matrix in this case?
Related
I get in trouble by finding an algorithm to remove the convexity of my photos. As you can see the photos are captured from book pages, and I wanna remove the convexity. My question is similar to this but what I have is just page boundaries as input and neither I have grid nor am able to find by processing algorithms.
I wanna output as the right one in the below photo.
Obviously, the perspective transformation is the first thing comes in mind. However, as you can see the result is not promising:
Here's a possible pipeline to solve your problem. The main idea is to identify the text, create a super blob of it with some morphology, locate the 4 corners of this super blob and feed the points to a perspective "unwarper" (or rectifier, or whatever you wish to call that perspective correction method).
Start by converting your image to grayscale and apply adaptive thresholding to it. Try the Gaussian or Mean methods with parameters that better fit your tests. This is the result I obtain after fiddling with the values for a bit:
Now, the idea is to isolate just the text. The solution I applied is: obtain the biggest blobs and subtract them from the original image. You're going to need a method to calculate the area of each binary blob. Check this previous post for suggestions on how to implement one.
These are the biggest blobs from the image:
Subtract the largest blobs from the original image. This is the result:
As you can see, the text is almost isolated. Let me clean up the little bits of pixels by applying, again, an area filter. This time to eliminate the small blobs. This is the result:
Very good, some characters are lost during the operation, but that’s ok. We need a nice continuous block of text, because we are gonna dilate the hell of it. I tried applying a rectangular structuring element of size 5 and 5 Op iterations. Erode the output with 5 more iterations afterward, so you end up with this nice - isolated - super blob were the text used to be:
Check it out. The 3 markers you see are the centroids of the biggest blobs that I detected on the image. We need to find the 4 corners of the super blob. The biggest blob in the image is what we are after. I decided to re-use the area filter and look for the blob with the biggest area. This is the isolated super blob:
From here, the operations are pretty straightforward. Again, the goal is to get the four corners of this blob. You can fit a rectangle or apply an edge detector followed by Hough transform, to get the straight lines that follow the edges of the super blob.
I decided to apply a Canny Edge detector followed by Hough transform. Of course, I tuned the transform to filter only the possible lines I’m interested in – straight lines above a certain length. This is the result of the line detection:
There's some extra info plotted on the image. The markers you see (red and yellow) are the start/endpoints of the lines. My idea here was to find a bunch of these lines and compute the mean of these points. The idea is that we have a cluster of points that are separated in "quadrants". If we compute the mean of the start and endpoints of each line per quadrant, we will end up with 4 means – and these are the approximate values of the super blob’s corners!
I applied K-means to the start and endpoints of the lines, but you very well prefer other methods of processing. That's ok. My approximate corners are identified by the big red O markers in the above image.
As I suggested, try giving a fixed output position for these corners. I defined the red rectangle for the corners to be mapped on. For this test, I pretty much adjusted the rectangle manually. The perspective correction yields this result:
Some suggestions:
Depending on the resolution of the input image, you could downsize it
for a faster and better result, as your input seems big enough for
that.
Tune Hough Line Detection to yield larger lines. My current
configuration detects some smaller lines and that can hinder the
corner approximation.
I choose a somewhat robust method for calculating the 4 corners of
the super blob that I’ve personally used before (Edge detection +
Hough Line Transform + K-means) but whatever processing chain you
chose to obtain the data is entirely up to you!
I'm currently trying to write a program that can automatically extract data from some graphs in multiple scanned documents. Mainly by using opencv I would like to detect some features of the graphs in order to convert them into usable data. In the left graph I'm looking for the height of the circle sectors and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
What follows is a step by step plan of how I think my algorithm will work:
Align the image based on the big dotted lines. This way I can ensure that the graphs in all the scanned images will have the exact same positions. After all, it is possible that some images will be slightly tilted or moved in comparison with other images, due to the manual scanning process. Basically I want the coordinate of a pixel in one image to correspond to the exact same pixel in another image.
We now know that the coordinates of the graph centers and the angles for the circle sectors are identical for all images now. For each circle sector, filter the darker pixels from the lighter ones. This is done using the openCV inRange function.
Search for the best fitting segment over the darker pixels in the left graph and search for the best fitting triangle in the right graph. This is done by global optimization.
Return the radius of the optimal segment and return the edge lengths of the optimal triangle. Now we have values that we can use as data.
I have more or less figured out how to do every step, except the first one. I have no clue on how I would go about aligning my images. Does someone might have an idea or a strategy on how to achieve this alignment?
Step 1: canny, it give you perfect long edge. If this is the only part you dont understand, here is the answer. You can adjust the parameter to get the best result. The first will be idea for both line and pie circle. But if you only keen to find pie. change the parameter accordingly to get my 2nd image
The red denotes the doted line. sample from opencv directly
Step 2: local area enhancement/segmentation to find both circles (from image 1 parameter with houghcircle param2 set to 110)
Step 3: Segment the pie out(all the way to the edge of image) and find the median line
Step 4: OCR on the test image pies and find the distance of none-background color along the median line.
Step 5: generate list out and send to csv or sth
I am trying to extract the tiles ( Letters ) placed on a Scrabble Board. The goal is to identify / read all possible words present on the board.
An example image -
Ideally, I would like to find the four corners of the scrabble Board, and apply perspective transform, for further processing.
After Perspective transform -
The algorithm that I am using is as follows -
Apply Adaptive thresholding to the gray scale image of the Scrabble Board.
Dilate / Close the image, find the largest contour in the given image, then find the convex hull, and completely fill the area enclosed by the convex hull.
Find the boundary points ( contour ) of the resultant image, then apply Contour approximation to get the corner points, then apply perspective transform
Corner Points found -
This approach works with images like these. But, as you can see, many square boards have a base, which is curved at the top and the bottom. Sometimes, the base is a big circular board. And with these images my approach fails. Example images and outputs -
Board with Circular base:
Points found using above approach:
I can post more such problematic images, but this image should give you an idea about the problem that I am dealing with. My question is -
How do I find the rectangular board when a circular board is also present in the image?
Some points I would like to state -
I tried using hough lines to detect the lines in the image, find the largest vertical line(s), and then find their intersections to detect the corner points. Unfortunately, because of the tiles, all lines seem to be distorted / disconnected, and hence my attempts have failed.
I have also tried to apply contour approximation to all the contours found in the image ( I was assuming that the large rectangle, too, would be a contour ), but that approach failed as well.
I have implemented the solution in openCV-python. Since the approach is what matters here, and the question was becoming a tad too long, I didn't post the relevant code.
I am willing to share more such problematic images as well, if it is required.
Thank you!
EDIT1
#Silencer's answer has been mighty helpful to me for identifying letters in the image, but I want to accurately find the placement of the words in the image. Hence, I feel identifying the rows and columns is necessary, and I can do that only when a perspective transform is applied to the board.
I wrote an answer on MSER text detection:
Trying to Plot OpenCV's MSER regions using matplotlib
The code generate the following results on your images.
You can have a try.
I think #silencer has already given quite promising solution.
But to perform perspective transform as you have mentioned that you have already tried with hough lines to find the largest rectangle but it fails because for tiles present.
Given you have large image data set may be more than 1000 images, you can also give a shot to Deep learning based approach where you can train a model with images as input and corresponding rectangle boundary points coordinate as outputs.
So I've been struggling with this problem for a while so I would appreciate it if somebody helped me out with this.
I'm trying to create a physical robot that solves a puzzle. The image of the completed puzzle will be provided along with a picture of scattered pieces
Scattered piece picture
I've gotten opencv to find contours and single out each piece and rotate them so they are all parallel to the horizontal axes (all "diamond" or "diagonal" pieces are rotated so they look like squares)
I've been using SIFT to match a bunch of small square pieces to the complete picture.
Comparing an un-rotated square piece to the full picture
The problem is this is not in the correct orientation. How would I go about finding out whether I need to rotate 90, 180, 270 degrees?
Another problem I have is to determine which quadrant (non-adrant?) the piece is in. For example, this piece belongs to the bottom right corner. Is there a function that identifies the majority of similar keypoints and then classify into one of the nine regions?
Since SIFT are designed to be rotation-invariant, it is a good thing that the feature matches even though you have a rotation.
To determine how much rotation you need, you generally need to have your camera calibration parameter in order to unproject the picture into a view that is top-down. For your robot, it looks like the pictures are already top-down.
If this assumption holds, you can perform a regression to figure out what angle you need to rotate your piece. If you also know that your pieces are always square, you only have 4 choices to choose from. In that case, you can try all 4 and see which one is "closest" to your extracted patch (matched via SIFT to the big picture).
Determining the quadrant the matched piece is in can be done by looking at the coordinates of the matched points. Their distance to the corners should be what you need.
I need some help developing some code that segments a binary image into components of a certain pixel density. I've been doing some research in OpenCV algorithms, but before developing my own algorithm to do this, I wanted to ask around to make sure it hasn't been made already.
For instance, in this picture, I have code that imports it as a binary image. However, is there a way to segment objects in the objects from the lines? I would need to segment nodes (corners) and objects (the circle in this case). However, the object does not necessarily have to be a shape.
The solution I thought was to use pixel density. Most of the picture will made up of lines, and the objects have a greater pixel density than that of the line. Is there a way to segment it out?
Below is a working example of the task.
Original Picture:
Resulting Images after Segmentation of Nodes (intersection of multiple lines) and Components (Electronic components like the Resistor or the Voltage Source in the picture)
You can use an integral image to quickly compute the density of black pixels in a rectangular region. Detection of regions with high density can then be performed with a moving window in varying scales. This would be very similar to how face detection works but using only one super-simple feature.
It might be beneficial to make all edges narrow with something like skeletonizing before computing the integral image to make the result insensitive to wide lines.
OpenCV has some functionality for finding contours that is able to put the contours in a hierarchy. It might be what you are looking for. If not, please add some more information about your expected output!
If I understand correctly, you want to detect the lines and the circle in your image, right?
If it is the case, have a look at the Hough line transform and Hough circle transform.