Align scanned documents based on a reference point, using openCV - python

I'm currently trying to write a program that can automatically extract data from some graphs in multiple scanned documents. Mainly by using opencv I would like to detect some features of the graphs in order to convert them into usable data. In the left graph I'm looking for the height of the circle sectors and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
What follows is a step by step plan of how I think my algorithm will work:
Align the image based on the big dotted lines. This way I can ensure that the graphs in all the scanned images will have the exact same positions. After all, it is possible that some images will be slightly tilted or moved in comparison with other images, due to the manual scanning process. Basically I want the coordinate of a pixel in one image to correspond to the exact same pixel in another image.
We now know that the coordinates of the graph centers and the angles for the circle sectors are identical for all images now. For each circle sector, filter the darker pixels from the lighter ones. This is done using the openCV inRange function.
Search for the best fitting segment over the darker pixels in the left graph and search for the best fitting triangle in the right graph. This is done by global optimization.
Return the radius of the optimal segment and return the edge lengths of the optimal triangle. Now we have values that we can use as data.
I have more or less figured out how to do every step, except the first one. I have no clue on how I would go about aligning my images. Does someone might have an idea or a strategy on how to achieve this alignment?

Step 1: canny, it give you perfect long edge. If this is the only part you dont understand, here is the answer. You can adjust the parameter to get the best result. The first will be idea for both line and pie circle. But if you only keen to find pie. change the parameter accordingly to get my 2nd image
The red denotes the doted line. sample from opencv directly
Step 2: local area enhancement/segmentation to find both circles (from image 1 parameter with houghcircle param2 set to 110)
Step 3: Segment the pie out(all the way to the edge of image) and find the median line
Step 4: OCR on the test image pies and find the distance of none-background color along the median line.
Step 5: generate list out and send to csv or sth

Related

Skewing text - How to take advantage of existing edges

I have the following JPG image. If I want to find the edges where the white page meets the black background. So I can rotate the contents a few degrees clockwise. My aim is to straighten the text for using with Tesseract OCR conversion. I don't see the need to rotate the text blocks as I have seen in similar examples.
In the docs Canny Edge Detection the third arg 200 eg edges = cv.Canny(img,100,200) is maxVal and said to be 'sure to be edges'. Is there anyway to determine these (max/min) values ahead of any trial & error approach?
I have used code examples which utilize the Python cv2 module. But the edge detection is set up for simpler applications.
Is there any approach I can use to take the text out of the equation. For example: only detecting edge lines greater than a specified length?
Any suggestions would be appreciated.
Below is an example of edge detection (above image same min/max values) The outer edge of the page is clearly defined. The image is high contrast b/w. It has even lighting. I can't see a need for the use of an adaptive threshold. Simple global is working. Its just at what ratio to use it.
I don't have the answer to this yet. But to add. I now have the contours of the above doc.
I used find contours tutorial with some customization of the file loading. Note: removing words gives a thinner/cleaner outline.
Consider Otsu.
Its chief virtue is that it is adaptive to local
illumination within the image.
In your case, blank margins might be the saving grace.
Consider working on a series of 2x reduced resolution images,
where new pixel is min() (or even max()!) of original four pixels.
These reduced images might help you to focus on the features
that matter for your use case.
The usual way to deskew scanned text is to binarize and
then keep changing theta until "sum of pixels across raster"
is zero, or small. In particular, with few descenders
and decent inter-line spacing, we will see "lots" of pixels
on each line of text and "near zero" between text lines,
when theta matches the original printing orientation.
Which lets us recover (1.) pixels per line, and (2.) inter-line spacing, assuming we've found a near-optimal theta.
In your particular case, focusing on the ... leader dots
seems a promising approach to finding the globally optimal
deskew correction angle. Discarding large rectangles of
pixels in the left and right regions of the image could
actually reduce noise and enhance the accuracy of
such an approach.

Calculating positions of objects as (x,y) on a known platform (opencv-python)

I have a platform which I know the sizes. I would like to get the positions of objects placed on it as (x,y) while looking through the webcam, the origin being the top-left corner of the platform. However, I can only look through from a low angle: example
I detect the objects using the otsu threshold. I want to use the bottom edge of the bounding rectangles, then proportion it accordingly concerning the corners (the best I can think of), but I don't know how to implement it. I tried warp perspective but it enlarges the objects too much. image with threshold // attempt of warp perspective
Any help or suggestion would be appreciated.
Don't use warp perspective to transform the image to make the table cover the complete image as you did here.
While performing perspective transformations in image processing, try not to transform the image too much.
Below is the image with your table marked with red trapezium that you transformed.
Now try to transform it into a perfect rectangle but you do not want to transform it too much as you did. One way is to transform the trapezium to a rectangle by simply adjusting the shorter edge's vertices to come directly above the lower edge's vertices as shown in the image below with green.
This way, things far from the camera will be skewed wrt width only a little. This will give better results. Another even better way would be to decrease the size of the lower edge a little and increase the size of the upper edge a little. This will evenly skew objects kept over the table as shown below.
Now, as you know the real dimensions of the table and the dimensions of the rectangle in the image, you can do the mapping. Using this, you can determine the exact position of the objects kept on the table.

Find homography knowing the "destination angles" and not locations of points

What I'm doing
I'm trying to process (badly taken) photos of receipts and I'm stuck at warping perspective. My first attempt was to find the corners of the receipt using contour which worked pretty well.
But then I have images like this which part of the receipt was not captured (perhaps blocked by another piece of paper, etc.) so using the corners would yield bad result.
What I tried
I then moved on to line detection using Hough transform. The idea is that receipts usually have a few horizontal lines across. This is what I have so far.
My first thought was to use findHomography using points on two sides as source. To calculate the y-coordinate of the destination points, I'd find the distance between that point and some reference line.
The problem
But then I realized that this is not the correct way, as a line that's exactly halfway between top and bottom in the real receipt wouldn't be half way in the warped image.
Question
So I don't know the locations of the "destination" points, but what I do know is that all these angles between the white and red lines should be 90 degrees. How do I find the transformation matrix in this case?

Finding Corner points of Scrabble Board in an image

I am trying to extract the tiles ( Letters ) placed on a Scrabble Board. The goal is to identify / read all possible words present on the board.
An example image -
Ideally, I would like to find the four corners of the scrabble Board, and apply perspective transform, for further processing.
After Perspective transform -
The algorithm that I am using is as follows -
Apply Adaptive thresholding to the gray scale image of the Scrabble Board.
Dilate / Close the image, find the largest contour in the given image, then find the convex hull, and completely fill the area enclosed by the convex hull.
Find the boundary points ( contour ) of the resultant image, then apply Contour approximation to get the corner points, then apply perspective transform
Corner Points found -
This approach works with images like these. But, as you can see, many square boards have a base, which is curved at the top and the bottom. Sometimes, the base is a big circular board. And with these images my approach fails. Example images and outputs -
Board with Circular base:
Points found using above approach:
I can post more such problematic images, but this image should give you an idea about the problem that I am dealing with. My question is -
How do I find the rectangular board when a circular board is also present in the image?
Some points I would like to state -
I tried using hough lines to detect the lines in the image, find the largest vertical line(s), and then find their intersections to detect the corner points. Unfortunately, because of the tiles, all lines seem to be distorted / disconnected, and hence my attempts have failed.
I have also tried to apply contour approximation to all the contours found in the image ( I was assuming that the large rectangle, too, would be a contour ), but that approach failed as well.
I have implemented the solution in openCV-python. Since the approach is what matters here, and the question was becoming a tad too long, I didn't post the relevant code.
I am willing to share more such problematic images as well, if it is required.
Thank you!
EDIT1
#Silencer's answer has been mighty helpful to me for identifying letters in the image, but I want to accurately find the placement of the words in the image. Hence, I feel identifying the rows and columns is necessary, and I can do that only when a perspective transform is applied to the board.
I wrote an answer on MSER text detection:
Trying to Plot OpenCV's MSER regions using matplotlib
The code generate the following results on your images.
You can have a try.
I think #silencer has already given quite promising solution.
But to perform perspective transform as you have mentioned that you have already tried with hough lines to find the largest rectangle but it fails because for tiles present.
Given you have large image data set may be more than 1000 images, you can also give a shot to Deep learning based approach where you can train a model with images as input and corresponding rectangle boundary points coordinate as outputs.

Image Segmentation based on Pixel Density

I need some help developing some code that segments a binary image into components of a certain pixel density. I've been doing some research in OpenCV algorithms, but before developing my own algorithm to do this, I wanted to ask around to make sure it hasn't been made already.
For instance, in this picture, I have code that imports it as a binary image. However, is there a way to segment objects in the objects from the lines? I would need to segment nodes (corners) and objects (the circle in this case). However, the object does not necessarily have to be a shape.
The solution I thought was to use pixel density. Most of the picture will made up of lines, and the objects have a greater pixel density than that of the line. Is there a way to segment it out?
Below is a working example of the task.
Original Picture:
Resulting Images after Segmentation of Nodes (intersection of multiple lines) and Components (Electronic components like the Resistor or the Voltage Source in the picture)
You can use an integral image to quickly compute the density of black pixels in a rectangular region. Detection of regions with high density can then be performed with a moving window in varying scales. This would be very similar to how face detection works but using only one super-simple feature.
It might be beneficial to make all edges narrow with something like skeletonizing before computing the integral image to make the result insensitive to wide lines.
OpenCV has some functionality for finding contours that is able to put the contours in a hierarchy. It might be what you are looking for. If not, please add some more information about your expected output!
If I understand correctly, you want to detect the lines and the circle in your image, right?
If it is the case, have a look at the Hough line transform and Hough circle transform.

Categories