Detect rectanglular signature fields in document scans using OpenCV

Detect rectanglular signature fields in document scans using OpenCV - python

I am trying to extract rectangular big boxes from document images with signatures in it. Since i don't have training data (for deep learning), i want to cut rectangular boxes (3 in all images) from these images using OpenCV.
Here is what I tried:
import numpy as np
import cv2
img = cv2.imread('S-0330-444-20012800.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret,thresh = cv2.threshold(gray,127,255,1)
contours,h = cv2.findContours(thresh,1,2)
for cnt in contours:
approx = cv2.approxPolyDP(cnt,0.02*cv2.arcLength(cnt,True),True)
if len(approx)==4:
cv2.drawContours(img,[cnt],0,(26,60,232),-1)
cv2.imshow('img',img)
cv2.waitKey(0)
sample image
With the above code, I get a lot of squares (around 152 small points like squares) and of course not the 3 boxes.
Replies appreciated. [sample image is attached]

I would suggest you read up on template matching. There is also a good OpenCV tutorial on this.
For your use case, the idea would be to generate a stereotyped image of a rectangular box with the same shape (width/height ratio) as the boxes found on your documents. Depending on whether your input images show the document always in the same scaling or not, your would need to either resize the inputs to keep their magnification constant, or you would need to operate with a template bank (e.g. an array of box templates in various scalings).
Briefly, you would then cross-correlate the template box(es) with the input image and (in case of well-matched scaling) would find ideally relatively sharp peaks indicating the centers of your document boxes.

In the code above, use image pyramids (to merge unwanted contour noises) and cv2.findContours in combination. Post to that filtering list of Contours based on contour area cv2.contourArea will lead to only bigger squares.
There is also an alternate solution. Looking at images, we can see that the signature text usually is bigger than that of printed text in that ROI. so we can filter out contours smaller than signature contours and extract only the signature.
Its always good to remove noise before using cv2.findContours e.g. dilate, erode, blurring etc.

Related

Python Tesseract figuring out text orientation/transformation

# Tesseract Win-Installer https://github.com/UB-Mannheim/tesseract/wiki
import pytesseract
import cv2
image = cv2.imread("img.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
text = pytesseract.image_to_string(image)
print(text)
the output is not even close "WX017," instead of "MX011A"
however, if I manually rearrange the characters it works. I could transform the input image and define an ROI but the orientation could be anything. It could be upside down as well.
I want to recognize curved text around a circle
1:
2:
3:

This is pretty difficult, as tesseract expects text to be minimally distorted.
One (admittedly far-fetched) possibility would be to try and detect the circle, then map it onto a rectangle.
To do this, you can reduce each letter to a non-connected blob using a blur filter and discarding grey values below a threshold; iterate until you get more or less circular blobs, then get their centers. Take several of those three by three at random, and for each triplet calculate the center of the circle encompassing all three. The average of those centers should be more or less the center of the lettering circle.
Having the center, and the approximate radius, it is relatively easy to map the circular crown of appropriate height to a rectangle (e.g. using polar-to-cartesian transform).
You then apply tesseract to the transformed rectangle.
It should also be possible to use autocorrelation to average and sharpen a multiple identical text along said rectangle (i.e. "MX011A MXOI1A MX017A" --> "MX011A").

Extract Data from an Image with Python/OpenCV/Tesseract?

I'm trying to extract some contents from a cropped image. I tried pytesseract and opencv template matching but the results are very poor. OpenCV template matching sometimes fails due to poor quality of the icons and tesseract gives me a line of text with false characters.
I'm trying to grab the values like this:
0:26 83 1 1
Any thoughts or techniques?

A technique you could use would be to blur your image. From what it looks like, the image is kind of low res and blurry already, so you wouldn't need to blur the image super hard. Whenever I need to use a blur function in Opencv, I normally choose the gaussian blur, as its technique of blurring each pixel as well as each surrounding pixel is great. Once the image is blurred, I would threshold, or adaptive threshold the image. Once you have gotten this far, the image that should be shown should be mostly hard lines with little bits of short lines mixed between. Afterwards, dilate the threshold image just enough to have the bits where there are a lot of hard edges connect. Once a dilate has been performed, find the contours of that image, and sort based on their height with the image. Since I assume the position of those numbers wont change, you will only have to sort your contours based on the height of the image. Afterwards, once you have sorted your contours, just create bounding boxes over them, and read the text from there.
However, if you want to do this the quick and dirty way, you can always just manually create your own ROI's around each area you want to read and do it that way.
First Method
Gaussian blur the image
Threshold the image
Dilate the image
Find Contours
Sort Contours based on height
Create bounding boxes around relevent contours
Second Method
Manually create ROI's around the area you want to read text from

Extracting data from graphs in a scanned document

EDIT: This is a deeper explanation of a question I asked earlier, which is still not solved for me.
I'm currently trying to write some code that can extract data from some uncommon graphs in a book. I scanned the pages of the book, and by using opencv I would like to detect some features ofthe graphs in order to convert them into useable data. In the left graph I'm looking for the height of the "triangles" and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
For the left graph, I thought of detecting all the individual colors and computing the area of each sector by counting the amount of pixels in that color. When I have the area of these sectors, I can easily calculate their heights, using basic math. The following code snippet shows how far I've gotten already with identifying different colors. However I can't manage to make this work accurately. It always seems to detect some colors of other sectors as well, or not detect all pixels of one sector. I think it has something to do with the boundaries I'm using. I can't quite figure out how to make them work. Does someone know how I can determine these values?
import numpy as np
import cv2
img = cv2.imread('images/test2.jpg')
lower = np.array([0,0,100])
upper = np.array([50,56,150])
mask = cv2.inRange(img, lower, upper)
output = cv2.bitwise_and(img, img, mask = mask)
cv2.imshow('img', img)
cv2.imshow('mask', mask)
cv2.imshow('output', output)
cv2.waitKey(0)
cv2.destroyAllWindows()
For the right graph, I still have no idea how to extract data from it. I thought of identifying the center by detecting all the dotted lines, and then by detecting the intersections of these dotted lines with the gray area, I could measure the distance between the center and these intersections. However I couldn't yet figure out how to do this properly, since it sounds quite complex. The following code snippet shows how far I've gotten with the line detection. Also in this case the detection is far from accurate. Does someone have an idea how to tackle this problem?
import numpy as np
import cv2
# Reading the image
img = cv2.imread('test2.jpg')
# Convert the image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Apply edge detection
edges = cv2.Canny(gray,50,150,apertureSize = 3)
# Line detection
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength=50,maxLineGap=20)
for line in lines:
x1,y1,x2,y2 = line[0]
cv2.line(img,(x1,y1),(x2,y2),(0,0,255),2)
cv2.imwrite('linesDetected.jpg',img)

For the left image, using your approach, try to look at the RGB histogram, the colors should be significant peaks, if you would like to use the relative area of the segments.
Another alternative could be to use Hough Circle Transform, which should work on circle segments. See also here.
For the right image ... let me think ...
You could create a "empty" diagram with no data inside. You know the locations of the circle segment ("cake pieces"). Then you could identify the area where the data is (the dark ones), either by using a grey threshold, an RGB threshold, or Find Contours or look for Watershed / Distance Transform.
In the end the idea is to make a boolean overlay between the cleared image and the segments (your data) that was found. Then you can identify which share of your circle segments is covered, or knowing the center, find the farthest point from the center.

Extraction of characters from forms with boxed field inputs

I am trying to extract characters from all fields in forms with boxes such as the one shown here:
Sample printed form
My current approach is as follows:
Crop the field from the form based on some standard format.
Image pre-processing and finding contours around the boxes of fields.
Based on the number of boxes in that field, crop each small box and run character recognition on these cropped character images.
The boxes may be slightly tilted in the images. I use an alignment algorithm but it still does not always straighten the box edges. This can be seen in this image:
Aligned date crop
.
On such images, when I crop characters using straight lines (step 3 of the algorithm mentioned above), the edges of boxes are also included which confuse the character recognition module. For example, the number '3' and 'the box edge' is represented as 31 sometimes.
I want to use pre-trained models only and hence, I am looking for a better way to extract characters from the boxed fields properly.
I would highly appreciate any help provided by the SO community.

As the box edges are generally thinner (as in your case) than the text inside them, we can leverage this information.
By applying a horizontal morphological closing kernel (Dilation -> Erosion), we can make thin vertical lines white, which will help OCR. Some trash may be left after processing but that wouldn't hinder OCR accuracy. The size of the kernel depends on the width of the border lines. Obviously, you can tweak it as per your case.
Here is the sample code:
import cv2
import numpy as np
im = cv2.imread('sample_image.png')
im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
k1 = (4,1)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, k1)
im = cv2.morphologyEx(im, cv2.MORPH_CLOSE, kernel, iterations=1)
_,im = cv2.threshold(im, thresh=200, maxval=255, type=cv2.THRESH_BINARY)
cv2.imwrite('sample_output.png',im)
And here are the images:
sample_image.png
sample_output.png

Opencv - Extracting data from in-game images

I need some help with an OpenCV project I'm working on. I'm taking images from a computer game (in this case, Fortnite), and I would like to extract different elements from them, eg. timer value, quantities of materials, health and shield etc.
Currently I perform a series of image preprocessing functions until I get a binary image, followed by locating the contours in the image and then sending those contours to a machine learning algorithm (K-Nearest-Neighbours).
I am able to succeed in a lot of cases, but there are some images where I don't manage to find some of the contours, therefore I don't find all the data.
An important thing to note is that I use the same preprocessing pipeline for all images, because I'm looking for as robust of a solution that I can manage.
I would like to know what I can do to improve the performance of my program. -
Is KNN a good model for this sort of task, or are there other models that might give me better results?
Is there any way to recognise characters without locating contours?
How can I make my preprocessing pipeline as robust as possible, given the fact that there is a lot of variance in the background across all images?
My goal is to process the images as fast as possible, starting out with a minimum of at least 2 images per second.
Thanks in advance for any help or advice you can give me!
Here is an example image before preprocessing
Here is the image after preprocessing, in this example I cannot find the contour for the 4 on the right side.

Quite simply, enlarging the image might help, since it increases the dark border of the number.
I threw together some code that does that. The result could be improved, but my point here is to show that the 4 can now be detected as a contour. To increase efficiency I only selected contours within a certain size.
Also, since it is part of the HUD, that usually means that the location on screen is always the same. If so, you can get great performance increase by only selecting the area with values (described here) - as I have done manually.
Finally, since the numbers have a consistent shape, you could try matchShapes as an alternative to kNN to recognize the numbers. I don't know how they compare in performance though, so you'll have to try that out yourself.
Result:
Code:
import numpy as np
import cv2
# load image
img = cv2.imread("fn2.JPG")
# enlarge image
img = cv2.resize(img,None,fx=4, fy=4, interpolation = cv2.INTER_CUBIC)
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
# create mask using threshold
ret,mask = cv2.threshold(gray,200,255,cv2.THRESH_BINARY)
# find contours in mask
im, contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# draw contour on image
for cnt in contours:
if cv2.contourArea(cnt) < 3000 and cv2.contourArea(cnt) > 200:
cv2.drawContours(img, [cnt], 0, (255,0,0), 2)
#show images
cv2.imshow("Mask", mask)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.