So I'm trying to create a program that can recognize which number an image shows and print that integer to the console. (I'm using Python 3.)
For example, the program should recognize that the following image (an actual image the program has to check) is the number 2:
I've tried to just compare it with another image containing the 2 using cv2.matchTemplate(), but each time the blue pixels' RGB values are a little different for each image, and the image can be a bit larger or smaller. For example the following image:
It also has to tell it apart from all the other blue number images (0-9), for example the following one:
I've tried multiple match-template codes and made a folder with number 0-9 images as templates, but each time almost every single template gets "recognized" inside the number that actually needs to be recognized. For example, number 5 gets recognized in an image that is a number 2. And when it doesn't recognize all of them, it recognizes the wrong one(s).
The ones I've tried:
Answer from this question
Both codes from this tutorial
And the one from this tutorial
but like I said before, they all come with the problems described above.
I've also tried measuring what percentage of each image is blue, but those percentages were too close together to tell the numbers apart.
Does anyone have a solution? Am I being stupid for using cv2.matchTemplate(), and is there a much simpler option? (I don't mind using a library for it, since this is part of a bigger piece of code, but I prefer writing it myself over pulling in libraries.)
Instead of using Template Matching, a better approach is to use Pytesseract OCR to read the number with image_to_string(). But before performing OCR, you need to preprocess the image. For optimal OCR performance, the preprocessed image should have the desired text/number/characters to OCR in black with the background in white. A simple preprocessing step is to convert the image to grayscale, Otsu's threshold to obtain a binary image, then invert the image. Here's a visualization of the preprocessing step:
Input image -> Grayscale -> Otsu's threshold -> Inverted image ready for OCR
Result from Pytesseract OCR
2
Here are the results with the other images:
2
5
We use the --psm 6 configuration option to assume a single uniform block of text. See here for more configuration options.
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold, then invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
invert = 255 - thresh
# Perform OCR with Pytesseract
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()
Note: If you insist on using template matching, you need to make it scale-invariant (multi-scale template matching). Take a look at how to isolate everything inside of a contour, scale it, and test the similarity to an image? and Python OpenCV line detection to detect X symbol in image for some examples. If you know for certain that your images are blue, another approach would be to use color thresholding with cv2.inRange() to obtain a binary mask image and then apply OCR on that image.
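For reference, here's a minimal sketch of the cv2.inRange() idea; the HSV bounds for "blue" are assumptions you would have to tune to your images:
import cv2
import numpy as np

# Minimal sketch of the color-thresholding idea. The HSV bounds below are
# illustrative guesses for "blue" and need tuning for your actual images.
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

lower_blue = np.array([90, 50, 50])     # assumed lower HSV bound for blue
upper_blue = np.array([130, 255, 255])  # assumed upper HSV bound for blue
mask = cv2.inRange(hsv, lower_blue, upper_blue)

# Invert so the digit ends up black on white before handing it to OCR
result = 255 - mask
cv2.imshow('ready for OCR', result)
cv2.waitKey()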
Given the lovely regular input, I expect that all you need is a simple comparison against templates. Since you neglected to supply your code and output, it's hard to tell what might have gone wrong.
Very simply ...
Rescale your input to the size of your templates.
Calculate any straightforward matching evaluation of the input against each of the 10 templates. A simple matching count should suffice: how many pixels match between the two images.
The template with the highest score is the identification.
You might also want to set a minimum score for declaring a match, perhaps based on how well each template matches the other templates: any identification should clearly exceed the best match between two different templates.
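As a minimal sketch of those steps (the templates/ folder, the 20x30 size, and the 127 threshold are placeholders for your own data):
import cv2
import numpy as np

# Minimal sketch of the pixel-matching idea. File names, template size and
# the binarization threshold (127) are placeholders for your own data.
TEMPLATE_SIZE = (20, 30)  # (width, height) of the stored templates

def binarize(img, size=TEMPLATE_SIZE):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, size)
    _, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    return bw

templates = [binarize(cv2.imread(f'templates/{d}.png')) for d in range(10)]
query = binarize(cv2.imread('unknown.png'))

# Score = number of pixels that agree with each template
scores = [np.count_nonzero(t == query) for t in templates]
best = int(np.argmax(scores))
print('identified digit:', best, 'score:', scores[best])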
If you don't have access to an OCR engine, know that you can build your own OCR system via a KNN classifier. In this case the implementation should not be very difficult, as you are only classifying numbers. OpenCV provides a very straightforward implementation of KNN.
The classifier is trained using features calculated from samples of known instances of the classes. In this case, you have 10 classes (if you are working with digits 0-9), so you can prepare a "template" with your digits, extract some features, train the classifier, and use it to classify new instances.
All of this can be done in OpenCV without the need for extra libraries, and KNN has a more than acceptable accuracy rate for this kind of application.
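As an illustrative sketch only: OpenCV's KNN on raw resized pixels, assuming one labeled image per digit in a templates/ folder (in practice you'd want several samples per class):
import cv2
import numpy as np

# Minimal sketch of OpenCV's KNN for digit classification. It assumes one
# labeled image per digit (templates/0.png ... templates/9.png) and uses raw
# resized pixels as features; the paths and sizes are placeholders.
SIZE = (20, 20)

def features(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, SIZE)
    return img.reshape(1, -1).astype(np.float32)  # one row per sample

train = np.vstack([features(f'templates/{d}.png') for d in range(10)])
labels = np.arange(10, dtype=np.float32).reshape(-1, 1)

knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, labels)

test = features('unknown.png')
ret, result, neighbours, dist = knn.findNearest(test, 1)  # k=1: one sample per class
print('predicted digit:', int(result[0][0]))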
Related
I'm trying to extract some contents from a cropped image. I tried pytesseract and OpenCV template matching, but the results are very poor. OpenCV template matching sometimes fails due to the poor quality of the icons, and tesseract gives me a line of text with false characters.
I'm trying to grab the values like this:
0:26 83 1 1
Any thoughts or techniques?
A technique you could use would be to blur your image. From what it looks like, the image is already somewhat low-res and blurry, so you wouldn't need to blur it very hard. Whenever I need a blur function in OpenCV, I normally choose Gaussian blur, since it weights each pixel together with its surrounding pixels. Once the image is blurred, threshold, or adaptive-threshold, the image. At that point the result should be mostly hard lines with little bits of short lines mixed in between. Afterwards, dilate the thresholded image just enough that the areas with a lot of hard edges connect. Once the dilation has been performed, find the contours of that image and sort them by their vertical position within the image. Since I assume the position of those numbers won't change, sorting the contours by height within the image is enough. Afterwards, once you have sorted your contours, just create bounding boxes over them and read the text from there (see the sketch after the two method outlines below).
However, if you want to do this the quick and dirty way, you can always just manually create your own ROI's around each area you want to read and do it that way.
First Method
Gaussian blur the image
Threshold the image
Dilate the image
Find Contours
Sort Contours based on height
Create bounding boxes around relevant contours
Second Method
Manually create ROI's around the area you want to read text from
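Here's a rough sketch of the first method; the kernel sizes and the minimum-area cut-off are guesses to tune against your screenshots:
import cv2

# Minimal sketch of the first method. Kernel sizes and the area cut-off are
# illustrative and would need tuning for your screenshot resolution.
image = cv2.imread('cropped.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 1. Gaussian blur to smooth the low-res edges
blur = cv2.GaussianBlur(gray, (3, 3), 0)

# 2. Threshold (Otsu picks the global value automatically)
_, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 3. Dilate so nearby character fragments connect into one blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
dilate = cv2.dilate(thresh, kernel, iterations=1)

# 4. Find contours, 5. sort top-to-bottom by the y of each bounding box
contours, _ = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[1])

# 6. Bounding boxes around the relevant contours; each ROI would go to OCR
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 100:  # assumed minimum area to skip specks
        roi = image[y:y+h, x:x+w]
        cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 1)

cv2.imshow('boxes', image)
cv2.waitKey()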
For a school project I am trying to write a program in Python that tracks the movement of the pupil. In order to do that I am using OpenCV.
After looking up some tutorials on the internet, I noticed that almost everyone is using thresholding to achieve this, since a binary image is necessary for almost every step further down the road (e.g. HoughCircles transformation, contours). However, from my understanding thresholding is extremely light-sensitive, so such an approach would only return good results under optimal lighting conditions.
So here comes my question: is there any alternative or better approach than just thresholding the image? Or is my understanding of thresholding in OpenCV wrong in the first place?
Here is an example image:
The purpose of thresholding is to segment the desired objects from the background, after which you can perform additional processing (applying morphological operations) and then contour filtering to further isolate the desired objects. Instead of applying image processing techniques on a BGR (3-channel) image or a grayscale (1-channel) image with range [0...255], thresholding gives us a binary image where every pixel is either fully off or fully on (0 or 255), which makes distinguishing objects easier. Depending on your situation, there are many ways to obtain a binary image; here are several methods (a combined sketch follows the list):
cv2.Canny - Canny edge detection which uses a minVal and maxVal to determine edges
cv2.threshold - Simple thresholding with user selected arbitrary global threshold value
cv2.threshold + cv2.THRESH_OTSU - Otsu's thresholding to automatically calculate the threshold value.
cv2.adaptiveThreshold - Adaptive thresholding for when the image has different lighting conditions in different areas. Essentially it calculates the threshold value for each region of the image automatically and gives better results on images with varying illumination
cv2.inRange - Color segmentation. The idea is to use lower and upper threshold ranges to obtain a binary image. Useful when trying to isolate a single color range
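To make that concrete, here's a small sketch running each method on one input; every numeric parameter is illustrative and depends on your lighting:
import cv2
import numpy as np

# Sketch of the listed ways to obtain a binary image from one input;
# all numeric parameters here are illustrative and depend on your lighting.
image = cv2.imread('eye.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

canny = cv2.Canny(gray, 50, 150)                       # minVal/maxVal edges
_, simple = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)
_, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 21, 5)
# Color segmentation: keep only dark pixels (e.g. the pupil) in BGR space
mask = cv2.inRange(image, np.array([0, 0, 0]), np.array([60, 60, 60]))

for name, result in [('canny', canny), ('simple', simple), ('otsu', otsu),
                     ('adaptive', adaptive), ('inRange', mask)]:
    cv2.imshow(name, result)
cv2.waitKey()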
I am trying to extract characters from all fields in forms with boxes such as the one shown here:
Sample printed form
My current approach is as follows:
Crop the field from the form based on some standard format.
Image pre-processing and finding contours around the boxes of fields.
Based on the number of boxes in that field, crop each small box and run character recognition on these cropped character images.
The boxes may be slightly tilted in the images. I use an alignment algorithm but it still does not always straighten the box edges. This can be seen in this image:
Aligned date crop
On such images, when I crop characters using straight lines (step 3 of the algorithm mentioned above), the edges of the boxes are also included, which confuses the character recognition module. For example, the number '3' together with a box edge is sometimes recognized as '31'.
I want to use pre-trained models only and hence, I am looking for a better way to extract characters from the boxed fields properly.
I would highly appreciate any help provided by the SO community.
As the box edges are generally thinner (as in your case) than the text inside them, we can leverage this information.
By applying a horizontal morphological closing kernel (dilation -> erosion), we can turn the thin vertical lines white, which helps OCR. Some trash may be left after processing, but that shouldn't hinder OCR accuracy. The size of the kernel depends on the width of the border lines; you can obviously tweak it for your case.
Here is the sample code:
import cv2

# Load the cropped field and convert to grayscale
im = cv2.imread('sample_image.png')
im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

# Horizontal closing kernel: wide enough (4 px) to swallow the thin vertical box edges
k1 = (4, 1)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, k1)
im = cv2.morphologyEx(im, cv2.MORPH_CLOSE, kernel, iterations=1)

# Binarize to clean up any remaining gray trash
_, im = cv2.threshold(im, thresh=200, maxval=255, type=cv2.THRESH_BINARY)
cv2.imwrite('sample_output.png', im)
And here are the images:
sample_image.png
sample_output.png
I am trying to extract big rectangular boxes from document images with signatures in them. Since I don't have training data (for deep learning), I want to cut the rectangular boxes (3 in all images) out of these images using OpenCV.
Here is what I tried:
import numpy as np
import cv2

img = cv2.imread('S-0330-444-20012800.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
contours, h = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    if len(approx) == 4:
        cv2.drawContours(img, [cnt], 0, (26, 60, 232), -1)
cv2.imshow('img', img)
cv2.waitKey(0)
sample image
With the above code, I get a lot of squares (around 152 small, point-like squares) and of course not the 3 boxes.
Replies appreciated. [sample image is attached]
I would suggest you read up on template matching. There is also a good OpenCV tutorial on this.
For your use case, the idea would be to generate a stereotyped image of a rectangular box with the same shape (width/height ratio) as the boxes found on your documents. Depending on whether your input images always show the document at the same scale or not, you would either need to resize the inputs to keep their magnification constant, or you would need to operate with a template bank (e.g. an array of box templates at various scalings).
Briefly, you would then cross-correlate the template box(es) with the input image and (in case of well-matched scaling) would find ideally relatively sharp peaks indicating the centers of your document boxes.
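As a rough sketch of the template-bank idea (the file names and the 0.5-1.5 scale range are placeholders, not recommendations):
import cv2
import numpy as np

# Minimal sketch of cross-correlating a box template at several scalings.
# 'box_template.png' and the scale range are placeholders for your documents.
doc = cv2.imread('document.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('box_template.png', cv2.IMREAD_GRAYSCALE)

best = None
for scale in np.linspace(0.5, 1.5, 11):        # template bank via rescaling
    t = cv2.resize(template, None, fx=scale, fy=scale)
    if t.shape[0] > doc.shape[0] or t.shape[1] > doc.shape[1]:
        continue
    res = cv2.matchTemplate(doc, t, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    if best is None or max_val > best[0]:
        best = (max_val, max_loc, t.shape)

score, (x, y), (h, w) = best
print(f'best match score {score:.2f} at ({x}, {y}), size {w}x{h}')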
In the code above, use image pyramids (to merge unwanted contour noise) and cv2.findContours in combination. After that, filtering the list of contours by area with cv2.contourArea will leave only the bigger squares.
There is also an alternative solution. Looking at the images, we can see that the signature text is usually bigger than the printed text in that ROI, so we can filter out contours smaller than the signature contours and extract only the signature.
It's always good to remove noise before using cv2.findContours, e.g. with dilation, erosion, or blurring.
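A minimal sketch of that area-based filtering, where the 10000-pixel bound is an assumed value you'd tune against your scans:
import cv2

# Minimal sketch of the contour-area filter; the area bound (10000) is an
# assumed value and must be tuned to the size of the boxes in your scans.
img = cv2.imread('S-0330-444-20012800.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)       # remove noise first
_, thresh = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY_INV)

contours, _ = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
big = [c for c in contours if cv2.contourArea(c) > 10000]
cv2.drawContours(img, big, -1, (26, 60, 232), 3)
cv2.imshow('big boxes', img)
cv2.waitKey(0)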
I would like to check whether a given image matches a cropped face image. I tried to crop the face from the image using OpenCV in Python. Now I want to check whether the input image matches the cropped face. What methods can I use in OpenCV to achieve this?
For sufficiently small (and not scientifically accurate) purposes, you could use OpenCV's template matching.
Feature extraction and matching may give you more accurate results in many cases. A face detector comes as part of OpenCV. Face recognition, however, is a much larger problem altogether.
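If you go the template matching route, a minimal sketch could look like this (the file names and the 0.8 cut-off are assumptions, not fixed rules):
import cv2

# Minimal sketch of the template-matching check. 'face_crop.png' is the
# cropped face and 0.8 is an assumed similarity cut-off, not a fixed rule.
scene = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
face = cv2.imread('face_crop.png', cv2.IMREAD_GRAYSCALE)

res = cv2.matchTemplate(scene, face, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(res)
print('match' if max_val > 0.8 else 'no match', f'(score {max_val:.2f})')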