I am working on a task to extract the account number from cheque images. My current approach can be divided into two steps:
Localize the account number digits (printed digits)
Perform OCR using an OCR library such as Tesseract OCR
The second step is straightforward, assuming the account number digits have been localized properly.
I tried to localize the account number digits using OpenCV contour methods and MSER (maximally stable extremal regions), but didn't get useful results. It's difficult to generalize the pattern because:
Different bank cheques have variations in template
Account number position is not fixed
How can we approach this problem? Do I have to look at deep learning based approaches?
Sample Images
Assuming the account number has the unique purple text color, we can use color thresholding. The idea is to convert the image to HSV color space then define a lower/upper color range and perform color thresholding using cv2.inRange(). From here we filter by contour area to remove small noise. Finally we invert the image since we want the text in black with the background in white. One last step is to Gaussian blur the image before throwing it into Pytesseract. Here's the result:
Result from Pytesseract
import numpy as np
import pytesseract
import cv2
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load the image and convert it to HSV
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
# Color threshold for the purple text
lower = np.array([103,79,60])
upper = np.array([129,255,255])
mask = cv2.inRange(hsv, lower, upper)
# Remove small noise by filling tiny contours with black
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 10:
        cv2.drawContours(mask, [c], -1, (0,0,0), -1)
# Invert (black text on white background), blur, then OCR
mask = 255 - mask
mask = cv2.GaussianBlur(mask, (3,3), 0)
data = pytesseract.image_to_string(mask, lang='eng',config='--psm 6')
print(data)
cv2.imshow('mask', mask)
cv2.waitKey()
Thanks, everyone, for the suggestions. I ended up training a deep learning object detection model to localize the account number, and it gave very good results compared to the OpenCV-based methods.
I'm working on a project to detect text in images. So far I have been able to isolate candidate text regions. I used threshold values for aspect ratio, contour area, and white pixel count inside the bounding box of a contour to remove non-text regions. But I cannot set these thresholds too small, as there are images with small font sizes. Some non-text regions are still present. I read that the Stroke Width Transform is a solution for this problem, but it is too complicated. Is there any other method to remove these non-text regions?
I thought of using the curve shape of text to distinguish the regions but couldn't think of a way to implement it.
This is a sample image
Identified regions
You can use simple contour area filtering to remove the noise. The idea is to find contours, filter using cv2.contourArea(), and draw the valid contours onto a blank mask. To reconstruct the image without the noise, we bitwise-and the input image with the mask to get our result.
Noise to remove highlighted in green
import cv2
import numpy as np
# Load image, create blank mask, grayscale, Otsu's threshold
image = cv2.imread('1.png')
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Find contours and filter using contour area
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 250:
        cv2.drawContours(mask, [c], -1, (255,255,255), -1)
# Bitwise and to reconstruct image
result = cv2.bitwise_and(image, mask)
cv2.imshow('mask', mask)
cv2.imshow('result', result)
cv2.waitKey()
Note: If you know that the text will be yellow, another approach would be to use color thresholding to isolate the text. You can use an HSV color thresholder script to determine the lower/upper bounds.
So I have applied contouring on a big image and reached the following cropped part of the image:
But now, without using any machine learning model, how do I actually get the image into a text variable? I came across template matching, but I do not understand how to proceed from here. I do have images of letters and numbers (named according to their image value) stored in a directory, but how do I match each of them and get the text as a string? I don't want to use any ML model or a library like pyTesseract.
I would appreciate any help.
The code I have tried for template matching.
def templateMatch(image):
    path = "location"
    for image_path in os.listdir(path + "/characters-images"):
        template = cv2.imread(os.path.join(path, "characters-images", image_path))
        template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
        template = template.astype(np.uint8)
        image = image.astype(np.uint8)
        res = cv2.matchTemplate(template, image, cv2.TM_SQDIFF_NORMED)
        mn, _, mnLoc, _ = cv2.minMaxLoc(res)
        if res is not None:
            return image_path.replace(".bmp", "")

def match(image):
    plate = ""
    # mask = np.zeros(image.shape, dtype=np.uint8)
    # print(image.shape)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # print(image.shape)
    # print(image)
    thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    (cnts, _) = contours.sort_contours(cnts, method="left-to-right")
    for con in cnts:
        area = cv2.contourArea(con)
        if 800 > area > 200:
            x, y, w, h = cv2.boundingRect(con)
            # cv2.drawContours(mask, [c], 1, (255, 0, 0), 2)
            temp = thresh[y:y+h, x:x+w]
            character = templateMatching(temp)
            if character is not None:
                plate += character
    return plate
Template matching is used to locate an object in an image given a template, not to extract text from an image. Matching a template with the position of the object in the image will not help you get the text as a string. For examples of how to apply dynamic, scale-variant template matching, take a look at "How to isolate everything inside of a contour, scale it, and test the similarity to an image?" and "Python OpenCV line detection to detect X symbol in image". I don't understand why you wouldn't want to use an OCR library. If you want to extract text from the image as a string variable, you should use some type of deep/machine learning. PyTesseract is probably the easiest. Here's a solution using PyTesseract:
The idea is to obtain a binary image using Otsu's threshold then perform contour area and aspect ratio filtering to extract the letter/number ROIs. From here we use Numpy slicing to crop each ROI onto a blank mask then apply OCR using Pytesseract. Here's a visualization of each step:
Binary image
Detected ROIs highlighted in green
Isolated ROIs on a blank mask ready for OCR
We use the --psm 6 configuration option to tell Pytesseract to assume a uniform block of text. Look here for more configuration options. Result from Pytesseract:
XS NB 23
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, create mask, grayscale, Otsu's threshold
image = cv2.imread('1.png')
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Filter for ROI using contour area and aspect ratio
cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)
    x,y,w,h = cv2.boundingRect(approx)
    aspect_ratio = w / float(h)
    if area > 2000 and aspect_ratio > .5:
        # Copy the ROI from the input image onto the blank mask
        mask[y:y+h, x:x+w] = image[y:y+h, x:x+w]
# Perform OCR with Pytesseract
data = pytesseract.image_to_string(mask, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('mask', mask)
cv2.waitKey()
An option is to take the bounding box around each character and compute a similarity score between the character at hand and those in the training set, keeping the best match: the largest score for a correlation, or the smallest for a distance. (Use one of SAD, SSD, normalized grayscale correlation, or just the Hamming distance if you work on a binary image.)
You will need to develop a suitable strategy to ensure that the tested characters and the learnt characters have compatible sizes and are properly overlaid.
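As a rough illustration of that idea, here is a minimal NumPy-only sketch for binary glyphs. It assumes each glyph is non-empty, crops it to its bounding box, resizes it to a common grid with nearest-neighbour sampling, and picks the template with the smallest Hamming distance. The helper names, the 16x16 grid, and the toy "L"/"T" templates are all made up for the example.

```python
import numpy as np

def resize_nearest(binary, shape):
    """Nearest-neighbour resize of a 2-D array to `shape` (rows, cols)."""
    rows = np.arange(shape[0]) * binary.shape[0] // shape[0]
    cols = np.arange(shape[1]) * binary.shape[1] // shape[1]
    return binary[np.ix_(rows, cols)]

def tight_crop(binary):
    """Crop a non-empty binary glyph to the bounding box of its foreground."""
    ys, xs = np.nonzero(binary)
    return binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def classify(glyph, templates, shape=(16, 16)):
    """Return (label, distance) of the template closest in Hamming distance."""
    probe = resize_nearest(tight_crop(glyph), shape) > 0
    best_label, best_dist = None, None
    for label, template in templates.items():
        ref = resize_nearest(tight_crop(template), shape) > 0
        dist = int(np.count_nonzero(probe != ref))  # number of differing pixels
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

# Toy 8x8 templates for the letters "L" and "T"
L = np.zeros((8, 8), dtype=np.uint8)
L[:, 0] = 1
L[7, :] = 1
T = np.zeros((8, 8), dtype=np.uint8)
T[0, :] = 1
T[:, 3:5] = 1

# A larger, offset "L" with thicker strokes, as a segmented character might look
probe = np.zeros((20, 20), dtype=np.uint8)
probe[2:18, 3:5] = 1
probe[16:18, 3:15] = 1

label, dist = classify(probe, {"L": L, "T": T})
print(label, dist)
```

Cropping before resizing is what keeps the tested and learnt characters overlaid; without it, a shifted glyph would score badly against its own template.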
How do I isolate or crop only the handwritten text using OpenCV and Python for the image:
I have tried to use:
but because of the noise (the background and dirt on the paper) I can't get only the paper.
How do I do this?
To smooth noisy images, the typical method is to apply some type of blurring filter. For instance, cv2.GaussianBlur(), cv2.medianBlur(), or cv2.bilateralFilter() can be used to remove salt/pepper noise. After blurring, we can threshold to obtain a binary image and then perform morphological operations. From here, we can find contours and filter using aspect ratio or contour area. To crop the ROI, we can use Numpy slicing.
Detected text
Extracted ROI
import cv2
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.medianBlur(gray, 5)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,8)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
dilate = cv2.dilate(thresh, kernel, iterations=6)
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
# Contours are sorted largest first, so keep only the first (largest) one
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    ROI = image[y:y+h, x:x+w]
    break
cv2.imwrite('ROI.png', ROI)
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('ROI', ROI)
cv2.waitKey()
Convert the image to single-channel gray.
Apply adaptiveThreshold to your image. The handwriting will become black and the rest will be white.
If you want to segment the word as one solid region, also apply morphologyEx with MORPH_CLOSE. Here you should experiment with the kernel (most likely a 3x3 ellipse) and the number of iterations (usually 5-10 is fine).
kernel = cv2.getStructuringElement(shape=cv2.MORPH_ELLIPSE, ksize=(3, 3))
image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel, iterations=7)
Use connectedComponentsWithStats. It will put every connected region into a separate component. stats will hold the bounding boxes either for the whole word or, if you omit the morphological closing step, for each group of connected characters.
P.S.: Let me know if you need full code example.
I am trying to do OCR from this toy example of Receipts. Using Python 2.7 and OpenCV 3.1.
Grayscale + Blur + External Edge Detection + Segmentation of each area in the Receipts (for example "Category" to see later which one is marked -in this case cash-).
I find it complicated, when the image is "skewed", to properly transform it and then "automatically" segment each section of the receipt.
Any suggestion?
The code below is an example that gets as far as the edge detection, but only when the receipt is like the first image. My issue is not the image-to-text step; it is the pre-processing of the image.
Any help more than appreciated! :)
import os;
os.chdir() # Put your own directory
import cv2
import numpy as np
image = cv2.imread("Rent-Receipt.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (5, 5), 0)
#blurred = cv2.bilateralFilter(gray,9,75,75)
# apply Canny Edge Detection
edged = cv2.Canny(blurred, 0, 20)
#Find external contour
(_,contours, _) = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
A great tutorial on the first step you described is available at pyimagesearch (and they have great tutorials in general)
In short, as described by Ella, you would have to use cv2.CHAIN_APPROX_SIMPLE. A slightly more robust method would be to use cv2.RETR_LIST instead of cv2.RETR_EXTERNAL and then sort the contours by area, as it should work reasonably well even on white backgrounds, or if the page is inscribed in a bigger shape in the background, etc.
Coming to the second part of your question, a good way to segment the characters would be to use the maximally stable extremal region (MSER) extractor available in OpenCV. A complete implementation in C++ is available here, in a project I was helping out with recently. The Python implementation would go along the lines of the following (the code below works for OpenCV 3.0+; for the OpenCV 2.x syntax, look it up online):
import cv2
img = cv2.imread('test.jpg')
mser = cv2.MSER_create()
#Resize the image so that MSER can work better
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0))
cv2.namedWindow('img', 0)
cv2.imshow('img', vis)
cv2.waitKey()
This gives the output as
Now, to eliminate the false positives, you can simply cycle through the points in hulls, and calculate the perimeter (sum of distance between all adjacent points in hulls[i], where hulls[i] is a list of all points in one convexHull). If the perimeter is too large, classify it as not a character.
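A small sketch of that perimeter filter, written with NumPy only. The threshold `max_perimeter` is a made-up value that would have to be tuned per image; cv2.arcLength(hull, True) computes the same closed perimeter if you prefer to stay within OpenCV.

```python
import numpy as np

def hull_perimeter(hull):
    """Closed-polygon perimeter of a hull shaped (N, 1, 2) or (N, 2)."""
    pts = np.asarray(hull, dtype=np.float64).reshape(-1, 2)
    # Vector from each point to the next one, wrapping last -> first
    edges = np.roll(pts, -1, axis=0) - pts
    return float(np.hypot(edges[:, 0], edges[:, 1]).sum())

def filter_hulls(hulls, max_perimeter):
    """Keep only the hulls whose perimeter does not exceed max_perimeter."""
    return [h for h in hulls if hull_perimeter(h) <= max_perimeter]

# Two hulls in the (N, 1, 2) layout that cv2.convexHull returns
small = np.array([[[0, 0]], [[10, 0]], [[10, 20]], [[0, 20]]], dtype=np.int32)
large = np.array([[[0, 0]], [[300, 0]], [[300, 200]], [[0, 200]]], dtype=np.int32)

kept = filter_hulls([small, large], max_perimeter=200)
print(len(kept), hull_perimeter(small), hull_perimeter(large))
```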
The diagonal lines across the image appear because the border of the image is black. They can be removed by adding the following line right after the image is read with cv2.imread:
img = img[5:-5,5:-5,:]
which gives the output
The option off the top of my head requires extracting the 4 corners of the skewed image. This is done by using cv2.CHAIN_APPROX_SIMPLE instead of cv2.CHAIN_APPROX_NONE when finding contours. Afterwards, you could use cv2.approxPolyDP and hopefully be left with the 4 corners of the receipt (if all your images are like this one, then there is no reason why it shouldn't work).
Now use cv2.findHomography and cv2.warpPerspective to rectify the image according to source points, which are the 4 points extracted from the skewed image, and destination points that should form a rectangle, for example the full image dimensions.
Here you could find code samples and more information:
OpenCV-Geometric Transformations of Images
Also this answer may be useful - SO - Detect and fix text skew
EDIT: Corrected the second chain approx to cv2.CHAIN_APPROX_NONE.
Preprocessing the image by converting the desired text in the foreground to black while turning the unwanted background to white can help to improve OCR accuracy. In addition, removing the horizontal and vertical lines can improve results. Here's the preprocessed image after removing unwanted noise such as the horizontal/vertical lines. Note the removed border and table lines.
import cv2
# Load in image, convert to grayscale, and threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find and remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (35,2))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)
# Find and remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,35))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)
# Mask out unwanted areas for result
result = cv2.bitwise_and(image,image,mask=thresh)
result[thresh==0] = (255,255,255)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Try using the Stroke Width Transform. A Python 3 implementation of the algorithm is available in SWTloc.
EDIT : v2.0.0 onwards
Install the Library
pip install swtloc
Transform The Image
import swtloc as swt
imgpath = 'images/path_to_image.jpeg'
swtl = swt.SWTLocalizer(image_paths=imgpath)
swtImgObj = swtl.swtimages[0]
# Perform SWT Transformation with numba engine
swt_mat = swtImgObj.transformImage(text_mode='lb_df', gaussian_blurr=False,
minimum_stroke_width=3, maximum_stroke_width=12,
Localize Letters
localized_letters = swtImgObj.localizeLetters(minimum_pixels_per_cc=10,
Localize Words
localized_words = swtImgObj.localizeWords(localize_by='bbox')
There are multiple parameters in the .transformImage, .localizeLetters and .localizeWords functions that you can play around with to get the desired results.
Full Disclosure : I am the author of this library
I have been working on a binary image in OpenCV Python. I need to get the largest region. I have used the following code, but I am not getting the desired output.
edged = cv2.Canny(im_bw, 35, 125)
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
c = max(cnts, key = cv2.contourArea)
You don't need to use the canny output to do this. Just do findContours on im_bw directly and you should get the desired results. If still not what you want, try to use different threshold values (given that your original image isn't BW itself)
(_, im_bw) = cv2.threshold(frame, 100, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
(cnts, _) = cv2.findContours(im_bw.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
c = max(cnts, key = cv2.contourArea)
You didn't really explain what you are looking for: what do you mean by "largest region"? The code you posted will give you the largest contour found, but you need to understand what an OpenCV contour is here. Depending on your image, there may be a lot of noise, which makes OpenCV return something other than the "region" you are expecting, so you need to reduce the noise. Before applying Canny or the threshold, you can apply blur, erosion, and/or dilation to the image.
The algorithm should be like this:
Get the frame / image
Grayscale it
Apply Blur / Erode / Dilate to reduce noise
Apply Canny or threshold
Find contours
Get the largest
Do what you need
Here you'll find good documentation in Python.
I am using the scikit-image package of python which measures the area of the islands and chooses the largest area as follows -
import skimage
from skimage import measure
labels_mask = measure.label(input_mask)
regions = measure.regionprops(labels_mask)
regions.sort(key=lambda x: x.area, reverse=True)
# Zero out every region except the largest one
if len(regions) > 1:
    for rg in regions[1:]:
        labels_mask[rg.coords[:,0], rg.coords[:,1]] = 0
# Turn the remaining label into a binary mask
labels_mask[labels_mask!=0] = 1
mask = labels_mask
Input image -
Output image -