I am trying to do OCR on this toy example of receipts, using Python 2.7 and OpenCV 3.1.
Grayscale + Blur + External Edge Detection + Segmentation of each area in the receipt (for example "Category", to see later which one is marked, in this case cash).
I find it complicated, when the image is "skewed", to properly transform the receipt and then "automatically" segment each of its sections.
Example:
Any suggestion?
The code below is an example that gets as far as the edge detection, but only when the receipt looks like the first image. My issue is not the image-to-text step; it is the pre-processing of the image.
Any help more than appreciated! :)
import os
os.chdir() # Put your own directory
import cv2
import numpy as np
image = cv2.imread("Rent-Receipt.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (5, 5), 0)
#blurred = cv2.bilateralFilter(gray,9,75,75)
# apply Canny Edge Detection
edged = cv2.Canny(blurred, 0, 20)
#Find external contour
(_,contours, _) = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
A great tutorial on the first step you described is available at pyimagesearch (and they have great tutorials in general)
In short, as described by Ella, you would have to use cv2.CHAIN_APPROX_SIMPLE. A slightly more robust method would be to use cv2.RETR_LIST instead of cv2.RETR_EXTERNAL and then sort the contours by area, as that should work decently even on white backgrounds or when the page is inscribed in a bigger shape in the background, etc.
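For instance, a minimal sketch of that idea, assuming edged was computed as in the question's snippet (the return-value handling below covers both the OpenCV 3.x and 4.x signatures of cv2.findContours):
# Hypothetical: list all contours and keep the largest one by area
res = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = res[0] if len(res) == 2 else res[1]  # OpenCV 4.x returns 2 values, 3.x returns 3
page = max(cnts, key=cv2.contourArea)       # assume the receipt/page is the biggest contour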
Coming to the second part of your question, a good way to segment the characters would be to use the Maximally Stable Extremal Regions (MSER) extractor available in OpenCV. A complete implementation in C++ is available here in a project I was helping out in recently. The Python implementation would go along the lines of the code below (it works for OpenCV 3.0+; for the OpenCV 2.x syntax, look it up online).
import cv2
img = cv2.imread('test.jpg')
mser = cv2.MSER_create()
#Resize the image so that MSER can work better
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0))
cv2.namedWindow('img', 0)
cv2.imshow('img', vis)
while cv2.waitKey() != ord('q'):
    continue
cv2.destroyAllWindows()
This gives the output as
Now, to eliminate the false positives, you can simply cycle through the points in hulls and calculate the perimeter (the sum of the distances between all adjacent points in hulls[i], where hulls[i] is the list of all points in one convex hull). If the perimeter is too large, classify it as not a character.
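For example, a minimal sketch of that filter, assuming hulls from the snippet above (the threshold of 200 is just a placeholder to tune for your image size):
filtered_hulls = []
for hull in hulls:
    perimeter = cv2.arcLength(hull, True)  # closed perimeter: sum of distances between adjacent hull points
    if perimeter < 200:                    # placeholder threshold, tune per image
        filtered_hulls.append(hull)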
The diagonal lines across the image appear because the border of the image is black. That can simply be removed by adding the following line as soon as the image is read:
img = img[5:-5,5:-5,:]
which gives the output
The option off the top of my head requires extracting the 4 corners of the skewed image. This is done by using cv2.CHAIN_APPROX_SIMPLE instead of cv2.CHAIN_APPROX_NONE when finding contours. Afterwards, you could use cv2.approxPolyDP and hopefully be left with the 4 corners of the receipt (if all your images are like this one, there is no reason why it shouldn't work).
Now use cv2.findHomography and cv2.warpPerspective to rectify the image according to source points, which are the 4 points extracted from the skewed image, and destination points that should form a rectangle, for example the full image dimensions.
Here you could find code samples and more information:
OpenCV-Geometric Transformations of Images
Also this answer may be useful - SO - Detect and fix text skew
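A minimal sketch of that pipeline, assuming image and edged from the question's snippet (the corner ordering is glossed over here; in practice the source points need to be sorted so they match the destination order):
import cv2
import numpy as np
res = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = res[0] if len(res) == 2 else res[1]              # OpenCV 3.x vs 4.x return values
receipt = max(cnts, key=cv2.contourArea)                # assume the receipt is the largest contour
peri = cv2.arcLength(receipt, True)
approx = cv2.approxPolyDP(receipt, 0.02 * peri, True)   # hopefully 4 corners remain
if len(approx) == 4:
    src = approx.reshape(4, 2).astype(np.float32)
    h, w = image.shape[:2]
    # Destination rectangle = full image; point order must match src (e.g. tl, tr, br, bl)
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    H, _ = cv2.findHomography(src, dst)
    rectified = cv2.warpPerspective(image, H, (w, h))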
EDIT: Corrected the second chain approx to cv2.CHAIN_APPROX_NONE.
Preprocessing the image by converting the desired foreground text to black while turning the unwanted background to white can help improve OCR accuracy. In addition, removing the horizontal and vertical lines can improve results. Here's the preprocessed image after removing unwanted noise such as the horizontal/vertical lines. Note the removed border and table lines.
import cv2
# Load in image, convert to grayscale, and threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find and remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (35,2))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)
# Find and remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,35))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)
# Mask out unwanted areas for result
result = cv2.bitwise_and(image,image,mask=thresh)
result[thresh==0] = (255,255,255)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Try using the Stroke Width Transform. A Python 3 implementation of the algorithm is available here in SWTloc.
EDIT: v2.0.0 onwards
Install the Library
pip install swtloc
Transform The Image
import numpy as np
import swtloc as swt
imgpath = 'images/path_to_image.jpeg'
swtl = swt.SWTLocalizer(image_paths=imgpath)
swtImgObj = swtl.swtimages[0]
# Perform SWT Transformation with numba engine
swt_mat = swtImgObj.transformImage(text_mode='lb_df', gaussian_blurr=False,
                                   minimum_stroke_width=3, maximum_stroke_width=12,
                                   maximum_angle_deviation=np.pi/2)
Localize Letters
localized_letters = swtImgObj.localizeLetters(minimum_pixels_per_cc=10,
                                              localize_by='min_bbox')
Localize Words
localized_words = swtImgObj.localizeWords(localize_by='bbox')
There are multiple parameters in the .transformImage, .localizeLetters and .localizeWords functions that you can play around with to get the desired results.
Full Disclosure: I am the author of this library.
Related
I am planning to split the questions from this PDF document. The challenge is that the questions are not evenly spaced. For example, the first question occupies an entire page, the second does as well, while the third and fourth together make up one page. If I have to slice it manually, it will take ages. So, I thought to split it up into images and work on them. Is there a possibility to take an image like this
and split it into individual components like this?
This is a classic situation for dilate. The idea is that adjacent text corresponds with the same question while text that is farther away is part of another question. Whenever you want to connect multiple items together, you can dilate them to join adjacent contours into a single contour. Here's a simple approach:
Obtain binary image. Load the image, convert to grayscale, Gaussian blur, then Otsu's threshold to obtain a binary image.
Remove small noise and artifacts. We create a rectangular kernel and morph open to remove small noise and artifacts in the image.
Connect adjacent words together. We create a larger rectangular kernel and dilate to merge individual contours together.
Detect questions. From here we find contours, sort them from top-to-bottom using imutils.sort_contours(), filter with a minimum contour area, obtain the bounding rectangle coordinates, and highlight each question. We then crop each question using Numpy slicing and save the ROI image.
Otsu's threshold to obtain a binary image
Here's where the interesting section happens. We assume that adjacent text/characters are part of the same question so we merge individual words into a single contour. A question is a section of words that are close together so we dilate to connect them all together.
Individual questions highlighted in green
Top question
Bottom question
Saved ROI questions (assumption is from top-to-bottom)
Code
import cv2
from imutils import contours
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Remove small artifacts and noise with morph open
open_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, open_kernel, iterations=1)
# Create rectangular structuring element and dilate
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(opening, kernel, iterations=4)
# Find contours, sort from top to bottom, and extract each question
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
# Get bounding box of each question, crop ROI, and save
question_number = 0
for c in cnts:
    # Filter by area to ensure it's not noise
    area = cv2.contourArea(c)
    if area > 150:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
        question = original[y:y+h, x:x+w]
        cv2.imwrite('question_{}.png'.format(question_number), question)
        question_number += 1
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()
We may solve it using (mostly) morphological operations:
Read the input image as grayscale.
Apply thresholding with inversion.
Automatic thresholding using cv2.THRESH_OTSU is working well.
Apply opening morphological operation for removing small artifacts (using the kernel np.ones((1, 3))).
Dilate horizontally with very long horizontal kernel - make horizontal lines out of the text lines.
Apply closing vertically - create two large clusters.
The size of the vertical kernel should be tuned according to the typical gap.
Finding connected components with statistics.
Iterate the connected components and crop the relevant area in the vertical direction.
Complete code sample:
import cv2
import numpy as np
img = cv2.imread('scanned_image.png', cv2.IMREAD_GRAYSCALE) # Read image as grayscale
thesh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1] # Apply automatic thresholding with inversion.
thesh = cv2.morphologyEx(thesh, cv2.MORPH_OPEN, np.ones((1, 3), np.uint8)) # Apply opening morphological operation for removing small artifacts.
thesh = cv2.dilate(thesh, np.ones((1, img.shape[1]), np.uint8)) # Dilate horizontally - make horizontal lines out of the text lines.
thesh = cv2.morphologyEx(thesh, cv2.MORPH_CLOSE, np.ones((50, 1), np.uint8)) # Apply closing vertically - create two large clusters
nlabel, labels, stats, centroids = cv2.connectedComponentsWithStats(thesh, 4) # Finding connected components with statistics
parts_list = []
# Iterate connected components:
for i in range(1, nlabel):
    top = int(stats[i, cv2.CC_STAT_TOP])  # Get the topmost y coordinate of the connected component
    height = int(stats[i, cv2.CC_STAT_HEIGHT])  # Get the height of the connected component
    roi = img[top-5:top+height+5, :]  # Crop the relevant part of the image (add 5 extra rows at the top and bottom).
    parts_list.append(roi.copy())  # Add the cropped area to a list
    cv2.imwrite(f'part{i}.png', roi)  # Save the image part for testing
    cv2.imshow(f'part{i}', roi)  # Show part for testing
# Show image and thesh testing
cv2.imshow('img', img)
cv2.imshow('thesh', thesh)
cv2.waitKey()
cv2.destroyAllWindows()
Results:
Stage 1:
Stage 2:
Stage 3:
Stage 4:
Top area:
Bottom area:
I am working on a task to extract the account number from cheque images. My current approach can be divided into 2 steps
Localize account number digits (Printed digits)
Perform OCR using OCR libraries like Tesseract OCR
The second step is straightforward, assuming we have properly localized the account number digits.
I tried to localize the account number digits using OpenCV contour methods and MSER (Maximally Stable Extremal Regions) but didn't get useful results. It's difficult to generalize a pattern because:
Different bank cheques have variations in template
Account number position is not fixed
How can we approach this problem? Do I have to look at some deep learning based approaches?
Sample Images
Assuming the account number has the unique purple text color, we can use color thresholding. The idea is to convert the image to HSV color space then define a lower/upper color range and perform color thresholding using cv2.inRange(). From here we filter by contour area to remove small noise. Finally we invert the image since we want the text in black with the background in white. One last step is to Gaussian blur the image before throwing it into Pytesseract. Here's the result:
Result from Pytesseract
30002010108841
Code
import numpy as np
import pytesseract
import cv2
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([103,79,60])
upper = np.array([129,255,255])
mask = cv2.inRange(hsv, lower, upper)
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 10:
        cv2.drawContours(mask, [c], -1, (0,0,0), -1)
mask = 255 - mask
mask = cv2.GaussianBlur(mask, (3,3), 0)
data = pytesseract.image_to_string(mask, lang='eng',config='--psm 6')
print(data)
cv2.imshow('mask', mask)
cv2.waitKey()
Thanks everyone for the suggestions. I ended up training a deep learning object detection model to localize the account number, and it gave very good results compared to the OpenCV-based methods.
How do I isolate or crop only the handwritten text using OpenCV and Python for this image:
I have tried to use:
cv2.findContours
but because of the noise (background and dirt on the paper) I can't get only the paper.
How do I do this?
To smooth noisy images, typical methods apply some type of blurring filter. For instance, cv2.GaussianBlur(), cv2.medianBlur(), or cv2.bilateralFilter() can be used to remove salt/pepper noise. After blurring, we can threshold to obtain a binary image and then perform morphological operations. From here, we can find contours and filter using aspect ratio or contour area. To crop the ROI, we can use Numpy slicing.
Detected text
Extracted ROI
Code
import cv2
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.medianBlur(gray, 5)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,8)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
dilate = cv2.dilate(thresh, kernel, iterations=6)
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    ROI = image[y:y+h, x:x+w]
    cv2.imwrite('ROI.png', ROI)
    break
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('ROI', ROI)
cv2.waitKey()
Convert the image to single-channel grayscale.
Apply adaptiveThreshold to your image. The handwriting will become black, the rest will be white.
If you want to get the segmentation of this word as one solid blob, then also apply morphologyEx with MORPH_CLOSE. Here you should play with the kernel (most likely a 3x3 ellipse) and the number of iterations; usually 5-10 iterations is OK.
kernel = cv2.getStructuringElement(shape=cv2.MORPH_ELLIPSE, ksize=(3, 3))
image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel, iterations=7)
Use connectedComponentsWithStats. It will put every character into a separate component. stats will hold bounding boxes either for the whole word, or (if you omit the closing step) for each connected group of characters.
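A minimal sketch of those steps (the file name and the area filter are placeholders; the threshold is inverted here so the handwriting becomes the white foreground that connectedComponentsWithStats expects):
import cv2
img = cv2.imread('handwriting.jpg')                        # hypothetical file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)               # step 1: single-channel gray
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 8)  # step 2: adaptive threshold (inverted)
kernel = cv2.getStructuringElement(shape=cv2.MORPH_ELLIPSE, ksize=(3, 3))
closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=7)  # step 3: merge chars into a blob
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(closed, 8)  # step 4
for i in range(1, nlabels):                                # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 100:                                         # placeholder filter to skip specks
        cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)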
P.S.: Let me know if you need full code example.
I'm trying to detect a card, but the problem is that sometimes the image is not good and has several backgrounds, like this:
Not well-defined edges
Example background
I did this:
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray,(11,11),0)
edg = cv2.Canny(gray, 10, 20)
contours,_ = cv2.findContours(edg.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cv2.drawContours(image,contours,-1,[0,255,0],2)
cv2.imshow('image',image)
cv2.waitKey(0)
But sometimes it detects other stuff, and not the card. Does anyone have ideas on how to solve this?
I've tried object detection with YOLO, but it's hard.
First of all, note that there are conditions, like the lighting and the capture setup, which, if you can control them, reduce the load on the image-processing stage. For example, in your sample image you could put a white A4 sheet in the background to reduce small contours, and so on (of course, it may be impossible to change the conditions).
Well, I tried your test image with this code:
import cv2
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (8, 8))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 17))
img = cv2.imread('edge.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gradX = cv2.Sobel(gray, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
morph1 = cv2.morphologyEx(gradX, cv2.MORPH_OPEN, rectKernel)
morph2 = cv2.morphologyEx(morph1, cv2.MORPH_CLOSE, sqKernel)
cv2.imshow("img",img)
cv2.imshow("gradx",gradX)
cv2.imshow("tophat",morph1)
cv2.imshow("tophat2",morph2)
cv2.waitKey()
Here are the results:
You can then use contours and remove unwanted ones using contour properties: Contour Properties
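For example, a hedged sketch of such filtering, assuming contours were found on the preprocessed (e.g. thresholded) image; the area and aspect-ratio limits are placeholders to tune:
card_candidates = []
for c in contours:
    area = cv2.contourArea(c)
    x, y, w, h = cv2.boundingRect(c)
    aspect = w / float(h)
    if area > 5000 and 1.2 < aspect < 2.0:   # placeholder limits, tune per image
        card_candidates.append(c)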
I am working on retinal fundus images. The image consists of a circular retina on a black background. With OpenCV, I have managed to get a contour which surrounds the whole circular retina. What I need is to crop out the circular retina from the black background.
It is unclear in your question whether you want to actually crop out the information that is defined within the contour or mask out the information that isn't relevant to the contour chosen. I'll explore what to do in both situations.
Masking out the information
Assuming you ran cv2.findContours on your image, you will have received a structure that lists all of the contours available in your image. I'm also assuming that you know the index of the contour that was used to surround the object you want. Assuming this is stored in idx, first use cv2.drawContours to draw a filled version of this contour onto a blank image, then use this image to index into your image to extract out the object. This logic masks out any irrelevant information and only retains what is important, which is defined within the contour you have selected. The code to do this would look something like the following, assuming your image is a grayscale image stored in img:
import numpy as np
import cv2
img = cv2.imread('...', 0) # Read in your image
# contours, _ = cv2.findContours(...) # Your call to find the contours using OpenCV 2.4.x
_, contours, _ = cv2.findContours(...) # Your call to find the contours
idx = ... # The index of the contour that surrounds your object
mask = np.zeros_like(img) # Create mask where white is what we want, black otherwise
cv2.drawContours(mask, contours, idx, 255, -1) # Draw filled contour in mask
out = np.zeros_like(img) # Extract out the object and place into output image
out[mask == 255] = img[mask == 255]
# Show the output image
cv2.imshow('Output', out)
cv2.waitKey(0)
cv2.destroyAllWindows()
If you actually want to crop...
If you want to crop the image, you need to define the minimum spanning bounding box of the area defined by the contour. You can find the top left and lower right corner of the bounding box, then use indexing to crop out what you need. The code will be the same as before, but there will be an additional cropping step:
import numpy as np
import cv2
img = cv2.imread('...', 0) # Read in your image
# contours, _ = cv2.findContours(...) # Your call to find the contours using OpenCV 2.4.x
_, contours, _ = cv2.findContours(...) # Your call to find the contours
idx = ... # The index of the contour that surrounds your object
mask = np.zeros_like(img) # Create mask where white is what we want, black otherwise
cv2.drawContours(mask, contours, idx, 255, -1) # Draw filled contour in mask
out = np.zeros_like(img) # Extract out the object and place into output image
out[mask == 255] = img[mask == 255]
# Now crop
(y, x) = np.where(mask == 255)
(topy, topx) = (np.min(y), np.min(x))
(bottomy, bottomx) = (np.max(y), np.max(x))
out = out[topy:bottomy+1, topx:bottomx+1]
# Show the output image
cv2.imshow('Output', out)
cv2.waitKey(0)
cv2.destroyAllWindows()
The cropping code works such that when we define the mask to extract out the area defined by the contour, we additionally find the smallest horizontal and vertical coordinates, which define the top left corner of the contour. We similarly find the largest horizontal and vertical coordinates, which define the bottom right corner of the contour. We then use indexing with these coordinates to crop what we actually need. Note that this performs cropping on the masked image, that is, the image where everything but the information contained within the selected contour has been removed.
Note with OpenCV 3.x
It should be noted that the above code assumes you are using OpenCV 2.4.x. Take note that in OpenCV 3.x, the definition of cv2.findContours has changed. Specifically, the output is a three-element tuple where the first element is the source image, while the other two elements are the same as in OpenCV 2.4.x. Therefore, simply change the cv2.findContours statement in the above code to ignore the first output:
_, contours, _ = cv2.findContours(...) # Your call to find contours
Here's another approach to crop out a rectangular ROI. The main idea is to obtain a binary image of the retina with Otsu's threshold, find contours, and then extract the ROI using Numpy slicing. Assuming you have an input image like this:
Extracted ROI
import cv2
# Load image, convert to grayscale, and threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)[1]
# Find contour and sort by contour area
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
# Find bounding box and extract ROI
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    ROI = image[y:y+h, x:x+w]
    break
cv2.imshow('ROI',ROI)
cv2.imwrite('ROI.png',ROI)
cv2.waitKey()
This is a pretty simple way. Mask the image with transparency.
Read the image
Make a grayscale version.
Otsu Threshold
Apply morphology open and close to thresholded image as a mask
Put the mask into the alpha channel of the input
Save the output
Input
Code
import cv2
import numpy as np
# load image and convert to grayscale
img = cv2.imread('retina.jpeg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold input image using otsu thresholding as mask and refine with morphology
ret, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
kernel = np.ones((9,9), np.uint8)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
# put mask into alpha channel of result
result = img.copy()
result = cv2.cvtColor(result, cv2.COLOR_BGR2BGRA)
result[:, :, 3] = mask
# save resulting masked image
cv2.imwrite('retina_masked.png', result)
Output