Slicing of a scanned image based on large white spaces - python

I am planning to split the questions from this PDF document. The challenge is that the questions are not evenly spaced: for example, the first question occupies an entire page, the second does the same, while the third and fourth together make up one page. If I had to slice it manually, it would take ages. So I thought I would convert it to images and work on those. Is there a way to take an image like this
and split it into individual components like this?

This is a classic situation for dilate. The idea is that adjacent text corresponds to the same question, while text that is farther away is part of another question. Whenever you want to connect multiple items together, you can dilate them to join adjacent contours into a single contour. Here's a simple approach:
Obtain binary image. Load the image, convert to grayscale, Gaussian blur, then Otsu's threshold to obtain a binary image.
Remove small noise and artifacts. We create a rectangular kernel and morph open to remove small noise and artifacts in the image.
Connect adjacent words together. We create a larger rectangular kernel and dilate to merge individual contours together.
Detect questions. From here we find contours, sort contours from top-to-bottom using imutils.sort_contours(), filter with a minimum contour area, obtain the bounding rectangle coordinates, and highlight each question's rectangle. We then crop each question using Numpy slicing and save the ROI image.
Otsu's threshold to obtain a binary image
Here's where the interesting section happens. We assume that adjacent text/characters are part of the same question so we merge individual words into a single contour. A question is a section of words that are close together so we dilate to connect them all together.
Individual questions highlighted in green
Top question
Bottom question
Saved ROI questions (assumption is from top-to-bottom)
Code
import cv2
from imutils import contours
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Remove small artifacts and noise with morph open
open_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, open_kernel, iterations=1)
# Create rectangular structuring element and dilate
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(opening, kernel, iterations=4)
# Find contours, sort from top to bottom, and extract each question
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
# Get bounding box of each question, crop ROI, and save
question_number = 0
for c in cnts:
    # Filter by area to ensure it's not noise
    area = cv2.contourArea(c)
    if area > 150:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
        question = original[y:y+h, x:x+w]
        cv2.imwrite('question_{}.png'.format(question_number), question)
        question_number += 1
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()

We may solve it using (mostly) morphological operations:
Read the input image as grayscale.
Apply thresholding with inversion; automatic thresholding using cv2.THRESH_OTSU works well.
Apply an opening morphological operation to remove small artifacts (using the kernel np.ones((1, 3))).
Dilate horizontally with a very long horizontal kernel to turn each text line into a solid horizontal line.
Apply closing vertically to create two large clusters.
The size of the vertical kernel should be tuned according to the typical gap.
Find connected components with statistics.
Iterate over the connected components and crop the relevant area in the vertical direction.
Complete code sample:
import cv2
import numpy as np
img = cv2.imread('scanned_image.png', cv2.IMREAD_GRAYSCALE) # Read image as grayscale
thesh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1] # Apply automatic thresholding with inversion.
thesh = cv2.morphologyEx(thesh, cv2.MORPH_OPEN, np.ones((1, 3), np.uint8)) # Apply opening morphological operation for removing small artifacts.
thesh = cv2.dilate(thesh, np.ones((1, img.shape[1]), np.uint8)) # Dilate horizontally - make horizontal lines out of the text.
thesh = cv2.morphologyEx(thesh, cv2.MORPH_CLOSE, np.ones((50, 1), np.uint8)) # Apply closing vertically - create two large clusters
nlabel, labels, stats, centroids = cv2.connectedComponentsWithStats(thesh, 4) # Finding connected components with statistics
parts_list = []
# Iterate connected components:
for i in range(1, nlabel):
    top = int(stats[i, cv2.CC_STAT_TOP]) # Get the topmost y coordinate of the connected component
    height = int(stats[i, cv2.CC_STAT_HEIGHT]) # Get the height of the connected component
    roi = img[top-5:top+height+5, :] # Crop the relevant part of the image (add 5 extra rows at top and bottom).
    parts_list.append(roi.copy()) # Add the cropped area to a list
    cv2.imwrite(f'part{i}.png', roi) # Save the image part for testing
    cv2.imshow(f'part{i}', roi) # Show part for testing
# Show image and thesh testing
cv2.imshow('img', img)
cv2.imshow('thesh', thesh)
cv2.waitKey()
cv2.destroyAllWindows()
Results:
Stage 1:
Stage 2:
Stage 3:
Stage 4:
Top area:
Bottom area:

Related

How to crop image to only text section with Python OpenCV?

I want to crop the image to only extract the text sections. There are thousands of them with different sizes so I can't hardcode coordinates. I'm trying to remove the unwanted lines on the left and on the bottom. How can I do this?
Original
Expected
Determine the smallest bounding box by finding all the non-zero points in the image, then crop the image using that box. Finding contours is slower and unnecessary here, especially because your text is axis-aligned. You can accomplish this by combining cv2.findNonZero and cv2.boundingRect.
Hope this works:
import numpy as np
import cv2
img = cv2.imread(r"W430Q.png")
# Read in the image and convert to grayscale
img = img[:-20, :-20] # Perform pre-cropping
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = 255*(gray < 50).astype(np.uint8) # To invert the text to white
gray = cv2.morphologyEx(gray, cv2.MORPH_OPEN, np.ones((2, 2), dtype=np.uint8)) # Perform noise filtering
coords = cv2.findNonZero(gray) # Find all non-zero points (text)
x, y, w, h = cv2.boundingRect(coords) # Find minimum spanning bounding box
# Crop the image - note we do this on the original image
rect = img[y:y+h, x:x+w]
cv2.imshow("Cropped", rect) # Show it
cv2.waitKey(0)
cv2.destroyAllWindows()
In the code above, the line gray = 255*(gray < 50).astype(np.uint8) is where I set a threshold below 50 to make the dark text white. Because the comparison produces a boolean image, I convert it to uint8 and then scale by 255; the text is effectively inverted.
Then, using cv2.findNonZero, we find all of the non-zero locations in this image. We pass those to cv2.boundingRect, which returns the top-left corner of the bounding box as well as its width and height. Finally, we can utilise this box to crop the image. Note that this is done on the original image, not the inverted version.
Here's a simple approach:
Obtain binary image. Load the image, grayscale, Gaussian blur, then Otsu's threshold to obtain a binary black/white image.
Remove horizontal lines. Since we're trying to only extract text, we remove horizontal lines to aid us in our next step so incorrect contours will not merge together.
Merge text into a single contour. The idea is that characters which are adjacent to each other are part of the wall of text. So we can dilate individual contours together to obtain a single contour to extract.
Find contours and extract ROI. We find contours, sort contours by area, then extract the largest contour ROI using Numpy slicing.
Here's the visualization of each step:
Binary image -> Removed horizontal lines in green
Dilate to combine into a single contour -> Detected ROI to extract in green
Result
Code
import cv2
import numpy as np
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, 0, -1)
# Dilate to merge into a single contour
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,30))
dilate = cv2.dilate(thresh, vertical_kernel, iterations=3)
# Find contours, sort for largest contour and extract ROI
cnts, _ = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:-1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 4)
    ROI = original[y:y+h, x:x+w]
    break
cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.imshow('thresh', thresh)
cv2.imshow('ROI', ROI)
cv2.waitKey()

How to enclose the irregular figure contour and fill it with 5px dots in opencv using python?

This is my input image:
I want to get this:
The problem is that I am not able to enclose the contour, and I am not sure how to add these dots. Does OpenCV have any such function to handle this?
So basically:
The first problem is how to enclose this figure.
Second, how to add the dots.
Thank you
Here is one way to do that in Python/OpenCV. However, I cannot close your dotted outline without connecting separate regions. But it will give you some idea how to proceed with most of what you want to do.
If you manually add a few more dots to your input image where there are large gaps, then the morphology kernel can be made smaller, such that it can connect the regions without merging separate parts that should remain isolated.
Read the input
Convert to grayscale
Threshold to binary
Apply morphology close to try to close the dotted outline. Unfortunately it connected separate regions.
Get the external contours
Draw white filled contours on a black background as a mask
Draw a single black circle on a white background
Tile out the circle image to the size of the input
Mask the tiled circle image with the filled contour image
Save results
Input:
import cv2
import numpy as np
import math
# read input image
img = cv2.imread('island.png')
hh, ww = img.shape[:2]
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)[1]
# use morphology to close figure
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (35,35))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# find external contours and draw them filled on a mask
mask = np.zeros_like(thresh)
contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
for cntr in contours:
    cv2.drawContours(mask, [cntr], 0, 255, -1)
# create a single tile as black circle on white background
circle = np.full((11,11), 255, dtype=np.uint8)
circle = cv2.circle(circle, (7,7), 3, 0, -1)
# tile out the tile pattern to the size of the input
numht = math.ceil(hh / 11)
numwd = math.ceil(ww / 11)
tiled_circle = np.tile(circle, (numht,numwd))
tiled_circle = tiled_circle[0:hh, 0:ww]
# composite tiled_circle with mask
result = cv2.bitwise_and(tiled_circle, tiled_circle, mask=mask)
# save result
cv2.imwrite("island_morph.jpg", morph)
cv2.imwrite("island_mask.jpg", mask)
cv2.imwrite("tiled_circle.jpg", tiled_circle)
cv2.imwrite("island_result.jpg", result)
# show images
cv2.imshow("morph", morph)
cv2.imshow("mask", mask)
cv2.imshow("tiled_circle", tiled_circle)
cv2.imshow("result", result)
cv2.waitKey(0)
Morphology connected image:
Contour Mask image:
Tiled circles:
Result:

How to remove border edge noise from an image using python?

I am trying to preprocess a photo of the eye vessels by removing the black border and extraneous non-eye features (e.g. the text and the "clip" shown below), replacing the black areas with the average pixel value from 3 random squares.
crop1 = randomCrop(image2, 50, 50) #Function that finds random 50x50 area
crop2 = randomCrop(image2, 50, 50)
crop3 = randomCrop(image2, 50, 50)
mean1 = RGB_Mean(crop1)
mean2 = RGB_Mean(crop2)
mean3 = RGB_Mean(crop3)
#RGB Mean
result = [statistics.mean(k) for k in zip(mean1, mean2, mean3)]
for i in range(len(image2[:, 0, 0])):
    for j in range(len(image2[0, :, 0])):
        thru_pixel = image2[i, j]
        if (thru_pixel[0] < 50 and thru_pixel[1] < 50 and thru_pixel[2] < 50):
            image2[i, j, :] = result
        if (thru_pixel[0] > 190 and thru_pixel[1] > 190 and thru_pixel[2] > 190):
            image2[i, j, :] = result
However, there is leftover noise around the border of the image, as well as leftover text and a clip at the bottom left.
Here's an example image.
Original :
and Post-Processing
You can see there's still left over black-gray border noise as well as text at the bottom right and a "clip" at the bottom left. Is there anything I could try to get rid of any of these artifacts while maintaining the integrity of the eye blood vessels?
Thank you for your time and help!
Assuming you want to isolate the eye blood vessels, here's an approach which can be broken down into two stages, one to remove the artifacts and another to isolate blood vessels
Convert image to grayscale
Otsu's threshold to obtain binary image
Perform morphological operations to remove artifacts
Adaptive threshold to isolate blood vessels
Find contours and filter using maximum threshold area
Bitwise-and to get final result
Beginning from your original image, we convert to grayscale and Otsu's threshold to obtain a binary image
Now we perform morph open to remove the artifacts (left). We inverse this mask to obtain the white border and then do a series of bitwise operations to get the removed artifacts image (right)
From here we adaptive threshold to get the veins
Note there is the unwanted border so we find contours and filter using a maximum threshold area. If a contour passes the filter, we draw it onto a blank mask
Finally we perform bitwise-and on the original image to get our result
Note: we could have performed an additional morph open after the adaptive threshold to remove the small specks of noise, but the tradeoff is that it would also remove vein detail. I'll leave this optional step up to you (a sketch of it appears after the code below).
import cv2
import numpy as np
# Grayscale, Otsu's threshold, opening
image = cv2.imread('1.png')
blank_mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,15))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=3)
inverse = 255 - opening
inverse = cv2.merge([inverse,inverse,inverse])
removed_artifacts = cv2.bitwise_and(image,image,mask=opening)
removed_artifacts = cv2.bitwise_or(removed_artifacts, inverse)
# Isolate blood vessels
veins_gray = cv2.cvtColor(removed_artifacts, cv2.COLOR_BGR2GRAY)
adaptive = cv2.adaptiveThreshold(veins_gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,3)
cnts = cv2.findContours(adaptive, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 5000:
        cv2.drawContours(blank_mask, [c], -1, (255,255,255), 1)
blank_mask = cv2.cvtColor(blank_mask, cv2.COLOR_BGR2GRAY)
final = cv2.bitwise_and(image, image, mask=blank_mask)
# final[blank_mask==0] = (255,255,255) # White version
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('removed_artifacts', removed_artifacts)
cv2.imshow('adaptive', adaptive)
cv2.imshow('blank_mask', blank_mask)
cv2.imshow('final', final)
cv2.waitKey()
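For reference, here is a hedged sketch of that optional cleanup (the kernel size is an assumption, tune it for your image); it would go right after the adaptiveThreshold line, before the contour filtering:
small_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2)) # assumed kernel size
adaptive = cv2.morphologyEx(adaptive, cv2.MORPH_OPEN, small_kernel, iterations=1) # removes small specks, but may also erase thin vein detail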

How to determine whether object is embossed or debossed using OpenCV?

Let's say I have an image with an embossed or debossed object, like this
or
Is there a way to determine that the above object is embossed and the below one is debossed using OpenCV? Preferably using C++, but Python is also fine. I couldn't find any good resource on the internet.
Here's an approach which takes advantage of the sunken and lifted contours of the embossed/debossed image. The main idea is:
Convert the image to grayscale
Perform a morphological transformation
Find outlines using Canny edge detection
Dilate canny image to merge individual contours into a single contour
Perform contour detection to find the ROI dimensions of top/bottom halves
Obtain ROI of top/bottom canny image
Count non-zero array elements for each half
Convert to grayscale and perform morphological transformation
Perform canny edge detection to find outlines. The key to determine if an object is embossed/debossed is to compare the canny edges. Here's the approach: We look at the object, if its upper half has more contour/lines/pixels than its lower half then it is debossed. Similarly, if the upper half has less pixels than its lower half then it is embossed.
Now that we have the Canny edges, we dilate the image until all the contours connect so we obtain one single object.
We then perform contour detection to obtain the ROI of the objects
From here, we separate each object into top and bottom sections
Now that we have the ROI of the top and bottom sections, we crop the ROI in the canny image
With each half, we count non-zero array elements using cv2.countNonZero(). For the embossed object, we get this
('top', 1085)
('bottom', 1899)
For the debossed object, we get this
('top', 979)
('bottom', 468)
Therefore, by comparing the values between the two halves: if the top half has fewer pixels than the bottom, it is embossed; if it has more, it is debossed. (A short sketch of this comparison follows the code below.)
import numpy as np
import cv2
original_image = cv2.imread("1.jpg")
image = original_image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
morph = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)
canny = cv2.Canny(morph, 130, 255, 1)
# Dilate canny image so contours connect and form a single contour
dilate = cv2.dilate(canny, kernel, iterations=4)
cv2.imshow("morph", morph)
cv2.imshow("canny", canny)
cv2.imshow("dilate", dilate)
# Find contours in the image
cnts = cv2.findContours(dilate.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
contours = []
# For each image separate it into top/bottom halfs
for c in cnts:
    # Obtain bounding rectangle for each contour
    x,y,w,h = cv2.boundingRect(c)
    # Draw bounding box rectangle
    cv2.rectangle(original_image,(x,y),(x+w,y+h),(0,255,0),3)
    # cv2.rectangle(original_image,(x,y),(x+w,y+h//2),(0,255,0),3) # top
    # cv2.rectangle(original_image,(x,y+h//2),(x+w,y+h),(0,255,0),3) # bottom
    top_half = ((x,y), (x+w, y+h//2))
    bottom_half = ((x,y+h//2), (x+w, y+h))
    # Collect top/bottom ROIs
    contours.append((top_half, bottom_half))
for index, c in enumerate(contours):
    top_half, bottom_half = c
    top_x1,top_y1 = top_half[0]
    top_x2,top_y2 = top_half[1]
    bottom_x1,bottom_y1 = bottom_half[0]
    bottom_x2,bottom_y2 = bottom_half[1]
    # Grab ROI of top/bottom section from canny image
    top_image = canny[top_y1:top_y2, top_x1:top_x2]
    bottom_image = canny[bottom_y1:bottom_y2, bottom_x1:bottom_x2]
    cv2.imshow('top_image', top_image)
    cv2.imshow('bottom_image', bottom_image)
    # Count non-zero array elements
    top_pixels = cv2.countNonZero(top_image)
    bottom_pixels = cv2.countNonZero(bottom_image)
    print('top', top_pixels)
    print('bottom', bottom_pixels)
cv2.imshow("detected", original_image)
print('contours detected: {}'.format(len(contours)))
cv2.waitKey(0)
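The code above stops at printing the counts. As a small hedged sketch (not part of the original answer), the comparison described earlier could be made explicit, assuming top_pixels and bottom_pixels come from the loop above and a single object is being classified:
if top_pixels < bottom_pixels:
    print('embossed')  # fewer edge pixels in the top half
else:
    print('debossed')  # more edge pixels in the top half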
One insight you could use is that an embossed object is usually brighter than a debossed object.
I would probably do an edge detection to find the "boss-edges" which should form a closed polygon, and compare the relative lightness value of the enclosed "bossment". Special care must be taken for objects with holes, e.g. the letter O, but it is do-able.
You can probably do more sophisticated processing if you know the light direction that is hitting the bossment. e.g. if you know the light is coming from top left, you can try only focusing on the top left edge pixels
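Here is a rough sketch of that brightness-comparison idea (my own illustration, under the assumption that a morphological close can close the boss-edges and that the largest contour encloses the object; '1.jpg' and the Canny/kernel values are placeholders):
import cv2
import numpy as np
gray = cv2.imread('1.jpg', cv2.IMREAD_GRAYSCALE) # placeholder filename
edges = cv2.Canny(gray, 50, 150) # find the "boss-edges"
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9,9))
closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel) # try to close the edge polygon
cnts = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
c = max(cnts, key=cv2.contourArea) # assume the largest contour encloses the "bossment"
inside = np.zeros(gray.shape, dtype=np.uint8)
cv2.drawContours(inside, [c], -1, 255, -1) # filled mask of the enclosed region
ring = cv2.subtract(cv2.dilate(inside, kernel, iterations=3), inside) # thin band just outside it
inside_mean = cv2.mean(gray, mask=inside)[0]
outside_mean = cv2.mean(gray, mask=ring)[0]
print('embossed' if inside_mean > outside_mean else 'debossed') # brighter inside -> likely embossed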

Python + OpenCV: OCR Image Segmentation

I am trying to do OCR from this toy example of Receipts. Using Python 2.7 and OpenCV 3.1.
Grayscale + Blur + External Edge Detection + Segmentation of each area in the receipt (for example "Category", to see later which one is marked, in this case cash).
I find it complicated, when the image is "skewed", to properly transform it and then "automatically" segment each section of the receipt.
Example:
Any suggestion?
The code below is an example that gets as far as the edge detection, but only when the receipt is like the first image. My issue is not the image-to-text step; it is the pre-processing of the image.
Any help more than appreciated! :)
import os;
os.chdir() # Put your own directory
import cv2
import numpy as np
image = cv2.imread("Rent-Receipt.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (5, 5), 0)
#blurred = cv2.bilateralFilter(gray,9,75,75)
# apply Canny Edge Detection
edged = cv2.Canny(blurred, 0, 20)
#Find external contour
(_,contours, _) = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
A great tutorial on the first step you described is available at pyimagesearch (and they have great tutorials in general)
In short, as described by Ella, you would have to use cv2.CHAIN_APPROX_SIMPLE. A slightly more robust method would be to use cv2.RETR_LIST instead of cv2.RETR_EXTERNAL and then sort the contours by area, as that should work decently even with white backgrounds, or if the page inscribes a bigger shape in the background, etc.
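For example, a rough sketch of that idea (my own illustration, not from the linked tutorial; it reuses the question's input file and Canny settings):
import cv2
img = cv2.imread('Rent-Receipt.jpg', cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 0)
edged = cv2.Canny(blurred, 0, 20)
cnts = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1] # handle OpenCV 3.x / 4.x return values
cnts = sorted(cnts, key=cv2.contourArea, reverse=True) # sort by area, largest first
page = cnts[0] # the largest contour is assumed to be the receipt outline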
Coming to the second part of your question, a good way to segment the characters would be to use the maximally stable extremal region (MSER) extractor available in OpenCV. A complete implementation in CPP is available here in a project I was helping out in recently. The Python implementation would go along the following lines (the code below works for OpenCV 3.0+; for the OpenCV 2.x syntax, look it up online):
import cv2
img = cv2.imread('test.jpg')
mser = cv2.MSER_create()
#Resize the image so that MSER can work better
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0))
cv2.namedWindow('img', 0)
cv2.imshow('img', vis)
while(cv2.waitKey()!=ord('q')):
    continue
cv2.destroyAllWindows()
This gives the output as
Now, to eliminate the false positives, you can simply cycle through the points in hulls, and calculate the perimeter (sum of distance between all adjacent points in hulls[i], where hulls[i] is a list of all points in one convexHull). If the perimeter is too large, classify it as not a character.
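A hedged sketch of that filtering step (my own illustration; MAX_PERIMETER is an assumed tuning constant, and hulls comes from the MSER code above):
MAX_PERIMETER = 200 # assumed threshold; adjust for your image scale
filtered_hulls = []
for hull in hulls:
    perimeter = cv2.arcLength(hull, True) # sum of distances between adjacent hull points
    if perimeter <= MAX_PERIMETER: # too large -> probably not a character
        filtered_hulls.append(hull)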
The diagonal lines across the image appear because the border of the image is black. That can simply be removed by adding the following line right after the image is read:
img = img[5:-5,5:-5,:]
which gives the output
The option off the top of my head requires extracting the 4 corners of the skewed image. This is done by using cv2.CHAIN_APPROX_SIMPLE instead of cv2.CHAIN_APPROX_NONE when finding contours. Afterwards, you could use cv2.approxPolyDP and hopefully remain with the 4 corners of the receipt (if all your images are like this one then there is no reason why it shouldn't work).
Now use cv2.findHomography and cv2.warpPerspective to rectify the image according to source points, which are the 4 points extracted from the skewed image, and destination points that should form a rectangle, for example the full image dimensions (a rough sketch is included after the links below).
Here you could find code samples and more information:
OpenCV-Geometric Transformations of Images
Also this answer may be useful - SO - Detect and fix text skew
EDIT: Corrected the second chain approx to cv2.CHAIN_APPROX_NONE.
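Here is a rough sketch of that rectification pipeline (my own illustration, assuming the first contour that reduces to 4 corners is the receipt; the file name, Canny settings, and corner ordering are simplifications, and a real implementation should sort the corners consistently before the homography):
import cv2
import numpy as np
img = cv2.imread('Rent-Receipt.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edged = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 0, 20)
cnts = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4: # assume this 4-corner contour is the receipt
        src = approx.reshape(4, 2).astype(np.float32)
        h, w = img.shape[:2]
        dst = np.float32([[0, 0], [0, h], [w, h], [w, 0]]) # destination rectangle (full image size, assumed corner order)
        H, _ = cv2.findHomography(src, dst)
        rectified = cv2.warpPerspective(img, H, (w, h))
        break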
Preprocessing the image by converting the desired text in the foreground to black while turning unwanted background to white can help to improve OCR accuracy. In addition, removing the horizontal and vertical lines can improve results. Here's the preprocessed image after removing unwanted noise such as the horizontal/vertical lines. Note the removed border and table lines
import cv2
# Load in image, convert to grayscale, and threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find and remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (35,2))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)
# Find and remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,35))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)
# Mask out unwanted areas for result
result = cv2.bitwise_and(image,image,mask=thresh)
result[thresh==0] = (255,255,255)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Try using the Stroke Width Transform. A Python 3 implementation of the algorithm is available in SWTloc.
EDIT : v2.0.0 onwards
Install the Library
pip install swtloc
Transform The Image
import swtloc as swt
import numpy as np
imgpath = 'images/path_to_image.jpeg'
swtl = swt.SWTLocalizer(image_paths=imgpath)
swtImgObj = swtl.swtimages[0]
# Perform SWT Transformation with numba engine
swt_mat = swtImgObj.transformImage(text_mode='lb_df', gaussian_blurr=False,
                                   minimum_stroke_width=3, maximum_stroke_width=12,
                                   maximum_angle_deviation=np.pi/2)
Localize Letters
localized_letters = swtImgObj.localizeLetters(minimum_pixels_per_cc=10,
                                              localize_by='min_bbox')
Localize Words
localized_words = swtImgObj.localizeWords(localize_by='bbox')
There are multiple parameters in the .transformImage, .localizeLetters and .localizeWords functions that you can play around with to get the desired results.
Full Disclosure : I am the author of this library
