Filling gaps in characters using cv2 - python

I have an image file with text which I want to extract using OCR.
But it has a diagonal overlapping line of grey text over it (top right).
I remove this line using:
image = cv2.imread(image_path)
image = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (5, 5), 0)
image = cv2.threshold(image, 100, 255, cv2.THRESH_BINARY)[1] # 100 here as the diagonal line is grey
The result removes the line. Notice the thick characters for "shear stress"; it is one of the regions where the diagonal line overlapped.
Now I apply OCR. However, the previous steps remove some pixels; for instance, the "e" in "edge dislocation" is not complete, which leads to poor output like "edve dislocation". I tried erosion and dilation, but with no significant improvement.
Is there any way to fill up the holes in characters?
Is there any way to reduce thickness of the characters which overlap with the diagonal line?

In 8-bit grayscale images, intensities run from 0 (black) to 2^8 - 1 = 255 (white).
So one thing you can try (I'm also not sure on this):
img = cv2.imread(image_path,0)
new_img = img.copy()
new_img[new_img <= 230] = 0  # push everything that is not near-white to pure black; try values between 150 and 230
Then use OCR to check whether it really worked.
(Apply this to the result you get after removing the overlapping line.)
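If the gaps remain after that, a morphological closing is another thing worth trying (again, just a sketch, not tested on your image; 'binarized.png' is a hypothetical stand-in for your thresholded result). Closing the inverted image bridges small breaks in the strokes, and a mild erosion afterwards can thin the characters that the diagonal line thickened:
import cv2
import numpy as np

# Sketch only: 'binarized.png' stands in for the thresholded image
# from the question (black text on a white background).
image = cv2.imread('binarized.png', cv2.IMREAD_GRAYSCALE)

# Invert so the text is the (white) foreground, then close small gaps.
inv = cv2.bitwise_not(image)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
closed = cv2.morphologyEx(inv, cv2.MORPH_CLOSE, kernel)

# Optional: a mild erosion to thin the strokes the line thickened.
thinned = cv2.erode(closed, kernel, iterations=1)

result = cv2.bitwise_not(thinned)
cv2.imwrite('filled.png', result)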

Related

The mask I am creating is clipping the image I am trying to paste over it

I am trying to paste an image (noise) on top of a background image (back_eq).
The problem is that when applying the mask (mask = np.uint8(alpha/255)) the mask gets clipped.
The white shape in the original alpha image should end up on the background, but the pasted result is clipped, with holes in it.
The problem goes away when, instead of normalizing by 255, we use a smaller value such as 245 or 240 (mask = np.uint8(alpha/240)).
The problem is that 255 is the correct normalization. Any suggestion on how to fix the mask while keeping a correct normalization?
import numpy as np
import cv2
import matplotlib.pyplot as plt
noise = cv2.imread("3_noisy.jpg")
noise = cv2.resize(noise,(300,300), interpolation = cv2.INTER_LINEAR)
alpha = cv2.imread("3_alpha.jpg")
alpha = cv2.resize(alpha,(300,300), interpolation = cv2.INTER_LINEAR)
back_eq = cv2.imread('Results/back_eq.jpg')
back_eq_crop = cv2.imread('Results/back_eq_crop.jpg')
im_3_tone = cv2.imread('Results/im_3_tone.jpg')
final = back_eq.copy()
back_eq_h, back_eq_w, _ = back_eq.shape
noisy_h, noisy_w,_ = noise.shape
l1 = back_eq_h//2 - noisy_h//2
l2 = back_eq_h//2 + noisy_h//2
l3 = back_eq_w//2 - noisy_w//2
l4 = back_eq_w//2 + noisy_w//2
print(alpha.shape)
# normalizing the values
mask = np.uint8(alpha/255)
# masking back_eq_crop
masked_back_eq_crop = cv2.multiply(back_eq_crop,(1-mask))
cv2.imshow('as',masked_back_eq_crop)
cv2.waitKey(0)
cv2.destroyAllWindows()
# creating the masked region
mask_to_add = cv2.multiply(im_3_tone, mask)
cv2.imshow('as',mask_to_add)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Combining
masked_image = cv2.add(masked_back_eq_crop, mask_to_add)
final[l1:l2, l3:l4] = masked_image
cv2.imshow('aa',masked_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
plt.figure()
plt.imshow(final[:, :, ::-1]);plt.axis("off");plt.title("Final Image")
plt.show()
retval=cv2.imwrite("Results/Final Image.jpg", final)
To use a binary mask threshold of 255, you have to have a properly prepared image, preferably one that is already binary. A threshold of 255 means only pure white (#FFFFFF) will stay white; even the lightest grey will become black.
And in your case, well... the image has antialiasing (edges are softened), you're doing scaling in the code, and moreover, your white is not pure white. That is where the hole in the result comes from.
To show it, instead of just talking:
I loaded your mask in GIMP, picked the 'select by colour' tool, disabled antialiasing and turned the threshold down to 0, so that only pure white #FFFFFF gets selected, the same as in your code.
Aaaand we see the holes. The tail is pixelly already, same with the hair... and the hole in the face is there. The hole's colour is #FEFEFE (254), which becomes black with a threshold of 255.
The best threshold for such (pseudo) "black-and-white" images is actually near the middle (128). Antialiasing makes the edges blackish-grey or whitish-grey, with no middle greys, so a middle grey separates the two groups nicely. Your "visually white but not pure white" pixels (plus the similar blacks) fall into those groups as well. Even if you believe your image contains only pure black and pure white, loading it as colour or grayscale gives you 0 and 255 values anyway, so 128 will work. (I don't have access to my old code right now, but I believe I kept my thresholds around 200 when I played with images.)
tl;dr:
A threshold of 255 only keeps #FFFFFF white; it's never a good choice.
Your picture has a lot of "visually white but not #FFFFFF white" pixels.
There's nothing wrong with using a lower threshold, even around the middle of the range, for pseudo black-and-white images.
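In code, that could look like the following sketch against the question's variables, replacing the mask = np.uint8(alpha/255) line (the threshold value 127 is an assumption; anything mid-range should behave similarly):
# Threshold near the middle of the range instead of dividing by 255.
# Visually-white pixels such as #FEFEFE then survive, so the mask has no holes.
_, alpha_bin = cv2.threshold(alpha, 127, 255, cv2.THRESH_BINARY)
mask = np.uint8(alpha_bin / 255)  # clean 0/1 mask, 3-channel like alpha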

how to draw outlines of objects on an image to a separate image

I am working on a puzzle; my final task is to identify the edge type of a puzzle piece.
As shown in the image above, I have managed to rotate and crop out every edge of the piece at the same angle. My next step is to separate the edge line into a separate image, as shown in the image below,
then to fill up one side of the line with a color and process it to decide what type of edge it is.
I don't see a proper way to separate the edge line from the image for now.
My approach:
One way is to scan pixel by pixel and find the black pixels that have a non-black pixel next to them. This is code I can implement, but it feels like a primitive and time-consuming approach.
So if you can offer any help or ideas, or a completely different way to detect the hollows and humps, thanks in advance.
First convert your color image to grayscale. Then apply a threshold, say zero to obtain a binary image. You may have to use morphological operations to further process the binary image if there are holes. Then find the contours of this image and draw them to a new image.
Simple example code is given below, using OpenCV 4.0.1 in Python 2.7.
import cv2
import numpy as np

bgr = cv2.imread('puzzle.png')
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
_, roi = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)
cv2.imwrite('/home/dhanushka/stack/roi.png', roi)
cont = cv2.findContours(roi, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
output = np.zeros(gray.shape, dtype=np.uint8)
cv2.drawContours(output, cont[0], -1, (255, 255, 255))
# removing boundary
boundary = 255*np.ones(gray.shape, dtype=np.uint8)
boundary[1:boundary.shape[0]-1, 1:boundary.shape[1]-1] = 0
toremove = output & boundary
output = output ^ toremove
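For the next step the question mentions (filling one side of the edge line), a flood fill from an image corner is one option. A sketch, assuming output from above (white edge line on a black background); which side gets filled depends on which corner the seed point is in:
# Fill the region connected to the top-left corner with white;
# cv2.floodFill requires a mask 2 pixels larger than the image.
filled = output.copy()
h, w = filled.shape
ff_mask = np.zeros((h + 2, w + 2), dtype=np.uint8)
cv2.floodFill(filled, ff_mask, (0, 0), 255)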

Masking horizontal and vertical lines with Open CV

I'm trying to remove horizontal and vertical lines in this image in order to have more distinct text areas.
I'm using the below code, which follows this guide
import cv2
import math
import numpy as np

image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # `blurred` was undefined in the original snippet
thresh = cv2.adaptiveThreshold(
    blurred, 255,
    cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV,
    25,
    15
)
# Create the images that will use to extract the horizontal and vertical lines
horizontal = np.copy(thresh)
vertical = np.copy(thresh)
# Specify size on horizontal axis
cols = horizontal.shape[1]
horizontal_size = math.ceil(cols / 20)
# Create structure element for extracting horizontal lines through morphology operations
horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontal_size, 1))
# Apply morphology operations
horizontal = cv2.erode(horizontal, horizontalStructure)
horizontal = cv2.dilate(horizontal, horizontalStructure)
# Show extracted horizontal lines
cv2.imwrite("horizontal.jpg", horizontal)
# Specify size on vertical axis
rows = vertical.shape[0]
verticalsize = math.ceil(rows / 20)
# Create structure element for extracting vertical lines through morphology operations
verticalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (1, verticalsize))
# Apply morphology operations
vertical = cv2.erode(vertical, verticalStructure)
vertical = cv2.dilate(vertical, verticalStructure)
After this, I know I would need to isolate the lines and mask the original image with the white lines; however, I'm not really sure how to proceed.
Does anyone have any suggestions?
Jeru's answer already gives you what you want. But I wanted to add an alternative that is maybe a bit more general than what you have so far.
You are converting the color image to gray-value, then applying an adaptive threshold in an attempt to find lines. You filter this to get only the long horizontal and vertical lines, then use that mask to paint the original image white at those locations.
Here we look for all lines and remove them from the image by painting them with whatever the surrounding color is. This process does not involve thresholding at all; all morphological operations are applied to the channels of the color image.
Ideally we'd use color morphology, but implementations of that are rare. Mathematical morphology is based on maximum and minimum operations, and the maximum or minimum of a color triplet (i.e. a vector) is not well defined.
So instead we apply the following procedure to each of the three color channels independently. This should produce results that are good enough for this application:
Extract the red channel: take the input RGB image, and extract the first channel. This is a gray-value image. We'll call this image channel.
Apply a top-hat filter to detect the thin structures: the difference between a closing with a small structuring element (SE) applied to channel, and channel (a closing is a dilation followed by an erosion with the same SE; you're using this to find lines as well). We'll call this output thin: thin = closing(channel) - channel. This step is similar to your local thresholding, but no actual threshold is applied. The resulting intensities indicate how dark the lines are w.r.t. the background. If you add thin to channel, you'll fill in these thin structures. The size of the SE here determines what is considered "thin".
Filter out the short lines, to keep only the long ones: apply an opening with a long horizontal SE to thin, and an opening with a long vertical SE to thin, and take the maximum of the two results. We'll call this lines. Note that this is the same process you used to generate horizontal and vertical. Instead of adding them together as Jeru suggested, we take the maximum. This makes it so that output intensities still match the contrast in channel. (In mathematical morphology parlance, the supremum of openings is an opening.) The length of the SEs here determines what is long enough to be a line.
Fill in the lines in the original image channel: now simply add lines to channel. Write the result to the first channel of the output image.
Repeat the same process with the other two channels.
Using DIPlib this is quite a simple script:
import diplib as dip
input = dip.ImageReadTIFF('/home/cris/tmp/T4tbM.tif')
output = input.Copy()
for ii in range(0, 3):
    channel = output.TensorElement(ii)
    thin = dip.Closing(channel, dip.SE(5, 'rectangular')) - channel
    vertical = dip.Opening(thin, dip.SE([100, 1], 'rectangular'))
    horizontal = dip.Opening(thin, dip.SE([1, 100], 'rectangular'))
    lines = dip.Supremum(vertical, horizontal)
    channel += lines  # overwrites output image
Edit:
Increasing the size of the first SE (set to 5 above) enough to also remove the thicker gray bar in the middle of the example image causes part of the block containing the inverted text "POWERLIFTING" to be left in thin.
To filter out those parts as well, we can change the definition of thin as follows:
notthin = dip.Closing(channel, dip.SE(11, 'rectangular'), ["add max"])
notthin = dip.MorphologicalReconstruction(notthin, channel, 1, "erosion")
thin = notthin - channel
That is, instead of thin=closing(channel)-channel, we do thin=reconstruct(closing(channel))-channel. The reconstruction simply expands selected (not thin) structures so that where part of a structure was selected, now the full structure is selected. The only thing that is now in thin are lines that are not connected to thicker structures.
I've also added "add max" as a boundary condition; this causes the closing to expand the area outside the image with white, so that lines touching the image edge are still treated as lines.
To elaborate more, here is what to do:
First, add the resulting images of vertical and horizontal. This will give you an image containing both the horizontal and vertical lines. Since both the images are of type uint8 (unsigned 8-bit integer) adding them won't be a problem:
res = vertical + horizontal
Finally, mask the resulting image obtained above with the original 3-channel image. This can be accomplished using cv2.bitwise_and:
fin = cv2.bitwise_and(image, image, mask = cv2.bitwise_not(res))
A sample for removing horizontal lines.
Sample image:
import cv2
import numpy as np
img = cv2.imread("Image path", 0)
if len(img.shape) != 2:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
else:
    gray = img
gray = cv2.bitwise_not(gray)
bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY, 15, -2)
horizontal = np.copy(bw)
cols = horizontal.shape[1]
horizontal_size = cols // 30
horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontal_size, 1))
horizontal = cv2.erode(horizontal, horizontalStructure)
horizontal = cv2.dilate(horizontal, horizontalStructure)
cv2.imwrite("horizontal_lines_extracted.png", horizontal)
horizontal_inv = cv2.bitwise_not(horizontal)
cv2.imwrite("inverse_extracted.png", horizontal_inv)
masked_img = cv2.bitwise_and(gray, gray, mask=horizontal_inv)
masked_img_inv = cv2.bitwise_not(masked_img)
cv2.imwrite("masked_img.jpg", masked_img_inv)
=> horizontal_lines_extracted.png:
=> inverse_extracted.png
=> masked_img.png(resultant image after masking)
Do you want something like this?
image = cv2.imread('image.jpg', cv2.IMREAD_UNCHANGED)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, binary = cv2.threshold(gray, 170, 255, cv2.THRESH_BINARY)  # optionally | cv2.THRESH_OTSU
V = cv2.Sobel(binary, cv2.CV_8U, dx=1, dy=0)
H = cv2.Sobel(binary, cv2.CV_8U, dx=0, dy=1)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
V = cv2.morphologyEx(V, cv2.MORPH_DILATE, kernel, iterations = 2)
H = cv2.morphologyEx(H, cv2.MORPH_DILATE, kernel, iterations = 2)
rows,cols = image.shape[:2]
mask = np.zeros(image.shape[:2], dtype=np.uint8)
# findContours returns (image, contours, hierarchy) on OpenCV 3.x; on 4.x use index [0]
contours = cv2.findContours(V, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[1]
for cnt in contours:
    (x, y, w, h) = cv2.boundingRect(cnt)
    # manipulate these values to change accuracy
    if h > rows / 2 and w < 10:
        cv2.drawContours(mask, [cnt], -1, 255, -1)
contours = cv2.findContours(H, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[1]
for cnt in contours:
    (x, y, w, h) = cv2.boundingRect(cnt)
    # manipulate these values to change accuracy
    if w > cols / 2 and h < 10:
        cv2.drawContours(mask, [cnt], -1, 255, -1)
mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel, iterations=2)
image[mask == 255] = (255, 255, 255)
So I have found a solution by using part of Jeru's suggestion. Eventually I will need to keep processing the image in binary mode, so I figured I might keep it that way.
First, add the resulting images of vertical and horizontal. This will give you an image containing both the horizontal and vertical lines. Since both the images are of type uint8 (unsigned 8-bit integer) adding them won't be a problem:
res = vertical + horizontal
Then, subtract res from the original input image thresh, which was used to find the lines. This removes the white lines, and the result can then be used for other morphological transformations.
fin = thresh - res
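One caveat worth noting: thresh - res is a plain NumPy subtraction on uint8 arrays, so it wraps around wherever res happens to be brighter than thresh (which can happen where the dilation step grew a line slightly). If that shows up as speckle, a saturating subtraction avoids it:
fin = cv2.subtract(thresh, res)  # clamps at 0 instead of wrapping around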

Removing horizontal underlines

I am attempting to pull text from a few hundred JPGs that contain information on capital punishment records; the JPGs are hosted by the Texas Department of Criminal Justice (TDCJ). Below is an example snippet with personally identifiable information removed.
I've identified the underlines as the impediment to proper OCR: if I screenshot a sub-snippet and manually white-out the lines, the resulting OCR through pytesseract is very good. But with the underlines present, it's extremely poor.
How can I best remove these horizontal lines? What I have tried:
Started on OpenCV doc's walkthrough: Extract horizontal and vertical lines by using morphological operations. Got stuck pretty quickly, because I know zero C++.
Followed along with Removing Horizontal Lines in image - ended up with an illegible string.
Followed along with Removing long horizontal/vertical lines from edge image using OpenCV - wasn't able to get the intuition behind sizing the array of zeros here.
Tagging this question with c++ in the hope that someone could help translate Step 5 of the docs walkthrough to Python. I've tried a batch of transformations such as the Hough Line Transform, but I am feeling around in the dark within a library and area I have zero prior experience with.
import cv2
# Inverted grayscale
img = cv2.imread('rsnippet.jpg', cv2.IMREAD_GRAYSCALE)
img = cv2.bitwise_not(img)
# Transform inverted grayscale to binary
th = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY, 15, -2)
# An alternative; Not sure if `th` or `th2` is optimal here
th2 = cv2.threshold(img, 170, 255, cv2.THRESH_BINARY)[1]
# Create corresponding structure element for horizontal lines.
# Start by cloning th/th2.
horiz = th.copy()
r, c = horiz.shape
# Lost after here - not understanding intuition behind sizing/partitioning
All the answers so far seem to be using morphological operations. Here's something a bit different. This should give fairly good results if the lines are horizontal.
For this I use a part of your sample image shown below.
Load the image, convert it to gray scale and invert it.
import cv2
import numpy as np
import matplotlib.pyplot as plt
im = cv2.imread('sample.jpg')
gray = 255 - cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
Inverted gray-scale image:
If you scan a row in this inverted image, you'll see that its profile looks different depending on the presence or the absence of a line.
plt.figure(1)
plt.plot(gray[18, :] > 16, 'g-')
plt.axis([0, gray.shape[1], 0, 1.1])
plt.figure(2)
plt.plot(gray[36, :] > 16, 'r-')
plt.axis([0, gray.shape[1], 0, 1.1])
The profile in green is a row with no underline; red is for a row with an underline. If you take the average of each profile, you'll see that the red one has a higher average.
So, using this approach you can detect the underlines and remove them.
for row in range(gray.shape[0]):
    avg = np.average(gray[row, :] > 16)
    if avg > 0.9:
        cv2.line(im, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
        cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)
cv2.imshow("gray", 255 - gray)
cv2.imshow("im", im)
Here are the detected underlines in red, and the cleaned image.
tesseract output of the cleaned image:
Convthed as th(
shot once in the
she stepped fr<
brother-in-lawii
collect on life in
applied for man
to the scheme i|
The reason for using part of the image should be clear by now. Since personally identifiable information has been removed from the original image, the threshold wouldn't have worked there. But this should not be a problem when you apply it for processing. Sometimes you may have to adjust the thresholds (16, 0.9).
The result does not look very good with parts of the letters removed and some of the faint lines still remaining. Will update if I can improve it a bit more.
UPDATE:
Did some improvements: cleaned up and linked the missing parts of the letters. I've commented the code, so I believe the process is clear. You can also check the resulting intermediate images to see how it works. The results are a bit better.
tesseract output of the cleaned image:
Convicted as th(
shot once in the
she stepped fr<
brother-in-law. ‘
collect on life ix
applied for man
to the scheme i|
tesseract output of the cleaned image:
)r-hire of 29-year-old .
revolver in the garage ‘
red that the victim‘s h
{2000 to kill her. mum
250.000. Before the kil
If$| 50.000 each on bin
to police.
python code:
import cv2
import numpy as np
import matplotlib.pyplot as plt
im = cv2.imread('sample2.jpg')
gray = 255 - cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
# prepare a mask using Otsu threshold, then copy from original. this removes some noise
__, bw = cv2.threshold(cv2.dilate(gray, None), 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
gray = cv2.bitwise_and(gray, bw)
# make copy of the low-noise underlined image
grayu = gray.copy()
imcpy = im.copy()
# scan each row and remove lines
for row in range(gray.shape[0]):
    avg = np.average(gray[row, :] > 16)
    if avg > 0.9:
        cv2.line(im, (0, row), (gray.shape[1]-1, row), (0, 0, 255))
        cv2.line(gray, (0, row), (gray.shape[1]-1, row), (0, 0, 0), 1)
cont = gray.copy()
graycpy = gray.copy()
# after contour processing, the residual will contain small contours
residual = gray.copy()
# find contours
contours, hierarchy = cv2.findContours(cont, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
for i in range(len(contours)):
    # find the bounding box of the contour
    x, y, w, h = cv2.boundingRect(contours[i])
    if 10 < h:
        cv2.drawContours(im, contours, i, (0, 255, 0), -1)
        # if bounding-box height is higher than threshold, remove the contour from residual image
        cv2.drawContours(residual, contours, i, (0, 0, 0), -1)
    else:
        cv2.drawContours(im, contours, i, (255, 0, 0), -1)
        # if bounding-box height is less than or equal to threshold, remove the contour from gray image
        cv2.drawContours(gray, contours, i, (0, 0, 0), -1)
# now the residual only contains small contours. open it to remove thin lines
st = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
residual = cv2.morphologyEx(residual, cv2.MORPH_OPEN, st, iterations=1)
# prepare a mask for residual components
__, residual = cv2.threshold(residual, 0, 255, cv2.THRESH_BINARY)
cv2.imshow("gray", gray)
cv2.imshow("residual", residual)
# combine the residuals. we still need to link the residuals
combined = cv2.bitwise_or(cv2.bitwise_and(graycpy, residual), gray)
# link the residuals
st = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 7))
linked = cv2.morphologyEx(combined, cv2.MORPH_CLOSE, st, iterations=1)
cv2.imshow("linked", linked)
# prepare a mask from linked image
__, mask = cv2.threshold(linked, 0, 255, cv2.THRESH_BINARY)
# copy region from low-noise underlined image
clean = 255 - cv2.bitwise_and(grayu, mask)
cv2.imshow("clean", clean)
cv2.imshow("im", im)
One can try this.
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread('img_provided_by_op.jpg', 0)
img = cv2.bitwise_not(img)
# (1) clean up noises
kernel_clean = np.ones((2,2),np.uint8)
cleaned = cv2.erode(img, kernel_clean, iterations=1)
# (2) Extract lines
kernel_line = np.ones((1, 5), np.uint8)
clean_lines = cv2.erode(cleaned, kernel_line, iterations=6)
clean_lines = cv2.dilate(clean_lines, kernel_line, iterations=6)
# (3) Subtract lines
cleaned_img_without_lines = cleaned - clean_lines
cleaned_img_without_lines = cv2.bitwise_not(cleaned_img_without_lines)
plt.imshow(cleaned_img_without_lines)
plt.show()
cv2.imwrite('img_wanted.jpg', cleaned_img_without_lines)
Demo
The method is based on the answer by Zaw Lin. He/she identified lines in the image and just did subtraction to get rid of them. However, we cannot just subtract lines here because we have letters e, t, E, T, - containing lines as well! If we just subtract horizontal lines from the image, e will be nearly identical to c. - will be gone...
Q: How do we find lines?
To find lines, we can make use of the erode function. To make use of erode, we need to define a kernel. (You can think of a kernel as a window/shape that functions operate on.)
The kernel slides through the image (as in 2D convolution). A pixel in the original image (either 1 or 0) will be considered 1 only if all the pixels under the kernel are 1; otherwise it is eroded (made to zero). -- (Source)
To extract lines, we define a kernel, kernel_line, as np.ones((1, 5)), i.e. [1, 1, 1, 1, 1]. This kernel slides through the image and erodes (zeroes out) any pixel that has a 0 somewhere under the kernel.
More specifically, while the kernel is applied to one pixel, it will capture the two pixels to its left and two to its right.
[X X Y X X]
^
|
Applied to Y, `kernel_line` captures Y's neighbors. If any of them is
0, Y will be set to 0.
Horizontal lines are preserved under this kernel, while pixels that don't have horizontal neighbors disappear. This is how we capture lines with the following line.
clean_lines = cv2.erode(cleaned, kernel_line, iterations=6)
Q: How do we avoid extracting lines within e, E, t, T, and -?
We will combine erosion and dilation with the iterations parameter.
clean_lines = cv2.erode(cleaned, kernel_line, iterations=6)
You might have noticed the iterations=6 part. This parameter makes the flat parts in e, E, t, T, - disappear: as we apply the same operation multiple times, the boundary of these lines keeps shrinking. (Applying the same kernel, only the boundary parts will meet 0s and become 0 as a result.) We use this trick to make the lines in these characters disappear.
This, however, comes with a side effect that the long underline part that we want to get rid of also shrinks. We can grow it with dilate!
clean_lines = cv2.dilate(clean_lines, kernel_line, iterations=6)
Contrary to erosion, which shrinks an image, dilation makes it larger. While we still use the same kernel, kernel_line, if any part under the kernel is 1, the target pixel will be 1. Applying this, the boundary grows back. (The parts in e, E, t, T, - won't grow back if we pick the parameter carefully so that they disappear at the erosion step.)
With this additional trick, we can successfully get rid of the lines without hurting e, E, t, T, and -.
Since most of the lines to be detected in your source are long horizontal lines, this is similar to my other answer: Find single color, horizontal spaces in image.
This is the source image:
Here are my two main steps to remove the long horizontal line:
Do a morph-close with a long-line kernel on the gray image
kernel = np.ones((1,40), np.uint8)
morphed = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
Then the morphed image contains the long lines:
Invert the morphed image and add it to the source image:
dst = cv2.add(gray, (255-morphed))
This gives the image with the long lines removed:
Simple enough, right? Some small line segments remain, but I think they have little effect on OCR. Notice that almost all characters keep their original shape, except g, j, p, q, y, Q, which may look a little different. But modern OCR tools such as Tesseract (with LSTM technology) can deal with such simple confusion.
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
The complete code, saving the result as line_removed.png:
#!/usr/bin/python3
# 2018.01.21 16:33:42 CST
import cv2
import numpy as np
## Read
img = cv2.imread("img04.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
## (1) Create long line kernel, and do morph-close-op
kernel = np.ones((1,40), np.uint8)
morphed = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
cv2.imwrite("line_detected.png", morphed)
## (2) Invert the morphed image, and add to the source image:
dst = cv2.add(gray, (255-morphed))
cv2.imwrite("line_removed.png", dst)
Update # 2018.01.23 13:15:15 CST:
Tesseract is a powerful tool for OCR. Today I installed tesseract 4.0 and pytesseract, then ran OCR with pytesseract on my result line_removed.png.
import cv2
import pytesseract
img = cv2.imread("line_removed.png")
print(pytesseract.image_to_string(img, lang="eng"))
This is the result, which looks fine to me.
Convicted as the triggerman in the murder—for—hire of 29—year—old .
shot once in the head with a 357 Magnum revolver in the garage of her home at ..
she stepped from her car. Police discovered that the victim‘s husband,
brother—in—law, _ ______ paid _ $2,000 to kill her, apparently so .. _
collect on life insurance policies totaling $250,000. Before the killing, .
applied for additional life insurance policies of $150,000 each on himself and his wife
to the scheme in three different statements to police.
was
and
could
had also
. confessed
A few suggestions:
Given that you're starting with a JPEG, don't compound the loss. Save your intermediate files as PNGs. Tesseract copes with those just fine.
Scale the image 2x (using cv2.resize) before handing it to Tesseract (see the sketch after this list).
Try detecting and removing the black underline. (This question might help). Doing that while preserving descenders might be tricky.
Explore Tesseract's command-line options, of which there are many (and they're horribly documented, some requiring dives into the C++ source to try to understand them). It looks like ligatures are causing some grief. IIRC (it's been a while), there's a setting or two that might help.
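To make the first two suggestions concrete, a minimal sketch ('record.jpg' is a hypothetical filename):
import cv2
import pytesseract

img = cv2.imread('record.jpg', cv2.IMREAD_GRAYSCALE)
# Scale the image 2x before handing it to Tesseract.
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
# Save intermediates as PNG so the JPEG loss is not compounded.
cv2.imwrite('record_2x.png', img)
print(pytesseract.image_to_string(img, lang='eng'))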

image analysis (opencv or scikit image), deskewing of noisy scan

I have some old bank statements as scans and would like to use Google's Tesseract engine to extract the text. It works pretty well unless the image is slightly rotated. I thought of detecting the dashed lines in order to estimate the slope and then the angle of rotation. However, it is tricky to get the parameters right.
If I could get rid of the large line artefact, I might use the minimum rotated bounding box (cv2.minAreaRect) on the text characters.
Maybe another strategy is better suited? Any ideas?
An example image (deleted some characters for data protection):
EDIT: I have found a solution which seems to work. However, I am still wondering if there might be a faster solution (it takes about 1.5 seconds per image).
I use template matching from skimage with the following template:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage.filters import threshold_mean
from skimage.feature import match_template

template = plt.imread('template_long.png')
template = rgb2gray(template)
template = template > threshold_mean(template)

for i in range(1):
    # read in image
    img = cv2.imread('conversion/umsatz_{}.png'.format(i))
    # convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)
    # threshold the image, setting all foreground pixels to
    # 255 and all background pixels to 0
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # edge detection
    #edges = cv2.Canny(thresh, 2, 100, apertureSize=3)
    # fill the holes from detected edges
    #kernel = np.ones((2,2), np.uint8)
    #dilate = cv2.dilate(thresh, kernel, iterations=1)
    result = match_template(thresh, template)
    mask = result < 0.5
    r = result.copy()
    r[mask] = 0
    r[~mask] = 1
    plt.imshow(r)
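To go from the matches to a rotation estimate (the slope idea described at the top), a rough sketch could continue inside the loop as below. It assumes the strong matches lie along a single dashed line; if several lines match, their coordinates would need to be grouped by row first:
    # continue inside the loop: fit a line through the match coordinates,
    # take its slope as the skew angle, and rotate the image back
    ys, xs = np.where(r == 1)
    slope = np.polyfit(xs, ys, 1)[0]
    angle = np.degrees(np.arctan(slope))
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                              borderMode=cv2.BORDER_REPLICATE)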
