How to improve the accuracy of pytesseract? - python

I started an OCR project a few days ago. The input image is a very noisy grayscale image with white letters. With the EAST text detector I can detect the text and draw bounding boxes around it.
After that I crop each rectangle and do some image processing, then pass the processed parts to pytesseract, but the results are bad. Images and source code are below. Maybe someone has a good idea for better image processing and/or pytesseract settings.
Images
Input image
Rectangles after Recognition
First part
Second part
Third part
Tesseract Result
AY U N74 O54
Sourcecode for image processing
kernel = cv2.getStructuringElement(cv2.MORPH_RECT , (8,8))
kernel2 = np.ones((3,3),np.uint8)
kernel3 = np.ones((5,5),np.uint8)
gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)
gray = cv2.resize(gray, None, fx=7, fy=7)
gray = cv2.GaussianBlur(gray, (5,5), 1)
#cv2.medianBlur(gray, 5)
gray = cv2.dilate(gray, kernel3, iterations = 1)
gray = cv2.erode(gray, kernel3, iterations = 1)
gray = cv2.morphologyEx(gray, cv2.MORPH_DILATE, kernel3)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
gray = cv2.bitwise_not(gray)
ts_img = Image.fromarray(gray)
txt = pytesseract.image_to_string(ts_img, config='--oem 3 --psm 12 -c tessedit_char_whitelist=12345678ABCDEFGHIJKLMNOPQRSTUVWXYZ load_system_dawg=false load_freq_dawg=false')
I tried some other psm settings like psm 11, psm 8 and psm 6. The results differ, but they are also bad.
I guess the biggest problem is the black spots that are connected to the letters and digits, but I have no idea how to remove them.
I appreciate any help :)

OCR software will perform poorly when interpreting this text as a word or sentence because it's expecting real English words and not a random combination of characters. I'd recommend analyzing the text as individual characters. I solved the (example) problem by first determining which groups of labeled pixels (connected components of a thresholded image) are characters based on the size and location of the group. Then for each image portion containing a (single) character I use easyocr to obtain the character. I found that pytesseract performs poorly or not at all on single characters (even when setting --psm 10 and other arguments). The code below produces this result:
OCR out: 6UAE005X0721295
import cv2
import matplotlib.pyplot as plt
import numpy as np
import easyocr
reader = easyocr.Reader(["en"])
# Threshold image and determine connected components
img_bgr = cv2.imread("C5U3m.png")
img_gray = cv2.cvtColor(img_bgr[35:115, 30:], cv2.COLOR_BGR2GRAY)
ret, img_bin = cv2.threshold(img_gray, 195, 255, cv2.THRESH_BINARY_INV)
retval, labels = cv2.connectedComponents(255 - img_bin, np.zeros_like(img_bin), 8)
fig, axs = plt.subplots(4)
axs[0].imshow(img_gray, cmap="gray")
axs[0].set_title("grayscale")
axs[1].imshow(img_bin, cmap="gray")
axs[1].set_title("thresholded")
axs[2].imshow(labels, vmin=0, vmax=retval - 1, cmap="tab20b")
axs[2].set_title("connected components")
# Find and process individual characters
OCR_out = ""
all_img_chars = np.zeros((labels.shape[0], 0), dtype=np.uint8)
labels_xmin = [np.argwhere(labels == i)[:, 1].min() for i in range(0, retval)]
# Process the labels (connected components) from left to right
for i in np.argsort(labels_xmin):
    label_yx = np.argwhere(labels == i)
    label_ymin = label_yx[:, 0].min()
    label_ymax = label_yx[:, 0].max()
    label_xmin = label_yx[:, 1].min()
    label_xmax = label_yx[:, 1].max()
    # Characters are large blobs that don't border the top/bottom edge
    if label_yx.shape[0] > 250 and label_ymin > 0 and label_ymax < labels.shape[0]:
        img_char = img_bin[:, label_xmin - 3 : label_xmax + 3]
        all_img_chars = np.hstack((all_img_chars, img_char))
        # Use EasyOCR on single char (pytesseract performs poorly on single characters)
        OCR_out += reader.recognize(img_char, detail=0)[0]
axs[3].imshow(all_img_chars, cmap="gray")
axs[3].set_title("individual characters")
fig.show()
print("Truth: 6UAE005X0721295")
print("OCR out: " + OCR_out)
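The blob filter described in this answer (keep large components that do not touch the top or bottom edge, ordered left to right) can be sketched on its own with NumPy. This is a minimal sketch, not the answer's exact code: `select_character_labels` and `min_area` are hypothetical names, and the bottom-edge test uses `height - 1`, which is an assumption about the intent of the comment in the code above (a row index can never reach `labels.shape[0]`).

```python
import numpy as np

def select_character_labels(labels, min_area=250):
    """Return label ids, ordered left to right, that look like characters.

    `labels` is a 2-D label image such as the one cv2.connectedComponents
    returns. A component qualifies when it has more than `min_area` pixels
    and does not touch the top or bottom row of the image.
    """
    height = labels.shape[0]
    candidates = []
    for label in np.unique(labels):
        ys, xs = np.nonzero(labels == label)
        touches_edge = ys.min() == 0 or ys.max() == height - 1
        if ys.size > min_area and not touches_edge:
            candidates.append((int(xs.min()), int(label)))
    # Sorting by the leftmost column gives reading order for a single text line
    return [label for _, label in sorted(candidates)]

# Tiny synthetic label image: two "characters" and one blob touching the top edge
demo = np.zeros((10, 20), dtype=np.int32)
demo[2:8, 12:16] = 1   # character on the right
demo[3:7, 2:6] = 2     # character on the left
demo[0:5, 8:10] = 3    # noise blob touching the top edge
order = select_character_labels(demo, min_area=8)
print(order)
```

The background component is dropped by the same rule here because it touches both edges.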

Related

How to remove noise around numbers using OpenCV

I'm trying to use Tesseract OCR to get the readings from the images below, but I'm having trouble getting consistent results with the spotted background. I have the following configuration for pytesseract:
CONFIG = "--psm 6 -c tessedit_char_whitelist=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄabcdefghijklmnopqrstuvwxyzåäö.,-"
I have also tried the image pre-processing below, with some good results, but they are still not perfect:
blur = cv2.blur(img,(4,4))
(T, threshInv) = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
What I want is to consistently identify the numbers and the decimal separator. What image pre-processing could help in getting consistent results on images like the ones below?
That was a challenge, but I think I have an interesting approach: pattern matching.
If you zoom in, you realize that the pattern in the background only has 4 possible dots: a single full pixel, a double full pixel, and a double pixel with a medium left or right. So I grabbed these 4 patterns from the image with 17.160.000,00 and got to work. You could save these and load them again; I just grabbed them on the fly.
img = cv2.imread('C:/Users/***/17.jpg', cv2.IMREAD_GRAYSCALE)
pattern_1 = img[2:5,1:5]
pattern_2 = img[6:9,5:9]
pattern_3 = img[6:9,11:15]
pattern_4 = img[9:12,22:26]
# just to show it carries over to other pics ;)
img = cv2.imread('C:/Users/****/6.jpg', cv2.IMREAD_GRAYSCALE)
Actual Pattern Matching
Next we match all the patterns and threshold to find all occurrences. I used 0.7, but you can play around with it a little. These patterns take off some pixels on the side and only match a single pixel on the left, so we pad twice (once with an extra column) to hit both for the first 3 patterns. The last one is the single pixel, so it doesn't need it.
res_1 = cv2.matchTemplate(img,pattern_1,cv2.TM_CCOEFF_NORMED )
thresh_1 = cv2.threshold(res_1,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_1 = np.pad(thresh_1,((1,1),(1,2)),'constant')
pat_thresh_15 = np.pad(thresh_1,((1,1),(2,1)), 'constant')
res_2 = cv2.matchTemplate(img,pattern_2,cv2.TM_CCOEFF_NORMED )
thresh_2 = cv2.threshold(res_2,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_2 = np.pad(thresh_2,((1,1),(1,2)),'constant')
pat_thresh_25 = np.pad(thresh_2,((1,1),(2,1)), 'constant')
res_3 = cv2.matchTemplate(img,pattern_3,cv2.TM_CCOEFF_NORMED )
thresh_3 = cv2.threshold(res_3,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_3 = np.pad(thresh_3,((1,1),(1,2)),'constant')
pat_thresh_35 = np.pad(thresh_3,((1,1),(2,1)), 'constant')
res_4 = cv2.matchTemplate(img,pattern_4,cv2.TM_CCOEFF_NORMED )
thresh_4 = cv2.threshold(res_4,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_4 = np.pad(thresh_4,((1,1),(1,2)),'constant')
Editing the Image
Now the only thing left to do is remove all the matches from the image. Since we have a mostly white background, we just set them to 255 to blend in.
img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255
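The padding above shifts each thresholded match map so that it lines up with the image. A more general variant, sketched here as an alternative and not as the answer's method, whitens the whole template footprint for every match. `whiten_matches` is a hypothetical helper; it assumes the match map marks top-left corners, which is the layout of a thresholded `cv2.matchTemplate` result. Blanking the full footprint may also erase digit pixels that sit right next to a dot, so treat it as a starting point.

```python
import numpy as np

def whiten_matches(img, match_map, template_shape, fill=255):
    """Set the full template footprint to `fill` for every match.

    match_map[y, x] is truthy where the template's top-left corner matched.
    The input image is left untouched; a modified copy is returned.
    """
    th, tw = template_shape
    out = img.copy()
    for y, x in np.argwhere(match_map):
        out[y:y + th, x:x + tw] = fill
    return out

# Synthetic example: a 6x8 dark image and one match of a 3x4 template at (1, 2)
image = np.zeros((6, 8), dtype=np.uint8)
matches = np.zeros((6 - 3 + 1, 8 - 4 + 1), dtype=bool)
matches[1, 2] = True
cleaned = whiten_matches(image, matches, (3, 4))
print(cleaned)
```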
Output
Edit:
Take a look at Abstract's answer as well for refining this output and fine-tuning tesseract.
You may find a solution using a slightly more complex approach by filtering in the frequency domain instead of the spatial domain. The thresholds might require some tweaking depending on how tesseract performs with the output images.
Implementation:
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('C:\\Test\\number.jpg', cv2.IMREAD_GRAYSCALE)
# Perform 2D FFT
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)
magnitude_spectrum = 20*np.log(np.abs(fshift))
# Squash all of the frequency magnitudes above a threshold
for idx, x in np.ndenumerate(magnitude_spectrum):
    if x > 195:
        fshift[idx] = 0
# Inverse FFT back into the real-spatial-domain
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.real(img_back)
img_back = cv2.normalize(img_back, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
out_img = np.copy(img)
# Use the inverted FFT image to keep only the black values below a threshold
for idx, x in np.ndenumerate(img_back):
    if x < 100:
        out_img[idx] = 0
    else:
        out_img[idx] = 255
plt.subplot(131),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img_back, cmap = 'gray')
plt.title('Reversed FFT'), plt.xticks([]), plt.yticks([])
plt.subplot(133),plt.imshow(out_img, cmap = 'gray')
plt.title('Output'), plt.xticks([]), plt.yticks([])
plt.show()
Output:
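The two `np.ndenumerate` loops in the FFT implementation can be replaced by boolean masks, which is usually much faster on large images. The following is a minimal NumPy-only sketch of the same steps. `fft_suppress` and its parameter names are hypothetical, the small epsilon inside the logarithm and the guard for a constant image are additions that are not in the original, and min-max scaling is done with NumPy where the original uses `cv2.normalize`.

```python
import numpy as np

def fft_suppress(img, magnitude_thresh=195.0, dark_thresh=100.0):
    """Zero strong frequency components, then binarize the reconstruction."""
    f_shift = np.fft.fftshift(np.fft.fft2(img))
    # Epsilon avoids log(0) for empty frequency bins (not in the original code)
    magnitude = 20 * np.log(np.abs(f_shift) + 1e-12)
    f_shift[magnitude > magnitude_thresh] = 0
    back = np.real(np.fft.ifft2(np.fft.ifftshift(f_shift)))
    # Min-max scale to 0..255
    span = back.max() - back.min()
    if span == 0:
        normalized = np.zeros_like(back)
    else:
        normalized = (back - back.min()) / span * 255.0
    # Pixels that stay dark after filtering are kept as black, the rest become white
    return np.where(normalized < dark_thresh, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(16, 16)).astype(np.uint8)
result = fft_suppress(noisy)
print(result.shape, result.dtype)
```

As in the loop version, both thresholds will probably need tuning per image set.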
Median Blur Implementation:
import cv2
import numpy as np
img = cv2.imread('C:\\Test\\number.jpg', cv2.IMREAD_GRAYSCALE)
blur = cv2.medianBlur(img, 3)
for idx, x in np.ndenumerate(blur):
    if x < 20:
        blur[idx] = 0
cv2.imshow("Test", blur)
cv2.waitKey()
Output:
Final Edit:
So using Eumel's solution and combining this bit of code on the bottom of it yields a 100% successful result:
img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255
# Eumel's code above this line
img = cv2.erode(img, np.ones((3,3)))
cv2.imwrite("out.png", img)
cv2.imshow("Test", img)
print(pytesseract.image_to_string(Image.open("out.png"), lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789.,'))
Output Image Examples:
Whitelisting the tesseract characters appears to help quite a bit as well to prevent false identification.
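When several Tesseract variables are set alongside the whitelist, each one needs its own `-c` flag on the command line. A small helper can assemble the config string that pytesseract passes through. This is only a sketch; `build_tesseract_config` is a hypothetical name and not part of pytesseract.

```python
def build_tesseract_config(psm, oem=3, **variables):
    """Build a Tesseract config string with one -c flag per variable."""
    parts = [f"--psm {psm}", f"--oem {oem}"]
    for name, value in variables.items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)

digits_config = build_tesseract_config(10, tessedit_char_whitelist="0123456789.,")
print(digits_config)
# --psm 10 --oem 3 -c tessedit_char_whitelist=0123456789.,
```

The resulting string can be passed as `config=digits_config` to `pytesseract.image_to_string`.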

Robust Algorithm to detect uneven illumination in images [Detection Only Needed]

One of the biggest challenges in Tesseract OCR text recognition is uneven illumination of images.
I need an algorithm that can decide whether an image contains uneven illumination or not.
Test Images
I attached an image with no illumination problems, a glare image (white-spotted image), and an image containing a shadow.
Given an image, the algorithm should assign it to one of two classes:
No uneven illumination - the image with no illumination problems falls into this category.
Uneven illumination - the glare image (white-spotted image) and the shadow image fall into this category.
No Illumination Image - Category A
Uneven Illumination Image (glare image (white-spotted image)) - Category B
Uneven Illumination Image (image containing a shadow) - Category B
Initial Approach
Change colour space to HSV
Histogram analysis of the value channel of HSV to identify the uneven illumination.
Instead of the first two steps, we can use the perceived brightness channel instead of the value channel of HSV
Set a low threshold value to get the number of pixels below the low threshold
Set a high threshold value to get the number of pixels above the high threshold
Use the percentage of low pixel values and the percentage of high pixel values to detect uneven lighting conditions (this needs a threshold for the percentages as well)
But I could not find big similarities between the uneven illumination images. With histogram analysis I just found that some pixels have low values and some pixels have high values.
Basically, my idea is to set a low threshold and count how many pixels fall below it, and to set a high threshold and count how many pixels are above it. Can we use those pixel counts to detect uneven lighting conditions in images? We would need to settle on the two threshold values and on the percentage of pixels required to reach a conclusion.
def show_hist_v(img_path):
    img = cv2.imread(img_path)
    hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv_img)
    # calcHist expects a list of images; 256 bins cover the values 0..255
    histr = cv2.calcHist([v], [0], None, [256], [0, 256])
    plt.plot(histr)
    plt.show()
    low_threshold = np.count_nonzero(v < 50)
    high_threshold = np.count_nonzero(v > 200)
    total_pixels = img.shape[0] * img.shape[1]
    percent_low = low_threshold / total_pixels * 100
    percent_high = high_threshold / total_pixels * 100
    # the format arguments follow the order of the placeholders
    print("Total Pixels - {}\n Pixels More than 200 - {} \n Pixels Less than 50 - {} \n Pixels percentage more than 200 - {} \n Pixels percentage less than 50 - {} \n".format(total_pixels, high_threshold, low_threshold, percent_high, percent_low))
    return total_pixels, high_threshold, low_threshold, percent_low, percent_high
So can someone improve my initial approach, or suggest a better approach, to detect uneven illumination in images for general cases?
Also, I tried perceived brightness instead of the value channel. Since the value channel takes the maximum of the (b, g, r) values, I think perceived brightness is a better choice.
def get_perceive_brightness(float_img):
    float_img = np.float64(float_img)  # uint8 would overflow when squared
    b, g, r = cv2.split(float_img)
    float_brightness = np.sqrt(
        (0.241 * (r ** 2)) + (0.691 * (g ** 2)) + (0.068 * (b ** 2)))
    brightness_channel = np.uint8(np.absolute(float_brightness))
    return brightness_channel
def show_hist_v(img_path):
    img = cv2.imread(img_path)
    v = get_perceive_brightness(img)
    # calcHist expects a list of images; 256 bins cover the values 0..255
    histr = cv2.calcHist([v], [0], None, [256], [0, 256])
    plt.plot(histr)
    plt.show()
    low_threshold = np.count_nonzero(v < 50)
    high_threshold = np.count_nonzero(v > 200)
    total_pixels = img.shape[0] * img.shape[1]
    percent_low = low_threshold / total_pixels * 100
    percent_high = high_threshold / total_pixels * 100
    # the format arguments follow the order of the placeholders
    print("Total Pixels - {}\n Pixels More than 200 - {} \n Pixels Less than 50 - {} \n Pixels percentage more than 200 - {} \n Pixels percentage less than 50 - {} \n".format(total_pixels, high_threshold, low_threshold, percent_high, percent_low))
    return total_pixels, high_threshold, low_threshold, percent_low, percent_high
Histogram analysis of perceived brightness channel
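To see why the two channels behave differently, it helps to compare them on a few saturated colours. The HSV value channel is the maximum of the three colour components, so pure blue, pure green and white all get the same value, while the weighted formula from the question separates them. This is a small NumPy-only sketch; `perceived_brightness` here mirrors the question's formula but returns floats instead of `uint8`.

```python
import numpy as np

def perceived_brightness(bgr):
    """Weighted brightness sqrt(0.241 R^2 + 0.691 G^2 + 0.068 B^2) for a BGR array."""
    b, g, r = np.moveaxis(np.asarray(bgr, dtype=np.float64), -1, 0)
    return np.sqrt(0.241 * r ** 2 + 0.691 * g ** 2 + 0.068 * b ** 2)

# One row of three BGR pixels: pure blue, pure green, white
pixels = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)

value_channel = pixels.max(axis=-1)        # what the HSV value channel holds
brightness = perceived_brightness(pixels)  # weighted by colour

print(value_channel)
print(np.round(brightness, 1))
```

Blue ends up far darker than green under the weighted formula, even though both have a value channel of 255.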
As Ahmet suggested.
def get_percentage_of_binary_pixels(img=None, img_path=None):
    if img is None:
        if img_path is not None:
            gray_img = cv2.imread(img_path, 0)
        else:
            return "No img or img_path"
    else:
        print(img.shape)
        if len(img.shape) > 2:
            gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        else:
            gray_img = img
    h, w = gray_img.shape
    gaussian_blur = cv2.GaussianBlur(gray_img, (5, 5), 0)
    thresh_value, otsu_img = cv2.threshold(gaussian_blur, 0, 255,
                                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Only save the binarized image when there is a path to name it after
    if img_path is not None:
        cv2.imwrite("binary/{}".format(img_path.split('/')[-1]), otsu_img)
    black_pixels = np.count_nonzero(otsu_img == 0)
    # white_pixels = np.count_nonzero(otsu_img == 255)
    black_pixels_percentage = black_pixels / (h * w) * 100
    # white_pixels_percentage = white_pixels / (h * w) * 100
    return black_pixels_percentage
When Otsu binarization yields more than 35% black pixels, we can detect uneven illumination in around 80 percent of the images. When the uneven illumination occurs in only a small region of the image, the detection fails.
Thanks in advance
I suggest using the division trick to separate text from the background, and then calculating statistics on the background only. After setting some reasonable thresholds, it is easy to create a classifier for the illumination.
def get_image_stats(img_path, lbl):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (25, 25), 0)
    no_text = gray * ((gray/blurred)>0.99) # select background only
    no_text[no_text<10] = no_text[no_text>20].mean() # convert black pixels to mean value
    no_bright = no_text.copy()
    no_bright[no_bright>220] = no_bright[no_bright<220].mean() # disregard bright pixels
    print(lbl)
    std = no_bright.std()
    print('STD:', std)
    bright = (no_text>220).sum()
    print('Brigth pixels:', bright)
    plt.figure()
    plt.hist(no_text.reshape(-1,1), 25)
    plt.title(lbl)
    if std>25:
        print("!!! Detected uneven illumination")
    if no_text.mean()<200 and bright>8000:
        print("!!! Detected glare")
This results in:
good_img
STD: 11.264569863071165
Brigth pixels: 58
glare_img
STD: 15.00149131296984
Brigth pixels: 15122
!!! Detected glare
uneven_img
STD: 57.99510339944441
Brigth pixels: 688
!!! Detected uneven illumination
Now let's analyze the histograms and apply some common sense. We expect the background to be even and have low variance, as is the case in "good_img". If the variance is high, the standard deviation is high too, which indicates uneven brightness. In the lower image you can see 3 (smaller) peaks that correspond to the 3 differently illuminated areas. The largest peak in the middle is the result of setting all black pixels to the mean value. I believe it is safe to classify images with STD above 25 as the "uneven illumination" case.
It is easy to spot a high number of bright pixels when there is glare (see image on the right). An image with glare looks like a good image apart from the hot spot. Setting the threshold of bright pixels to something like 8000 (1.5% of the total image size) should be good enough to detect such images. It is possible that the background is very bright everywhere, so if the mean of the no_text pixels is above 200, that is the case and there is no need to detect hot spots.
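The decision rules from this answer can be separated from the image statistics so that they return a label instead of printing. The sketch below uses the thresholds stated above (STD above 25, more than 8000 bright pixels, background mean below 200). `classify_illumination` is a hypothetical helper, and the background mean of 150 in the usage lines is a made-up value, because the printed results do not include it.

```python
def classify_illumination(std, bright_pixels, background_mean,
                          std_limit=25, bright_limit=8000, mean_limit=200):
    """Return a list of detected problems, or ["ok"] when none is found."""
    problems = []
    if std > std_limit:
        problems.append("uneven illumination")
    # Glare only counts when the background is not bright everywhere
    if background_mean < mean_limit and bright_pixels > bright_limit:
        problems.append("glare")
    return problems or ["ok"]

# STD and bright-pixel counts taken from the printed results; the mean is assumed
good = classify_illumination(11.26, 58, background_mean=150)
glare = classify_illumination(15.00, 15122, background_mean=150)
uneven = classify_illumination(57.99, 688, background_mean=150)
print(good, glare, uneven)
```

Since 8000 pixels is tied to one image size, expressing the bright-pixel limit as a fraction of the image area may carry over better to other resolutions.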
Why don't you remove the lighting effect from the images?
For instance:
If we read the image with pytesseract as it is, the output will be ' \n\f'
But if we remove the lighting effect:
import cv2
import pytesseract
img = cv2.imread('img2.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
smooth = cv2.GaussianBlur(gray, (95, 95), 0)
division = cv2.divide(gray, smooth, scale=192)
Reading the result with pytesseract, part of the output will be:
.
.
.
Dosage & use
See package insert for compicic
information,
Instruction:
Keep all medicines out of the re.
Read the instructions carefully
Storage:
Store at temperature below 30°C.
Protect from Heat, light & moisture. BATCH NO. : 014C003
MFG. DATE - 03-2019
—— EXP. DATE : 03-2021
GENIX Distributed
AS Exclusi i :
genx PHARMA PRIVATE LIMITED Cevoka Pv 2 A ‘<
» 45-B, Kore ci
Karachi-75190, | Pakisier al Pei yaa fans
www.genixpharma.com
Repeat for the last image:
Reading the result with pytesseract, part of the output will be:
.
.
.
Dosage & use
See package insert for complete prescribing
information. Rx Only
Instruction:
Keep all medicines out of the reach of children.
Read the instructions carefully before using.
Storage:
Store at temperature below 30°C. 5
Protect from Neat, light & moisture. BATCH NO, : 0140003
MFG. DATE : 03-2019
EXP. DATE : 03-2021
Manufactured by:
GENI N Exclusively Distributed by:
GENIX PHARMA PRIVATE LIMITED Ceyoka (Pvt) Ltd.
44, 45-B, Korangi Creek Road, 55, Negombe Road,
Karachi-75190, Pakistan. Peliyagoda, Snianka,
www. genixpharma.com
Update
You can find the illuminated part using erosion and dilation.
Result:
Code:
import cv2
import imutils
import numpy as np
from skimage import measure
from imutils import contours
img = cv2.imread('img2.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (95, 95), 0)
thresh = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY)[1]
thresh = cv2.erode(thresh, None, iterations=2)
thresh = cv2.dilate(thresh, None, iterations=4)
# connectivity=2 means 8-connectivity in 2D (the older neighbors=8 argument was removed from scikit-image)
labels = measure.label(thresh, connectivity=2, background=0)
mask = np.zeros(thresh.shape, dtype="uint8")
for label in np.unique(labels):
    if label == 0:
        continue
    labelMask = np.zeros(thresh.shape, dtype="uint8")
    labelMask[labels == label] = 255
    numPixels = cv2.countNonZero(labelMask)
    if numPixels > 300:
        mask = cv2.add(mask, labelMask)
cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                        cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
cnts = contours.sort_contours(cnts)[0]
for (i, c) in enumerate(cnts):
    (x, y, w, h) = cv2.boundingRect(c)
    ((cX, cY), radius) = cv2.minEnclosingCircle(c)
    cv2.circle(img, (int(cX), int(cY)), int(radius),
               (0, 0, 255), 3)
    cv2.putText(img, "#{}".format(i + 1), (x, y - 15),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
cv2.imshow("Image", img)
cv2.waitKey(0)
Though I only tested it with the second image. You may need to change the parameters for the other images.
Here is a quick solution in ImageMagick. But it can easily be implemented in Python/OpenCV as shown further down.
Use division normalization.
Read the input
Optionally convert to grayscale
Copy the image and blur it
Divide the original image by the blurred copy
Save the results
Input:
convert 8W0bp.jpg \( +clone -blur 0x13 \) +swap -compose divide -composite x1.png
convert ob87W.jpg \( +clone -blur 0x13 \) +swap -compose divide -composite x2.png
convert HLJuA.jpg \( +clone -blur 0x13 \) +swap -compose divide -composite x3.png
Results:
In Python/OpenCV:
import cv2
import numpy as np
import skimage.filters as filters
# read the image
img = cv2.imread('8W0bp.jpg')
#img = cv2.imread('ob87W.jpg')
#img = cv2.imread('HLJuA.jpg')
# convert to gray
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# blur
smooth = cv2.GaussianBlur(gray, (33,33), 0)
# divide gray by the blurred image
division = cv2.divide(gray, smooth, scale=255)
# sharpen using unsharp masking (grayscale input, so no channel argument is needed)
sharp = filters.unsharp_mask(division, radius=1.5, amount=2.5, preserve_range=False)
sharp = (255*sharp).clip(0,255).astype(np.uint8)
# save results
cv2.imwrite('8W0bp_division.jpg',division)
cv2.imwrite('8W0bp_division_sharp.jpg',sharp)
#cv2.imwrite('ob87W_division.jpg',division)
#cv2.imwrite('ob87W_division_sharp.jpg',sharp)
#cv2.imwrite('HLJuA_division.jpg',division)
#cv2.imwrite('HLJuA_division_sharp.jpg',sharp)
# show results
cv2.imshow('smooth', smooth)
cv2.imshow('division', division)
cv2.imshow('sharp', sharp)
cv2.waitKey(0)
cv2.destroyAllWindows()
Results:
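Why the division works: if a page is modelled as slowly varying illumination multiplied by the ink pattern, a heavy blur estimates the illumination, and dividing by it leaves roughly the ink pattern alone. The NumPy-only sketch below shows this on a synthetic brightness ramp. The box blur is a simple stand-in for `cv2.GaussianBlur`, and `box_blur` and `divide_normalize` are hypothetical helpers, not part of OpenCV.

```python
import numpy as np

def box_blur(img, k):
    """Mean filter with an odd k x k window and edge padding."""
    pad = k // 2
    kernel = np.ones(k) / k
    padded = np.pad(np.asarray(img, dtype=np.float64), pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def divide_normalize(gray, k=15, scale=255.0):
    """Divide the image by its blurred copy, then rescale and clip to uint8."""
    smooth = np.maximum(box_blur(gray, k), 1.0)  # avoid division by zero
    ratio = np.asarray(gray, dtype=np.float64) / smooth
    return np.clip(np.rint(ratio * scale), 0, 255).astype(np.uint8)

# Synthetic page: brightness ramps from 100 to 250 left to right, plus one dark "ink" pixel
illumination = np.tile(np.linspace(100, 250, 64), (32, 1))
page = illumination.copy()
page[16, 32] *= 0.2
gray = page.astype(np.uint8)

flat = divide_normalize(gray)
print(gray[5, 10], gray[5, 50])  # background brightness differs a lot before
print(flat[5, 10], flat[5, 50])  # and is nearly identical after
print(flat[16, 32])              # the ink pixel stays dark
```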
Here is my pipeline:
%matplotlib inline
import numpy as np
import cv2
from matplotlib import pyplot as plt
from scipy.signal import find_peaks
I use the functions:
def get_perceived_brightness(float_img):
    float_img = np.float64(float_img)  # uint8 would overflow when squared
    b, g, r = cv2.split(float_img)
    float_brightness = np.sqrt((0.241 * (r ** 2)) + (0.691 * (g ** 2)) + (0.068 * (b ** 2)))
    brightness_channel = np.uint8(np.absolute(float_brightness))
    return brightness_channel
# from: https://stackoverflow.com/questions/46300577/find-locale-minimum-in-histogram-1d-array-python
def smooth(x, window_len=11, window='hanning'):
    if x.ndim != 1:
        raise ValueError("smooth only accepts 1 dimension arrays.")
    if x.size < window_len:
        raise ValueError("Input vector needs to be bigger than window size.")
    if window_len < 3:
        return x
    if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']:
        raise ValueError("Window is one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'")
    # reflect the signal at both ends so the window fits everywhere
    s = np.r_[x[window_len-1:0:-1], x, x[-2:-window_len-1:-1]]
    if window == 'flat':  # moving average
        w = np.ones(window_len, 'd')
    else:
        # look up the NumPy window function by name instead of using eval
        w = getattr(np, window)(window_len)
    y = np.convolve(w/w.sum(), s, mode='valid')
    return y
I load the image
image_file_name = 'im3.jpg'
image = cv2.imread(image_file_name)
# image category
category = 0
# gray convertion
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
height = image.shape[0]
width = image.shape[1]
First test. Does the image have any big white spots?
# First test. Does the image have any big white spots?
saturation_thresh = 250
raw_saturation_region = cv2.threshold(image_gray, saturation_thresh, 255, cv2.THRESH_BINARY)[1]
num_raw_saturation_regions, raw_saturation_regions,stats, _ = cv2.connectedComponentsWithStats(raw_saturation_region)
# index 0 is the background -> to remove
area_raw_saturation_regions = stats[1:,4]
min_area_bad_spot = 1000 # this can be calculated as percentage of the image area
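# guard against images without any saturated region (np.max fails on an empty array)
if area_raw_saturation_regions.size > 0 and np.max(area_raw_saturation_regions) > min_area_bad_spot:
    category = 2 # there is at least one spot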
The result for the normal image:
The result for the image with spots:
The result for the image with shadows:
If the image passes the first test, I run the second test. Is the image dark?
# Second test. Is the image dark?
min_mean_intensity = 60
if category == 0:
    mean_intensity = np.mean(image_gray)
    if mean_intensity < min_mean_intensity:
        category = 3 # dark image
If the image also passes the second test, I run the third test. Is the image uniformly illuminated?
window_len = 15 # odd number
delay = int((window_len-1)/2) # delay is the shift introduced by the smoothing. It's half of window_len
# for example if the window_len is 15, the delay is 7
# in fact hist.shape = 256 and smoothed_hist.shape = 270 (= 256 + 2*delay)
if category == 0:
    perceived_brightness = get_perceived_brightness(image)
    hist, bins = np.histogram(perceived_brightness.ravel(), 256, [0, 256])
    # smoothed_hist is shifted from the original one
    smoothed_hist = smooth(hist, window_len)
    # smoothed histogram synchronized with the original histogram
    sync_smoothed_hist = smoothed_hist[delay:-delay]
    # if there is more than one peak with:
    #   20 < bin < 250
    #   prominence >= mean histogram value
    # the image could have shadows (but it could also have a background with some colors)
    mean_hist = int(height*width / 256)
    peaks, _ = find_peaks(sync_smoothed_hist, prominence=mean_hist)
    selected_peaks = peaks[(peaks > 20) & (peaks < 250)]
    if selected_peaks.size > 1:
        category = 4 # there are shadows
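The length and shift bookkeeping in the comments can be checked with a single artificial peak. The sketch below re-implements only the Hanning branch of the smoothing (reflect-pad, then convolve) so that it runs on its own; `smooth_hanning` is a hypothetical name for that reduced version.

```python
import numpy as np

def smooth_hanning(x, window_len=15):
    """Reflect-pad x and convolve it with a normalized Hanning window."""
    s = np.r_[x[window_len - 1:0:-1], x, x[-2:-window_len - 1:-1]]
    w = np.hanning(window_len)
    return np.convolve(w / w.sum(), s, mode="valid")

window_len = 15
delay = (window_len - 1) // 2

hist = np.zeros(256)
hist[100] = 1000.0  # a single artificial peak at bin 100

smoothed = smooth_hanning(hist, window_len)
synced = smoothed[delay:-delay]

print(len(smoothed), len(synced))                        # 270 256
print(int(np.argmax(smoothed)), int(np.argmax(synced)))  # 107 100
```

The peak moves by `delay` bins in the raw smoothed output and returns to bin 100 after slicing, which is why the peak indices found afterwards can be compared directly with brightness values.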
The histogram for the normal image:
The histogram for the image with spots:
The histogram for the image with shadows:
If the image passes all the tests, then it's normal
# all tests are passed. The image is ok
if category == 0:
    category = 1 # the image is ok

Reading numbers using PyTesseract

I am trying to read numbers from images and cannot find a way to get it to work consistently (not all images have numbers). These are the images:
(here is the link to the album in case the images are not working)
This is the command I'm using to run tesseract on the images: pytesseract.image_to_string(image, timeout=2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789'). I have tried multiple configurations, but this seems to work best.
As far as preprocessing goes, this works the best:
gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
im_bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)[1]
This works for all images except the 3rd one. To solve the problem of lines in the 3rd image, I tried getting the edges with cv2.Canny and a pretty large threshold. That works, but when I draw them back, tesseract does not read them correctly, even though more than 95% of each number's edges are captured.
I have also tried resizing the image, using cv2.morphologyEx, blurring it etc. I cannot find a way to get it to work for each case.
Thank you.
cv2.resize has consistently worked for me with INTER_CUBIC interpolation.
Adding this last step to pre-processing would most likely solve your problem.
im_bw_scaled = cv2.resize(im_bw, (0, 0), fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
You could play around with the scale. I have used '4' above.
EDIT:
The following code worked very well with your images, even with special characters. Please try it out with the rest of your dataset. Scaling, OTSU and erosion were the best combination.
import cv2
import numpy
import pytesseract
pytesseract.pytesseract.tesseract_cmd = "<path to tesseract.exe>"
# Page segmentation mode, PSM was changed to 6 since each page is a single uniform text block.
custom_config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
# load the image as grayscale
img = cv2.imread("5.png",cv2.IMREAD_GRAYSCALE)
# Change all pixels to black, if they aren't white already (since all characters were white)
img[img != 255] = 0
# Scale it 10x
scaled = cv2.resize(img, (0,0), fx=10, fy=10, interpolation = cv2.INTER_CUBIC)
# Retained your bilateral filter
filtered = cv2.bilateralFilter(scaled, 11, 17, 17)
# Thresholded OTSU method
thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Erode the image to bulk it up for tesseract
kernel = numpy.ones((5,5),numpy.uint8)
eroded = cv2.erode(thresh, kernel, iterations = 2)
pre_processed = eroded
# Feed the pre-processed image to tesseract and print the output.
ocr_text = pytesseract.image_to_string(pre_processed, config=custom_config)
if len(ocr_text) != 0:
    print(ocr_text)
else:
    print("No string detected")

image analysis (opencv or scikit image), deskewing of noisy scan

I have some old bank statements as scans and would like to use Google's Tesseract engine to extract the text. It works pretty well unless the image is slightly rotated. I thought of detecting the dashed lines in order to estimate the slope and then the angle of rotation. However, it is tricky to get the parameters right.
If I could get rid of the large line artefact, I might use the minimum rotated bounding box (cv2.minAreaRect) on the text characters.
Maybe another strategy is better suited? Any ideas?
An example image (deleted some characters for data protection):
EDIT: I have found a solution which seems to work. However, I am still wondering if there might be a faster solution (it takes about 1.5 seconds per image).
I use template matching from skimage with the following template:
import cv2
import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage.feature import match_template
from skimage.filters import threshold_mean
template = plt.imread('template_long.png')
template = rgb2gray(template)
template = template > threshold_mean(template)
for i in range(1):
    # read in image
    img = cv2.imread('conversion/umsatz_{}.png'.format(i))
    # convert to grayscale
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)
    # threshold the image, setting all foreground pixels to
    # 255 and all background pixels to 0
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    # edge detection
    #edges = cv2.Canny(thresh,2,100, apertureSize = 3)
    # fill the holes from detected edges
    #kernel = np.ones((2,2),np.uint8)
    #dilate = cv2.dilate(thresh, kernel, iterations=1)
    result = match_template(thresh, template)
    # keep only the strong matches as a binary map
    mask = result < 0.5
    r = result.copy()
    r[mask] = 0
    r[~mask] = 1
    plt.imshow(r)
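Once the matches along one dashed line are available as pixel coordinates, the rotation angle mentioned in the question follows from a straight-line fit. This is a minimal sketch under that assumption; `skew_angle_degrees` is a hypothetical helper, and the coordinates below are synthetic rather than taken from the scan. In practice they could come from something like `np.nonzero(r)`, restricted to the matches that belong to a single line.

```python
import numpy as np

def skew_angle_degrees(xs, ys):
    """Fit y = slope * x + intercept and return the line's angle in degrees."""
    slope, _intercept = np.polyfit(xs, ys, 1)
    return float(np.degrees(np.arctan(slope)))

# Hypothetical match positions along one dashed line with a slope of 0.05
xs = np.array([10, 110, 210, 310, 410], dtype=float)
ys = 50 + 0.05 * xs

angle = skew_angle_degrees(xs, ys)
print(round(angle, 2))  # 2.86
```

Image rows grow downwards, so the sign of the correction depends on the rotation convention of whatever function applies it.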

opencv python connectedComponents select component per label

I want to select each component of this image :
In practice, I want each and every triangle, selected by its label. I can't figure out how.
I have this code:
#!/usr/bin/python
import cv2
import numpy as np
img = cv2.imread('invMehs.png', -1)
imGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, imBw = cv2.threshold(imGray, 250, 255, cv2.THRESH_BINARY)
invBwMesh = cv2.bitwise_not(imBw)
Mask = np.ones(imBw.shape, dtype="uint8") * 255
connectivity = 4
output = cv2.connectedComponentsWithStats(imBw, connectivity, cv2.CV_32S)
num_labels = output[0]
labels = output[1]
stats = output[2]
centroids = output[3]
labels = labels + 1
b = ( labels == 1)
cv2.imwrite('tst.jpg',labels[b])
But the image is completely black :S
Thank you very much.
The image you want to save (labels[b]) only contains the thin lines (grey level 1). When saving an image in JPEG format, the compression algorithm smooths them, and since they differ from the background by only 1 grey level, they are erased. That's why you get a black image.
Saving in PNG format does not alter the image labels.
In order to keep all labels for each connected component (0 for the background), the code to write should be:
cv2.imwrite('labels.png',output[1])
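To pull out one component at a time, as the question asks, a mask can be built per label. Note that boolean indexing such as `labels[b]` returns a flat 1-D array of the selected values and not an image, so a full-size mask is needed instead. This is a minimal sketch with a tiny hand-made label array; `component_mask` is a hypothetical helper.

```python
import numpy as np

def component_mask(labels, label):
    """Return a uint8 image that is 255 inside the given component and 0 elsewhere."""
    return np.where(labels == label, 255, 0).astype(np.uint8)

# Small stand-in for the label image returned by cv2.connectedComponentsWithStats
labels = np.array([[0, 0, 1, 1],
                   [0, 2, 2, 1],
                   [0, 2, 0, 0]], dtype=np.int32)

mask = component_mask(labels, 2)
print(mask)
```

Each mask can then be written with `cv2.imwrite` to a PNG file, one file per label, looping over `range(1, num_labels)` to skip the background.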
