How to implement imbinarize in OpenCV - python

I developed script in Matlab which is analysing engraved text on a colour steal. I'm using range of morphological techniques to extract the text and read it with OCR. I need to implement it on Raspberry Pi therefore I decided to transfer my Matlab code into OpenCV (in python). I tried to transfer some methods and they work similarly but how do I implement imreconstruct and imbinarize (shown below) to OpenCV? (the challenge here is appropriate differentiate foreground and background).
Maybe I should try adding grabCut or getStructuringElement or morphologyEx or dilate? I tried them in range of combinations but have not found a perfect solution.
I will put the whole script for both if anyone could give me suggestions on how to generally improve this extraction and accuracy of OCR process I would greatly appreciate it.
Based on bin values of grey-scale image. I change some parameters in
those functions:
Matlab:
se = strel('disk', 300);
img = imtophat(img, se);
maker = imerode(img, strel('line',100,0)); %for whiter ones
maker = imerode(img, strel('line',85,0)); %for medium
maker = imerode(img, strel('line',5,0));
imgClear = imreconstruct(maker, img);
imgBlur = imgaussfilt(imgClear,1); %less blur for whiter frames
BW = imbinarize(imgBlur,'adaptive','ForegroundPolarity','Bright',...
'Sensitivity',0.7); %process for medium
BW = imbinarize(imgBlur, 'adaptive', 'ForegroundPolarity',...
'Dark', 'Sensitivity', 0.4); % process for black and white
res = ocr(BW, 'CharacterSet', '0123456789', 'TextLayout', 'Block');
res.Text;
OpenCv
kernel = numpy.ones((5,5),numpy.uint8)
blur = cv2.GaussianBlur(img,(5,5),0)
erosion = cv2.erode(blur,kernel,iterations = 1)
opening = cv2.morphologyEx(erosion, cv2.MORPH_OPEN, kernel)
#bremove = cv2.grabCut(opening,mask,rect,bgdModelmode==GC_INIT_WITH_MASK)
#th3 = cv2.adaptiveThreshold(opening,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU,11,2)
ret, thresh= cv2.threshold(opening,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
ocr = pytesseract.image_to_string(Image.open('image2.png'),config='stdout -c tessedit_char_whitelist=0123456789')
Here is the input image:

I am surprised at how much difference between matlab and opencv there is when they both appear to use the same algorithm. Why do you run imbinarize twice? What does the sensitivity keyword actually do (mathematically, behind the background). Because they obviously have several steps more than just the bare OTSU.
import cv2
import numpy as np
import matplotlib.pyplot as plt
def show(img):
plt.imshow(img, cmap="gray")
plt.show()
img = cv2.imread("letters.jpg", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3,3), np.uint8)
blur = cv2.GaussianBlur(img,(3,3), 0)
erosion = cv2.erode(blur, kernel, iterations=3)
opening = cv2.dilate(erosion, kernel)
th3 = cv2.adaptiveThreshold(opening, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 45, 2)
show(th3)
kernel2 = cv2.getGaussianKernel(6, 2) #np.ones((6,6))
kernel2 = np.outer(kernel2, kernel2)
th3 = cv2.dilate(th3, kernel2)
th3 = cv2.erode(th3, kernel)
show(th3)
The images that get displayed are:
After a bit of cleaning up:
So all in all not the same and certainly not as nice as matlab. But the basic principle seems the same, it's just that the numbers need playing with.
A better approach would probably be to do a threshold by the mean of the image and then use the output of that as a mask to adaptive threshold the original image. Hopefully then the results would be better than both opencv and matlab.
Try doing it with ADAPTIVE_THRESH_MEAN_C you can get some really nice results but there's more trash lying around. Again, maybe if you can use it as a mask to isolate the text and then do tresholding again it might turn out to be better. Also the shape of the erosion and dilation kernels will make a big difference here.

I worked out the code to have a positive result based on your engraved text sample.
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def show(img):
plt.imshow(img, cmap="gray")
plt.show()
# load the input image
img = cv2.imread('./imagesStackoverflow/engraved_text.jpg',0);
show(img)
ret, mask = cv2.threshold(img, 60, 120, cv2.THRESH_BINARY) # turn 60, 120 for the best OCR results
kernel = np.ones((5,3),np.uint8)
mask = cv2.erode(mask,kernel,iterations = 1)
show(mask)
# I used a version of OpenCV with Tesseract, you may use your pytesseract and set the modes as:
# OCR Enginer Mode (OEM) = 3 (defualt = 3)
# Page Segmentation mode (PSmode) = 11 (defualt = 3)
tesser = cv2.text.OCRTesseract_create('C:/Program Files/Tesseract 4.0.0/tessdata/','eng','0123456789',11,3)
retval = tesser.run(mask, 0) # return string type
print 'OCR:' + retval
Processed image and OCR output:
It would be great if you can feedback your test results with more sample images.
opencvpythontesseractocr

What I can see from your code is you have used tophat filtering in your Matlab code as the first step. However, I couldn't see the same in your python OpenCV code.
Python has built in tophat filter try applying that for getting similar result
kernel = np.ones((5,5),np.uint8)
tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
Also, try using CLAHE it gives better contrast to your image and then apply blackhat to filter out small details.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
cl1 = clahe.apply(img)
I have got better results by applying these transformations.

Tried below, it works to recognize the lighter engraved text sample. Hope it helps.
def show(img):
plt.imshow(img, cmap="gray")
plt.show()
# load the input image
img = cv2.imread('./imagesStackoverflow/engraved_text2.jpg',0);
show(img)
# apply CLAHE to adjust the contrast
clahe = cv2.createCLAHE(clipLimit=5.1, tileGridSize=(5,3))
cl1 = clahe.apply(img)
img = cl1.copy()
show(img)
img = cv2.GaussianBlur(img,(3,3), 1)
ret, mask = cv2.threshold(img, 125, 150, cv2.THRESH_BINARY) # turn 125, 150 for the best OCR results
kernel = np.ones((5,3),np.uint8)
mask = cv2.erode(mask,kernel,iterations = 1)
show(mask)
# I used a version of OpenCV with Tesseract, you may use your pytesseract and set the modes as:
# Page Segmentation mode (PSmode) = 11 (defualt = 3)
# OCR Enginer Mode (OEM) = 3 (defualt = 3)
tesser = cv2.text.OCRTesseract_create('C:/Program Files/Tesseract 4.0.0/tessdata/','eng','0123456789',11,3)
retval = tesser.run(mask, 0) # return string type
print 'OCR:' + retval

Related

python OCR recognize image into text

seems resolution of image effect the output is success or not
usually the image's resolution/quality from production line is like test image 1, instead of change camera quality, is there any way to make success rate higher? like improve code make simple AI to help detect or something? I need a hand thanks.
the demo .py code I found from tutorial
from PIL import Image
import pytesseract
img = Image.open('new_003.png')
text = pytesseract.image_to_string(img, lang='eng')
print("size")
print(img.size)
print(text)
(pic) test image 1: https://ibb.co/VLsM9LL
size
(122, 119)
# the output is:
R carac7
(pic) test image 2: https://ibb.co/XyRcf45
size
(329, 249)
# the output is:
R1 oun,
2A
R ca7ac2
(pic) test image 3: https://ibb.co/fNtDRc7
this one just for test but is the only one 100% correct
size
(640, 640)
# the output is:
BREAKING THE STATUE
i have always known
i just didn't understand
the inner conflictions
arresting our hands
gravitating close enough
expansive distamce between
i couldn't give you more
but i meant everything
when the day comes
you find your heart
wants something more
than a viece and a part
your life will change
like astatue set free
to walk among us
to created estiny
we didn't break any rules
we didn't make mistakes
making beauty in loving
making lovine for days
SHILOW
I tried to find out/proof the solution can only be the image resolution or there can be other alternative way to solve this issue
I try Dilation and Erosion to image, hoped can get more clear image for OCR recognize like link demo pic https://ibb.co/3pDgDnF
import cv2
import numpy as np
import matplotlib.pyplot as plt
import glob
from IPython.display import clear_output
def show_img(img, bigger=False):
if bigger:
plt.figure(figsize=(15,15))
image_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(image_rgb)
plt.show()
def sharpen(img, sigma=100):
# sigma = 5、15、25
blur_img = cv2.GaussianBlur(img, (0, 0), sigma)
usm = cv2.addWeighted(img, 1.5, blur_img, -0.5, 0)
return usm
def img_processing(img):
# do something here
img = sharpen(img)
return img
img = cv2.imread("/home/joy/桌面/all_pic_OCR/simple_pic/03.png")
cv2.imshow('03', img) # Original image
img2 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))
img = cv2.dilate(img, kernel) # tried Dilation
cv2.imshow('image_after_Dilation', img) # image after Dilation
img = cv2.erode(img, kernel) # tried Erosion
cv2.imshow('then_Erosion', img) # image after Erosion
cv2.waitKey(0)
cv2.destroyAllWindows()
result: https://ibb.co/TbZjg3d
so still trying to achieve python OCR recognize image into text with 99.9999% correct

Generate noiseless binary image and skeleton from image

I have a overlapping filaments image. I am interested in generating the noiseless binary image and subsequently use it for skeleton generation. I have tried different ways to get skeleton but not able to succeed. Below find the code written in python for the same and attached skeleton image generated through it. It would be great if anyone help in solving issue.
Original vs Skeleton image:
import cv2
import numpy as np
from skimage import morphology, graph
from skan import Skeleton
from skimage.morphology import skeletonize
import matplotlib.pyplot as plt
img00 = cv2.imread(r'img_test.jpg')
img01 = cv2.cvtColor(img00, cv2.COLOR_BGR2GRAY)
cv2.imshow('1',img01)
img02 = cv2.adaptiveThreshold(img01,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,5,5)
cv2.imshow('2',img02)
i_size = min(np.size(img02,1),600) # image size for imshow
kernel = np.ones((2, 2), np.uint8) # Creating kernel
# Using cv2.erode() method
img_erosion = cv2.erode(img02, kernel, borderType = cv2.BORDER_REFLECT, iterations=1, borderValue = 1)
filterSize =(5,5)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, filterSize)
tophat_img = cv2.morphologyEx(img_erosion, cv2.MORPH_BLACKHAT, kernel)
img03 = cv2.bitwise_not(tophat_img)
cv2.imshow('3',img03)
kernel1 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(2,2))
img04 = cv2.morphologyEx(img03, cv2.MORPH_CLOSE, kernel1)
img04 = cv2.morphologyEx(img04, cv2.MORPH_OPEN, kernel1)
cv2.imshow('4',img04)
thresh = (img04/255).astype(np.uint8)
# skeleton based on default method
skeleton1 = skeletonize(thresh)
skeleton2 = (skeleton1*255).astype(np.uint8)
cv2.imshow('5',skeleton2)
# Avg diameter calculation
diameter = np.sum(thresh)/np.sum(skeleton1)
print('diameter',diameter)
cv2.waitKey(0) & 0xFF == ord('q')
cv2.destroyAllWindows()
Straight binarization followed by morphological closing can give interesting results, though they are sensitive to parameters.

Opencv, Python - How to remove the gray pixels around the date text

I am trying to remove the grayish “noise” surrounding the dates using Python/OpenCV to help the OCR (Optical Character Recognition) to recognize the dates.
The original image looks like this: https://static.mothership.sg/1/2017/03/10-Feb-MC-1.jpg
The python script I tried looked as below. However, I have other similar images in which the contrast or lighting coditions varies.
import cv2
import numpy as np
img = cv2.imread("mc.jpeg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
alpha = 3.5
beta = -2
new = alpha * img + beta
new = np.clip(new, 0, 255).astype(np.uint8)
cv2.imwrite("cleaned.png", new)
I also tried Thresholding and/or adaptiveThresholding and some time, I was able to separate the dates from the grayish background. Sometimes it was very challenging. I wonder is there an automatic way to determine the threshold value ?
Below are example of what I hope to achieve.
Blurry Image:
Otsu's Binarization automatically calculates a threshold value from an image histogram.
# Otsu's thresholding after Gaussian filtering
blur = cv2.GaussianBlur(img,(5,5),0)
ret,Otsu = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
cv2.imwrite("Otsu's_thresholding", Otsu)
see this link
You can try to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant during most of the image. These are the steps for this method:
Apply a soft median blur filter to get rid of small noise
Get the model of the background via local maximum. Apply a very strong close operation, with a big structuring element (I’m using a rectangular kernel of size 15)
Perform gain adjustment by dividing 255 between each local maximum pixel. Weight this value with each input image pixel.
You should get a nice image where the background illumination is pretty much normalized, threshold this image to get a binary mask of the text
This is the code:
import numpy as np
import cv2
# image path
path = "C:/opencvImages/sheet01.jpg"
# Read an image in default mode:
inputImage = cv2.imread(path)
# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)
# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)
# Adjust image gain:
height, width, depth = localMax.shape
# Create output Mat:
outputImage = np.zeros(shape=[height, width, depth], dtype=np.uint8)
for i in range(0, height):
for j in range(0, width):
# Get current BGR pixels:
v1 = inputImage[i, j]
v2 = localMax[i, j]
# Gain adjust:
tempArray = []
for c in range(0, 3):
currentPixel = v2[c]
if currentPixel != 0:
gain = 255 / v2[c]
gain = v1[c] * gain
else:
gain = 0
# Gain set and clamp:
tempArray.append(np.clip(gain, 0, 255))
# Set pixel vec to out image:
outputImage[i, j] = tempArray
# Convert RGB to grayscale:
grayscaleImage = cv2.cvtColor(outputImage, cv2.COLOR_BGR2GRAY)
# Threshold:
threshValue = 110
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)
# Write image:
imageFilename = "C:/opencvImages/binaryMask2.png"
cv2.imwrite(imageFilename, binaryImage)
I get the following results testing the complete image:
And the cropped text:
Please note that the gain adjustment operations are not vectorized. The script is slow, mainly because I'm starting with Python and don’t know the proper Numpy syntax to speed-up this operation. I've been using C++ for a long time, so feel free to further improve the code.
Edit:
Please, be aware that your result can only be as good as the quality of your input. See your input and ask yourself "Is this a good input for an automated process?" (Automated processes are usually not very smart). The second picture you posted is very low quality. Not only is blurry but also is low res and has compression artifacts. All these factors will hinder automated processing.
With that said, here's an improvement you can include in the original:
Try to normalize brightness-contrast on the grayscale output:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))
Your grayscale image goes from this:
to this:
A little bit darker and improved on contrast. Let's try to compute the optimal threshold value automatically via Otsu thresholding:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
It gets you this:
However, we can adjust the result if we add bias to Otsu's threshold, like this:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
bias = 0.9
threshValue = bias * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)
That's the best quality you can get with these images using this method.
If you find these suggestions and tips useful, please, at least up-vote my answer.

Reading numbers using PyTesseract

I am trying to read numbers from images and cannot find a way to get it to work consistently (not all images have numbers). These are the images:
(here is the link to the album in case the images are not working)
This is the command I'm using to run tesseract on the images: pytesseract.image_to_string(image, timeout=2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789'). I have tried multiple configurations, but this seems to work best.
As far as preprocessing goes, this works the best:
gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
im_bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)[1]
This works for all images except the 3rd one. To solve the problem of lines in the 3rd image, i tried getting the edges with cv2.Canny and a pretty large threshold which works, but when drawing them back, even though it gets more than 95% of each number's edges, tesseract does not read them correctly.
I have also tried resizing the image, using cv2.morphologyEx, blurring it etc. I cannot find a way to get it to work for each case.
Thank you.
cv2.resize has consistently worked for me with INTER_CUBIC interpolation.
Adding this last step to pre-processing would most likely solve your problem.
im_bw_scaled = cv2.resize(im_bw, (0, 0), fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
You could play around with the scale. I have used '4' above.
EDIT:
The following code worked with your images very well, even special characters. Please try it out with the rest of your dataset. Scaling, OTSU and erosion was the best combination.
import cv2
import numpy
import pytesseract
pytesseract.pytesseract.tesseract_cmd = "<path to tesseract.exe>"
# Page segmentation mode, PSM was changed to 6 since each page is a single uniform text block.
custom_config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
# load the image as grayscale
img = cv2.imread("5.png",cv2.IMREAD_GRAYSCALE)
# Change all pixels to black, if they aren't white already (since all characters were white)
img[img != 255] = 0
# Scale it 10x
scaled = cv2.resize(img, (0,0), fx=10, fy=10, interpolation = cv2.INTER_CUBIC)
# Retained your bilateral filter
filtered = cv2.bilateralFilter(scaled, 11, 17, 17)
# Thresholded OTSU method
thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Erode the image to bulk it up for tesseract
kernel = numpy.ones((5,5),numpy.uint8)
eroded = cv2.erode(thresh, kernel, iterations = 2)
pre_processed = eroded
# Feed the pre-processed image to tesseract and print the output.
ocr_text = pytesseract.image_to_string(pre_processed, config=custom_config)
if len(ocr_text) != 0:
print(ocr_text)
else: print("No string detected")

Remove features from binarized image

I wrote a little script to transform pictures of chalkboards into a form that I can print off and mark up.
I take an image like this:
Auto-crop it, and binarize it. Here's the output of the script:
I would like to remove the largest connected black regions from the image. Is there a simple way to do this?
I was thinking of eroding the image to eliminate the text and then subtracting the eroded image from the original binarized image, but I can't help thinking that there's a more appropriate method.
Sure you can just get connected components (of certain size) with findContours or floodFill, and erase them leaving some smear. However, if you like to do it right you would think about why do you have the black area in the first place.
You did not use adaptive thresholding (locally adaptive) and this made your output sensitive to shading. Try not to get the black region in the first place by running something like this:
Mat img = imread("desk.jpg", 0);
Mat img2, dst;
pyrDown(img, img2);
adaptiveThreshold(255-img2, dst, 255, ADAPTIVE_THRESH_MEAN_C,
THRESH_BINARY, 9, 10); imwrite("adaptiveT.png", dst);
imshow("dst", dst);
waitKey(-1);
In the future, you may read something about adaptive thresholds and how to sample colors locally. I personally found it useful to sample binary colors orthogonally to the image gradient (that is on the both sides of it). This way the samples of white and black are of equal size which is a big deal since typically there are more background color which biases estimation. Using SWT and MSER may give you even more ideas about text segmentation.
I tried this:
import numpy as np
import cv2
im = cv2.imread('image.png')
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
grayout = 255*np.ones((im.shape[0],im.shape[1],1), np.uint8)
blur = cv2.GaussianBlur(gray,(5,5),1)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
wcnt = 0
for item in contours:
area =cv2.contourArea(item)
print wcnt,area
[x,y,w,h] = cv2.boundingRect(item)
if area>10 and area<200:
roi = gray[y:y+h,x:x+w]
cntd = 0
for i in range(x,x+w):
for j in range(y,y+h):
if gray[j,i]==0:
cntd = cntd + 1
density = cntd/(float(h*w))
if density<0.5:
for i in range(x,x+w):
for j in range(y,y+h):
grayout[j,i] = gray[j,i];
wcnt = wcnt + 1
cv2.imwrite('result.png',grayout)
You have to balance two things, removing the black spots but balance that with not losing the contents of what is on the board. The output I got is this:
Here is a Python numpy implementation (using my own mahotas package) of the method for the top answer (almost the same, I think):
import mahotas as mh
import numpy as np
Imported mahotas & numpy with standard abbreviations
im = mh.imread('7Esco.jpg', as_grey=1)
Load the image & convert to gray
im2 = im[::2,::2]
im2 = mh.gaussian_filter(im2, 1.4)
Downsample and blur (for speed and noise removal).
im2 = 255 - im2
Invert the image
mean_filtered = mh.convolve(im2.astype(float), np.ones((9,9))/81.)
Mean filtering is implemented "by hand" with a convolution.
imc = im2 > mean_filtered - 4
You might need to adjust the number 4 here, but it worked well for this image.
mh.imsave('binarized.png', (imc*255).astype(np.uint8))
Convert to 8 bits and save in PNG format.

Categories