I`m trying to implement the digits classifier by myself. And I faced some troubles with it. I'm training the NN on MNIST handwritten dataset, MNIST sample digit. But when I'm trying to predict what the digit is, I predicted from the image that I found and processed using cv2 - cv2 processed digit, as you can see, my own image has a fatter circuit and clear boundaries.
That is my digit-image before the processing - Before, and after - After. But I want to image be like this.
after the processing.
I use the following code to process each digit:
def main():
image = cv2.imread('digit.jpg', cv2.IMREAD_GRAYSCALE)
image = image.reshape((32,32,1))
image = postprocess(image)
def postprocess(gray):
kernel_size = 15
blur_gray = cv2.GaussianBlur(gray,(kernel_size, kernel_size), 0)
thresh = cv2.adaptiveThreshold(blur_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 3)
return thresh
I use 11 as a Threshold parameter to discard most artifacts, but my digit circuit is still bold/fat and have too clear boundaries.
Question is: how can I process the image to make it look like a training sample image(thicker and with blurred boundaries)?
I found a solution to my problem by using kernel filtering. Today I stumbled upon an article about kernel image processing, there were some kernels for "Edge detection", I have tried them all, but no one of them was good enough. But I did my own kernel by an oversight, and it working awesomely for me!
So, there is the code:
def main():
image = cv2.imread('digit.jpg', cv2.IMREAD_GRAYSCALE)
image = postprocess(image)
image = image.reshape((32,32,1))
def postprocess(gray):
gray_big = cv2.resize(gray, (256,256))
kernel = np.array([[0,-2,0],[-2,10,-2],[0,-2,0]])
filtered = cv2.filter2D(gray_big, -1, kernel)
filtered = 255 - filtered
filtered = filtered / 255
filtered = cv2.resize(filtered, (32,32))
return filtered
Resizing the image to a bigger one keeps the image clear from "artifacts" after the kernel processing, I've tried to do it on a original image size, but the image wasn't so clear as in my final code version.
result
Related
I'm trying to process an image with opencv for feeding it to tesseract for text recognition. I'm working with tags which have a lot of noise and an holographic pattern which varies greatly depending on light conditions.
For this reason I tried different approaches.
First I convert the image to grayscale, then I apply a median blur to soften the noise, then apply an adaptive threshold for masking.
I end up with this result
However the image still has a lot of noise around the text which really messes with its recognition. I'm new to image processing and I'm a bit lost as to how to proceed.
Here's the code for the aforementioned approach:
def approach_3(img, blurIterations=13):
img = rotate_img(img)
img = resize(img)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.uint8)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
median = cv2.medianBlur(gray, blurIterations)
th2 = cv2.adaptiveThreshold(median,255,cv2.ADAPTIVE_THRESH_MEAN_C,\
cv2.THRESH_BINARY, 31 , 12)
return th2
I'm working on performing OCR of energy meter displays: example 1 example 2 example 3
I tried to use tesseract-ocr with the letsgodigital trained data. But the performance is very poor.
I'm fairly new to the topic and this is what I've done:
import numpy as np
import cv2
import imutils
from skimage import exposure
from pytesseract import image_to_string
import PIL
def process_image(orig_image_arr):
gry_disp_arr = cv2.cvtColor(orig_image_arr, cv2.COLOR_BGR2GRAY)
gry_disp_arr = exposure.rescale_intensity(gry_disp_arr, out_range= (0,255))
#thresholding
ret, thresh = cv2.threshold(gry_disp_arr,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
return thresh
def ocr_image(orig_image_arr):
otsu_thresh_image = process_image(orig_image_arr)
cv2_imshow(otsu_thresh_image)
return image_to_string(otsu_thresh_image, lang="letsgodigital", config="--psm 8 -c tessedit_char_whitelist=.0123456789")
img1 = cv2.imread('test2.jpg')
cnv = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
text = ocr_image(cnv)
This gives very poor results with the example images. I have a couple of questions:
How can I identify the four corners of the display? (Edge detection doesn’t seem to work very well)
Is there any futher preprocessing that I can do to improve the performance?
Thanks for any help.
Notice how your power meters either use blue or green LEDs to light up the display; I suggest you use this color display to your advantage. What I'd do is select only one RGB channel based on the LED color. Then I can threshold it based on some algorithm or assumption. After that, you can do the downstream steps of cropping / resizing / transformation / OCR etc.
For example, using your example image 1, look at its histogram here.
Notice how there is a small peak of green to the right of the 150 mark.
I take advantage of this, and set anything below 150 to zero. My assumption being that the green peak is the bright green LED in the image.
img = cv2.imread('example_1.jpg', 1)
# Get only green channel
img_g = img[:,:,1]
# Set threshold for green value, anything less than 150 becomes zero
img_g[img_g < 150] = 0
This is what I get.
This should be much easier for downstream OCR now.
# You should also set anything >= 150 to max value as well, but I didn't in this example
img_g[img_g >= 150] = 255
The above steps should replace this step
_ret, thresh = cv2.threshold(img_g, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
Here's the output of this step.
I am trying to remove the grayish “noise” surrounding the dates using Python/OpenCV to help the OCR (Optical Character Recognition) to recognize the dates.
The original image looks like this: https://static.mothership.sg/1/2017/03/10-Feb-MC-1.jpg
The python script I tried looked as below. However, I have other similar images in which the contrast or lighting coditions varies.
import cv2
import numpy as np
img = cv2.imread("mc.jpeg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
alpha = 3.5
beta = -2
new = alpha * img + beta
new = np.clip(new, 0, 255).astype(np.uint8)
cv2.imwrite("cleaned.png", new)
I also tried Thresholding and/or adaptiveThresholding and some time, I was able to separate the dates from the grayish background. Sometimes it was very challenging. I wonder is there an automatic way to determine the threshold value ?
Below are example of what I hope to achieve.
Blurry Image:
Otsu's Binarization automatically calculates a threshold value from an image histogram.
# Otsu's thresholding after Gaussian filtering
blur = cv2.GaussianBlur(img,(5,5),0)
ret,Otsu = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
cv2.imwrite("Otsu's_thresholding", Otsu)
see this link
You can try to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant during most of the image. These are the steps for this method:
Apply a soft median blur filter to get rid of small noise
Get the model of the background via local maximum. Apply a very strong close operation, with a big structuring element (I’m using a rectangular kernel of size 15)
Perform gain adjustment by dividing 255 between each local maximum pixel. Weight this value with each input image pixel.
You should get a nice image where the background illumination is pretty much normalized, threshold this image to get a binary mask of the text
This is the code:
import numpy as np
import cv2
# image path
path = "C:/opencvImages/sheet01.jpg"
# Read an image in default mode:
inputImage = cv2.imread(path)
# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)
# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)
# Adjust image gain:
height, width, depth = localMax.shape
# Create output Mat:
outputImage = np.zeros(shape=[height, width, depth], dtype=np.uint8)
for i in range(0, height):
for j in range(0, width):
# Get current BGR pixels:
v1 = inputImage[i, j]
v2 = localMax[i, j]
# Gain adjust:
tempArray = []
for c in range(0, 3):
currentPixel = v2[c]
if currentPixel != 0:
gain = 255 / v2[c]
gain = v1[c] * gain
else:
gain = 0
# Gain set and clamp:
tempArray.append(np.clip(gain, 0, 255))
# Set pixel vec to out image:
outputImage[i, j] = tempArray
# Convert RGB to grayscale:
grayscaleImage = cv2.cvtColor(outputImage, cv2.COLOR_BGR2GRAY)
# Threshold:
threshValue = 110
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)
# Write image:
imageFilename = "C:/opencvImages/binaryMask2.png"
cv2.imwrite(imageFilename, binaryImage)
I get the following results testing the complete image:
And the cropped text:
Please note that the gain adjustment operations are not vectorized. The script is slow, mainly because I'm starting with Python and don’t know the proper Numpy syntax to speed-up this operation. I've been using C++ for a long time, so feel free to further improve the code.
Edit:
Please, be aware that your result can only be as good as the quality of your input. See your input and ask yourself "Is this a good input for an automated process?" (Automated processes are usually not very smart). The second picture you posted is very low quality. Not only is blurry but also is low res and has compression artifacts. All these factors will hinder automated processing.
With that said, here's an improvement you can include in the original:
Try to normalize brightness-contrast on the grayscale output:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))
Your grayscale image goes from this:
to this:
A little bit darker and improved on contrast. Let's try to compute the optimal threshold value automatically via Otsu thresholding:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
It gets you this:
However, we can adjust the result if we add bias to Otsu's threshold, like this:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
bias = 0.9
threshValue = bias * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)
That's the best quality you can get with these images using this method.
If you find these suggestions and tips useful, please, at least up-vote my answer.
I am trying to read numbers from images and cannot find a way to get it to work consistently (not all images have numbers). These are the images:
(here is the link to the album in case the images are not working)
This is the command I'm using to run tesseract on the images: pytesseract.image_to_string(image, timeout=2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789'). I have tried multiple configurations, but this seems to work best.
As far as preprocessing goes, this works the best:
gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
im_bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)[1]
This works for all images except the 3rd one. To solve the problem of lines in the 3rd image, i tried getting the edges with cv2.Canny and a pretty large threshold which works, but when drawing them back, even though it gets more than 95% of each number's edges, tesseract does not read them correctly.
I have also tried resizing the image, using cv2.morphologyEx, blurring it etc. I cannot find a way to get it to work for each case.
Thank you.
cv2.resize has consistently worked for me with INTER_CUBIC interpolation.
Adding this last step to pre-processing would most likely solve your problem.
im_bw_scaled = cv2.resize(im_bw, (0, 0), fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
You could play around with the scale. I have used '4' above.
EDIT:
The following code worked with your images very well, even special characters. Please try it out with the rest of your dataset. Scaling, OTSU and erosion was the best combination.
import cv2
import numpy
import pytesseract
pytesseract.pytesseract.tesseract_cmd = "<path to tesseract.exe>"
# Page segmentation mode, PSM was changed to 6 since each page is a single uniform text block.
custom_config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'
# load the image as grayscale
img = cv2.imread("5.png",cv2.IMREAD_GRAYSCALE)
# Change all pixels to black, if they aren't white already (since all characters were white)
img[img != 255] = 0
# Scale it 10x
scaled = cv2.resize(img, (0,0), fx=10, fy=10, interpolation = cv2.INTER_CUBIC)
# Retained your bilateral filter
filtered = cv2.bilateralFilter(scaled, 11, 17, 17)
# Thresholded OTSU method
thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Erode the image to bulk it up for tesseract
kernel = numpy.ones((5,5),numpy.uint8)
eroded = cv2.erode(thresh, kernel, iterations = 2)
pre_processed = eroded
# Feed the pre-processed image to tesseract and print the output.
ocr_text = pytesseract.image_to_string(pre_processed, config=custom_config)
if len(ocr_text) != 0:
print(ocr_text)
else: print("No string detected")
I developed script in Matlab which is analysing engraved text on a colour steal. I'm using range of morphological techniques to extract the text and read it with OCR. I need to implement it on Raspberry Pi therefore I decided to transfer my Matlab code into OpenCV (in python). I tried to transfer some methods and they work similarly but how do I implement imreconstruct and imbinarize (shown below) to OpenCV? (the challenge here is appropriate differentiate foreground and background).
Maybe I should try adding grabCut or getStructuringElement or morphologyEx or dilate? I tried them in range of combinations but have not found a perfect solution.
I will put the whole script for both if anyone could give me suggestions on how to generally improve this extraction and accuracy of OCR process I would greatly appreciate it.
Based on bin values of grey-scale image. I change some parameters in
those functions:
Matlab:
se = strel('disk', 300);
img = imtophat(img, se);
maker = imerode(img, strel('line',100,0)); %for whiter ones
maker = imerode(img, strel('line',85,0)); %for medium
maker = imerode(img, strel('line',5,0));
imgClear = imreconstruct(maker, img);
imgBlur = imgaussfilt(imgClear,1); %less blur for whiter frames
BW = imbinarize(imgBlur,'adaptive','ForegroundPolarity','Bright',...
'Sensitivity',0.7); %process for medium
BW = imbinarize(imgBlur, 'adaptive', 'ForegroundPolarity',...
'Dark', 'Sensitivity', 0.4); % process for black and white
res = ocr(BW, 'CharacterSet', '0123456789', 'TextLayout', 'Block');
res.Text;
OpenCv
kernel = numpy.ones((5,5),numpy.uint8)
blur = cv2.GaussianBlur(img,(5,5),0)
erosion = cv2.erode(blur,kernel,iterations = 1)
opening = cv2.morphologyEx(erosion, cv2.MORPH_OPEN, kernel)
#bremove = cv2.grabCut(opening,mask,rect,bgdModelmode==GC_INIT_WITH_MASK)
#th3 = cv2.adaptiveThreshold(opening,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU,11,2)
ret, thresh= cv2.threshold(opening,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
ocr = pytesseract.image_to_string(Image.open('image2.png'),config='stdout -c tessedit_char_whitelist=0123456789')
Here is the input image:
I am surprised at how much difference between matlab and opencv there is when they both appear to use the same algorithm. Why do you run imbinarize twice? What does the sensitivity keyword actually do (mathematically, behind the background). Because they obviously have several steps more than just the bare OTSU.
import cv2
import numpy as np
import matplotlib.pyplot as plt
def show(img):
plt.imshow(img, cmap="gray")
plt.show()
img = cv2.imread("letters.jpg", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3,3), np.uint8)
blur = cv2.GaussianBlur(img,(3,3), 0)
erosion = cv2.erode(blur, kernel, iterations=3)
opening = cv2.dilate(erosion, kernel)
th3 = cv2.adaptiveThreshold(opening, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 45, 2)
show(th3)
kernel2 = cv2.getGaussianKernel(6, 2) #np.ones((6,6))
kernel2 = np.outer(kernel2, kernel2)
th3 = cv2.dilate(th3, kernel2)
th3 = cv2.erode(th3, kernel)
show(th3)
The images that get displayed are:
After a bit of cleaning up:
So all in all not the same and certainly not as nice as matlab. But the basic principle seems the same, it's just that the numbers need playing with.
A better approach would probably be to do a threshold by the mean of the image and then use the output of that as a mask to adaptive threshold the original image. Hopefully then the results would be better than both opencv and matlab.
Try doing it with ADAPTIVE_THRESH_MEAN_C you can get some really nice results but there's more trash lying around. Again, maybe if you can use it as a mask to isolate the text and then do tresholding again it might turn out to be better. Also the shape of the erosion and dilation kernels will make a big difference here.
I worked out the code to have a positive result based on your engraved text sample.
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def show(img):
plt.imshow(img, cmap="gray")
plt.show()
# load the input image
img = cv2.imread('./imagesStackoverflow/engraved_text.jpg',0);
show(img)
ret, mask = cv2.threshold(img, 60, 120, cv2.THRESH_BINARY) # turn 60, 120 for the best OCR results
kernel = np.ones((5,3),np.uint8)
mask = cv2.erode(mask,kernel,iterations = 1)
show(mask)
# I used a version of OpenCV with Tesseract, you may use your pytesseract and set the modes as:
# OCR Enginer Mode (OEM) = 3 (defualt = 3)
# Page Segmentation mode (PSmode) = 11 (defualt = 3)
tesser = cv2.text.OCRTesseract_create('C:/Program Files/Tesseract 4.0.0/tessdata/','eng','0123456789',11,3)
retval = tesser.run(mask, 0) # return string type
print 'OCR:' + retval
Processed image and OCR output:
It would be great if you can feedback your test results with more sample images.
opencvpythontesseractocr
What I can see from your code is you have used tophat filtering in your Matlab code as the first step. However, I couldn't see the same in your python OpenCV code.
Python has built in tophat filter try applying that for getting similar result
kernel = np.ones((5,5),np.uint8)
tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
Also, try using CLAHE it gives better contrast to your image and then apply blackhat to filter out small details.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
cl1 = clahe.apply(img)
I have got better results by applying these transformations.
Tried below, it works to recognize the lighter engraved text sample. Hope it helps.
def show(img):
plt.imshow(img, cmap="gray")
plt.show()
# load the input image
img = cv2.imread('./imagesStackoverflow/engraved_text2.jpg',0);
show(img)
# apply CLAHE to adjust the contrast
clahe = cv2.createCLAHE(clipLimit=5.1, tileGridSize=(5,3))
cl1 = clahe.apply(img)
img = cl1.copy()
show(img)
img = cv2.GaussianBlur(img,(3,3), 1)
ret, mask = cv2.threshold(img, 125, 150, cv2.THRESH_BINARY) # turn 125, 150 for the best OCR results
kernel = np.ones((5,3),np.uint8)
mask = cv2.erode(mask,kernel,iterations = 1)
show(mask)
# I used a version of OpenCV with Tesseract, you may use your pytesseract and set the modes as:
# Page Segmentation mode (PSmode) = 11 (defualt = 3)
# OCR Enginer Mode (OEM) = 3 (defualt = 3)
tesser = cv2.text.OCRTesseract_create('C:/Program Files/Tesseract 4.0.0/tessdata/','eng','0123456789',11,3)
retval = tesser.run(mask, 0) # return string type
print 'OCR:' + retval