Understanding Handwritten digit by computer - python

i would like to ask you one question : wanted to implement a code which clarifies a picture done by hand ( by pen), let us consider such image
it is done by blue pen, which should be converted to the gray scale image using following code
from PIL import Image
user_test = filename
col = Image.open(user_test)
gray = col.convert('L')
bw = gray.point(lambda x: 0 if x<100 else 255, '1')
bw.save("bw_image.jpg")
bw
img_array = cv2.imread("bw_image.jpg", cv2.IMREAD_GRAYSCALE)
img_array = cv2.bitwise_not(img_array)
print(img_array.size)
plt.imshow(img_array, cmap = plt.cm.binary)
plt.show()
img_size = 28
new_array = cv2.resize(img_array, (img_size,img_size))
plt.imshow(new_array, cmap = plt.cm.binary)
plt.show()
idea is that i am taking image from camera directly, but it is losing structure of digit and comes only empty and black picture, like this
therefore computer can't understand which digit it is and neural networks fails to predict its label correctly, could you please tell me which transformation should i apply in order to detect this image much more precisely ?
edit :
i have apply following code
from PIL import Image
user_test = filename
col = Image.open(user_test)
gray = col.convert('L')
plt.hist(img_array)
plt.show()
and got

You have several issues here, and you can methodically address them.
First of all you're having an issue with thresholding properly.
As I suggested in earlier comments, you can easily see why your original thresholding was unsuccessful.
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from matplotlib import cm
im = Image.open('whatever_path_you_choose.jpg').convert("L")
im = np.asarray(im)
plt.hist(im.flatten(), bins=np.arange(255));
Looking at the image you gave:
Clearly the threshold should be somewhere between 100-200, not as in your original code. Also note that this distribution isn't very bimodal - so I'm not sure otsu's method would work well here.
If we eyeball it (this can be tuned), we can see that thresholding at 145-ish gives decent results in terms of segmentation.
im_thresh = (im >= 145)
plt.imshow(im_thresh, cmap=cm.gray)
Now you might have an additional issue that you have horizontal lines, you can address this by writing on blank paper as suggested. This wasn't exactly your question but I will try to address it anyways (in a naive fashion). You can try a naive solution of using a sobel filter (think of it as the derivative of the image to get the lines), followed by a median filter to get the approximately most common pixel intensity - the size of the filter might have to vary for different digits though. This should clear up some of the lines. For a more rigorous approach try reading up on hough line transform for detecting horizontal lines and try to whiten them out.
This is my very naive approach:
from skimage.filters import sobel
from scipy.ndimage import median_filter
#Sobel filter reverses intensities so subtracting the result from 1.0 turns it back to the original
plt.imshow(1.0 - median_filter(sobel(im_thresh), [10, 3]), cmap=cm.gray)
You can try cropping automatically afterwards. Honestly I think most neural networks that could recognize MNIST-like digits could recognize the result I posted at the end as well.

Try using skimage package like this. This has inbuilt functions for image processing:
from skimage import io
from skimage.restoration import denoise_tv_chambolle
from skimage.filters import threshold_otsu
image = io.imread('path/to/your/image', as_gray=True)
# Denoising
denoised_image = denoise_tv_chambolle(image, weight=0.1, multichannel=True)
# Thresholding
threshold = threshold_otsu(denoised_image)
thresholded_image = denoised_image > threshold

Related

What is the difference between opencv ximgproc.slic and skimage segmentation.slic?

I run the SLIC (Simple Linear Iterative Clustering) superpixels algorithm from opencv and skimage on the same picture with, but got different results, the skimage slic result is better, Shown in the picture below.First one is opencv SLIC, the second one is skimage SLIC. I got several questions hope someonc can help.
Why opencv have the parameter 'region_size' while skimage is 'n_segments'?
Is convert to LAB and a guassian blur necessary?
Is there any trick to optimize the opecv SLIC result?
===================================
OpenCV SLIC
Skimage SLIC
# Opencv
src = cv2.imread('pic.jpg') #read image
# gaussian blur
src = cv2.GaussianBlur(src,(5,5),0)
# Convert to LAB
src_lab = cv.cvtColor(src,cv.COLOR_BGR2LAB) # convert to LAB
# SLIC
cv_slic = ximg.createSuperpixelSLIC(src_lab,algorithm = ximg.SLICO,
region_size = 32)
cv_slic.iterate()
# Skimage
src = io.imread('pic.jpg')
sk_slic = skimage.segmentation.slic(src,n_segments = 256, sigma = 5)
Image with superpixels centroid generated with the code below
# Measure properties of labeled image regions
regions = regionprops(labels)
# Scatter centroid of each superpixel
plt.scatter([x.centroid[1] for x in regions], [y.centroid[0] for y in regions],c = 'red')
but there is one superpixel less(top-left corner), and I found that
len(regions) is 64 while len(np.unique(labels)) is 65 , why?
I'm not sure why you think skimage slic is better (and I maintain skimage! 😂), but:
different parameterizations are common in mathematics and computer science. Whether you use region size or number of segments, you should get the same result. I expect the formula to convert between the two will be something like n_segments = image.size / region_size.
The original paper suggests that for natural images (meaning images of the real world like you showed, rather than e.g. images from a microscope or from astronomy), converting to Lab gives better results.
to me, based on your results, it looks like the gaussian blur used for scikit-image was higher than for openCV. So you could make the results more similar by playing with the sigma. I also think the compactness parameter is probably not identical between the two.

How to Improve OCR on image with text in different colors and fonts?

I'm using the Google Vision API to extract the text from some pictures, however, I have been trying to improve the accuracy (confidence) of the results with no luck.
every time I change the image from the original I lose accuracy in detecting some characters.
I have isolated the issue to have multiple colors for different words with can be seen that words in red for example have incorrect results more often than the other words.
Example:
some variations on the image from gray scale or b&w
What ideas can I try to make this work better, specifically changing the colors of text to a uniform color or just black on a white background since most algorithms expect that?
some ideas I already tried, also some thresholding.
dimg = ImageOps.grayscale(im)
cimg = ImageOps.invert(dimg)
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)
sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)
I can only offer a butcher's solution, potentially a nightmare to maintain.
In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.
My prerequisites:
I knew exactly in which area of the screen the text was going to go.
I knew exactly which fonts and colors were going to be used.
the text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
I could not detect reliably text changes to average frames and reduce the interference.
What I did:
- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average color to determine the color, then access the whole set of bitmaps generated from characters in all available fonts in that color. Then it would determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next one.
(Months later, requiring more performances, I added a varying probability matrix to test first the most likely characters).
In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
You tried almost every standard step. I would advise you to try some PIL built-in filters like sharpness filter. Apply sharpness and contrast on the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together.
Or convert your image to YUV and then use just Y channel for further processing.
Also, if you do not have a monochrome background consider performing some background substraction.
What tesseract likes when detecting scanned text is removed frames, so you can try to destroy as much of non character space from the image. (You might need to keep the picture size though, so you should replace it with white colour). Tesseract also likes straight lines. So some deskewing might be in order if your text is recorded at an angle. Tesseract also sometimes gives better results if you resize the image to twice its original size.
I suspect that Google Vision uses tesseract, or portions of it, but what other preprocessing it does for you I have no idea. So some of my advices here might actually be implemented already and doing them would be unnecessary and repetitive.
You will need to pre-process the image more than once, and use a bitwise_or operation to combine the results. To extract the colors, you could use
import cv2
boundaries = [ #BGR colorspace for opencv, *not* RGB
([15, 15, 100], [50, 60, 200]), #red
([85, 30, 2], [220, 90, 50]), #blue
([25, 145, 190], [65, 175, 250]), #yellow
]
for (low, high) in boundaries:
low = np.array(low, dtype = "uint8")
high = np.array(high, dtype = "uint8")
# find the colors within the specified boundaries and apply
# the mask
mask = cv2.inRange(image, low, high)
bitWise = cv2.bitwise_and(image, image, mask=mask)
#now here is the image masked with the specific color boundary...
Once you have the masked image, you can do another bitwise_or operation on your to-be "final" image, essentially adding this mask to it.
but this specific implementation requires opencv, however the same principle applies for other image packages.
I need a little more context on this.
How many calls are you going to do to the Google Vision API? If you are doing this throughout a whole stream, you'd probably need to get a paid subscription.
What are you going to do with this data? How accurate does the OCR need to be?
Assuming you get this snapshot from another's twitch stream, dealing with the streamer's video compression and network connectivity, you're going to get pretty blurry snapshot, so OCR is going to be pretty tough.
The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, one approach you could try:
Binarize the image to get the non-red text in white and background black as in your binarized image:
from PIL import Image
def binarize_image(im, threshold):
"""Binarize an image."""
image = im.convert('L') # convert image to monochrome
bin_im = image.point(lambda p: p > threshold and 255)
return bin_im
im = Image.open("game_text.JPG")
binarized = binarize_image(im, 100)
Extract only the red text values with a filter, then binarize it:
import cv2
from matplotlib import pyplot as plt
lower = [15, 15, 100]
upper = [50, 60, 200]
lower = np.array(lower, dtype = "uint8")
upper = np.array(upper, dtype = "uint8")
mask = cv2.inRange(im, lower, upper)
red_binarized = cv2.bitwise_and(im, im, mask = mask)
plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
plt.show()
However, even with this filtering, it still doesn't extract red well.
Add images obtained in (1.) and (2.).
combined_image = binarized + red_binarized
Do OCR on (3.)
This is not a full solution but it may drive to something better.
By converting your data from BGR (or RGB) to CIE-Lab you can process a grayscale image as the weighted sum of the colour channels a* and b*.
This grayscale image will enhance colour regions of the text.
But adapting the threshold you can from this grayscale image segment the coloured word in your original image and get the other words from the a L channel thresholding.
A bitwise and operator should be enough to merge to two segmentation image.
If you can have an image with a better contrast a very last step could be a filling based on the contours.
For that take a look to RETR_FLOODFILL of the function 'cv2.findContours'.
Any other hole filing function from other package may also fit for that purpose.
Here is a code that show the first part of my idea.
import cv2
import numpy as np
from matplotlib import pyplot as plt
I = cv2.UMat(cv2.imread('/home/smile/QSKN.png',cv2.IMREAD_ANYCOLOR))
Lab = cv2.cvtColor(I,cv2.COLOR_BGR2Lab)
L,a,b = cv2.split(Lab)
Ig = cv2.addWeighted(cv2.UMat(a),0.5,cv2.UMat(b),0.5,0,dtype=cv2.CV_32F)
Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
#k = np.ones((3,3),np.float32)
#k[2,2] = 0
#k*=-1
#
#Ig = cv2.filter2D(Ig,cv2.CV_32F,k)
#Ig = cv2.absdiff(Ig,0)
#Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
_, Ib = cv2.threshold(Ig,0.,255.,cv2.THRESH_OTSU)
_, Lb = cv2.threshold(cv2.UMat(L),0.,255.,cv2.THRESH_OTSU)
_, ax = plt.subplots(2,2)
ax[0,0].imshow(Ig.get(),cmap='gray')
ax[0,1].imshow(L,cmap='gray')
ax[1,0].imshow(Ib.get(),cmap='gray')
ax[1,1].imshow(Lb.get(),cmap='gray')
import numpy as np
from skimage.morphology import selem
from skimage.filters import rank, threshold_otsu
from skimage.util import img_as_float
from PIL import ImageGrab
import matplotlib.pyplot as plt
def preprocessing(image, strelem, s0=30, s1=30, p0=.3, p1=1.):
image = rank.mean_bilateral(image, strelem, s0=s0, s1=s1)
condition = (lambda x: x>threshold_otsu(x))(rank.maximum(image, strelem))
normalize_image = rank.autolevel_percentile(image, strelem, p0=p0, p1=p1)
return np.where(condition, normalize_image, 0)
#Grab image from clipboard
image = np.array(ImageGrab.grabclipboard())
sel = selem.disk(4)
a = sum([img_as_float(preprocessing(image[:, :, x], sel, p0=0.3)) for x in range(3)])/3
fig, ax = plt.subplots(1, 2, sharey=True, sharex=True)
ax[0].imshow(image)
ax[1].imshow(rank.autolevel_percentile(a, sel, p0=.4))
This is my code for clearing text from noise and creating uniform brightness for characters.
With minor modifications, I used it to solve your problem.

Difficulty in detected ellipses in image

I am trying to detect ellipses in some images.
After some functions I got this edges map:
I tried using Hough transform to detect ellipses, but this transform has very high complexity, so my computer didn't finish running the transform command even after 5 hours(!).
I also tried doing connected components and got this:
In last case I also tried continue and binarized the image.
In all cases I am stuck in these steps, and have no idea how continue from here.
My mission is detect tomatoes in the image. I am approaching this by trying to detect circles and ellipses and find the radius (or average radius in ellipses case) for each one.
edited:
I add my code for the first method (the result is edge map from above):
img = cv2.imread(r'../images/assorted_tomatoes.jpg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
imgAfterLight=lightreduce(img)
imgAfterGamma=gamma_correctiom(imgAfterLight,0.8)
th2 = 255 - cv2.adaptiveThreshold(imgAfterGamma,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY,5,3)
median2 = cv2.medianBlur(th2,3)
where median2 is the result of shown above in edge map
and the code for connected components:
import scipy
from scipy import ndimage
import matplotlib.pyplot as plt
import cv2
import numpy as np
fname=r'../images/assorted_tomatoes.jpg'
blur_radius = 1.0
threshold = 50
img = scipy.misc.imread(fname) # gray-scale image
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(img.shape)
# smooth the image (to remove small objects)
imgf = ndimage.gaussian_filter(gray_img, blur_radius)
threshold = 80
# find connected components
labeled, nr_objects = ndimage.label(imgf > threshold)
where labeled is the result above
another edit:
this is the input image:
Input
The problem is that after edge detection, there are a lot of unnecessary edges in sub regions that disturbing for make smooth edge map
To me this looks like a classic problem for the watershed algorithm. It is designed for segmenting out touching objects like the tomatoes. My example is in Matlab (I'm on the wrong computer today) but it should translate to python easily. First convert to greyscale as you do and then invert the images
I=rgb2gray(img)
I2=imcomplement(I)
The image as is will over segment, so we remove minima that are too shallow. This can be done with the h-minima transform
I3=imhmin(I2,50);
You might need to play with the 50 value which is the height threshold for suppressing shallow minima. Now run the watershed algorithm and we get the following result.
L=watershed(I3);
The results are not perfect. It needs additional logic to remove some of the small regions, but it will give a reasonable estimate. The watershed and h-minima are contained in the skimage.morphology package in python.

Thinning/Skeletenization is distorting my image

I am trying to thin this image but it keeps getting distorted.
This is my relevant code for applying the thinning. I have also tried the 'thin' function instead of 'skeletonize' but the results are similar.
from skimage.morphology import skeletonize, thin
new_im = cv2.imread(im_pth)
gray = cv2.cvtColor(new_im, cv2.COLOR_BGR2GRAY)
ske = (skeletonize(gray//255) * 255).astype(np.uint8)
cv2.imshow("image", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
My goal is to get a shape similar to this after thinning:
What am I doing wrong? I have read online that sometimes jpg files cause issues however I don't have the experience in this field to confirm that.
I'm not sure if your conversion from input image to binary is correct. Here's a version using scikit-image functions that seems to do what you want:
from skimage import img_as_float
from skimage import io, color, morphology
import matplotlib.pyplot as plt
image = img_as_float(color.rgb2gray(io.imread('char.png')))
image_binary = image < 0.5
out_skeletonize = morphology.skeletonize(image_binary)
out_thin = morphology.thin(image_binary)
f, (ax0, ax1, ax2) = plt.subplots(1, 3, figsize=(10, 3))
ax0.imshow(image, cmap='gray')
ax0.set_title('Input')
ax1.imshow(out_skeletonize, cmap='gray')
ax1.set_title('Skeletonize')
ax2.imshow(out_thin, cmap='gray')
ax2.set_title('Thin')
plt.savefig('/tmp/char_out.png')
plt.show()
From your example, and since your image is binary, I think that what you want to do is better achieved via (binary) erosion. Wikipedia explains the concept well. Intuitively (in case you don't have time to read the wikipedia link), imagine you have a binary image A, like the one you have given, and let's call A_1 the set of pixels of A that have a value of 1. Then, you define a "structuring element" K, which for example can be a square patch of size n*n. Then in pseudocode
for pixel in A_1:
center K at pixel, and call this centered version K_pixel
if(K_pixel is contained in A_1):
keep pixel
else:
discard pixel
So, this has the effect of thinning the connected component in your image.
This function is standard and is implemented in opencv, here are some python examples, and here is a link to the documentation (c++).

Remove features from binarized image

I wrote a little script to transform pictures of chalkboards into a form that I can print off and mark up.
I take an image like this:
Auto-crop it, and binarize it. Here's the output of the script:
I would like to remove the largest connected black regions from the image. Is there a simple way to do this?
I was thinking of eroding the image to eliminate the text and then subtracting the eroded image from the original binarized image, but I can't help thinking that there's a more appropriate method.
Sure you can just get connected components (of certain size) with findContours or floodFill, and erase them leaving some smear. However, if you like to do it right you would think about why do you have the black area in the first place.
You did not use adaptive thresholding (locally adaptive) and this made your output sensitive to shading. Try not to get the black region in the first place by running something like this:
Mat img = imread("desk.jpg", 0);
Mat img2, dst;
pyrDown(img, img2);
adaptiveThreshold(255-img2, dst, 255, ADAPTIVE_THRESH_MEAN_C,
THRESH_BINARY, 9, 10); imwrite("adaptiveT.png", dst);
imshow("dst", dst);
waitKey(-1);
In the future, you may read something about adaptive thresholds and how to sample colors locally. I personally found it useful to sample binary colors orthogonally to the image gradient (that is on the both sides of it). This way the samples of white and black are of equal size which is a big deal since typically there are more background color which biases estimation. Using SWT and MSER may give you even more ideas about text segmentation.
I tried this:
import numpy as np
import cv2
im = cv2.imread('image.png')
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
grayout = 255*np.ones((im.shape[0],im.shape[1],1), np.uint8)
blur = cv2.GaussianBlur(gray,(5,5),1)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
wcnt = 0
for item in contours:
area =cv2.contourArea(item)
print wcnt,area
[x,y,w,h] = cv2.boundingRect(item)
if area>10 and area<200:
roi = gray[y:y+h,x:x+w]
cntd = 0
for i in range(x,x+w):
for j in range(y,y+h):
if gray[j,i]==0:
cntd = cntd + 1
density = cntd/(float(h*w))
if density<0.5:
for i in range(x,x+w):
for j in range(y,y+h):
grayout[j,i] = gray[j,i];
wcnt = wcnt + 1
cv2.imwrite('result.png',grayout)
You have to balance two things, removing the black spots but balance that with not losing the contents of what is on the board. The output I got is this:
Here is a Python numpy implementation (using my own mahotas package) of the method for the top answer (almost the same, I think):
import mahotas as mh
import numpy as np
Imported mahotas & numpy with standard abbreviations
im = mh.imread('7Esco.jpg', as_grey=1)
Load the image & convert to gray
im2 = im[::2,::2]
im2 = mh.gaussian_filter(im2, 1.4)
Downsample and blur (for speed and noise removal).
im2 = 255 - im2
Invert the image
mean_filtered = mh.convolve(im2.astype(float), np.ones((9,9))/81.)
Mean filtering is implemented "by hand" with a convolution.
imc = im2 > mean_filtered - 4
You might need to adjust the number 4 here, but it worked well for this image.
mh.imsave('binarized.png', (imc*255).astype(np.uint8))
Convert to 8 bits and save in PNG format.

Categories