I am trying to extract text from the image below. I tried OCR in Python, but it is giving me incorrect results.
I preprocessed the image: removed the underline, used the Canny edge detector, increased the contrast ratio, and then fed it to OCR. Still, I am not getting the expected output.
With my limited knowledge, I tried to separate the characters out of the image after increasing the contrast.
import cv2
import numpy as np
import os

image_path = os.path.join(os.path.dirname(__file__), "image.png")
im = cv2.imread(image_path)
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

# convert intermediate pixels to black and white
gray[gray < 100] = 0
gray[gray >= 100] = 255

# crop away all-white rows/columns and all-black rows (the underline)
gray = gray[~np.all(gray == 255, axis=1)]
gray = gray[:, ~np.all(gray == 255, axis=0)]
gray = gray[~np.all(gray == 0, axis=1)]

print(np.where(np.all(gray == 255, axis=0)))
print(gray[:, 20:33])

# split on all-white columns
words = np.hsplit(gray, np.where(np.all(gray == 255, axis=0))[0])
i = 0
for word in words:
    word = word[:, ~np.all(word == 255, axis=0)]
    if word.size:
        print(word.shape)
        i = i + 1
        cv2.imwrite("temp" + str(i) + ".png", word)
It came out like this:
I then gave these images as input to pytesseract, but it returned blank output.
Here are my doubts:
1. Is there a better mechanism to separate the characters from the image based on whitespace? The current approach seems very fragile to me.
2. How can we pre-process the image so it is better recognized by OCR?
3. Can we use neural networks or an SVM here, like we did for the MNIST digits dataset?
Short pointers are fine if this seems too broad. What is the best approach to tackle this kind of problem?
This answer implements what is said in my comment.
I changed your code a little and refrained from using OpenCV. The code is written using Python 3.5.
To extract the digits, I sum the image column-wise and scale the resulting array to get check. I operate here on the gray image that you already cropped, which effectively gets rid of the underline.
x_sum = np.sum(gray, axis=0)
check = x_sum / np.max(x_sum) * 10
This array can now be compared against a threshold to identify the regions where a letter/digit is located:
plt.imshow(gray, cmap='gray')
x_sum = np.sum(gray, axis=0)
check = x_sum / np.max(x_sum) * 10
plt.plot((check < 8).astype(int))
plt.show()
Now we use this information to modify the image, erasing the columns where the thresholded check array is 0 (i.e., columns that contain no letter pixels):
for idx, i in enumerate((check < 8).astype(int)):
    if i < 1:
        gray[:, idx] = 255
This gives us the following image:
which can be further processed just as you are already doing. This provides separated letters/digits which can then be post-processed for learning.
The next step would be scaling/resizing the letters so that each one is described by the same number of features.
Then, finally, you can use a pretrained classifier to predict the most probable letter/digit.
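For instance, here is a minimal sketch of that scaling step (my addition, not part of the original answer): resizing each cut-out glyph to a fixed 28x28 grid, as for MNIST. It assumes words holds the glyph arrays produced by the code below.
import numpy as np
from PIL import Image

def to_feature_vector(glyph, size=(28, 28)):
    # Resize one 2D glyph array to a fixed size and flatten it into a feature vector.
    img = Image.fromarray(glyph.astype(np.uint8))
    img = img.resize(size, Image.LANCZOS)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

# features = np.stack([to_feature_vector(w) for w in words if w.size])
# A pretrained MNIST-style classifier (e.g. sklearn's SVC) could then
# call clf.predict(features) to label each glyph.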
The full code is provided here:
import numpy as np
import os
import matplotlib.pyplot as plt
from matplotlib import gridspec
from PIL import Image

image = Image.open("testl.png")
f = image.convert('I')
gray = np.array(f)

# threshold to pure black and white
gray[gray < 200] = 0
gray[gray >= 200] = 255

# crop away all-white rows/columns and all-black rows (the underline)
gray = gray[~np.all(gray == 255, axis=1)]
gray = gray[:, ~np.all(gray == 255, axis=0)]
gray = gray[~np.all(gray == 0, axis=1)]

plt.imshow(gray, cmap='gray')
x_sum = np.sum(gray, axis=0)
check = x_sum / np.max(x_sum) * 10
plt.plot((check < 8).astype(int))
plt.show()

plt.matshow(gray)
plt.show()

# erase columns that contain no letter pixels
for idx, i in enumerate((check < 8).astype(int)):
    if i < 1:
        gray[:, idx] = 255

plt.matshow(gray)
plt.show()

# split on the all-white columns
words = np.hsplit(gray, np.where(np.all(gray >= 200, axis=0))[0])
gs = gridspec.GridSpec(1, len(words))
fig = plt.figure(figsize=(len(words), 1))
i = 0
for word in words:
    word = word[:, ~np.all(word >= 230, axis=0)]
    if word.size:
        ax = fig.add_subplot(gs[i])
        print(word.shape)
        i = i + 1
        ax.matshow(word, aspect='auto')
plt.show()
This finally yields all the separated letters/digits, such as:
I am working with 3D CT images and trying to remove the lines from the bed.
A slice from the original image:
Following is my code to generate the mask:
import numpy as np
from scipy import ndimage
from skimage import morphology

segmentation = morphology.dilation(image_norm, np.ones((1, 1, 1)))
labels, label_nb = ndimage.label(segmentation)
label_count = np.bincount(labels.ravel().astype(int))
label_count[0] = 0
mask = labels == label_count.argmax()
mask = morphology.dilation(mask, np.ones((40, 40, 40)))
mask = ndimage.morphology.binary_fill_holes(mask)
mask = morphology.dilation(mask, np.ones((1, 1, 1)))
This results in the following image:
As you can see, in the above image the CT scan is distorted as well.
If I change: mask = morphology.dilation(mask, np.ones((40, 40, 40))) to mask = morphology.dilation(mask, np.ones((100, 100, 100))), the resulting image is as follows:
How can I remove only the two lines under the image without changing the image area? Any help is appreciated.
You've probably found another solution by now. Regardless, I've seen similar CT processing questions on SO, and figured it would be helpful to demonstrate a Scikit-Image solution. Here's the end result.
Here's the code to produce the above images.
from skimage import io, filters, color, morphology
import matplotlib.pyplot as plt
import numpy as np

image = color.rgba2rgb(
    io.imread("ctimage.png")[9:-23, 32:-9]
)
gray = color.rgb2gray(image)
tgray = gray > filters.threshold_otsu(gray)
keep_mask = morphology.remove_small_objects(tgray, min_size=463)
keep_mask = morphology.remove_small_holes(keep_mask)
maskedimg = np.einsum('ijk,ij->ijk', image, keep_mask)

fig, axes = plt.subplots(ncols=3)
image_list = [image, keep_mask, maskedimg]
title_list = ["Original", "Mask", "Image w/mask"]
for i, ax in enumerate(axes):
    ax.imshow(image_list[i])
    ax.set_title(title_list[i])
    ax.axis("off")
fig.tight_layout()
Notes on code
image = color.rgba2rgb(
    io.imread("ctimage.png")[9:-23, 32:-9]
)
gray = color.rgb2gray(image)
The image was saved as RGBA when I loaded it from SO. It needs to be in grayscale for use in the threshold function, hence the rgb2gray call.
Your image might already be in grayscale.
Also, the downloaded image showed axis markings, which is why I've trimmed the image.
maskedimg = np.einsum('ijk,ij->ijk',image,keep_mask)
I wanted to apply keep_mask to every channel of the RGB image. The mask is a 2D array and the image is a 3D array, so I referenced a previous question on broadcasting to apply the mask to the image.
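For the record, plain NumPy broadcasting does the same job as the einsum call; a one-liner equivalent (my addition), assuming keep_mask is a boolean 2D array:
maskedimg = image * keep_mask[:, :, np.newaxis]  # broadcast the 2D mask across the color channels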
I'm trying to use Tesseract-OCR to get the readings on the images below, but I'm having issues getting consistent results with the spotted background. I have the below configuration in my pytesseract call:
CONFIG = f"--psm 6 -c tessedit_char_whitelist=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄabcdefghijklmnopqrstuvwxyzåäö.,-"
I have also tried the below image pre-processing with some good results, but still not perfect:
blur = cv2.blur(img,(4,4))
(T, threshInv) = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
What I want is to consistently be able to identify the numbers and the decimal separator. What image pre-processing could help in getting consistent results on images like the ones below?
That was a challenge, but I think I have an interesting approach: pattern matching.
If you zoom in, you realize that the pattern in the back only has 4 possible dots: a single full pixel, a double full pixel, and a double pixel with a medium left or right. So what I did was grab these 4 patterns from the image with 17.160.000,00 and got to work. Save these so you can load them again; I just grabbed them on the fly.
import cv2
import numpy as np

img = cv2.imread('C:/Users/***/17.jpg', cv2.IMREAD_GRAYSCALE)
pattern_1 = img[2:5, 1:5]
pattern_2 = img[6:9, 5:9]
pattern_3 = img[6:9, 11:15]
pattern_4 = img[9:12, 22:26]
# just to show it carries over to other pics ;)
img = cv2.imread('C:/Users/****/6.jpg', cv2.IMREAD_GRAYSCALE)
Actual Pattern Matching
Next we match all the patterns and threshold to find all occurrences. I used 0.7, but you can play around with it a little. These patterns take off some pixels on the side and only match a single pixel on the left, so we pad twice (once with an extra pixel) to hit both offsets for the first 3 patterns. The last one is the single pixel, so it doesn't need it.
res_1 = cv2.matchTemplate(img, pattern_1, cv2.TM_CCOEFF_NORMED)
thresh_1 = cv2.threshold(res_1, 0.7, 1, cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_1 = np.pad(thresh_1, ((1, 1), (1, 2)), 'constant')
pat_thresh_15 = np.pad(thresh_1, ((1, 1), (2, 1)), 'constant')
res_2 = cv2.matchTemplate(img, pattern_2, cv2.TM_CCOEFF_NORMED)
thresh_2 = cv2.threshold(res_2, 0.7, 1, cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_2 = np.pad(thresh_2, ((1, 1), (1, 2)), 'constant')
pat_thresh_25 = np.pad(thresh_2, ((1, 1), (2, 1)), 'constant')
res_3 = cv2.matchTemplate(img, pattern_3, cv2.TM_CCOEFF_NORMED)
thresh_3 = cv2.threshold(res_3, 0.7, 1, cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_3 = np.pad(thresh_3, ((1, 1), (1, 2)), 'constant')
pat_thresh_35 = np.pad(thresh_3, ((1, 1), (2, 1)), 'constant')
res_4 = cv2.matchTemplate(img, pattern_4, cv2.TM_CCOEFF_NORMED)
thresh_4 = cv2.threshold(res_4, 0.7, 1, cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_4 = np.pad(thresh_4, ((1, 1), (1, 2)), 'constant')
Editing the Image
Now the only thing left to do is remove all the matches from the image. Since we have a mostly white background, we just set them to 255 to blend in.
img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255
Output
Edit:
Take a look at Abstract's answer as well for refining this output and for Tesseract fine-tuning.
You may find a solution using a slightly more complex approach by filtering in the frequency domain instead of the spatial domain. The thresholds might require some tweaking depending on how tesseract performs with the output images.
Implementation:
import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread('C:\\Test\\number.jpg', cv2.IMREAD_GRAYSCALE)

# Perform 2D FFT
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)
magnitude_spectrum = 20 * np.log(np.abs(fshift))

# Squash all of the frequency magnitudes above a threshold
for idx, x in np.ndenumerate(magnitude_spectrum):
    if x > 195:
        fshift[idx] = 0

# Inverse FFT back into the real spatial domain
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.real(img_back)
img_back = cv2.normalize(img_back, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)

out_img = np.copy(img)
# Use the inverted FFT image to keep only the black values below a threshold
for idx, x in np.ndenumerate(img_back):
    if x < 100:
        out_img[idx] = 0
    else:
        out_img[idx] = 255

plt.subplot(131), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132), plt.imshow(img_back, cmap='gray')
plt.title('Reversed FFT'), plt.xticks([]), plt.yticks([])
plt.subplot(133), plt.imshow(out_img, cmap='gray')
plt.title('Output'), plt.xticks([]), plt.yticks([])
plt.show()
Output:
Median Blur Implementation:
import cv2
import numpy as np

img = cv2.imread('C:\\Test\\number.jpg', cv2.IMREAD_GRAYSCALE)
blur = cv2.medianBlur(img, 3)
for idx, x in np.ndenumerate(blur):
    if x < 20:
        blur[idx] = 0
cv2.imshow("Test", blur)
cv2.waitKey()
Output:
Final Edit:
So, using Eumel's solution and combining this bit of code at the bottom of it yields a 100% successful result:
img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255
# Eumel's code above this line
img = cv2.erode(img, np.ones((3,3)))
cv2.imwrite("out.png", img)
cv2.imshow("Test", img)
print(pytesseract.image_to_string(Image.open("out.png"), lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789.,'))
Output Image Examples:
Whitelisting the Tesseract characters appears to help quite a bit as well in preventing false identifications.
My goal is to detect objects placed on a white surface. From there, count how many there are and calculate the area of each one.
It seems that this algorithm detects the edges, but it counts a single object as multiple objects.
(Images: the original picture, the picture after edge detection, the problematic part of the picture, and the results.)
In short, I am using Canny and connected components, and I am getting fractions of objects instead of whole objects.
The following code should do the job; you might need to tweak minItemArea and maxItemArea to filter objects.
import numpy as np
import cv2
import matplotlib.pyplot as plt

rgb = cv2.imread('/path/to/your/image/items_0001.png')
gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
imh, imw = gray.shape
th = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 21, 5)
contours, hier = cv2.findContours(th.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)

out_img = rgb.copy()
minItemArea = 50
maxItemArea = 4000
for i in range(len(contours)):
    # keep only top-level (outer) contours
    if hier[0][i][3] != -1:
        continue
    x, y, w, h = cv2.boundingRect(contours[i])
    if minItemArea < w * h < maxItemArea:
        cv2.drawContours(out_img, [contours[i]], -1, 255, 1)
plt.imshow(out_img)
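Since the question also asks for a count and the area of each object, here is a hedged extension of the snippet above (my addition), reusing its contours, hier, minItemArea and maxItemArea variables; cv2.contourArea returns the enclosed area in pixels:
kept = []
for i in range(len(contours)):
    if hier[0][i][3] != -1:  # skip inner contours (holes)
        continue
    x, y, w, h = cv2.boundingRect(contours[i])
    if minItemArea < w * h < maxItemArea:
        kept.append(contours[i])
print("objects found:", len(kept))
for i, c in enumerate(kept):
    print("object", i, "area =", cv2.contourArea(c), "px")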
Given an image, find the unique colors in that image and write an output image corresponding to each unique color.
In each output, all pixels that don't have that unique color should be marked white.
For example, if an image has 3 colors, the output folder should contain three images, each isolating one color. Using OpenCV & Python.
I've created the unique color list using my own methods. What I want is a count of all those unique colors in the sample.png image, plus the corresponding output images as described in the question.
I believe the code below (with comments) should help you with this!
Feel free to follow up if any of the code is unclear!
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

# Load image and convert it from BGR (opencv default) to RGB
fpath = "dog.png"  # TODO: replace with your path
IMG = cv.cvtColor(cv.imread(fpath), cv.COLOR_BGR2RGB)

# Get dimensions and reshape into a (H * W, C) vector - i.e. a long vector,
# where each element is a tuple corresponding to a color!
H, W, C = IMG.shape
IMG_FLATTENED = np.vstack([IMG[:, w, :] for w in range(W)])

# Get unique colors using the np.unique function, and their counts
colors, counts = np.unique(IMG_FLATTENED, axis=0, return_counts=True)

# Jointly loop through colors and counts
for color, count in zip(colors, counts):
    print("COLOR: {}, COUNT: {}".format(color, count))

    # Create placeholder image and mark all pixels as white
    SINGLE_COLOR = (255 * np.ones(IMG.shape)).astype(np.uint8)  # Make sure it is cast to uint8

    # Compute binary mask of pixel locations where the color is, and set the color in the new image
    color_idx = np.all(IMG[..., :] == color, axis=-1)
    SINGLE_COLOR[color_idx, :] = color

    # Write file to output with color and count specified
    cv.imwrite("color={}_count={}.png".format(color, count), SINGLE_COLOR)
Ack, he beat me to it. Well, here's what I've got.
Oh no, I don't think the line
blank[img == color] = img[img == color]
behaves how I think it does. The comparison broadcasts color over the channel axis, producing a per-channel boolean mask rather than a per-pixel one, so individual channels can match even when the whole pixel doesn't. I think it just coincidentally works for this case. I'll edit the code with a solution I'm more confident works for all cases (see the sketch below).
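Here is a tiny sketch (my addition) of why that per-channel comparison is unreliable:
import numpy as np

img = np.array([[[10, 20, 30], [10, 99, 30]]])  # a 1x2 BGR image
color = np.array([10, 20, 30])
print(img == color)
# [[[ True  True  True]
#   [ True False  True]]]
# The comparison is element-wise per channel, not per pixel, so
# blank[img == color] copies individual channel values that happen
# to match, even from pixels whose full color does not match.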
Original Image
import cv2
import numpy as np

# load image
img = cv2.imread("circles.png")

# get uniques
unique_colors, counts = np.unique(img.reshape(-1, img.shape[-1]), axis=0, return_counts=True)

# split off each color
splits = []
for a in range(len(unique_colors)):
    # get the color
    color = unique_colors[a]
    blank = np.zeros_like(img)
    mask = cv2.inRange(img, color, color)  # edited line 1
    blank[mask == 255] = img[mask == 255]  # edited line 2

    # show
    cv2.imshow("Blank", blank)
    cv2.waitKey(0)

    # save each color with its count
    file_str = ""
    for b in range(3):
        file_str += str(color[b]) + "_"
    file_str += str(counts[a]) + ".png"
    cv2.imwrite(file_str, blank)
I am working on a project where I need to apply OCR to some documents.
The first step is to threshold the image and keep only the writing (whitening the background).
Example of an input image (for GDPR and privacy reasons, this image is from the Internet):
Here is my code:
import cv2
import numpy as np

image = cv2.imread('b.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
h = image.shape[0]
w = image.shape[1]

# global threshold, pixel by pixel
for y in range(0, h):
    for x in range(0, w):
        if image[y, x] >= 120:
            image[y, x] = 255
        else:
            image[y, x] = 0

cv2.imwrite('output.jpg', image)
Here is the result that I got:
When I applied pytesseract to the output image, the results were not satisfying (I know that OCR is not perfect). Although I tried to adjust the threshold value (120 in this code), the result was not as clear as I wanted.
Is there a way to apply a better threshold, so as to keep only the writing in black and whiten the rest?
After digging deep into StackOverflow questions, I found this answer, which is about removing a watermark using OpenCV.
I adapted the code to my needs, and this is what I got:
import numpy as np
import cv2

image = cv2.imread('a.png')
img = image.copy()

# linear contrast stretch: new = alpha * old + beta
alpha = 2.75
beta = -160.0
denoised = alpha * img + beta
denoised = np.clip(denoised, 0, 255).astype(np.uint8)
# denoised = cv2.fastNlMeansDenoising(denoised, None, 31, 7, 21)

img = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
h = img.shape[0]
w = img.shape[1]
for y in range(0, h):
    for x in range(0, w):
        if img[y, x] >= 220:
            img[y, x] = 255
        else:
            img[y, x] = 0

cv2.imwrite('output.jpg', img)
Here is the output image:
The good thing about this code is that it gives good results not only with this image, but also with all the images that I tested.
I hope it helps anyone who had the same problem.
You can use adaptive thresholding. From the documentation:
In this, the algorithm calculates the threshold for small regions of the image. So we get different thresholds for different regions of the same image, which gives us better results for images with varying illumination.
import numpy as np
import cv2

image = cv2.imread('b.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.medianBlur(image, 5)
th1 = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 11, 2)
th2 = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 11, 2)
cv2.imwrite('output1.jpg', th1)
cv2.imwrite('output2.jpg', th2)
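If the strokes come out broken or noisy, the two trailing arguments are the knobs to turn: the block size (11 above) is the side of the pixel neighbourhood used to compute each local threshold, and the constant (2 above) is subtracted from that local mean. As a hedged example (my addition) of a setting you might try on large, faint writing:
th3 = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 31, 10)  # larger neighbourhood, stronger offset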