I have this image and I want to read the text on it, but pytesseract returns an empty string.
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import cv2
import numpy as np
import math
from scipy import ndimage
import easyocr
import pytesseract

img = cv2.imread('cikti.jpg')

# resize image
scale_percent = 220  # percent of original size
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)
img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
cv2.imshow('img', img)
cv2.waitKey(0)

# detect edges and estimate the skew angle from the Hough lines
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
cv2.imshow('edges', edges)
cv2.waitKey(0)

angles = []
lines = cv2.HoughLinesP(edges, 1, math.pi / 180.0, 90)
for [[x1, y1, x2, y2]] in lines:
    # cv2.line(img, (x1, y1), (x2, y2), (255, 0, 0), 3)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle != 0:
        angles.append(angle)
print(angles)

# deskew by the median line angle
median_angle = np.median(angles)
img = ndimage.rotate(img, median_angle)
print(median_angle)

# sharpen with a 3x3 kernel
sharpen_kernel = np.array([[-1, -1, -1],
                           [-1,  9, -1],
                           [-1, -1, -1]])
img = cv2.filter2D(img, -1, sharpen_kernel)
cv2.imshow('filtered', img)
cv2.waitKey(0)

reader = easyocr.Reader(['tr'])
ocr_result = reader.readtext(img)
print(ocr_result)

cv2.imshow('result', img)
k = cv2.waitKey(0)
cv2.destroyAllWindows()
Here is the code I wrote.
The problem may be that the photo was taken from a long distance, but enlarging the picture did not solve it.
What should I do?
I was able to read this image successfully with tesseract by doing the following:
cropping out the pink border
reducing it to grayscale and binarising it
running tesseract with --psm 8 (see this question)
I don't know if the cropping is necessary, but I couldn't get any output at all with any page segmentation mode before binarising.
I did the processing manually here, but you will likely want to automate it. A good trick for setting thresholds is to look at the standard deviation of the image in question and use that to scale your thresholds, rather than picking some absolute value and having it fail on you.
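For example, a rough sketch of that trick (the 0.5 scale factor is an arbitrary starting point, not a tested value):

import cv2

img = cv2.imread('cikti.jpg', cv2.IMREAD_GRAYSCALE)

# derive the threshold from the image's own statistics instead of
# hard-coding an absolute value; tune the scale factor per use case
thresh_value = img.mean() - 0.5 * img.std()
_, binarised = cv2.threshold(img, thresh_value, 255, cv2.THRESH_BINARY)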
Here's the image I got working:
And the run:
$ tesseract img3.png img3 --psm 8 txt
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
$ cat img3.txt
47 F02 43
I've not tried with pytesseract, but you should be able to set the same thing.
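For instance, pytesseract passes extra flags through its config parameter, so something like this (untested) should match the command-line run above:

import pytesseract
from PIL import Image

# --psm 8 treats the image as a single word, as in the CLI run above
text = pytesseract.image_to_string(Image.open('img3.png'), config='--psm 8')
print(text)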
EasyOCR was able to read the image immediately, albeit inaccurately, when I tried it with the web service.
Update: grayscaling
This is a whole subject in itself. You might want to start with this tutorial from the opencv docs.

There are basically two approaches: trying properly to binarise the image (convert it to two-colour pixels, on or off) and just grayscaling it. Somewhere in between is 'posterising', where you reduce the number of tones (binarising is a special case of posterising where the number of tones is 2).

I normally handle grayscaling with the inbuilt function in PIL (Pillow). I've had good results with a quick-and-dirty sort-of binarisation algorithm where I first normalise the brightness and contrast of an image and then apply a skewing function like
def filter_point(point: int) -> int:
    if point < THRESH:
        return round(point / 1.2)
    else:
        return round(point * 2)
This drives most pixels to fully white/black but leaves some intermediate values in place. It's a poor solution in that it depends on three magic numbers, but in my application (preparing scanned pdfs for human reading) I got better results than with automated thresholding or posterisation.
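A minimal sketch of how such a point function plugs into PIL (the THRESH value and the autocontrast call here are assumptions, not my exact pipeline):

from PIL import Image, ImageOps

THRESH = 128  # placeholder magic number; tune per image set

def filter_point(point: int) -> int:
    if point < THRESH:
        return round(point / 1.2)
    return min(round(point * 2), 255)

img = Image.open('input.jpg').convert('L')  # grayscale via PIL's built-in
img = ImageOps.autocontrast(img)            # crude brightness/contrast normalisation
img = img.point(filter_point)               # apply the skewing function per pixel
img.save('prepared.png')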
Thus, sadly, the answer is going to be 'play with it'. I'd suggest you start out with an image editor and find the bare minimum you can do to the image to get tesseract to work: perhaps just grayscaling (which you do earlier in the code anyway) will be enough; maybe you need to crop it as well. Not drawing that pink box is going to help. I provided a very crude example filter to demonstrate that pixels are just numbers and you can do your image processing that way, but you are much better off using built-in methods if you possibly can.
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import cv2 as cv
import numpy as np
import easyocr
img = cv.imread('result.png', 0)  # load directly as grayscale
th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C, cv.THRESH_BINARY, 29, 10)
cv.imshow("ADAPTIVE_THRESH_MEAN_C", th2)
cv.waitKey(0)
cv.destroyAllWindows()
reader = easyocr.Reader(['tr'])
ocr_result = reader.readtext(th2)
print(ocr_result)
It worked like this.
Image before OCR:
Result:
Related
It seems the resolution of the image affects whether the output succeeds or not.
Usually the resolution/quality of images from the production line is like test image 1. Instead of changing the camera, is there any way to make the success rate higher, like improving the code or adding some simple AI to help with detection? I need a hand, thanks.
Here is the demo .py code I found in a tutorial:
from PIL import Image
import pytesseract
img = Image.open('new_003.png')
text = pytesseract.image_to_string(img, lang='eng')
print("size")
print(img.size)
print(text)
(pic) test image 1: https://ibb.co/VLsM9LL
size
(122, 119)
# the output is:
R carac7
(pic) test image 2: https://ibb.co/XyRcf45
size
(329, 249)
# the output is:
R1 oun,
2A
R ca7ac2
(pic) test image 3: https://ibb.co/fNtDRc7
this one is just for testing, but it is the only one that is 100% correct
size
(640, 640)
# the output is:
BREAKING THE STATUE
i have always known
i just didn't understand
the inner conflictions
arresting our hands
gravitating close enough
expansive distamce between
i couldn't give you more
but i meant everything
when the day comes
you find your heart
wants something more
than a viece and a part
your life will change
like astatue set free
to walk among us
to created estiny
we didn't break any rules
we didn't make mistakes
making beauty in loving
making lovine for days
SHILOW
I am trying to find out whether the solution can only be higher image resolution, or whether there is an alternative way to solve this issue.
I tried dilation and erosion on the image, hoping to get a clearer image for the OCR to recognize, like the demo pic in this link: https://ibb.co/3pDgDnF
import cv2
import numpy as np
import matplotlib.pyplot as plt
import glob
from IPython.display import clear_output

def show_img(img, bigger=False):
    if bigger:
        plt.figure(figsize=(15, 15))
    image_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(image_rgb)
    plt.show()

def sharpen(img, sigma=100):
    # unsharp masking: subtract a heavily blurred copy (sigma = 5, 15, 25, ...)
    blur_img = cv2.GaussianBlur(img, (0, 0), sigma)
    usm = cv2.addWeighted(img, 1.5, blur_img, -0.5, 0)
    return usm

def img_processing(img):
    # do something here
    img = sharpen(img)
    return img

img = cv2.imread("/home/joy/桌面/all_pic_OCR/simple_pic/03.png")
cv2.imshow('03', img)  # original image

img2 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (11, 11))

img2 = cv2.dilate(img2, kernel)  # tried dilation on the grayscale image
cv2.imshow('image_after_Dilation', img2)

img2 = cv2.erode(img2, kernel)  # then erosion
cv2.imshow('then_Erosion', img2)

cv2.waitKey(0)
cv2.destroyAllWindows()
Result: https://ibb.co/TbZjg3d
So I am still trying to get Python OCR to recognize the image text 99.9999% correctly.
I have a few images where I need to increase or decrease the contrast and brightness dynamically so that the image is clearly visible, and the program needs to be dynamic enough that it also works on new images. I also want the characters to be dark.
I was able to increase brightness and contrast, but it is not working properly for every image.
import cv2
import numpy as np

img = cv2.imread(r'D:\Bright.png')
image = cv2.GaussianBlur(img, (5, 5), 0)
#image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY)[1]
#kernel = np.ones((2,1), np.uint8)
#dilation = cv2.dilate(img, kernel)
cv2.imshow('test', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# darken pixels below 190 and brighten the rest (crude contrast stretch on the V channel)
imghsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
imghsv[:,:,2] = [[max(pixel - 25, 0) if pixel < 190 else min(pixel + 25, 255) for pixel in row] for row in imghsv[:,:,2]]
cv2.imshow('contrast', cv2.cvtColor(imghsv, cv2.COLOR_HSV2BGR))
#cv2.imwrite(r'D:\112.png', cv2.cvtColor(imghsv, cv2.COLOR_HSV2BGR))
cv2.waitKey(0)
cv2.destroyAllWindows()
I want a program that works well for every image, with the words a little darker so that they are easily visible.
As Tilarion suggested, you could try "Auto Brightness And Contrast" to see if it works well. The theory behind this is explained well in the solution section here. The solution is in C++; I've written a version of it in Python which you can use directly. It works on only one channel at a time for colour images:
def auto_brightandcontrast(input_img, channel, clip_percent=1):
    histSize = 256
    minGray = 0
    maxGray = 0
    accumulator = []

    if clip_percent == 0:
        #min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(hist)
        return input_img

    # cumulative histogram of the chosen channel
    hist = cv2.calcHist([input_img], [channel], None, [histSize], [0, 256])
    accumulator.insert(0, hist[0])
    for i in range(1, histSize):
        accumulator.insert(i, accumulator[i - 1] + hist[i])

    maxx = accumulator[histSize - 1]
    clip_percent = clip_percent * (maxx / 100.0)
    clip_percent = clip_percent / 2.0

    # clip the darkest and brightest clip_percent/2 of pixels
    minGray = 0
    while accumulator[minGray] < clip_percent[0]:
        minGray = minGray + 1
    maxGray = histSize - 1
    while accumulator[maxGray] >= (maxx - clip_percent[0]):
        maxGray = maxGray - 1

    # stretch [minGray, maxGray] to the full range
    inputRange = maxGray - minGray
    alpha = (histSize - 1) / inputRange
    beta = -minGray * alpha

    out_img = input_img.copy()
    cv2.convertScaleAbs(input_img, out_img, alpha, beta)
    return out_img
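A usage sketch (the filename is a placeholder; on a single-channel image the channel index is 0):

import cv2

img = cv2.imread('Bright.png', cv2.IMREAD_GRAYSCALE)  # placeholder path
out = auto_brightandcontrast(img, 0, clip_percent=1)  # clip 1% of extreme pixels
cv2.imwrite('Bright_auto.png', out)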
It takes only a few lines of code in Python Wand (which is based upon ImageMagick). Here is a script:
#!/bin/python3.7
from wand.image import Image

with Image(filename='task4.jpg') as img:
    img.contrast_stretch(black_point=0.02, white_point=0.99)
    img.save(filename='task4_stretch2_99.jpg')
Input:
Result:
Increase the black point value to make the text darker and/or decrease the white point value to make the lighter parts brighter.
Thanks to Eric McConville (the Wand developer) for correcting my arguments to make the code work.
I would like to know how to write a function in Python 3 using OpenCV which takes in an image and a threshold and returns either 'dark' or 'light' after heavily blurring it and reducing quality (the faster the better). This might sound vague, but anything that just works will do.
You could try this:

import imageio
import numpy as np

def img_estim(img, thrshld):
    is_light = np.mean(img) > thrshld
    return 'light' if is_light else 'dark'

f = imageio.imread(filename, as_gray=True)  # filename: path to your image
print(img_estim(f, 127))
Personally, I would not bother writing any Python, or loading up OpenCV for such a simple operation. If you absolutely have to use Python, please just disregard this answer and select a different one.
You can just use ImageMagick at the command-line in your Terminal to get the mean brightness of an image as a percentage, where 100 means "fully white" and 0 means "fully black", like this:
convert someImage.jpg -format "%[fx:int(mean*100)]" info:
Alternatively, you can use libvips which is less common, but very fast and very lightweight:
vips avg someImage.png
The vips answer is on a scale of 0..255 for 8-bit images.
Note that both these methods will work for many image types, from PNG, through GIF, JPEG and TIFF.
However, if you really want Python/OpenCV code, I note none of the existing answers do that - some use a different library, some are incomplete, some do superfluous blurring and some read video cameras for some unknown reason, and none handle more than one image or errors. So, here's what you actually asked for:
#!/usr/bin/env python3

import cv2
import sys
import numpy as np

# Iterate over all arguments supplied
for filename in sys.argv[1:]:
    # Load image as greyscale
    im = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
    if im is None:
        print(f'ERROR: Unable to load {filename}')
        continue

    # Calculate mean brightness as percentage
    meanpercent = np.mean(im) * 100 / 255
    classification = "dark" if meanpercent < 50 else "light"
    print(f'{filename}: {classification} ({meanpercent:.1f}%)')
Sample Output
OpenCVBrightOrDark.py g*png nonexistant
g30.png: dark (30.2%)
g80.png: light (80.0%)
ERROR: Unable to load nonexistant
You could try this helper, considering image is a grayscale image:

def img_is_light(image):
    # kernel size should depend on the image size
    blur = cv2.blur(image, (5, 5))
    # a pixel's grayscale value lies in 0-255 and 127 is midway:
    # a mean above 127 denotes a light image, below it a dark one
    if cv2.mean(blur)[0] > 127:
        return 'light'
    else:
        return 'dark'
Refer to these -
Smoothing, Mean, Thresholding
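A possible way to call it (the path is a placeholder):

import cv2

image = cv2.imread('someImage.png', cv2.IMREAD_GRAYSCALE)  # placeholder path
print(img_is_light(image))  # prints 'light' or 'dark'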
import numpy as np
import cv2

def algo_findDark(image):
    blur = cv2.blur(image, (5, 5))
    mean = np.mean(blur)
    if mean > 85:
        return 'light'
    else:
        return 'dark'

cam = cv2.VideoCapture(0)

while True:
    check, frame = cam.read()
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    ans = algo_findDark(frame_gray)

    font = cv2.FONT_HERSHEY_SIMPLEX
    cv2.putText(frame, ans, (10, 450), font, 3, (0, 0, 255), 2, cv2.LINE_AA)
    cv2.imshow('video', frame)

    key = cv2.waitKey(1)
    if key == 27:  # Esc to quit
        break

cam.release()
cv2.destroyAllWindows()
I am trying to detect bubbles on an OMR sheet which looks something like this:
My code for edge detection and contour display is referenced from here. However, before finding the actual contours, I am trying to detect the edges, but somehow I am not able to set the correct values of the parameters.
This is what I get:
Code:
from imutils.perspective import four_point_transform
from imutils import contours
import numpy as np
import argparse
import imutils
import cv2

def auto_canny(image, sigma=0.50):
    # compute the median of the single channel pixel intensities
    v = np.median(image)
    # apply automatic Canny edge detection using the computed median
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    edged = cv2.Canny(image, lower, upper)
    # return the edged image
    return edged

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to the input image")
args = vars(ap.parse_args())

image = cv2.imread(args["image"])
r = 500.0 / image.shape[1]
dim = (500, int(image.shape[0] * r))

# perform the actual resizing of the image and show it
image = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

equalized_img = cv2.equalizeHist(gray)
cv2.imshow('Equalized', equalized_img)
# cv2.waitKey(0)

blurred = cv2.GaussianBlur(equalized_img, (7, 7), 0)
# edged = cv2.Canny(equalized_img, 30, 160)
edged = auto_canny(blurred)
cv2.imshow('edged', edged)
cv2.waitKey(0)
How can I get all the 90*4 circles?
You should be using Hough to search for circles. This method projects every white pixel as a circle and tries to find as many overlapping pixels as possible. You have to specify the predicted radii of the circles to be found within the image.
Left: original image
Top-right: each white pixel is projected as a red circle; they are too small to find an intersecting point
Bottom-right: the green circle is larger, and all the intersecting points meet exactly at the middle of the circle! Both the radius and the position are returned by cvHoughCircles
This person dealt with blob detection (that is what finding circles is called, I think) using cvHoughCircles on a cvCanny-ized image (read the OP's update):
OpenCV: Error in cvHoughCircles usage
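In modern Python OpenCV the same idea looks roughly like this; every parameter value below is a guess you would tune for your sheet:

import cv2
import numpy as np

img = cv2.imread('omr.png')  # placeholder filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)

# minRadius/maxRadius bracket the expected bubble size;
# param2 is the accumulator threshold (lower finds more circles)
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=15,
                           param1=100, param2=20, minRadius=5, maxRadius=15)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)
cv2.imwrite('circles.png', img)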
You need to improve your contour detection.
Possibly not by changing the detection itself, but by better pre-processing at the earlier stage.
Contour detection works better with more contrast and colour separation in the image. If you haven't yet, threshold your image with techniques like simple thresholding, adaptive thresholding, or smarter ones like Otsu's. Check the OpenCV documentation here.
Besides that, your case may need more advanced techniques like "Adaptive Thresholding Using the Integral Image", described here.
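As a quick illustration of the Otsu option (the blur step and the filename are assumptions):

import cv2

gray = cv2.imread('omr.png', cv2.IMREAD_GRAYSCALE)  # placeholder filename

# a light blur suppresses noise, then Otsu picks the threshold automatically
blur = cv2.GaussianBlur(gray, (5, 5), 0)
_, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite('omr_otsu.png', binary)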
I wrote a little script to transform pictures of chalkboards into a form that I can print off and mark up.
I take an image like this:
Auto-crop it, and binarize it. Here's the output of the script:
I would like to remove the largest connected black regions from the image. Is there a simple way to do this?
I was thinking of eroding the image to eliminate the text and then subtracting the eroded image from the original binarized image, but I can't help thinking that there's a more appropriate method.
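That erode-and-subtract idea, sketched in OpenCV (kernel size and filenames are placeholders; a morphological opening is used so the surviving blobs grow back to full size before subtraction):

import cv2

# hypothetical input: the binarized board, black marks on white
binary = cv2.imread('board_binarized.png', cv2.IMREAD_GRAYSCALE)

# work on the inverted image so black marks become white foreground
inv = cv2.bitwise_not(binary)

# opening (erode then dilate) removes thin text strokes, keeps large blobs
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
big_blobs = cv2.morphologyEx(inv, cv2.MORPH_OPEN, kernel)

# subtract the blobs from the marks, then flip back to black-on-white
text_only = cv2.subtract(inv, big_blobs)
cv2.imwrite('board_cleaned.png', cv2.bitwise_not(text_only))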
Sure, you can just find connected components (of a certain size) with findContours or floodFill and erase them, leaving some smear. However, if you want to do it right, you should think about why you have the black area in the first place.
You did not use (locally) adaptive thresholding, and this made your output sensitive to shading. Try not to get the black region in the first place by running something like this:
Mat img = imread("desk.jpg", 0);
Mat img2, dst;
pyrDown(img, img2);
adaptiveThreshold(255 - img2, dst, 255, ADAPTIVE_THRESH_MEAN_C,
                  THRESH_BINARY, 9, 10);
imwrite("adaptiveT.png", dst);
imshow("dst", dst);
waitKey(-1);
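For the Python readers of this thread, a rough equivalent of that C++ snippet (an untested sketch):

import cv2

img = cv2.imread('desk.jpg', 0)   # load as grayscale
img2 = cv2.pyrDown(img)           # halve the resolution
dst = cv2.adaptiveThreshold(255 - img2, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 9, 10)
cv2.imwrite('adaptiveT.png', dst)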
In the future, you may want to read something about adaptive thresholds and how to sample colours locally. I personally found it useful to sample binary colours orthogonally to the image gradient (that is, on both sides of it). This way the samples of white and black are of equal size, which is a big deal, since typically there is more of the background colour, which biases the estimation. Using SWT and MSER may give you even more ideas about text segmentation.
I tried this:
import numpy as np
import cv2

im = cv2.imread('image.png')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
grayout = 255 * np.ones((im.shape[0], im.shape[1], 1), np.uint8)
blur = cv2.GaussianBlur(gray, (5, 5), 1)
thresh = cv2.adaptiveThreshold(blur, 255, 1, 1, 11, 2)

contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
wcnt = 0
for item in contours:
    area = cv2.contourArea(item)
    print(wcnt, area)
    [x, y, w, h] = cv2.boundingRect(item)
    # keep only small, text-sized contours that are not too dense
    if area > 10 and area < 200:
        roi = gray[y:y+h, x:x+w]
        cntd = 0
        for i in range(x, x+w):
            for j in range(y, y+h):
                if gray[j, i] == 0:
                    cntd = cntd + 1
        density = cntd / float(h * w)
        if density < 0.5:
            for i in range(x, x+w):
                for j in range(y, y+h):
                    grayout[j, i] = gray[j, i]
        wcnt = wcnt + 1
cv2.imwrite('result.png', grayout)
You have to balance two things: removing the black spots, while not losing the contents of what is on the board. The output I got is this:
Here is a Python numpy implementation (using my own mahotas package) of the method from the top answer (almost the same, I think):
import mahotas as mh
import numpy as np
Imported mahotas & numpy with standard abbreviations
im = mh.imread('7Esco.jpg', as_grey=1)
Load the image & convert to gray
im2 = im[::2,::2]
im2 = mh.gaussian_filter(im2, 1.4)
Downsample and blur (for speed and noise removal).
im2 = 255 - im2
Invert the image
mean_filtered = mh.convolve(im2.astype(float), np.ones((9,9))/81.)
Mean filtering is implemented "by hand" with a convolution.
imc = im2 > mean_filtered - 4
You might need to adjust the number 4 here, but it worked well for this image.
mh.imsave('binarized.png', (imc*255).astype(np.uint8))
Convert to 8 bits and save in PNG format.