I have three types of images and want to segment the text from them, so that I get a clean binarized image like the first one below. The three types of images are shown below.
I've tried various techniques, but each one fails in some cases. First I thresholded the image using Otsu's algorithm, but it gave bad results on the images below.
I tried Gaussian, bilateral, and normal (box) blur kernels, but they didn't improve the results much.
Can anyone help?
Here is the code that gave me the best results:
import cv2

# load as grayscale and apply Otsu's global threshold
gray = cv2.imread("/home/shrouk/Pictures/f2.png", 0)
thresholded = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.imshow("img", thresholded)
cv2.waitKey()
This is the final result I need:
This is the first type of image that fails. It fails because the text's grey level is lighter on the right side of the image:
The result of Otsu on it is here; I just need a way to enhance the words in the third line from the right:
Second type, which fails because of the darker background:
The Otsu result is not very good, as the words on the left look like dilated words:
This is the type that is correctly thresholded by Otsu, as there is no noise:
Try using cv2.adaptiveThreshold(). Unlike Otsu's single global threshold, it computes a threshold for each pixel's neighborhood, so it copes better with uneven text brightness and darker backgrounds:
import cv2

image = cv2.imread("2.png", 0)
# threshold each pixel against a Gaussian-weighted mean of its 11x11 neighborhood, minus 5
adaptive = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 5)
cv2.imshow("adaptive", adaptive)
cv2.waitKey()
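The last two arguments (blockSize and C) usually need tuning per image type: blockSize is the size of the neighborhood each threshold is computed over (it must be odd), and C is subtracted from that local mean. As a rough, untested starting point for larger text:

# larger neighborhood for bigger glyphs, slightly stronger offset
adaptive = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 8)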
Related
I have very high resolution engineering drawings/circuit diagrams which contain text in a number of different regions. The aim is to extract the text from such images.
I am using pytesseract for this task. Applying pytesseract directly is not possible, since text from different regions gets jumbled up in the output. So I am identifying the different bounding boxes containing text and then iteratively passing these regions to pytesseract. The bounding-box logic is working fine; however, sometimes I get no text from the cropped image, or only partial text. I would understand this if the cropped images were low-resolution or blurred, but that is not the case. Please have a look at the attached couple of examples.
Image 1
Image 2
Here is my code to get the text:
import cv2
import pytesseract

source_img_simple = cv2.imread('image_name.tif')
source_img_simple_gray = cv2.cvtColor(source_img_simple, cv2.COLOR_BGR2GRAY)
img_text = pytesseract.image_to_string(source_img_simple_gray)

# Export the text file
with open('Output_OCR.txt', 'w') as text:
    text.write(img_text)
Actual result for the first image: no output (a blank text file).
For the 2nd image: partial text (ALL MISCELLANEOUS PIPING AND CONNECTION SIZES)
I want to know how to improve the quality of the OCR. I'm open to using any other tools as well (apart from pytesseract) if required, but I can't use APIs (Google, AWS, etc.) as that is a restriction. Note: I have gone through the post below and my case is not a duplicate of it, as I have black text on a white background:
Pytesseract dont reconize a very clear image
Since your images already look fairly clean, only minimal preprocessing is needed: a simple approach is to threshold and then Gaussian blur to smooth the image before throwing it into Pytesseract. Here are the results after this simple processing and the output from Pytesseract.
SYSTEM CODE IS 3CAB, EXCEPT AS INDICATED.
For the 2nd image
ALL MISCELLANEOUS PIPING AND CONNECTION SIZES
SHALL BE 1 INCH. EXCEPT AS INDICATED.
We use the --psm 6 config flag since we want to treat the image as a single uniform block of text. Here are some additional configuration flags that could be useful:
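For reference, these are Tesseract's most common page segmentation modes (from tesseract --help-psm; the full list is longer):

--psm 3   Fully automatic page segmentation, but no OSD (default)
--psm 6   Assume a single uniform block of text
--psm 7   Treat the image as a single text line
--psm 8   Treat the image as a single word
--psm 11  Sparse text: find as much text as possible, in no particular order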
Code
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# invert-threshold so the text becomes white on black, smooth, then invert back
image = cv2.imread('2.jpg', 0)
thresh = cv2.threshold(image, 150, 255, cv2.THRESH_BINARY_INV)[1]
result = cv2.GaussianBlur(thresh, (5, 5), 0)
result = 255 - result

data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
I want to read the text from an image.
I use pytesseract in Python.
Here is my code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = Image.open(r'a.jpg')
image.resize((150, 50),Image.ANTIALIAS).save("pic.jpg")
image = Image.open("pic.jpg")
captcha = pytesseract.image_to_string(image).replace(" ", "").replace("-", "").replace("$", "")
image
However, it returns an empty string.
What is the correct way to do this?
Thanks.
I agree with @Jon Betts:
tesseract is not very strong at OCR; it is only good in binary cases with the right settings
CAPTCHAs are meant to fool OCRs!
But if you really need to do it, you have to come up with a manual procedure for it.
I created the code below specifically for the type of CAPTCHA that you gave (but it's completely rigid and is not generalized/optimized for all cases).
Pseudocode:
apply a median blur
apply a threshold to keep blue colors only (a binary image is the output of this stage)
apply an opening to remove small white pixels in the binary image
give the image to tesseract with these options:
a limited whitelist of output characters
OEM 3: tesseract + cube
PSM 8: one word per image
Code
from PIL import Image
import pytesseract
import numpy as np
import cv2

img = cv2.imread('a.jpg')
img = cv2.medianBlur(img, 3)

# extract blue parts: high blue channel, low red channel (OpenCV stores BGR)
img2 = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
cond = np.bitwise_and(img[:, :, 0] >= 100, img[:, :, 2] < 100)
img2[np.where(cond)] = 255
img = img2

# delete the noise with a morphological opening
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)

str1 = pytesseract.image_to_string(Image.fromarray(img),
                                   config='-c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyz0123456789 -oem 3 -psm 8')

cv2.imwrite("frame.png", img)
print(str1)
Output:
f2e4
image
In order to see the full options of tesseract, run tesseract --help-extra or refer to this_link.
Tesseract is intended for performing OCR on text documents. In my experience it's good but a bit patchy even with very clean data.
In this case it appears you are trying to solve a CAPTCHA which is specifically designed to defeat OCR software. It's very likely you cannot use Tesseract to solve this issue, because:
It's not really designed for that
The scenario is adversarial:
The example is specifically designed to prevent what you are trying to do
If you could get it to work, the other party would likely change it to break again
If you want to proceed I would suggest:
Working on cleaning up the image before attempting to process it (can you get a nice readable black and white image? see the sketch after this list)
Training your own recognition network using a lot of examples
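As a minimal sketch of the cleanup idea (Otsu picks the threshold automatically; invert at the end if the text comes out with the wrong polarity):

import cv2

img = cv2.imread('a.jpg', 0)   # load as grayscale
img = cv2.medianBlur(img, 3)   # knock out speckle noise
# Otsu's method chooses a global threshold from the histogram
bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
cv2.imwrite('clean_bw.png', bw)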
I am using pytesseract, pillow, and cv2 to OCR an image and get the text present in it. Since my input is a scanned PDF document, I first converted it into an image (JPEG) and then tried extracting the text. I am only halfway there: the input is a table, and the titles are not being displayed, since the titles have a black background. I also tried getStructuringElement but was unable to figure out a way. Here is what I did:
import cv2
import os
import numpy as np
import pytesseract
import PIL.Image

# Scanned PDFs must first be rendered to images; convert each page to JPEG
from pdf2image import convert_from_path
filename = path
pages = convert_from_path(filename, 500)
for page in pages:
    page.save("dest", 'JPEG')

imgname = "path"
oriimg = cv2.imread(imgname, cv2.IMREAD_COLOR)
cv2.imshow("original image", oriimg)
cv2.waitKey(0)

#img = cv2.resize(oriimg, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_CUBIC)
img = cv2.resize(oriimg, (700, 1500), interpolation=cv2.INTER_AREA)  # (width, height)
cv2.imshow("lol", img)
cv2.waitKey(0)
cv2.imwrite("changed_dimensionsimgpath", img)

image = cv2.imread(imgname, cv2.IMREAD_COLOR)
grayedimg = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
grayedimg = cv2.threshold(grayedimg, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imwrite("H://newim.jpg", grayedimg)

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
text = pytesseract.image_to_string(PIL.Image.open("path"))
print(text)
My input table looks like the one below. The regions which have a black background are not being identified by the OCR and are not being extracted as text.
There are 3 possible approaches from an image-analysis perspective.
Splitting
You can split the processing in two parts. The first part is just your normal flow (load the image, detect text on it). In the second flow you first take the negative of the image (255 - img) and then detect text.
The two results will need to be merged afterwards; a minimal sketch follows below.
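A minimal sketch of the splitting idea, assuming pytesseract is the OCR in use ('table.jpg' is a placeholder filename, and the merge step is just concatenation here):

import cv2
import pytesseract

img = cv2.imread('table.jpg', 0)
normal_text = pytesseract.image_to_string(img)            # text on light background
inverted_text = pytesseract.image_to_string(255 - img)    # text on dark background
print(normal_text + "\n" + inverted_text)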
Difference filter
You can first apply a difference filter/edge detection. This will highlight everything with high contrast, but it can alter the shape of the letters if pushed too far or if some letters are much bigger than others.
Contour finding + filling
Again an edge detection, but now very thin and followed by contour detection. This will redraw all letters in one color; a sketch of this follows below.
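A rough sketch of the contour approach; the Canny thresholds are guesses to tune:

import cv2
import numpy as np

img = cv2.imread('table.jpg', 0)
edges = cv2.Canny(img, 100, 200)   # thin edges around the glyphs
# OpenCV 4 returns (contours, hierarchy); OpenCV 3 returns an extra image first
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
# redraw every contour filled, so all letters end up in one color
canvas = np.full(img.shape, 255, dtype=np.uint8)
cv2.drawContours(canvas, contours, -1, 0, thickness=cv2.FILLED)
cv2.imwrite('redrawn.png', canvas)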
I am building a simple project in Python3, using OpenCV3, trying to match jigsaw pieces to the "finished" jigsaw image. I have started my tests by using SIFT.
I can extract the contour of the jigsaw piece and crop the image but since most of the high frequencies reside, of course, around the piece (where the piece ends and the floor starts), I want to pass a mask to the SIFT detectAndCompute() method, thus forcing the algorithm to look for the keypoints only within the piece.
test_mask = np.ones(img1.shape, np.uint8)
kp1, des1 = sift.detectAndCompute(img1, mask = test_mask)
After passing a test mask (to make sure it's uint8), I get the following error:
kp1, des1 = sift.detectAndCompute(img1,mask = test_mask)
cv2.error: /home/pyimagesearch/opencv_contrib/modules/xfeatures2d/src/sift.cpp:772: error: (-5) mask has incorrect type (!=CV_8UC1) in function detectAndCompute
From my research, uint8 is just an alias for CV_8U, which is the same as CV_8UC1. I couldn't find any code sample passing a mask to any feature detection algorithm in Python.
Thanks to Miki, I've managed to find the bug.
It turned out that my original mask, which I created using threshold operations, even though it looked binary, was a 3-channel image ([rows], [cols], 3). Thus it couldn't be accepted as a mask.
After checking the type and shape (it has to be uint8 and single-channel, i.e. [rows, cols]):
print(mask.dtype)
print(mask.shape)
Convert the mask to gray if it's still 3-channel:
mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
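Putting it together, a minimal end-to-end sketch (filenames are placeholders; SIFT lives in cv2.xfeatures2d for the contrib build shown in the error, or directly on cv2 in newer OpenCV):

import cv2
import numpy as np

img1 = cv2.imread('piece.png', 0)      # query image, grayscale
mask = cv2.imread('mask.png')          # imread gives 3 channels by default
if mask.ndim == 3:
    mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
mask = (mask > 0).astype(np.uint8) * 255   # force a binary single-channel uint8 mask

sift = cv2.xfeatures2d.SIFT_create()   # cv2.SIFT_create() in OpenCV >= 4.4
kp1, des1 = sift.detectAndCompute(img1, mask=mask)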
I am trying to do some white blob detection using OpenCV, but my script fails to detect the big white block, which is my goal, while some small blobs are detected. I am new to OpenCV; am I doing something wrong when using SimpleBlobDetector? [Solved partially, please read below]
And here is the script:
#!/usr/bin/python
# Standard imports
import cv2
import numpy as np
from matplotlib import pyplot as plt
# Read image
im = cv2.imread('whiteborder.jpg', cv2.IMREAD_GRAYSCALE)
imfiltered = cv2.inRange(im,255,255)
#OPENING
kernel = np.ones((5,5))
opening = cv2.morphologyEx(imfiltered,cv2.MORPH_OPEN,kernel)
#write out the filtered image
cv2.imwrite('colorfiltered.jpg',opening)
# Setup SimpleBlobDetector parameters.
params = cv2.SimpleBlobDetector_Params()
params.blobColor= 255
params.filterByColor = True
# Create a detector with the parameters
ver = (cv2.__version__).split('.')
if int(ver[0]) < 3 :
detector = cv2.SimpleBlobDetector(params)
else :
detector = cv2.SimpleBlobDetector_create(params)
# Detect blobs.
keypoints = detector.detect(opening)
# Draw detected blobs as green circles.
# cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS ensures
# the size of the circle corresponds to the size of blob
print(str(keypoints))
im_with_keypoints = cv2.drawKeypoints(opening, keypoints, np.array([]), (0,255,0), cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
# Show blobs
##cv2.imshow("Keypoints", im_with_keypoints)
cv2.imwrite('Keypoints.jpg',im_with_keypoints)
cv2.waitKey(0)
EDIT:
By allowing a bigger maximum area value, I am able to identify a big blob, but my end goal is to identify whether the big white rectangle exists or not. And the white blob detection I did returns not only the rectangle but the surrounding areas as well. [This part solved]
EDIT 2:
Based on the answer from @PSchn, I updated my code to apply the logic: first set the color filter to get only the white pixels, then remove the noise points using an opening. It works for the sample data and I can successfully get the keypoint after blob detection.
If you just want to detect the white rectangle, you can try to set a higher threshold, e.g. 253, erase small objects with an opening, and take the biggest blob. I first smoothed your image, then thresholded it:
and the opening:
Now you just have to use findContours and take the boundingRect. If your rectangle is always that white, it should work. If you go lower than 251 with your threshold, the other small blobs will appear and your region merges with them, like here:
Then you could still apply the opening several times and you get this:
But I don't think that's the fastest idea ;) A rough sketch of this pipeline follows below.
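This is only a sketch of the steps described above (threshold 253 and a 5x5 kernel as suggested; both may need tuning):

import cv2
import numpy as np

im = cv2.imread('whiteborder.jpg', cv2.IMREAD_GRAYSCALE)
im = cv2.GaussianBlur(im, (5, 5), 0)                      # smooth first
bw = cv2.threshold(im, 253, 255, cv2.THRESH_BINARY)[1]    # keep only near-white pixels
kernel = np.ones((5, 5), np.uint8)
bw = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel)         # erase the small blobs

# take the biggest remaining blob and its bounding rectangle
# (OpenCV 4 returns 2 values here; OpenCV 3 returns 3)
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    biggest = max(contours, key=cv2.contourArea)
    print(cv2.boundingRect(biggest))   # x, y, w, h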
You could try setting params.maxArea to something obnoxiously large (somewhere in the tens of thousands): the default may be lower than the area of the rectangle you're trying to detect. Also, I don't know how true this is, but I've heard that detection by color is affected by a logic bug, so it may be worth disabling it just in case that is causing problems (this has probably been fixed in later versions, but it could still be worth a try).
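For example (100000 is an arbitrary cap; pick something comfortably above your rectangle's pixel area):

params = cv2.SimpleBlobDetector_Params()
params.filterByArea = True
params.maxArea = 100000        # raise the default area ceiling
params.filterByColor = False   # disable color filtering in case it misbehaves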