I have around 100+ images, each with one of two different texts on it. The images are below: one says occupied and the other says unoccupied.
Is there any way in Python to differentiate these images by detecting the text in them?
If so, I want to identify the occupied images and delete the unoccupied ones.
Since I am new to Python, can anyone help me with this?
This answer is based on the assumption that there are only two different texts on the images, as you posted in the question. So I assume the number of characters and the color of the text are always the same ("Room status: Unoccupied" and "Room status: Occupied" in red). That being said, I would try a simpler way to differentiate between these two types. The characters are very close to each other, so in my opinion it would be difficult to separate each character and identify it with an OCR. I would try a simpler approach: find the area containing the text and measure the plain length of the text. "Unoccupied" has two more characters than "Occupied" and therefore spans a greater length. So you can transform the image to HSV color space and use cv2.inRange() to extract the text (red color). Then you can merge the characters into one contour with cv2.morphologyEx() and get its length with cv2.minAreaRect(). Hope it helps, or at least gives you a new perspective on how to find your solution. Cheers!
Example code:
import cv2
import numpy as np
# Read the image and transform to HSV colorspace.
img = cv2.imread('ocupied.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Extract the red text.
lower_red = np.array([0,150,50])
upper_red = np.array([40,255,255])
mask_red = cv2.inRange(hsv, lower_red, upper_red)
# Search for contours on the mask.
_, contours, hierarchy = cv2.findContours(mask_red,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
# Create a new mask for further processing.
mask = np.ones(img.shape, np.uint8)*255
# Draw contours on the mask with size and ratio of borders for threshold (to remove other noises from the image).
for cnt in contours:
    size = cv2.contourArea(cnt)
    x,y,w,h = cv2.boundingRect(cnt)
    if 10000 > size > 50 and w*2.5 > h:
        cv2.drawContours(mask, [cnt], -1, (0,0,0), -1)
# Connect neighbour contours and select the biggest one (text).
kernel = np.ones((50,50),np.uint8)
opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
gray_op = cv2.cvtColor(opening, cv2.COLOR_BGR2GRAY)
_, threshold_op = cv2.threshold(gray_op, 150, 255, cv2.THRESH_BINARY_INV)
_, contours_op, hierarchy_op = cv2.findContours(threshold_op, cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
cnt = max(contours_op, key=cv2.contourArea)
# Create rotated rectangle to get the 4 points of the rectangle.
rect = cv2.minAreaRect(cnt)
# Create the bounding box and calculate the "length" of the text.
box = cv2.boxPoints(rect)
a, b, c, d = box = np.int0(box)
bound =[]
bound.append(a)
bound.append(b)
bound.append(c)
bound.append(d)
bound = np.array(bound)
(x1, y1) = (bound[:,0].min(), bound[:,1].min())
(x2, y2) = (bound[:,0].max(), bound[:,1].max())
# Draw the rectangle.
cv2.rectangle(img,(x1,y1),(x2,y2),(0,255,0),1)
# Identify the room status.
if x2 - x1 > 200:
    print('unoccupied')
else:
    print('occupied')
# Display the result.
cv2.imshow('img', img)
cv2.waitKey(0)
Result:
occupied
unoccupied
Using the Tesseract OCR engine and the Python wrapper pytesseract, this is a task of only a few lines:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
img = Image.open('D:\\tmp2.jpg').crop((0,0,250,35))
print(pytesseract.image_to_string(img, config='--psm 7'))
I have tested this on Windows 7. Of course, I have assumed that the text appears at the same position in every image (from your example, it does seem to be the case). Otherwise, you need to find a better cropping mechanism.
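Since your goal is to delete the unoccupied images, you could wrap this in a loop over the whole folder. This is only a rough sketch (the folder path, file pattern and the exact status wording are assumptions; double-check them against your data before letting it delete anything):
import os
import glob
import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

for path in glob.glob(r"D:\rooms\*.jpg"):           # assumed folder with the ~100 images
    text = pytesseract.image_to_string(
        Image.open(path).crop((0, 0, 250, 35)),     # same crop as above
        config='--psm 7')
    if 'unoccupied' in text.lower():                # keep occupied, delete the rest
        os.remove(path)
        print('deleted', path)
    else:
        print('kept   ', path)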
Related
I'm writing a program that takes an image containing a 4-by-4 grid of letters somewhere in it.
I want to read these letters into my program and for that I'm using pytesseract for the OCR.
Before feeding the image to pytesseract I do some preprocessing with openCV to increase the odds of pytesseract working correctly.
This is the code I use for this:
import cv2
img = cv2.imread('my_image.png')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_pre_processed = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
These are some sample outputs of img_pre_processed.
Since the letters in the grid are spaced apart, pytesseract has a difficult time reading them when I give it the entire image as input. So it would be helpful to know the coordinates of every letter; then I could edit the image in such a way that pytesseract can always recognise them.
I started to try and solve this problem on my own and the solution I'm coming up with might work but it's getting rather complicated. So I'm wondering if there is a better way to do it.
At the moment I'm using the cv2.findContours() function to get all the contours of the objects in the image. For every contour I calculate the center coordinates and the area of the box you could draw around it. I then sort these by area to get the largest contours. Here it starts to get more and more complicated: I can't just take the 16 biggest contours, because there might be unwanted objects in the picture with a bigger area than the 16 letters I want. Also, some letters like O, P, Q, ... have two contours, and their inner contour might even be bigger than another letter's outer contour (the letter I, for example).
E.g. this is an image with the 18 biggest contours each marked with a green box.
So to continue with my way of attacking the problem I would have to write an algorithm that finds the contours that are most likely part of the grid while ignoring the contours that are unwanted and also the inner contours of letters that have 2 contours.
While this is possible, I'm wondering if there is a better way of doing this.
Somebody told me that if you filter the image in such a way that everything gets blurrier, all the letters become blobs, and it might then be possible to do pattern detection on the 4x4 grid of blobs. But I don't know how to do that, or whether it's possible.
So if somebody knows a better way to tackle this problem, or knows how to execute the plan of attack I mentioned earlier, that would be most helpful.
Thanks in advance!
You can simply filter the bounding rectangles by width and height. As this is a rule-based approach, it may need more example images to fine-tune the filter rules.
import cv2
# get bounding rectangles of contours
img = cv2.imread('img.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
bbox = [cv2.boundingRect(c) for c in contours]
# filter rectangles by width and height
for x, y, w, h in bbox:
    if (4 < w < 200) and (30 < h < 200):
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
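As a possible follow-up (not part of the original answer): if the filter leaves you with exactly the 16 letter boxes, you can order them into the 4x4 grid by sorting on y first and then x, which also gives you the per-letter coordinates for cropping. A sketch that continues from the bbox list in the code above, assuming the rows of the grid do not overlap vertically:
# Continuing from the snippet above: keep the boxes that passed the filter
# and order them into a 4 x 4 grid (top-to-bottom, then left-to-right).
letters = [(x, y, w, h) for x, y, w, h in bbox if 4 < w < 200 and 30 < h < 200]
letters.sort(key=lambda b: b[1])                      # sort by y first
rows = [sorted(letters[i:i + 4], key=lambda b: b[0])  # then by x inside each row
        for i in range(0, len(letters), 4)]

# Crop each letter so pytesseract can be run on it individually.
for r, row in enumerate(rows):
    for c, (x, y, w, h) in enumerate(row):
        letter_img = img[y:y + h, x:x + w]
        cv2.imwrite(f'letter_{r}{c}.png', letter_img)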
I want to detect and count the objects inside an image that touch each other, while ignoring what could be considered a single isolated object. I have the basic image, on which I tried applying the cv2.HoughCircles() method to identify some circles. I then parsed the returned array and used cv2.circle() to draw them on the image.
However, I seem to always get too many circles returned by cv2.HoughCircles() and couldn't figure out how to only count the objects that are touching.
This is the image I was working on:
My code so far:
import numpy
import matplotlib.pyplot as pyp
import cv2
segmented = cv2.imread('photo')
houghCircles = cv2.HoughCircles(segmented, cv2.HOUGH_GRADIENT, 1, 80, param1=450, param2=10, minRadius=30, maxRadius=200)
houghArray = numpy.uint16(houghCircles)[0,:]
for circle in houghArray:
    cv2.circle(segmented, (circle[0], circle[1]), circle[2], (0, 250, 0), 3)
And this is the image I get, which is quite far from what I really want.
How can I properly identify and count said objects?
Here is one way in Python/OpenCV: get the area of each contour and the area of its convex hull, then take the ratio (area/convex_hull_area). If the ratio is small enough, it is a cluster of blobs; otherwise it is an isolated blob.
Input:
import cv2
import numpy as np
# read input image
img = cv2.imread('blobs_connected.jpg')
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold to binary
thresh = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)[1]
# find contours
#label_img = img.copy()
contour_img = img.copy()
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
index = 1
isolated_count = 0
cluster_count = 0
for cntr in contours:
    area = cv2.contourArea(cntr)
    convex_hull = cv2.convexHull(cntr)
    convex_hull_area = cv2.contourArea(convex_hull)
    ratio = area / convex_hull_area
    #print(index, area, convex_hull_area, ratio)
    #x,y,w,h = cv2.boundingRect(cntr)
    #cv2.putText(label_img, str(index), (x,y), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0,0,255), 2)
    if ratio < 0.91:
        # cluster contours in red
        cv2.drawContours(contour_img, [cntr], 0, (0,0,255), 2)
        cluster_count = cluster_count + 1
    else:
        # isolated contours in green
        cv2.drawContours(contour_img, [cntr], 0, (0,255,0), 2)
        isolated_count = isolated_count + 1
    index = index + 1
print('number_clusters:',cluster_count)
print('number_isolated:',isolated_count)
# save result
cv2.imwrite("blobs_connected_result.jpg", contour_img)
# show images
cv2.imshow("thresh", thresh)
#cv2.imshow("label_img", label_img)
cv2.imshow("contour_img", contour_img)
cv2.waitKey(0)
Clusters in Red, Isolated blobs in Green:
Textual Information:
number_clusters: 4
number_isolated: 81
Approach it in steps.
Label connected components. Two touching blobs get the same label because they're connected; so far so good.
Now separate your blobs. Use watershed (see the first comment) or whatever other method gives you results. I can't fully predict the watershed approach: it might deal with touching blobs of dissimilar size, or it might do something silly. The sample/tutorial also assumes a minimum size (0.7 * max peak); maybe plug in something absolute in pixels instead.
Then, for each separated blob, check which label it sits on (take the coordinates of its centroid to be safe) and note down a +1 for that label (a histogram).
Any label that has more than one separated blob sitting on it is what you are looking for.
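A rough sketch of those steps (not the original poster's code), assuming the blob image from the other answer ('blobs_connected.jpg') where blobs come out white on black after thresholding, and using distance-transform peaks as a simple stand-in for the full watershed marker step; the 0.7 factor is the tutorial's value and may need tuning:
import cv2
import numpy as np

# Read the blob image and threshold to binary (white blobs on black).
img = cv2.imread('blobs_connected.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)[1]

# Step 1: label connected components; touching blobs share one label.
n_labels, labels = cv2.connectedComponents(binary)

# Step 2: split the blobs; here the peaks of the distance transform serve as
# the separated-blob markers (an absolute threshold in pixels may work better).
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
peaks = np.uint8(cv2.threshold(dist, 0.7 * dist.max(), 255, cv2.THRESH_BINARY)[1])
n_blobs, markers = cv2.connectedComponents(peaks)

# Step 3: for each separated blob, see which label its centroid falls on and
# count blobs per label (a histogram over the labels).
counts = np.zeros(n_labels, dtype=int)
for blob_id in range(1, n_blobs):
    ys, xs = np.where(markers == blob_id)
    cy, cx = int(ys.mean()), int(xs.mean())
    counts[labels[cy, cx]] += 1

# Step 4: any label carrying more than one separated blob is a touching cluster.
clusters = [lab for lab in range(1, n_labels) if counts[lab] > 1]
print('labels with touching blobs:', clusters)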
I am trying to use Tesseract OCR to convert an image to text. The image always has three letters without rotation/skew, but randomly distributed in a 90x50 png file.
By just cleaning and converting to black/white, tesseract could not get the text in the image. After aligning the letters by hand in Paint, the OCR gives the exact match. It doesn't even need to be exactly aligned.
What I want is some tips on how to automate this alignment of the characters in the image prior to sending it to tesseract.
I am using python with tesseract and opencv.
Original image:
What I have done - turn black and white:
What I want to do - aligned by code:
You can use the following code to achieve this output. Some of the constants may need to be changed to fit your needs:
import cv2
import numpy as np
# Read the image (resize so it is easier to see)
img = cv2.imread("/home/stephen/Desktop/letters.png",0)
h,w = img.shape
img = cv2.resize(img, (w*5,h*5))
# Threshold the image and find the contours
_, thresh = cv2.threshold(img, 123, 255, cv2.THRESH_BINARY_INV);
contours, hierarchy = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
# Create a white background image to paste the letters on
bg = np.zeros((200,200), np.uint8)
bg[:] = 255
left = 5
# Iterate through the contours
for contour,h in zip(contours, hierarchy[0]):
    # Ignore inside parts (circle in a 'p' or 'b')
    if h[3] == -1:
        # Get the bounding rectangle
        x,y,w,h = cv2.boundingRect(contour)
        # Paste it onto the background
        bg[5:5+h,left:left+w] = img[y:y+h,x:x+w]
        left += (w + 5)
cv2.imshow('thresh', bg)
cv2.waitKey()
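One caveat worth noting (not from the answer above): cv2.findContours does not guarantee a left-to-right ordering, so the pasted letters can end up in a shuffled order. A small variation that sorts the outer bounding boxes by their x coordinate before pasting might look like this (the file name and canvas size are assumptions):
import cv2
import numpy as np

img = cv2.imread('letters.png', 0)                      # assumed file name
_, thresh = cv2.threshold(img, 123, 255, cv2.THRESH_BINARY_INV)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Keep only the outer contours and sort their bounding boxes left to right.
boxes = [cv2.boundingRect(c) for c, h in zip(contours, hierarchy[0]) if h[3] == -1]
boxes.sort(key=lambda b: b[0])

# Paste the letters onto a white canvas in reading order.
bg = np.full((200, 600), 255, np.uint8)
left = 5
for x, y, w, h in boxes:
    bg[5:5 + h, left:left + w] = img[y:y + h, x:x + w]
    left += w + 5

cv2.imwrite('aligned.png', bg)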
So I'm trying to recognize a region that's already been defined by a bounding box. Example:
Some of the areas within these rectangles in these images are white and some are black, and most of them are completely different sizes. The only common characteristic between these images is the red rectangle:
Essentially what I'm trying to do is create a randomly generated meme bot that places a random source image in the region defined by these rectangles. I have tons of these images already, with predefined areas marked by these red rectangles. I want to automate the process somehow; currently every resize shape and offset has to be defined for each template. So what I need is to recognize the area within the rectangle and have it return the resize shape and offset needed to place the source image.
How should I go about this? Should I use something in OpenCV or am I going to have to train a CNN? Just really looking for a push in the right direction because I'm pretty lost as to the best approach to this problem.
I think OpenCV can do it. Below is a short example of the steps for what you need. Read the comments in the code for more details.
import cv2
import numpy as np
img = cv2.imread("1.jpg")
#STEP1: get only red color (or the bounding box color) in the image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# define range of red color in HSV
lower_red = np.array([0,50,50])
upper_red = np.array([0,255,255])
# Threshold the HSV image to get only red colors
mask = cv2.inRange(hsv, lower_red, upper_red)
red_only = cv2.bitwise_and(img,img, mask= mask)
#STEP2: find contour
gray_img = cv2.cvtColor(red_only,cv2.COLOR_BGR2GRAY)
_,thresh = cv2.threshold(gray_img,1,255,cv2.THRESH_BINARY)
_,contours,_ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#max contour in the image is the box you want
areas = [cv2.contourArea(c) for c in contours]
sorted_areas = np.sort(areas)
cnt=contours[areas.index(sorted_areas[-1])]
r = cv2.boundingRect(cnt)
cv2.rectangle(img,(r[0],r[1]),(r[0]+r[2],r[1]+r[3]),(0,255,0),3)
cv2.imshow("img",img)
cv2.imshow("red_only",red_only)
cv2.imshow("thresh",thresh)
cv2.waitKey()
cv2.destroyAllWindows()
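As a small follow-up (not part of the answer above): the bounding box r = (x, y, w, h) is already the offset and resize shape the question asks for. Continuing from the code above, pasting a hypothetical source image into that region could look like this:
# r = (x, y, w, h) of the detected red rectangle from the code above
x, y, w, h = r
src = cv2.imread("source.jpg")                    # hypothetical source image
# Optionally shrink the box a little first so the red border itself is covered.
img[y:y + h, x:x + w] = cv2.resize(src, (w, h))   # cv2.resize takes (width, height)
cv2.imwrite("output.jpg", img)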
I am trying to detect text regions in an image with OpenCV (3.0) and Python. So far, I am able to detect individual text characters and draw a bounding box or rectangle around them. My ultimate goal is merging individual text characters into words/text lines and eventually into text blocks or paragraphs.
My approach is to find neighbouring text regions and then form a bounding box around them. So, in my code, I have expanded the bounding boxes or rectangles around each text character a bit, so that they overlap each other, forming a chain of overlapping bounding boxes (please refer to the image). Now, I would like to merge these overlapping rectangles to form a single bounding box based on the overlap ratio between all bounding box pairs.
I am having a very hard time figuring out how to merge overlapping rectangles into a single one. For the last 24 hours, I have tried different techniques without any luck. Among them is cv2.groupRectangles. I think we need an array of rectangles (i.e. rectList) as returned by cascade.detectMultiScale(); we would also need Haar cascades to detect objects in the image (in our case, rectangles), and there are a number of Haar cascades in the "data" folder, so I am confused about which one to use in my case (detecting rectangles).
If cv2.groupRectangles is not applicable in my case, I would like to know what the other options might be. If you have encountered similar problems, please share.
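For what it's worth, cv2.groupRectangles does not need a cascade or any Haar files at all; it simply takes a plain list of [x, y, w, h] rectangles. A minimal example with made-up numbers (note that it silently drops rectangles that do not have at least groupThreshold similar neighbours, which is why the list is commonly passed in twice):
import cv2

# Two heavily overlapping boxes and one isolated box, all made up.
rects = [[10, 10, 40, 20], [12, 11, 40, 20], [200, 50, 30, 30]]

# groupThreshold=1, eps=0.2; duplicating the list keeps singletons from being discarded.
grouped, weights = cv2.groupRectangles(rects + rects, 1, 0.2)
print(grouped)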
Here is my working Python code
import numpy as np
import cv2
im = cv2.imread('headintext.png')
grayImage = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
# grayImage = cv2.imread('gates.jpg', cv2.IMREAD_GRAYSCALE)
# cv2.imwrite('gates.png', grayImage)
# cv2.imshow('image', grayImage)
# cv2.waitKey(0)
# cv2.destroyAllWindows()
_,thresh = cv2.threshold(grayImage, 150, 255, cv2.THRESH_BINARY_INV)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
dilated = cv2.dilate(thresh, kernel, iterations = 1) # dilate
_,contours0,_ = cv2.findContours(dilated,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE) # get contours
contours = [cv2.approxPolyDP(cnt, 3, True) for cnt in contours0]
# contours, hierarchy = cv2.findContours(thresh, 1, 2)
# for each contour found, draw a rectangle around it on original image
for contour in contours:
    # get rectangle bounding contour
    [x,y,w,h] = cv2.boundingRect(contour)
    # discard areas that are too large
    if h>50 and w>50:
        continue
    # discard areas that are too small
    if h<5 or w<5:
        continue
    # draw rectangle around contour on original image
    # slightly expand the rectangles to form a chain of overlapping bounding boxes
    pad_w, pad_h = int(0.05*w), int(0.15*h)
    cv2.rectangle(im,(x-pad_w,y-pad_h),(x+w+pad_w,y+h+pad_h),(255,0,255),1,shift=0)
# write original image with added contours to disk
cv2.imwrite("rectangle.png", im)
cv2.imshow('image', im)
cv2.waitKey(0)
cv2.destroyAllWindows()
I have tried the following cv2.groupRectangles code, but it is not producing any effect at all. I have noticed that cascade.detectMultiScale does not find any rectangles.
import numpy as np
import cv2
cascade = cv2.CascadeClassifier('data/haarcascades/haarcascade_frontalface_alt.xml')
img = cv2.imread('rectangle.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
rawrects = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=1, minSize=(5, 5), flags = cv2.CASCADE_SCALE_IMAGE)
# rects, weights = cv2.groupRectangles(np.array(rawrects).tolist(), 1, 0.2)
print "nrects %d" % len(rawrects)
for i, r in enumerate(rawrects):
    weight = weights[i]  # note: weights is only defined if the groupRectangles line above is uncommented
    rectline = "%f %f %f %f %d\n" % (r[0], r[1], r[2], r[3], weight)
    print rectline
    p1 = (r[0], r[1])
    p2 = (r[0]+r[2], r[1]+r[3])
    cv2.rectangle(img, p1, p2, (0,0,255))
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
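If groupRectangles keeps misbehaving, one cascade-free way to merge the chain of overlapping boxes (just a sketch of the idea, not from the original post) is to draw all the padded rectangles filled on a blank mask and then take the external contours of that mask; overlapping boxes fuse into one blob, and each blob's bounding rectangle is a merged text region:
import cv2
import numpy as np

im = cv2.imread('headintext.png')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(thresh, kernel, iterations=1)
cnts = cv2.findContours(dilated, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
contours = cnts[0] if len(cnts) == 2 else cnts[1]   # works for OpenCV 3 and 4

# Draw every padded character box filled onto a blank mask.
mask = np.zeros(gray.shape, np.uint8)
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    if (h > 50 and w > 50) or h < 5 or w < 5:       # same size filter as above
        continue
    pad_w, pad_h = int(0.05 * w), int(0.15 * h)
    cv2.rectangle(mask, (x - pad_w, y - pad_h), (x + w + pad_w, y + h + pad_h), 255, -1)

# Overlapping boxes are now single white blobs; their bounding rects are the merged boxes.
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
merged = cnts[0] if len(cnts) == 2 else cnts[1]
for m in merged:
    x, y, w, h = cv2.boundingRect(m)
    cv2.rectangle(im, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('merged_boxes.png', im)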