How can I extract text from a video stream? - Python

I'm trying to extract some text from a video stream coming from my camera using OpenCV and pytesseract. I crop each frame to get a small image containing the text. I tried different image processing steps to make it work: I inverted the image values, blurred it, and binarized it, but none of these worked with Tesseract. The data I want to extract has the form 'float/float'; here is an example of the small image:
It seems the characters are not well separated, and this is the maximum resolution I can get from my camera. I then tried filtering by color, but got no result because it is video and the background is always moving.
I'm happy to use any suggested Python module that can do the job.
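For context, a minimal sketch of the kind of preprocessing-plus-Tesseract attempt described above (not my exact code; crop_image is the cropped frame region, and the --psm/whitelist flags are an assumption about the installed Tesseract version honoring them):

import cv2
import pytesseract

# invert, blur, and binarize the crop, then run Tesseract restricted to a
# single text line and the characters that can appear in 'float/float'
gray = cv2.cvtColor(crop_image, cv2.COLOR_BGR2GRAY)
inverted = cv2.bitwise_not(gray)
blurred = cv2.GaussianBlur(inverted, (3, 3), 0)
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(
    binary, config='--psm 7 -c tessedit_char_whitelist=0123456789./')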

This is not as trivial as it seems. I generated a 32x32 PNG image for every character and added white noise to it. The background in the video is moving, and characters like 8 and 6 are not very different from each other.
Here is my code for the moment:
import time

import cv2
import numpy as np

cap = cv2.VideoCapture("rtsp:...")
time.sleep(2)

# load one template per character (digits 0-9 plus the separator)
templates = {}
w = []
h = []
for i in range(0, 11):
    templates["template_" + str(i)] = cv2.imread(str(i) + '.bmp', 0)
    tmp_w, tmp_h = templates["template_" + str(i)].shape[::-1]
    w.append(tmp_w)
    h.append(tmp_h)

threshold = 0.70
while True:
    les_points = [[], [], [], [], [], [], [], [], [], [], []]
    ret, frame = cap.read()
    if frame is None:
        break
    crop_image = frame[38:70, 11:364]
    gray = cv2.cvtColor(crop_image, cv2.COLOR_BGR2GRAY)
    for i in range(0, 11):
        res = cv2.matchTemplate(gray, templates["template_" + str(i)], cv2.TM_CCOEFF_NORMED)
        loc = np.where(res >= threshold)
        for pt in zip(*loc[::-1]):
            les_points[i].append(pt[0])
            cv2.rectangle(crop_image, pt, (pt[0] + w[i], pt[1] + h[i]), (0, i * 10, 255), 2)
    print(les_points)
    cv2.imshow('normal', crop_image)
    # 'p' and 'm' nudge the match threshold up and down at runtime; 'q' quits
    if cv2.waitKey(1) & 0xFF == ord('p'):
        threshold = threshold + 0.01
        print(threshold)
    if cv2.waitKey(1) & 0xFF == ord('m'):
        threshold = threshold - 0.01
        print(threshold)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
I'm also running tests that split the image into cells of exactly the same size as the template characters, but this is not giving good results either.
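One idea for separating look-alikes such as 8 and 6 (a sketch only, assuming all templates are the same 32x32 size so the match maps line up): score every template at every position and keep the best-scoring character per location, instead of thresholding each digit independently:

# stack the match maps for all 11 templates; the shapes agree because
# every template is 32x32 (an assumption stated above)
score_maps = np.stack(
    [cv2.matchTemplate(gray, templates["template_" + str(i)],
                       cv2.TM_CCOEFF_NORMED) for i in range(11)], axis=0)
best_score = score_maps.max(axis=0)    # best correlation at each position
best_char = score_maps.argmax(axis=0)  # which template produced it
ys, xs = np.where(best_score >= threshold)
# read detections left to right: each entry is (x position, template index)
detections = sorted(zip(xs, best_char[ys, xs]))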

Related

View a raw10 bit USB camera using OpenCV Python

I am trying to view the output of an Omnivision OV7251 camera in OpenCV 4.2.0 with Python 3.5.6. The camera output is 10-bit raw greyscale data, which I believe is right-aligned in 16-bit words.
When I use this OpenCV code:
import cv2

cam2 = cv2.VideoCapture(0)
cam2.set(3, 640)  # horizontal pixels
cam2.set(4, 480)  # vertical pixels

while True:
    b, frame = cam2.read()
    if b:
        cv2.imshow("Video", frame)
        k = cv2.waitKey(5)
        if k & 0xFF == 27:
            cam2.release()
            cv2.destroyAllWindows()
            break
This is the image I get:
Presumably what's happening is that OpenCV is using the wrong process to convert from 10-bit raw to RGB, believing it to be some kind of YUV format.
Is there some way I can either:
Tell OpenCV the camera's correct data format so that it does the conversion properly?
Get hold of the raw camera data so that I can do the conversion manually?
One way to do this is to grab the raw camera data, then use numpy to correct it:
import cv2
import numpy as np

cam2 = cv2.VideoCapture(0)
cam2.set(3, 640)  # horizontal pixels
cam2.set(4, 480)  # vertical pixels
cam2.set(cv2.CAP_PROP_CONVERT_RGB, False)  # request raw camera data

while True:
    b, frame = cam2.read()
    if b:
        frame_16 = frame.view(dtype=np.int16)   # reinterpret data as 16-bit pixels
        frame_sh = np.right_shift(frame_16, 2)  # shift away the bottom 2 bits
        frame_8 = frame_sh.astype(np.uint8)     # keep the top 8 bits
        img = frame_8.reshape(480, 640)         # arrange them into a rectangle
        cv2.imshow("Video", img)
        k = cv2.waitKey(5)
        if k & 0xFF == 27:
            cam2.release()
            cv2.destroyAllWindows()
            break
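As a note on why the shift works: viewing the buffer as 16-bit integers pairs up the two bytes of each right-aligned 10-bit sample, and shifting right by 2 keeps the top 8 of those 10 bits, which is what an 8-bit display expects. For a single sample:

sample = 0b1010110111   # a 10-bit value (= 695), right-aligned in 16 bits
top8 = sample >> 2      # 0b10101101 (= 173), fits the 8-bit display range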

OpenCV - False Detection

I'm a first-year CS student and I only know a little bit of Python. For a project, I need to use OpenCV to detect several traffic signs. I searched the web a little and decided to use a Haar cascade classifier. I followed this tutorial: haar-cascade
I trained the classifier on this sign: left-sign
Everything was fine up to this point. However, my code (trained with 3000 positive and 1500 negative JPGs, finished after 8 stages) detects both the right and the left sign. The code needs to recognize right and left signs separately, because my aim is to command my robot to turn left or turn right.
Here is my code:
import numpy as np
import cv2

ok_cascade = cv2.CascadeClassifier('new_kocum.xml')
cap = cv2.VideoCapture(0)

while True:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    oks = ok_cascade.detectMultiScale(gray, 3, 5)
    for (x, y, w, h) in oks:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'ok', (x - w, y - h), font, 0.5, (11, 255, 255), 2, cv2.LINE_AA)
    cv2.imshow('img', img)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

cap.release()
cv2.destroyAllWindows()
Here is the right sign : right-sign
So my question: is it possible to fix this just by changing the code? If it is easier, which other method should I use to detect these signs?
The correct way: add a lot of right signs to the negative set.
The best way: don't use a Haar cascade.
The simplest way: train a second classifier (for example, naive Bayes) to distinguish left from right signs after your cascade has run; a sketch follows below. Features: correlation between images, Hu moments, etc.
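A rough sketch of that simplest way (not the exact method above; left_rois and right_rois are hypothetical lists of grayscale sign crops collected for training, and scikit-learn supplies the naive Bayes classifier):

import cv2
import numpy as np
from sklearn.naive_bayes import GaussianNB

def hu_features(roi):
    # log-scale the Hu moments so their magnitudes are comparable, keeping
    # the sign: the 7th Hu moment flips sign under mirroring, which is what
    # separates a left arrow from a right arrow
    hu = cv2.HuMoments(cv2.moments(roi)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

X = [hu_features(r) for r in left_rois] + [hu_features(r) for r in right_rois]
y = [0] * len(left_rois) + [1] * len(right_rois)
clf = GaussianNB().fit(X, y)

# inside the detection loop, classify each cascade hit:
# side = clf.predict([hu_features(gray[y0:y0 + h, x0:x0 + w])])[0]  # 0 = left, 1 = right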

Detect patterns and digits in an image with OpenCV and Python

I am trying to create a program that takes an image as input (I grab it with ImageGrab from PIL) and detects some known symbols in it, along with their locations. The good news is that I am pretty sure I don't need neural networks, because I know the exact shape and size of each symbol. The problem is that I have no idea how many of them there will be, or what the background color behind each symbol is. Some of the symbols are numbers; I have an image of each digit 0-9, but there may be up to 3-digit numbers. I think I will be able to work out which digits belong to the same number from their locations, but let's come back to that later. Right now, I have converted the image to grayscale and displayed it with OpenCV.
Do you have any idea how I can do this with OpenCV, or with some other library?
I also need it to be fast, hopefully 10 frames per second.
This is my current code (modified from sentdex's "Python Plays GTA" code, at the very bottom of that page):
import numpy as np
from PIL import ImageGrab
import cv2

def screen_record():
    while True:
        global printscreen
        image = ImageGrab.grab(bbox=(20, 270, 430, 685))
        printscreen = np.array(image)
        grayscale_image = cv2.cvtColor(printscreen, cv2.COLOR_BGR2GRAY)
        cv2.imshow('window', grayscale_image)
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break
        if cv2.waitKey(25) & 0xFF == ord('w'):
            image.save("screen_shot.png")
            print("Saved current window as image")

screen_record()
EDIT: I managed to get somewhere with OpenCV's template matching, for the digit 2 only (for now). I found a nice tutorial here. My problem is when there is not exactly one match of the template, i.e. no number 2s, or more than one. When there aren't any, it seems to pick something random in the image, and when there is more than one, only one of them is detected. Is it possible to apply it differently to match my needs?
So, I have a solution to my problem.
For all of those who reach this page in the future looking for help, here are the steps to recognize templates in images:
Create two images: the one you want to search in, and your template.
Then load whichever ones you need using OpenCV, and copy this function:
def locate_symbol(x, template):
    # template.shape is (rows, cols); reversed, it gives (w, h)
    w, h = template.shape[::-1]
    res = cv2.matchTemplate(x, template, cv2.TM_SQDIFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
    # with TM_SQDIFF_NORMED, lower scores mean better matches
    min_thresh = 0.45
    match_locations = np.where(res <= min_thresh)
    return w, h, match_locations
and use these lines to draw bounding boxes on the image:
w, h, locs = locate_symbol(grayscale_image, filter_num2)
for (x, y) in zip(locs[1], locs[0]):
    cv2.rectangle(printable_image, (x, y), (x + w, y + h), [255, 0, 0], 2)
Then you can display everything with cv2.imshow().
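As for grouping detected digits into multi-digit numbers (the part postponed in the question), a minimal sketch: run locate_symbol once per digit template, tag each hit with its digit, then merge hits that sit close together horizontally. digit_templates and max_gap are illustrative names, not from the original code:

def read_numbers(image, digit_templates, max_gap):
    # digit_templates: hypothetical dict mapping digit -> template image
    hits = []
    for digit, tmpl in digit_templates.items():
        w, h, locs = locate_symbol(image, tmpl)
        for (x, y) in zip(locs[1], locs[0]):
            hits.append((x, digit))
    # near-duplicate x positions from overlapping matches should be
    # collapsed before this step; then read hits left to right
    hits.sort()
    numbers, current, last_x = [], "", None
    for x, digit in hits:
        if last_x is not None and x - last_x > max_gap:
            numbers.append(current)  # gap too wide: start a new number
            current = ""
        current += str(digit)
        last_x = x
    if current:
        numbers.append(current)
    return numbers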

OpenCV how to detect a specific color in a frame (inRange function)

I am able to use the code below to find anything blue within a frame:
How To Detect Red Color In OpenCV Python?
However, I want to modify the code to look for a very specific color within a video that I have. I read in a frame from the video file, convert the frame to HSV, and then print the HSV values at a particular element (one that contains the color I want to detect):
print("hsv[x][y]: {}".format(hsv[x][y]))
"hsv" is what I get after I convert the frame from BGR to HSV using cvtColor().
The print command above gives me:
hsv[x][y]: [108 27 207]
I then define my lower and upper HSV values, and pass that to the inRange() function:
lowerHSV = np.array([107,26,206])
upperHSV = np.array([109,28,208])
maskHSV = cv2.inRange(hsv, lowerHSV, upperHSV)
I display maskHSV, but it doesn't seem to identify the item that contains that color. I tried expanding the lowerHSV and upperHSV bounds, but that didn't seem to help.
I tried something similar using BGR, but that doesn't appear to work either.
The thing I'm trying to identify can best be described as a lemon-lime sports drink bottle...
Any suggestions would be appreciated.
=====================================================
The complete python code I am running is shown below, along with some relevant images...
import cv2
import numpy as np
import time

video_capture = cv2.VideoCapture("conveyorBeltShort.wmv")
xloc = 460
yloc = 60
dCounter = 0

while True:
    dCounter += 1
    grabbed, frame = video_capture.read()
    if not grabbed:
        break
    time.sleep(2)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lowerBGR = np.array([200, 190, 180])
    upperBGR = np.array([210, 200, 190])
    # HSV [108 27 209] observation
    lowerHSV = np.array([105, 24, 206])
    upperHSV = np.array([111, 30, 212])
    maskBGR = cv2.inRange(frame, lowerBGR, upperBGR)
    maskHSV = cv2.inRange(hsv, lowerHSV, upperHSV)
    cv2.putText(hsv, "HSV:" + str(hsv[xloc][yloc]), (100, 280),
                cv2.FONT_HERSHEY_SIMPLEX, 3, (0, 0, 0), 8, cv2.LINE_AA)
    cv2.putText(frame, "BGR:" + str(frame[xloc][yloc]), (100, 480),
                cv2.FONT_HERSHEY_SIMPLEX, 3, (0, 0, 0), 8, cv2.LINE_AA)
    cv2.rectangle(frame, (0, 0), (xloc - 1, yloc - 1), (255, 0, 0), 2)
    cv2.rectangle(hsv, (0, 0), (xloc - 1, yloc - 1), (255, 0, 0), 2)
    cv2.imwrite("maskHSV-%d.jpg" % dCounter, maskHSV)
    cv2.imwrite("maskBGR-%d.jpg" % dCounter, maskBGR)
    cv2.imwrite("hsv-%d.jpg" % dCounter, hsv)
    cv2.imwrite("frame-%d.jpg" % dCounter, frame)
    cv2.imshow('frame', frame)
    cv2.imshow('hsv', hsv)
    cv2.imshow('maskHSV', maskHSV)
    cv2.imshow('maskBGR', maskBGR)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
video_capture.release()
=====================================================
first image is "frame-6.jpg"
second image is "hsv-6.jpg"
third image is "maskHSV-6.jpg"
fourth image is "maskBGR-6.jpg"
maskHSV-6.jpg and maskBGR-6.jpg do not appear to show the lemon-lime bottle on the conveyor belt. I believe I have the lower and upper HSV/BGR limits set correctly...
I only know OpenCV's C++ API, but according to this post you should use img[y][x] or img[y, x] to access a pixel value.
Your RGB values are not correct. They should be something like [96, 160, 165].
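To make the indexing point concrete, a small sketch (the widened bounds are illustrative values, not tested against this video):

# NumPy images are indexed [row, column], i.e. [y, x], so sample the pixel as:
pixel = hsv[yloc, xloc]  # the H, S, V values at (x=460, y=60)
# build a range around the sampled hue; saturation and value vary a lot with
# lighting, so give them much looser bounds (OpenCV hue runs 0-179)
h0 = int(pixel[0])
lowerHSV = np.array([max(h0 - 10, 0), 30, 30])
upperHSV = np.array([min(h0 + 10, 179), 255, 255])
maskHSV = cv2.inRange(hsv, lowerHSV, upperHSV)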

cv2.imshow shows 9 screens instead of 1

I'm building some code to adaptively detect skin in webcam video. I have it almost working; however, when outputting the video, it shows 9 copies of the "skin" mask instead of just one. It seems like I'm just missing something simple, but I can't figure it out.
image shown here
Code below:
import cv2
import numpy as np

# first, train the data (ReadData and TrainTree are my own helpers)
data, labels = ReadData()
classifier = TrainTree(data, labels)

# get the webcam; the input is either a video file or the camera number.
# Since I'm using a laptop webcam (only 1 cam), the input is 0; a 2nd cam would be 1
camera = cv2.VideoCapture(0)

while True:
    # read the current frame; .read() returns True if the frame was read correctly, False otherwise
    ret, frame = camera.read()  # frame.shape: (480, 640, 3)
    if ret:
        # reshape the frame to follow the format of the training data (rows*cols, 3)
        data = np.reshape(frame, (frame.shape[0] * frame.shape[1], 3))
        bgr = np.reshape(data, (data.shape[0], 1, 3))
        hsv = cv2.cvtColor(np.uint8(bgr), cv2.COLOR_BGR2HSV)
        # once converted to HSV, reshape back to the original (rows*cols, 3) shape
        data = np.reshape(hsv, (hsv.shape[0], 3))
        predictedLabels = classifier.predict(data)
        # predictedLabels consists of 1 (skin) and 2 (non-skin); map them to
        # 255 (skin) and 0 (non-skin) so they can serve as a mask
        predictedMask = (-(predictedLabels - 1) + 1) * 255  # predictedMask.shape: (307200,)
        # resize to match the frame shape
        imgLabels = np.resize(predictedMask, (frame.shape[0], frame.shape[1], 3))  # imgLabels.shape: (480,640,3)
        # masks require 1 channel, not 3, so convert from BGR to grayscale
        imgLabels = cv2.cvtColor(np.uint8(imgLabels), cv2.COLOR_BGR2GRAY)  # imgLabels.shape: (480,640)
        # bitwise AND pulls out the skin pixels: skin pixels are ANDed with 255, all others with 0
        skin = cv2.bitwise_and(frame, frame, mask=imgLabels)  # skin.shape: (480,640,3)
        # show the frame and the skin image side by side
        # **********THE LINE BELOW OUTPUTS 9 screens of the skin mask instead of just 1****************
        cv2.imshow("images", np.hstack([frame, skin]))
        # if the 'q' key is pressed, stop the loop
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        break

# release the video capture
camera.release()
cv2.destroyAllWindows()
You're working with bitmaps. To get an idea of what they hold, cv2.imshow them individually; then you're going to see (literally) where the data goes wrong.
Now, the culprit is most probably np.resize():
np.resize(a, new_shape)
Return a new array with the specified shape.
If the new array is larger than the original array, then the new array
is filled with repeated copies of a. Note that this behavior is
different from a.resize(new_shape) which fills with zeros instead of
repeated copies of a.
To scale a bitmap (=resize while striving to preserve the same visual image), use cv2.resize() as per OpenCV: Geometric Transformations of Images.
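For this particular mask there is also a fix that sidesteps np.resize() entirely (a sketch, assuming, per the comments in the question, that predictedMask holds one 0/255 value per pixel): reshape it straight into a single-channel 480x640 mask, with no tiling and no BGR-to-gray round trip.

# predictedMask has shape (480*640,), so it can be reshaped directly into
# the single-channel uint8 mask that bitwise_and expects
imgLabels = predictedMask.reshape(frame.shape[0], frame.shape[1]).astype(np.uint8)
skin = cv2.bitwise_and(frame, frame, mask=imgLabels)
cv2.imshow("images", np.hstack([frame, skin]))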
