I have created a model for classification of two types of shoes. How do I deploy it in OpenCV for video object detection?
Thanks in advance.
You can do that with the OpenCV DNN module:
import cv2

# Load a model imported from TensorFlow
tensorflowNet = cv2.dnn.readNetFromTensorflow('card_graph/frozen_inference_graph.pb', 'exported_pbtxt/output.pbtxt')

# Input image
img = cv2.imread('image.jpg')
rows, cols, channels = img.shape

# Use the given image as input, which needs to be a blob
tensorflowNet.setInput(cv2.dnn.blobFromImage(img, size=(300, 300), swapRB=True, crop=False))

# Run a forward pass to compute the net output
networkOutput = tensorflowNet.forward()

# Loop over the detections
for detection in networkOutput[0, 0]:
    score = float(detection[2])
    if score > 0.9:
        # Box coordinates are returned as fractions of the image size
        left = detection[3] * cols
        top = detection[4] * rows
        right = detection[5] * cols
        bottom = detection[6] * rows

        # Draw a red rectangle around the detected object
        cv2.rectangle(img, (int(left), int(top)), (int(right), int(bottom)), (0, 0, 255), thickness=2)

# Show the image with a rectangle surrounding the detected objects
cv2.imshow('Image', img)
cv2.waitKey()
cv2.destroyAllWindows()
You need the frozen inference graph and the matching .pbtxt file to run your model in OpenCV.
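If the model comes from the TensorFlow Object Detection API, the .pbtxt text graph can be generated with the helper scripts shipped in OpenCV's samples/dnn directory (for example tf_text_graph_ssd.py for SSD-style models), which take the frozen .pb and the training pipeline.config as input; the exact script depends on the architecture you trained.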
You would save the model to an H5 file with model.save("modelname.h5"), then load it in your OpenCV script with Keras' load_model("modelname.h5"). Then, in a loop, classify the objects you find via model.predict(ImageROI).
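As a minimal sketch of that workflow (the file name modelname.h5, the 224x224 input size, the /255 scaling, and the class names below are placeholders to swap for whatever your model actually uses):

from tensorflow.keras.models import load_model
import numpy as np
import cv2

model = load_model("modelname.h5")            # model previously saved with model.save(...)
class_names = ["shoe_type_a", "shoe_type_b"]  # placeholder class names

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Here the whole frame is used as the ROI; substitute the region that
    # your detector gives you. The 224x224 size and the /255 scaling are
    # assumptions -- use whatever preprocessing your model was trained with.
    roi = cv2.resize(frame, (224, 224))
    batch = np.expand_dims(roi.astype("float32") / 255.0, axis=0)
    pred = model.predict(batch)
    label = class_names[int(np.argmax(pred))]
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()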
I am using Cascade Trainer GUI to get an XML file. I have 100 positive images and 400 negative images. The training process only took about 5 minutes, and the results are not accurate. The object I trained the model for is a small screwdriver. The resulting .xml file was only 31.5 KB.
Also, the detection rectangle in the photo is quite small, and not accurate either.
Besides adding more positive and negative images, what should I do to create a more accurate model? I eventually need to do image tracking as well. Thanks
#import numpy as np
import cv2
import time

"""
This program uses OpenCV to detect faces, smiles, and eyes. It uses Haar cascades, which are public domain. Haar cascades rely on
XML files which contain model training data. An XML file can be generated by training on many positive and negative images.
Try your built-in camera with 'cap = cv2.VideoCapture(0)' or use any video: cap = cv2.VideoCapture("videoNameHere.mp4")
"""

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')
smile = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_smile.xml')
screw = cv2.CascadeClassifier('cascade.xml')

cap = cv2.VideoCapture(0)
font = cv2.FONT_HERSHEY_SIMPLEX
prev_frame_time, new_frame_time = 0, 0

while 1:
    ret, img = cap.read()
    img = cv2.resize(img, (1920, 1080))

    #faces = face_cascade.detectMultiScale(img, 1.5, 5)
    #eyes = eye_cascade.detectMultiScale(img, 1.5, 6)
    #smiles = smile.detectMultiScale(img, 1.1, 400)
    screws = screw.detectMultiScale(img, 1.2, 3)

    new_frame_time = time.time()
    try:
        fps = 1 / (new_frame_time - prev_frame_time)
    except ZeroDivisionError:
        fps = 0
    fps = int(fps)
    cv2.putText(img, "FPS: " + str(fps), (10, 450), font, 3, (0, 0, 0), 5, cv2.LINE_AA)

    # for (x, y, w, h) in smiles:
    #     cv2.rectangle(img, (x, y), (x + w, y + h), (0, 69, 255), 2)
    #     cv2.putText(img, "smile", (int(x - .1 * x), int(y - .1 * y)), font, 1, (255, 255, 255), 2)

    for (x, y, w, h) in screws:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)
        cv2.putText(img, "screwdriver", (int(x - .1 * x), int(y - .1 * y)), font, 1, (255, 0, 255), 2)

    # for (x, y, w, h) in faces:
    #     cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    #     cv2.putText(img, "FACE", (int(x - .1 * x), int(y - .1 * y)), font, 1, (255, 255, 255), 2)
    #     roi_color = img[y:y + h, x:x + w]
    #     eyes = eye_cascade.detectMultiScale(roi_color)
    #     for (ex, ey, ew, eh) in eyes:
    #         cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

    cv2.imshow('img', img)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
    prev_frame_time = new_frame_time

cap.release()
cv2.destroyAllWindows()
Most resources on the topic recommend 3,000-5,000 positive and 3,000-5,000 negative images. That might very well be the reason for the low accuracy.
Some resources:
Link 1 - sonots
Link 2 - opencv-user-blog
Link 3 - computer vision software
Link 4 - pythonprogramming.net
If your image above is a 'typical' one, then this can never work with cascades.
Cascades need reliable texture and pose, and your scene lacks both.
(I also suspect that you do not really have 100 positive images, but tried to "synthesize" them from a few, or even a single one, which is proven NOT to work in real life.)
Don't waste more time on this.
Get more (real!) images, and read up on object detection CNNs like SSD or YOLO, which are far more robust in your situation.
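For reference, here is a minimal sketch of running a pre-trained YOLO detector through OpenCV's DNN module (this needs a reasonably recent OpenCV; the file names yolov3.cfg, yolov3.weights and coco.names are placeholders for whatever detector you end up downloading or training):

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
classes = open('coco.names').read().strip().split('\n')

img = cv2.imread('image.jpg')
h, w = img.shape[:2]

# YOLO expects a 416x416 blob scaled to [0, 1] with RGB channel order
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # YOLO returns box centres and sizes as fractions of the image size
            cx, cy, bw, bh = detection[0:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression removes overlapping duplicate boxes
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(indices).flatten():
    x, y, bw, bh = boxes[i]
    cv2.rectangle(img, (x, y), (x + bw, y + bh), (255, 0, 255), 2)
    cv2.putText(img, classes[class_ids[i]], (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 255), 2)

cv2.imshow('detections', img)
cv2.waitKey(0)
cv2.destroyAllWindows()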
I'm trying to test my model on a video for prediction.
I want to make predictions with my CNN (AlexNet) + LSTM model on a video that I have, but when it runs, nothing from the video appears.
Here is my code:
vid = cv2.VideoCapture("Data Fix/Data16_133.mp4")
total_frame = 0

while vid.isOpened():
    ret, frame = vid.read()
    vid.set(3, 480)
    vid.set(4, 240)
    start = time.time()
    if ret == True:
        total_frame += 1
        draw = frame.copy()
        draw = cv2.cvtColor(draw, cv2.COLOR_BGR2RGB)
        scale_percent = 20  # percent of original size
        width = 224
        height = width
        dim = (width, height)
        frame_set = cv2.resize(draw, dim, interpolation=cv2.INTER_AREA)
        frame_set = np.arange(10 * width * height * 3).reshape(10, width, height, 3)
        frame_set.reshape(10, width, height, 3).shape
        frame_set = np.expand_dims(frame_set, axis=0)
        result = model.predict_on_batch(frame_set)
        cv2.imshow('Result', result)
        print(result)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

vid.release()
cv2.destroyAllWindows()
When I print the result, it keeps printing this value over and over without showing anything from cv2.imshow:
[[0.0602112 0.01403825 0.3782384 0.5362205 0.01129159]]
Does anyone have a clue about this? Any answer would be appreciated.
I'm currently following this tutorial to build the model, and the way I prepared the dataset is the same; the difference is that I didn't use MobileNet transfer learning but modified it to use an AlexNet model.
The problem is that your result is not an image that you can display with OpenCV. It is the output of your model, which according to the shared notebook is a classification model, so it represents class probabilities. I assume you are trying to predict some class for the video. If you want to see the frame, use it like this:
cv2.imshow('frame', frame) # to see the frame
# below to see the draw
cv2.imshow('draw', draw)
Edit: If you want to show the predicted class on the image, then do the following:
# Get the predicted class from the result using argmax
pred_class = np.argmax(result)
# Here I assume that the index is the desired class like most cases
# Now we will write the class label on the image
# Set the font and place
font = cv2.FONT_HERSHEY_SIMPLEX
org = (50, 50)
cv2.putText(frame, str(pred_class), org, font, .5, (255,255,255),2,cv2.LINE_AA)
# now just show the frame
cv2.imshow('frame', frame)
I'm currently working on a style transfer project and wanted to look at the difference between the saliency maps of the content and style images. I've managed to get the actual transfer working but am having issues working out how to minimize the saliency loss between the two images. The code below is what I use to generate the saliency maps.
import cv2
imgpath = r'Content Image.jpg'
image = cv2.imread(imgpath)
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
(success, saliencyMap) = saliency.computeSaliency(image)
saliencyMap = (saliencyMap * 255).astype("uint8")
cv2.imshow("Image", image)
cv2.imshow("Output", saliencyMap)
cv2.waitKey(0)
cv2.destroyAllWindows()
imgpath = r'Content Image.jpg'
image = cv2.imread(imgpath)
saliency = cv2.saliency.StaticSaliencyFineGrained_create()
(success, saliencyMap) = saliency.computeSaliency(image)
# Set threshold for saliency map
threshMap = cv2.threshold(saliencyMap.astype("uint8"), 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imshow("Image", image)
cv2.imshow("Output", saliencyMap)
# cv2.imshow("Thresh", threshMap)
cv2.waitKey(0)
The pictures below are the result of running the above code, except that the content image is replaced with 'Style Image'. I can see that the maps are generated fine; however, I have been struggling to work out how to get a single value for a saliency map, or how to subtract one map from the other to see the difference between the two.
So my question is: is there a way to compute the numerical difference between the two maps? I am looking to minimize this "difference" but have not figured out how to do it.
Thanks
The issue I had was making sure that both maps are the same size; then you can take the absolute difference:
z = cv2.absdiff(g,l)
This gives the resulting per-pixel difference between the two.
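As a minimal sketch of that, reusing the spectral-residual saliency from the question ('Content Image.jpg' and 'Style Image.jpg' are the file names assumed here), you can resize one map to the other's shape, take the absolute difference, and collapse it into a single number to minimize:

import cv2
import numpy as np

def saliency_map(path):
    # Spectral-residual saliency, as in the snippets above
    image = cv2.imread(path)
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    _, sal_map = saliency.computeSaliency(image)
    return (sal_map * 255).astype("uint8")

content_map = saliency_map('Content Image.jpg')
style_map = saliency_map('Style Image.jpg')

# Bring both maps to the same size before comparing
style_map = cv2.resize(style_map, (content_map.shape[1], content_map.shape[0]))

# Per-pixel absolute difference, then collapse to one number (mean absolute error)
diff = cv2.absdiff(content_map, style_map)
score = float(np.mean(diff))
print("mean absolute saliency difference:", score)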
For a university project I am programming a face mask recognizer. For detecting faces, I use cv2.CascadeClassifier('face_detector.xml'). As I noticed, this program takes up far too much CPU, resulting in a very choppy video stream frame rate.
I am running the code on a MacBook Air with a 1.6 GHz dual-core Intel Core i5.
Can someone explain what I can change to make it smoother? Or maybe recommend another face detection method?
Here is my code:
import numpy as np
import os
import tensorflow as tf
import cv2
from matplotlib.pyplot import gray
# Disable tensorflow compilation warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import cv2
# Load the cascade
face_cascade = cv2.CascadeClassifier('face_detector.xml')
# To capture video from webcam.
cap = cv2.VideoCapture(0)
# To use a video file as input
# cap = cv2.VideoCapture('filename.mp4')
model = tf.keras.models.load_model('checkpoint19.ckpt')
i = 0
while True:
    # Read the frame
    _, img = cap.read()
    # Detect the faces
    faces = face_cascade.detectMultiScale(img, 1.3, 4)
    # save each frame as image with PNG format
    image = cv2.imwrite('database/{index}.png'.format(index=i), img)
    i += 1

    # cut out the fragment in the box of the image
    # Draw the rectangle around each face
    for (x, y, w, h) in faces:
        crop_img = img[y:y + h, x:x + w]
        resizedImg = cv2.resize(crop_img, (224, 224))
        gray = cv2.cvtColor(resizedImg, cv2.COLOR_BGR2GRAY)
        imgArrNew = gray.reshape(1, 224, 224, 1)
        prediction = model.predict(imgArrNew)
        print(prediction)
        label = np.argmax(prediction)
        print(label)

    # font
    font = cv2.FONT_HERSHEY_SIMPLEX
    # org
    for (x, y, w, h) in faces:
        org = (x, y + h + 30)
        # fontScale
        fontScale = 1
        # Blue color in BGR
        color = (255, 0, 0)
        # Line thickness of 2 px
        thickness = 2

        # output the predicted label/sign on the live-stream frame
        if label == 0:
            color = (0, 0, 225)
            label_out = "Mask off"
        if label == 1:
            color = (50, 205, 50)
            label_out = "Mask on"
        if label == 2:
            color = (0, 255, 225)
            label_out = "incorrect Mask"

        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        image1 = cv2.putText(img, label_out, org, font,
                             fontScale, color, thickness, cv2.LINE_AA)

    # Display
    cv2.imshow('Face_Regonition', img)
    # Stop if escape key is pressed
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

# Release the VideoCapture object
cap.release()
Thanks for your help :)
The Haar cascade classifier is slow, and running detection in every single frame is hard for low-end computing devices.
The easiest fix is to use a lower-resolution image or a lower FPS, but it will look cheap.
A better way is to use a detection-and-tracking framework, where detection happens at a 1 Hz interval in a separate thread and tracking runs at 30 Hz, a difference the human eye can't tell.
For face detection you can choose any method, such as Haar, HOG, or a CNN, and put it in a new thread. In the main tracking thread (which can run in real time), update the tracker, predict the bounding box, and display it.
You can look at the trackers here; I suggest a KCF-based method, as it is fast and reliable.
https://www.pyimagesearch.com/2018/07/30/opencv-object-tracking/
Just pass the detection rect as the input rect for the tracker; then it should work directly. A minimal sketch of the idea follows.
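Here is a minimal, single-threaded sketch of that detect-then-track idea. The answer above suggests running detection in a separate thread; to keep the example short, this version simply re-runs the cascade every N frames instead. It assumes a trained cascade.xml and a KCF tracker from opencv-contrib (depending on the OpenCV version the factory is cv2.TrackerKCF_create or cv2.legacy.TrackerKCF_create):

import cv2

cascade = cv2.CascadeClassifier('cascade.xml')   # placeholder: your trained cascade
cap = cv2.VideoCapture(0)

tracker = None
frame_count = 0
DETECT_EVERY = 30   # re-detect roughly once per second at 30 FPS

while True:
    ret, frame = cap.read()
    if not ret:
        break

    if tracker is None or frame_count % DETECT_EVERY == 0:
        # Slow path: run the cascade occasionally to (re)initialise the tracker
        detections = cascade.detectMultiScale(frame, 1.2, 3)
        if len(detections) > 0:
            box = tuple(int(v) for v in detections[0])
            tracker = cv2.TrackerKCF_create()    # or cv2.legacy.TrackerKCF_create()
            tracker.init(frame, box)
    else:
        # Fast path: update the tracker on every other frame
        ok, box = tracker.update(frame)
        if ok:
            x, y, w, h = (int(v) for v in box)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 255), 2)
        else:
            tracker = None   # lost the object, fall back to detection

    cv2.imshow('tracking', frame)
    frame_count += 1
    if cv2.waitKey(30) & 0xFF == 27:   # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()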
Can anyone give an example of the Edge Boxes algorithm to generate proposals for object detection using OpenCV?
The details are at https://docs.opencv.org/3.4.0/d4/d0d/group__ximgproc__edgeboxes.html
Yes. First you will need to download the model file that is used for Edge Boxes here. Once you do that, the following code (taken from the OpenCV GitHub) can be used as an example for running the Edge Boxes algorithm. In short, put the code below into a separate file called edgeboxes_demo.py, then in the terminal type:
python edgeboxes_demo.py model.yml.gz image_file
model.yml.gz is the model you downloaded from the link above, which I assume is in the same directory as the code. image_file is the path to the image you want to use for testing the algorithm. The code runs the Edge Boxes algorithm and then draws the detected boxes on the image in green:
import cv2 as cv
import numpy as np
import sys

if __name__ == '__main__':
    model = sys.argv[1]
    im = cv.imread(sys.argv[2])

    edge_detection = cv.ximgproc.createStructuredEdgeDetection(model)
    rgb_im = cv.cvtColor(im, cv.COLOR_BGR2RGB)
    edges = edge_detection.detectEdges(np.float32(rgb_im) / 255.0)

    orimap = edge_detection.computeOrientation(edges)
    edges = edge_detection.edgesNms(edges, orimap)

    edge_boxes = cv.ximgproc.createEdgeBoxes()
    edge_boxes.setMaxBoxes(30)
    boxes = edge_boxes.getBoundingBoxes(edges, orimap)

    for b in boxes:
        x, y, w, h = b
        cv.rectangle(im, (x, y), (x + w, y + h), (0, 255, 0), 1, cv.LINE_AA)

    cv.imshow("edges", edges)
    cv.imshow("edgeboxes", im)
    cv.waitKey(0)
    cv.destroyAllWindows()
Test image and result: (screenshots attached in the original post)