DeepFace for extracting vector information of an image - python

For face recognition with DeepFace I am trying to extract the vector information of an image so I can store it in a DB. Next time, to match a face, I will extract the new image's vector information and look it up in the DB; if the search returns a result, it's a match. I used the verify method of DeepFace, but it compares two images and returns a verification result. What I tried:
from deepface import DeepFace
import os
detected_face = DeepFace.detectFace("sly.jpg")
print (detected_face)
This prints the detected face as a numpy array, not a compact embedding. I also tried the verify method:
result = DeepFace.verify("sly1.jpg", "sly2.jpg")
For this I get:
Using VGG-Face model backend and cosine distance.
{'verified': True, 'distance': 1.1920928955078125e-07, 'max_threshold_to_verify': 0.4, 'model': 'VGG-Face', 'similarity_metric': 'cosine'}
This is the comparison result, but I need the information for only one image, without a comparison, because I will have lots of records to search (for vector info) when a new face is tested. Any help will be appreciated.

I am assuming it is this repo (among others with the same name), installed with setup.py or pip install deepface.
I tested this on Google Colab. To run it locally, use cv2.imshow(...) instead of cv2_imshow(...).
Downloading test images
!wget "http://*.jpg" -O "1.jpg"
!wget "https://*.jpg" -O "2.jpg"
Check image
import cv2
from google.colab.patches import cv2_imshow
im1 = cv2.imread("1.jpg")
#cv2.imshow("img", im1)
cv2_imshow(im1)
Face Detection
DeepFace.detectFace returns the normalized cropped face. With the mtcnn backend I got an image of shape (224, 224, 3). You can verify and view the image with:
from deepface import DeepFace
import cv2
from google.colab.patches import cv2_imshow
#backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
backends = ['mtcnn']
for backend in backends:
    #face detection and alignment
    detected_face = DeepFace.detectFace("1.jpg", detector_backend = backend)
    print(detected_face)
    print(detected_face.shape)
    im = cv2.cvtColor(detected_face * 255, cv2.COLOR_BGR2RGB)
    #cv2.imshow("image", im)
    cv2_imshow(im)
Output
[[[0.12156863 0.05882353 0.02352941]
[0.2901961 0.18039216 0.1254902 ]
[0.3137255 0.20392157 0.14901961]
...
[0.06666667 0.01176471 0.01176471]
[0.05882353 0.01176471 0.00784314]
[0.03921569 0.00784314 0.00392157]]
[[0.26666668 0.2 0.16470589]
[0.19215687 0.08235294 0.02745098]
[0.33333334 0.22352941 0.16862746]
...
[0.03921569 0.00392157 0.00392157]
[0.04313726 0.00784314 0.00784314]
[0.04313726 0. 0.00392157]]
[[0.11764706 0.05098039 0.01568628]
[0.21176471 0.10588235 0.05882353]
[0.44313726 0.3372549 0.27058825]
...
[0.02352941 0.00392157 0. ]
[0.02352941 0.00392157 0. ]
[0.02745098 0. 0. ]]
...
[[0.24313726 0.1882353 0.13725491]
[0.24313726 0.18431373 0.13725491]
[0.22745098 0.16470589 0.11372549]
...
[0.654902 0.69803923 0.78431374]
[0.62352943 0.67058825 0.7529412 ]
[0.38431373 0.4117647 0.45882353]]
[[0.23529412 0.18039216 0.12941177]
[0.22352941 0.16862746 0.11764706]
[0.22745098 0.16470589 0.11764706]
...
[0.6392157 0.69803923 0.78039217]
[0.6156863 0.6745098 0.75686276]
[0.36862746 0.40392157 0.4627451 ]]
[[0.21568628 0.16862746 0.10980392]
[0.2 0.15294118 0.09803922]
[0.20784314 0.14901961 0.10196079]
...
[0.6313726 0.6901961 0.77254903]
[0.6039216 0.6627451 0.74509805]
[0.36078432 0.39607844 0.4509804 ]]]
(224, 224, 3)
Face Embedding
Since you are looking for the embedding vector, you can get it with the code below. It is a modified version of the verify function. I kept the option for two images, the distance calculation, and the verification, but you can modify it to generate the face embedding for a single face only. I did not remove any unused imports.
"""
Modified verify function for face embedding generation
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
"""
from keras.preprocessing import image
import warnings
warnings.filterwarnings("ignore")
import time
import os
from os import path
from pathlib import Path
import gdown
import numpy as np
import pandas as pd
from tqdm import tqdm
import json
import cv2
from keras import backend as K
import keras
import tensorflow as tf
import pickle
from deepface import DeepFace
from deepface.basemodels import VGGFace, OpenFace, Facenet, FbDeepFace, DeepID
from deepface.extendedmodels import Age, Gender, Race, Emotion
from deepface.commons import functions, realtime, distance as dst
def FaceEmbeddingAndDistance(img1_path, img2_path = '', model_name ='Facenet', distance_metric = 'cosine', model = None, enforce_detection = True, detector_backend = 'mtcnn'):

    #--------------------------------
    #ensemble learning disabled.

    if model == None:
        if model_name == 'VGG-Face':
            print("Using VGG-Face model backend and", distance_metric,"distance.")
            model = VGGFace.loadModel()
        elif model_name == 'OpenFace':
            print("Using OpenFace model backend", distance_metric,"distance.")
            model = OpenFace.loadModel()
        elif model_name == 'Facenet':
            print("Using Facenet model backend", distance_metric,"distance.")
            model = Facenet.loadModel()
        elif model_name == 'DeepFace':
            print("Using FB DeepFace model backend", distance_metric,"distance.")
            model = FbDeepFace.loadModel()
        elif model_name == 'DeepID':
            print("Using DeepID2 model backend", distance_metric,"distance.")
            model = DeepID.loadModel()
        elif model_name == 'Dlib':
            print("Using Dlib ResNet model backend", distance_metric,"distance.")
            from deepface.basemodels.DlibResNet import DlibResNet #this is not a must because it is very huge.
            model = DlibResNet()
        else:
            raise ValueError("Invalid model_name passed - ", model_name)
    else: #model != None
        print("Already built model is passed")

    #------------------------------
    #face recognition models have different size of inputs
    #my environment returns (None, 224, 224, 3) but some people mentioned that they got [(None, 224, 224, 3)]. I think this is because of version issue.

    if model_name == 'Dlib': #this is not a regular keras model
        input_shape = (150, 150, 3)
    else: #keras based models
        input_shape = model.layers[0].input_shape
        if type(input_shape) == list:
            input_shape = input_shape[0][1:3]
        else:
            input_shape = input_shape[1:3]

    input_shape_x = input_shape[0]
    input_shape_y = input_shape[1]

    #------------------------------
    #tuned thresholds for model and metric pair
    threshold = functions.findThreshold(model_name, distance_metric)

    #----------------------
    #crop and align faces
    img1 = functions.preprocess_face(img=img1_path, target_size=(input_shape_y, input_shape_x), enforce_detection = enforce_detection, detector_backend = detector_backend)
    img2 = functions.preprocess_face(img=img2_path, target_size=(input_shape_y, input_shape_x), enforce_detection = enforce_detection, detector_backend = detector_backend)

    #----------------------
    #find embeddings
    img1_representation = model.predict(img1)[0,:]
    img2_representation = model.predict(img2)[0,:]

    print("FACE 1 Embedding:")
    print(img1_representation)
    print("FACE 2 Embedding:")
    print(img2_representation)

    #----------------------
    #find distances between embeddings
    if distance_metric == 'cosine':
        distance = dst.findCosineDistance(img1_representation, img2_representation)
    elif distance_metric == 'euclidean':
        distance = dst.findEuclideanDistance(img1_representation, img2_representation)
    elif distance_metric == 'euclidean_l2':
        distance = dst.findEuclideanDistance(dst.l2_normalize(img1_representation), dst.l2_normalize(img2_representation))
    else:
        raise ValueError("Invalid distance_metric passed - ", distance_metric)

    print("DISTANCE")
    print(distance)

    #----------------------
    #decision
    if distance <= threshold:
        identified = "true"
    else:
        identified = "false"

    print("IDENTIFIED")
    print(identified)
The function above is called via:
FaceEmbeddingAndDistance("1.jpg", "2.jpg", model_name='Facenet', detector_backend = 'mtcnn')
Output
FACE 1 Embedding:
[-0.7229302 -1.766835 -1.5399052 0.59634393 1.203212 -1.693247
-0.90845925 0.5264039 2.148173 -0.9786542 -0.00369854 -1.2710322
-1.5515596 -0.4111185 -0.36896533 -0.30051672 0.35091963 0.5073533
-1.7270111 -0.5230838 0.3376239 -1.0811361 1.5242224 -0.6137103
-1.3100258 0.80050004 -0.7087368 -0.64483845 1.0830203 2.6056807
-0.76527536 -0.83047277 -0.7335422 -0.01964059 -0.86749244 2.9645889
-2.426583 -0.11157394 -2.3535717 -0.65058017 0.30864614 -0.77746457
-0.6233895 0.44898677 2.5578005 -0.583796 0.8406945 1.1105415
-1.652044 -0.6351479 0.07651432 -1.0454555 -1.8752071 0.50948805
-1.6050931 -1.1769634 -0.02965304 1.5107706 0.83292925 -0.5382068
-1.5981512 -0.6405941 0.5521577 0.22957848 0.506649 0.24680384
-0.91464925 -0.18441322 -0.6801975 -1.0448433 0.52288735 -0.79405725
0.5974493 -0.40668172 -0.00640235 -0.742475 0.1928863 0.31236258
-0.37383577 -1.5883486 -1.5336255 -0.74254227 -0.8524561 -1.4625055
-2.718953 -0.7180952 -1.2140683 -0.5232462 1.2576898 -1.1097553
2.3971314 0.8855096 -0.16556528 -0.07307663 -1.8778017 0.8690948
-0.39043528 -0.5494097 -2.2382076 0.7101087 0.15859437 0.2959841
0.8605075 -0.2040207 0.77952844 0.04542177 0.92514265 -1.988945
0.9418363 1.6509243 -0.20324889 0.2974357 0.37681833 1.095943
1.6308782 -1.2553837 -0.10246387 -1.4697052 -0.5832107 -0.34192032
-1.1347024 1.5154309 -0.00527111 -1.165709 -0.7296148 -0.20767921
1.2530949 -0.9487353 ]
FACE 2 Embedding:
[ 0.9399996 1.3996615 -1.2931366 0.6869738 -0.03219241 0.96111965
0.7378809 -0.24804354 -0.8128112 0.19901593 0.48911542 -0.91603553
-1.1671298 0.88576627 0.25427592 1.1395477 0.45400882 -1.4845027
-0.90582514 -1.1371222 0.47669724 1.2933927 1.4533392 -0.46943524
0.10245587 -1.4916894 -2.3223586 -0.10979578 1.7803721 1.0051152
-0.09164213 -0.64848715 -1.4191641 1.811776 0.73174113 0.2582223
-0.26430857 1.7021953 -1.0571098 -1.1215096 0.3606074 1.5136883
-0.30045512 0.26225814 -0.19101554 1.269355 1.0674374 -0.2550623
-1.0582973 1.7474637 -1.7739134 -0.67914337 -0.1877765 1.1581128
-2.281225 1.3955555 -1.2690883 -0.16299461 1.337664 -0.8831901
-0.6862674 2.0526903 -0.6325836 1.333468 -0.10851342 -0.64831966
-1.0277263 1.4572504 -0.29905424 -0.33187118 -0.54727656 1.1528811
0.12454037 -1.5835186 -0.2271783 1.3911225 1.0170195 0.5741334
-1.3088373 -0.5950714 -0.6856393 -0.910367 -2.0136826 -0.73777384
0.319223 -2.1968741 0.9673934 -0.604423 -0.08049382 -1.948634
1.88159 0.20169139 0.7295723 -1.0224706 1.2995481 -0.3402595
1.1711328 -0.64862376 0.42063504 -0.01502114 -0.7048841 1.4360497
-1.2988033 0.31773448 1.534014 0.98858756 1.3450235 -0.9417385
0.26414695 -0.01988658 0.7418235 -0.04945141 -0.44838902 1.5288658
-1.1905407 0.13961646 -0.17101136 -0.18599203 -1.9648114 0.66071814
-0.07431012 1.5870664 1.5989372 -0.21751085 0.78908855 -1.5576671
0.02266342 0.20999858]
DISTANCE
0.807837575674057
IDENTIFIED
false
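If you only need the embedding of a single image (to store in your DB), a minimal sketch distilled from the function above could look like the following. It assumes the same deepface version and imports as above, and that the model is built once and passed back in on subsequent calls:
def FaceEmbedding(img_path, model = None, model_name = 'Facenet', detector_backend = 'mtcnn', enforce_detection = True):
    #build the model once and reuse it for every image
    if model is None:
        model = Facenet.loadModel()

    #resolve the model's expected input size, as in the function above
    input_shape = model.layers[0].input_shape
    if type(input_shape) == list:
        input_shape = input_shape[0][1:3]
    else:
        input_shape = input_shape[1:3]

    #detect, align and normalize the face, then return its embedding vector
    img = functions.preprocess_face(img=img_path,
                                    target_size=(input_shape[1], input_shape[0]),
                                    enforce_detection=enforce_detection,
                                    detector_backend=detector_backend)
    return model.predict(img)[0, :]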

It becomes easier in DeepFace 0.0.41.
from deepface import DeepFace
from deepface.commons import functions
models = ['VGG-Face', 'Facenet', 'OpenFace', 'DeepFace', 'DeepID', 'Dlib']
model = DeepFace.build_model(models[0])
target_size = model.layers[0].input_shape
img1_path = "img1.jpg"
img2_path = "img2.jpg"
#detect and align
img1 = functions.preprocess_face(img1_path, target_size = target_size)
img2 = functions.preprocess_face(img2_path, target_size = target_size)
#find vector embeddings
img1_embedding = model.predict(img1)
img2_embedding = model.predict(img2)
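To answer the storage part of the question: an embedding is just a 1-D float vector, so it can be serialized (JSON, pickle, a BLOB column, etc.) and a new face can be matched by computing the cosine distance against every stored vector. A rough sketch built on the 0.0.41 snippet above; the 0.4 threshold is the VGG-Face/cosine value reported by verify in the question, treat it as an assumption for other models:
import json
import numpy as np

def cosine_distance(a, b):
    #1 - cosine similarity, same idea as dst.findCosineDistance
    a, b = np.asarray(a), np.asarray(b)
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

#store: flatten the embedding and serialize it next to the person's id
record = {"person_id": 1, "embedding": img1_embedding[0].tolist()}
db_rows = [json.dumps(record)]          #stand-in for a real table

#search: embed the new face and compare against every stored vector
new_embedding = img2_embedding[0]
for row in db_rows:
    stored = json.loads(row)
    if cosine_distance(new_embedding, stored["embedding"]) <= 0.4:
        print("match:", stored["person_id"])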

Related

Image preprocessing code error of target detection neural network based on U-Net network architecture

Recently I found U-Net-based target detection network code, but the image preprocessing part of the code always fails with a missing "image_index" at the dataset definition. I tried many methods to solve it, but all failed.
This is the code:
import torch
from torch.utils.data import Dataset
import glob
from PIL import Image
from torchvision import transforms
from skimage.segmentation import mark_boundaries
from torchvision.transforms.functional import to_pil_image
from torchvision.transforms.transforms import Grayscale, RandomHorizontalFlip, Resize, ToTensor
import numpy as np
import matplotlib.pyplot as plt
import os
class InfraredDataset(Dataset):
    def __init__(self, dataset_dir, image_index):
        super(InfraredDataset, self).__init__()
        self.dataset_dir = dataset_dir
        self.image_index = image_index
        self.transformer = transforms.Compose([
            Resize((256, 256)),
            Grayscale(),
            ToTensor(),
            RandomHorizontalFlip(0.5),
        ])

    def __getitem__(self, index):
        image_index = self.image_index[index].strip('\n')
        image_path = os.path.join(self.dataset_dir, 'images', '%s.png' % image_index)
        label_path = os.path.join(self.dataset_dir, 'masks', '%s_pixels0.png' % image_index)
        image = Image.open(image_path)
        label = Image.open(label_path)
        torch.manual_seed(1024)
        tensor_image = self.transformer(image)
        torch.manual_seed(1024)
        label = self.transformer(label)
        label[label > 0] = 1
        return tensor_image, label

    def __len__(self):
        return len(self.image_index)

if __name__ == "__main__":
    f = open('../sirst/idx_427/trainval.txt').readlines()
    ds = InfraredDataset(f)
    # dataset test
    for i, (image, label) in enumerate(ds):
        image, label = to_pil_image(image), to_pil_image(label)
        image, label = np.array(image), np.array(label)
        print(image.shape, label.shape)
        vis = mark_boundaries(image, label, color=(1, 1, 0))
        image, label = np.stack([image] * 3, -1), np.stack([label] * 3, -1)
        plt.imsave('image_%d.png' % i, vis)
This is the error:
Traceback (most recent call last):
File "H:/ProgramData/Infrared-detect-by-segmentation-master/Infrared-detect-by-segmentation-master/utils/dataloader.py", line 55, in <module>
ds = InfraredDataset(f)
TypeError: __init__() missing 1 required positional argument: 'image_index'
I tried a lot of methods but didn't find a solution. I hope someone can help, thank you!
As you can see, your __init__ defines two required arguments, "dataset_dir" and "image_index", but when you test the dataset module you construct InfraredDataset (as ds) with only one argument, while the class definition has not changed. To test the module you must pass both arguments, like this:
if __name__ == "__main__":
    dataset_dir = 'H:/ProgramData/Infrared-detect-by-segmentation-master/Infrared-detect-by-segmentation-master/sirst'
    image_index = open('H:/ProgramData/Infrared-detect-by-segmentation-master/Infrared-detect-by-segmentation-master/sirst/idx_320/val.txt').readlines()
    ds = InfraredDataset(dataset_dir, image_index)
Then the problem is solved.
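As a usage note (not from the original answer): once the dataset constructs correctly, it can be wrapped in a standard DataLoader for batched iteration. A minimal sketch, with a hypothetical dataset path:
from torch.utils.data import DataLoader

dataset_dir = '../sirst'                                    # hypothetical path to the dataset root
image_index = open('../sirst/idx_427/trainval.txt').readlines()
ds = InfraredDataset(dataset_dir, image_index)

# batch the (image, mask) pairs returned by __getitem__
loader = DataLoader(ds, batch_size=4, shuffle=True)
for images, labels in loader:
    print(images.shape, labels.shape)   # e.g. torch.Size([4, 1, 256, 256]) for both
    break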

How to Fix AttributeError: 'JpegImageFile' object has no attribute 'load_img'

I am a beginner and I am learning to code an image classifier. My goal is to create a predict function.
In this project I want to make a car prediction model. I have a problem: when I call load_img from keras.preprocessing, there is an error 'JpegImageFile' object has no attribute 'load_img'.
This is my code:
from google.colab.patches import cv2_imshow
import cv2
import glob
from keras.preprocessing import image
import numpy as np
ambil = glob.glob("*.jpg")
for foto in ambil:
    lol = cv2.imread(foto)
    with open(foto, 'rb') as f:
        np_image_string = np.array([f.read()])
    image = Image.open(foto)
    width, height = image.size
    gambar_masuk = np.array(image.getdata()).reshape(height, width, 3).astype(np.uint8)
    num_detections, detection_boxes, detection_classes, detection_scores, detection_masks, image_info = session.run(
        ['NumDetections:0', 'DetectionBoxes:0', 'DetectionClasses:0', 'DetectionScores:0', 'DetectionMasks:0', 'ImageInfo:0'],
        feed_dict={'Placeholder:0': np_image_string})
    num_detections = np.squeeze(num_detections.astype(np.int32), axis=(0,))
    detection_boxes = np.squeeze(detection_boxes * image_info[0, 2], axis=(0,))[0:num_detections]
    detection_scores = np.squeeze(detection_scores, axis=(0,))[0:num_detections]
    detection_classes = np.squeeze(detection_classes.astype(np.int32), axis=(0,))[0:num_detections]
    detection_boxes = detection_boxes[detection_classes==3]
    detection_scores = detection_scores[detection_classes==3]
    detection_boxes = detection_boxes[detection_scores>0.8]
    detection_boxes = detection_boxes.astype(int)
    print(detection_boxes)
    urut = 1
    for kotak in detection_boxes:
        hasil = lol[kotak[0]:kotak[2], kotak[1]:kotak[3], :]
        hasil_potong = 'hasil' + str(urut) + '.jpg'
        cv2.imwrite(hasil_potong, hasil)
        lihat = cv2.imread(hasil_potong)
        cv2_imshow(lihat)
        img = image.load_img(lihat, target_size = (size_, size_))
You are overwriting image. You have these two lines:
from keras.preprocessing import image
:
:
image = Image.open(foto)
You import image from keras.preprocessing, but then you overwrite it in the second shown line.
Either import image a different way, or use a different variable name for the opened image...
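A minimal sketch of the renaming fix (size_ is the asker's own variable, and note that Keras' load_img expects a file path such as hasil_potong, not a cv2 array):
from keras.preprocessing import image as keras_image   # renamed so the PIL image no longer shadows it
from PIL import Image

pil_img = Image.open(foto)            # PIL image, no longer called `image`
width, height = pil_img.size

# later, load the cropped file for the classifier from its path
img = keras_image.load_img(hasil_potong, target_size=(size_, size_))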

Realtime yolov5 detection with Desktop screen as input

I have a script that grabs an application's screenshot and displays it. It works quite nicely on my machine, like a video, at around 60 FPS.
import os
os.getcwd()
from PIL import ImageGrab
import numpy as np
import cv2
import pyautogui
import win32gui
import time
from mss import mss
from PIL import Image
import tempfile
os.system('calc')
sct = mss()
xx=1
tstart = time.time()
while xx < 10000:
    hwnd = win32gui.FindWindow(None, 'Calculator')
    left_x, top_y, right_x, bottom_y = win32gui.GetWindowRect(hwnd)
    #screen = np.array(ImageGrab.grab( bbox = (left_x, top_y, right_x, bottom_y ) ) )
    bbox = {'top': top_y, 'left': left_x, 'width': right_x-left_x, 'height': bottom_y-top_y}
    screen = sct.grab(bbox)
    scr = np.array(screen)
    cv2.imshow('window', scr)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
    xx += 1
cv2.destroyAllWindows()
tend = time.time()
print(xx/(tend-tstart))
print((tend-tstart))
os.system('taskkill /f /im calculator.exe')
I would like to run yolov5's detect.py on this scr image without having to save it to disk all the time. I'd also like to show the images with bounding boxes and have their coordinates saved somewhere.
My Python level is not good enough; I tried importing detect and adding arguments, but it doesn't seem to accept function parameters, only command-line arguments.
Perhaps I should adapt this line, or use opencv?
parser.add_argument('--source', type=str, default='data/images', help='source') # file/folder, 0 for webcam
Any idea? Thanks. (This is the detect.py file for yolov5.)
import argparse
import time
from pathlib import Path
import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random
from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, non_max_suppression, apply_classifier, scale_coords, xyxy2xywh, \
strip_optimizer, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized
def detect(save_img=False):
    source, weights, view_img, save_txt, imgsz = opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://'))

    # Directories
    save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Initialize
    set_logging()
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
    if half:
        model.half()  # to FP16

    # Second-stage classifier
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']).to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = True
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz)
    else:
        save_img = True
        dataset = LoadImages(source, img_size=imgsz)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

    # Run inference
    t0 = time.time()
    img = torch.zeros((1, 3, imgsz, imgsz), device=device)  # init img
    _ = model(img.half() if half else img) if device.type != 'cpu' else None  # run once
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Inference
        t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t2 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0 = Path(path[i]), '%g: ' % i, im0s[i].copy()
            else:
                p, s, im0 = Path(path), '', im0s

            save_path = str(save_dir / p.name)
            txt_path = str(save_dir / 'labels' / p.stem) + ('_%g' % dataset.frame if dataset.mode == 'video' else '')
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += '%g %ss, ' % (n, names[int(c)])  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or view_img:  # Add bbox to image
                        label = '%s %.2f' % (names[int(cls)], conf)
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

            # Print time (inference + NMS)
            print('%sDone. (%.3fs)' % (s, t2 - t1))

            # Stream results
            if view_img:
                cv2.imshow(str(p), im0)
                if cv2.waitKey(1) == ord('q'):  # q to quit
                    raise StopIteration

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'images':
                    cv2.imwrite(save_path, im0)
                else:
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer

                        fourcc = 'mp4v'  # output video codec
                        fps = vid_cap.get(cv2.CAP_PROP_FPS)
                        w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                        h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*fourcc), fps, (w, h))
                    vid_writer.write(im0)

    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        print(f"Results saved to {save_dir}{s}")

    print('Done. (%.3fs)' % (time.time() - t0))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='data/images', help='source')  # file/folder, 0 for webcam
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
    print(opt)

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()
EDIT: I already have weights saved somewhere and am able to run detect on images that are saved on disk; I would just like to skip that step to keep those FPS.
The Yolov5 repo is here
For standalone inference in third-party projects or repos, importing your model into the Python workspace with PyTorch Hub is the recommended method. See the YOLOv5 PyTorch Hub tutorial here, specifically the section on loading custom models.
https://github.com/ultralytics/yolov5#tutorials
Custom Models
This example loads a custom 20-class VOC-trained YOLOv5s model 'yolov5s_voc_best.pt' with PyTorch Hub.
import torch
model = torch.hub.load('ultralytics/yolov5', 'custom', path_or_model='yolov5s_voc_best.pt')
model = model.autoshape() # for PIL/cv2/np inputs and NMS
Then once the model is loaded:
from PIL import Image
# Images
img1 = Image.open('zidane.jpg')
img2 = Image.open('bus.jpg')
imgs = [img1, img2] # batched list of images
# Inference
result = model(imgs, size=640) # includes NMS
result.print()
import cv2
import torch
from mss import mss
from PIL import Image  # needed for Image.frombytes below
import numpy as np

model = torch.hub.load("/yolov5", 'custom', path="yolov5/best.pt", source='local')
sct = mss()

while 1:
    w, h = 1920, 1080
    monitor = {'top': 0, 'left': 0, 'width': w, 'height': h}
    img = Image.frombytes('RGB', (w, h), sct.grab(monitor).rgb)
    screen = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    # run the model on the screen grab
    result = model(screen, size=640)
    cv2.imshow('Screen', result.render()[0])
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
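If you also want the box coordinates saved somewhere, as asked in the question, the Hub results object exposes them as a pandas DataFrame (the same call that appears in the next answer). A small sketch that could sit inside the loop above, right after the model call:
    # inside the while-loop, after: result = model(screen, size=640)
    df = result.pandas().xyxy[0]    # one row per box: xmin, ymin, xmax, ymax, confidence, class, name
    print(df[['xmin', 'ymin', 'xmax', 'ymax', 'name']])
    df.to_csv('detections.csv', mode='a', header=False, index=False)   # append the coordinates to disk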
I'm a noob in programming. An example of using the desktop screen to run inference can be found on yolov5's GitHub page:
https://github.com/ultralytics/yolov5/issues/36
import cv2
import numpy
import torch
from mss import mss
from PIL import ImageGrab
im = numpy.array(ImageGrab.grab(bbox=(0,0,1920,1080)))
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
model.conf = 0.6
image = r'D:\i\test\yolov5-master(original)\yolov5-master\data\images\zidane.jpg'
results = model(im)
results.print()
results.show()
print(results.pandas().xyxy[0])
I have found that mss().grab() has an RGB order issue, so I use PIL instead.
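For what it's worth, the channel order can also be fixed without PIL by converting the raw mss frame explicitly. A sketch, assuming the BGRA layout that np.array(sct.grab(...)) normally returns and a model loaded as in the snippets above:
import cv2
import numpy as np
from mss import mss

sct = mss()
frame = np.array(sct.grab({'top': 0, 'left': 0, 'width': 1920, 'height': 1080}))  # BGRA screen capture
rgb = cv2.cvtColor(frame, cv2.COLOR_BGRA2RGB)   # reorder channels before passing to the model
results = model(rgb)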

Combine TensorFlow Object Detection API with Keras Model

TensorFlow version: 1.14
Python version: 3.6.9
My purpose is to build an object detection system with classification. I used the Object Detection API and I want to feed its output bounding boxes to other neural networks (there are 6 different objects to detect, and I then want to classify these objects with Keras neural networks based on the objects' features).
When I use the Object Detection API alone it is OK, but if I also call model.predict() the script crashes. From what I've read, there is a problem with graphs and sessions.
I'm pretty new to all this, so I want to ask: is it possible to use multiple models simultaneously?
I've read about creating two sessions and graphs, but the input of the Object Detection model is live video from the webcam and I don't want to lose the script's performance. I tried starting a session for each frame, but it's very slow.
Also, would upgrading the script to TensorFlow 2.0 perhaps help?
EDIT:
I want to detect fruits and pass them to other Keras models, which will predict their state. Detecting the fruits works well, but I cannot use the additional Keras model because of the following error:
Tensor Tensor("dense_3/Sigmoid:0", shape=(?, 1), dtype=float32) is not an element of this graph.
Code provided:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from keras import models
from keras.preprocessing import image
import cv2
if 'cap' in globals():
    cap.release()
cap = cv2.VideoCapture(0)
sys.path.append("..")
graph = tf.get_default_graph()
from utils import label_map_util
from utils import visualization_utils as vis_util
def limit(value, max_val, min_val):
    if(value > max_val):
        value = max_val
    elif(value < min_val):
        value = min_val
    return value
# What model to download.
MODEL_NAME = 'inference_graph'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = 'training/labelmap.pbtxt'
NUM_CLASSES = 6
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
def load_image_into_numpy_array_updated(image):
    return np.array(image).astype(np.uint8)
# PATH_TO_TEST_IMAGES_DIR = 'test_images'
# TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
# Loading a keras model
model = models.load_model('new_banana.h5')
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            ret, image_np = cap.read()
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            image_np_copy = image_np.copy()
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8,
                min_score_thresh=0.7)

            # Code used to get thresholded bounding boxes from the image,
            # enlarge them by the compenser value, clamp them,
            # print them and send them to another script
            # 0 - apple, 2 - banana, 3 - orange, 4 - pear, 5 - pepper, 6 - tomato
            min_score_thresh = 0.7
            bboxes = boxes[scores > min_score_thresh]
            bclasses = classes[scores > min_score_thresh]
            image_np_new = cv2.resize(image_np_copy, (800, 600))
            im_width, im_height = (800, 600)

            if bclasses.size > 0:
                final_box = []
                cropped_images = []
                compenser = 30
                if(bclasses[0] == 2):  # if any of detected classes stands for 'banana'
                    for box in bboxes:
                        ymin, xmin, ymax, xmax = box
                        ymin0 = int(im_height * ymin) - compenser
                        ymax0 = int(im_height * ymax) + compenser
                        xmin0 = int(im_width * xmin) - compenser
                        xmax0 = int(im_width * xmax) + compenser
                        ymin1 = limit(ymin0, im_height, 0)
                        ymax1 = limit(ymax0, im_height, 0)
                        xmax1 = limit(xmax0, im_width, 0)
                        xmin1 = limit(xmin0, im_width, 0)
                        image_cropped = image_np_new[ymin1:ymax1, xmin1:xmax1]
                        height, width, _ = image_cropped.shape
                        if width > height:
                            image_cropped = cv2.resize(image_cropped, (200, 150))
                            image_cropped = cv2.rotate(image_cropped, cv2.ROTATE_90_CLOCKWISE)
                        else:
                            image_cropped = cv2.resize(image_cropped, (150, 200))
                        image_cropped = load_image_into_numpy_array_updated(image_cropped)
                        image_cropped = image_cropped.reshape((1,) + image_cropped.shape)
                        image_cropped = image_cropped / 255
                        cropped_images.append(image_cropped)

                    if (len(cropped_images) > 0):
                        for image in cropped_images:
                            print(image.shape)
                            # input tensor 200, 150, 3
                            classes = model.predict_classes(image, batch_size=10)
                            print(classes)

            cv2.imshow('object detection', image_np)
            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                cap.release()
                break
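There is no accepted answer reproduced here, but a commonly used workaround for this particular Keras/TF 1.x error (a hedged sketch, not taken from the original post) is to capture the graph the Keras model was loaded into and run every predict call inside it, so it does not collide with the detection graph and session:
from keras import models
import tensorflow as tf

# load the Keras classifier once, then remember the graph it was created in
keras_model = models.load_model('new_banana.h5')
keras_graph = tf.get_default_graph()

def classify_crop(image_cropped):
    # run the prediction inside the Keras model's own graph,
    # independent of the object-detection graph/session
    with keras_graph.as_default():
        return keras_model.predict_classes(image_cropped, batch_size=10)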

How to use two models in Tensorflow object Detection API

In the TensorFlow Object Detection API we are using the ssd_mobilenet_v1_coco_2017_11_17 model to detect 90 general objects. I want to use this model for detection.
Next, I have trained the faster_rcnn_inception_v2_coco_2018_01_28 model to detect a custom object. I wish to use both in the same code, so that I can detect those 90 objects as well as my newly trained custom object. How can I achieve this in a single script?
I have achieved this with the following code in detect_object.py:
import numpy as np
import tensorflow as tf
import sys
from PIL import Image
import cv2
from utils import label_map_util
from utils import visualization_utils as vis_util
# ------------------ Knife Model Initialization ------------------------------ #
knife_label_map = label_map_util.load_labelmap('training/labelmap.pbtxt')
knife_categories = label_map_util.convert_label_map_to_categories(
knife_label_map, max_num_classes=1, use_display_name=True)
knife_category_index = label_map_util.create_category_index(knife_categories)
knife_detection_graph = tf.Graph()
with knife_detection_graph.as_default():
    knife_od_graph_def = tf.GraphDef()
    with tf.gfile.GFile('inference_graph_3/frozen_inference_graph.pb', 'rb') as fid:
        knife_serialized_graph = fid.read()
        knife_od_graph_def.ParseFromString(knife_serialized_graph)
        tf.import_graph_def(knife_od_graph_def, name='')

knife_session = tf.Session(graph=knife_detection_graph)
knife_image_tensor = knife_detection_graph.get_tensor_by_name('image_tensor:0')
knife_detection_boxes = knife_detection_graph.get_tensor_by_name(
    'detection_boxes:0')
knife_detection_scores = knife_detection_graph.get_tensor_by_name(
    'detection_scores:0')
knife_detection_classes = knife_detection_graph.get_tensor_by_name(
    'detection_classes:0')
knife_num_detections = knife_detection_graph.get_tensor_by_name(
    'num_detections:0')
# ---------------------------------------------------------------------------- #
# ------------------ General Model Initialization ---------------------------- #
general_label_map = label_map_util.load_labelmap('data/mscoco_label_map.pbtxt')
general_categories = label_map_util.convert_label_map_to_categories(
general_label_map, max_num_classes=90, use_display_name=True)
general_category_index = label_map_util.create_category_index(
general_categories)
general_detection_graph = tf.Graph()
with general_detection_graph.as_default():
    general_od_graph_def = tf.GraphDef()
    with tf.gfile.GFile('ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb', 'rb') as fid:
        general_serialized_graph = fid.read()
        general_od_graph_def.ParseFromString(general_serialized_graph)
        tf.import_graph_def(general_od_graph_def, name='')

general_session = tf.Session(graph=general_detection_graph)
general_image_tensor = general_detection_graph.get_tensor_by_name(
    'image_tensor:0')
general_detection_boxes = general_detection_graph.get_tensor_by_name(
    'detection_boxes:0')
general_detection_scores = general_detection_graph.get_tensor_by_name(
    'detection_scores:0')
general_detection_classes = general_detection_graph.get_tensor_by_name(
    'detection_classes:0')
general_num_detections = general_detection_graph.get_tensor_by_name(
    'num_detections:0')
# ---------------------------------------------------------------------------- #
def knife(image_path):
    try:
        image = cv2.imread(image_path)
        image_expanded = np.expand_dims(image, axis=0)
        (boxes, scores, classes, num) = knife_session.run(
            [knife_detection_boxes, knife_detection_scores,
             knife_detection_classes, knife_num_detections],
            feed_dict={knife_image_tensor: image_expanded})
        classes = np.squeeze(classes).astype(np.int32)
        scores = np.squeeze(scores)
        boxes = np.squeeze(boxes)
        for c in range(0, len(classes)):
            class_name = knife_category_index[classes[c]]['name']
            if class_name == 'knife' and scores[c] > .80:
                confidence = scores[c] * 100
                break
            else:
                confidence = 0.00
    except:
        print("Error occurred in knife detection")
        confidence = 0.0  # Some error has occurred
    return confidence
def general(image_path):
    try:
        image = cv2.imread(image_path)
        image_expanded = np.expand_dims(image, axis=0)
        (boxes, scores, classes, num) = general_session.run(
            [general_detection_boxes, general_detection_scores,
             general_detection_classes, general_num_detections],
            feed_dict={general_image_tensor: image_expanded})
        classes = np.squeeze(classes).astype(np.int32)
        scores = np.squeeze(scores)
        boxes = np.squeeze(boxes)
        object_name = []
        object_score = []
        for c in range(0, len(classes)):
            class_name = general_category_index[classes[c]]['name']
            if scores[c] > .30:  # If confidence level is good enough
                object_name.append(class_name)
                object_score.append(str(scores[c] * 100)[:5])
    except:
        print("Error occurred in general detection")
        object_name = ['']
        object_score = ['']
    return object_name, object_score
if __name__ == '__main__':
    print(' in main')
I can do
import detect_object
detect_object.knife("image.jpg") # to detect whether knife is present in image(this is custom trained model)
detect_object.general("image.jpg") # to detect those 90 objects from TF API
I know there is a knife class in the TF API model, but it is not that accurate, so I retrained it for knife only. Finally I have two models:
1. the first model detects only knives,
2. the second model detects general objects as usual.
You can't combine both models. Have two sections of code, each of which loads one model at a time and identifies whatever it can see in the image.
The other option is to re-train a single model that can identify all the objects you are interested in.
