Related
Currently I'm trying to train a Matterport Mask R-CNN with custom classes and a custom dataset on Colab. I followed this tutorial:
https://github.com/TannerGilbert/MaskRCNN-Object-Detection-and-Segmentation
Instead of all images having the same size, my images have different sizes. I spent hours matching the masks (.json) to the image sizes, but finally it is working.
The following lines load two sample images and display their segmentation masks:
# Load and display random samples
image_ids = np.random.choice(dataset_train.image_ids, 2)
for image_id in image_ids:
    image, img_height, img_width = dataset_train.load_image(image_id)
    mask, class_ids = dataset_train.load_mask(image_id, img_height, img_width)
    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)
load_image() in utils.py looks like this:
def load_image(self, image_id):
    """Load the specified image and return a [H,W,3] Numpy array.
    """
    # Load image
    image = skimage.io.imread(self.image_info[image_id]['path'])
    img_height, img_width, num_channels = image.shape
    # If grayscale. Convert to RGB for consistency.
    if image.ndim != 3:
        image = skimage.color.gray2rgb(image)
    # If has an alpha channel, remove it for consistency
    if num_channels == 4:
        image = image[..., :3]
    return image, img_height, img_width
load_mask() looks like this:
def load_mask(self, image_id, img_height, img_width):
    # get details of image
    info = self.image_info[image_id]
    # define annotation file location
    path = info['annotation']
    # load the JSON annotation
    masks, classes = self.extract_masks(path, img_height, img_width)
    return masks, np.asarray(classes, dtype='int32')
And extract_masks() looks like this:
def extract_masks(self, filename, img_height, img_width):
    json_file = os.path.join(filename)
    with open(json_file) as f:
        img_anns = json.load(f)
    masks = np.zeros([img_height, img_width, len(img_anns['shapes'])], dtype='uint8')
    classes = []
    for i, anno in enumerate(img_anns['shapes']):
        mask = np.zeros([img_height, img_width], dtype=np.uint8)
        cv2.fillPoly(mask, np.array([anno['points']], dtype=np.int32), 1)
        masks[:, :, i] = mask
        classes.append(self.class_names.index(anno['label']))
    return masks, classes
Now we get to the curious part...
After continuing with my code, creating the model...
# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)
...and choosing the weights to start with
# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last(), by_name=True)
...I get to the point where training should start:
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=5,
            layers='heads')
I'm receiving the following error:
/content/drive/My Drive/Colab/Mask_RCNN/mrcnn/model.py in load_image_gt()
1210 # Load image and mask
1211 image, img_height, img_width = dataset.load_image(image_id)
-> 1212 mask, class_ids = dataset.load_mask(image_id, img_height, img_width)
1213 original_shape = image.shape
1214 image, window, scale, padding, crop = utils.resize_image(
TypeError: load_mask() missing 2 required positional arguments: 'img_height' and 'img_width'
I noticed that load_image_gt() is different from the above-mentioned load_image(). I've already adjusted load_image_gt() as follows (I added #<------- to the relevant lines):
def load_image_gt(dataset, config, image_id, augment=False, augmentation=None,
                  use_mini_mask=False):
    """Load and return ground truth data for an image (image, mask, bounding boxes).
    augment: (deprecated. Use augmentation instead). If true, apply random
        image augmentation. Currently, only horizontal flipping is offered.
    augmentation: Optional. An imgaug (https://github.com/aleju/imgaug) augmentation.
        For example, passing imgaug.augmenters.Fliplr(0.5) flips images
        right/left 50% of the time.
    use_mini_mask: If False, returns full-size masks that are the same height
        and width as the original image. These can be big, for example
        1024x1024x100 (for 100 instances). Mini masks are smaller, typically,
        224x224 and are generated by extracting the bounding box of the
        object and resizing it to MINI_MASK_SHAPE.

    Returns:
    image: [height, width, 3]
    shape: the original shape of the image before resizing and cropping.
    class_ids: [instance_count] Integer class IDs
    bbox: [instance_count, (y1, x1, y2, x2)]
    mask: [height, width, instance_count]. The height and width are those
        of the image unless use_mini_mask is True, in which case they are
        defined in MINI_MASK_SHAPE.
    """
    # Load image and mask
    image, img_height, img_width = dataset.load_image(image_id)  #<-------
    mask, class_ids = dataset.load_mask(image_id, img_height, img_width)  #<-------
    original_shape = image.shape
    image, window, scale, padding, crop = utils.resize_image(
        image,
        min_dim=config.IMAGE_MIN_DIM,
        min_scale=config.IMAGE_MIN_SCALE,
        max_dim=config.IMAGE_MAX_DIM,
        mode=config.IMAGE_RESIZE_MODE)
    mask = utils.resize_mask(mask, scale, padding, crop)

    # Random horizontal flips.
    # TODO: will be removed in a future update in favor of augmentation
    if augment:
        logging.warning("'augment' is deprecated. Use 'augmentation' instead.")
        if random.randint(0, 1):
            image = np.fliplr(image)
            mask = np.fliplr(mask)

    # Augmentation
    # This requires the imgaug lib (https://github.com/aleju/imgaug)
    if augmentation:
        import imgaug

        # Augmenters that are safe to apply to masks
        # Some, such as Affine, have settings that make them unsafe, so always
        # test your augmentation on masks
        MASK_AUGMENTERS = ["Sequential", "SomeOf", "OneOf", "Sometimes",
                           "Fliplr", "Flipud", "CropAndPad",
                           "Affine", "PiecewiseAffine"]

        def hook(images, augmenter, parents, default):
            """Determines which augmenters to apply to masks."""
            return augmenter.__class__.__name__ in MASK_AUGMENTERS

        # Store shapes before augmentation to compare
        image_shape = image.shape
        mask_shape = mask.shape
        # Make augmenters deterministic to apply similarly to images and masks
        det = augmentation.to_deterministic()
        image = det.augment_image(image)
        # Change mask to np.uint8 because imgaug doesn't support np.bool
        mask = det.augment_image(mask.astype(np.uint8),
                                 hooks=imgaug.HooksImages(activator=hook))
        # Verify that shapes didn't change
        assert image.shape == image_shape, "Augmentation shouldn't change image size"
        assert mask.shape == mask_shape, "Augmentation shouldn't change mask size"
        # Change mask back to bool
        mask = mask.astype(np.bool)

    # Note that some boxes might be all zeros if the corresponding mask got cropped out,
    # and here is where we filter them out
    _idx = np.sum(mask, axis=(0, 1)) > 0
    mask = mask[:, :, _idx]
    class_ids = class_ids[_idx]
    # Bounding boxes. Note that some boxes might be all zeros
    # if the corresponding mask got cropped out.
    # bbox: [num_instances, (y1, x1, y2, x2)]
    bbox = utils.extract_bboxes(mask)

    # Active classes
    # Different datasets have different classes, so track the
    # classes supported in the dataset of this image.
    active_class_ids = np.zeros([dataset.num_classes], dtype=np.int32)
    source_class_ids = dataset.source_class_ids[dataset.image_info[image_id]["source"]]
    active_class_ids[source_class_ids] = 1

    # Resize masks to smaller size to reduce memory usage
    if use_mini_mask:
        mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)

    # Image meta data
    image_meta = compose_image_meta(image_id, original_shape, image.shape,
                                    window, scale, active_class_ids)

    return image, img_height, img_width, image_meta, class_ids, bbox, mask  #<-------
I don't know why the two required arguments are missing, because img_height and img_width are defined, aren't they?
In my opinion the code is exactly the same as the "# Load and display random samples" code mentioned above.
I would be very grateful if someone could help.
Many thanks in advance!
You must set the width and height values in load_yourdatasetname() via self.add_image, and retrieve them in the load_mask function.
Example:
class Covid19Dataset(utils.Dataset):

    def load_covid19(self, dataset_dir, subset):
        """Load a subset of the covid-19 dataset.
        dataset_dir: Root directory of the dataset.
        subset: Subset to load: train or val
        """
        # Add classes. We have three classes to add.
        self.add_class("covid19", 1, "lung right")
        self.add_class("covid19", 2, "lung left")
        self.add_class("covid19", 3, "infection")
        # Train or validation dataset?
        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)
        dataset_Image_dir = os.path.join(dataset_dir, "Images")
        dataset_Labels_dir = os.path.join(dataset_dir, "Labels")
        # get image names in dataset directory
        filenamesInDir = [f for f in listdir(dataset_Image_dir) if isfile(join(dataset_Image_dir, f))]
        png_filenames = []
        for name in filenamesInDir:
            # Skip if file does not end with .png
            if not name.endswith(".png"):
                continue
            png_filenames.append(name)
        # Add images
        for a in png_filenames:
            image_path = os.path.join(dataset_Image_dir, a)
            label_path = os.path.join(dataset_Labels_dir, a)
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]
            self.add_image(
                "covid19",
                image_id=a,  # use file name as a unique image id
                path=image_path,
                label_path=label_path,
                width=width, height=height)

    def load_mask(self, image_id):
        """Generate instance masks for an image.
        Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a covid19 dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "covid19":
            return super(self.__class__, self).load_mask(image_id)
        # [height, width, instance_count]
        info = self.image_info[image_id]
        label = skimage.io.imread(info["label_path"])
        Class_names = np.unique(label)
        Class_names = Class_names[1:]
        mask = np.zeros([info["height"], info["width"], len(Class_names)],
                        dtype=np.uint8)
        class_ids = []
        for i, p in enumerate(Class_names):
            mask[label == p, i] = 1
            class_ids.append(i + 1)
        # Return mask, and array of class IDs of each instance.
        class_ids = np.array(class_ids, dtype=np.int32)
        mask = np.rot90(mask)
        return mask.astype(np.bool), class_ids

    def image_reference(self, image_id):
        """Return the path of the image."""
        info = self.image_info[image_id]
        if info["source"] == "covid19":
            return info["path"]
        else:
            super(self.__class__, self).image_reference(image_id)

    def load_image(self, image_id):
        """Load the specified image and return a [H,W,3] Numpy array.
        """
        # Load image
        image = skimage.io.imread(self.image_info[image_id]['path'])
        # image = rotate(image, 90, resize=False)
        image = np.rot90(image)
        # If grayscale. Convert to RGB for consistency.
        if image.ndim != 3:
            image = skimage.color.gray2rgb(image)
        # If has an alpha channel, remove it for consistency
        if image.shape[-1] == 4:
            image = image[..., :3]
        return image
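For completeness, here is a minimal sketch of how such a dataset class is usually wired into training with the Matterport API. DATASET_DIR is a placeholder for your own dataset root, not part of the example above:
# hypothetical usage sketch; DATASET_DIR is a placeholder path
dataset_train = Covid19Dataset()
dataset_train.load_covid19(DATASET_DIR, "train")
dataset_train.prepare()  # Dataset.prepare() builds the internal class/image indices

dataset_val = Covid19Dataset()
dataset_val.load_covid19(DATASET_DIR, "val")
dataset_val.prepare()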
I am a novice in computer vision. I am trying to implement real-time face recognition with Local Binary Patterns, with the face detection part based on the deep learning dnn module. I am using the caltech_faces dataset and have added a folder with 20 photos of myself to it.
So, this is my code. I basically transformed the face recognition code for sample images into real-time face recognition by making some changes and additions.
I get the following error when executing the below code:
predName = le.inverse_transform([predictions[i]])[0]
^
TabError: inconsistent use of tabs and spaces in indentation
I checked all the tabs and indentation, and can't find what to fix or where. I kindly ask you to give me a hint on what to do. Thank you very much!
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils.video import VideoStream
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import imutils
import time
import cv2
import os

# Creating our face detector
def detect_faces(net, frame, minConfidence=0.5):
    # grab the dimensions of the image and then construct a blob
    # from it
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300),
                                 (104.0, 177.0, 123.0))
    # pass the blob through the network to obtain the face detections,
    # then initialize a list to store the predicted bounding boxes
    net.setInput(blob)
    detections = net.forward()
    boxes = []
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the detection
        confidence = detections[0, 0, i, 2]
        # filter out weak detections by ensuring the confidence is
        # greater than the minimum confidence
        if confidence > minConfidence:
            # compute the (x, y)-coordinates of the bounding box for
            # the object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # update our bounding box results list
            boxes.append((startX, startY, endX, endY))
    # return the face detection bounding boxes
    return boxes

# Loading the CALTECH Faces dataset
def load_face_dataset(inputPath, net, minConfidence=0.5,
                      minSamples=15):
    # grab the paths to all images in our input directory, extract
    # the name of the person (i.e., class label) from the directory
    # structure, and count the number of example images we have per
    # face
    imagePaths = list(paths.list_images(inputPath))
    names = [p.split(os.path.sep)[-2] for p in imagePaths]
    (names, counts) = np.unique(names, return_counts=True)
    names = names.tolist()
    # initialize lists to store our extracted faces and associated
    # labels
    faces = []
    labels = []
    # loop over the image paths
    for imagePath in imagePaths:
        # load the image from disk and extract the name of the person
        # from the subdirectory structure
        frame = cv2.imread(imagePath)
        name = imagePath.split(os.path.sep)[-2]
        # only process images that have a sufficient number of
        # examples belonging to the class
        if counts[names.index(name)] < minSamples:
            continue
        # perform face detection
        boxes = detect_faces(net, frame, minConfidence)
        # loop over the bounding boxes
        for (startX, startY, endX, endY) in boxes:
            # extract the face ROI, resize it, and convert it to
            # grayscale
            faceROI = frame[startY:endY, startX:endX]
            faceROI = cv2.resize(faceROI, (47, 62))
            faceROI = cv2.cvtColor(faceROI, cv2.COLOR_BGR2GRAY)
            # update our faces and labels lists
            faces.append(faceROI)
            labels.append(name)
    # convert our faces and labels lists to NumPy arrays
    faces = np.array(faces)
    labels = np.array(labels)
    # return a 2-tuple of the faces and labels
    return (faces, labels)

# Implementing Local Binary Patterns for face recognition
# # construct the argument parser and parse the arguments
# ap = argparse.ArgumentParser()
# ap.add_argument("-i", "--input", type=str, required=True,
#     help="path to input directory of images")
# ap.add_argument("-f", "--face", type=str,
#     default="face_detector",
#     help="path to face detector model directory")
# ap.add_argument("-c", "--confidence", type=float, default=0.5,
#     help="minimum probability to filter weak detections")
# args = vars(ap.parse_args())

# since we are using Jupyter Notebooks we can replace our argument
# parsing code with *hard coded* arguments and values
args = {
    "input": "caltech_faces",
    "face": "face_detector",
    "confidence": 0.5,
}

# load our serialized face detector model from disk
print("[INFO] loading face detector model...")
prototxtPath = os.path.sep.join([args["face"], "deploy.prototxt"])
weightsPath = os.path.sep.join([args["face"],
                                "res10_300x300_ssd_iter_140000.caffemodel"])
net = cv2.dnn.readNet(prototxtPath, weightsPath)

# load the CALTECH faces dataset
print("[INFO] loading dataset...")
(faces, labels) = load_face_dataset(args["input"], net,
                                    minConfidence=0.5, minSamples=20)
print("[INFO] {} images in dataset".format(len(faces)))

# encode the string labels as integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# construct our training and testing split
(trainX, testX, trainY, testY) = train_test_split(faces,
    labels, test_size=0.25, stratify=labels, random_state=42)

# train our LBP face recognizer
print("[INFO] training face recognizer...")
recognizer = cv2.face.LBPHFaceRecognizer_create(
    radius=2, neighbors=16, grid_x=8, grid_y=8)
start = time.time()
recognizer.train(trainX, trainY)
end = time.time()
print("[INFO] training took {:.4f} seconds".format(end - start))

# initialize the list of predictions and confidence scores
print("[INFO] gathering predictions...")
predictions = []
confidence = []
start = time.time()

# loop over the test data
for i in range(0, len(testX)):
    # classify the face and update the list of predictions and
    # confidence scores
    (prediction, conf) = recognizer.predict(testX[i])
    predictions.append(prediction)
    confidence.append(conf)

# measure how long making predictions took
end = time.time()
print("[INFO] inference took {:.4f} seconds".format(end - start))

# show the classification report
print(classification_report(testY, predictions,
    target_names=le.classes_))

# initialize the video stream and allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)
# loop over the frames from the video stream
while True:
    # grab the frame from the threaded video stream and resize it
    # to have a maximum width of 400 pixels
    face = vs.read()
    face = imutils.resize(face, width=400)
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # grab the predicted name and actual name
    predName = le.inverse_transform([predictions[i]])[0]
    actualName = le.classes_[testY[i]]
    # draw the predicted name and actual name on the image
    cv2.putText(face, "pred: {}".format(predName), (5, 25),
        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.putText(face, "actual: {}".format(actualName), (5, 60),
        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
    # display the predicted name, actual name, and confidence of the
    # prediction (i.e., chi-squared distance; the *lower* the distance
    # is the *more confident* the prediction is)
    print("[INFO] prediction: {}, actual: {}, confidence: {:.2f}".format(predName, actualName, confidence[i]))
    # show the output frame
    cv2.imshow("Face", face)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
You have a for loop that contains nothing but a comment, right before the line that causes the problem:
# loop over the detections
for i in range(0, detections.shape[2]):
    # grab the predicted name and actual name
predName = le.inverse_transform([predictions[i]])[0]
actualName = le.classes_[testY[i]]
The problem comes from this empty loop; if you have a loop, you must have at least one line of code inside. So delete it or add the pass keyword inside.
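For illustration, the second option looks like this; it is just the generic pattern, not the real per-detection logic you eventually want inside the loop:
# keep the loop syntactically valid with a placeholder body
for i in range(0, detections.shape[2]):
    pass  # replace with the real per-detection code, indented under the for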
I am using Google Colab for this. First of all, make sure you have OpenCV installed. You can install it using pip:
pip install opencv-python
Before detecting the face, we first have to open the web camera in Google Colab.
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def take_photo(filename='photo.jpg', quality=0.8):
    js = Javascript('''
        async function takePhoto(quality) {
            const div = document.createElement('div');
            const capture = document.createElement('button');
            capture.textContent = 'Capture';
            div.appendChild(capture);

            const video = document.createElement('video');
            video.style.display = 'block';
            const stream = await navigator.mediaDevices.getUserMedia({video: true});

            document.body.appendChild(div);
            div.appendChild(video);
            video.srcObject = stream;
            await video.play();

            // Resize the output to fit the video element.
            google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

            // Wait for Capture to be clicked.
            await new Promise((resolve) => capture.onclick = resolve);

            const canvas = document.createElement('canvas');
            canvas.width = video.videoWidth;
            canvas.height = video.videoHeight;
            canvas.getContext('2d').drawImage(video, 0, 0);
            stream.getVideoTracks()[0].stop();
            div.remove();
            return canvas.toDataURL('image/jpeg', quality);
        }
    ''')
    display(js)
    data = eval_js('takePhoto({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename
As the second step, you have to run the code below.
from IPython.display import Image
try:
    filename = take_photo()
    print('Saved to {}'.format(filename))
    # Show the image which was just taken.
    display(Image(filename))
except Exception as err:
    # Errors will be thrown if the user does not have a webcam or if they do not
    # grant the page permission to access it.
    print(str(err))
After running these two code cells, the web camera opens and you can capture a photo.
The photo is saved as photo.jpg.
Face detection using Haar cascades is a machine learning-based approach where a cascade function is trained on a set of input data. OpenCV already contains many pre-trained classifiers for faces, eyes, smiles, etc. Today we will be using the face classifier; you can experiment with other classifiers as well.
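As a rough sketch (assuming the photo.jpg captured above and OpenCV's bundled haarcascade_frontalface_default.xml), face detection on the saved image could look like this:
import cv2

# load the pre-trained frontal face classifier shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('photo.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) boxes; tune scaleFactor/minNeighbors as needed
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# write the result to disk (cv2.imshow does not work in Colab)
cv2.imwrite('faces_detected.jpg', img)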
I am using PyTorch for object detection and refining an existing model (transfer learning) as described in the following link -
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
While different transformations are used for image augmentation (horizontal flip in this tutorial), the tutorial doesn't mention anything about transforming the bounding boxes/annotations to keep them in line with the transformed image. Am I missing something basic here?
In the training phase, the transforms are indeed applied to both the images and the targets while loading the data. In the PennFudanDataset class, we have these two lines:
if self.transforms is not None:
    img, target = self.transforms(img, target)
where target is a dictionary containing:
target = {}
target["boxes"] = boxes
target["labels"] = labels
target["masks"] = masks
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd
self.transforms in the PennFudanDataset class is set to the return value of get_transform(), which composes the individual transforms (e.g. T.ToTensor()) with T.Compose(), when the dataset is instantiated with:
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
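For context, get_transform() in that tutorial is a small helper along these lines (reproduced from memory as a sketch; T here is the tutorial's custom transforms module):
import transforms as T  # the tutorial's custom detection transforms module

def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the image and its targets together
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)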
T.Compose() comes from T, the custom transforms module written for the object detection task. Specifically, in the __call__ of RandomHorizontalFlip(), both the image and the target (e.g., boxes, masks, keypoints) are processed.
For the sake of completeness, I borrow the code from the GitHub repo:
def __call__(self, image, target):
    if random.random() < self.prob:
        height, width = image.shape[-2:]
        image = image.flip(-1)
        bbox = target["boxes"]
        bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
        target["boxes"] = bbox
        if "masks" in target:
            target["masks"] = target["masks"].flip(-1)
        if "keypoints" in target:
            keypoints = target["keypoints"]
            keypoints = _flip_coco_person_keypoints(keypoints, width)
            target["keypoints"] = keypoints
    return image, target
Here we can see how the flipping is applied to the boxes, masks, and keypoints in accordance with the image.
I made a convolutional neural network that predicts faces and returns coordinates (y1, x1, y2, x2). I am able to create a rectangle that serves as a mask covering the desired coordinates. I need a way to cover the images in real time. Is there a way to get a live image sequence without saving the frames, just overwriting them, and how do I extract the coordinates in OpenCV? I was using pyplot and saving the images, which is slow and ineffective.
Yeah, so I managed to come up with a solution, but I found out that one frame takes about 0.54 s to compute (roughly 2 FPS), which is not great for live streaming, so I am switching to a Haar cascade.
The code below is used to configure and call the model.
from numpy import expand_dims
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.model import mold_image
import cv2
import time

# define the prediction configuration
class PredictionConfig(Config):
    # define the name of the configuration
    NAME = "face_cfg"
    # number of classes (background + face)
    NUM_CLASSES = 1 + 1
    # simplify GPU config
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

def classify_image(image, model, cfg):
    # convert pixel values (e.g. center)
    scaled_image = mold_image(image, cfg)
    # convert image into one sample
    sample = expand_dims(scaled_image, 0)
    # make prediction
    tic = time.time()
    yhat = model.detect(sample, verbose=0)[0]
    print(time.time() - tic)
    return yhat['rois']

def image_bnd_highlight(image, coordinates):
    for box in coordinates:
        # get coordinates
        y1, x1, y2, x2 = box
        # create the shape
        new_img = cv2.rectangle(image, (x1, y1), (x2, y2), (255, 255, 255), 5)
    return new_img

# create config
cfg: PredictionConfig = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
# load model weights
model_path = 'mask_rcnn_face_cfg_0029.h5'
model.load_weights(model_path, by_name=True)
definitive_model = model
Then I call the functions that I created above.
import cv2 as cv
import acapture
from RealTime import definitive_model
from RealTime import cfg
from RealTime import classify_image
from RealTime import image_bnd_highlight
import time

# cap = acapture.open(0)
cap = cv.VideoCapture(0)
cap.set(3, 128)  # set frame width
cap.set(4, 128)  # set frame height
cap.set(cv.CAP_PROP_FPS, 2)  # adjusting fps to 2
# cap.set(cv.CAP_PROP_BUFFERSIZE,3)
# if not cap.isOpened():
#     print("Cannot open camera")
#     exit()

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    # if frame is read correctly ret is True
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break
    # let's resize our image to be 150 pixels wide, but in order to
    # prevent our resized image from being skewed/distorted, we must
    # first calculate the ratio of the *new* width to the *old* width
    r = 150.0 / frame.shape[1]
    dim = (150, int(frame.shape[0] * r))
    # perform the actual resizing of the image
    resized = cv.resize(frame, dim, interpolation=cv.INTER_AREA)
    # tic = time.time()
    coords = classify_image(resized, definitive_model, cfg)
    # print(time.time() - tic)
    image = image_bnd_highlight(resized, coords)
    # Display the resulting frame
    cv.imshow('frame', image)
    if cv.waitKey(1) == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv.destroyAllWindows()
I am trying to feed my image ROI into the TensorFlow classifier I took from here. The idea is to first run a simple filter, get rectangle candidates, and then check (using the network) whether each rectangle (ROI) is actually what I am looking for.
class ScrewDetector:
    def __init__(self):
        self.session = None  # an internal variable needed for the Inception network

        # to keep the screw data in
        self.screw_data = dict()

        # load the labels of the classification: screw / non-screw
        self.class_labels = [line.rstrip() for line in tf.gfile.GFile(home + "/imagine_weights/screw_detector/retrained_labels.txt")]

        # prepare the network
        with tf.gfile.FastGFile(home + "/weights/screw_detector/retrained_graph.pb", 'rb') as f:
            graph_def = tf.GraphDef()  # the graph_def is a saved copy of a TensorFlow graph, object initialization
            graph_def.ParseFromString(f.read())  # parse serialized protocol buffer data into variable
            _ = tf.import_graph_def(graph_def, name='')  # import a serialized TensorFlow GraphDef protocol buffer, extract objects in the GraphDef as tf.Tensor

        # start the session
        with tf.Session() as self.session:
            self.softmax_tensor = self.session.graph.get_tensor_by_name('final_result:0')

    def detect_screw(self):
        # get a copy and resize it
        img_raw = self.cv_image.copy()
        resized_img = cv2.resize(img_raw, (0, 0), fx=RESIZE_FACTOR, fy=RESIZE_FACTOR)

        # grayscale it
        gray = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)

        # detect circles in the image
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, 1, 100, param1=50, param2=35, minRadius=15, maxRadius=30)

        # ensure at least some circles were found
        if circles is not None:
            # convert the (x, y) coordinates and radius of the circles to integers
            circles = np.round(circles[0, :]).astype("int")

            # get a counter
            screw_id = 0

            # loop over the (x, y) coordinates and radius of the circles
            for (x, y, r) in circles:
                # draw the circle in the output image, then draw a rectangle corresponding to the center of the circle
                # cv2.circle(resized_img, (x, y), r, (0, 255, 0), 4)
                cv2.rectangle(resized_img, (x - r, y - r), (x + r, y + r), (0, 0, 255), 5)

                # get the above rectangle as ROI
                screw_roi = resized_img[y:y+r, x:x+r]

                # feed it into the network
                # import IPython; IPython.embed()
                predictions = self.session.run(self.softmax_tensor, feed_dict={screw_id: [screw_roi.flatten()]})

                # get prediction values in array back
                top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]

                # output
                for node_id in top_k:
                    human_string = self.class_labels[node_id]
                    score = predictions[0][node_id]
                    print('%s (score = %.5f)' % (human_string, score))

                # if it is a screw, go on, save its coordinates and append into the network
                # remap in the original image
                scaled_point = (round(x * (1/RESIZE_FACTOR)), round(y * (1/RESIZE_FACTOR)))

                # append to the dict
                self.screw_data[scaled_point] = r * RESIZE_FACTOR

                # iterate the counter
                screw_id += screw_id

            # publish the result, which is an image (scaled)
            result_image_msg = Image()
            try:
                result_image_msg = self.bridge.cv2_to_imgmsg(resized_img, "bgr8")
                # print(self.screw_data)
            except CvBridgeError as e:
                print("Could not make it through the cv bridge of death.")

            self.result_image_pub.publish(result_image_msg)
        else:
            print("No detection of circles.")
but I get:
TypeError: Cannot interpret feed_dict key as Tensor: Can not convert a int into a Tensor.
I do know that the variables screw_id and screw_roi are not empty. And I do know that one needs to feed a dictionary in, which is why I was trying to do that in the first place. But I can't get it running for the reason above.
Any thoughts?
EDIT: So normally, this code loads the image and conducts the prediction as follows:
image_data = tf.gfile.FastGFile(image_path, 'rb').read()
softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
predictions = sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})
All I want is to turn this into a form that operates on the image ROI obtained during operation. It can't be too complicated.
It's not rocket science, it turns out.
You just need to convert the image so that you can pass a string of image bytes, because that's what sess.run() expects here.
If you don't have a file that you want to load from a file system, then the following is the way:
image_data = cv2.imencode('.jpg', screw_roi)[1].tostring() # pass a string of image bytes
After this, you can simply run:
predictions = self.session.run(self.softmax_tensor, {'DecodeJpeg/contents:0': image_data})
That's it.
feed_dict expects a dictionary with tensors as keys, used to populate the placeholders with the specified values. Your snippet doesn't show how screw_id is initialized, but I bet it's not a tensor of any kind; hence your error.
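For illustration, here is a minimal TF 1.x-style sketch of what feed_dict expects; the placeholder name and values are made up:
import tensorflow as tf  # TF 1.x style API

x = tf.placeholder(tf.float32, shape=[None, 3], name="my_input")  # hypothetical placeholder
y = tf.reduce_sum(x, axis=1)

with tf.Session() as sess:
    # the key must be the placeholder tensor itself (or its name, e.g. "my_input:0"),
    # not a plain Python int such as screw_id
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))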