Related
So I now have working code to detect each component using template matching; I used the code that the author provided at this link (https://www.sicara.fr/blog-technique/object-detection-template-matching).
Here is the code:
import cv2
import numpy as np
DEFAULT_TEMPLATE_MATCHING_THRESHOLD = 0.9
class Template:
"""
A class defining a template
"""
def __init__(self, image_path, label, color, matching_threshold=DEFAULT_TEMPLATE_MATCHING_THRESHOLD):
"""
Args:
image_path (str): path to the template image
label (str): the label corresponding to the template
color (List[int]): the color associated with the label (to plot detections)
matching_threshold (float): the minimum similarity score to consider an object is detected by template
matching
"""
self.image_path = image_path
self.label = label
self.color = color
self.template = cv2.imread(image_path)
self.template_height, self.template_width = self.template.shape[:2]
self.matching_threshold = matching_threshold
image = cv2.imread("PCB_reference.jpg")
templates = [
Template(image_path="T1.png", label="1", color=(0, 0, 255), matching_threshold=0.88),
Template(image_path="T2.jpg", label="2", color=(0, 255, 0,), matching_threshold=0.8),
Template(image_path="T3.png", label="3", color=(255, 0, 0,), matching_threshold=0.85),
Template(image_path="T4.png", label="4", color=(0, 180, 200,), matching_threshold=0.81),
Template(image_path="T5.png", label="5", color=(110, 180, 200,), matching_threshold=0.91),
Template(image_path="T6.png", label="6", color=(150, 100, 150,), matching_threshold=0.83),
Template(image_path="T7.png", label="7", color=(0, 100, 150,), matching_threshold=0.84),
Template(image_path="T8.png", label="8", color=(110, 100, 200,), matching_threshold=0.96),
]
detections = []
for template in templates:
template_matching = cv2.matchTemplate(image, template.template, cv2.TM_CCOEFF_NORMED)
match_locations = np.where(template_matching >= template.matching_threshold)
for (x, y) in zip(match_locations[1], match_locations[0]):
match = {
"TOP_LEFT_X": x,
"TOP_LEFT_Y": y,
"BOTTOM_RIGHT_X": x + template.template_width,
"BOTTOM_RIGHT_Y": y + template.template_height,
"MATCH_VALUE": template_matching[y, x],
"LABEL": template.label,
"COLOR": template.color
}
detections.append(match)
def compute_iou(boxA, boxB):
xA = max(boxA["TOP_LEFT_X"], boxB["TOP_LEFT_X"])
yA = max(boxA["TOP_LEFT_Y"], boxB["TOP_LEFT_Y"])
xB = min(boxA["BOTTOM_RIGHT_X"], boxB["BOTTOM_RIGHT_X"])
yB = min(boxA["BOTTOM_RIGHT_Y"], boxB["BOTTOM_RIGHT_Y"])
interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
boxAArea = (boxA["BOTTOM_RIGHT_X"] - boxA["TOP_LEFT_X"] + 1) * (boxA["BOTTOM_RIGHT_Y"] - boxA["TOP_LEFT_Y"] + 1)
boxBArea = (boxB["BOTTOM_RIGHT_X"] - boxB["TOP_LEFT_X"] + 1) * (boxB["BOTTOM_RIGHT_Y"] - boxB["TOP_LEFT_Y"] + 1)
iou = interArea / float(boxAArea + boxBArea - interArea)
return iou
def non_max_suppression(objects, non_max_suppression_threshold=0.5, score_key="MATCH_VALUE"):
"""
Filter objects overlapping with IoU over threshold by keeping only the one with maximum score.
Args:
objects (List[dict]): a list of objects dictionaries, with:
{score_key} (float): the object score
{top_left_x} (float): the top-left x-axis coordinate of the object bounding box
{top_left_y} (float): the top-left y-axis coordinate of the object bounding box
{bottom_right_x} (float): the bottom-right x-axis coordinate of the object bounding box
{bottom_right_y} (float): the bottom-right y-axis coordinate of the object bounding box
non_max_suppression_threshold (float): the minimum IoU value used to filter overlapping boxes when
conducting non-max suppression.
score_key (str): score key in objects dicts
Returns:
List[dict]: the filtered list of dictionaries.
"""
sorted_objects = sorted(objects, key=lambda obj: obj[score_key], reverse=True)
filtered_objects = []
for object_ in sorted_objects:
overlap_found = False
for filtered_object in filtered_objects:
iou = compute_iou(object_, filtered_object)
if iou > non_max_suppression_threshold:
overlap_found = True
break
if not overlap_found:
filtered_objects.append(object_)
return filtered_objects
NMS_THRESHOLD = 0.2
detections = non_max_suppression(detections, non_max_suppression_threshold=NMS_THRESHOLD)
image_with_detections = image.copy()
for detection in detections:
cv2.rectangle(
image_with_detections,
(detection["TOP_LEFT_X"], detection["TOP_LEFT_Y"]),
(detection["BOTTOM_RIGHT_X"], detection["BOTTOM_RIGHT_Y"]),
detection["COLOR"],
2,)
cv2.imshow("res", image_with_detections)
cv2.waitKey(0)
cv2.destroyAllWindows()
This is now my output, where it detects the 8 components that I have as templates:
Current code output
I want to make an algorithm that can detect a missing component. For example, suppose there are four (4) A3 resistors in the reference image, but my input reading (real-time camera feed) has only three A3 resistors; the same goes for the one missing resistor (3220). What is the best approach/method to make this work?
For example, this is my desired output:
Desired output
Should I save the position where each template is supposed to be placed and build a database containing each template and its specific location? The idea is to record that at a given location (e.g. x = 20, y = 50) a resistor should be present. I am thinking of a CSV file, but I don't think it's possible or efficient to store both the template image paths and the position of each template.
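A minimal sketch of the kind of check I have in mind is below (the file name expected_components.csv, its columns, the pixel tolerance, and the helper name are all made up for illustration); it compares the detections list produced by the code above against a list of expected placements:
import csv

def find_missing_components(detections, expected_csv_path, tolerance=20):
    """Flag expected placements that have no matching detection.

    Each CSV row is assumed to look like: label,x,y where (x, y) is the
    expected top-left corner of the component on the reference image.
    A component counts as present if some detection with the same label
    has its top-left corner within `tolerance` pixels of the expected one.
    """
    missing = []
    with open(expected_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            x, y = int(row["x"]), int(row["y"])
            found = any(
                d["LABEL"] == row["label"]
                and abs(d["TOP_LEFT_X"] - x) <= tolerance
                and abs(d["TOP_LEFT_Y"] - y) <= tolerance
                for d in detections
            )
            if not found:
                missing.append(row)  # this slot appears to be empty
    return missing

# Example usage after non_max_suppression:
# for slot in find_missing_components(detections, "expected_components.csv"):
#     print("Missing component", slot["label"], "expected at", slot["x"], slot["y"])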
I have read that SSIM could be used for missing-component detection, and there are also approaches that use deep learning, like YOLO, but I think that is too complex for me to begin with.
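For reference, this is roughly how I understand the SSIM idea would look (just a sketch: it assumes scikit-image is available, that the live frame is already aligned with the reference image, and the crop coordinates, file name, and threshold are placeholders):
import cv2
from skimage.metrics import structural_similarity as ssim

def region_looks_populated(reference, live, x, y, w, h, threshold=0.7):
    """Compare the same (x, y, w, h) region of the reference image and the
    live frame; a low SSIM score suggests the component there is missing."""
    ref_crop = cv2.cvtColor(reference[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    live_crop = cv2.cvtColor(live[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    return ssim(ref_crop, live_crop) >= threshold

# reference = cv2.imread("PCB_reference.jpg")
# live = cv2.imread("PCB_live.jpg")  # hypothetical, already aligned camera frame
# print(region_looks_populated(reference, live, 20, 50, 40, 40))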
I have an image with many cars; every car has polygon coordinates and keypoints. I use this code to crop an object by its polygon and get the new keypoints.
x,y,w,h = cv2.boundingRect(points_poly_int)
cropped_img = img[y:y+h,x:x+w]
head_coords_after_crop = np.asarray([head_coords_old[0] - x, head_coords_old[1] -y])
center_coords_after_crop = np.asarray([center_coords_old[0] - x, center_coords_old[1] -y])
Here is an example of a cropped image and keypoints:
What I need is to rotate the whole image by any angle and remap the coordinates of the polygons and keypoints for every object.
Here is the method, which returns the rotated image and the transformation matrix:
def rotate_image(mat, angle):
"""
Rotates an image (angle in degrees) and expands image to avoid cropping
"""
height, width = mat.shape[:2] # image shape has 3 dimensions
image_center = (width/2, height/2) # getRotationMatrix2D needs coordinates in reverse order (width, height) compared to shape
rotation_mat = cv2.getRotationMatrix2D(image_center, angle, 1.)
# rotation calculates the cos and sin, taking absolutes of those.
abs_cos = abs(rotation_mat[0,0])
abs_sin = abs(rotation_mat[0,1])
# find the new width and height bounds
bound_w = int(height * abs_sin + width * abs_cos)
bound_h = int(height * abs_cos + width * abs_sin)
# subtract old image center (bringing image back to origo) and adding the new image center coordinates
rotation_mat[0, 2] += bound_w/2 - image_center[0]
rotation_mat[1, 2] += bound_h/2 - image_center[1]
# rotate image with the new bounds and translated rotation matrix
rotated_mat = cv2.warpAffine(mat, rotation_mat, (bound_w, bound_h))
return rotated_mat, rotation_mat
What I do next is multiply the old coordinates by the transformation matrix. Here is the code:
img_rotated, C = rotate_image(img, 180)
#Remap polygons coordinates
ones = np.ones((points_poly.shape[0], 1))
new_poly = np.hstack((points_poly,ones))
new_poly = (C @ new_poly.T).T
new_poly = new_poly.astype(np.int32)
#Crop by new polygons
x,y,w,h = cv2.boundingRect(new_poly)
cropped_img = img_rotated[y:y+h,x:x+w]
#Remap keypoint coordinates
head_coords_new = np.asarray([756.600, 1687.900, 1])
center_coords_new = np.asarray([762.300, 1708.400, 1])
head_coords_new = (C @ head_coords_new.T).T
center_coords_new = (C @ center_coords_new.T).T
head_coords_new = np.asarray([head_coords_old[0] - x, head_coords_old[1] - y])
center_coords_new = np.asarray([center_coords_old[0] - x, center_coords_old[1] - y])
head_coords_new = head_coords_new.astype(np.int32)
center_coords_new = center_coords_new.astype(np.int32)
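Written out for a single point, the remapping I am trying to do with the 2x3 affine matrix C returned by rotate_image is the following (a minimal sketch; the helper name is my own):
import numpy as np

def remap_point(C, px, py):
    # A point (px, py) on the original image maps into the rotated image as
    # [px', py'] = C @ [px, py, 1]; after cropping the rotated image at
    # cv2.boundingRect(new_poly) = (x, y, w, h), the point inside the crop
    # should be (px' - x, py' - y).
    px_new, py_new = C @ np.array([px, py, 1.0])
    return px_new, py_new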
But the result is different from the first picture. Here is the new picture:
Somehow the keypoints shift, and it happens with every angle. I don't know how to fix it.
Here the source image: https://drive.google.com/file/d/14K_MQHMwtWlw-QCQbaB5ecrREbWwyKhO/view?usp=sharing
And here are the polygons with keypoints:
{'keypoints': [{'id': 'head', 'pos': '756.600;1687.900'},
{'id': 'roof_center', 'pos': '762.300;1708.400'}],
'polygon': '{(759.700;1717.300);(770.000;1714.200);(762.000;1687.400);(756.600;1687.900);(751.200;1690.700);(759.700;1717.300)}'}
In case you wish to reproduce the issue.
Thanks in advance.
Here is the difference: the right picture is the first image rotated in a picture viewer; the left is the transformed picture.
Help me, please. I received an error in my code for a social distancing detection system using a webcam. I searched for the error, but I couldn't find anything different from my code. I wrote my code using Notepad++ and run it from the command prompt. Below is my error:
C:\Users\User\Downloads\Social_Distancing_Detection_Real_Time>python Run.py
[INFO] loading YOLO from disk...
[INFO] setting preferable backend and target to CUDA...
[INFO] accessing video stream...
[ WARN:0] global D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\dnn.cpp (1447) cv::dnn::dnn4_v20211004::Net::Impl::setUpNet DNN module was not built with CUDA backend; switching to CPU
Traceback (most recent call last):
File "C:\Users\User\Downloads\Social_Distancing_Detection_Real_Time\Run.py", line 77, in <module>
results = detect_people(frame, net, ln,
File "C:\Users\User\Downloads\Social_Distancing_Detection_Real_Time\mylib\detection.py", line 58, in detect_people
idxs = cv2.dnn.NMSBoxes(boxes, confidence, MIN_CORP, NMS_THRESH)
TypeError: Can't parse 'scores'. Input argument doesn't provide sequence protocol
[ WARN:1] global D:\a\opencv-python\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (438) `anonymous-namespace'::SourceReaderCB::~SourceReaderCB terminating async callback
Below is my full code for the file detection.py:
#import the necessary packages
from .config import NMS_THRESH, MIN_CORP, People_Counter
import numpy as np
import cv2
def detect_people(frame, net, In, personIdx = 0):
#grab the dimensions of the frame and initialize the list of results
(H, W) = frame.shape[:2]
results = []
#construct a blob from the input frame and then perform a forward
#pass of the YOLO object detector, giving us our bounding boxes
#and associated probabilities
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
swapRB=True, crop=False)
net.setInput(blob)
layerOutputs = net.forward(In)
#initialize our lists of detected bounding boxes, centroids and
#confidence, respectively
boxes = []
centroids = []
confidences = []
#loop over each of the layer outputs
for output in layerOutputs:
#for detection in output;
for detection in output:
#extract the class ID and confidence (i.e., probability)
#of the current object detection
scores = detection[5:]
classID = np.argmax(scores)
confidence = scores[classID]
#filter detections by (1) ensuring that the object
#detected was a person and (2) that the minimum
#confidence is met
if classID == personIdx and confidence > MIN_CORP:
#scale the bounding box coordinates back relative to
#the size of the image, keeping in mind that YOLO
#actually returns the center (x,y) -coordinates of
#the bounding box followed by the boxes' width and height
box = detection[0:4] * np.array([W, H, W, H])
(centerX, centerY, width, height) = box.astype("int")
#use the center (x,y) -coordinates to derive the top
#and left corner of the bounding box
x = int(centerX - (width / 2))
y = int(centerY - (height / 2))
#update our list of bounding box coordinates,
#centroids and confidences
boxes.append([x, y, int(width), int(height)])
centroids.append((centerX, centerY))
confidences.append(float(confidence))
#apply non-maxima suppression to suppress weak, overlapping bounding boxes
idxs = cv2.dnn.NMSBoxes(boxes, confidence, MIN_CORP, NMS_THRESH)
#print('Total people count:', len(idxs))
#compute the total people counter
#if People_Counter:
#human_count = "Human count: {}".format(len(idxs))
#cv2.putText(frame, human_count, (470, frame.shape[0] - 75), cv2.FONT_HERSHEY_SIMPLEX, 0.70, (0, 0, 0), 2)
#ensure at least one detection exists
if len(idxs) > 0:
#loop over the indexes we are keeping
for i in idxs.flatten():
#extract the bounding box coordinates
(x, y) = (boxes[i][0], boxes[i][1])
(w, h) = (boxes[i][2], boxes[i][3])
#update our results list to consist of the person
#prediction probability, bounding box coordinates,
#and the centroids
r = (confidences[i], (x, y, x + w, y + h), centroids[i])
results.append(r)
#return the list of the results
return results
The answer to your problem (as usual) lies in the response from the interpreter:
TypeError: Can't parse 'scores'. Input argument doesn't provide sequence protocol
scores is the second argument to cv2.dnn.NMSBoxes, which in your case is confidence. confidence is a single number; you can't iterate over it. You've made a typo: you probably wanted to pass confidences, which is a list.
Change your code to:
idxs = cv2.dnn.NMSBoxes(boxes, confidences, MIN_CORP, NMS_THRESH)
I am trying to detect some objects with TensorFlow. However, I only want to detect objects in a specific area of the main image. For example:
Here is my example area:
The large red-framed area is my main image, but I only want to detect objects that are inside the "content" area.
Actually, I developed a method like this: I find the center point of the detected object and check whether it lies within this area. If the center of the object is within the specified region, TensorFlow draws a rectangle around the detected object. This method works when I detect a single object. However, I am faced with the following situation: for example, one object is inside the desired region and another object is outside it. TensorFlow actually detects both objects and adds them to a list. When TensorFlow wants to show the object in the desired region, it necessarily frames all the detected objects in the list. In short, when a single object is detected, objects outside the marked area are also detected and added to the list, but they are ignored and no rectangle is drawn. However, if an object is detected within the marked area, a rectangle is then also drawn on the object outside the marked area. How do I prevent this?
Here is the part that draws the boxes:
vis_util.visualize_boxes_and_labels_on_image_array(
image_np,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=3,
min_score_thresh=0.2)
In this part, I actually send all detected objects to the function, because I don't know which object in the list is the correct one.
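To illustrate what I mean, this is roughly the kind of pre-filtering I imagine doing before calling the drawing function (only a sketch; the helper name and the region coordinates, which are normalized, are made up):
import numpy as np

def filter_to_region(boxes, scores, classes, region):
    """Keep only detections whose box center lies inside `region`.

    boxes: [N, 4] array of normalized [ymin, xmin, ymax, xmax]
    region: (ymin, xmin, ymax, xmax), also normalized
    """
    r_ymin, r_xmin, r_ymax, r_xmax = region
    cy = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cx = (boxes[:, 1] + boxes[:, 3]) / 2.0
    keep = (cy >= r_ymin) & (cy <= r_ymax) & (cx >= r_xmin) & (cx <= r_xmax)
    return boxes[keep], scores[keep], classes[keep]

# boxes_f, scores_f, classes_f = filter_to_region(
#     np.squeeze(boxes), np.squeeze(scores),
#     np.squeeze(classes).astype(np.int32),
#     region=(0.25, 0.25, 0.75, 0.75))  # hypothetical "content" area
# vis_util.visualize_boxes_and_labels_on_image_array(
#     image_np, boxes_f, classes_f, scores_f, category_index,
#     use_normalized_coordinates=True, line_thickness=3, min_score_thresh=0.2)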
And here is the visualize_boxes_and_labels_on_image_array function:
def visualize_boxes_and_labels_on_image_array(
image,
boxes,
classes,
scores,
category_index,
instance_masks=None,
instance_boundaries=None,
keypoints=None,
use_normalized_coordinates=False,
max_boxes_to_draw=2,
min_score_thresh=.5,
agnostic_mode=False,
line_thickness=4,
groundtruth_box_visualization_color='black',
skip_scores=False,
skip_labels=False):
"""Overlay labeled boxes on an image with formatted scores and label names.
This function groups boxes that correspond to the same location
and creates a display string for each detection and overlays these
on the image. Note that this function modifies the image in place, and returns
that same image.
Args:
image: uint8 numpy array with shape (img_height, img_width, 3)
boxes: a numpy array of shape [N, 4]
classes: a numpy array of shape [N]. Note that class indices are 1-based,
and match the keys in the label map.
scores: a numpy array of shape [N] or None. If scores=None, then
this function assumes that the boxes to be plotted are groundtruth
boxes and plot all boxes as black with no classes or scores.
category_index: a dict containing category dictionaries (each holding
category index `id` and category name `name`) keyed by category indices.
instance_masks: a numpy array of shape [N, image_height, image_width] with
values ranging between 0 and 1, can be None.
instance_boundaries: a numpy array of shape [N, image_height, image_width]
with values ranging between 0 and 1, can be None.
keypoints: a numpy array of shape [N, num_keypoints, 2], can
be None
use_normalized_coordinates: whether boxes is to be interpreted as
normalized coordinates or not.
max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
all boxes.
min_score_thresh: minimum score threshold for a box to be visualized
agnostic_mode: boolean (default: False) controlling whether to evaluate in
class-agnostic mode or not. This mode will display scores but ignore
classes.
line_thickness: integer (default: 4) controlling line width of the boxes.
groundtruth_box_visualization_color: box color for visualizing groundtruth
boxes
skip_scores: whether to skip score when drawing a single detection
skip_labels: whether to skip label when drawing a single detection
Returns:
uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.
"""
# Create a display string (and color) for every box location, group any boxes
# that correspond to the same location.
box_to_display_str_map = collections.defaultdict(list)
box_to_color_map = collections.defaultdict(str)
box_to_instance_masks_map = {}
box_to_instance_boundaries_map = {}
box_to_keypoints_map = collections.defaultdict(list)
if not max_boxes_to_draw:
max_boxes_to_draw = boxes.shape[0]
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
if scores is None or scores[i] > min_score_thresh:
box = tuple(boxes[i].tolist())
if instance_masks is not None:
box_to_instance_masks_map[box] = instance_masks[i]
if instance_boundaries is not None:
box_to_instance_boundaries_map[box] = instance_boundaries[i]
if keypoints is not None:
box_to_keypoints_map[box].extend(keypoints[i])
if scores is None:
box_to_color_map[box] = groundtruth_box_visualization_color
else:
display_str = ''
if not skip_labels:
if not agnostic_mode:
if classes[i] in category_index.keys():
class_name = category_index[classes[i]]['name']
else:
class_name = 'N/A'
display_str = str(class_name)
if not skip_scores:
if not display_str:
display_str = '{}%'.format(int(100*scores[i]))
else:
display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
box_to_display_str_map[box].append(display_str)
if agnostic_mode:
box_to_color_map[box] = 'DarkOrange'
else:
box_to_color_map[box] = STANDARD_COLORS[
classes[i] % len(STANDARD_COLORS)]
# Draw all boxes onto image.
for box, color in box_to_color_map.items():
ymin, xmin, ymax, xmax = box
if instance_masks is not None:
draw_mask_on_image_array(
image,
box_to_instance_masks_map[box],
color=color
)
if instance_boundaries is not None:
draw_mask_on_image_array(
image,
box_to_instance_boundaries_map[box],
color='red',
alpha=1.0
)
draw_bounding_box_on_image_array(
image,
ymin,
xmin,
ymax,
xmax,
color=color,
thickness=line_thickness,
display_str_list=box_to_display_str_map[box],
use_normalized_coordinates=use_normalized_coordinates)
if keypoints is not None:
draw_keypoints_on_image_array(
image,
box_to_keypoints_map[box],
color=color,
radius=line_thickness / 2,
use_normalized_coordinates=use_normalized_coordinates)
return image
The above function then calls another function, named draw_bounding_box_on_image_array:
def draw_bounding_box_on_image_array(image,
ymin,
xmin,
ymax,
xmax,
color='red',
thickness=4,
display_str_list=(),
use_normalized_coordinates=True):
"""Adds a bounding box to an image (numpy array).
Bounding box coordinates can be specified in either absolute (pixel) or
normalized coordinates by setting the use_normalized_coordinates argument.
Args:
image: a numpy array with shape [height, width, 3].
ymin: ymin of bounding box.
xmin: xmin of bounding box.
ymax: ymax of bounding box.
xmax: xmax of bounding box.
color: color to draw bounding box. Default is red.
thickness: line thickness. Default value is 4.
display_str_list: list of strings to display in box
(each to be shown on its own line).
use_normalized_coordinates: If True (default), treat coordinates
ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
coordinates as absolute.
"""
image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
draw_bounding_box_on_image(image_pil, ymin, xmin, ymax, xmax, color,
thickness, display_str_list,
use_normalized_coordinates)
np.copyto(image, np.array(image_pil))
Finally, this function calls another function called "draw_bounding_box_on_image".
def draw_bounding_box_on_image(image,
ymin,
xmin,
ymax,
xmax,
color='red',
thickness=4,
display_str_list=(),
use_normalized_coordinates=True):
"""Adds a bounding box to an image.
Bounding box coordinates can be specified in either absolute (pixel) or
normalized coordinates by setting the use_normalized_coordinates argument.
Each string in display_str_list is displayed on a separate line above the
bounding box in black text on a rectangle filled with the input 'color'.
If the top of the bounding box extends to the edge of the image, the strings
are displayed below the bounding box.
Args:
image: a PIL.Image object.
ymin: ymin of bounding box.
xmin: xmin of bounding box.
ymax: ymax of bounding box.
xmax: xmax of bounding box.
color: color to draw bounding box. Default is red.
thickness: line thickness. Default value is 4.
display_str_list: list of strings to display in box
(each to be shown on its own line).
use_normalized_coordinates: If True (default), treat coordinates
ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
coordinates as absolute.
"""
draw = ImageDraw.Draw(image)
im_width, im_height = image.size
if use_normalized_coordinates:
(left, right, top, bottom) = (xmin * im_width, xmax * im_width,
ymin * im_height, ymax * im_height)
else:
(left, right, top, bottom) = (xmin, xmax, ymin, ymax)
draw.line([(left, top), (left, bottom), (right, bottom),
(right, top), (left, top)], width=thickness, fill=color)
try:
font = ImageFont.truetype('arial.ttf', 24)
except IOError:
font = ImageFont.load_default()
# If the total height of the display strings added to the top of the bounding
# box exceeds the top of the image, stack the strings below the bounding box
# instead of above.
display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
# Each display_str has a top and bottom margin of 0.05x.
total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)
if top > total_display_str_height:
text_bottom = top
else:
text_bottom = bottom + total_display_str_height
# Reverse list and print from bottom to top.
for display_str in display_str_list[::-1]:
text_width, text_height = font.getsize(display_str)
margin = np.ceil(0.05 * text_height)
draw.rectangle(
[(left, text_bottom - text_height - 2 * margin), (left + text_width,
text_bottom)],
fill=color)
draw.text(
(left + margin, text_bottom - text_height - margin),
display_str,
fill='black',
font=font)
text_bottom -= text_height - 2 * margin
As far as I understand, this function draws a rectangle using the coordinates of the detected object. Can you please help with this issue?
I'm trying to understand how to find the location of the bounding box when an object is detected. I used the TensorFlow Object Detection API to detect a mouse in a box. Just to test how to retrieve the bounding box coordinates, I want to print "THIS IS A MOUSE" right above its head when the mouse is detected. However, mine currently prints several inches off-kilter. For example, here is a screenshot from a video of my object detection.
Here is the relevant code snippet:
with detection_graph.as_default():
with tf.Session(graph=detection_graph) as sess:
start = time.time()
while True:
# Read frame from camera
ret, image_np = cap.read()
cv2.putText(image_np, "Time Elapsed: {}s".format(int(time.time() - start)), (50,50),cv2.FONT_HERSHEY_PLAIN,3, (0,0,255),3)
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
# Extract image tensor
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Extract detection boxes
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Extract detection scores
scores = detection_graph.get_tensor_by_name('detection_scores:0')
# Extract detection classes
classes = detection_graph.get_tensor_by_name('detection_classes:0')
# Extract number of detections
num_detections = detection_graph.get_tensor_by_name(
'num_detections:0')
# Actual detection.
(boxes, scores, classes, num_detections) = sess.run(
[boxes, scores, classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
vis_util.visualize_boxes_and_labels_on_image_array(
image_np,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=8)
for i, b in enumerate(boxes[0]):
if classes[0][i] == 1:
if scores[0][i] >= .5:
mid_x = (boxes[0][i][3] + boxes[0][i][1]) / 2
mid_y = (boxes[0][i][2] + boxes[0][i][0]) / 2
cv2.putText(image_np, 'FOUND A MOUSE', (int(mid_x*600), int(mid_y*800)), cv2.FONT_HERSHEY_PLAIN, 2, (0,255,0), 3)
# Display output
cv2.imshow(vid_name, cv2.resize(image_np, (800, 600)))
#Write to output
video_writer.write(image_np)
if cv2.waitKey(25) & 0xFF == ord('q'):
cv2.destroyAllWindows()
break
cap.release()
cv2.destroyAllWindows()
It's not really clear to me how boxes works. Can someone explain this line to me: mid_x = (boxes[0][i][3] + boxes[0][i][1]) / 2? I understand that the 3 and 1 indices represent x_min, x_max, but I'm not sure why I'm iterating through boxes[0] only and what i represents.
Solution: Just as ievbu suggested, I needed to convert the midpoint calculation from its normalized values to pixel values for the frame. I used the cv2.VideoCapture properties that return the frame width and height to convert my midpoint to a pixel location.
frame_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
...
cv2.putText(image_np, '.', (int(mid_x*frame_w), int(mid_y*frame_h)), cv2.FONT_HERSHEY_PLAIN, 2, (0,255,0), 3)
Boxes are returned with an extra dimension because you can pass multiple images, and that dimension then represents each separate image (for one input image you expand the dimensions with np.expand_dims). You can see that for visualization it is removed using np.squeeze; you can remove it manually just by taking boxes[0] if you process only one image. i represents the index of a box in the boxes array; you need that index to access the class and score of the box you are analyzing.
The text is not in the correct position because the returned box coordinates are normalized, and you have to convert them to match the full image size. Here is an example of how you can convert them:
(im_height, im_width, _) = frame.shape
xmin, ymin, xmax, ymax = box
(xmin, xmax, ymin, ymax) = (xmin * im_width, xmax * im_width,
ymin * im_height, ymax * im_height)