How can I detect objects in a specific area with TensorFlow? - python

I am trying to detect some objects with TensorFlow. However, I only want to detect objects in a specific area of the main image. For example:
Here is my example area:
The large red-framed area is my main image, but I only want to detect objects inside the "content" field.
My current approach is this: I find the center point of each detected object and check whether that center lies within the specified area. If it does, TensorFlow draws a rectangle around the detected object. This works when only a single object is detected. However, I run into the following situation: one object is inside the desired area and another is outside it. TensorFlow detects both objects and adds them to a list. When it then wants to show the object inside the desired area, it frames every detected object in the list. In short: when only objects outside the marked area are detected, they are added to the list but ignored, and no rectangle is drawn; but as soon as an object inside the marked area is detected, a rectangle is also drawn on the objects outside it. How do I prevent this?
Here is the part that draws the boxes:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=3,
    min_score_thresh=0.2)
In this part, I actually send all detected objects to the function because I don't know which object is the correct one in the list.
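One way to prevent the out-of-region boxes from being drawn is to filter the boxes, scores, and classes arrays before they ever reach the visualization call, so the function only receives the detections you actually want framed. Below is a minimal sketch, assuming normalized coordinates; filter_boxes_to_region and the region tuple are illustrative names, not part of the TensorFlow API:

import numpy as np

def filter_boxes_to_region(boxes, scores, classes, region, min_score=0.2):
    # Keep only detections whose center point lies inside `region`.
    # boxes:  [N, 4] array of (ymin, xmin, ymax, xmax), normalized.
    # region: (ymin, xmin, ymax, xmax) of the "content" area, normalized.
    rymin, rxmin, rymax, rxmax = region
    cy = (boxes[:, 0] + boxes[:, 2]) / 2.0  # box centers, y
    cx = (boxes[:, 1] + boxes[:, 3]) / 2.0  # box centers, x
    keep = ((scores > min_score) &
            (cy >= rymin) & (cy <= rymax) &
            (cx >= rxmin) & (cx <= rxmax))
    return boxes[keep], scores[keep], classes[keep]

# Usage before drawing (region values are made up for illustration):
# boxes_f, scores_f, classes_f = filter_boxes_to_region(
#     np.squeeze(boxes), np.squeeze(scores),
#     np.squeeze(classes).astype(np.int32), region=(0.2, 0.2, 0.8, 0.8))
# ...then pass boxes_f, classes_f, scores_f to the function below.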
And here is the visualize_boxes_and_labels_on_image_array function:
def visualize_boxes_and_labels_on_image_array(
    image,
    boxes,
    classes,
    scores,
    category_index,
    instance_masks=None,
    instance_boundaries=None,
    keypoints=None,
    use_normalized_coordinates=False,
    max_boxes_to_draw=2,
    min_score_thresh=.5,
    agnostic_mode=False,
    line_thickness=4,
    groundtruth_box_visualization_color='black',
    skip_scores=False,
    skip_labels=False):
  """Overlay labeled boxes on an image with formatted scores and label names.

  This function groups boxes that correspond to the same location
  and creates a display string for each detection and overlays these
  on the image. Note that this function modifies the image in place, and
  returns that same image.

  Args:
    image: uint8 numpy array with shape (img_height, img_width, 3)
    boxes: a numpy array of shape [N, 4]
    classes: a numpy array of shape [N]. Note that class indices are 1-based,
      and match the keys in the label map.
    scores: a numpy array of shape [N] or None. If scores=None, then
      this function assumes that the boxes to be plotted are groundtruth
      boxes and plot all boxes as black with no classes or scores.
    category_index: a dict containing category dictionaries (each holding
      category index `id` and category name `name`) keyed by category indices.
    instance_masks: a numpy array of shape [N, image_height, image_width] with
      values ranging between 0 and 1, can be None.
    instance_boundaries: a numpy array of shape [N, image_height, image_width]
      with values ranging between 0 and 1, can be None.
    keypoints: a numpy array of shape [N, num_keypoints, 2], can
      be None
    use_normalized_coordinates: whether boxes is to be interpreted as
      normalized coordinates or not.
    max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
      all boxes.
    min_score_thresh: minimum score threshold for a box to be visualized
    agnostic_mode: boolean (default: False) controlling whether to evaluate in
      class-agnostic mode or not. This mode will display scores but ignore
      classes.
    line_thickness: integer (default: 4) controlling line width of the boxes.
    groundtruth_box_visualization_color: box color for visualizing groundtruth
      boxes
    skip_scores: whether to skip score when drawing a single detection
    skip_labels: whether to skip label when drawing a single detection

  Returns:
    uint8 numpy array with shape (img_height, img_width, 3) with overlaid
    boxes.
  """
  # Create a display string (and color) for every box location, group any boxes
  # that correspond to the same location.
  box_to_display_str_map = collections.defaultdict(list)
  box_to_color_map = collections.defaultdict(str)
  box_to_instance_masks_map = {}
  box_to_instance_boundaries_map = {}
  box_to_keypoints_map = collections.defaultdict(list)
  if not max_boxes_to_draw:
    max_boxes_to_draw = boxes.shape[0]
  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
      box = tuple(boxes[i].tolist())
      if instance_masks is not None:
        box_to_instance_masks_map[box] = instance_masks[i]
      if instance_boundaries is not None:
        box_to_instance_boundaries_map[box] = instance_boundaries[i]
      if keypoints is not None:
        box_to_keypoints_map[box].extend(keypoints[i])
      if scores is None:
        box_to_color_map[box] = groundtruth_box_visualization_color
      else:
        display_str = ''
        if not skip_labels:
          if not agnostic_mode:
            if classes[i] in category_index.keys():
              class_name = category_index[classes[i]]['name']
            else:
              class_name = 'N/A'
            display_str = str(class_name)
        if not skip_scores:
          if not display_str:
            display_str = '{}%'.format(int(100*scores[i]))
          else:
            display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
        box_to_display_str_map[box].append(display_str)
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]

  # Draw all boxes onto image.
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    if instance_masks is not None:
      draw_mask_on_image_array(
          image,
          box_to_instance_masks_map[box],
          color=color
      )
    if instance_boundaries is not None:
      draw_mask_on_image_array(
          image,
          box_to_instance_boundaries_map[box],
          color='red',
          alpha=1.0
      )
    draw_bounding_box_on_image_array(
        image,
        ymin,
        xmin,
        ymax,
        xmax,
        color=color,
        thickness=line_thickness,
        display_str_list=box_to_display_str_map[box],
        use_normalized_coordinates=use_normalized_coordinates)
    if keypoints is not None:
      draw_keypoints_on_image_array(
          image,
          box_to_keypoints_map[box],
          color=color,
          radius=line_thickness / 2,
          use_normalized_coordinates=use_normalized_coordinates)

  return image
The function above in turn calls draw_bounding_box_on_image_array:
def draw_bounding_box_on_image_array(image,
                                     ymin,
                                     xmin,
                                     ymax,
                                     xmax,
                                     color='red',
                                     thickness=4,
                                     display_str_list=(),
                                     use_normalized_coordinates=True):
  """Adds a bounding box to an image (numpy array).

  Bounding box coordinates can be specified in either absolute (pixel) or
  normalized coordinates by setting the use_normalized_coordinates argument.

  Args:
    image: a numpy array with shape [height, width, 3].
    ymin: ymin of bounding box.
    xmin: xmin of bounding box.
    ymax: ymax of bounding box.
    xmax: xmax of bounding box.
    color: color to draw bounding box. Default is red.
    thickness: line thickness. Default value is 4.
    display_str_list: list of strings to display in box
      (each to be shown on its own line).
    use_normalized_coordinates: If True (default), treat coordinates
      ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
      coordinates as absolute.
  """
  image_pil = Image.fromarray(np.uint8(image)).convert('RGB')
  draw_bounding_box_on_image(image_pil, ymin, xmin, ymax, xmax, color,
                             thickness, display_str_list,
                             use_normalized_coordinates)
  np.copyto(image, np.array(image_pil))
Finally, this function calls another function called "draw_bounding_box_on_image".
def draw_bounding_box_on_image(image,
                               ymin,
                               xmin,
                               ymax,
                               xmax,
                               color='red',
                               thickness=4,
                               display_str_list=(),
                               use_normalized_coordinates=True):
  """Adds a bounding box to an image.

  Bounding box coordinates can be specified in either absolute (pixel) or
  normalized coordinates by setting the use_normalized_coordinates argument.

  Each string in display_str_list is displayed on a separate line above the
  bounding box in black text on a rectangle filled with the input 'color'.
  If the top of the bounding box extends to the edge of the image, the strings
  are displayed below the bounding box.

  Args:
    image: a PIL.Image object.
    ymin: ymin of bounding box.
    xmin: xmin of bounding box.
    ymax: ymax of bounding box.
    xmax: xmax of bounding box.
    color: color to draw bounding box. Default is red.
    thickness: line thickness. Default value is 4.
    display_str_list: list of strings to display in box
      (each to be shown on its own line).
    use_normalized_coordinates: If True (default), treat coordinates
      ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
      coordinates as absolute.
  """
  draw = ImageDraw.Draw(image)
  im_width, im_height = image.size
  if use_normalized_coordinates:
    (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                  ymin * im_height, ymax * im_height)
  else:
    (left, right, top, bottom) = (xmin, xmax, ymin, ymax)
  draw.line([(left, top), (left, bottom), (right, bottom),
             (right, top), (left, top)], width=thickness, fill=color)
  try:
    font = ImageFont.truetype('arial.ttf', 24)
  except IOError:
    font = ImageFont.load_default()

  # If the total height of the display strings added to the top of the bounding
  # box exceeds the top of the image, stack the strings below the bounding box
  # instead of above.
  display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
  # Each display_str has a top and bottom margin of 0.05x.
  total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)

  if top > total_display_str_height:
    text_bottom = top
  else:
    text_bottom = bottom + total_display_str_height
  # Reverse list and print from bottom to top.
  for display_str in display_str_list[::-1]:
    text_width, text_height = font.getsize(display_str)
    margin = np.ceil(0.05 * text_height)
    draw.rectangle(
        [(left, text_bottom - text_height - 2 * margin), (left + text_width,
                                                          text_bottom)],
        fill=color)
    draw.text(
        (left + margin, text_bottom - text_height - margin),
        display_str,
        fill='black',
        font=font)
    text_bottom -= text_height - 2 * margin
As far as I understand, this function draws a rectangle using the coordinates of the detected object. Can you please help with this issue?

Related

How to find the xmin, xmax, ymin, ymax of a mask

I have a mask drawn over an apple using segmentation. The mask layer has 1's where the pixel is part of the apple and 0's everywhere else. How do I find the extreme pixels in the mask to find the bounding box coordinates around this mask? I am using PyTorch and YOLACT Edge to perform the segmentation as shown in Yolact
There is a relevant Stack Overflow answer with a nice explanation.
TL;DR
Proposed code snippets (the second is faster):
def bbox1(img):
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox

def bbox2(img):
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]
    return rmin, rmax, cmin, cmax
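For the PyTorch/YOLACT case in the question, the mask just needs to be converted to NumPy first. A minimal usage sketch (the toy mask is illustrative):

import numpy as np
import torch

# `mask` stands in for a [H, W] tensor of 1s and 0s produced by YOLACT.
mask = torch.zeros(480, 640)
mask[100:200, 150:300] = 1  # toy "apple" region

rmin, rmax, cmin, cmax = bbox2(mask.cpu().numpy())
print(rmin, rmax, cmin, cmax)  # 100 199 150 299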
But in the more general case (e.g. if you have more than one "instance" in the image, each with a mask separated from the others) it may be worth considering OpenCV,
specifically cv2.connectedComponentsWithStats.
A good description of this function can be found in another relevant answer.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
Labels is a matrix the size of the input image where each element has a value equal to its label.
Stats is a matrix of the stats that the function calculates. Its length equals the number of labels and its width equals the number of stats. From the OpenCV documentation:
Statistics output for each label, including the background label, see
below for available statistics. Statistics are accessed via
stats[label, COLUMN] where available columns are defined below.
cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
cv2.CC_STAT_WIDTH The horizontal size of the bounding box
cv2.CC_STAT_HEIGHT The vertical size of the bounding box
cv2.CC_STAT_AREA The total area (in pixels) of the connected component
Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.
So basically, the first 4 values of each row in stats determine the bounding box of one connected component (instance) in the mask.
A possible function to return just the bounding boxes:
def get_bounding_boxes(mask, min_size=None):
    # note: the original snippet passed an undefined `image` here; the
    # function's own `mask` argument is what should be labeled
    num_components, labeled_image, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # return bboxes in cv2 format [x, y, w, h] without the background bbox
    # and the component-size column
    return stats[1:, :-1]
# (x, y, x + w, y + h) are the 4 corner points you are looking for
And of course, in the case of a single instance, this approach still works.
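A quick usage sketch with a toy mask containing two separate instances:

import numpy as np
import cv2

mask = np.zeros((400, 400), dtype=np.uint8)
mask[50:100, 50:100] = 1      # first instance
mask[200:300, 250:350] = 1    # second instance

for x, y, w, h in get_bounding_boxes(mask):
    print((x, y, x + w, y + h))  # corner coordinates of each instance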

How to make TensorFlow bounding boxes show up every "n" seconds?

Currently I'm playing with the TensorFlow Object Detection API on my own dataset. I want to "hide" the detection bounding boxes every 5 frames, so the boxes appear to "blink" and attract much more attention while the detection framework is running.
I've already messed around with visualization_utils.py and tried adding another method for visualizing bounding boxes for this purpose, used with a while loop:
def draw_bounding_box_every5(image,
                             ymin,
                             xmin,
                             ymax,
                             xmax,
                             count,
                             use_normalized_coordinates=True):
    #while True:
    count += 1
    draw = ImageDraw.Draw(image)
    im_width, im_height = image.size
    ## With the "if" below, I'm aiming to display only 1 bounding box out of every 5 frames. ##
    if count % 5 == 0:
        if use_normalized_coordinates:
            (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                          ymin * im_height, ymax * im_height)
        else:
            (left, right, top, bottom) = (xmin, xmax, ymin, ymax)
        draw.line([(left, top), (left, bottom), (right, bottom),
                   (right, top), (left, top)])
        print("line drawn")
If I don't use the while True loop at the beginning, TensorFlow continues to display detected objects as expected. But when I use it, it crashes. I guess creating an infinite loop before displaying the bounding boxes disables some callback functions during drawing.
If anyone knows how to make blinking bounding boxes, I'm all ears.
Thanks in advance.
I gave up on this. Instead of hiding the bounding boxes, I just added another color to them; the bounding boxes now change color at a fixed interval.
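For reference, the "blinking" idea can still work if the frame counter lives in the main capture loop rather than inside the drawing function, so no extra while loop blocks the pipeline. A minimal sketch, with the video source and the drawn rectangle as illustrative stand-ins:

import cv2

cap = cv2.VideoCapture(0)  # illustrative video source
frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_count += 1
    # Skipping the draw call on every 5th frame makes the boxes blink.
    if frame_count % 5 != 0:
        # Stand-in for the usual call to
        # vis_util.visualize_boxes_and_labels_on_image_array(...).
        cv2.rectangle(frame, (100, 100), (300, 300), (0, 255, 0), 2)
    cv2.imshow('detections', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()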

Convert YoloV3 output to coordinates of bounding box, label and confidence

I run a YoloV3 model and get detections - a dictionary of 3 entries:
"detector/yolo-v3/Conv_22/BiasAdd/YoloRegion": numpy.ndarray with shape (1, 255, 52, 52),
"detector/yolo-v3/Conv_6/BiasAdd/YoloRegion": numpy.ndarray with shape (1, 255, 13, 13),
"detector/yolo-v3/Conv_14/BiasAdd/YoloRegion": numpy.ndarray with shape (1, 255, 26, 26).
I know that each entry in the dictionary corresponds to a different detection scale:
Conv_22 is for small objects
Conv_14 is for medium objects
Conv_6 is for big objects
How can I convert this dictionary output to coordinates of bounding box, label and confidence?
Presuming you use Python and OpenCV, please find below the code, with comments wherever required, to extract the output using the cv2.dnn module.
net.setInput(blob)
layerOutputs = net.forward(ln)

boxes = []
confidences = []
classIDs = []

for output in layerOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence (i.e., probability) of
        # the current object detection
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]
        # filter out weak predictions by ensuring the detected
        # probability is greater than the minimum probability
        if confidence > threshold:
            # scale the bounding box coordinates back relative to the
            # size of the image, keeping in mind that YOLO actually
            # returns the center (x, y)-coordinates of the bounding
            # box followed by the boxes' width and height
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")
            # use the center (x, y)-coordinates to derive the top
            # and left corner of the bounding box
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))
            # update our list of bounding box coordinates, confidences,
            # and class IDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)

# NMSBoxes takes the score threshold followed by the NMS (IoU) threshold;
# the original snippet reused the stale loop variable `confidence` here.
# `nms_threshold` (e.g. 0.4) is an assumed constant defined elsewhere.
idxs = cv2.dnn.NMSBoxes(boxes, confidences, threshold, nms_threshold)
# results are stored in idxs, boxes, confidences, classIDs
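As a hedged follow-up, the indices that survive NMS can then be used to draw the final boxes; `LABELS` (a list of class names) and `image` are assumed to exist earlier in the script:

if len(idxs) > 0:
    for i in idxs.flatten():
        x, y, w, h = boxes[i]
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        text = "{}: {:.2f}".format(LABELS[classIDs[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)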

Rectangles overlapping

I am using OpenCV to draw rectangles over images, with the xmin, ymin, xmax, ymax values of the rectangles given in a list.
The list of points is:
points = [(1707.0, 1865.0, 2331.0, 2549.0), (1348.0, 1004.0, 1987.0, 1746.0),
          (749.0, 2129.0, 1674.0, 2939.0), (25.0, 1134.0, 1266.0, 2108.0),
          (253.0, 1731.0, 1403.0, 2449.0)]

image = cv2.imread("pathtoimage")
for point in points:
    xmin, ymin, xmax, ymax = point
    result_image = cv2.rectangle(image, (int(xmin), int(xmax)), (int(ymin), int(ymax)), (0, 255, 0), 8)

os.remove("/home/atul/Documents/CarLabel/imagemapping1-wp-BD489663-BD55-484E-9EA7-EB5662B626B9.png")
cv2.imwrite("/home/atul/Documents/CarLabel/imagemapping1-wp-BD489663-BD55-484E-9EA7-EB5662B626B9.png", result_image)
The rectangles are overlapping each other.
How can I resolve this?
Original Image
Resulting image
cv2.rectangle needs the coordinates of the top-left and bottom-right points. So you should use:
result_image = cv2.rectangle(image, (int(xmin), int(ymin)), (int(xmax), int(ymax)), (0, 255, 0), 8)
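Putting the fix into the original loop (the output path is shortened here for illustration):

image = cv2.imread("pathtoimage")
for xmin, ymin, xmax, ymax in points:
    # top-left corner = (xmin, ymin), bottom-right corner = (xmax, ymax)
    image = cv2.rectangle(image, (int(xmin), int(ymin)),
                          (int(xmax), int(ymax)), (0, 255, 0), 8)
cv2.imwrite("output.png", image)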

Converting pixels in a grayscale image to black (OpenCV) causes an unexpected result?

I'm trying to manually mask an image so that only a circular area of the image is visible. I've attempted to accomplish this by specifying the center of the circle and the radius: I compute the distance from the center to each pixel in the image, and if that distance is greater than the radius, I turn that pixel black.
My input image is all white (for testing purposes), and is 6000x4000 px.
The center is (3224,2032) and radius is 1810 px.
After processing it, I get this poorly masked image
Here, the blue circle is the area I expected not to be changed: Expected Result
What is going on??
Edit: I switched the indices for xMax and yMax. The change yielded this result
import cv2

def mask(img, Xcenter, Ycenter, radius):
    radius2 = float(radius**2.00)
    Xcenter = float(Xcenter)
    Ycenter = float(Ycenter)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    imgSize = img.shape
    xMax = imgSize[0]
    yMax = imgSize[1]
    print("Drawing mask...")
    for y in range(1, yMax):
        for x in range(1, xMax):
            dist2 = int(((x - Xcenter)**2 + (y - Ycenter)**2))
            try:
                if dist2 >= radius2:
                    img[x, y] = 0
            except IndexError:
                pass
                #print("Index error: X = ", x, " Y = ", y)
                #char = input("Press enter to continue")
    cv2.imwrite("maskedImg.tif", img)
    print("Completed mask.")
    return img
.shape returns the number of rows/columns/channels of the image.
.size returns the number of pixels.
These are most likely going to be different, so when you reference pixels with the circle function, they don't line up as you'd expect.
It looks like you've mixed up x and y. Number of "rows" would be Y, so
yMax = imgSize[0]
xMax = imgSize[1]
Remember, point (0,0) is top left.
Also, for testing, I’d suggest putting your circle in the centre first. It doesn’t REALLY matter but sometimes you can debug these things more easily in pictures.
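As a side note, the per-pixel loop can be replaced with a vectorized NumPy version that sidesteps the row/column mix-up entirely. This is a sketch, not the poster's code:

import cv2
import numpy as np

def mask_circle(img, x_center, y_center, radius):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape  # rows are y, columns are x
    ys, xs = np.ogrid[:h, :w]
    dist2 = (xs - x_center) ** 2 + (ys - y_center) ** 2
    gray[dist2 >= radius ** 2] = 0  # black out everything outside the circle
    return gray

# e.g. masked = mask_circle(cv2.imread("input.tif"), 3224, 2032, 1810)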
