YOLOv4 annotations save dimensions in a [0,1] float interval - python

This is from an annotations file for an image:
0 0.6142131979695431 0.336 0.467005076142132 0.392
The first 0 is the class label. 0.6142131979695431 and 0.336 are the x and y coordinates of the bounding box. 0.467005076142132 and 0.392 are the width and the height of the bounding box. However, what I don't understand is why the x, y, width and height are in a [0,1] float interval. Someone told me that it is a percentage, but a percentage relative to what?
For example, I am writing software that builds a synthetic dataset. This is one training image that I have produced. It has the bounding boxes around the objects that I want to detect.
The bounding boxes wrap the Wizards and Ubuntu logos perfectly. So, how can I annotate them like the format above?

The width/height in YOLO format is the fraction of the total width/height of the entire image, so the top-left corner of the image is always (0,0) and the bottom-right corner is always (1,1), irrespective of the size of the image. Note that x and y are the coordinates of the box center (not a corner), also given as fractions of the image width and height.
See this question for the conversion of a bounding box (x1, y1, x2, y2) to YOLO style (x, y, w, h).
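For reference, a minimal sketch of that conversion (to_yolo is a hypothetical helper name; x1, y1, x2, y2 are pixel coordinates of opposite corners, and img_w, img_h the image size in pixels):

def to_yolo(x1, y1, x2, y2, img_w, img_h):
    # YOLO stores the box center as a fraction of the image size
    x = (x1 + x2) / 2 / img_w
    y = (y1 + y2) / 2 / img_h
    # and the box width/height as fractions of the image size
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return x, y, w, h

For the sample line above, this would correspond to a box centered at about 61% of the image width and 33.6% of its height.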

Related

How to crop an image given proportional coordinates with Python PIL?

I have an image with dimensions (1920x1080) and proportional coordinates provided with a description of the detected person region. I want to crop only the detected person from the image using the provided proportional coordinates. I looked up the PIL crop documentation and tried the following:
Provided in the integration documentation:
x0, y0: the x, y coordinates corresponding to the lower-right corner of the person detection box. They are proportional distances from the upper-left of the image.
x1, y1: the x, y coordinates corresponding to the upper-left corner of the person detection box. They are proportional distances from the upper-left of the image.
Sample integration description provided:
def img_crop(url, box):
    box = {
        'x0': 0.974,
        'x1': 0.922,
        'y0': 0.502,
        'y1': 0.315
    }
    img = Image.open(requests.get(url, stream=True).raw)
    h, w = img.size
    print(img.size)
    return img.crop((box['x0']*h, box['y0']*w, box['x1']*h, box['y1']*w))
This results in the following error:
ValueError: Coordinate 'right' is less than 'left'
But your drawing contradicts your own description of what x0, y0, x1, y1 are. It says (in a picture of text, by the way; it is preferable to avoid that) that x0, y0 is the lower-right corner, and x1, y1 the upper-left corner.
Just invert x0,y0 and x1,y1.
Also, note that the coordinate system in PIL (and, generally speaking, in most image-processing systems, since this is how image formats are laid out too) starts from the upper-left corner. Like English text: pixels are organized from left to right, and from top to bottom.
EDIT: (answer to your comment)
One way would be to simply swap them and replace your .crop line with
return img.crop((box['x1']*h, box['y1']*w, box['x0']*h, box['y0']*w))
This would work in your code. Nevertheless, there are some other changes that are preferable. First of all, you call the width of the image h and the height of the image w. Of course, that is not a problem from a Python point of view, but it doesn't help readability. (I surmise you did so because when images are np.arrays, such as OpenCV images, you would get w and h with h, w, _ = img.shape. But PIL's .size returns w first and h second.) And then you inverted w and h in the crop line to stay consistent.
Secondly, it is quite strange to rely on the fact that x0 and y0 are the biggest x and y of the box, and x1, y1 the smallest. It would be better to do the inversion in the calling code. You did not provide that code, which is why I did not try to show a correction there: the correction has to be done in code that is not shown. (You did provide a box that overrides what is passed in, so you could do the swap in that box as well:)
box = {
    'x1': 0.974,
    'x0': 0.922,
    'y1': 0.502,
    'y0': 0.315
}
But the safest way, especially since you seem unsure about where all the corners are, and taking into account that sometimes x0 could be smaller than x1 while y0 is bigger than y1, is to compute which one is the min and which one is the max.
Like this:
from PIL import Image
import matplotlib.pyplot as plt
import requests

def img_crop(url, box):
    box = {
        'x0': 0.216,
        'x1': 0.419,
        'y0': 0.237,
        'y1': 0.697
    }
    img = Image.open(requests.get(url, stream=True).raw)
    w, h = img.size
    print(img.size)
    xmin = min(box['x0'], box['x1'])
    xmax = max(box['x0'], box['x1'])
    ymin = min(box['y0'], box['y1'])
    ymax = max(box['y0'], box['y1'])
    return img.crop((xmin*w, ymin*h, xmax*w, ymax*h))
There, no problem. Just pass the two x and the two y in the order x,y,x,y without bothering about which x to send first and which y to send first.
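(As a quick usage sketch with a hypothetical URL; note that box is overridden inside the function, so the argument is ignored here:)

img_crop('https://example.com/photo.jpg', None).show()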
On your picture, with my version of box, it gives:

How to draw a circle on image given float (subpixel) coordinates of it center

I want to visualize the results of a keypoint tracking algorithm in Python. I have a sequence of (image, keypoint) pairs (basically a video). The tracking algorithm is strong enough to give me subpixel accuracy, but I have no idea how to visualize it properly.
I tried to round my coordinates and draw a circle with cv2.circle(image, (int(round(x)), int(round(y))), radius, color), but it leads to visual jittering of my keypoints due to the small image resolution.
I checked OpenCV, Pillow, skimage, and Pygame (pygame.draw.circle). None of them can properly draw a circle with float coordinates.
DIPlib has the function DrawBandlimitedBall(), which draws a disk or a circle with smooth transitions and with floating-point origin coordinates (disclosure: I'm one of the authors). You might need to draw the circle in an empty image, then blend it in to get the effect you are looking for. Code would look something like this:
import diplib as dip
img = dip.ImageRead('/Users/cris/dip/images/flamingo.tif')
p = [366.4, 219.1]
# Create an empty image and draw a circle in it
circle = dip.Image(img.Sizes(), 1, 'SFLOAT')
circle.Fill(0)
dip.DrawBandlimitedBall(circle, diameter=22.3, origin=p, value=1, mode='empty')
circle /= dip.Maximum(circle)
# Blend: img * (1-circle) + circle * color
img *= 1 - circle
img += circle * dip.Create0D([0,255,0]) # we make the circle green here
img.Show()
dip.ImageWrite(img, 'so.jpg')
(Note that the circle actually looks better without the JPEG compression artifacts.)
You could draw the circle directly in the image, but this function adds the circle values to the image; it doesn't attempt to blend, so you'd get a much worse look for this particular application.

How to find the xmin, xmax, ymin, ymax of a mask

I have a mask drawn over an apple using segmentation. The mask layer has 1's where the pixel is part of the apple and 0's everywhere else. How do I find the extreme pixels in the mask to get the bounding box coordinates around it? I am using pytorch and yolact edge to perform the segmentation, as shown in Yolact.
Relevant Stack Overflow answer with a nice explanation.
TL;DR
Proposed code snippets (the second is faster):
import numpy as np

def bbox1(img):
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox

def bbox2(img):
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]
    return rmin, rmax, cmin, cmax
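As a quick sanity check on a toy mask (note the returned indices are inclusive and in row/column order):

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:8] = 1
print(bbox2(mask))   # -> (2, 4, 3, 7), i.e. rmin, rmax, cmin, cmax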
But in the more general case (e.g. if you have more than one "instance" in the image and each mask is separated from the others) it may be worth considering OpenCV.
Specifically cv2.connectedComponentsWithStats.
A brilliant description of this function can be found in another relevant answer.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
Labels is a matrix the size of the input image where each element has a value equal to its label.
Stats is a matrix of the stats that the function calculates. It has a length equal to the number of labels and a width equal to the number of stats. From the OpenCV documentation:
Statistics output for each label, including the background label; see below for available statistics. Statistics are accessed via stats[label, COLUMN], where the available columns are defined below.
cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
cv2.CC_STAT_WIDTH The horizontal size of the bounding box
cv2.CC_STAT_HEIGHT The vertical size of the bounding box
cv2.CC_STAT_AREA The total area (in pixels) of the connected component
Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.
So, basically, each row in stats (its first 4 values) determines the bounding box of one connected component (instance) in the mask.
A possible function that you can use to return just the bounding boxes:

def get_bounding_boxes(mask, min_size=None):
    num_components, labeled_image, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # drop the background bbox (label 0) and the component-size column;
    # each remaining row is a bbox in cv2 format [x, y, w, h]
    return stats[1:, :-1]

The corner points you are looking for are then (x, y) and (x + w, y + h).
And of course, in the case of a single instance, this approach still works.
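As a usage sketch (cv2.connectedComponentsWithStats expects a single-channel 8-bit mask):

import numpy as np
import cv2

mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:60, 30:80] = 1   # one rectangular "instance"

for x, y, w, h in get_bounding_boxes(mask):
    print(x, y, x + w, y + h)   # -> 30 20 80 60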

How to get width given by a set of points that are not aligned to axis?

I have a depth image but I am manipulating it like a (2d) grayscale image to analyze the shape of the figures I have.
I am trying to get the width (distance) of a shape, as given by this image. The width is shown by the red line, which also follows the direction of vector v2.
I have the vectors shown in the image, resulting from a 2-component PCA used to obtain the direction of the shape (the shape in the picture is cropped, since I just need the width, shown in red, on this part of the shape).
I have no clue how to rotate the points to the origin, or how to project the points onto the line and then calculate the width, for example by computing the Euclidean distance from min to max. How do I get the width given by a set of points that are not aligned to an axis?
I managed it using a rotated bounding box from cv2, as described by this solution.
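For reference, a minimal sketch of that approach with cv2.minAreaRect (shape_points is a hypothetical (N, 2) array of the shape's pixel coordinates, e.g. extracted from the mask):

import numpy as np
import cv2

points = np.float32(shape_points)   # hypothetical (N, 2) array of pixel coordinates
(cx, cy), (w, h), angle = cv2.minAreaRect(points)
# w and h are the side lengths of the rotated rectangle; the one
# aligned with the PCA direction v2 is the width of the shape
corners = cv2.boxPoints(((cx, cy), (w, h), angle))   # 4 corner points, useful for drawing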

Returning "chunk" of Image using Python and Pillow

This is a very rudimentary question and I'm sure that there's some part of the Pillow library/documentation I've missed...
Let's say you have a 128x128 image, and you want to save the "chunk" of it that is "x" pixels right from the top-left corner of the original image, and "y" pixels down from the top-left corner (so the top-left corner of this "chunk" is located at (x, y)). If you know that the chunk you want is "a" pixels wide and "b" pixels tall (so the four corners of the chunk are known: (x, y), (x+a, y), (x, y+b), (x+a, y+b)), how would you save this "chunk" of the original image as a separate image file?
More concisely, how can I save pieces of images given their pixel-coordinates using PIL? any help/pointers are appreciated.
Came up with:
"""
The function "crop" takes in large_img, small_img, x, y, w, h and returns the image lying within these restraints:
large_img: the filename of the large image
small_img: the desired filename of the smaller "sub-image"
x: x coordinate of the upper left corner of the bounding box
y: y coordinate of the upper left corner of the bounding box
w: width of the bounding box
h: height of the bounding box
"""
def crop(large_img, small_img, x, y, w, h):
img = Image.open(large_img)
box = (x, y, x+w, y+h)
area = img.crop(box)
area.save(small_img, 'jpeg')
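For example (with hypothetical filenames), saving a 40x30-pixel chunk whose top-left corner is at (10, 20):

crop('large.jpg', 'chunk.jpg', 10, 20, 40, 30)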
