Extract rectangular boxes with padding - python

I'm trying to extract values from a 2D tensor inside multiple rectangular regions. I want to crop rectangular regions while setting all values outside the box to zero.
For example, from the 9 x 9 image I want to get two separate images with the values inside the two red rectangular boxes, while setting the rest of the values to zero. Is there a convenient way to do this with TensorFlow slicing?
One approach I thought of is to define a mask array that is 1 inside the box and 0 outside, and multiply it with the input array. But this requires looping over the boxes, each time changing which values of the mask are set to 0. Is there a faster and more efficient way to do this without for loops? Is there an equivalent of a crop-and-replace function in TensorFlow? Here's the code I have with the for loop. I'd appreciate any input on this. Thanks.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as patches
tf.reset_default_graph()
size = 9 # size of input image
num_boxes = 2 # number of rectangular boxes
def get_cutout(X, bboxs):
    """Returns copies of X with values only inside bboxs"""
    out = []
    for i in range(num_boxes):
        bbox = bboxs[i] # get rectangular box coordinates
        Y = tf.Variable(np.zeros((size, size)), dtype=tf.float32) # define temporary mask
        # set values of mask inside box to 1
        t = [Y[bbox[0]:bbox[2], bbox[1]:bbox[3]].assign(
            tf.ones((bbox[2]-bbox[0], bbox[3]-bbox[1])))]
        with tf.control_dependencies(t):
            mask = tf.identity(Y)
        out.append(X * mask) # get values inside rectangular box
    return out, X
# define a 9x9 input image X and convert to tensor
in_x = np.eye(size)
in_x[0:3] = np.random.rand(3, 9)
X = tf.constant(in_x, dtype=tf.float32)
bboxs = tf.placeholder(tf.int32, [None, 4]) # placeholder for rectangular box
X_outs = get_cutout(X, bboxs)
# coordinates of each box, as used by the slicing above:
# (bottom left row, bottom left col, top right row, top right col)
in_bbox = [[1,3,3,6], [4,3,7,8]]
feed_dict = {bboxs: in_bbox}
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_out = sess.run(X_outs, feed_dict=feed_dict)
# plot results; x_out[0] is the list of cutouts, x_out[1] is the input X
vmin = np.min(x_out[1])
vmax = np.max(x_out[1])
fig, ax = plt.subplots(nrows=1, ncols=1+len(in_bbox), figsize=(10,2))
im = ax[0].imshow(x_out[1], vmin=vmin, vmax=vmax, origin='lower')
plt.colorbar(im, ax=ax[0])
ax[0].set_title("input X")
for i, bbox in enumerate(in_bbox):
    bottom_left = (bbox[1]-0.5, bbox[0]-0.5)
    width = bbox[3]-bbox[1]
    height = bbox[2]-bbox[0]
    rect = patches.Rectangle(bottom_left, width, height,
                             linewidth=1, edgecolor='r', facecolor='none')
    ax[0].add_patch(rect)
    ax[i+1].set_title("extract values in box {}".format(i+1))
    im = ax[i+1].imshow(x_out[0][i], vmin=vmin, vmax=vmax, origin='lower')
    plt.colorbar(im, ax=ax[i+1])

Thanks for that really nice function, @edkevekeh. I've had to modify it slightly to get it to do what I want. First, I couldn't iterate over boxes, which is a Tensor object. Second, the crop size is determined by the box and is not always 3x3. Finally, tf.boolean_mask returns only the crop, whereas I want to keep the crop in place and replace everything outside it with 0, so I replaced tf.boolean_mask with a multiplication.
For my use case num_boxes can be large, so I wanted to know whether there was a more efficient way than a for loop; I guess not. Here is my modified version of @edkevekeh's solution in case anyone else needs it.
def extract_with_padding(image, boxes):
    """
    boxes: tensor of shape [num_boxes, 4].
    boxes are the coordinates of the extracted part
    box is an array [y1, x1, y2, x2]
    where [y1, x1] (respectively [y2, x2]) are the coordinates
    of the top left (respectively bottom right ) part of the image
    image: tensor containing the initial image
    """
    extracted = []
    shape = tf.shape(image)
    for i in range(boxes.shape[0]):
        b = boxes[i]
        crop = tf.ones([b[2] - b[0], b[3] - b[1]])
        mask = tf.pad(crop, [[b[0], shape[0] - b[2]], [b[1] , shape[1] - b[3]]])
        extracted.append(image*mask)
    return extracted
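On the question of avoiding the loop: one possibility is to build all the masks at once by comparing broadcast index ranges against the box edges. The sketch below uses NumPy so it can be run standalone; the function name box_masks is made up for illustration, and the boxes are assumed to be in [y1, x1, y2, x2] order with exclusive upper bounds. The same comparisons should carry over to TensorFlow with tf.range and tf.logical_and, though I have not benchmarked that.

```python
import numpy as np

def box_masks(height, width, boxes):
    """Return a boolean array of shape [num_boxes, height, width].

    boxes: array-like of shape [num_boxes, 4] in [y1, x1, y2, x2] order,
    with y2 and x2 exclusive (an assumption of this sketch).
    """
    boxes = np.asarray(boxes)
    rows = np.arange(height)[None, :, None]   # shape [1, height, 1]
    cols = np.arange(width)[None, None, :]    # shape [1, 1, width]
    # Each box edge becomes shape [num_boxes, 1, 1] so it broadcasts per box.
    y1, x1, y2, x2 = (boxes[:, k][:, None, None] for k in range(4))
    return (rows >= y1) & (rows < y2) & (cols >= x1) & (cols < x2)

image = np.arange(81, dtype=np.float32).reshape(9, 9)
boxes = [[1, 3, 3, 6], [4, 3, 7, 8]]
masks = box_masks(9, 9, boxes)
cutouts = image[None] * masks   # one zero-padded copy of the image per box
```

Here cutouts has one 9x9 slice per box, with the original values inside the box and zeros elsewhere.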

The mask can be created using tf.pad.
crop = tf.ones([3, 3])
# before_axis_0 / after_axis_0: number of zero rows added before / after the crop zone along axis 0
# before_axis_1 / after_axis_1: number of zero columns added before / after the crop zone along axis 1
mask = tf.pad(crop, [[before_axis_0, after_axis_0], [before_axis_1, after_axis_1]])
tf.boolean_mask(image, tf.cast(mask, tf.bool)) # extracts the values inside the crop zone
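As a concrete illustration of the padding amounts, here is a small NumPy sketch that uses np.pad in the same way; the 9x9 size and the box values are example numbers chosen for this sketch, not taken from the answer.

```python
import numpy as np

size = 9
y1, x1, y2, x2 = 1, 3, 3, 6          # example box in [y1, x1, y2, x2] order

crop = np.ones((y2 - y1, x2 - x1))    # ones covering the box itself
# Pad with zeros: y1 rows before and size - y2 rows after along axis 0,
# x1 columns before and size - x2 columns after along axis 1.
mask = np.pad(crop, [(y1, size - y2), (x1, size - x2)], mode="constant")
```

The padded mask has the full 9x9 shape, with ones only in rows 1 to 2 and columns 3 to 5.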
To have the same behavior as tf.image.crop_and_resize, here is a function that will take an array of boxes and will return an array of extracted images with padding.
def extract_with_padding(image, boxes):
    """
    boxes: tensor of shape [num_boxes, 4].
    boxes are the coordinates of the extracted part
    box is an array [y1, x1, y2, x2]
    where [y1, x1] (respectively [y2, x2]) are the coordinates
    of the top left (respectively bottom right ) part of the image
    image: tensor containing the initial image
    """
    extracted = []
    shape = tf.shape(image)
    for b in boxes:
        crop = tf.ones([3, 3])
        mask = tf.pad(crop, [[b[0], shape[0] - b[2]], [b[1] , shape[1] - b[3]]])
        extracted.append(tf.boolean_mask(image, mask))
    return extracted

Related

How to split an image into multiple images based on white borders between them

I need to split an image into multiple images, based on the white borders between them, using Python.
[image: example input]
[image: expected output]
I don't know how to start on this.
Here is a solution for the "easy" case where we know the grid configuration. I provide this solution even though I doubt this is what you were asked to do.
In your example image of the cat, if we are given the grid configuration, 2x2, we can do:
from PIL import Image
def subdivide(file, nx, ny):
    im = Image.open(file)
    wid, hgt = im.size # Size of input image
    w = int(wid/nx) # Width of each subimage
    h = int(hgt/ny) # Height of each subimage
    for i in range(nx):
        x1 = i*w # Horizontal extent...
        x2 = x1+w # of subimage
        for j in range(ny):
            y1 = j*h # Vertical extent...
            y2 = y1+h # of subimage
            subim = im.crop((x1, y1, x2, y2))
            subim.save(f'{i}x{j}.png')
subdivide("cat.png", 2, 2)
The above will create these images:
My previous answer depended on knowing the grid configuration of the input image. This solution does not.
The main challenge is to detect where the borders are and, thus, where the rectangles that contain the images are located.
To detect the borders, we'll look for (vertical and horizontal) image lines where all pixels are "white". Since the borders in the image are not really pure white, we'll use a value less than 255 as the whiteness threshold (WHITE_THRESH in the code.)
The gist of the algorithm is in the following lines of code:
whitespace = [np.all(gray[:,i] > WHITE_THRESH) for i in range(gray.shape[1])]
Here "whitespace" is a list of Booleans that looks like
TTTTTFFFFF...FFFFFFFFTTTTTTTFFFFFFFF...FFFFTTTTT
where "T" indicates the corresponding horizontal location is part of the border (white).
We are interested in the x-locations where there are transitions between T and F. The call to the function slices(whitespace) returns a list of tuples of indices
[(x1, x2), (x1, x2), ...]
where each (x1, x2) pair indicates the xmin and xmax location of images in the x-axis direction.
The slices function finds the "edges" where there are transitions between True and False using the exclusive-or operator and then returns the locations of the transitions as a list of tuples (pairs of indices).
Similar code is used to detect the vertical location of borders and images.
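To make the transition detection concrete, here is a small sketch on a toy list of Booleans. It is a NumPy variant of the idea rather than the slices function itself: the name slices_np is made up for this example, and it assumes the list starts and ends with border (True), so that the transitions come in pairs.

```python
import numpy as np

def slices_np(flags):
    """Return (start, stop) index pairs for each run of False in flags.

    Assumes flags begins and ends with True (border on both sides),
    so the number of transitions is even.
    """
    flags = np.asarray(flags, dtype=bool)
    # A transition sits at index i when flags[i] differs from flags[i-1].
    change = np.flatnonzero(flags[1:] != flags[:-1]) + 1
    return [tuple(pair) for pair in change.reshape(-1, 2).tolist()]

#             0     1     2      3      4      5     6     7      8      9
whitespace = [True, True, False, False, False, True, True, False, False, True]
pairs = slices_np(whitespace)
```

Each pair can be used directly as a slice, so whitespace[2:5] and whitespace[7:9] cover exactly the non-border runs.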
The complete runnable code below takes as input the OP's image "cat.png" and:
Extracts the sub-images into 4 PNG files "fragment-0-0.png", "fragment-0-1.png", "fragment-1-0.png" and "fragment-1-1.png".
Creates a (borderless) version of the original image by pasting together the above fragments.
The runnable code and resulting images follow. The program runs in about 0.25 seconds.
from PIL import Image
import numpy as np
def slices(lst):
    """ Finds the indices where lst changes value and returns them in pairs
        lst is a list of booleans
    """
    edges = [lst[i-1] ^ lst[i] for i in range(len(lst))]
    indices = [i for i,v in enumerate(edges) if v]
    pairs = [(indices[i], indices[i+1]) for i in range(0, len(indices), 2)]
    return pairs
def extract(xx_locs, yy_locs, image, prefix="image"):
    """ Locate and save the subimages """
    data = np.asarray(image)
    for i in range(len(xx_locs)):
        x1,x2 = xx_locs[i]
        for j in range(len(yy_locs)):
            y1,y2 = yy_locs[j]
            arr = data[y1:y2, x1:x2, :]
            Image.fromarray(arr).save(f'{prefix}-{i}-{j}.png')
def assemble(xx_locs, yy_locs, prefix="image", result='composite'):
    """ Paste the subimages into a single image and save """
    wid = sum([p[1]-p[0] for p in xx_locs])
    hgt = sum([p[1]-p[0] for p in yy_locs])
    dst = Image.new('RGB', (wid, hgt))
    x = y = 0
    for i in range(len(xx_locs)):
        for j in range(len(yy_locs)):
            img = Image.open(f'{prefix}-{i}-{j}.png')
            dst.paste(img, (x,y))
            y += img.height
        x += img.width
        y = 0
    dst.save(f'{result}.png')
WHITE_THRESH = 110 # The original image borders are not actually white
image_file = 'cat.png'
image = Image.open(image_file)
# To detect the (almost) white borders, we make a grayscale version of the image
gray = np.asarray(image.convert('L'))
# Detect location of images along the x axis
whitespace = [np.all(gray[:,i] > WHITE_THRESH) for i in range(gray.shape[1])]
xx_locs = slices(whitespace)
# Detect location of images along the y axis
whitespace = [np.all(gray[i,:] > WHITE_THRESH) for i in range(gray.shape[0])]
yy_locs = slices(whitespace)
extract(xx_locs, yy_locs, image, prefix='fragment')
assemble(xx_locs, yy_locs, prefix='fragment', result='composite')
Individual fragments:
The composite image:

How to find the xmin, xmax, ymin, ymax of a mask

I have a mask drawn over an apple using segmentation. The mask layer has 1's where the pixel is part of the apple and 0's everywhere else. How do I find the extreme pixels in the mask, so that I can get the bounding box coordinates around it? I am using PyTorch and YolactEdge to perform the segmentation, as shown in Yolact.
Relevant stackoverflow answer with nice explanation.
TL;DR
Proposed code snippets (second is faster):
def bbox1(img):
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox
def bbox2(img):
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]
    return rmin, rmax, cmin, cmax
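As a quick check of the second snippet, here is a sketch that applies the same row/column reduction to a toy mask and returns the result in (xmin, ymin, xmax, ymax) order. The function name mask_to_xyxy and the empty-mask handling are additions for this example; a PyTorch mask would first need converting, for instance with mask.cpu().numpy().

```python
import numpy as np

def mask_to_xyxy(mask):
    """Return (xmin, ymin, xmax, ymax), all inclusive, or None for an empty mask."""
    rows = np.any(mask, axis=1)   # True for every row that contains a mask pixel
    cols = np.any(mask, axis=0)   # True for every column that contains a mask pixel
    if not rows.any():
        return None               # nothing to bound
    ymin, ymax = np.where(rows)[0][[0, -1]]
    xmin, xmax = np.where(cols)[0][[0, -1]]
    return int(xmin), int(ymin), int(xmax), int(ymax)

mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1                # rows 2..4, columns 3..6
box = mask_to_xyxy(mask)
empty = mask_to_xyxy(np.zeros((6, 8), dtype=np.uint8))
```

Note that the max values are inclusive here, so a slice covering the box is mask[ymin:ymax + 1, xmin:xmax + 1].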
But in the more general case (e.g. if you have more than one "instance" in the image and each mask is separated from the others), it may be worth considering OpenCV, specifically cv2.connectedComponentsWithStats.
A good description of this function can be found in another relevant answer:
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
Labels is a matrix the size of the input image where each element has a value equal to its label.
Stats is a matrix of the stats that the function calculates. It has a length equal to the number of labels and a width equal to the number of stats. It can be used with the OpenCV documentation for it:
Statistics output for each label, including the background label, see below for available statistics. Statistics are accessed via stats[label, COLUMN] where available columns are defined below.
cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
cv2.CC_STAT_WIDTH The horizontal size of the bounding box
cv2.CC_STAT_HEIGHT The vertical size of the bounding box
cv2.CC_STAT_AREA The total area (in pixels) of the connected component
Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.
So, basically, the first 4 values of each item in stats determine the bounding box of each connected component (instance) in the mask.
A possible function that you can use to return just the bounding boxes:
import cv2
def get_bounding_boxes(mask, min_size=None):
    # mask must be an 8-bit single-channel image (e.g. dtype uint8)
    num_components, labeled_image, bboxes, centroids = cv2.connectedComponentsWithStats(mask)
    # return bboxes in cv2 format [x, y, w, h] without background bbox and component size
    return bboxes[1:, :-1]
# (x, y, x+w, y+h) give xmin, ymin and the exclusive xmax, ymax you are looking for
And of course in case of one instance this approach still works.
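For the last step, here is a small sketch of turning [x, y, w, h] rows into corner coordinates. To keep it runnable without OpenCV, the stats array below is made-up sample data laid out in the column order described above (left, top, width, height, area), not real output.

```python
import numpy as np

# Hypothetical stats for a 10x8 image: row 0 is the background label,
# rows 1 and 2 are two components. Columns: left, top, width, height, area.
stats = np.array([
    [0, 0, 10, 8, 68],
    [2, 1, 3, 2, 6],
    [6, 4, 2, 3, 6],
])

xywh = stats[1:, :4]               # drop the background row and the area column
xyxy = np.column_stack([
    xywh[:, 0],                    # xmin (inclusive)
    xywh[:, 1],                    # ymin (inclusive)
    xywh[:, 0] + xywh[:, 2],       # xmax (exclusive)
    xywh[:, 1] + xywh[:, 3],       # ymax (exclusive)
])
```

Because the max values are exclusive, each row can be used directly for slicing, e.g. mask[ymin:ymax, xmin:xmax].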

Do OpenCV operations impede NumPy arithmetic?

I'm attempting to use NumPy and OpenCV to perform object anonymization. The objective is to anonymize faces by isolating them with a mask, blurring said content, and then joining the result to another image. The other image is gotten by inverting the mask and applying it to the source image. The two pieces are combined by addition.
def apply_mask(image, mask):
    return np.multiply(image, mask)
Let's test this on a toy example:
image = np.random.randn(256,256)
mask = Mask(256, 256, [64, 64, 192, 192])
Here we've created an object such that mask.inverted_array_label is a 256x256 monochrome image with all pixel intensities zero, except for the rectangle with upper left corner (64,64) and lower right corner (192,192). All pixels in this rectangle have intensity 1.
kernel_size = (121,121)
X = apply_mask(image, mask.inverted_array_label)
X = cv2.GaussianBlur(X, kernel_size, 0)
plt.imshow(X, cmap="gray")
Now we apply a Gaussian blur to the inverted mask, then apply the result to the original image . . .
opp_mask = cv2.GaussianBlur(mask.array_label, kernel_size, 0)
Y = apply_mask(image, opp_mask)
plt.imshow(Y, cmap="gray")
Now I want to add the first image to the second.
Z = np.zeros(X.shape)
Z = np.add(X,Y)
plt.imshow(Z, cmap="gray")
num = 190
print(X[num,num], Y[num,num], Z[num,num])
w1 = X[num, num]
w2 = Y[num, num]
w3 = w1+w2
print(w1,w2,w3)
The arithmetic checks out:
0.006876066498679399 0.7493522694977256 0.756228335996405
0.006876066498679399 0.7493522694977256 0.756228335996405
However, the image does not change as expected:
What am I missing? How do I properly add these two arrays?

How can I create a circular mask for a numpy array?

I am trying to apply a circular mask to an image in Python. I found some example code on the web, but I'm not sure how to change the maths to get my circle in the correct place.
I have an image image_data of type numpy.ndarray with shape (3725, 4797, 3):
total_rows, total_cols, total_layers = image_data.shape
X, Y = np.ogrid[:total_rows, :total_cols]
center_row, center_col = total_rows/2, total_cols/2
dist_from_center = (X - total_rows)**2 + (Y - total_cols)**2
radius = (total_rows/2)**2
circular_mask = (dist_from_center > radius)
I see that this code applies euclidean distance to calculate dist_from_center, but I don't understand the X - total_rows and Y - total_cols part. This produces a mask that is a quarter of a circle, centered on the top-left of the image.
What role are X and Y playing on the circle? And how can I modify this code to produce a mask that is centered somewhere else in the image instead?
The algorithm you got online is partly wrong, at least for your purposes. If we have the following image, we want it masked like so:
The easiest way to create a mask like this is how your algorithm goes about it, but it's not presented in the way that you want, nor does it give you the ability to modify it in an easy way. What we need to do is look at the coordinates for each pixel in the image, and get a true/false value for whether or not that pixel is within the radius. For example, here's a zoomed in picture showing the circle radius and the pixels that were strictly within that radius:
Now, to figure out which pixels lie inside the circle, we'll need the indices of each pixel in the image. np.ogrid gives two vectors, each containing the pixel locations (or indices): a column vector holding the row indices and a row vector holding the column indices:
>>> np.ogrid[:4,:5]
[array([[0],
        [1],
        [2],
        [3]]), array([[0, 1, 2, 3, 4]])]
This format is useful for broadcasting: if we use the two vectors together in an arithmetic expression, NumPy will create a grid of all the indices instead of just those two vectors. We can thus use np.ogrid to create the indices (or pixel coordinates) of the image, and then check each pixel coordinate to see if it's inside or outside the circle. To tell whether a pixel is inside the circle, we can simply find the Euclidean distance from the center to every pixel location; if that distance is less than or equal to the circle radius, we'll mark the pixel as included in the mask, and if it's greater, we'll exclude it.
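To make the broadcasting step concrete, here is a small sketch on the same 4x5 grid. The center (row 1, column 2) and the radius of 1 are arbitrary values picked for illustration.

```python
import numpy as np

rows, cols = np.ogrid[:4, :5]      # shapes (4, 1) and (1, 5)
center_row, center_col = 1, 2      # arbitrary center for this example
radius = 1

# (4, 1) combined with (1, 5) broadcasts to a full (4, 5) grid of distances.
dist = np.sqrt((rows - center_row) ** 2 + (cols - center_col) ** 2)
inside = dist <= radius            # boolean mask, True within the radius
```

With these values, inside is True only at the center pixel and its four direct neighbours.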
Now we've got everything we need to make a function that creates this mask. Furthermore we'll add a little bit of nice functionality to it; we can send in the center and the radius, or have it automatically calculate them.
def create_circular_mask(h, w, center=None, radius=None):
    if center is None: # use the middle of the image
        center = (int(w/2), int(h/2))
    if radius is None: # use the smallest distance between the center and image walls
        radius = min(center[0], center[1], w-center[0], h-center[1])
    Y, X = np.ogrid[:h, :w]
    dist_from_center = np.sqrt((X - center[0])**2 + (Y-center[1])**2)
    mask = dist_from_center <= radius
    return mask
In this case, dist_from_center is a matrix the same height and width that is specified. It broadcasts the column and row index vectors into a matrix, where the value at each location is the distance from the center. If we were to visualize this matrix as an image (scaling it into the proper range), then it would be a gradient radiating from the center we specify:
So when we compare it to radius, it's identical to thresholding this gradient image.
Note that the final mask is a matrix of booleans; True if that location is within the radius from the specified center, False otherwise. So we can then use this mask as an indicator for a region of pixels we care about, or we can take the opposite of that boolean (~ in numpy) to select the pixels outside that region. So using this function to color pixels outside the circle black, like I did up at the top of this post, is as simple as:
h, w = img.shape[:2]
mask = create_circular_mask(h, w)
masked_img = img.copy()
masked_img[~mask] = 0
But if we wanted to create a circular mask at a different point than the center, we could specify it (note that the function is expecting the center coordinates in x, y order, not the indexing row, col = y, x order):
center = (int(w/4), int(h/4))
mask = create_circular_mask(h, w, center=center)
Which, since we're not giving a radius, would give us the largest radius so that the circle would still fit in the image bounds:
Or we could let it calculate the center but use a specified radius:
radius = h/4
mask = create_circular_mask(h, w, radius=radius)
Giving us a centered circle with a radius that doesn't extend exactly to the smallest dimension:
And finally, we could specify any radius and center we wanted, including a radius that extends outside the image bounds (and the center can even be outside the image bounds!):
center = (int(w/4), int(h/4))
radius = h/2
mask = create_circular_mask(h, w, center=center, radius=radius)
What the algorithm you found online does is equivalent to setting the center to (0, 0) and setting the radius to h:
mask = create_circular_mask(h, w, center=(0, 0), radius=h)
I'd like to offer a way to do this that doesn't involve the np.ogrid() function. I'll crop an image called "robot.jpg", which is 491 x 491 pixels. For readability I'm not going to define as many variables as I would in a real program:
Import libraries:
import matplotlib.pyplot as plt
from matplotlib import image
import numpy as np
Import the image, which I'll call "z". This is a color image so I'm also pulling out just a single color channel. Following that, I'll display it:
z = image.imread('robot.jpg')
z = z[:,:,1]
zimg = plt.imshow(z,cmap="gray")
plt.show()
robot.jpg as displayed by matplotlib.pyplot
To wind up with a numpy array (image matrix) with a circle in it to use as a mask, I'm going to start with this:
x = np.linspace(-10, 10, 491)
y = np.linspace(-10, 10, 491)
x, y = np.meshgrid(x, y)
x_0 = -3
y_0 = -6
mask = np.sqrt((x-x_0)**2+(y-y_0)**2)
Note the equation of a circle on that last line, where x_0 and y_0 define the center point of the circle in a grid which is 491 elements tall and wide. Because I defined the grid to go from -10 to 10 in both x and y, it is within that system of units that x_0 and y_0 set the center point of the circle with respect to the center of the image.
To see what that produces I run:
maskimg = plt.imshow(mask,cmap="gray")
plt.show()
Our "proto" masking circle
To turn that into an actual binary-valued mask, I'm just going to take every pixel below a certain value and set it to 0, and take every pixel above a certain value and set it to 256. The "certain value" will determine the radius of the circle in the same units defined above, so I'll call that 'r'. Here I'll set 'r' to something and then loop through every pixel in the mask to determine if it should be "on" or "off":
r = 7
# loop over every pixel, including the last row and column
for i in range(mask.shape[0]):
    for j in range(mask.shape[1]):
        if mask[i,j] < r:
            mask[i,j] = 0
        elif mask[i,j] >= r:
            mask[i,j] = 256
maskimg = plt.imshow(mask,cmap="gray")
plt.show()
The mask
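The pixel-by-pixel loop above can also be written as a single vectorised expression, which is usually much faster on an image of this size. This is a sketch of that alternative using np.where, with the same grid, center and r as above; the names xx, yy, dist and binary_mask are introduced here for the example.

```python
import numpy as np

n = 491
x = np.linspace(-10, 10, n)
y = np.linspace(-10, 10, n)
xx, yy = np.meshgrid(x, y)          # xx varies along columns, yy along rows

x_0, y_0, r = -3, -6, 7
dist = np.sqrt((xx - x_0) ** 2 + (yy - y_0) ** 2)

# 0 inside the circle of radius r, 256 outside, in one step with no Python loop.
binary_mask = np.where(dist < r, 0, 256)
```

Swapping the 0 and the 256 in the np.where call gives the inverted mask.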
Now I'll just multiply the mask by the image element-wise, then display the result:
z_masked = np.multiply(z,mask)
zimg_masked = plt.imshow(z_masked,cmap="gray")
plt.show()
To invert the mask I can just swap the 0 and the 256 in the thresholding loop above, and if I do that I get:
Masked version of robot.jpg
The other answers work, but they are slow, so I will propose an answer using skimage.draw.disk. It is faster and I find it simple to use: specify the center of the circle and the radius, then use the output to create a mask.
import numpy as np
from skimage.draw import disk
mask = np.zeros((10, 10), dtype=np.uint8)
row = 4
col = 5
radius = 5
rr, cc = disk(row, col, radius)
mask[rr, cc] = 1

Set masked pixels in a 3D RGB numpy array

I'd like to set all pixels matching some condition in a 3d numpy array (RGB image) using a mask. I have something like this:
def make_dot(img, color, radius):
    """Make a dot of given color in the center of img (rgb numpy array)"""
    (ydim,xdim,dummy) = img.shape
    # make an open grid of x,y
    y,x = np.ogrid[0:ydim, 0:xdim, ]
    y -= ydim/2 # centered at the origin
    x -= xdim/2
    # now make a mask
    mask = x**2+y**2 <= radius**2 # start with 2d
    mask.shape = mask.shape + (1,) # make it 3d
    print img[mask].shape
    img[mask] = color
img = np.zeros((100, 200, 3))
make_dot(img, np.array((.1, .2, .3)), 25)
but that gives ValueError: array is not broadcastable to correct shape in this line:
img[mask] = color
because the shape of img[mask] is (1961,); i.e. it's flattened to contain only the "valid" pixels, which makes sense; but how can I make it "write through the mask" as it were to set only the pixels where the mask is 1? Note that I want to write three values at once to each pixel (the last dim).
You almost have it right.
(ydim,xdim,dummy) = img.shape
# make an open grid of x,y
y,x = np.ogrid[0:ydim, 0:xdim, ]
y -= ydim/2 # centered at the origin
x -= xdim/2
# now make a mask
mask = x**2+y**2 <= radius**2 # start with 2d
img[mask,:] = color
The extra ",:" at the end of the assignment keeps the mask 2D and lets you assign the color across all 3 channels in one shot.
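For reference, a self-contained sketch of the corrected function might look like the following. It assumes Python 3 and a current NumPy; the center is subtracted inside the mask expression instead of in place, because an in-place true division result cannot be written back into the integer grids there. That change, and returning the mask, are my additions rather than part of the answer.

```python
import numpy as np

def make_dot(img, color, radius):
    """Paint a dot of the given color in the center of img (H x W x 3 array)."""
    ydim, xdim, _ = img.shape
    y, x = np.ogrid[:ydim, :xdim]                # open grid of row/column indices
    # 2D boolean mask of pixels within `radius` of the image center
    mask = (x - xdim // 2) ** 2 + (y - ydim // 2) ** 2 <= radius ** 2
    img[mask, :] = color                         # color broadcasts over the 3 channels
    return mask

img = np.zeros((100, 200, 3))
mask = make_dot(img, np.array((.1, .2, .3)), 25)
```

With a 2D mask, img[mask] has shape (number of masked pixels, 3), so a length-3 color broadcasts onto it without error.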
