I have a mask drawn over an apple using segmentation. The mask layer has 1's where a pixel is part of the apple and 0's everywhere else. How do I find the extreme pixels in the mask so I can get the bounding box coordinates around it? I am using PyTorch and YOLACT Edge to perform the segmentation, as shown in YOLACT.
Relevant Stack Overflow answer with a nice explanation.
TL;DR
Proposed code snippets (the second is faster):
import numpy as np

def bbox1(img):
    a = np.where(img != 0)
    bbox = np.min(a[0]), np.max(a[0]), np.min(a[1]), np.max(a[1])
    return bbox

def bbox2(img):
    rows = np.any(img, axis=1)
    cols = np.any(img, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]
    return rmin, rmax, cmin, cmax
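For the PyTorch/YOLACT case, a minimal usage sketch might look like the following (assuming masks is a (num_instances, H, W) tensor of 0/1 masks coming out of the model's postprocessing; the variable name is only illustrative):
import numpy as np
import torch  # only needed to move the mask off the GPU

apple_mask = masks[0].cpu().numpy()          # first instance as a NumPy array of 0s and 1s
rmin, rmax, cmin, cmax = bbox2(apple_mask)   # extreme rows and columns of the mask
# top-left corner: (cmin, rmin), bottom-right corner: (cmax, rmax) in (x, y) order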
But in the more general case (e.g. if you have more than one "instance" in the image and each mask is separated from the others), it may be worth considering OpenCV.
Specifically cv2.connectedComponentsWithStats.
A brilliant description of this function can be found in another relevant answer.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
Labels is a matrix the size of the input image where each element has a value equal to its label.
Stats is a matrix of the statistics that the function calculates. It has a length equal to the number of labels and a width equal to the number of stats. Its columns are described in the OpenCV documentation:
Statistics output for each label, including the background label, see
below for available statistics. Statistics are accessed via
stats[label, COLUMN] where available columns are defined below.
cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
cv2.CC_STAT_WIDTH The horizontal size of the bounding box
cv2.CC_STAT_HEIGHT The vertical size of the bounding box
cv2.CC_STAT_AREA The total area (in pixels) of the connected component
Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.
So, basically, the first 4 values of each row in stats give the bounding box of the corresponding connected component (instance) in the mask.
A possible function that you can use to return just the bounding boxes:
def get_bounding_boxes(mask, min_size=None):
    num_components, labeled_image, stats, centroids = cv2.connectedComponentsWithStats(mask)
    # return bboxes in cv2 format [x, y, w, h], without the background bbox and the component size
    # (x, y, x+w, y+h) are the 4 corner coordinates you are looking for
    return stats[1:, :-1]
And of course in case of one instance this approach still works.
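As a quick usage sketch (mask is assumed to be a 2-D 0/1 array and image_bgr an image to annotate; both names are only illustrative), the [x, y, w, h] rows can be turned into corner coordinates like this:
import cv2
import numpy as np

boxes = get_bounding_boxes(mask.astype(np.uint8))   # connectedComponentsWithStats expects uint8
for x, y, w, h in boxes:
    x_min, y_min, x_max, y_max = int(x), int(y), int(x + w), int(y + h)
    # draw the box for a quick visual check
    cv2.rectangle(image_bgr, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)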
Related
I'm trying to morph two images of faces using an inverse warp. I have the Delaunay triangles for both images as well as all transformation matrices for all pairs of corresponding triangles.
I have applied the matrix to every pixel inside the triangles, but the image I am getting is all messed up and some pixels aren't being filled in either.
I suspect the vertices lists are not in order which means the triangles are not corresponding. Or it could just be me messing up the row, cols order.
Here's my code:
from scipy.spatial import Delaunay
from skimage.draw import polygon
import numpy as np
import cv2

def drawDelaunay(img, landmarks, color):
    tri = Delaunay(landmarks)
    vertices = []
    for t in landmarks[tri.simplices]:
        # t = [int(i) for i in t]
        pt1 = [t[0][0], t[0][1]]
        pt2 = [t[1][0], t[1][1]]
        pt3 = [t[2][0], t[2][1]]
        cv2.line(img, pt1, pt2, color, 1, cv2.LINE_AA, 0)
        cv2.line(img, pt2, pt3, color, 1, cv2.LINE_AA, 0)
        cv2.line(img, pt3, pt1, color, 1, cv2.LINE_AA, 0)
        vertices.append([pt1, pt2, pt3])
    return img, vertices

def getAffineMat(triangle1, triangle2):
    x = np.transpose(np.matrix([*triangle1]))
    y = np.transpose(np.matrix([*triangle2]))
    # Add ones to the bottom of x and y
    x = np.vstack((x, [1, 1, 1]))
    y = np.vstack((y, [1, 1, 1]))
    xInv = np.linalg.pinv(x)
    return np.dot(y, xInv)

srcImg = face2
srcRows, srcCols, srcDepth = face2.shape
destImg = np.zeros(face1.shape, dtype=np.uint8)

for triangle1, triangle2 in zip(vertices1, vertices2):
    transMat = getAffineMat(triangle1, triangle2)
    r, c = list(map(list, zip(*triangle2)))
    rr, cc = polygon(r, c)
    for row, col in zip(rr, cc):
        transformed = np.dot(transMat, [col, row, 1])
        srcX, srcY, *_ = np.array(transformed.T)
        # Check if the pixel is within the image boundaries
        if isWithinBounds(srcCols, srcRows, col, row):
            # Interpolate the color of the pixel from the four nearest pixels
            color = bilinearInterpolation(srcImg, srcX, srcY)
            # Set the color of the current pixel in the destination image
            destImg[row, col] = color
I wish to implement this without getAffineTransform or warpAffine. Any help would be much appreciated!
Sources:
Transfer coordinates from one triangle to another triangle
https://devendrapratapyadav.github.io/FaceMorphing/
But you don't have corresponding triangles! This looks like 2 separate Delaunay triangulations. Maybe made on matching points, but still no matching triangles. You can't do two Delaunay triangulations, one in each image, and expect them to match. You need 1 Delaunay triangulation, and then use the same edges on both sides (so, for at least one side, the triangulation will not be exactly Delaunay).
Look for example at the top-right corner of your images.
On one side you have 4 outgoing edges (counting those we can't see because they are confused with the image border, but they have to be there); on the other you have 6 outgoing edges.
The number of edges connected to two matching vertices is supposed to be a constant (otherwise, how could you warp anything?).
So, clearly, I think (but you did not provide any code for that part, since you postulate that the triangulation is correct, when I am pretty sure it is the triangulation that is not; so I can only surmise) that you got two sets of matching points, then performed 2 Delaunay triangulations on those 2 sets of points, expecting to be able to match triangles, even though they are not at all the same triangles.
Edit: how to transform
(in reply to your question in comment)
It's the same triangulation. You have a list of points p₁, p₂, p₃, ..., pₙ in the first image, and a matching list of points q₁, q₂, q₃, ..., qₙ in the second image. You perform a triangulation in the 1st image, whose output should be a list of triplets of indices, such as (1,3,4), (1,2,3), ..., meaning that the optimal triangulation in the 1st image is the one made of triangles (p₁,p₃,p₄), (p₁,p₂,p₃), ...
And in the second image, you use triangulation (q₁,q₃,q₄), (q₁, q₂, q₃), ...
Even if it is not the optimal triangulation of q₁,q₂,...,qₙ (the one that maximizes the smallest angle), it should not be far off, provided q₁,q₂,...,qₙ are not that different from p₁,p₂,...,pₙ (which they are not supposed to be, if you matched both images consistently).
So, the transformation matrices are the ones transforming coordinates between matching triangles (there is one transformation for each pair of matching triangles).
To decide which point (x',y') of the second image matches point (x,y) of the first image, you need to:
identify in which triangle (i,j,k) (that is, (pᵢ,pⱼ,pₖ)) the point (x,y) lies,
find the barycentric coordinates of (x,y) inside this triangle: (x,y)=αpᵢ+βpⱼ+γpₖ,
assume that (x',y') has the same barycentric coordinates inside the matching triangle, that is (x',y')=αqᵢ+βqⱼ+γqₖ.
The transformation matrix (for triangle (i,j,k)) is then the one mapping (x,y) to (x',y'), as sketched below.
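Here is a rough sketch of that recipe with SciPy (placeholder landmark arrays rather than real data; find_simplex and transform are attributes of SciPy's Delaunay object):
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points1 = rng.uniform(0, 400, size=(20, 2))             # placeholder landmarks in image 1
points2 = points1 + rng.normal(0, 5, size=(20, 2))      # matching (slightly moved) landmarks in image 2

tri = Delaunay(points1)            # ONE triangulation, computed on image 1 only
simplices = tri.simplices          # index triplets (i, j, k), reused for both images
triangles2 = points2[simplices]    # the SAME triplets applied to the second point set

# Map one point of image 1 to image 2 via barycentric coordinates.
x, y = points1[simplices[0]].mean(axis=0)       # example point: centroid of the first triangle
t = int(tri.find_simplex([[x, y]])[0])          # which triangle contains (x, y)
T = tri.transform[t]                            # affine map to barycentric coordinates
alpha, beta = T[:2] @ (np.array([x, y]) - T[2])
gamma = 1.0 - alpha - beta
dst = alpha * triangles2[t, 0] + beta * triangles2[t, 1] + gamma * triangles2[t, 2]
print(dst)                                      # the matching (x', y') in image 2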
I am trying to associate RGB values to pixel coordinates after having done a perspective projection. The equation for the perspective projection is:
x = fx * X / Z + cx,  y = fy * Y / Z + cy
where x, y are the pixel locations of the point, X, Y, and Z are the locations of points in the camera frame, and the other parameters denote the intrinsic camera parameters. Given a point cloud containing the point locations and RGB values, I would like to associate RGB values to pixel locations according to the perspective projection.
The following code should create the correct image:
import matplotlib.pyplot as plt
import open3d as o3d
import numpy as np

cx = 325.5
cy = 253.5
fx = 518.0
fy = 519.0
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])

pcd = o3d.io.read_point_cloud('freiburg.pcd', remove_nan_points=True)
points = np.array(pcd.points)
colors = np.array(pcd.colors)

projection = (K @ points.T).T
normalization = projection / projection[:, [2]]  # last element must be 1
pixel_coordinates = normalization.astype(int)

img = np.zeros((480, 640, 3))

# how can I fill the img appropriately? The matrix pixel_coordinates should
# inform about where to place the color intensities.
for position, intensity in zip(pixel_coordinates, colors):
    row, column = position[0], position[1]
    #img[row, column, :] = intensity  # returns with error
    img[column, row, :] = intensity   # gives a strange picture.
The point cloud can be read here. I expect to be able to associate the rgb values in the last loop:
for position, intensity in zip(pixel_coordinates, colors):
    row, column = position[0], position[1]
    #img[row, column, :] = intensity  # returns with error
    img[column, row, :] = intensity   # gives a strange picture.
Strangely, if the second-to-last line is uncommented, the program raises an IndexError while attempting to write an RGB value outside the range of available columns. The last line in the loop runs without problems, however. The generated picture and the correct picture can be seen below:
How can I modify the code above to obtain the correct image?
A couple of issues:
You are ignoring the nonlinear distortion in the projection. Are the images you are comparing to undistorted? If they are, are you sure your projection matrix K is the one associated with the undistorted image?
Projecting the 3D points will inevitably produce a point cloud on the image plane, not a continuous image. To produce a somewhat natural image you likely need to interpolate nearby samples in the 2D point cloud, and your choice of interpolation filter determines the quality of the result. For example, you could first make an image of RGB buckets and a similar image of weights, project the 3D points, and place their RGB values in the closest bucket (the one obtained by rounding the projected x, y coordinates), with a weight equal to the reciprocal of the distance of the projection from the bucket's center (i.e. the reciprocal of the Euclidean norm of the rounding residuals). You then compute the output pixel values as weighted averages at each bucket and, if there are any unfilled buckets, fill them by (say) bilinear interpolation of the filled neighbours. That last step will fill 1-pixel holes surrounded by already filled values; for larger holes you will need to choose some kind of infill procedure.
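A rough sketch of that bucket/weight accumulation, in terms of the question's variables (I'm assuming pixel_xy holds the projected (x, y) coordinates, e.g. normalization[:, :2] from the code above, and colors the per-point RGB values; this is only an outline under those assumptions, not a tested implementation):
import numpy as np

H, W = 480, 640
pixel_xy = normalization[:, :2]                  # projected (x, y) coordinates from the question's code
rgb_sum = np.zeros((H, W, 3))                    # bucket image of accumulated RGB
weight_sum = np.zeros((H, W))                    # matching image of accumulated weights

cols = np.round(pixel_xy[:, 0]).astype(int)      # nearest bucket: x -> column
rows = np.round(pixel_xy[:, 1]).astype(int)      # nearest bucket: y -> row
inside = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)

residual = pixel_xy[inside] - np.stack([cols[inside], rows[inside]], axis=1)
weights = 1.0 / (np.linalg.norm(residual, axis=1) + 1e-6)   # reciprocal of the rounding residual norm

np.add.at(rgb_sum, (rows[inside], cols[inside]), colors[inside] * weights[:, None])
np.add.at(weight_sum, (rows[inside], cols[inside]), weights)

filled = weight_sum > 0
img = np.zeros((H, W, 3))
img[filled] = rgb_sum[filled] / weight_sum[filled][:, None]
# pixels with weight_sum == 0 still need infilling (e.g. interpolation of filled neighbours)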
I'm using skimage.measure.marching_cubes to extract a surface, defined as faces and vertices. marching_cubes also outputs values for each face.
How do I "smooth" these values (the actual smoothing could be a low-pass filter, median filter etc)? I thought that one way to achieve this would be to project, or to represent this surface in 2D, and then apply standard filters, but I can't think of how to do this from a list of faces and vertices.
The reason for this "smoothing" is because the values are not informative at the scale of a single face of the surface, but over larger areas of the surface represented by many faces.
Thanks in advance!
I eventually found a way to do this, based on MATLAB code from this paper:
Welf et al. "Quantitative Multiscale Cell Imaging in Controlled 3D Microenvironments" in Developmental Cell, 2016, Vol 36, Issue 4, p462-475
from scipy import spatial
import numpy as np

def median_filter_surface(faces, verts, measure, radius, p_norm=2):
    # INPUT:
    #   faces:   triangular surface faces, each defined by 3 vertex indices
    #   verts:   the above vertices, defined by x, y, z coordinates
    #   measure: the value related to each face that needs to be filtered
    #   radius:  the radius for median filtering (larger = more filtering)
    #   p_norm:  distance metric for the radius, default 2 (Euclidean)
    # OUTPUT:
    #   measure_med_filt: the "measure" after filtering
    num_faces = len(faces)
    face_centres = np.zeros((num_faces, 3))
    # get face centre positions in 3D space (from vertex coordinates)
    for face in range(num_faces):
        face_centres[face, :] = np.mean(verts[faces[face, :], :], 0)
    # for each face centre, find all other face centres within the radius
    tree = spatial.KDTree(face_centres)
    faces_in_radius = tree.query_ball_point(face_centres, radius, p_norm)
    # median-filter the measure over each neighbourhood
    measure_med_filt = np.zeros(num_faces)
    for face in range(num_faces):
        measure_med_filt[face] = np.median(measure[faces_in_radius[face]])
    return measure_med_filt
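For reference, a minimal usage sketch might look like this (volume is assumed to be the 3-D array you ran marching cubes on, level your iso-value, and radius=10.0 an arbitrary choice in the same units as the vertex coordinates):
import numpy as np
from skimage import measure

verts, faces, normals, values = measure.marching_cubes(volume, level)
values_filtered = median_filter_surface(faces, verts, values, radius=10.0)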
I am trying to circular mask an image in Python. I found some example code on the web, but I'm not sure how to change the maths to get my circle in the correct place.
I have an image image_data of type numpy.ndarray with shape (3725, 4797, 3):
total_rows, total_cols, total_layers = image_data.shape
X, Y = np.ogrid[:total_rows, :total_cols]
center_row, center_col = total_rows/2, total_cols/2
dist_from_center = (X - total_rows)**2 + (Y - total_cols)**2
radius = (total_rows/2)**2
circular_mask = (dist_from_center > radius)
I see that this code applies Euclidean distance to calculate dist_from_center, but I don't understand the X - total_rows and Y - total_cols part. This produces a mask that is a quarter of a circle, centered on the top-left of the image.
What role are X and Y playing on the circle? And how can I modify this code to produce a mask that is centered somewhere else in the image instead?
The algorithm you got online is partly wrong, at least for your purposes. If we have the following image, we want it masked like so:
The easiest way to create a mask like this is how your algorithm goes about it, but it's not presented in the way that you want, nor does it give you the ability to modify it in an easy way. What we need to do is look at the coordinates for each pixel in the image, and get a true/false value for whether or not that pixel is within the radius. For example, here's a zoomed in picture showing the circle radius and the pixels that were strictly within that radius:
Now, to figure out which pixels lie inside the circle, we'll need the indices of each pixel in the image. The function np.ogrid() gives two vectors, each containing the pixel locations (or indices): there's a column vector for the column indices and a row vector for the row indices:
>>> np.ogrid[:4,:5]
[array([[0],
[1],
[2],
[3]]), array([[0, 1, 2, 3, 4]])]
This format is useful for broadcasting so that if we use them in certain functions, it will actually create a grid of all the indices instead of just those two vectors. We can thus use np.ogrid() to create the indices (or pixel coordinates) of the image, and then check each pixel coordinate to see if it's inside or outside the circle. To tell whether a pixel is inside the circle, we can simply find the Euclidean distance from the center to every pixel location; if that distance is less than the circle radius, we mark that pixel as included in the mask, and if it's greater, we exclude it from the mask.
Now we've got everything we need to make a function that creates this mask. Furthermore we'll add a little bit of nice functionality to it; we can send in the center and the radius, or have it automatically calculate them.
def create_circular_mask(h, w, center=None, radius=None):
    if center is None:  # use the middle of the image
        center = (int(w/2), int(h/2))
    if radius is None:  # use the smallest distance between the center and image walls
        radius = min(center[0], center[1], w-center[0], h-center[1])

    Y, X = np.ogrid[:h, :w]
    dist_from_center = np.sqrt((X - center[0])**2 + (Y - center[1])**2)

    mask = dist_from_center <= radius
    return mask
In this case, dist_from_center is a matrix the same height and width that is specified. It broadcasts the column and row index vectors into a matrix, where the value at each location is the distance from the center. If we were to visualize this matrix as an image (scaling it into the proper range), then it would be a gradient radiating from the center we specify:
So when we compare it to radius, it's identical to thresholding this gradient image.
Note that the final mask is a matrix of booleans; True if that location is within the radius from the specified center, False otherwise. So we can then use this mask as an indicator for a region of pixels we care about, or we can take the opposite of that boolean (~ in numpy) to select the pixels outside that region. So using this function to color pixels outside the circle black, like I did up at the top of this post, is as simple as:
h, w = img.shape[:2]
mask = create_circular_mask(h, w)
masked_img = img.copy()
masked_img[~mask] = 0
But if we wanted to create a circular mask at a different point than the center, we could specify it (note that the function is expecting the center coordinates in x, y order, not the indexing row, col = y, x order):
center = (int(w/4), int(h/4))
mask = create_circular_mask(h, w, center=center)
Which, since we're not giving a radius, would give us the largest radius so that the circle would still fit in the image bounds:
Or we could let it calculate the center but use a specified radius:
radius = h/4
mask = create_circular_mask(h, w, radius=radius)
Giving us a centered circle with a radius that doesn't extend exactly to the smallest dimension:
And finally, we could specify any radius and center we wanted, including a radius that extends outside the image bounds (and the center can even be outside the image bounds!):
center = (int(w/4), int(h/4))
radius = h/2
mask = create_circular_mask(h, w, center=center, radius=radius)
What the algorithm you found online does is equivalent to setting the center to (0, 0) and setting the radius to h:
mask = create_circular_mask(h, w, center=(0, 0), radius=h)
I'd like to offer a way to do this that doesn't involve the np.ogrid() function. I'll crop an image called "robot.jpg", which is 491 x 491 pixels. For readability I'm not going to define as many variables as I would in a real program:
Import libraries:
import matplotlib.pyplot as plt
from matplotlib import image
import numpy as np
Import the image, which I'll call "z". This is a color image so I'm also pulling out just a single color channel. Following that, I'll display it:
z = image.imread('robot.jpg')
z = z[:,:,1]
zimg = plt.imshow(z,cmap="gray")
plt.show()
robot.jpg as displayed by matplotlib.pyplot
To wind up with a numpy array (image matrix) with a circle in it to use as a mask, I'm going to start with this:
x = np.linspace(-10, 10, 491)
y = np.linspace(-10, 10, 491)
x, y = np.meshgrid(x, y)
x_0 = -3
y_0 = -6
mask = np.sqrt((x-x_0)**2+(y-y_0)**2)
Note the equation of a circle on that last line, where x_0 and y_0 are defining the center point of the circle in a grid which is 491 elements tall and wide. Because I defined the grid to go from -10 to 10 in both x and y, it is within that system of units that x_0 and y_0 set the center point of the circle with respect to the center of the image.
To see what that produces I run:
maskimg = plt.imshow(mask,cmap="gray")
plt.show()
Our "proto" masking circle
To turn that into an actual binary-valued mask, I'm just going to take every pixel below a certain value and set it to 0, and take every pixel above a certain value and set it to 256. The "certain value" will determine the radius of the circle in the same units defined above, so I'll call that 'r'. Here I'll set 'r' to something and then loop through every pixel in the mask to determine if it should be "on" or "off":
r = 7
for x in range(0, 491):
    for y in range(0, 491):
        if mask[x, y] < r:
            mask[x, y] = 0
        elif mask[x, y] >= r:
            mask[x, y] = 256

maskimg = plt.imshow(mask, cmap="gray")
plt.show()
The mask
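As a side note, that double loop can be collapsed into a single vectorized NumPy expression, which is much faster on a 491 x 491 array (same thresholding, just done in one step):
r = 7
mask = np.where(mask < r, 0, 256)   # 0 inside the circle, 256 outside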
Now I'll just multiply the mask by the image element-wise, then display the result:
z_masked = np.multiply(z,mask)
zimg_masked = plt.imshow(z_masked,cmap="gray")
plt.show()
To invert the mask I can just swap the 0 and the 256 in the thresholding loop above, and if I do that I get:
Masked version of robot.jpg
The other answers work, but they are slow, so I will propose an answer using skimage.draw.disk. Using this is faster and I find it simpler to use. Simply specify the center of the circle and the radius, then use the output to create a mask:
import numpy as np
from skimage.draw import disk

mask = np.zeros((10, 10), dtype=np.uint8)
row = 4
col = 5
radius = 5
rr, cc = disk((row, col), radius, shape=mask.shape)  # pass shape so the indices stay inside the mask
mask[rr, cc] = 1
I am using the OpenCV HoughCircles method in Python as follows:
circles = cv2.HoughCircles(img, cv.CV_HOUGH_GRADIENT, 1, 20,
                           param1=50, param2=30, minRadius=0, maxRadius=0)
This seems to work quite well. However, one thing I noticed is that it detects circles which can extend outside of the image boundaries. Does anyone know how I can filter these results out?
Think of each circle as being bounded inside a square of dimensions 2r x 2r where r is the radius of the circle. Also, the centre of this box is located at (x,y) which also corresponds to where the centre of the circle is located in the image. To see if the circle is within the image boundaries, you simply need to make sure that the box that contains the circle does not go outside of the image. Mathematically speaking, you would need to ensure that:
r <= x <= cols-1-r
r <= y <= rows-1-r # Assuming 0-indexing
rows and cols are the rows and columns of your image. All you really have to do now is cycle through every circle in the detected result and filter out those that go outside of the image boundaries, by checking whether the centre of each circle satisfies the two inequalities specified above. If a circle satisfies the two inequalities, you save it; any circle that doesn't is left out of the final result.
To put this logic to code, do something like this:
import cv  # Load in relevant packages
import cv2
import numpy as np

img = cv2.imread(..., 0)  # Load in image here - Ensure 8-bit grayscale
final_circles = []  # Stores the final circles that don't go out of bounds
circles = cv2.HoughCircles(img, cv.CV_HOUGH_GRADIENT, 1, 20, param1=50, param2=30, minRadius=0, maxRadius=0)  # Your code
rows = img.shape[0]  # Obtain rows and columns
cols = img.shape[1]

circles = np.round(circles[0, :]).astype("int")  # Convert to integer
for (x, y, r) in circles:  # For each circle we have detected...
    if (r <= x <= cols-1-r) and (r <= y <= rows-1-r):  # Check if circle is within boundary
        final_circles.append([x, y, r])  # If it is, add this to our final list

final_circles = np.asarray(final_circles).astype("int")  # Convert to numpy array for compatibility
The peculiar thing about cv2.HoughCircles is that it returns a 3D matrix where the first dimension is a singleton dimension. To eliminate this singleton dimension, I did circles[0, :] which will result in a 2D matrix. Each row of this new 2D matrix contains a tuple of (x, y, r) and characterizes where a circle is located in your image as well as its radius. I also converted the centres and radii to integers so that if you decide to draw them later on, you will be able to do it with cv2.circle.
You could add a function which takes the center and the radius of each circle, adds and subtracts the radius from the center coordinates, and checks whether the result falls outside the boundaries of your image.
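For example, such a check could be a small helper like the sketch below (circle_in_bounds is just an illustrative name; circles is assumed to be the (N, 3) array of (x, y, r) values from HoughCircles after dropping the singleton dimension, as in the answer above):
import numpy as np

def circle_in_bounds(x, y, r, img_shape):
    # True if the whole circle (x, y, r) lies inside the image
    rows, cols = img_shape[:2]
    return (x - r >= 0) and (x + r <= cols - 1) and (y - r >= 0) and (y + r <= rows - 1)

final_circles = [c for c in circles if circle_in_bounds(c[0], c[1], c[2], img.shape)]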