I have a set of 68 keypoints (shape [68, 2]) that I am mapping to Gaussian heatmaps. To do this, I have the following function:
def generate_gaussian(t, x, y, sigma=10):
    """
    Generates a 2D Gaussian point at location x,y in tensor t.
    x should be in range (-1, 1).
    sigma is the standard deviation of the generated 2D Gaussian.
    """
    h, w = t.shape
    # Heatmap pixel per output pixel
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)
    tmp_size = sigma * 3
    # Top-left
    x1, y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)
    # Bottom right
    x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t
    size = 2 * tmp_size + 1
    tx = np.arange(0, size, 1, np.float32)
    ty = tx[:, np.newaxis]
    x0 = y0 = size // 2
    # The gaussian is not normalized, we want the center value to equal 1
    g = torch.tensor(np.exp(- ((tx - x0) ** 2 + (ty - y0) ** 2) / (2 * sigma ** 2)))
    # Determine the bounds of the source gaussian
    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
    # Image range
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)
    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
        g[g_y_min:g_y_max, g_x_min:g_x_max]
    return t
def rescale(a, img_size):
    # scale tensor to [-1, 1]
    return 2 * a / img_size[0] - 1
My current code uses a for loop to compute the gaussian heatmap for each of the 68 keypoint coordinates, then stacks the resulting tensors to create a [68, H, W] tensor:
x_k1 = [generate_gaussian(torch.zeros(H, W), x, y) for x, y in rescale(kp1.numpy(), frame.shape)]
x_k1 = torch.stack(x_k1, dim=0)
However, this method is super slow. Is there some way that I can do this without a for loop?
Edit:
I tried @Cris Luengo's proposal to compute a 1D Gaussian:
def generate_gaussian1D(t, x, y, sigma=10):
    h, w = t.shape
    # Heatmap pixel per output pixel
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)
    tmp_size = sigma * 3
    # Top-left
    x1, y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)
    # Bottom right
    x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t
    size = 2 * tmp_size + 1
    tx = np.arange(0, size, 1, np.float32)
    ty = tx[:, np.newaxis]
    x0 = y0 = size // 2
    g = torch.tensor(np.exp(-np.power(tx - mu_x, 2.) / (2 * np.power(sigma, 2.))))
    g = g * g[:, None]
    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)
    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
        g[g_y_min:g_y_max, g_x_min:g_x_max]
    return t
but my output ends up being an incomplete gaussian.
I'm not sure what I'm doing wrong. Any help would be appreciated.
You generate an NxN array g with a Gaussian centered on its center pixel. N is computed such that it extends by 3*sigma from that center pixel. This is the fastest way to build such an array:
tmp_size = sigma * 3
tx = np.arange(1, tmp_size + 1, 1, np.float32)
g = np.exp(-(tx**2) / (2 * sigma**2))
g = np.concatenate((np.flip(g), [1], g))
g = g * g[:, None]
What we're doing here is computing half a 1D Gaussian. We don't even bother computing the value of the Gaussian for the middle pixel, which we know will be 1. We then build the full 1D Gaussian by flipping our half-Gaussian and concatenating. Finally, the 2D Gaussian is built by the outer product of the 1D Gaussian with itself.
We could shave a bit of extra time by building a quarter of the 2D Gaussian, then concatenating four rotated copies of it. But the difference in computational cost is not very large, and this is much simpler. Note that np.exp is by far the most expensive operation here, so just by minimizing how often we call it we significantly reduce the computational cost.
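For illustration, a sketch of what that quarter-based construction could look like (it produces the same array g as above; np.exp is still called only tmp_size times, but the outer product now touches only about a quarter of the elements):

tx = np.arange(1, tmp_size + 1, 1, np.float32)
g1 = np.concatenate(([1], np.exp(-(tx**2) / (2 * sigma**2))))  # half 1D Gaussian, center value first
q = g1 * g1[:, None]                                           # quarter of the 2D Gaussian, center at q[0, 0]
half = np.concatenate((np.flip(q[1:, :], axis=0), q), axis=0)  # mirror the rows (skip the center row)
g = np.concatenate((np.flip(half[:, 1:], axis=1), half), axis=1)  # mirror the columns (skip the center column)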
However, the best way to speed up the complete code is to compute the array g only once, rather than anew for each key point. Note how your sigma doesn't change, so all the arrays g that are computed are identical. If you compute it only once, it no longer matters which method you use to compute it, since this will be a minimal portion of the total program anyway.
You could, for example, have a global variable _gaussian to hold your array, and have your function compute it only the first time it is called. Or you could separate your function into two functions, one that constructs this array, and one that copies it into an image, and call them as follows:
g = create_gaussian(sigma=3)
x_k1 = [
    copy_gaussian(torch.zeros(H, W), x, y, g)
    for x, y in rescale(kp1.numpy(), frame.shape)
]
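For concreteness, a minimal sketch of what those two functions could look like, reusing the construction above and the cropping logic from the original function (the window size is derived from g itself so the two functions stay consistent regardless of the sigma used):

def create_gaussian(sigma=10):
    tmp_size = sigma * 3
    tx = np.arange(1, tmp_size + 1, 1, np.float32)
    g = np.exp(-(tx**2) / (2 * sigma**2))
    g = np.concatenate((np.flip(g), [1], g))
    return torch.from_numpy(g * g[:, None])

def copy_gaussian(t, x, y, g):
    h, w = t.shape
    tmp_size = g.shape[0] // 2       # g has size 2*tmp_size + 1
    mu_x = int(0.5 * (x + 1.) * w)   # map (-1, 1) to pixel coordinates
    mu_y = int(0.5 * (y + 1.) * h)
    x1, y1 = mu_x - tmp_size, mu_y - tmp_size          # top-left
    x2, y2 = mu_x + tmp_size + 1, mu_y + tmp_size + 1  # bottom-right
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t  # the Gaussian falls entirely outside the image
    # Clip the source (Gaussian) and destination (image) windows as before.
    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)
    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
        g[g_y_min:g_y_max, g_x_min:g_x_max]
    return t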
On the other hand, you're likely best off using existing functionality. For example, DIPlib has a function dip.DrawBandlimitedPoint() [disclosure: I'm an author] that adds a Gaussian blob to an image. Likely you'll find similar functions in other libraries.
I want to draw a line on an image. I only have the angle and the end point of the line. How can I do this with Python?
I thought it would be easy: identify the vertical line passing through the given point and plot the line according to the angle. The line should end at the given point.
I tried it with this code, but it didn't work.
import math
import cv2

def get_coords(x, y, angle, imwidth, imheight):
    #img = cv2.imread('contours_none_image2.jpg', 1)
    x1_length = (x-imwidth) / math.cos(angle)
    y1_length = (y-imheight) / math.sin(angle)
    length = max(abs(x1_length), abs(y1_length))
    endx1 = x + length * math.cos(math.radians(angle))
    endy1 = y + length * math.sin(math.radians(angle))
    x2_length = (x-imwidth) / math.cos(angle+45)
    y2_length = (y-imheight) / math.sin(angle+45)
    length = max(abs(x2_length), abs(y2_length))
    endx2 = x + length * math.cos(math.radians(angle+45))
    endy2 = y + length * math.sin(math.radians(angle+45))
    cv2.line(img, (int(endx1),int(endy1)), (int(endx2),int(endy2)), (0, 255, 255), 3)
    cv2.imshow("contours_none_image2.jpg", img)
    #cv2.imshow("contours_none_image2.jpg", result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    return endx1, endy1, endx2, endy2
An interesting way to find the intersection point between the Y axis and the line is to use three cross products in homogeneous coordinates.
Ways for finding lines intersections are described in Wikipedia.
The cross products solution using homogeneous coordinates is described here.
Start by finding a very "far" origin point (x0, y0), outside the image:
length = cv2.norm(np.array([imwidth, imheight])) # Apply maximum possible length: length = sqrt(imwidth**2 + imheight**2)
x0 = x - length * math.cos(math.radians(angle))
y0 = y + length * math.sin(math.radians(angle))  # Reverse signs because the y axis in the image goes down
Finding intersection with the Y axis:
The Y axis may be described as a line from (0,0) to (0, imheight-1).
We may find the line representation in homogeneous coordinates using cross product:
p0 = np.array([0, 0, 1])
p1 = np.array([0, imheight-1, 1])
l0 = np.cross(p0, p1) # [-107, 0, 0]
In the same way we may find the representation of the line from (x0, y0) to (x, y):
p0 = np.array([x0, y0, 1])
p1 = np.array([x, y, 1])
l1 = np.cross(p0, p1)
Finding the intersection point using cross product between the lines, and "normalizing" the homogeneous coordinate:
p = np.cross(l0, l1)
p = p / p[2]
Code sample:
import math
import cv2
import numpy as np
img = np.zeros((108, 192, 3), np.uint8)
x, y, angle = 150, 20, 80
imheight, imwidth = img.shape[0], img.shape[1]
angle = 90 - angle  # Usually the angle is relative to the horizontal axis - use 90 - angle to swap axes
length = cv2.norm(np.array([imwidth, imheight])) # Apply maximum possible length: length = sqrt(imwidth**2 + imheight**2)
x0 = x - length * math.cos(math.radians(angle))
y0 = y + length * math.sin(math.radians(angle))  # Reverse signs because the y axis in the image goes down
# http://robotics.stanford.edu/~birch/projective/node4.html
# Find lines in homogeneous coordinates (using cross product):
# l0 represents a line of Y axis.
p0 = np.array([0, 0, 1])
p1 = np.array([0, imheight-1, 1])
l0 = np.cross(p0, p1) # [-107, 0, 0]
# l1 represents the line from (x0, y0) to (x, y).
p0 = np.array([x0, y0, 1])
p1 = np.array([x, y, 1])
l1 = np.cross(p0, p1)
# https://en.wikipedia.org/wiki/Line%E2%80%93line_intersection
# Lines intersection in homogeneous coordinates (using cross product):
p = np.cross(l0, l1)
# Convert from homogeneous coordinates to Euclidean coordinates (divide by the last element).
p = p / p[2]
x0, y0 = p[0], p[1]
cv2.line(img, (int(x0),int(y0)), (int(x),int(y)), (0, 255, 255), 3)
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Sample output:
More conventional solution:
We may simply assign x0 = 0, and find length:
x0 = x - length * cos(alpha)
y0 = y + length * sin(alpha)
Assign x0 = 0:
x - length * cos(alpha) = 0
=> x = length * cos(alpha)
=> length = x/cos(alpha)
Code:
length = x / math.cos(math.radians(angle)) # We better verify that math.cos(math.radians(angle)) != 0
x0 = 0
y0 = y + length * math.sin(math.radians(angle))
cv2.line(img, (int(x0),int(y0)), (int(x),int(y)), (255, 0, 0), 3)
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output:
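As an aside, here is a minimal sketch of the guard mentioned in the comment above (the vertical-line fallback is an assumption: when cos(angle) is 0 the line is parallel to the Y axis, so there is no intersection to draw unless x == 0):

c = math.cos(math.radians(angle))
if abs(c) > 1e-9:
    length = x / c
    x0, y0 = 0, y + length * math.sin(math.radians(angle))
    cv2.line(img, (int(x0), int(y0)), (int(x), int(y)), (255, 0, 0), 3)
else:
    # Vertical line: parallel to the Y axis, nothing to draw.
    pass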
I'm interested in plotting a real-valued function f(x,y,z)=a, where (x,y,z) is a 3D point on the sphere and a is a real number. I calculate the Cartesian coordinates of the points of the sphere as follows, but I have no clue on how to visualize the value of f on each of those points.
import plotly.graph_objects as go
import numpy as np
fig = go.Figure(layout=go.Layout(title=go.layout.Title(text=title), hovermode=False))
# Create mesh grid for spherical coordinates
phi, theta = np.mgrid[0.0:np.pi:100j, 0.0:2.0 * np.pi:100j]
# Get Cartesian mesh grid
x = np.sin(phi) * np.cos(theta)
y = np.sin(phi) * np.sin(theta)
z = np.cos(phi)
# Plot sphere surface
fig.add_surface(x=x, y=y, z=z, opacity=0.35)
fig.show()
I would imagine/expect/like a visualization like this
Additionally, I also have the gradient of f calculated in closed-form (i.e., for each (x,y,z) I calculate the 3D-dimensional gradient of f). Is there a way of plotting this vector field, similarly to what is shown in the figure above?
Here's an answer that's far from perfect, but hopefully that's enough for you to build on.
For the sphere itself, I don't know of any "shortcut" to do something like that in plotly, so my approach is simply to manually create a sphere mesh. Generating the vertices is simple, for example like you did - the slightly more tricky part is figuring out the vertex indices for the triangles (which depends on the vertex generation scheme). There are various algorithms to do that smoothly (i.e. generating a sphere with no "tip"), I hacked something crude just for the demonstration. Then we can use the Mesh3d object to display the sphere along with the intensities and your choice of colormap:
import numpy as np
import plotly.graph_objects as go

N = 100  # Sphere resolution (both rings and segments, can be separated to different constants)
theta, z = np.meshgrid(np.linspace(-np.pi, np.pi, N), np.linspace(-1, 1, N))
r = np.sqrt(1 - z ** 2)
x = r * np.cos(theta)
y = r * np.sin(theta)
x = x.ravel()
y = y.ravel()
z = z.ravel()

# Triangle indices
indices = np.arange(N * (N - 1) - 1)
i1 = np.concatenate([indices, (indices // N + 1) * N + (indices + 1) % N])
i2 = np.concatenate([indices + N, indices // N * N + (indices + 1) % N])
i3 = np.concatenate([(indices // N + 1) * N + (indices + 1) % N, indices])

# Point intensity function
def f(x, y, z):
    return (np.cos(x * 2) + np.sin(y ** 2) + np.sin(z) + 3) / 6

fig = go.Figure(data=[
    go.Mesh3d(
        x=x,
        y=y,
        z=z,
        colorbar_title='f(x, y, z)',
        colorscale=[[0, 'gold'],
                    [0.5, 'mediumturquoise'],
                    [1, 'magenta']],
        intensity=f(x, y, z),
        i=i1,
        j=i2,
        k=i3,
        name='y',
        showscale=True
    )
])
fig.show()
This yields the following interactive plot:
To add the vector field you can use the Cone plot; this requires some tinkering because when I simply draw the cones at the same x, y, z position as the sphere, some of the cones are partially or fully occluded by the sphere. So I generate another sphere, with a slightly larger radius, and place the cones there. I also played with some lighting parameters to make it black like in your example. The full code looks like this:
import numpy as np
import plotly.graph_objects as go

N = 100  # Sphere resolution (both rings and segments, can be separated to different constants)
theta, z = np.meshgrid(np.linspace(-np.pi, np.pi, N), np.linspace(-1, 1, N))
r = np.sqrt(1 - z ** 2)
x = r * np.cos(theta)
y = r * np.sin(theta)
x = x.ravel()
y = y.ravel()
z = z.ravel()

# Triangle indices
indices = np.arange(N * (N - 1) - 1)
i1 = np.concatenate([indices, (indices // N + 1) * N + (indices + 1) % N])
i2 = np.concatenate([indices + N, indices // N * N + (indices + 1) % N])
i3 = np.concatenate([(indices // N + 1) * N + (indices + 1) % N, indices])

# Point intensity function
def f(x, y, z):
    return (np.cos(x * 2) + np.sin(y ** 2) + np.sin(z) + 3) / 6

# Vector field function
def grad_f(x, y, z):
    return np.stack([np.cos(3 * y + 5 * x),
                     np.sin(z * y),
                     np.cos(4 * x - 3 * y + z * 7)], axis=1)

# Second sphere for placing cones
N2 = 50    # Smaller resolution (again rings and segments combined)
R2 = 1.05  # Slightly larger radius
theta2, z2 = np.meshgrid(np.linspace(-np.pi, np.pi, N2), np.linspace(-R2, R2, N2))
r2 = np.sqrt(R2 ** 2 - z2 ** 2)
x2 = r2 * np.cos(theta2)
y2 = r2 * np.sin(theta2)
x2 = x2.ravel()
y2 = y2.ravel()
z2 = z2.ravel()
uvw = grad_f(x2, y2, z2)

fig = go.Figure(data=[
    go.Mesh3d(
        x=x,
        y=y,
        z=z,
        colorbar_title='f(x, y, z)',
        colorscale=[[0, 'gold'],
                    [0.5, 'mediumturquoise'],
                    [1, 'magenta']],
        intensity=f(x, y, z),
        i=i1,
        j=i2,
        k=i3,
        name='y',
        showscale=True
    ),
    go.Cone(
        x=x2, y=y2, z=z2, u=uvw[:, 0], v=uvw[:, 1], w=uvw[:, 2],
        sizemode='absolute', sizeref=2, anchor='tail',
        lighting_ambient=0, lighting_diffuse=0, opacity=.2
    )
])
fig.show()
And yields this plot:
Hope this helps. There are a lot of tweaks to the display, and certainly better ways to construct a sphere mesh (e.g. see this article), so there should be a lot of freedom there (albeit at the cost of some work).
Good luck!
I watched some tutorials and tried to create a Perlin noise generator in Python.
It takes a tuple for the number of gradient vectors in the x and y directions and a scale for the distance in pixels between them. It calculates the dot product between each pixel's offset and each of the 4 gradient vectors surrounding it, then interpolates the results bilinearly to get the pixel's value.
Here's the code:
from PIL import Image
import numpy as np

scale = 16
size = np.array([8, 8])
vectors = []
for i in range(size[0]):
    for j in range(size[1]):
        rand = np.random.rand() * 2 * np.pi
        vectors.append(np.array([np.cos(rand), np.sin(rand)]))
interpolated_map = np.zeros(size * scale)

def interpolate(x1, x2, w):
    t = (w % scale) / scale
    return (x2 - x1) * t + x1

def dot_product(a, b):
    return a[0] * b[0] + a[1] * b[1]

for i in range(size[1] * scale):
    for j in range(size[0] * scale):
        dot_products = []
        for m in range(4):
            corner_vector_x = round(i / scale) + (m % 2)
            corner_vector_y = round(j / scale) + int(m / 2)
            x = i - corner_vector_x * scale
            y = j - corner_vector_y * scale
            if corner_vector_x >= size[0]:
                corner_vector_x = 0
            if corner_vector_y >= size[1]:
                corner_vector_y = 0
            corner_vector = vectors[corner_vector_x + corner_vector_y * (size[0])]
            distance_vector = np.array([x, y])
            dot_products.append(dot_product(corner_vector, distance_vector))
        x1 = interpolate(dot_products[0], dot_products[1], i)
        x2 = interpolate(dot_products[2], dot_products[3], i)
        interpolated_map[i][j] = (interpolate(x1, x2, j) / 2 + 1) * 255

img = Image.fromarray(interpolated_map)
img.show()
I'm getting this image:
but I should be getting this:
I don't know what's going wrong, I've tried watching multiple different tutorials, reading a bunch of different articles, but the result is always the same.
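(As an aside: classic Perlin noise uses a smootherstep fade rather than the plain linear blend in interpolate above. Swapping it in looks like the sketch below, though this detail alone may not explain the output shown.)

def fade(t):
    # Perlin's fade curve 6t^5 - 15t^4 + 10t^3: zero first and second
    # derivatives at t = 0 and t = 1, which hides the cell boundaries.
    return t * t * t * (t * (t * 6 - 15) + 10)

def interpolate(x1, x2, w):
    t = fade((w % scale) / scale)
    return (x2 - x1) * t + x1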
Essentially, I have some 3D intensity distribution; in a simplified version of my problem, it is a single Gaussian centered at a point in 3D space, represented by a 3D numpy array, defined as:
I = np.zeros((ny, nx, nz))
tolerance = 1e-4 # minimum value of Gaussian for compact representation
sigma = x[1]-x[0] # Gaussian width
max_dist = sigma*(-2*np.log(tolerance))
di = np.ceil(max_dist/dx) # maximum distance in compact representation, in index format
# Create intensity field/true Gaussian
# this exists separately as its own function synth_I() where [0] is instead for each particle [i]
ix = round((xp[0] - x[0]) / dx) # index of particle center
iy = round((yp[0] - y[0]) / dy)
iz = round((yp[0] - y[0]) / dz)
iix = np.arange(max(0, ix - di), min(nx, ix + di), 1, dtype=int) # grid points with nonzero intensity values
iiy = np.arange(max(0, iy - di), min(ny, iy + di), 1, dtype=int)
iiz = np.arange(max(0, iz - di), min(nz, iz + di), 1, dtype=int)
ddx = dx * iix - xp[0] # distance between particle center and grid point
ddy = dy * iiy - yp[0]
ddz = dz * iiz - zp[0]
gx = np.exp(-1 / (2 * sigma ** 2) * ddx ** 2) # 1D Gaussian
gy = np.exp(-1 / (2 * sigma ** 2) * ddy ** 2)
gz = np.exp(-1 / (2 * sigma ** 2) * ddz ** 2)
gx = gx[np.newaxis,:, np.newaxis]
gy = gy[:,np.newaxis, np.newaxis]
gz = gz[np.newaxis, np.newaxis, :]
I[np.ix_(iiy, iix, iiz)] = I[np.ix_(iiy, iix, iiz)] + gy*gx*gz
The idea is to fit a series of Gaussians, varying their amplitudes, centered at a finite number of grid points (chosen as those with an intensity above a minimum threshold, 0.5 in this case) to some unknown intensity distribution, using a gradient descent algorithm, due to the convexity of the problem but also due to scaling concerns.
We want to minimize

    f(s) = 1/2 * Σ_v ( I(x_v) - Σ_i s_i * G(x_v; x_i) )^2

where x_v ranges over all the grid points, and G(x_v; x_i) is the value at grid point x_v of a Gaussian centered at grid point x_i with amplitude s_i. This has the following gradient:

    ∂f/∂s_i = - Σ_v ( I(x_v) - Σ_j s_j * G(x_v; x_j) ) * G(x_v; x_i)
And this gradient is implemented with the following code:
def diff_and_grad(s):  # ping test this:
    part_params = np.concatenate((xpart, ypart, zpart, s))  # for own code
    # create an intensity field as a combination of Gaussians as above
    synthI = synth_I_field_compact(part_params, nd, sigma, x, y, z)
    Idiff = I - synthI  # difference in measurements
    f = 0.5 * np.sum(np.power(Idiff, 2))  # objective function
    g = np.zeros(Np)  # gradient
    for i in range(0, Np):
        ix = round((xpart[i] - x[0]) / dx)
        iy = round((ypart[i] - y[0]) / dy)
        iz = round((zpart[i] - z[0]) / dz)
        iix = np.arange(max(0, ix - di), min(nx, ix + di), 1, dtype=int)
        iiy = np.arange(max(0, iy - di), min(ny, iy + di), 1, dtype=int)
        iiz = np.arange(max(0, iz - di), min(nz, iz + di), 1, dtype=int)
        ddx = dx * iix - xpart[i]
        ddy = dy * iiy - ypart[i]
        ddz = dz * iiz - zpart[i]
        gx = np.exp(-1 / (2 * sigma ** 2) * ddx ** 2)
        gy = np.exp(-1 / (2 * sigma ** 2) * ddy ** 2)
        gz = np.exp(-1 / (2 * sigma ** 2) * ddz ** 2)
        Id = Idiff[np.ix_(iiy, iix, iiz)]
        g[i] = -Id.dot(gz).dot(gx).dot(gy)  # gradient is -product of local intensity difference with gaussian centered at grid point
    return f, g
However, for an initial estimate of the amplitudes, taken as the values of the intensity measurements at the corresponding grid points, this analytical gradient differs from the one found with a finite-difference scheme, and thus the CG algorithm does not work:
After a couple of days of debugging, I have been unable to find the source of the problem. A similar method of calculating the gradients is used for a different problem with much of the same code, and this implementation also works in 2D without the z-axis contributions. There must be something fundamental I am missing, but I am not sure what.
I have a copy of the full test code at https://pastebin.com/rhs4tasZ for any additional information needed, or for anyone who wants to run this code themselves
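For reference, the finite-difference comparison can be done directly with scipy; a minimal sketch (s0 here stands for the initial amplitude estimate described above):

from scipy.optimize import approx_fprime

# Compare the analytic gradient against a finite-difference estimate at s0.
f0, g_analytic = diff_and_grad(s0)
g_numeric = approx_fprime(s0, lambda s: diff_and_grad(s)[0], 1e-6)
print(np.max(np.abs(g_analytic - g_numeric)))  # large entries point at the bad components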
Fixed the problem, I'll close this question now. I extracted the wrong parameters for z in synth_image
I need to reconstruct a heightfield f(x,y) -> z from a vector field g(x,y) -> (a,b,c), according to a set of given equations, but I have an issue computing the gradient (I am using L_BFGS_B because I think it's the only optimizer that will let me avoid exploding memory).
I am having the following conflict:
Images are 2D arrays, and the gradient is computed along each axis, so it needs two values per pixel
The heightfield has only 1 variable per pixel (the height)
Therefore, after computing the gradient, scipy complains that the function cannot return an array of shape (img_x * img_y, 2) as the gradient: only a 1D array with exactly the same element count as the heightfield array is accepted.
Am I missing something obvious here?
Detailed problem
I define three variables nx, ny, nz:
nx = g(x,y).x
ny = g(x,y).y
nz = g(x,y).z
Now, for each point (x,y) of the grid, I need to compute the height of the point (f(x,y)), according to the following equations:
nz * [f(x+1, y) - f(x,y)] == nx
nz * [f(x, y+1) - f(x,y)] == ny
I have tried expressing this in a loss function:
import numpy as np
from numpy.linalg import norm

class Eval():
    def __init__(self, g):
        '''
        g: (x, y, 3)
        '''
        self.g = g

    def loss(self, x):
        depth = x.reshape(self.g.shape[:2])
        x_roll = np.roll(depth, -1, axis=0)
        y_roll = np.roll(depth, -1, axis=1)
        dx = depth - x_roll
        dy = depth - y_roll
        nx = self.g[:,:,0]
        ny = self.g[:,:,1]
        nz = self.g[:,:,2]
        loss_x = nz * dx - nx
        loss_y = nz * dy - ny
        self.error_loss = np.stack([loss_x, loss_y], axis=-1)
        total_loss = norm(self.error_loss, axis=-1)
        return np.sum(total_loss)

    def grads(self, x):
        x_roll = np.roll(self.error_loss[:,:,0], -1, axis=0)
        y_roll = np.roll(self.error_loss[:,:,1], -1, axis=1)
        dx = self.error_loss[:,:,0] - x_roll
        dy = self.error_loss[:,:,1] - y_roll
        g_xy = np.stack([dx, dy], axis=-1)
        # g_xy has shape (x, y, 2)
        # BUT THIS MUST RETURN (x * y), not (x * y, 2) nor (x * y * 2)
        # WHAT SHOULD I RETURN HERE ? ||g_xy|| ?
And call it like this
import scipy.optimize as optim

vector_field = ...  # shape: 1024, 1024, 3
x0 = np.random.uniform(size=vector_field.shape[:2])
ev = Eval(vector_field)
result = optim.fmin_l_bfgs_b(ev.loss, x0, fprime=ev.grads)
Questions
Is there a way to express those equations so they can be solved another way? In theory, they form an overconstrained linear system that could be solved with a least-squares approach, but I cannot find how to reformulate the equations in the form Ax = b (see the sketch after these questions).
Otherwise, what should I return for my gradient? Or am I even computing it properly?
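For what it's worth, here is a minimal sketch of one way to assemble that Ax = b system with scipy's sparse machinery (an illustration of the least-squares idea only; solve_heightfield is a hypothetical helper, and the axis conventions are an assumption based on the roll() calls in the loss above):

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def solve_heightfield(g):
    # g: (h, w, 3) vector field. Each equation couples a pixel with one neighbor:
    # nz * (f(i+1, j) - f(i, j)) = nx along axis 0, and
    # nz * (f(i, j+1) - f(i, j)) = ny along axis 1.
    # Boundary pixels get no equation; the height is recovered only up to an
    # additive constant (lsqr picks one particular solution).
    h, w = g.shape[:2]
    nx, ny, nz = g[..., 0], g[..., 1], g[..., 2]
    idx = np.arange(h * w).reshape(h, w)

    # Index pairs and coefficients for the axis-0 equations...
    a0, b0 = idx[:-1, :].ravel(), idx[1:, :].ravel()
    w0, r0 = nz[:-1, :].ravel(), nx[:-1, :].ravel()
    # ...and for the axis-1 equations.
    a1, b1 = idx[:, :-1].ravel(), idx[:, 1:].ravel()
    w1, r1 = nz[:, :-1].ravel(), ny[:, :-1].ravel()

    a = np.concatenate([a0, a1])  # column of f(i, j) in each equation
    b = np.concatenate([b0, b1])  # column of the neighbor
    c = np.concatenate([w0, w1])  # nz coefficient of each equation
    n_eq = a.size
    rows = np.repeat(np.arange(n_eq), 2)
    cols = np.stack([b, a], axis=1).ravel()
    vals = np.stack([c, -c], axis=1).ravel()
    A = sparse.coo_matrix((vals, (rows, cols)), shape=(n_eq, h * w)).tocsr()
    rhs = np.concatenate([r0, r1])
    return lsqr(A, rhs)[0].reshape(h, w)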