I have a batch of depth images, shape -> [B, 1, H, W]. For each pixel in each image of the batch I need to perform:
X = d * Kinverse @ [u, v, 1]^T   # so X is in R^3
where d is a float tensor in [0, 1] representing the depth at pixel (u, v), Kinverse is a constant 3x3 matrix, and u, v refer to the pixel column and row respectively.
Is there some way I can vectorize the operation to obtain X(u+1,v), X(u,v) and X(u,v+1) for all the images in the batch?
I eventually need to take this cross product:
{X(u+1,v) - X(u,v)} x {X(u, v+1) - X(u,v)}
Thanks for the help!
You can use torch.meshgrid to produce the u and v tensors. Once you have them, you can use torch.einsum to do the batched matrix multiplication with Kinverse.
Finally, you can use torch.cross to compute the cross product:
# meshgrid over (H, W): the first output varies along rows (the v index in the
# question's convention), the second along columns (u):
v, u = torch.meshgrid(*[torch.arange(s_, dtype=d.dtype, device=d.device) for s_ in d.shape[2:]])
# stack into a single 1x3xHxW tensor holding [u, v, 1] per pixel:
uv = torch.cat((u[None, None, ...], v[None, None, ...], torch.ones_like(u)[None, None, ...]), dim=1)
# compute X
X = d * torch.einsum('ij,bjhw->bihw', Kinverse, uv)
# the cross product {X(u+1,v) - X(u,v)} x {X(u,v+1) - X(u,v)}:
# u+1 shifts along the last (column) axis, v+1 along the row axis
out = torch.cross(X[..., :-1, 1:] - X[..., :-1, :-1],
                  X[..., 1:, :-1] - X[..., :-1, :-1], dim=1)
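As a quick sanity check, here is the whole recipe run end to end on dummy values (d and Kinverse are the names from the question; the sizes are arbitrary):

import torch

B, H, W = 2, 4, 5
d = torch.rand(B, 1, H, W)   # dummy depth batch in [0, 1]
Kinverse = torch.eye(3)      # dummy inverse intrinsics

v, u = torch.meshgrid(*[torch.arange(s_, dtype=d.dtype, device=d.device) for s_ in d.shape[2:]])
uv = torch.cat((u[None, None, ...], v[None, None, ...], torch.ones_like(u)[None, None, ...]), dim=1)
X = d * torch.einsum('ij,bjhw->bihw', Kinverse, uv)
out = torch.cross(X[..., :-1, 1:] - X[..., :-1, :-1],
                  X[..., 1:, :-1] - X[..., :-1, :-1], dim=1)

print(X.shape)    # torch.Size([2, 3, 4, 5])
print(out.shape)  # torch.Size([2, 3, 3, 4]) -- one normal per top-left-anchored pixel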
I am trying to find the closest matches between two batches of pytorch tensors. Assuming I have a batch of mxn tensors with batch size b1 and a batch of mxn tensors with batch size b2, I would like to find:
The distance between each mxn tensor in batch b1 and each mxn tensor in batch b2. This distance matrix would be of size b1xb2.
For each tensor in b1, I would like the batch index of the closest (by distance) tensor in b2.
I define distance as the sum of the elementwise squared Euclidean distances between corresponding elements in each tensor. For example, if the first tensor in b1 (i.e. batch index = 0) is [[a, b, c], [d, e, f], [g, h, i], [j, k, l]] and the first tensor in b2 (i.e. batch index = 0) is [[z, y, x], [w, v, u], [t, s, r], [q, p, o]], then the distance between these two tensors is: (a-z)^2 + (b-y)^2 + (c-x)^2 + (d-w)^2 + (e-v)^2 + (f-u)^2 + ... + (l-o)^2
Here's what I have tried:
a = torch.rand((3, 3, 4))
b = torch.rand((5, 3, 4))
flat_a = torch.flatten(a, start_dim=1)
flat_b = torch.flatten(b, start_dim=1)
torch.cdist(flat_a, flat_b)
Which gives me a 3x5 matrix that I hope is correct. And I would now like to return the batch indices of the 3x4 tensors in b that are the closest matches to the tensors in a.
Thanks
Let's call the distance tensor dist. All you have to do is:
b_idx = torch.argmin(dist, dim=1)  # returns tensor of shape [3]
which returns the indices into b along dimension 0.
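Putting both pieces together, a runnable sketch. Note that torch.cdist returns the Euclidean distance, i.e. the square root of the sum of squared differences you defined, but since the square root is monotonic the argmin indices are the same:

import torch

a = torch.rand((3, 3, 4))
b = torch.rand((5, 3, 4))

# flatten each m x n tensor to a vector, then compute pairwise distances
dist = torch.cdist(torch.flatten(a, start_dim=1), torch.flatten(b, start_dim=1))  # [3, 5]

# index into b of the closest tensor, for each tensor in a
b_idx = torch.argmin(dist, dim=1)  # shape [3]

# if you need the distance exactly as defined (sum of squared differences):
sq_dist = dist ** 2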
I have a tensor input of dimensions (B, C, H, W) and I would like to find a correlation matrix of the input. The code I am using is:
def corr(x):
    """
    x: [B, C, H, W]
    """
    # [B, C, H, W] -> [B, C, H * W]
    x = x.view((x.size(0), x.size(1), -1))
    # estimated covariance
    x = x - x.mean(dim=-1, keepdim=True)
    factor = 1 / (x.shape[-1] - 1)
    cov = factor * (x @ x.transpose(-1, -2))
    return torch.div(cov, torch.diagonal(cov, dim1=-2, dim2=-1))
I rechecked and the cov variable inside the function looks correct, but when I try to normalize it to get the correlation, the result's range is very strange: there are values above 1 and below -1, so the solution does not seem right.
Any suggestions on how to solve the problem?
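For reference, dividing by torch.diagonal normalizes by the variances rather than by the products of standard deviations, which is what lets values escape [-1, 1]. A minimal sketch of the usual Pearson normalization, assuming that is what corr is meant to return:

import torch

def corr(x):
    """
    x: [B, C, H, W] -> [B, C, C] Pearson correlation matrices.
    """
    x = x.view((x.size(0), x.size(1), -1))  # [B, C, H * W]
    x = x - x.mean(dim=-1, keepdim=True)
    cov = (x @ x.transpose(-1, -2)) / (x.shape[-1] - 1)
    std = torch.diagonal(cov, dim1=-2, dim2=-1).sqrt()  # [B, C]
    return cov / (std[:, :, None] * std[:, None, :])    # divide by the outer product of stds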
I am trying to implement PCA without any library for image dimension reduction. I tried the code from the O'Reilly Computer Vision book and applied it to a sample Lenna picture:
from PIL import Image
import numpy as np
from skimage import color, io
import matplotlib.pyplot as plt

def pca(X):
    num_data, dim = X.shape
    mean_X = X.mean(axis=0)
    X = X - mean_X
    if dim > num_data:
        # PCA compact trick
        M = np.dot(X, X.T)  # covariance matrix
        e, U = np.linalg.eigh(M)  # calculate eigenvalues and eigenvectors
        tmp = np.dot(X.T, U).T
        V = tmp[::-1]  # reverse since the last eigenvectors are the ones we want
        S = np.sqrt(e)[::-1]  # reverse since the last eigenvalues are in increasing order
        for i in range(V.shape[1]):
            V[:, i] /= S
    else:
        # normal PCA, SVD method
        U, S, V = np.linalg.svd(X)
        V = V[:num_data]  # only makes sense to return the first num_data
    return V, S, mean_X

img = color.rgb2gray(io.imread('D:\lenna.png'))
x, y, z = pca(img)
plt.imshow(x)
But the image plot of the PCA output doesn't look like the original image at all.
As far as I know, PCA reduces the image dimension, but the result should still somehow resemble the original image, just in lower detail. What's wrong with the code?
Well, nothing is wrong per se in your code, but you're not displaying the right thing, if I understand what you actually want to do!
What I would write for your problem is the following:
def pca(X, number_of_pcs):
    num_data, dim = X.shape
    mean_X = X.mean(axis=0)
    X = X - mean_X

    if dim > num_data:
        # PCA compact trick
        M = np.dot(X, X.T)  # covariance matrix
        e, U = np.linalg.eigh(M)  # calculate eigenvalues and eigenvectors
        tmp = np.dot(X.T, U).T
        V = tmp[::-1]  # reverse since the last eigenvectors are the ones we want
        S = np.sqrt(e)[::-1]  # reverse since the last eigenvalues are in increasing order
        for i in range(V.shape[1]):
            V[:, i] /= S
        return V, S, mean_X

    else:
        # normal PCA, SVD method
        U, S, V = np.linalg.svd(X, full_matrices=False)
        # reconstruct the image using U, S and V
        # otherwise you're just outputting the eigenvectors of X.X^T
        V = V.T
        S = np.diag(S)
        X_hat = np.dot(U[:, :number_of_pcs], np.dot(S[:number_of_pcs, :number_of_pcs], V[:, :number_of_pcs].T))
        return X_hat, S, mean_X
The change here lies in the fact that we want to reconstruct the image using a given number of eigenvectors (determined by number_of_pcs).
The thing to remember is that in np.linalg.svd, the columns of U are the eigenvectors of X.X^T.
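A quick way to convince yourself of that claim (a standalone sketch):

import numpy as np

# For X = U S V^T, the columns of U satisfy (X X^T) u_i = s_i^2 u_i:
X = np.random.rand(6, 8)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose((X @ X.T) @ U, U * S**2))  # True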
When doing that, we obtain the following results (displayed here using 1 and 10 principal components):
X_hat, S, mean_X = pca(img, 1)
plt.imshow(X_hat)
X_hat, S, mean_X = pca(img, 10)
plt.imshow(X_hat)
PS: note that the pictures aren't displayed in grayscale because of matplotlib.pyplot's default colormap (pass cmap='gray' to imshow if you want), but this is a very minor issue here.
I'm having a problem with just one point (x, y) of the image: having already calculated the transformation matrix between the two images, I want to compute the corresponding point (x, y) in the second image.
If I have a pixel point [510, 364] from my source image and the transformation matrix that I already calculated:
Matrix Transform: [[ 7.36664511e-01 3.38845039e+01 2.17700574e+03]
[-1.16261372e+00 6.30840432e+01 8.09587058e+03]
[ 4.28933532e-05 8.15551141e-03 1.00000000e+00]]
how can I get my new point, [3730, 7635]?
h, status = cv2.findHomography(arraypoints_fire, arraypoints_vertical)
warped_image = cv2.warpPerspective(fire_image_open, h, (vertical_image_open.shape[1], vertical_image_open.shape[0]))

cv2.namedWindow('Warped Source Image', cv2.WINDOW_NORMAL)
cv2.imshow("Warped Source Image", warped_image)

cv2.namedWindow('Overlay', cv2.WINDOW_NORMAL)
overlay_image = cv2.addWeighted(vertical_image_open, 0.3, warped_image, 0.8, 0)
cv2.imshow('Overlay', overlay_image)
I've met the same problem and found an answer here, but in C++. According to the docs, OpenCV's warpPerspective uses this formula:

dst(x, y) = src( (M11*x + M12*y + M13) / (M31*x + M32*y + M33), (M21*x + M22*y + M23) / (M31*x + M32*y + M33) )

where
src - input image.
dst - output image that has the size dsize and the same type as src.
M - 3×3 transformation matrix (inverted).
You can use it directly on the point:
# M - transform matrix, e.g. from cv2.findHomography or cv2.getPerspectiveTransform
def warp_point(x: int, y: int) -> tuple[int, int]:
d = M[2, 0] * x + M[2, 1] * y + M[2, 2]
return (
int((M[0, 0] * x + M[0, 1] * y + M[0, 2]) / d), # x
int((M[1, 0] * x + M[1, 1] * y + M[1, 2]) / d), # y
)
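For example, with the point from the question (and note that OpenCV can do the same computation for you via cv2.perspectiveTransform, which expects an array of float32 points with shape (N, 1, 2)):

import numpy as np
import cv2

print(warp_point(510, 364))

# equivalent, using OpenCV directly with the same matrix M:
pts = np.array([[[510, 364]]], dtype=np.float32)
print(cv2.perspectiveTransform(pts, M))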
UPD:
I've found one more answer, but in Python :D
It seems I forgot brackets in the first part; fixed.
I'm trying to implement the loss function of the classic Image Colorization paper by Levin et al (2004) in Tensorflow/Keras:
This is the weights equation (correlation between intensities):

w = 1 + (1 / σ²) · (x − μ) · (y − μ)

where y is every neighboring pixel of x in a 3x3 window, w is the weight for each of these pixels, and μ and σ² are the mean and variance of the intensities in the window around x.
The weights require computing the mean and variance for the neighborhood of every pixel.
I couldn't find a function that would allow me to write this loss function in a symbolic way, and I'm thinking I should write it in a loop where I calculate the w for each window.
How can I write this loss function in TensorFlow, either in a symbolic way or with loops?
Thanks so much.
EDIT: Here's the code I've come up with for calculating the weights in NumPy:
import cv2
import numpy as np
im = cv2.resize(cv2.imread('./Image.jpg', 0), (256, 256)) / np.float32(255.0)
M = 3
N = 3
# Split the image into 3x3 windows
windows = [im[x:x + M, y:y + N] for x in range(0, im.shape[0], M) for y in range(0, im.shape[1], N)]
# Calculate the correlation for each window
weights = [1 + np.corrcoef(tile) for tile in windows]
I think this code computes the value in your formula:
import tensorflow as tf
from itertools import product
SIGMA = 1.0
dtype = tf.float32
# Input images batch
img = tf.placeholder(dtype, [None, None, None])
img_shape = tf.shape(img)
img_height = img_shape[1]
img_width = img_shape[2]
# Compute 3 x 3 block means
mean_filter = tf.ones((3, 3), dtype) / 9
img_mean = tf.nn.conv2d(img[:, :, :, tf.newaxis],
mean_filter[:, :, tf.newaxis, tf.newaxis],
[1, 1, 1, 1], 'VALID')[:, :, :, 0]
# Remove 1px border
img_clip = img[:, 1:-1, 1:-1]
# Difference between pixel intensity and its block mean
x_diff = img_clip - img_mean
# Compute neighboring pixel loss contributions
contributions = []
for i, j in product((-1, 0, 1), repeat=2):
    if i == j == 0:
        continue
    # Take "shifted" image (dim 1 is height, dim 2 is width)
    displaced_img = img[:, 1 + i:img_height - 1 + i, 1 + j:img_width - 1 + j]
    # Compute difference with mean of corresponding pixel block
    y_diff = displaced_img - img_mean
    # Weights formula
    weight = 1 + x_diff * y_diff / (SIGMA ** 2)
    # Contribution of this displaced image to the loss of each pixel
    contribution = weight * displaced_img
    contributions.append(contribution)
contributions = tf.add_n(contributions)
# Compute loss value
loss = tf.reduce_sum(tf.squared_difference(img_clip, contributions))
The loss for the pixels along the image border is not computed, since in principle it is not well defined in the formula, although you could make a few changes to take them into account if you want (change the convolution padding to 'SAME', pad where necessary, etc.).
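Since this is TF 1.x graph code, evaluating it requires a session. A usage sketch with a dummy batch, assuming img and loss are the tensors defined above:

import numpy as np

with tf.Session() as sess:
    batch = np.random.rand(4, 64, 64).astype(np.float32)  # [B, H, W] dummy grayscale images
    print(sess.run(loss, feed_dict={img: batch}))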
This is a mean squared error over 3x3 windows, right?
It sounds like a GLCM matrix for texture analysis. Do you want to apply this loss function to every 3x3 window in the image?
I think it is better to first build the function that does this calculation with random weights in NumPy, and afterwards try to build it with TF and optimize it.