The goal is to extract a random 2x5 patch from a 5x10 image, independently for every image in a batch. I'm looking for a faster implementation that avoids for loops, but I haven't been able to figure out how to use torch.gather with two index arrays (idx_h and idx_w in the code example).
Naive for loop:
import torch
b = 3 # batch size
h = 5 # height
w = 10 # width
crop_border = (3, 5) # number of pixels (height, width) to crop
x = torch.arange(b * h * w).reshape(b, h, w)
print(x)
dh_ = torch.randint(0, crop_border[0], size=(b,))
dw_ = torch.randint(0, crop_border[1], size=(b,))
_dh = h - (crop_border[0] - dh_)
_dw = w - (crop_border[1] - dw_)
idx_h = torch.stack([torch.arange(d_, _d) for d_, _d in zip(dh_, _dh)])
idx_w = torch.stack([torch.arange(d_, _d) for d_, _d in zip(dw_, _dw)])
print(idx_h, idx_w)
new_shape = (b, idx_h.shape[1], idx_w.shape[1])
cropped_x = torch.empty(new_shape)
for batch in range(b):
    for height in range(idx_h.shape[1]):
        for width in range(idx_w.shape[1]):
            cropped_x[batch, height, width] = x[
                batch, idx_h[batch, height], idx_w[batch, width]
            ]
print(cropped_x)
The index arrays need to be repeated and reshaped to work with the gather operation. The fast_crop code is based on this PyTorch forum discussion: https://discuss.pytorch.org/t/similar-to-torch-gather-over-two-dimensions/118827
def fast_crop(x, idx1, idx2):
    """
    Gather x[n, idx1[n, i, j], idx2[n, i, j]] for every n, i, j.
    x:    N x B x V tensor
    idx1: N x K1 x K2 tensor where each idx1[n, i, j] is in [0, B)
    idx2: N x K1 x K2 tensor where each idx2[n, i, j] is in [0, V)
    Return:
    cropped: N x K1 x K2 tensor where cropped[n, i, j] = x[n, idx1[n, i, j], idx2[n, i, j]]
    """
    x = x.contiguous()
    assert idx1.shape == idx2.shape
    # Convert each (row, col) pair into a linear index into the flattened B x V image.
    lin_idx = idx2 + x.size(-1) * idx1
    x = x.view(-1, x.size(1) * x.size(2))
    lin_idx = lin_idx.view(-1, lin_idx.shape[1] * lin_idx.shape[2])
    cropped = x.gather(-1, lin_idx)
    return cropped.reshape(idx1.shape)
idx1 = torch.repeat_interleave(idx_h, idx_w.shape[1]).reshape(new_shape)
idx2 = torch.repeat_interleave(idx_w, idx_h.shape[1], dim=0).reshape(new_shape)
cropped = fast_crop(x, idx1, idx2)
(cropped == cropped_x).all()
Using realistic numbers (b = 100, h = 100, w = 130 and crop_border = (40, 95)), a 10-trial run takes 32 s with the for loop and only 0.043 s with fast_crop.
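If gather itself is not a requirement, advanced indexing can perform the same crop directly. The sketch below is an alternative to the approach from the linked discussion, not part of it; `crop_by_indexing` is a hypothetical helper name. It also builds idx_h and idx_w by broadcasting instead of list comprehensions, using the same offset ranges as dh_ and dw_ in the question.

```python
import torch

def crop_by_indexing(x, crop_border):
    """Randomly crop every image of a (B, H, W) batch with advanced indexing.

    Returns the (B, H - ch, W - cw) crops plus the row/column indices used.
    """
    b, h, w = x.shape
    ch, cw = crop_border
    # Random top-left offsets per image, same ranges as dh_ / dw_ in the question.
    dh = torch.randint(0, ch, size=(b,))
    dw = torch.randint(0, cw, size=(b,))
    # Row and column indices via broadcasting instead of a list comprehension.
    idx_h = dh[:, None] + torch.arange(h - ch)    # (B, H - ch)
    idx_w = dw[:, None] + torch.arange(w - cw)    # (B, W - cw)
    batch = torch.arange(b)[:, None, None]        # (B, 1, 1)
    # The three index tensors broadcast to (B, H - ch, W - cw).
    crops = x[batch, idx_h[:, :, None], idx_w[:, None, :]]
    return crops, idx_h, idx_w
```

Broadcasting the (B, 1, 1) batch index against idx_h[:, :, None] and idx_w[:, None, :] selects the whole patch without materializing repeated index tensors. This sketch has not been timed against fast_crop, so it is worth benchmarking both on the realistic sizes above.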
The input is a list of observations; every observation is a fixed-size set of ellipses, and every ellipse is represented by 7 parameters (x, y, R, G, B, a, b).
The output is a list of images, one per observation: we are basically drawing the ellipses of an observation onto a completely white image. Where several ellipses overlap, the pixel gets the mean of their RGB values.
n, m = size of the image in pixels; an image is represented as an (n, m, 3) numpy array (3 because of RGB coding)
N = number of ellipses in every observation
import numpy as np
xx, yy = np.mgrid[:n, :m]
def elipses_population_to_img_population(elipses_population):
    population_size = elipses_population.shape[0]
    img_population = np.empty((population_size, n, m, 3))
    for j in range(population_size):
        imarray = np.empty((N, n, m, 3))
        imarray.fill(np.nan)
        for i in range(N):
            x = elipses_population[j, i, 0]
            y = elipses_population[j, i, 1]
            R = elipses_population[j, i, 2]
            G = elipses_population[j, i, 3]
            B = elipses_population[j, i, 4]
            a = elipses_population[j, i, 5]
            b = elipses_population[j, i, 6]
            xx_centered = xx - x
            yy_centered = yy - y
            elipse = (xx_centered / a)**2 + (yy_centered / b)**2 < 1
            imarray[i, elipse, :] = np.array([R, G, B])
        means_img = np.nanmean(imarray, axis=0)
        means_img = np.nan_to_num(means_img, nan=255)
        img_population[j, :, :, :] = means_img
    return img_population
The code works correctly, but I am looking for optimization advice. I run it many times in my code, so every small improvement would help.
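One way to remove both Python loops is to evaluate all ellipse masks at once with broadcasting and to replace nanmean with an explicit sum divided by a count. This is a minimal sketch, assuming the parameter layout (x, y, R, G, B, a, b) used by the loop; `ellipses_to_images_vectorized` is a hypothetical name, and it takes n and m as arguments instead of reading globals. The intermediate mask has shape (P, N, n, m), so for large populations you may need to process the population in chunks to bound memory.

```python
import numpy as np

def ellipses_to_images_vectorized(ellipses_population, n, m):
    """Vectorized sketch of elipses_population_to_img_population.

    ellipses_population: (P, N, 7) array, columns x, y, R, G, B, a, b.
    Returns a (P, n, m, 3) float array.
    """
    xx, yy = np.mgrid[:n, :m]                       # (n, m) pixel coordinates
    p = ellipses_population
    # Each geometric parameter as (P, N, 1, 1) so it broadcasts over the pixel grid.
    x0, y0, a, b = (p[:, :, k, None, None] for k in (0, 1, 5, 6))
    colors = p[:, :, 2:5]                           # (P, N, 3) RGB per ellipse
    # inside[q, i, u, v] is True when pixel (u, v) lies in ellipse i of observation q.
    inside = ((xx - x0) / a) ** 2 + ((yy - y0) / b) ** 2 < 1    # (P, N, n, m)
    counts = inside.sum(axis=1)[..., None]          # (P, n, m, 1) ellipses covering each pixel
    # Sum the colors of all ellipses covering each pixel: (P, n, m, 3).
    sums = np.einsum('qkuv,qkc->quvc', inside.astype(np.float64), colors)
    # Mean color where covered (what nanmean computes), white (255) where nothing is drawn.
    with np.errstate(invalid='ignore', divide='ignore'):
        means = sums / counts
    return np.where(counts > 0, means, 255.0)
```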
I'm trying to implement linear interpolation where the data points are N images of shape HxWx3 (stored in buf, of shape NxHxWx3) and the points to interpolate are specified in another 2D grid (interp_values).
Non-vectorizable approach:
Initially I made interp_values an HxW grid with values in 0..N-1 indicating, for each element i,j, which image in buf to read it from; fractional values mean interpolation between two images.
For example, a value of 3.6 means blending 40% (1 - 0.6) of image 3 with 60% (0.6) of image 4. However, this approach is hard to vectorize, and performance was poor.
One vectorization approach:
So I changed interp_values to be an NxHxWx3 grid with values in 0..1. Each column :,i,j,c specifies the blend coefficients for the N images, of which only 1 or 2 are non-zero; e.g. for 3.6 we have [0, 0, 0, 0.4, 0.6, 0, 0, ...]. I can convert interp_values from HxW to NxHxWx3 with:
def expand_interp_values(interp_values):
    r = np.zeros((N,) + interp_values.shape + (3,))
    for i in range(interp_values.shape[0]):
        for j in range(interp_values.shape[1]):
            v = interp_values[i, j]
            a, b, x = math.floor(v), math.ceil(v), math.fmod(v, 1)
            if int(a) == int(b):
                r[a, i, j, :] = 3 * [1]
            else:
                r[a, i, j, :] = 3 * [1 - x]
                r[b, i, j, :] = 3 * [x]
    return r
This representation is more sparse (many zeros) but now interpolation can be computed as element-wise multiplication between buf and interp_values (the multiplication part of the linear interpolation) followed by a sum(..., axis=0) (i.e. the addition part of the linear interpolation):
def linear_interp(data, interp_values):
    return np.sum(data * interp_values, axis=0)
This approach gives some performance improvement; however, it seems the CPU spends most of its time computing x1*0, x2*0, ... or 0 + 0 + 0 ...
Can this be improved further?
Additionally, the creation of the expanded interp_values grid is not vectorized, so perhaps performance would be bad if that grid has to be updated continuously.
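Since only two frames contribute to each pixel, one option (a swapped-in technique, not something from the question) is to index those two frames directly instead of multiplying by a mostly-zero NxHxWx3 weight array. This sketch works on the original HxW grid of fractional frame indices, so the expansion step is not needed at all; `interp_frames` is a hypothetical name, and unlike linear_interp in the full code below it does not divide by 256.

```python
import numpy as np

def interp_frames(buf, interp_values):
    """Blend frames of buf (N x H x W x 3) per pixel.

    interp_values is an H x W grid of fractional frame indices in [0, N-1]
    (e.g. 3.6 = 40% of frame 3 + 60% of frame 4).
    """
    n_frames, h, w = buf.shape[:3]
    lo = np.floor(interp_values).astype(np.intp)     # lower frame index per pixel
    hi = np.minimum(lo + 1, n_frames - 1)            # upper frame index, clamped to N-1
    frac = (interp_values - lo)[..., None]           # weight of the upper frame, (H, W, 1)
    rows, cols = np.indices((h, w))                  # pixel coordinates, each (H, W)
    # Advanced indexing reads only the two frames each pixel needs: (H, W, 3) each.
    below = buf[lo, rows, cols].astype(np.float64)
    above = buf[hi, rows, cols].astype(np.float64)
    return (1 - frac) * below + frac * above
```

Each pixel costs two reads and one blend regardless of N, instead of N multiplications and additions.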
Complete python+opencv code:
import cv2
import numpy as np
import math
vid = cv2.VideoCapture(0)
vid.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
vid.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# store last N images into a NxHxWx3 grid (circular buffer):
N = 25
buf = None
interp_values = None
DOWNSAMPLING = 6
def linear_interp(data, interp_values):
    return np.sum(data * interp_values / 256, axis=0)
def expand_interp_values(interp_values):
    r = np.zeros((N,) + interp_values.shape + (3,))
    for i in range(interp_values.shape[0]):
        for j in range(interp_values.shape[1]):
            v = interp_values[i, j]
            a, b, x = math.floor(v), math.ceil(v), math.fmod(v, 1)
            if int(a) == int(b):
                r[a, i, j, :] = 3 * [1]
            else:
                r[a, i, j, :] = 3 * [1 - x]
                r[b, i, j, :] = 3 * [x]
    return r
while True:
    ret, frame = vid.read()
    if not ret:  # stop if the camera returns no frame
        break
    H, W, Ch = frame.shape
    frame = cv2.resize(frame, dsize=(W//DOWNSAMPLING, H//DOWNSAMPLING), interpolation=cv2.INTER_LINEAR)
    # circular buffer:
    if buf is None:
        buf = np.zeros((N,) + frame.shape, dtype=np.uint8)
    # there should be a simpler way to do a FIFO grid...
    for i in reversed(range(1, N)):
        buf[i] = buf[i - 1]
    buf[0] = frame
    if interp_values is None:
        # create a lookup pattern here:
        interp_values = np.zeros(frame.shape[:2])
        for i in range(frame.shape[0]):
            for j in range(frame.shape[1]):
                y = i / (frame.shape[0] - 1) * 2 - 1
                x = j / (frame.shape[1] - 1) * 2 - 1
                #interp_values[i, j] = (N - 1) * min(1, math.hypot(x, y))
                interp_values[i, j] = (N - 1) * (y + 1) / 2
        interp_values = expand_interp_values(interp_values)
    im = linear_interp(buf, interp_values)
    im = cv2.resize(im, dsize=(W, H), interpolation=cv2.INTER_LANCZOS4)
    cv2.imshow('image', im)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
vid.release()
cv2.destroyAllWindows()
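The two other loops in this code, the FIFO shift and the lookup-pattern construction, can also be written without Python loops. A minimal sketch with hypothetical helper names `push_frame` and `make_lookup_pattern`; the `radial` flag reproduces the commented-out hypot pattern.

```python
import numpy as np

def push_frame(buf, frame):
    """FIFO update: shift frames one slot back and store the newest at index 0."""
    buf[1:] = buf[:-1]    # NumPy handles this overlapping copy correctly
    buf[0] = frame
    return buf

def make_lookup_pattern(h, w, n_frames, radial=False):
    """Loop-free version of the nested loop that fills interp_values (H x W)."""
    y = np.linspace(-1, 1, h)[:, None]   # (h, 1), matches i / (h - 1) * 2 - 1 up to rounding
    x = np.linspace(-1, 1, w)[None, :]   # (1, w), matches j / (w - 1) * 2 - 1 up to rounding
    if radial:
        # Commented-out variant: (N - 1) * min(1, hypot(x, y)), broadcast to (h, w)
        return (n_frames - 1) * np.minimum(1, np.hypot(x, y))
    # Active variant: (N - 1) * (y + 1) / 2, the same value across each row
    return np.repeat((n_frames - 1) * (y + 1) / 2, w, axis=1)
```

Inside the loop, `push_frame(buf, frame)` would replace the `reversed(range(1, N))` loop, and `make_lookup_pattern(*frame.shape[:2], N)` the double loop over pixels.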
I have an image INPUT, a painted CANVAS, and a homography matrix H. The code below "crops" the CANVAS (which is larger than INPUT), warped by the homography H, to the size of INPUT. However, my code is so inefficient that the loops slow down the whole process: the example below runs 1280*720 loop iterations for a 3-channel image, and I will be dealing with images/tensors that have more channels. Is there a way to optimize/vectorize this process? Thanks in advance!
import numpy as np
INPUT = np.zeros((720, 1280, 3))
CANVAS = np.zeros((1548, 1104, 3))
H = np.random.uniform(size=(3, 3))
inputIm_shape = INPUT.shape
h, w, c = inputIm_shape
xs, ys, a = [], [], np.zeros((w, h))
# this is where the bottleneck happens
for index, _ in np.ndenumerate(a):
    xs.append(index[0])
    ys.append(index[1])
input_coords = np.vstack(
(np.array(xs), np.array(ys), np.ones(len(xs))))
transformed = np.matmul(H, input_coords)
transformed[0, :] = np.divide(transformed[0, :], transformed[2, :])
transformed[1, :] = np.divide(transformed[1, :], transformed[2, :])
map_image = np.zeros(inputIm_shape)
hc, wc, cc = CANVAS.shape
badcount = 0
# and this too
for k in range(0, input_coords.shape[1]):
    if int(transformed[0, k]) < 0 or int(transformed[0, k]) > wc - 1 or int(transformed[1, k]) < 0 or int(transformed[1, k]) > hc - 1:
        badcount += 1
    x_input = max(min(int(input_coords[0, k]), w - 1), 0)
    y_input = max(min(int(input_coords[1, k]), h - 1), 0)
    x_canvas = max(min(int(transformed[0, k]), wc - 1), 0)
    y_canvas = max(min(int(transformed[1, k]), hc - 1), 0)
    map_image[y_input, x_input] = CANVAS[y_canvas, x_canvas]
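A vectorized sketch of the same mapping: build all pixel coordinates with np.mgrid, transform them in one matrix product, clamp with np.clip, and gather every canvas pixel in a single advanced-indexing step. It assumes CANVAS and INPUT have the same channel count (any number of channels works, since indexing only touches the first two axes); `warp_crop` is a hypothetical name. Like the loop, it truncates coordinates toward zero (astype(int) behaves like int()) rather than rounding.

```python
import numpy as np

def warp_crop(canvas, H, out_shape):
    """Vectorized sketch of the two loops: out[y, x] = canvas at H @ (x, y, 1)."""
    h, w = out_shape[:2]
    hc, wc = canvas.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]                                  # output pixel grid
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (3, h*w) homogeneous coords
    t = H @ coords
    # astype(int) truncates toward zero, matching int() in the loop.
    tx = (t[0] / t[2]).astype(int)
    ty = (t[1] / t[2]).astype(int)
    out_of_bounds = (tx < 0) | (tx > wc - 1) | (ty < 0) | (ty > hc - 1)
    badcount = int(np.count_nonzero(out_of_bounds))
    # Clamp to the canvas (the max(min(...)) calls), then gather all pixels at once.
    tx = np.clip(tx, 0, wc - 1)
    ty = np.clip(ty, 0, hc - 1)
    map_image = canvas[ty, tx].reshape((h, w) + canvas.shape[2:])
    return map_image, badcount
```

With the question's variables this would be called as `map_image, badcount = warp_crop(CANVAS, H, INPUT.shape)`.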
So I have this 3x3 G matrix (not shown here; it's irrelevant to my problem) that I created using two variables: u (a vector, x - y) and the scalar k. $x_j = (x_1^{(j)}, x_2^{(j)}, x_3^{(j)})$ and $y_j = (y_1^{(j)}, y_2^{(j)}, y_3^{(j)})$. alpha_j is a 3x3 matrix. The A matrix is a block-diagonal matrix of size 3n x 3n. I am having trouble with the W matrix. How do I code a matrix of size 3n x 3n where the (i,j)th block is the 3x3 matrix $\alpha_i G_{ij} \alpha_j$? I am lost.
My alpha_j function also seems to have a problem: its loop keeps throwing the error "only length-1 arrays can be converted to Python scalars." pls help :/
def W(x, y, k, alpha, A):
    u = x - y
    n = x.shape[0]
    W = np.zeros((3*n, 3*n))
    for i in range(0, n-1):
        for j in range(0, n-1):
            #u = -np.array([[x[i,0] - x[j,0]], [x[i,1] - x[j,1]], [0]]) ??
            W[i][j] = (alpha_j(alpha, A) * G(u, k) * alpha_j(alpha, A))
        W[i][i] = np.zeros((n, n))
    return W
def alpha_j(a, A):
    alph = np.array([[0,0,0],[0,0,0],[0,0,0]],complex)
    rho = np.random.rand(3,1)
    for i in range(0, 2):
        for j in range(0, 2):
            alph[i][j] = (rho[i] * a * A[i][j])
    return alph
#-------------------------------------------------------------------
x1 = np.array([[1], [2], [0]])
y1 = np.array([[4], [5], [0]])
# SYSTEM PARAMETERS
# incoming Wave angle
theta = 0 # can range from [0, 2pi)
# susceptibility
chi = 10 + 1j
# wavelength
lam = 0.5 # microns (values between .4-.7)
# frequency
k = (2 * np.pi)/lam # 1/microns
# volume
V_0 = (0.05)**3 # microns^3
# incoming wave vector
K = k * np.array([[0], [np.sin(theta)], [np.cos(theta)]])
# polarization vector
vecinc = np.array([[1], [0], [0]]) # (can choose any vector perpendicular to K)
# for the fixed alpha case
alpha = (V_0 * 3 * chi)/(chi + 3)
# 3 x 3 matrix
A = np.matlib.identity(3) # could be any symmetric matrix,
#-------------------------------------------------------------------
# TEST FUNCTIONS
test = G((x1-y1), k)
print(test)
w = W(x1, y1, k, alpha, A)
print(w)
Sometimes my W loop throws the error "can't set an array element with a sequence." But I need to set each element (block) of the matrix W to the 3x3 matrix created by multiplying alpha by G...
To your question of how to create a new array with a block for each element, the following should do the trick:
import numpy as np
G = np.random.random([3,3])
result = np.zeros([9,9])
num_blocks = 3
a = np.random.random([3,3])
b = np.random.random([3,3])
for i in range(G.shape[0]):
    for j in range(G.shape[1]):
        block_result = a*G[i,j]*b
        for k in range(num_blocks):
            for l in range(num_blocks):
                result[3*i + k, 3*j + l] = block_result[k, l]
You should be able to generalize from there. I hope I've understood correctly.
EDIT: It looks like I haven't understood correctly. I'm leaving it in hopes it spurs you to an answer. The general idea is to generate ranges of indices to operate on, and then just operate on them directly. Slicing might be helpful, too.
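Following that idea for the W matrix from the question: slicing assigns a whole 3x3 block at once, and the matrix product should use @ (or np.matmul) rather than *, which multiplies element by element. This is a sketch under assumptions: `build_W` and `build_W_vectorized` are hypothetical names, alphas is an (n, 3, 3) stack of the alpha_i, G_blocks is an (n, n, 3, 3) array of precomputed G_ij (G itself isn't shown in the question), and the diagonal blocks are left at zero as the question's W[i][i] line suggests.

```python
import numpy as np

def build_W(alphas, G_blocks):
    """Loop version: block (i, j) of the 3n x 3n result is alpha_i @ G_ij @ alpha_j."""
    n = alphas.shape[0]
    W = np.zeros((3 * n, 3 * n), dtype=np.result_type(alphas, G_blocks))
    for i in range(n):              # range(n), not range(n - 1), so the last block is included
        for j in range(n):
            if i == j:
                continue            # assumption: diagonal blocks stay zero
            # Slicing assigns the whole 3x3 block; @ is the matrix product.
            W[3 * i:3 * i + 3, 3 * j:3 * j + 3] = alphas[i] @ G_blocks[i, j] @ alphas[j]
    return W

def build_W_vectorized(alphas, G_blocks):
    """Same result without Python loops, via broadcasting matmul and a reshape."""
    n = alphas.shape[0]
    # (n, 1, 3, 3) @ (n, n, 3, 3) @ (1, n, 3, 3) -> blocks[i, j] = alpha_i @ G_ij @ alpha_j
    blocks = alphas[:, None] @ G_blocks @ alphas[None, :]
    blocks[np.arange(n), np.arange(n)] = 0    # zero diagonal blocks, as in build_W
    # Reorder to (i, row, j, col) so the reshape lays the blocks out as a 3n x 3n matrix.
    return blocks.transpose(0, 2, 1, 3).reshape(3 * n, 3 * n)
```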
Ah, you asked how to create a diagonal filled with blocks. In that case:
num_diagonal_blocks = 3 # for example
for block_dim in range(num_diagonal_blocks):
    # do your block calculation, e.g. the same form as above:
    block_result = a*G[block_dim, block_dim]*b
    for k in range(block_result.shape[0]):
        for l in range(block_result.shape[1]):
            result[3*block_dim + k, 3*block_dim + l] = block_result[k, l]  # assign to element of block
I think that's nearly it.
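For the diagonal case the loops can be dropped as well. A sketch, assuming the n diagonal blocks are stacked in an (n, p, q) array; `block_diagonal` is a hypothetical name. If SciPy is available, scipy.linalg.block_diag(*blocks) does the same job, and when every block is the same matrix A, np.kron(np.eye(n), A) builds the result directly.

```python
import numpy as np

def block_diagonal(blocks):
    """Place blocks[i] on the diagonal of an (n*p) x (n*q) zero matrix."""
    n, p, q = blocks.shape
    out = np.zeros((n, p, n, q), dtype=blocks.dtype)
    idx = np.arange(n)
    # Advanced indexing on axes 0 and 2: out[i, :, i, :] = blocks[i] for all i at once.
    out[idx, :, idx, :] = blocks
    # (i, row, j, col) -> row index i*p + row, column index j*q + col.
    return out.reshape(n * p, n * q)
```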
I have a 3D array (a 2D array of vectors), and I want to transform each vector with a rotation matrix. The rotation angles, in radians, are stored in two separate 2D arrays called cols and rows.
I've been able to have NumPy compute the angles for me already, without a Python loop. Now I'm looking for a way to have NumPy generate the rotation matrices, too, hopefully resulting in a great performance boost.
size = img.shape[:2]
# Create an array that assigns each pixel the percentage of
# the correction (value between -1 and 1, distributed linearly).
cols = np.array([np.arange(size[1]) for __ in range(size[0])]) / (size[1] - 1) * 2 - 1
rows = np.array([np.arange(size[0]) for __ in range(size[1])]).T / (size[0] - 1) * 2 - 1
# Atan distribution based on F-number and Sensor size.
cols = np.arctan(sh * cols / (2 * f))
rows = np.arctan(sv * rows / (2 * f))
### This is the loop that I would like to remove and find a
### clever way to make NumPy do the same operation natively.
for i in range(size[0]):
    for j in range(size[1]):
        ah = cols[i,j]
        av = rows[i,j]
        # Y-rotation.
        mat = np.matrix([
            [ np.cos(ah), 0, np.sin(ah)],
            [0, 1, 0],
            [-np.sin(ah), 0, np.cos(ah)]
        ])
        # X-rotation.
        mat *= np.matrix([
            [1, 0, 0],
            [0, np.cos(av), -np.sin(av)],
            [0, np.sin(av), np.cos(av)]
        ])
        img[i,j] = img[i,j] * mat
return img
Is there any clever way to rewrite the loop in NumPy operations?
(Let's assume the shape of img be (a, b, 3).)
Firstly, cols and rows do not need to be fully expanded to (a, b) (you could write cols[j] instead of cols[i,j]), and they can easily be generated using np.linspace:
cols = np.linspace(-1, 1, size[1]) # shape: (b,)
rows = np.linspace(-1, 1, size[0]) # shape: (a,)
cols = np.arctan(sh * cols / (2*f))
rows = np.arctan(sv * rows / (2*f))
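A quick check, with an arbitrary small size, that the 1-D linspace arrays carry the same values as the question's fully expanded grids (every row of the original cols is identical, as is every column of rows):

```python
import numpy as np

size = (4, 6)  # arbitrary (a, b) image size for the check
# The question's fully expanded (a, b) grids.
cols_full = np.array([np.arange(size[1]) for __ in range(size[0])]) / (size[1] - 1) * 2 - 1
rows_full = np.array([np.arange(size[0]) for __ in range(size[1])]).T / (size[0] - 1) * 2 - 1
# The 1-D versions, broadcast back to (a, b) for comparison.
cols = np.linspace(-1, 1, size[1])
rows = np.linspace(-1, 1, size[0])
assert np.allclose(cols_full, cols[None, :])
assert np.allclose(rows_full, rows[:, None])
```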
Then we precalculate the components of the matrices:
# shape: (b,)
cos_ah = np.cos(cols)
sin_ah = np.sin(cols)
zeros_ah = np.zeros_like(cols)
ones_ah = np.ones_like(cols)
# shape: (a,)
cos_av = np.cos(rows)
sin_av = np.sin(rows)
zeros_av = np.zeros_like(rows)
ones_av = np.ones_like(rows)
And then construct the rotation matrices:
# shape: (3, 3, b)
y_mat = np.array([
[cos_ah, zeros_ah, sin_ah],
[zeros_ah, ones_ah, zeros_ah],
[-sin_ah, zeros_ah, cos_ah],
])
# shape: (3, 3, a)
x_mat = np.array([
[ones_av, zeros_av, zeros_av],
[zeros_av, cos_av, -sin_av],
[zeros_av, sin_av, cos_av],
])
Now let's see. If we have a loop we would write:
for i in range(size[0]):
    for j in range(size[1]):
        img[i, j, :] = img[i, j, :] @ y_mat[:, :, j] @ x_mat[:, :, i]
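or, if we expand out the matrix multiplications (with Y = y_mat and X = x_mat):
$$\mathrm{img}'_{ijn} = \sum_{k}\sum_{m} \mathrm{img}_{ijk}\, Y_{kmj}\, X_{mni}$$
This can be handled nicely using np.einsum (note that the indices i, j, k, m, n correspond exactly to those in the equation above):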
img = np.einsum('ijk,kmj,mni->ijn', img, y_mat, x_mat)
To summarize:
size = img.shape[:2]
cols = np.linspace(-1, 1, size[1]) # shape: (b,)
rows = np.linspace(-1, 1, size[0]) # shape: (a,)
cols = np.arctan(sh * cols / (2*f))
rows = np.arctan(sv * rows / (2*f))
cos_ah = np.cos(cols)
sin_ah = np.sin(cols)
zeros_ah = np.zeros_like(cols)
ones_ah = np.ones_like(cols)
cos_av = np.cos(rows)
sin_av = np.sin(rows)
zeros_av = np.zeros_like(rows)
ones_av = np.ones_like(rows)
y_mat = np.array([
[cos_ah, zeros_ah, sin_ah],
[zeros_ah, ones_ah, zeros_ah],
[-sin_ah, zeros_ah, cos_ah],
])
x_mat = np.array([
[ones_av, zeros_av, zeros_av],
[zeros_av, cos_av, -sin_av],
[zeros_av, sin_av, cos_av],
])
return np.einsum('ijk,kmj,mni->ijn', img, y_mat, x_mat)
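If you prefer batched matrix products over einsum, the same computation can be written with @ by moving the per-column/per-row matrix axis to the front and broadcasting. This is a sketch; `rotate_with_matmul` is a hypothetical name, and which of the two is faster depends on array sizes and the NumPy build, so it is worth timing both.

```python
import numpy as np

def rotate_with_matmul(img, y_mat, x_mat):
    """Same result as np.einsum('ijk,kmj,mni->ijn', img, y_mat, x_mat), using batched @.

    img: (a, b, 3); y_mat: (3, 3, b); x_mat: (3, 3, a), built as above.
    """
    y_stack = np.moveaxis(y_mat, -1, 0)      # (b, 3, 3), one matrix per column j
    x_stack = np.moveaxis(x_mat, -1, 0)      # (a, 3, 3), one matrix per row i
    v = img[..., None, :]                    # each pixel as a 1x3 row vector: (a, b, 1, 3)
    v = v @ y_stack[None, :, :, :]           # broadcast over rows i -> (a, b, 1, 3)
    v = v @ x_stack[:, None, :, :]           # broadcast over columns j -> (a, b, 1, 3)
    return v[..., 0, :]                      # back to (a, b, 3)
```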