Generate matrices for pairs of values in Numpy - python

I have a 3D array (a 2D array of vectors), and I want to transform each vector with a rotation matrix. The rotation angles, in radians, are stored in two separate 2D arrays called cols and rows.
I've been able to have NumPy compute the angles for me already, without a Python loop. Now I'm looking for a way to have NumPy generate the rotation matrices, too, hopefully resulting in a great performance boost.
size = img.shape[:2]
# Create an array that assigns each pixel the percentage of
# the correction (value between -1 and 1, distributed linearly).
cols = np.array([np.arange(size[1]) for __ in range(size[0])]) / (size[1] - 1) * 2 - 1
rows = np.array([np.arange(size[0]) for __ in range(size[1])]).T / (size[0] - 1) * 2 - 1
# Atan distribution based on F-number and Sensor size.
cols = np.arctan(sh * cols / (2 * f))
rows = np.arctan(sv * rows / (2 * f))
### This is the loop that I would like to remove and find a
### clever way to make NumPy do the same operation natively.
for i in range(size[0]):
    for j in range(size[1]):
        ah = cols[i,j]
        av = rows[i,j]
        # Y-rotation.
        mat = np.matrix([
            [ np.cos(ah), 0, np.sin(ah)],
            [0, 1, 0],
            [-np.sin(ah), 0, np.cos(ah)]
        ])
        # X-rotation.
        mat *= np.matrix([
            [1, 0, 0],
            [0, np.cos(av), -np.sin(av)],
            [0, np.sin(av), np.cos(av)]
        ])
        img[i,j] = img[i,j] * mat
return img
Is there any clever way to rewrite the loop in NumPy operations?

(Let's assume the shape of img be (a, b, 3).)
Firstly, cols and rows do not need to be fully expanded to (a, b): cols depends only on the column index j and rows only on the row index i, so you could write cols[j] instead of cols[i,j]. They can also be generated easily using np.linspace:
cols = np.linspace(-1, 1, size[1]) # shape: (b,)
rows = np.linspace(-1, 1, size[0]) # shape: (a,)
cols = np.arctan(sh * cols / (2*f))
rows = np.arctan(sv * rows / (2*f))
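As a quick sanity check, here is a minimal sketch (the sizes a and b are made up for illustration) suggesting that np.linspace reproduces the question's per-pixel expression and that the 1-D arrays broadcast back to the full (a, b) grids:

```python
import numpy as np

a, b = 4, 6  # hypothetical image height and width

# Original construction from the question: full (a, b) grids.
cols_full = np.array([np.arange(b) for __ in range(a)]) / (b - 1) * 2 - 1
rows_full = np.array([np.arange(a) for __ in range(b)]).T / (a - 1) * 2 - 1

# Compact construction: one value per column / per row.
cols = np.linspace(-1, 1, b)  # shape: (b,)
rows = np.linspace(-1, 1, a)  # shape: (a,)

# Broadcast the 1-D arrays back to (a, b) and compare.
same_cols = np.allclose(cols_full, np.broadcast_to(cols, (a, b)))
same_rows = np.allclose(rows_full, np.broadcast_to(rows[:, None], (a, b)))
print(same_cols, same_rows)
```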
Then we precalculate the components of the matrices.
# shape: (b,)
cos_ah = np.cos(cols)
sin_ah = np.sin(cols)
zeros_ah = np.zeros_like(cols)
ones_ah = np.ones_like(cols)
# shape: (a,)
cos_av = np.cos(rows)
sin_av = np.sin(rows)
zeros_av = np.zeros_like(rows)
ones_av = np.ones_like(rows)
And then construct the rotation matrices:
# shape: (3, 3, b)
y_mat = np.array([
[cos_ah, zeros_ah, sin_ah],
[zeros_ah, ones_ah, zeros_ah],
[-sin_ah, zeros_ah, cos_ah],
])
# shape: (3, 3, a)
x_mat = np.array([
[ones_av, zeros_av, zeros_av],
[zeros_av, cos_av, -sin_av],
[zeros_av, sin_av, cos_av],
])
Now let's see. If we had a loop, we would write:
for i in range(size[0]):
    for j in range(size[1]):
        img[i, j, :] = img[i, j, :] @ y_mat[:, :, j] @ x_mat[:, :, i]
or, if we expand out the matrix multiplications:
$$\mathrm{img}'_{ijn} = \sum_{k} \sum_{m} \mathrm{img}_{ijk} \, \mathrm{y\_mat}_{kmj} \, \mathrm{x\_mat}_{mni}$$
This can be handled nicely using np.einsum (note that the indices i, j, k, m, n correspond exactly to those in the equation above):
img = np.einsum('ijk,kmj,mni->ijn', img, y_mat, x_mat)
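To see that the einsum call matches the explicit loop, here is a small check under assumed inputs: the array sizes are arbitrary and random 3x3 matrices stand in for the real rotation matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 3, 4                      # hypothetical image size
img = rng.random((a, b, 3))
y_mat = rng.random((3, 3, b))    # stands in for the per-column matrices
x_mat = rng.random((3, 3, a))    # stands in for the per-row matrices

# Reference: explicit loop over pixels.
expected = np.empty_like(img)
for i in range(a):
    for j in range(b):
        expected[i, j] = img[i, j] @ y_mat[:, :, j] @ x_mat[:, :, i]

# Vectorized: a single einsum call over all pixels.
result = np.einsum('ijk,kmj,mni->ijn', img, y_mat, x_mat)
matches = np.allclose(result, expected)
print(matches)
```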
To summarize:
size = img.shape[:2]
cols = np.linspace(-1, 1, size[1]) # shape: (b,)
rows = np.linspace(-1, 1, size[0]) # shape: (a,)
cols = np.arctan(sh * cols / (2*f))
rows = np.arctan(sv * rows / (2*f))
cos_ah = np.cos(cols)
sin_ah = np.sin(cols)
zeros_ah = np.zeros_like(cols)
ones_ah = np.ones_like(cols)
cos_av = np.cos(rows)
sin_av = np.sin(rows)
zeros_av = np.zeros_like(rows)
ones_av = np.ones_like(rows)
y_mat = np.array([
[cos_ah, zeros_ah, sin_ah],
[zeros_ah, ones_ah, zeros_ah],
[-sin_ah, zeros_ah, cos_ah],
])
x_mat = np.array([
[ones_av, zeros_av, zeros_av],
[zeros_av, cos_av, -sin_av],
[zeros_av, sin_av, cos_av],
])
return np.einsum('ijk,kmj,mni->ijn', img, y_mat, x_mat)

Related

torch gather using two index arrays

The goal is to extract a random 2x5 patch from a 5x10 image, and do so randomly for all images in a batch. Looking to write a faster implementation that avoids for loops. Haven't been able to figure out how to use the torch .gather operation with two index arrays (idx_h and idx_w in code example).
Naive for loop:
import torch
b = 3 # batch size
h = 5 # height
w = 10 # width
crop_border = (3, 5) # number of pixels (height, width) to crop
x = torch.arange(b * h * w).reshape(b, h, w)
print(x)
dh_ = torch.randint(0, crop_border[0], size=(b,))
dw_ = torch.randint(0, crop_border[1], size=(b,))
_dh = h - (crop_border[0] - dh_)
_dw = w - (crop_border[1] - dw_)
idx_h = torch.stack([torch.arange(d_, _d) for d_, _d in zip(dh_, _dh)])
idx_w = torch.stack([torch.arange(d_, _d) for d_, _d in zip(dw_, _dw)])
print(idx_h, idx_w)
new_shape = (b, idx_h.shape[1], idx_w.shape[1])
cropped_x = torch.empty(new_shape)
for batch in range(b):
    for height in range(idx_h.shape[1]):
        for width in range(idx_w.shape[1]):
            cropped_x[batch, height, width] = x[
                batch, idx_h[batch, height], idx_w[batch, width]
            ]
print(cropped_x)
The index arrays need to be repeated and reshaped to work with the gather operation. The fast_crop code is based on a PyTorch forum discussion: https://discuss.pytorch.org/t/similar-to-torch-gather-over-two-dimensions/118827
def fast_crop(x, idx1, idx2):
    """
    Compute
    x: N x B x V
    idx1: N x K1 x K2 tensor where idx1[i, j, k] is in [0, B)
    idx2: N x K1 x K2 tensor where idx2[i, j, k] is in [0, V)
    Return:
    cropped: N x K1 x K2 tensor where cropped[i, j, k] = x[i, idx1[i, j, k], idx2[i, j, k]]
    """
    x = x.contiguous()
    assert idx1.shape == idx2.shape
    # Flatten the (row, column) index pair into a single linear index.
    lin_idx = idx2 + x.size(-1) * idx1
    x = x.view(-1, x.size(1) * x.size(2))
    lin_idx = lin_idx.view(-1, lin_idx.shape[1] * lin_idx.shape[2])
    cropped = x.gather(-1, lin_idx)
    return cropped.reshape(idx1.shape)
idx1 = torch.repeat_interleave(idx_h, idx_w.shape[1]).reshape(new_shape)
idx2 = torch.repeat_interleave(idx_w, idx_h.shape[1], dim=0).reshape(new_shape)
cropped = fast_crop(x, idx1, idx2)
(cropped == cropped_x).all()
Using realistic numbers (b = 100, h = 100, w = 130 and crop_border = (40, 95)), a 10-trial run takes 32 s with the for loop but only 0.043 s with fast_crop.
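As a possible alternative to gather, the same crop can be sketched with broadcasted advanced indexing, which avoids repeating the index arrays. This is not from the linked discussion; the variable names kh, kw and batch are illustrative, and the index arrays are built from the offsets by broadcasting rather than with torch.stack.

```python
import torch

b, h, w = 3, 5, 10
crop_border = (3, 5)
kh, kw = h - crop_border[0], w - crop_border[1]  # patch size: 2 x 5
x = torch.arange(b * h * w).reshape(b, h, w)

# Per-image crop offsets, drawn as in the question.
dh = torch.randint(0, crop_border[0], size=(b,))
dw = torch.randint(0, crop_border[1], size=(b,))
idx_h = dh[:, None] + torch.arange(kh)  # shape: (b, kh)
idx_w = dw[:, None] + torch.arange(kw)  # shape: (b, kw)

# Three index tensors that broadcast to (b, kh, kw); index in one step.
batch = torch.arange(b)[:, None, None]
cropped = x[batch, idx_h[:, :, None], idx_w[:, None, :]]

# Reference: slice each image separately.
expected = torch.stack([
    x[n, i:i + kh, j:j + kw]
    for n, (i, j) in enumerate(zip(dh.tolist(), dw.tolist()))
])
print(torch.equal(cropped, expected))
```

No timing is claimed here; whether this is faster than fast_crop would need to be measured on the realistic sizes.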

fast <image,time> linear interpolation

I'm trying to achieve linear interpolation, where the data points are N images of shape: HxWx3 (stored in buf (NxHxWx3)), and the points to interpolate are specified in another (2D) grid (interp_values).
Non-vectorizable approach:
In principle I have made interp_values a HxW grid with values 0..N-1 indicating for each i,j element from which image (in buf) to read it from, including fractional values meaning interpolation.
E.g.: a value of 3.6 means blend 40% (1-0.6) of image 3 with 60% (0.6) of image 4. However with this approach it is quite impossible to vectorize the code, and performance was poor.
One vectorization approach:
So I changed interp_values to be a NxHxWx3 grid with values 0..1. Each column :,i,j,c specifies the blend coefficients for the N images, where only 1 or 2 elements are non-zero, e.g. for 3.6 we have: [0, 0, 0, 0.4, 0.6, 0, 0, ...]. I can convert interp_values from HxW to NxHxWx3 with:
def expand_interp_values(interp_values):
    r = np.zeros((N,) + interp_values.shape + (3,))
    for i in range(interp_values.shape[0]):
        for j in range(interp_values.shape[1]):
            v = interp_values[i, j]
            a, b, x = math.floor(v), math.ceil(v), math.fmod(v, 1)
            if int(a) == int(b):
                r[a, i, j, :] = 3 * [1]
            else:
                r[a, i, j, :] = 3 * [1 - x]
                r[b, i, j, :] = 3 * [x]
    return r
This representation is more sparse (many zeros) but now interpolation can be computed as element-wise multiplication between buf and interp_values (the multiplication part of the linear interpolation) followed by a sum(..., axis=0) (i.e. the addition part of the linear interpolation):
def linear_interp(data, interp_values):
    return np.sum(data * interp_values, axis=0)
This approach gives some performance improvement, but the CPU seems to spend most of its time computing x1*0, x2*0, ... or 0 + 0 + 0...
Can this be improved further?
Additionally, the creation of the expanded interp_values grid is not vectorized, so performance would probably suffer if that grid had to be updated continuously.
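One direction that may help, sketched here under assumptions: keep interp_values as the original HxW grid and gather only the two neighbouring frames per pixel with integer-array indexing, so no multiplications by zero happen and no expanded grid is needed. The function name linear_interp_gather and the test sizes are hypothetical.

```python
import numpy as np

def linear_interp_gather(buf, interp_values):
    """Blend the two neighbouring frames for every pixel.

    buf: (N, H, W, C) frame stack.
    interp_values: (H, W) floats in [0, N - 1].
    """
    n = buf.shape[0]
    lo = np.floor(interp_values).astype(int)   # lower frame index per pixel
    hi = np.minimum(lo + 1, n - 1)             # upper frame index, clamped
    frac = (interp_values - lo)[..., None]     # blend weight, shape (H, W, 1)
    ii, jj = np.indices(interp_values.shape)   # pixel row/column coordinates
    # buf[lo, ii, jj] picks one frame per pixel; the result has shape (H, W, C).
    return (1 - frac) * buf[lo, ii, jj] + frac * buf[hi, ii, jj]

rng = np.random.default_rng(0)
N, H, W = 5, 4, 6  # small made-up sizes
buf = rng.integers(0, 256, size=(N, H, W, 3)).astype(np.uint8)
interp_values = rng.uniform(0, N - 1, size=(H, W))
out = linear_interp_gather(buf, interp_values)
print(out.shape)
```

The result is a float array; as in the complete code, it would still need scaling (for example dividing by 256) before being passed to cv2.imshow.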
Complete python+opencv code:
import cv2
import numpy as np
import math
vid = cv2.VideoCapture(0)
vid.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
vid.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
# store last N images into a NxHxWx3 grid (circular buffer):
N = 25
buf = None
interp_values = None
DOWNSAMPLING = 6
def linear_interp(data, interp_values):
    return np.sum(data * interp_values / 256, axis=0)
def expand_interp_values(interp_values):
    r = np.zeros((N,) + interp_values.shape + (3,))
    for i in range(interp_values.shape[0]):
        for j in range(interp_values.shape[1]):
            v = interp_values[i, j]
            a, b, x = math.floor(v), math.ceil(v), math.fmod(v, 1)
            if int(a) == int(b):
                r[a, i, j, :] = 3 * [1]
            else:
                r[a, i, j, :] = 3 * [1 - x]
                r[b, i, j, :] = 3 * [x]
    return r
while True:
    ret, frame = vid.read()
    H, W, Ch = frame.shape
    frame = cv2.resize(frame, dsize=(W//DOWNSAMPLING, H//DOWNSAMPLING), interpolation=cv2.INTER_LINEAR)
    # circular buffer:
    if buf is None:
        buf = np.zeros((N,) + frame.shape, dtype=np.uint8)
    # there should be a simpler way to a FIFO-grid...
    for i in reversed(range(1, N)):
        buf[i] = buf[i - 1]
    buf[0] = frame
    if interp_values is None:
        # create a lookup pattern here:
        interp_values = np.zeros(frame.shape[:2])
        for i in range(frame.shape[0]):
            for j in range(frame.shape[1]):
                y = i / (frame.shape[0] - 1) * 2 - 1
                x = j / (frame.shape[1] - 1) * 2 - 1
                #interp_values[i, j] = (N - 1) * min(1, math.hypot(x, y))
                interp_values[i, j] = (N - 1) * (y + 1) / 2
        interp_values = expand_interp_values(interp_values)
    im = linear_interp(buf, interp_values)
    im = cv2.resize(im, dsize=(W, H), interpolation=cv2.INTER_LANCZOS4)
    cv2.imshow('image', im)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
vid.release()
cv2.destroyAllWindows()

3D covariance matrix - vectorizing python

I need to speed up some Python code and would like to avoid the following for loop, where the "data" matrix has dimensions [dim1xdim2]:
for i in range(int(dim1)):
    data_process = data[i,:].reshape((dim2, 1))
    rxx = data_process * np.matrix.getH(np.asmatrix(data_process)) / dim2
With the for loop, each rxx matrix has dimensions [dim2xdim2]; I would like to get a 3D "rxx" matrix of dimensions [dim1xdim2xdim2] instead. I tried the following solution:
data_new = repeat(data_process0[:, :, newaxis], dim2, axis=2)
N_2 = data_new.shape[2]
m1 = data_new - data_new.sum(2, keepdims=1) / N_2
y_out = einsum('ijk,ilk->ijl', m1, m1) / (N_2 - 1)
In this case I got 3D "y_out" matrix [dim1xdim2xdim2] but this doesn't work in my case.
Thanks
representative sample data:
from numpy import matrix, random, asmatrix, linalg, empty
B = random.random((156, 48))
A = B.shape
eig_val = empty(A, dtype=complex)
eig_vec = empty((A[0], A[1], A[1]), dtype=complex)
for i in range(int(A[0])):
    data_process = B[i, :].reshape((A[1], 1))
    rxx = data_process * matrix.getH(asmatrix(data_process)) / A[1]
    eig_val[i:, ...], eig_vec[i:, ...] = linalg.eig(rxx)
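A minimal sketch of one way to drop the loop, assuming the goal is exactly the per-row outer product shown above: broadcasting builds all the rxx matrices at once, and np.linalg.eig accepts a stack of square matrices. The names dim1 and dim2 mirror the question; the random data is only a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((156, 48))
dim1, dim2 = B.shape

# rxx[i] = outer(B[i], conj(B[i])) / dim2, computed for all rows at once.
rxx = B[:, :, None] * B.conj()[:, None, :] / dim2   # shape: (dim1, dim2, dim2)

# np.linalg.eig operates on the last two axes of a stacked input.
eig_val, eig_vec = np.linalg.eig(rxx)
print(rxx.shape, eig_val.shape, eig_vec.shape)
```

Since each rxx[i] is Hermitian by construction, np.linalg.eigh may also be worth trying; note that it returns eigenvalues in ascending order, which can differ from the order eig produces.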

Numpy syntax to assign elements of an array

I want to create an array whose elements are a function of their position.
Something like
N = 1000000
newarray = np.zeros([N,N,N])
for i in range(N):
    for j in range(N):
        for k in range(N):
            newarray[i,j,k] = f(i,j,k)
Is there a way to increase the speed of this operation, by removing the for loops / parallelizing it using the numpy syntax?
This is the f function
def f(i,j,k):
    indices = (R[:,0]==i) *( R[:,1]==j) * (R[:,2]==k)
    return M[indices]
where for example
R = np.random.randint(0,N,[N,3])
M = np.random.randn(N)*15
and in the actual application they are not random.
You can do that operation with the at method of np.add:
import numpy as np
np.random.seed(0)
N = 100
R = np.random.randint(0, N, [N, 3])
M = np.random.randn(N) * 15
newarray = np.zeros([N, N, N])
np.add.at(newarray, (R[:, 0], R[:, 1], R[:, 2]), M)
In this case, if R has any repeated row the corresponding value in newarray will be the sum of all the corresponding values in M.
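To illustrate why np.add.at matters when indices repeat, here is a tiny made-up example comparing it with ordinary fancy-index in-place addition, which does not accumulate duplicates:

```python
import numpy as np

idx = np.array([0, 2, 2, 2])           # index 2 appears three times
vals = np.array([1.0, 10.0, 20.0, 30.0])

buffered = np.zeros(4)
buffered[idx] += vals                  # duplicates: only one write per index survives

accumulated = np.zeros(4)
np.add.at(accumulated, idx, vals)      # duplicates: every value is added

print(buffered)
print(accumulated)
```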
EDIT: To take the average instead of sum for repeated elements you could do something like this:
import numpy as np
np.random.seed(0)
N = 100
R = np.random.randint(0, N, [N, 3])
M = np.random.randn(N) * 15
newarray = np.zeros([N, N, N])
np.add.at(newarray, (R[:, 0], R[:, 1], R[:, 2]), M)
newarray_count = np.zeros([N, N, N])
np.add.at(newarray_count, (R[:, 0], R[:, 1], R[:, 2]), 1)
m = newarray_count > 1
newarray[m] /= newarray_count[m]

efficiently calculate list of 3d rotation matrices in numpy or scipy

I have a list of N unit-normalized 3D vectors p stored in a numpy ndarray with shape (N, 3). I have another such list, q. I want to calculate an ndarray U of shape (N, 3, 3) storing the rotation matrices that rotate each point in p to the corresponding point q.
The list of rotation matrices U should satisfy:
np.all(np.einsum('ijk,ik->ij', U, p) == q)
On a point-by-point basis, the problem reduces to being able to compute a rotation matrix for a rotation of some angle about some axis. Code solving the single-point case appears below:
def rotation_matrix(angle, direction):
    direction = np.atleast_1d(direction).astype('f4')
    sina = np.sin(angle)
    cosa = np.cos(angle)
    direction = direction/np.sqrt(np.sum(direction*direction))
    R = np.diag([cosa, cosa, cosa])
    R += np.outer(direction, direction) * (1.0 - cosa)
    direction *= sina
    R += np.array(((0.0, -direction[2], direction[1]),
                   (direction[2], 0.0, -direction[0]),
                   (-direction[1], direction[0], 0.0)))
    return R
What I need is a function that behaves exactly like the one above, but instead of accepting a single angle and a single direction, it accepts an angles array of shape (npts, ) and a directions array of shape (npts, 3). The code below is only partially finished - the problem is that neither np.diag nor np.outer accepts an axis argument.
def rotation_matrices(angles, directions):
    directions = np.atleast_2d(directions)
    angles = np.atleast_1d(angles)
    npts = directions.shape[0]
    directions = directions/np.sqrt(np.sum(directions*directions, axis=1)).reshape((npts, 1))
    sina = np.sin(angles)
    cosa = np.cos(angles)
    # Lines below require extension to 2d case - np.diag and np.outer do not support axis arguments
    R = np.diag([cosa, cosa, cosa])
    R += np.outer(directions, directions) * (1.0 - cosa)
    directions *= sina
    R += np.array(((0.0, -directions[2], directions[1]),
                   (directions[2], 0.0, -directions[0]),
                   (-directions[1], directions[0], 0.0)))
    return R
Does either numpy or scipy have a compact vectorized function computing the appropriate rotation matrices in a way that avoids using for loops? The problem is that neither np.diag nor np.outer accept axis as an argument. My application will have N be very large, 1e7 or greater, so a vectorized function that keeps all the relevant axes aligned is necessary for performance reasons.
Dropping this here for now, will explain later. It uses Levi-Civita symbols from @jaime's answer here, the matrix form of the Rodrigues formula here, and some algebra based on k = (a x b)/sin(theta).
def rotmatx(p, q):
    eijk = np.zeros((3, 3, 3))
    eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
    eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1
    d = (p * q).sum(-1)[:, None, None]
    c = (p.dot(eijk) @ q[..., None]).squeeze()  # cross product (optimized)
    cx = c.dot(eijk)
    return np.eye(3) + cx + cx @ cx / (1 + d)
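For readers who want to check the formula R = I + [p x q]_x + [p x q]_x^2 / (1 + p.q), here is a sketch that swaps the Levi-Civita tensor for np.cross and an explicit skew-matrix builder. That substitution and the names skew and rotmat_p_to_q are mine, not the answer's. It assumes unit-normalized p and q, and it is undefined when p = -q, because 1 + p.q is then zero.

```python
import numpy as np

def skew(v):
    """Stack of cross-product matrices: skew(v)[n] @ w == np.cross(v[n], w)."""
    zero = np.zeros(len(v))
    return np.stack([
        np.stack([zero, -v[:, 2], v[:, 1]], axis=-1),
        np.stack([v[:, 2], zero, -v[:, 0]], axis=-1),
        np.stack([-v[:, 1], v[:, 0], zero], axis=-1),
    ], axis=1)

def rotmat_p_to_q(p, q):
    d = (p * q).sum(-1)[:, None, None]   # cos(theta) per pair, shape (N, 1, 1)
    cx = skew(np.cross(p, q))            # cross-product matrix of p x q
    return np.eye(3) + cx + cx @ cx / (1 + d)

rng = np.random.default_rng(0)
p = rng.normal(size=(1000, 3))
q = rng.normal(size=(1000, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)
q /= np.linalg.norm(q, axis=1, keepdims=True)

U = rotmat_p_to_q(p, q)
# Compare with a tolerance instead of ==, since the results are floating point.
rotates = np.allclose(np.einsum('ijk,ik->ij', U, p), q)
print(rotates)
```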
EDIT: dang. question changed.
def rotation_matrices(angles, directions):
    eijk = np.zeros((3, 3, 3))
    eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
    eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1
    theta = angles[:, None, None]
    K = directions.dot(eijk)
    return np.eye(3) + K * np.sin(theta) + K @ K * (1 - np.cos(theta))
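For comparison, here is a sketch that stays closer to the structure of the question's single-point function. It replaces np.diag with a broadcasted identity and np.outer with a batched einsum; both substitutions are my own assumptions about how one might extend the question's code, not part of the answer above.

```python
import numpy as np

def rotation_matrices(angles, directions):
    directions = np.atleast_2d(directions).astype(float)
    angles = np.atleast_1d(angles)
    # Normalize each direction vector to unit length.
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    sina = np.sin(angles)[:, None, None]
    cosa = np.cos(angles)[:, None, None]
    # Batched equivalent of np.diag([cosa, cosa, cosa]).
    R = cosa * np.eye(3)
    # Batched equivalent of np.outer(direction, direction).
    R = R + np.einsum('ni,nj->nij', directions, directions) * (1.0 - cosa)
    # Batched skew-symmetric part, scaled by sin(angle).
    x, y, z = directions.T
    zero = np.zeros_like(x)
    K = np.stack([zero, -z, y, z, zero, -x, -y, x, zero], axis=-1).reshape(-1, 3, 3)
    return R + sina * K

# A quarter turn about the z axis should map the x axis onto the y axis.
R = rotation_matrices([np.pi / 2], [[0.0, 0.0, 1.0]])
rotated = R[0] @ np.array([1.0, 0.0, 0.0])
print(np.round(rotated, 6))
```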
Dropping another solution, this one for bulk rotation of an Nx3x3 array, where the 3x3 components represent vector components laid out as
[[11, 12, 13],
[21, 22, 23],
[31, 32, 33]]
Now matrix rotation by np.einsum is:
data = np.random.uniform(size=(500, 3, 3))
rotmat = np.random.uniform(size=(3, 3))
data_rot = np.einsum('ij,...jk,lk->...il', rotmat, data, rotmat)
This is equivalent to
for data_mat in data:
    np.dot(np.dot(rotmat, data_mat), rotmat.T)
Speedup over a np.dot-loop is around 250x.
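A short check of that equivalence, using the same random stand-ins for data and rotmat. The broadcasting matmul line at the end is an additional option I am assuming is acceptable here, not something from the answer, and no timing is claimed for it.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(size=(500, 3, 3))
rotmat = rng.uniform(size=(3, 3))

# einsum version from the answer: rotmat @ data[n] @ rotmat.T for every n.
data_rot = np.einsum('ij,...jk,lk->...il', rotmat, data, rotmat)

# Reference: explicit Python loop over the stack.
data_loop = np.array([rotmat @ m @ rotmat.T for m in data])

# The @ operator broadcasts over the leading axis as well.
data_matmul = rotmat @ data @ rotmat.T

print(np.allclose(data_rot, data_loop), np.allclose(data_matmul, data_loop))
```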
