I want to take the r and g channels of each pixel, convert them from the 0 <-> 255 range to -1 <-> 1, and then rotate (r, g) around (0, 0) by the angle stored in rotations[i]. Below is how I normally do it with regular for loops, but since the images I work with are roughly 4k*4k in size, this takes a long time, and I would love to speed it up. I have little knowledge about parallelization, so any resources would be helpful. I've tried libraries like joblib and multiprocessing, but I feel I've made some fundamental mistake in those implementations, usually resulting in a pickle error.
c = math.cos(rotations[i])
s = math.sin(rotations[i])
pixels = texture_.load()
for X in range(width):
    for Y in range(height):
        x = (pixels[X, Y][0]/255 - .5)*2
        y = (pixels[X, Y][1]/255 - .5)*2
        z = pixels[X, Y][2]
        x_ = x*c - y*s
        y_ = x*s + y*c
        x_ = 255*(x_/2 + .5)
        y_ = 255*(y_/2 + .5)
        pixels[X, Y] = (math.floor(x_), math.floor(y_), z)
Use NumPy to vectorize the computation and process all the elements at once, matrix style.
Try something like this:
import numpy as np
pixels = np.array(pixels)  # assuming shape (width, height, 3)
x = 2 * (pixels[:, :, 0]/255 - 0.5)
y = 2 * (pixels[:, :, 1]/255 - 0.5)
z = pixels[:, :, 2]
x_ = x * c - y * s
y_ = x * s + y * c
x_ = 255 * (x_ / 2 + .5)
y_ = 255 * (y_ / 2 + .5)
pixels[:, :, 0] = np.floor(x_)
pixels[:, :, 1] = np.floor(y_)
pixels[:, :, 2] = z
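If texture_ is a PIL image, the whole thing can be hooked up roughly as follows. This is only a minimal sketch reusing texture_ and rotations[i] from the question; the conversion to and from a NumPy array and the np.clip guard are assumptions, not part of the original code.
import math
import numpy as np
from PIL import Image

# Sketch: vectorized rotation of the r and g channels of a PIL image.
# texture_ and rotations[i] are the question's names; the rest is illustrative.
arr = np.asarray(texture_).astype(np.float64)   # shape (height, width, 3)
c = math.cos(rotations[i])
s = math.sin(rotations[i])
x = 2 * (arr[:, :, 0] / 255 - 0.5)              # map 0..255 -> -1..1
y = 2 * (arr[:, :, 1] / 255 - 0.5)
x_ = 255 * ((x * c - y * s) / 2 + 0.5)          # rotate and map back to 0..255
y_ = 255 * ((x * s + y * c) / 2 + 0.5)
# clip as a guard: rotated values can fall slightly outside 0..255
arr[:, :, 0] = np.clip(np.floor(x_), 0, 255)
arr[:, :, 1] = np.clip(np.floor(y_), 0, 255)
result = Image.fromarray(arr.astype(np.uint8))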
I have a set of 68 keypoints (size [68, 2]) that I am mapping to gaussian heatmaps. To do this, I have the following function:
import numpy as np
import torch

def generate_gaussian(t, x, y, sigma=10):
    """
    Generates a 2D Gaussian point at location x,y in tensor t.
    x should be in range (-1, 1).
    sigma is the standard deviation of the generated 2D Gaussian.
    """
    h, w = t.shape

    # Heatmap pixel per output pixel
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)

    tmp_size = sigma * 3

    # Top-left
    x1, y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)

    # Bottom right
    x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t

    size = 2 * tmp_size + 1
    tx = np.arange(0, size, 1, np.float32)
    ty = tx[:, np.newaxis]
    x0 = y0 = size // 2

    # The gaussian is not normalized, we want the center value to equal 1
    g = torch.tensor(np.exp(- ((tx - x0) ** 2 + (ty - y0) ** 2) / (2 * sigma ** 2)))

    # Determine the bounds of the source gaussian
    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1

    # Image range
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)

    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
        g[g_y_min:g_y_max, g_x_min:g_x_max]

    return t

def rescale(a, img_size):
    # scale tensor to [-1, 1]
    return 2 * a / img_size[0] - 1
My current code uses a for loop to compute the gaussian heatmap for each of the 68 keypoint coordinates, then stacks the resulting tensors to create a [68, H, W] tensor:
x_k1 = [generate_gaussian(torch.zeros(H, W), x, y) for x, y in rescale(kp1.numpy(), frame.shape)]
x_k1 = torch.stack(x_k1, dim=0)
However, this method is super slow. Is there some way that I can do this without a for loop?
Edit:
I tried @Cris Luengo's proposal to compute the Gaussian from a 1D Gaussian:
def generate_gaussian1D(t, x, y, sigma=10):
    h, w = t.shape

    # Heatmap pixel per output pixel
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)

    tmp_size = sigma * 3

    # Top-left
    x1, y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)

    # Bottom right
    x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t

    size = 2 * tmp_size + 1
    tx = np.arange(0, size, 1, np.float32)
    ty = tx[:, np.newaxis]
    x0 = y0 = size // 2

    g = torch.tensor(np.exp(-np.power(tx - mu_x, 2.) / (2 * np.power(sigma, 2.))))
    g = g * g[:, None]

    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)

    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
        g[g_y_min:g_y_max, g_x_min:g_x_max]

    return t
but my output ends up being an incomplete gaussian.
I'm not sure what I'm doing wrong. Any help would be appreciated.
You generate an NxN array g with a Gaussian centered on its center pixel. N is computed such that it extends by 3*sigma from that center pixel. This is the fastest way to build such an array:
tmp_size = sigma * 3
tx = np.arange(1, tmp_size + 1, 1, np.float32)
g = np.exp(-(tx**2) / (2 * sigma**2))
g = np.concatenate((np.flip(g), [1], g))
g = g * g[:, None]
What we're doing here is computing half of a 1D Gaussian. We don't even bother computing the value of the Gaussian for the middle pixel, which we know will be 1. We then build the full 1D Gaussian by flipping our half-Gaussian and concatenating. Finally, the 2D Gaussian is built by the outer product of the 1D Gaussian with itself.
We could shave off a bit of extra time by building a quarter of the 2D Gaussian and then concatenating four rotated copies of it. But the difference in computational cost is not very large, and this is much simpler. Note that np.exp is by far the most expensive operation here, so just by minimizing how often we call it we significantly reduce the computational cost.
However, the best way to speed up the complete code is to compute the array g only once, rather than anew for each key point. Note how your sigma doesn't change, so all the arrays g that are computed are identical. If you compute it only once, it no longer matters which method you use to compute it, since this will be a minimal portion of the total program anyway.
You could, for example, have a global variable _gaussian to hold your array, and have your function compute it only the first time it is called. Or you could separate your function into two functions, one that constructs this array, and one that copies it into an image, and call them as follows:
g = create_gaussian(sigma=3)
x_k1 = [
    copy_gaussian(torch.zeros(H, W), x, y, g)
    for x, y in rescale(kp1.numpy(), frame.shape)
]
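For completeness, here is a minimal sketch of what those two functions could look like, following the splitting idea described above; the names create_gaussian and copy_gaussian and their exact signatures are only illustrative, not an existing API.
import numpy as np
import torch

def create_gaussian(sigma=10):
    # Build the (2*3*sigma + 1) x (2*3*sigma + 1) Gaussian patch once.
    tmp_size = sigma * 3
    tx = np.arange(1, tmp_size + 1, 1, np.float32)
    g = np.exp(-(tx**2) / (2 * sigma**2))
    g = np.concatenate((np.flip(g), [1], g))
    g = g * g[:, None]
    return torch.tensor(g, dtype=torch.float32)

def copy_gaussian(t, x, y, g):
    # Paste the precomputed patch g into t, centred at the rescaled key point.
    tmp_size = g.shape[0] // 2      # g is (2*tmp_size + 1) square
    h, w = t.shape
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)
    x1, y1 = mu_x - tmp_size, mu_y - tmp_size
    x2, y2 = mu_x + tmp_size + 1, mu_y + tmp_size + 1
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t
    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)
    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
        g[g_y_min:g_y_max, g_x_min:g_x_max]
    return t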
On the other hand, you're likely best off using existing functionality. For example, DIPlib has a function dip.DrawBandlimitedPoint() [disclosure: I'm an author] that adds a Gaussian blob to an image. Likely you'll find similar functions in other libraries.
The goal is to extract a random 2x5 patch from a 5x10 image, and to do so randomly for all images in a batch. I'm looking to write a faster implementation that avoids for loops, but I haven't been able to figure out how to use the torch .gather operation with two index arrays (idx_h and idx_w in the code example).
Naive for loop:
import torch
b = 3 # batch size
h = 5 # height
w = 10 # width
crop_border = (3, 5) # number of pixels (height, width) to crop
x = torch.arange(b * h * w).reshape(b, h, w)
print(x)
dh_ = torch.randint(0, crop_border[0], size=(b,))
dw_ = torch.randint(0, crop_border[1], size=(b,))
_dh = h - (crop_border[0] - dh_)
_dw = w - (crop_border[1] - dw_)
idx_h = torch.stack([torch.arange(d_, _d) for d_, _d in zip(dh_, _dh)])
idx_w = torch.stack([torch.arange(d_, _d) for d_, _d in zip(dw_, _dw)])
print(idx_h, idx_w)
new_shape = (b, idx_h.shape[1], idx_w.shape[1])
cropped_x = torch.empty(new_shape)
for batch in range(b):
    for height in range(idx_h.shape[1]):
        for width in range(idx_w.shape[1]):
            cropped_x[batch, height, width] = x[
                batch, idx_h[batch, height], idx_w[batch, width]
            ]
print(cropped_x)
The index arrays needed to be repeated and reshaped to work with the gather operation. The fast_crop code is based on this PyTorch discussion: https://discuss.pytorch.org/t/similar-to-torch-gather-over-two-dimensions/118827
def fast_crop(x, idx1, idx2):
    """
    Compute
    x: N x B x V
    idx1: N x K matrix where idx1[i, j] is between [0, B)
    idx2: N x K matrix where idx2[i, j] is between [0, V)
    Return:
    cropped: N x K matrix where y[i, j] = x[i, idx1[i,j], idx2[i,j]]
    """
    x = x.contiguous()
    assert idx1.shape == idx2.shape
    lin_idx = idx2 + x.size(-1) * idx1
    x = x.view(-1, x.size(1) * x.size(2))
    lin_idx = lin_idx.view(-1, lin_idx.shape[1] * lin_idx.shape[2])
    cropped = x.gather(-1, lin_idx)
    return cropped.reshape(idx1.shape)
idx1 = torch.repeat_interleave(idx_h, idx_w.shape[1]).reshape(new_shape)
idx2 = torch.repeat_interleave(idx_w, idx_h.shape[1], dim=0).reshape(new_shape)
cropped = fast_crop(x, idx1, idx2)
(cropped == cropped_x).all()
Using realistic numbers of b = 100, h = 100, w = 130 and crop_border = (40, 95), a 10-trial run takes the for loop 32 s, while fast_crop takes only 0.043 s.
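As an aside, the same result can also be obtained without gather at all, using broadcasting with advanced indexing. This is a minimal sketch reusing x, idx_h, idx_w and cropped_x from the code above:
# Same crop via broadcasting + advanced indexing.
batch_idx = torch.arange(b)[:, None, None]                       # shape (b, 1, 1)
cropped_ix = x[batch_idx, idx_h[:, :, None], idx_w[:, None, :]]  # shape (b, crop_h, crop_w)
print((cropped_ix == cropped_x).all())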
I watched some tutorials and tried to create a Perlin noise generator in python.
It takes in a tuple for the number of vectors in the x and y directions and a scale for the distance in pixels between the vectors, then calculates the dot product between each pixel and each of the 4 vectors surrounding it. It then interpolates them bilinearly to get the pixel's value.
Here's the code:
from PIL import Image
import numpy as np
scale = 16
size = np.array([8, 8])
vectors = []
for i in range(size[0]):
    for j in range(size[1]):
        rand = np.random.rand() * 2 * np.pi
        vectors.append(np.array([np.cos(rand), np.sin(rand)]))

interpolated_map = np.zeros(size * scale)

def interpolate(x1, x2, w):
    t = (w % scale) / scale
    return (x2 - x1) * t + x1

def dot_product(a, b):
    return a[0] * b[0] + a[1] * b[1]

for i in range(size[1] * scale):
    for j in range(size[0] * scale):
        dot_products = []
        for m in range(4):
            corner_vector_x = round(i / scale) + (m % 2)
            corner_vector_y = round(j / scale) + int(m / 2)
            x = i - corner_vector_x * scale
            y = j - corner_vector_y * scale
            if corner_vector_x >= size[0]:
                corner_vector_x = 0
            if corner_vector_y >= size[1]:
                corner_vector_y = 0
            corner_vector = vectors[corner_vector_x + corner_vector_y * (size[0])]
            distance_vector = np.array([x, y])
            dot_products.append(dot_product(corner_vector, distance_vector))
        x1 = interpolate(dot_products[0], dot_products[1], i)
        x2 = interpolate(dot_products[2], dot_products[3], i)
        interpolated_map[i][j] = (interpolate(x1, x2, j) / 2 + 1) * 255
img = Image.fromarray(interpolated_map)
img.show()
I'm getting this image:
but I should be getting this:
I don't know what's going wrong, I've tried watching multiple different tutorials, reading a bunch of different articles, but the result is always the same.
I need to vectorise the following for loop, and I am new to broadcasting and vectorisation (and object-oriented programming in general is new to me).
width = 1000
height = 400
for v in range(height):
    for u in range(width):
        start[v, u, 0] = -0.5 + u / (width-1)
        start[v, u, 1] = (-0.5 + v / (height-1)) * height / width
        start[v, u, 2] = 0
I tried this:
start[:,:,0] = [-0.5+u/(width-1) for u in numpy.arange(width)]
start[:,:,1] = [(-0.5+v/(height-1))*height for v in numpy.arange(height)]
But I am struggling with the shapes and find it difficult to understand broadcasting.
You could use NumPy's mgrid to vectorise your code:
import numpy as np
width = 1000
height = 400
v, u = np.mgrid[0:height, 0:width]
start = np.zeros(shape=(height, width, 3))
start[:, :, 0] = -.5 + u/(width - 1)
start[:, :, 1] = (-.5 + v/(height - 1)) * height / width
If you wish to make use of broadcasting, simply replace mgrid by ogrid.
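For illustration, the ogrid variant could look like this. It is a minimal sketch of the same computation: ogrid returns a (height, 1) column and a (1, width) row, and NumPy broadcasts them to the full grid only where needed.
import numpy as np

width = 1000
height = 400

v, u = np.ogrid[0:height, 0:width]   # (height, 1) and (1, width)

start = np.zeros(shape=(height, width, 3))
start[:, :, 0] = -.5 + u/(width - 1)                      # broadcasts down the rows
start[:, :, 1] = (-.5 + v/(height - 1)) * height / width  # broadcasts across the columns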
I'd like to initialize a numpy array to represent a two-dimensional vector field on a 100 x 100 grid of points defined by:
import numpy as np
dx = dy = 0.1
nx = ny = 100
x, y = np.meshgrid(np.arange(0,nx*dx,dx), np.arange(0,ny*dy,dy))
The field is a constant-speed circulation about the point cx,cy and I can initialize it OK with regular Python loops:
v = np.empty((nx, ny, 2))
cx, cy = 5, 5
s = 2
for i in range(nx):
    for j in range(ny):
        rx, ry = i*dx - cx, j*dy - cy
        r = np.hypot(rx, ry)
        if r == 0:
            v[i,j] = 0, 0
            continue
        # (-ry/r, rx/r): the unit vector tangent to the circle centred at (cx,cy), radius r
        v[i,j] = (s * -ry/r, s * rx/r)
But I'm having trouble vectorizing this with NumPy. The closest I've got is
v = np.array([s * -(y-cy) / np.hypot(x-cx, y-cy), s * (x-cx) / np.hypot(x-cx, y-cy)])
v = np.rollaxis(v, 1, 0)
v = np.rollaxis(v, 2, 1)
v[np.isinf(v)] = 0
But this isn't equivalent and doesn't give the right answer. What is the correct way to initialize a vector field using numpy?
EDIT: OK, now I'm confused. Following the suggestion below, I try:
vx = s * -(y-cy) / np.hypot(x-cx, y-cy)
vy = s * (x-cx) / np.hypot(x-cx, y-cy)
v = np.dstack((vx, vy))
v[np.isnan(v)] = 0
but get a completely different array...
From your initial setup:
import numpy as np
dx = dy = 0.1
nx = ny = 100
x, y = np.meshgrid(np.arange(0, nx * dx, dx),
                   np.arange(0, ny * dy, dy))
cx = cy = 5
s = 2
You could compute v like this:
rx, ry = y - cx, x - cy
r = np.hypot(rx, ry)
v2 = s * np.dstack((-ry, rx)) / r[..., None]
v2[np.isnan(v2)] = 0
If you're feeling really fancy, you could create yx as a 3D array, and broadcast all of the operations over it:
# we make these [2,] arrays to broadcast over the last output dimension
c = np.array([5, 5])
s = np.array([-2, 2])
# this creates a [100, 100, 2] mesh, where the last dimension corresponds
# to (y, x)
yx = np.mgrid[0:nx * dx:dx, 0:ny * dy:dy].T
yxdiff = yx - c[None, None, :]
r = np.hypot(yxdiff[..., 0], yxdiff[..., 1])[..., None]
v3 = s[None, None, :] * yxdiff / r
v3[np.isnan(v3)] = 0
Check that these both give the same answer as your original code:
print(np.all(v == v2), np.all(v == v3))
# True, True
Edit
Why rx, ry = y - cx, x - cy rather than rx, ry = x - cx, y - cy? I agree it's very counterintuitive - the only reason I decided to do it that way was to match the output of your original code.
The issue is that in your grids, consecutive x values are actually found in consecutive columns of x, and consecutive y values are found in consecutive rows of y, i.e. x[:, j] is the j th x-value and y[i, :] is the i th y-value. However, in your inner loop, you are multiplying dx by i, which is your row index, and dy by j, which is your column index. You're therefore flipping the x and y dimensions of your output.
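If you actually want index i to run along x and index j along y, as in the original loop, one way to avoid the flip entirely is to build the grids with matrix ('ij') indexing. This is a minimal sketch, not part of the original answer, assuming the scalar s = 2 from the question and the dx, dy, nx, ny, cx, cy defined above:
# With indexing='ij', x varies along axis 0 and y along axis 1, matching the
# original loop's v[i, j] layout, so no axis swapping is needed.
x, y = np.meshgrid(np.arange(0, nx * dx, dx),
                   np.arange(0, ny * dy, dy), indexing='ij')
rx, ry = x - cx, y - cy
r = np.hypot(rx, ry)
v4 = s * np.dstack((-ry, rx)) / r[..., None]
v4[np.isnan(v4)] = 0   # the centre point, where r == 0, becomes (0, 0) as in the loop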