How can I vectorize this PyTorch snippet? - python

My PyTorch code is running too slowly because it isn't vectorized, and since I am relatively new to PyTorch I am unsure how to go about vectorizing it. Can someone help me do this or point me in the right direction?
level_stride = 8
loc = torch.zeros(H * W, 2)
for i in range(H):
    for j in range(W):
        loc[i * W + j][0] = level_stride * (j + 0.5)
        loc[i * W + j][1] = level_stride * (i + 0.5)

First of all, you defined the tensor to be of size (H*W, 2). This is entirely optional, but it can be more expressive to keep the dimensionality explicit by making H and W separate dimensions of the tensor. That also makes some later operations easier.
The values you fill the tensor with come from ranges. torch.arange gives you those same ranges already in tensor form, ready to be written into your loc tensor. Once you have them, you can drop the for loops entirely and treat j and i as tensors instead of loop variables.
If you're not familiar with tensors this might seem confusing, but operations between scalars and tensors work just as well, so very little of the rest of the code has to change.
Here is how your code could look with these changes applied:
level_stride = 8
loc = torch.zeros(H, W, 2)
j = torch.arange(W).expand((H, W))
loc[:, :, 0] = level_stride * (j + 0.5)
i = torch.arange(H).expand((W, H)).T
loc[:, :, 1] = level_stride * (i + 0.5)
The most notable changes are the assignments to j and i, and the usage of slicing to fill the data into loc.
For completeness, let's go over the expressions that are assigned to i and j.
j starts as torch.arange(W), which is just like a regular range, only in tensor form. Then .expand is applied, which you can think of as repeating the tensor. For example, if H were 5 and W were 2, a range of 2 would be created and expanded to a size of (5, 2). The size of this tensor thereby matches the first two dimensions of the loc tensor.
i starts the same way, only with W and H swapping positions, because i originates from a range based on H rather than W. Note that .T is applied at the end of that expression: the i tensor still has to match the first two dimensions of loc, so it is transposed.
If you have a subject-specific reason to keep the loc tensor in the (H*W, 2) shape, but are otherwise happy with this solution, you can reshape the tensor at the end with loc.reshape(H * W, 2).
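For reference, here is an equivalent formulation using torch.meshgrid instead of expand. It is only a sketch: the sizes are made up for illustration, and the indexing="ij" argument assumes a reasonably recent PyTorch version (1.10 or later).

import torch

H, W, level_stride = 4, 6, 8  # example sizes, chosen arbitrarily

# meshgrid with indexing="ij" yields row indices (ii) and column indices (jj),
# each of shape (H, W)
ii, jj = torch.meshgrid(
    torch.arange(H, dtype=torch.float32),
    torch.arange(W, dtype=torch.float32),
    indexing="ij",
)

# stack the x (column-based) and y (row-based) coordinates along the last dim
loc = torch.stack((level_stride * (jj + 0.5), level_stride * (ii + 0.5)), dim=-1)
# loc has shape (H, W, 2); use loc.reshape(H * W, 2) for the flat layout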

Related

Python: [PyTorch] Selectively add along axes without using a loop

Let's say I have a tensor x with dimensions [batch, channels, H, W].
Then I have another tensor b that holds bias values for each channel, which has dims [channels,].
I want y = x + b (per sample).
Is there a nice way to broadcast this over H and W for each channel, for each sample in the batch, without using a loop?
If I'm convolving, I know I can use the bias field of the function to achieve this, but I'm just wondering if it can be achieved with primitive ops alone (no explicit looping).
Link to PyTorch forum question
y = x + b[None, :, None, None] (indexing with None inserts singleton axes, expanding b to match x's axis template so it broadcasts over batch, H and W)
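A minimal, self-contained sketch of that broadcast (the shapes are made up for illustration):

import torch

x = torch.randn(8, 3, 32, 32)   # [batch, channels, H, W]
b = torch.randn(3)              # [channels]

# indexing with None gives b the shape (1, 3, 1, 1),
# which broadcasts over batch, H and W
y = x + b[None, :, None, None]

# equivalent formulation with view
y2 = x + b.view(1, -1, 1, 1)
assert torch.allclose(y, y2)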

Total Variation Regularization for Tensors in Python

[Image: total variation formula]
Hi, I am trying to implement a total variation function for tensors, or more precisely, for multi-channel images. For the total variation above (in the picture), I found source code like this:
def compute_total_variation_loss(img, weight):
    tv_h = ((img[:, :, 1:, :] - img[:, :, :-1, :]).pow(2)).sum()
    tv_w = ((img[:, :, :, 1:] - img[:, :, :, :-1]).pow(2)).sum()
    return weight * (tv_h + tv_w)
Since I am a beginner in Python, I don't understand how the indices refer to i and j in the image. I also want to add total variation over c (besides i and j), but I don't know which index refers to c.
Or, to be more concise, how do I write the following equation in Python?
[Image: equation]
This function assumes batched images. So img is a 4 dimensional tensor of dimensions (B, C, H, W) (B is the number of images in the batch, C the number of color channels, H the height and W the width).
So, img[0, 1, 2, 3] is the pixel (2, 3) of the second color (green in RGB) in the first image.
In Python (and NumPy and PyTorch), a slice of elements can be selected with the notation i:j, meaning that the elements i, i + 1, i + 2, ..., j - 1 are selected. In your example, : means all elements, 1: means all elements but the first, and :-1 means all elements but the last (negative indices count backward from the end). Please refer to tutorials on "slicing in NumPy".
So img[:,:,1:,:] - img[:,:,:-1,:] is equivalent to the (batch of) images minus themselves shifted by one pixel vertically, or, in your notation, X(i + 1, j, k) - X(i, j, k). Then the tensor is squared (.pow(2)) and summed (.sum()). Note that the sum also runs over the batch in this case, so you get the total variation of the whole batch, not of each image.
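To also penalize variation along the channel dimension c (the second axis in the (B, C, H, W) layout), you can add one more difference term following the same pattern; a sketch, with a hypothetical function name:

import torch

def total_variation_loss_with_channels(img, weight):
    # img has shape (B, C, H, W)
    tv_h = ((img[:, :, 1:, :] - img[:, :, :-1, :]).pow(2)).sum()  # differences along H (index i)
    tv_w = ((img[:, :, :, 1:] - img[:, :, :, :-1]).pow(2)).sum()  # differences along W (index j)
    tv_c = ((img[:, 1:, :, :] - img[:, :-1, :, :]).pow(2)).sum()  # differences along C (index c)
    return weight * (tv_h + tv_w + tv_c)

loss = total_variation_loss_with_channels(torch.rand(2, 3, 8, 8), weight=1.0)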

What are b, y, x and c which get flattened and returned along with the max-pooled features in tf.nn.max_pool_with_argmax?

I went through the documentation of tf.nn.max_pool_with_argmax where it is written
Performs max pooling on the input and outputs both max values and indices.
The indices in argmax are flattened, so that a maximum value at position [b, y, x, c] becomes flattened index ((b * height + y) * width + x) * channels + c.
The indices returned are always in [0, height) x [0, width) before flattening, even if padding is involved and the mathematically correct answer is outside (either negative or too large). This is a bug, but fixing it is difficult to do in a safe backwards compatible way, especially due to flattening.
The variables b, y, x and c haven't been explicitly defined, so I am having trouble using this method. Can someone please explain what they refer to?
I am unable to comment due to reputation.
But the variables refer to the position of each maximum in the input's NHWC layout: b is the batch index, y and x are the row and column of the maximum within the input, and c is the channel index. The flattened index ((b * height + y) * width + x) * channels + c is just that 4D position converted into a single offset into the flattened input.
If you are having a problem implementing max pooling with argmax, it likely has little to do with these variables; you might want to specify the issue you are having with max pooling itself.
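A sketch of how the flattened indices can be decoded back into (b, y, x, c) by inverting the formula quoted from the documentation. It assumes TensorFlow 2.x, the default NHWC layout, and that the include_batch_in_index flag is available so the flattening matches the quoted formula.

import tensorflow as tf

inp = tf.random.uniform((2, 4, 4, 3))  # (batch, height, width, channels)

# include_batch_in_index=True makes the flat index follow
# ((b * height + y) * width + x) * channels + c
pooled, argmax = tf.nn.max_pool_with_argmax(
    inp, ksize=2, strides=2, padding="VALID", include_batch_in_index=True
)

# invert the flattening to recover the 4D positions of the maxima
_, height, width, channels = inp.shape
c = argmax % channels
x = (argmax // channels) % width
y = (argmax // (channels * width)) % height
b = argmax // (channels * width * height)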

Dot product of patches in tensorflow

I have two square matrices of the same size and the dimensions of a square patch. I'd like to compute the dot product between every pair of patches. Essentially I would like to implement the following operation:
import numpy as np

def patch_dot(A, B, patch_dim):
    res_dim = A.shape[0] - patch_dim + 1
    res = np.zeros([res_dim, res_dim, res_dim, res_dim])
    for i in range(res_dim):
        for j in range(res_dim):
            for k in range(res_dim):
                for l in range(res_dim):
                    res[i, j, k, l] = (A[i:i + patch_dim, j:j + patch_dim] *
                                       B[k:k + patch_dim, l:l + patch_dim]).sum()
    return res
Obviously this would be an extremely inefficient implementation. Tensorflow's tf.nn.conv2d seems like a natural solution to this as I'm essentially doing a convolution, however my filter matrix isn't fixed. Is there a natural solution to this in Tensorflow, or should I start looking at implementing my own tf-op?
The natural way to do this is to first extract the overlapping image patches of matrix B using tf.extract_image_patches, and then apply tf.nn.conv2d to A and each B sub-patch using tf.map_fn.
Note that prior to using tf.extract_image_patches and tf.nn.conv2d you need to reshape your matrices as 4D tensors of shape [1, width, height, 1] using tf.reshape.
Also, prior to using tf.map_fn, you need to use the tf.transpose op so that the B sub-patches are indexed by the first dimension of the tensor you pass as the elems argument of tf.map_fn.
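A sketch of that idea using the TF 2.x name tf.image.extract_patches (the older tf.extract_image_patches is deprecated). Instead of tf.map_fn it feeds all of B's patches to a single tf.nn.conv2d call as separate filters, which is a small simplification of the approach described above:

import numpy as np
import tensorflow as tf

def patch_dot_tf(A, B, patch_dim):
    n = A.shape[0]
    res_dim = n - patch_dim + 1

    # reshape both matrices to 4D tensors of shape [1, n, n, 1]
    A4 = tf.reshape(tf.constant(A, tf.float32), [1, n, n, 1])
    B4 = tf.reshape(tf.constant(B, tf.float32), [1, n, n, 1])

    # all overlapping patches of B: shape [1, res_dim, res_dim, patch_dim * patch_dim]
    patches = tf.image.extract_patches(
        B4,
        sizes=[1, patch_dim, patch_dim, 1],
        strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )

    # one conv2d filter per B patch: shape [patch_dim, patch_dim, 1, res_dim * res_dim]
    filters = tf.transpose(
        tf.reshape(patches, [res_dim * res_dim, patch_dim, patch_dim, 1]),
        [1, 2, 3, 0],
    )

    # conv2d performs cross-correlation, i.e. exactly the patch-wise dot products
    out = tf.nn.conv2d(A4, filters, strides=[1, 1, 1, 1], padding="VALID")
    return tf.reshape(out, [res_dim, res_dim, res_dim, res_dim]).numpy()

A = np.random.rand(6, 6)
B = np.random.rand(6, 6)
res = patch_dot_tf(A, B, 3)
# res can be checked against the loop-based patch_dot from the question with np.allclose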

numpy row pair sum of squared row wise differences without for loops (only api calls)

For those who can read Latex, this is what I am trying to compute:
$$k_{xyi} = \sum_{j}\left ( \left ( x_{i}-x_{j} \right )^{2}+\left ( y_{i}-y_{j} \right )^{2} \right )$$
where x and y are rows of a matrix A.
For computer language only folk this would translate as:
k(x,y,i) = sum_j( (xi - xj)^2 + (yi - yj)^2 )
where x and y are rows of a matrix A.
So k is a 3d matrix.
Can this be done with API calls only? (no for loops)
Here is testing startup:
import numpy as np

A = np.random.rand(4, 4)
k = np.empty((4, 4, 4))
for ix in range(4):
    for iy in range(4):
        x = A[ix, :]
        y = A[iy, :]
        sx = np.power(x - x[:, np.newaxis], 2)
        sy = np.power(y - y[:, np.newaxis], 2)
        k[ix, iy] = (sx + sy).sum(axis=1).T
And now for the master coders, please replace the two for loops with numpy API calls.
Update:
Forgot to mention that I need a method that saves RAM; my A matrices are usually 20-30 thousand on a side. So it would be great if your answer does not create huge temporary multidimensional arrays.
I would change your LaTeX to make x and y explicit row indices of A, which is much less confusing imo:
$$k_{xyi} = \sum_{j}\left( \left( A_{xi}-A_{xj} \right)^{2}+\left( A_{yi}-A_{yj} \right)^{2} \right)$$
From this I assume the last line in your expression should really be:
k[ix,iy] = (sx + sy).sum(axis=-1)
If so, you can compute the above expression as follows:
Axij = (A[:, None, :] - A[..., None])**2
k = np.sum(Axij[:, None, :, :] + Axij, axis=-1)
The above first expands out a memory intensive 4D array. You can skip this if you are worried about memory by introducing a new for loop:
k = np.empty((4, 4, 4))
Axij = (A[:, None, :] - A[..., None])**2
for xi in range(A.shape[0]):
    k[xi] = np.sum(Axij[xi, None, :, :] + Axij, axis=-1)
This will be slower, but not by as much as you would think since you still do a lot of the operations in numpy. You could probably skip the 3D Axij intermediate, but again you are going to take a performance penalty doing so.
If your matrices really are 20k on a side, your 3D output alone will be 64 TB (20000³ float64 values). You are not going to do this in numpy, or even in memory, unless you have a large-scale distributed-memory system.
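A quick check (on a small array, sizes chosen arbitrarily) that the vectorized expression matches the original double loop:

import numpy as np

n = 5
A = np.random.rand(n, n)

# loop version from the question, with the sum taken over the last axis
k_loop = np.empty((n, n, n))
for ix in range(n):
    for iy in range(n):
        sx = (A[ix, :] - A[ix, :][:, np.newaxis]) ** 2
        sy = (A[iy, :] - A[iy, :][:, np.newaxis]) ** 2
        k_loop[ix, iy] = (sx + sy).sum(axis=-1)

# vectorized version from the answer
Axij = (A[:, None, :] - A[..., None]) ** 2
k_vec = np.sum(Axij[:, None, :, :] + Axij, axis=-1)

assert np.allclose(k_loop, k_vec)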
