Let’s say I have a tensor x with dimensions [batch, channels, H, W],
and another tensor b that holds bias values for each channel, with dims [channels,].
I want y = x + b (per sample).
Is there a nice way to broadcast this over H and W for each channel, for each sample in the batch, without using a loop?
If I’m convolving, I know I can use the bias argument of the function to achieve this, but I’m wondering whether it can be achieved with primitive ops alone (no explicit looping).
y = x + b[None, :, None, None] (the None indices insert singleton axes, expanding b to match x's axis template so it broadcasts over batch, H, and W)
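For concreteness, a small self-contained sketch (the shapes below are made up) showing a few equivalent ways to line b up with the channel axis:

import torch

x = torch.randn(4, 3, 8, 8)   # [batch, channels, H, W]
b = torch.randn(3)            # [channels]

y1 = x + b[None, :, None, None]                      # index with None to add singleton axes
y2 = x + b.view(1, -1, 1, 1)                         # reshape to a broadcastable shape
y3 = x + b.unsqueeze(0).unsqueeze(-1).unsqueeze(-1)  # same idea, one axis at a time

assert torch.allclose(y1, y2) and torch.allclose(y1, y3)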
Related
My PyTorch code is running too slowly because it is not vectorized, and I am unsure how to go about vectorizing it, as I am relatively new to PyTorch. Can someone help me do this or point me in the right direction?
level_stride = 8
loc = torch.zeros(H * W, 2)
for i in range(H):
    for j in range(W):
        loc[i * W + j][0] = level_stride * (j + 0.5)
        loc[i * W + j][1] = level_stride * (i + 0.5)
First of all, you defined the tensor to be of size (H*W, 2). This is of course entirely optional, but it can be more expressive to preserve the dimensionality explicitly by keeping H and W as separate dimensions of the tensor. That also makes some operations later on easier.
The values you fill the tensor with come from ranges. We can use torch.arange to get those same ranges directly as tensors, ready to be written into your loc tensor. Once that is done, you can leave out the for loops entirely and just treat j and i as tensors.
If you're not familiar with tensors, this might seem confusing, but operations between scalars and tensors work just as well, so little of the rest of the code has to change.
Here is how your code could look with these changes applied:
level_stride = 8
loc = torch.zeros(H, W, 2)
j = torch.arange(W).expand((H, W))
loc[:, :, 0] = level_stride * (j + 0.5)
i = torch.arange(H).expand((W, H)).T
loc[:, :, 1] = level_stride * (i + 0.5)
The most notable changes are the assignments to j and i, and the usage of slicing to fill the data into loc.
For completeness, let's go over the expressions that are assigned to i and j.
j starts as torch.arange(W), which is just like a regular range, only in the form of a tensor. Then .expand is applied, which you can think of as repeating the tensor. For example, if H had been 5 and W 2, then a range of 2 would have been created and expanded to a size of (5, 2). The size of this tensor thereby matches the first two dimensions of the loc tensor.
i starts just the same, only with W and H swapping positions, since i originates from a range based on H rather than W. Note that .T is applied at the end of that expression: the i tensor still has to match the first two dimensions of loc, and the transpose achieves that.
If you have a subject-specific reason to keep the loc tensor in the (H*W, 2) shape, but are otherwise happy with this solution, you can reshape the tensor at the end with loc.reshape(H*W, 2).
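To tie this together, here is a quick sanity check (the sizes are arbitrary) that the vectorized version, flattened back with reshape, matches the original double loop:

import torch

H, W, level_stride = 4, 6, 8

loc = torch.zeros(H, W, 2)
j = torch.arange(W).expand((H, W))
loc[:, :, 0] = level_stride * (j + 0.5)
i = torch.arange(H).expand((W, H)).T
loc[:, :, 1] = level_stride * (i + 0.5)
loc_flat = loc.reshape(H * W, 2)     # back to the original (H*W, 2) layout

# reference built with the original loops
ref = torch.zeros(H * W, 2)
for a in range(H):
    for b in range(W):
        ref[a * W + b, 0] = level_stride * (b + 0.5)
        ref[a * W + b, 1] = level_stride * (a + 0.5)

assert torch.equal(loc_flat, ref)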
I find it really hard to visualize reshaping 4D/5D arrays in NumPy/PyTorch. (I assume both reshape in a similar pattern; I am using PyTorch currently!)
Suppose I have videos with dimensions [N x C x D x H x W]
(num videos x channels per video x frames per video x height x width).
Suppose I want to reshape the videos into frames as [N*D x C x H x W]; how should I proceed with the reshape?
Simply applying x = x.reshape(N*D, C, H, W) doesn't actually do it; it gives the wrong order of elements.
Can you help me with how to do this, and share a bit of the intuition or pattern you used?
On a side note, if I have one video (i.e. suppose 1x3x100x256x256), I use the following approach:
x = x.squeeze(0).T.reshape((100, 3, 256, 256))[:, :, None, :, :] and it works
great. I couldn't figure it out for more than one video.
Thanks!
As per the request, here is the setup:
input = np.random.randn(N,C,D,H,W)
output = np.zeros((N*D,C,H,W))
and a for-loop-based version to show what I want:
for h in range(N):
    for i in range(D):
        for j in range(C):
            for k in range(H):
                for l in range(W):
                    output[h*D + i, j, k, l] = input[h, j, i, k, l]
Simply swap the second and third axes, and then merge the new second axis (the old third one) into the first one with a reshape:
output = input_array.swapaxes(1,2).reshape(N*D,C,H,W)
We can also use transpose: input_array.transpose(0, 2, 1, 3, 4) to get the same axis-swapping effect.
For a general intuitive method, please refer to Intuition and idea behind reshaping 4D array to 2D array in NumPy.
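Since the question mentions PyTorch, here is a hedged PyTorch translation of the same idea using permute and reshape (the sizes below are made up):

import torch

N, C, D, H, W = 2, 3, 4, 5, 6
video = torch.randn(N, C, D, H, W)

# swap the C and D axes, then merge N and D into one frame axis;
# reshape copies if needed after permute (view would require .contiguous())
frames = video.permute(0, 2, 1, 3, 4).reshape(N * D, C, H, W)

# frame d of video n ends up at row n*D + d, matching the loop above
assert torch.equal(frames[1 * D + 2], video[1, :, 2])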
I have a tensor with shape B x H x W x C, and I'd like to apply dilation only along H. Do you know of any way to pull off this small trick with tf.nn.atrous_conv2d?
There is no such feature in tf.nn.atrous_conv2d, but you can use tf.layers.conv2d and set dilation_rate=(2, 1) to achieve the same effect.
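As a rough sketch of what that could look like (assuming TF 1.x; the input shape and number of filters below are made up):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 64, 64, 3])    # B x H x W x C

# for channels_last inputs, dilation_rate is (rate along H, rate along W),
# so (2, 1) dilates only along H while W stays dense
y = tf.layers.conv2d(x, filters=16, kernel_size=3,
                     dilation_rate=(2, 1), padding='same')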
I'm getting this error when passing the input data to the Linear (fully connected) layer in PyTorch:
matrices expected, got 4D, 2D tensors
I fully understand the problem since the input data has a shape (N,C,H,W) (from a Convolutional+MaxPool layer) where:
N: Data Samples
C: Channels of the data
H,W: Height and Width
Nevertheless, I was expecting PyTorch to do the "reshaping" of the data from:
[N, D1, ..., Dn] --> [N, D], where D = D1*D2*...*Dn
I tried to reshape the Variable.data, but I've read that this approach is not recommended, since the gradients will keep the previous shape and, in general, you should not mutate a Variable's .data shape.
I am pretty sure there is a simple solution that goes along with the framework, but I haven't found it.
Is there a good solution for this?
PS: The fully connected layer has input size C * H * W.
After reading some examples I found the solution. Here is how you do it without messing up the forward/backward pass flow:
(_, C, H, W) = x.data.size()
x = x.view( -1 , C * H * W)
A more general solution (which works regardless of how many dimensions x has) is to take the product of all dimension sizes except the first one (the "batch size"):
n_features = np.prod(x.size()[1:])
x = x.view(-1, n_features)
It is common to save the batch size and infer the other dimension in a flatten:
batch_size = x.shape[0]
...
x = x.view(batch_size, -1)
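Putting this together, a minimal sketch of flattening between a conv/pool block and a Linear layer (the layer sizes are made up, and it uses current PyTorch idioms rather than the old Variable API):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.fc = nn.Linear(8 * 16 * 16, 10)   # in_features = C * H * W after pooling

    def forward(self, x):                      # x: [N, 3, 32, 32]
        x = self.pool(self.conv(x))            # -> [N, 8, 16, 16]
        batch_size = x.shape[0]
        x = x.view(batch_size, -1)             # -> [N, 8*16*16]
        return self.fc(x)

out = Net()(torch.randn(4, 3, 32, 32))         # -> [4, 10]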
I have two square matrices of the same size and the dimensions of a square patch. I'd like to compute the dot product between every pair of patches. Essentially I would like to implement the following operation:
def patch_dot(A, B, patch_dim):
    res_dim = A.shape[0] - patch_dim + 1
    res = np.zeros([res_dim, res_dim, res_dim, res_dim])
    for i in xrange(res_dim):
        for j in xrange(res_dim):
            for k in xrange(res_dim):
                for l in xrange(res_dim):
                    res[i, j, k, l] = (A[i:i + patch_dim, j:j + patch_dim] *
                                       B[k:k + patch_dim, l:l + patch_dim]).sum()
    return res
Obviously this would be an extremely inefficient implementation. TensorFlow's tf.nn.conv2d seems like a natural solution, since I'm essentially doing a convolution; however, my filter matrix isn't fixed. Is there a natural way to do this in TensorFlow, or should I start looking at implementing my own TF op?
The natural way to do this is to first extract the overlapping image patches of matrix B using tf.extract_image_patches, and then apply tf.nn.conv2d to A and each B sub-patch using tf.map_fn.
Note that prior to using tf.extract_image_patches and tf.nn.conv2d, you need to reshape your matrices into 4D tensors of shape [1, height, width, 1] using tf.reshape.
Also, prior to using tf.map_fn, you need to rearrange the extracted patches (with tf.transpose or tf.reshape) so that the B sub-patches are indexed by the first dimension of the tensor you pass as the elems argument of tf.map_fn.
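For reference, a hedged sketch of that recipe (assuming the TF 1.x API; patch_dot_tf is just an illustrative name, a single reshape is used here to put the patch index first, and a final transpose reorders the result to match the loop version above):

import tensorflow as tf

def patch_dot_tf(A, B, patch_dim):
    # A, B: square 2D tensors of shape (n, n) with known static shapes
    n = int(A.shape[0])
    p = patch_dim
    rd = n - p + 1                                   # res_dim, as in the loop version

    # reshape both matrices into 4D tensors [1, height, width, 1]
    A4 = tf.reshape(A, [1, n, n, 1])
    B4 = tf.reshape(B, [1, n, n, 1])

    # all overlapping p x p patches of B: shape [1, rd, rd, p*p]
    patches = tf.extract_image_patches(B4, ksizes=[1, p, p, 1],
                                       strides=[1, 1, 1, 1],
                                       rates=[1, 1, 1, 1], padding='VALID')

    # put the patch index first so map_fn iterates over B sub-patches,
    # each shaped like a conv filter [p, p, in_channels=1, out_channels=1]
    filters = tf.reshape(patches, [rd * rd, p, p, 1, 1])

    # correlate A with every B patch; each call yields [1, rd, rd, 1]
    conv = tf.map_fn(lambda f: tf.nn.conv2d(A4, f, strides=[1, 1, 1, 1],
                                            padding='VALID'), filters)

    # conv is [rd*rd, 1, rd, rd, 1], indexed as (k*rd + l, i, j);
    # reorder so the result is indexed res[i, j, k, l] like the loop version
    res = tf.reshape(conv, [rd, rd, rd, rd])         # (k, l, i, j)
    return tf.transpose(res, [2, 3, 0, 1])           # (i, j, k, l)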