I find it really hard to visualize reshaping 4D/5D arrays in NumPy/PyTorch (I assume both reshape in a similar pattern; I am using PyTorch currently).
Suppose I have videos with dimensions [N x C x D x H x W]
(num videos x channels x frames x height x width).
I want to reshape the videos into individual frames, i.e. [N*D x C x H x W]. How should I go about the reshape?
Simply applying x = x.reshape(N*D, C, H, W) doesn't actually do it; it gives the wrong order of elements.
Can you help me with how to do this, and share some intuition for the pattern you used?
On a side note, for a single video (e.g. 1x3x100x256x256) I use the following approach:
x = x.squeeze(0).T.reshape((100,3,256,256))[:,:,None,:,:]
and it works great, but I couldn't figure it out for more than one video.
Thanks!
As requested, here is a for-loop based version showing what I want:

input = np.random.randn(N, C, D, H, W)
output = np.zeros((N*D, C, H, W))

for h in range(N):
    for i in range(D):
        for j in range(C):
            for k in range(H):
                for l in range(W):
                    output[h*D + i, j, k, l] = input[h, j, i, k, l]
Simply swap the second and third axes, and then merge the new second axis (the old third one) into the first one with a reshape -
output = input_array.swapaxes(1,2).reshape(N*D,C,H,W)
We can also use transpose: input_array.transpose(0,2,1,3,4) to get the same axis-swapping effect.
For a general intuitive method, please refer to Intuition and idea behind reshaping 4D array to 2D array in NumPy.
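Since the question mentions PyTorch, here is a minimal sketch of the same idea there (permute plays the role of swapaxes/transpose; reshape will copy if the permuted view is not contiguous):

import torch

N, C, D, H, W = 2, 3, 4, 5, 6
x = torch.randn(N, C, D, H, W)
# bring the frame axis next to the video axis, then merge the two into one
frames = x.permute(0, 2, 1, 3, 4).reshape(N * D, C, H, W)  # shape (N*D, C, H, W)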
Let's say I have a tensor x with dimensions [batch, channels, H, W],
and another tensor b that holds bias values for each channel, with dims [channels,].
I want y = x + b (per sample).
Is there a nice way to broadcast this over H and W, for each channel and each sample in the batch, without using a loop?
If I'm convolving, I know I can use the bias field of the conv function to achieve this, but I'm wondering whether it can be done with primitive ops alone (no explicit looping).
Link to PyTorch forum question
y = x + b[None, :, None, None] (basically, insert singleton axes so that b broadcasts against x's shape)
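A minimal, self-contained sketch of that broadcast (shapes picked arbitrarily); b.view(1, -1, 1, 1) would be an equivalent way to add the singleton axes:

import torch

N, C, H, W = 4, 3, 8, 8
x = torch.randn(N, C, H, W)
b = torch.randn(C)

y = x + b[None, :, None, None]   # b is broadcast to (1, C, 1, 1), then over N, H, W
assert torch.allclose(y[2, 1], x[2, 1] + b[1])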
My pytorch code is running too slow due to it not being vectorized and I am unsure how to go about vectorizing it as I am relatively new to PyTorch. Can someone help me do this or point me in the right direction?
level_stride = 8
loc = torch.zeros(H * W, 2)
for i in range(H):
    for j in range(W):
        loc[i * W + j][0] = level_stride * (j + 0.5)
        loc[i * W + j][1] = level_stride * (i + 0.5)
First of all, you defined the tensor to be of size (H*W, 2). This is entirely optional, but it might be more expressive to preserve the dimensionality explicitly by having H and W be separate dimensions of the tensor. That also makes some later operations easier.
The values you fill the tensor with originate from ranges. torch.arange gives you those same ranges directly as tensors, ready to be written into loc. Once that is done, you can leave out the for loops entirely and simply treat j and i as tensors.
If you're not familiar with tensors this might seem confusing, but operations between scalar values and tensors work just as well, so little of the rest of the code has to change.
Here is how your code could look with these changes applied:
level_stride = 8
loc = torch.zeros(H, W, 2)
j = torch.arange(W).expand((H, W))
loc[:, :, 0] = level_stride * (j + 0.5)
i = torch.arange(H).expand((W, H)).T
loc[:, :, 1] = level_stride * (i + 0.5)
The most notable changes are the assignments to j and i, and the usage of slicing to fill the data into loc.
For completeness, let's go over the expressions that are assigned to i and j.
j starts as a torch.arange(W) which is just like a regular range, just in form of a tensor. Then .expand is applied, which you could see as the tensor being repeated. For example, if H had been 5 and W 2, then a range of 2 would have been created, and expanded to a size of (5, 2). The size of this tensor thereby matches the first two sizes in the loc tensor.
i starts just the same, only that W and H swap positions, since i originates from a range based on H rather than W. Note that .T is applied at the end of that expression: the i tensor still has to match the first two dimensions of loc, so it is transposed.
If you have a subject-specific reason to keep the loc tensor in the (H*W, 2) shape but are otherwise happy with this solution, you can reshape the tensor at the end with loc.reshape(H*W, 2).
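For what it's worth, a more compact sketch of the same computation (assuming a PyTorch version recent enough to support the indexing argument of torch.meshgrid, i.e. 1.10+) builds both coordinate grids at once and stacks them:

import torch

level_stride = 8
H, W = 4, 6
ii, jj = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
loc = level_stride * (torch.stack((jj, ii), dim=-1).float() + 0.5)   # shape (H, W, 2)
# loc.reshape(H * W, 2) recovers the flat layout of the original code if needed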
I have a batch of images (a 4D tensor/array with dimensions "batchsize x channels x height x width") and I would like to draw horizontal bars of zeros of size s on each image, but across different rows for each image. I can do this trivially with a for loop, but I haven't been able to figure out a vectorized implementation.
Ideally I would generate a 1-D tensor r of "batchsize" random starting points and do something like
t[:,:,r:r+s,:] = 0, but if I try this I get TypeError: only integer scalar arrays can be converted to a scalar index.
If I do a toy example and just try to pull out two different sections of a batch with only two images, doing something like t[:,:,torch.tensor(([1,2],[2,3])),:] I get back a 5D tensor because it is pulling both of those sections from both images in the batch. How do I grab those different sections but only one for each image? In this case if the input were 2xCxHxW I would want 2xCx2xW where the first item corresponds to rows 1 and 2 of the first image, and the second item corresponds to rows 2 and 3 of the second image. Thank you.
You can build a boolean mask and multiply it into the batch: compare a tensor of positional indices against a per-image random start, so that everything inside each image's randomly placed band is zeroed out. Here sgs is the input batch and size is the bar thickness:

bsg = sgs.data                     # underlying data of the input batch, shape (bs, channels, x, y)
device = sgs.device
bs, _, x, y = bsg.shape
max_y = y - size - 1
rs = torch.randint(0, max_y, (bs, 1), device=device)          # random band start per image
m = torch.arange(y, device=device).repeat(bs, x)              # positional indices, shape (bs, x*y)
gpumask = ((m < rs) | (m > (rs + size))).view(bs, 1, x, -1)   # False inside [rs, rs+size] along the last axis
gpumask * bsg                                                  # zeroes the band in every image
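If the bars should run across rows (the height axis) exactly as in the question, a minimal sketch along the same lines (the names N, C, H, W here stand in for the question's batch size, channels, height and width) would be:

import torch

N, C, H, W, s = 4, 3, 16, 16, 3
t = torch.randn(N, C, H, W)

r = torch.randint(0, H - s, (N, 1))      # random starting row per image
rows = torch.arange(H).unsqueeze(0)      # shape (1, H)
band = (rows >= r) & (rows < r + s)      # shape (N, H), True inside each image's bar
t = t * ~band[:, None, :, None]          # broadcast the mask over channels and width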
What I am trying to do is take a numpy array representing 3D image data and calculate the hessian matrix for every voxel. My input is a matrix of shape (Z,X,Y) and I can easily take a slice along z and retrieve a single original image.
gx, gy, gz = np.gradient(imgs)
gxx, gxy, gxz = np.gradient(gx)
gyx, gyy, gyz = np.gradient(gy)
gzx, gzy, gzz = np.gradient(gz)
And I can access the hessian for an individual voxel as follows:
x = 100
y = 100
z = 63
H = [[gxx[z][x][y], gxy[z][x][y], gxz[z][x][y]],
[gyx[z][x][y], gyy[z][x][y], gyz[z][x][y]],
[gzx[z][x][y], gzy[z][x][y], gzz[z][x][y]]]
But this is cumbersome and I can't easily slice the data.
I have tried using reshape as follows
H = H.reshape(Z, X, Y, 3, 3)
But when I test this by retrieving the hessian for a specific voxel, the value returned from the reshaped array is completely different from the one built directly.
I think I could use zip somehow but I have only been able to find that for making lists of tuples.
Bonus: If there's a faster way to accomplish this, please let me know. I essentially need to calculate the three eigenvalues of the hessian matrix for every voxel in the 3D data set. Calculating the hessian values is really fast, but finding the eigenvalues for a single 2D image slice takes about 20 seconds. Are there any GPU- or TensorFlow-accelerated libraries for image processing?
We can use a list comprehension to get the hessians -
H_all = np.array([np.gradient(i) for i in np.gradient(imgs)]).transpose(2,3,4,0,1)
To explain a bit: [np.gradient(i) for i in np.gradient(imgs)] loops through the two levels of outputs from the np.gradient calls, resulting in a (3 x 3) set of arrays on the outer two axes. We need these two as the last two axes of the final output, so we push them to the end with the transpose.
Thus H_all holds all the hessians, and we can extract the hessian for a specific x, y, z like so -
x = 100
y = 100
z = 63
H = H_all[z, x, y]
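For the bonus part, a quick sketch: np.linalg.eigvalsh is batched over the leading axes, so a single call yields the eigenvalues of every voxel's Hessian without looping over slices (it only reads one triangle of each 3x3 matrix, which is fine here since the numerical Hessian is symmetric up to discretization error):

# H_all has shape (Z, X, Y, 3, 3); eigvalsh operates on the trailing (3, 3) matrices
eigvals = np.linalg.eigvalsh(H_all)    # shape (Z, X, Y, 3), ascending order per voxel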
I have two square matrices of the same size and the dimensions of a square patch. I'd like to compute the dot product between every pair of patches. Essentially I would like to implement the following operation:
def patch_dot(A, B, patch_dim):
    res_dim = A.shape[0] - patch_dim + 1
    res = np.zeros([res_dim, res_dim, res_dim, res_dim])
    for i in range(res_dim):
        for j in range(res_dim):
            for k in range(res_dim):
                for l in range(res_dim):
                    res[i, j, k, l] = (A[i:i + patch_dim, j:j + patch_dim] *
                                       B[k:k + patch_dim, l:l + patch_dim]).sum()
    return res
Obviously this would be an extremely inefficient implementation. Tensorflow's tf.nn.conv2d seems like a natural solution to this as I'm essentially doing a convolution, however my filter matrix isn't fixed. Is there a natural solution to this in Tensorflow, or should I start looking at implementing my own tf-op?
The natural way to do this is to first extract the overlapping image patches of matrix B using tf.extract_image_patches, and then apply tf.nn.conv2d to A and each B sub-patch using tf.map_fn.
Note that before using tf.extract_image_patches and tf.nn.conv2d you need to reshape your matrices into 4D tensors of shape [1, width, height, 1] with tf.reshape.
Also, before calling tf.map_fn, you need a tf.transpose so that the B sub-patches are indexed by the first dimension of the tensor you pass as the elems argument of tf.map_fn.
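Outside TensorFlow, the same result can also be computed in plain NumPy; the following sketch (assuming NumPy >= 1.20 for sliding_window_view) reproduces the quadruple loop with a single einsum. np.tensordot(pa, pb, axes=([2, 3], [2, 3])) would be an equivalent spelling of the last line.

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_dot_vectorized(A, B, patch_dim):
    # all overlapping patches, shape (res_dim, res_dim, patch_dim, patch_dim)
    pa = sliding_window_view(A, (patch_dim, patch_dim))
    pb = sliding_window_view(B, (patch_dim, patch_dim))
    # sum over the patch pixels for every pair of patch positions
    return np.einsum('ijpq,klpq->ijkl', pa, pb)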