Currently I'm working on a neural network that can classify the numbers in the Street View House Numbers dataset (http://ufldl.stanford.edu/housenumbers/). For now, I'm just trying to do it on the second format, the one similar to the MNIST dataset.
The problem I've encountered is that the shapes of the train and test arrays of examples are (HEIGHT, WIDTH, CHANNELS, EXAMPLES) rather than (EXAMPLES, HEIGHT, WIDTH, CHANNELS).
Is there a simple way to reshape the array to what I want without using many nested loops?
I'm not sure if the object you are trying to reshape is a Tensor or numpy.ndarray.
If it is a numpy.ndarray, you can use np.transpose. For example:
import numpy as np
a = np.zeros((299, 299, 3, 50))
print(a.shape) # (299, 299, 3, 50) H x W x C x M
b = np.transpose(a, [3, 0, 1, 2])
print(b.shape) # (50, 299, 299, 3)
If it is a Tensor, you can use tf.transpose to change the order of the dimensions in exactly the same way as np.transpose. For example:
import tensorflow as tf
a = tf.zeros((299, 299, 3, 50), dtype=tf.int32)
print(a.shape.as_list()) # [299, 299, 3, 50]
b = tf.transpose(a, [3, 0, 1, 2])
print(b.shape.as_list()) # [50, 299, 299, 3]
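For the SVHN case specifically, the same idea applies right after loading the .mat file. A minimal sketch, assuming the format-2 file train_32x32.mat is loaded with scipy.io.loadmat and exposes the usual 'X' and 'y' keys:
import numpy as np
from scipy.io import loadmat

data = loadmat("train_32x32.mat")            # SVHN format 2 (assumed file name)
images = data["X"]                           # (HEIGHT, WIDTH, CHANNELS, EXAMPLES), i.e. (32, 32, 3, N)
labels = data["y"].flatten()                 # (EXAMPLES,)

images = np.transpose(images, [3, 0, 1, 2])  # move the examples axis to the front
print(images.shape)                          # (N, 32, 32, 3)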
Related
During training, I load image and disparity data. The image tensor is of shape: [2, 3, 256, 256], and disparity/depth tensor is of shape: [2, 1, 256, 256] (batch size, channels, height, width).
I want to use Conv3D, so I need to combine these two tensors and create a new tensor of shape: [2, 3, 256, 256, 256] (batch size, channels, depth, height, width).
The depth values range from 0-400, and one possibility is to divide that range into intervals, e.g., 4 intervals of 100. I want the resulting tensor to look like a voxel grid, similar to the technique used in this paper. The training loop that iterates over the data is below:
for batch_id, sample in enumerate(train_loader):
    sample = {name: tensor.cuda() for name, tensor in sample.items()}
    # image tensor [2, 3, 256, 256]
    rgb_image = transforms.Lambda(lambda x: x.mul(255))(sample["frame"])
    # translate disparity to depth
    depth_from_disparity_frame = 132.28 / sample["disparity_frame"]
    # depth tensor [2, 1, 256, 256]
    depth_image = depth_from_disparity_frame.unsqueeze(1)
From the article you linked:
We create a 3D voxel representation, with the same height and width as the original image, and with a depth determined by the difference between the maximum and minimum depth values found in the images. Each RGB-D pixel of an image is then placed at the same position in the voxel grid but at its corresponding depth.
This is what Ivan suggested, more or less. If you know that your depth will always be in 0-400, I imagine you can skip the part about "a depth determined by the difference between the maximum and minimum depth values found in the images". This could always be normalized beforehand or afterwards.
Code using dummy data:
import torch
import torch.nn.functional as F
# Declarations (dummy tensors)
rgb_im = torch.randint(0, 255, [1, 3, 256, 256])
depth = torch.randint(0, 400, [1, 1, 256, 256])
# Calculations
depth_ohe = F.one_hot(depth, num_classes=400) # of shape (batch, channel, height, width, binary)
bchwd_tensor = rgb_im.unsqueeze(-1)*depth_ohe # of shape (batch, channel, height, width, depth)
bcdhw_tensor = bchwd_tensor.permute(0, 1, 4, 2, 3) # of shape (batch, channel, depth, height, width)
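If the full 400-level grid is too memory-hungry, the depth can first be bucketed into the intervals mentioned in the question. A sketch assuming 4 intervals of 100, reusing the dummy tensors above:
depth_bins = depth // 100                          # bucket 0-399 into values 0-3
depth_ohe = F.one_hot(depth_bins, num_classes=4)   # (batch, 1, height, width, 4)
bchwd = rgb_im.unsqueeze(-1) * depth_ohe           # (batch, channel, height, width, 4)
bcdhw = bchwd.permute(0, 1, 4, 2, 3)               # (batch, channel, 4, height, width)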
I'm trying to use a perceptron to reduce a tensor of size [1, 24, 768] to another tensor of size [1, 1, 768]. The only way I could find was to first reshape the input tensor to [1, 1, 24*768] and then pass it through linear layers. I'm wondering if there's a more elegant way of doing this transformation, other than using RNNs, because I don't want to use those. Are there other methods in general for the transformation I want to make? Here is my code for doing the above operation:
import torch.nn as nn

lin = nn.Linear(24*768, 768)
# x is of shape [1, 24, 768]
x = x.view(1, 1, -1)  # reshape to [1, 1, 24*768]
out = lin(x)          # out is of shape [1, 1, 768]
If the explicit reshape is what's bothering you, you could use an nn.Flatten layer to do it:
>>> m = nn.Sequential(
... nn.Flatten(),
... nn.Linear(24*768, 768))
>>> x = torch.rand(1, 24, 768)
>>> m(x).shape
torch.Size([1, 768])
If you really want the extra dimension, you can unsqueeze the tensor on axis=1:
>>> m(x).unsqueeze(1).shape
torch.Size([1, 1, 768])
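If you would rather avoid the 24*768 -> 768 weight matrix altogether, one alternative (not part of the original answer, just a sketch) is to learn a weighted combination over the 24 positions by applying a small linear layer along that axis:
import torch
import torch.nn as nn

pool = nn.Linear(24, 1)           # one learned weight per position
x = torch.rand(1, 24, 768)
out = pool(x.transpose(1, 2))     # (1, 768, 24) -> (1, 768, 1)
out = out.transpose(1, 2)         # (1, 1, 768)
print(out.shape)                  # torch.Size([1, 1, 768])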
Here I have one tensor with shape (1084, 1625, 3), and I need to reshape it to (None, None, None, 3). How can I do that?
I used this code, but it does not work:
image = tf.cast(img, tf.float32)
image = (image / 127.5) - 1
I don't think you can do that directly. I think what you're trying to do is turn a 3D tensor into a 4D tensor, and I'm guessing this is the origin of your problem. You can add a 4th (batch) dimension like this, because TensorFlow needs it:
import tensorflow as tf
tensor = tf.random.uniform((100, 100, 3), 0, 256, dtype=tf.int32)
new = tf.expand_dims(tensor, axis=0)
print(new.shape)  # (1, 100, 100, 3)
But I could be wrong. If this doesn't solve your problem, please provide your error traceback and code.
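Applied to the tensor from the question (a minimal sketch, assuming img is the (1084, 1625, 3) tensor being normalized above):
import tensorflow as tf

image = tf.cast(img, tf.float32)       # img has shape (1084, 1625, 3)
image = (image / 127.5) - 1
image = tf.expand_dims(image, axis=0)  # add a batch dimension -> (1, 1084, 1625, 3)
print(image.shape)                     # (1, 1084, 1625, 3)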
I want to resize a 3-D RGB tensor in PyTorch. I know how to resize a 4-D tensor, but unfortunately this method does not work for 3-D.
The input is:
#input shape: [3, 100, 200] ---> desired output shape: [3, 80, 120]
If I have a 4-D tensor it works fine:
#input shape: [2, 3, 100, 200]
out = torch.nn.functional.interpolate(T,size=(100,80), mode='bilinear')
Any suggestions? Thanks in advance!
Thanks to jodag I found the answer:
# input shape [3, 200, 120]
T = T.unsqueeze(0)
T = torch.nn.functional.interpolate(T,size=(100,80), mode='bilinear')
T = T.squeeze(0)
# output shape [3, 100, 80]
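The same pattern can be wrapped in a small helper so it works on any (C, H, W) tensor (just a convenience sketch based on the answer above):
import torch
import torch.nn.functional as F

def resize_chw(t, size):
    # interpolate expects a batch dimension, so add it and remove it again
    return F.interpolate(t.unsqueeze(0), size=size, mode='bilinear', align_corners=False).squeeze(0)

T = torch.rand(3, 100, 200)
print(resize_chw(T, (80, 120)).shape)  # torch.Size([3, 80, 120])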
I have 25×15 images, and I want to identify what they are using a CNN.
When training my CNN, I feed a NumPy array named 'imgs' as the dataset, whose shape is (200, 375):
sess.run(train, feed_dict={X: imgs, Y: labels})
This array contains 200 samples, each with 375 features.
But when I reshape this array into a (-1, 25, 15, 1) tensor:
X = tf.placeholder(tf.float32, [None, 375])
X = tf.reshape(X,[-1,25,15,1])
I get the following error:
Cannot feed value of shape (200, 375) for Tensor 'Reshape:0', which has shape '(?, 25, 15, 1)'
I don't know why this doesn't work; 25*15 is indeed 375.
Thank you!
You don't reshape the array you are feeding to the placeholder. Because you assign the reshaped tensor back to X, the feed_dict key now points at the 'Reshape:0' tensor, which expects shape (?, 25, 15, 1), not at the (None, 375) placeholder. You have to reshape your imgs array into shape [-1, 25, 15, 1] as well, or keep the placeholder in a separate variable and feed that instead.
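A minimal sketch of both options, assuming TensorFlow 1.x as in the question and imgs being the (200, 375) array:
import tensorflow as tf

# Option 1: keep the placeholder in its own variable so feeding (200, 375) still matches
X = tf.placeholder(tf.float32, [None, 375])
X_img = tf.reshape(X, [-1, 25, 15, 1])   # use X_img as the input to the conv layers
# sess.run(train, feed_dict={X: imgs, Y: labels})

# Option 2: keep the code from the question (X overwritten by the reshape)
# and reshape the NumPy array to match before feeding
imgs_4d = imgs.reshape(-1, 25, 15, 1)    # (200, 25, 15, 1)
# sess.run(train, feed_dict={X: imgs_4d, Y: labels})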