In the following,
x_6 = torch.cat((x_1, x_2_1, x_3_1, x_5_1), dim=-3)
Sizes of tensors x_1, x_2_1, x_3_1, x_5_1 are
torch.Size([1, 256, 7, 7])
torch.Size([1, 256, 7, 7])
torch.Size([1, 256, 7, 7])
torch.Size([1, 256, 7, 7]) respectively.
The size of x_6 turns out to be torch.Size([1, 1024, 7, 7])
I couldn't understand or visualise this concatenation along a negative dimension (-3 in this case).
What exactly is happening here?
How does the same go if dim = 3?
Is there any constraint on dim for a given set of tensors?
The answer by danin is not completely correct (and arguably wrong from a tensor-algebra perspective), since it suggests the question is about accessing or indexing a Python list. It isn't.
dim=-3 means we concatenate the tensors along the 2nd dimension (index 1); for these 4-D tensors you could just as well have used dim=1 instead of the somewhat confusing -3.
Taking a closer look at the tensor shapes, they seem to represent (b, c, h, w), where b stands for batch_size, c for the number of channels, h for height and w for width.
This is typically what you see at the final stages of encoding (possibly) images in a deep neural network, where we arrive at such feature maps.
The torch.cat() operation with dim=-3 is meant to say that we concatenate these 4 tensors along the dimension of channels c (see above).
Since 4 * 256 = 1024, the resultant tensor ends up with the shape torch.Size([1, 1024, 7, 7]).
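A quick sketch to make this concrete (same shapes as in the question):
import torch

x = [torch.rand(1, 256, 7, 7) for _ in range(4)]

print(torch.cat(x, dim=-3).shape)  # torch.Size([1, 1024, 7, 7]) - channels stacked
print(torch.cat(x, dim=1).shape)   # torch.Size([1, 1024, 7, 7]) - identical, since -3 == 1 for 4-D tensors
print(torch.cat(x, dim=3).shape)   # torch.Size([1, 256, 7, 28]) - dim=3 concatenates along the width
# constraint: dim must lie in [-4, 3] here, and all tensors must have the same size
# in every dimension except the one being concatenated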
Notes: It is hard to visualize a 4-dimensional space since we humans live in an inherently 3-D world. Nevertheless, here are some answers that I wrote a while ago which should help build a mental picture:
How to understand the term tensor in TensorFlow?
Very Basic Numpy array dimension visualization
Python provides negative indexing, so you can access elements counting from the end of a list, e.g. -1 is the last element of a list.
In this case the tensor has 4 dimensions, so -3 is actually the 2nd dimension.
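A small sketch of that mapping, using the shape from the question:
import torch

x = torch.rand(1, 256, 7, 7)
print(x.dim())                  # 4
print(x.size(-3) == x.size(1))  # True: for a 4-D tensor, dim -3 wraps around to dim 1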
Related
I have a tensor T of the shape (8, 5, 300), where 8 is the batch size, 5 is the number of documents in each batch, and 300 is the encoding dimension of each document. If I reshape the tensor as follows, do the properties of my tensor remain the same?
T = T.reshape(5, 300, 8)
T.shape
>> Size[5, 300, 8]
So, does this new tensor have the same properties as the original one? By properties I mean: can I still say that this is a tensor of batch size 8, with 5 documents for each batch, and a 300-dimensional encoding for each document?
Does this affect the training of the model? If reshaping the tensor messes up the data points, then there is no point in training. For example, if the reshape above produced a batch of 5 samples, each with 300 documents of size 8, it would be useless, since I have neither 300 documents nor a batch size of 5.
I need to reshape it like this because my model in between produces output of the shape [8, 5, 300], and the next layer accepts input as [5, 300, 8].
NO
You need to understand the difference between reshape/view and permute.
reshape and view only change the "shape" of the tensor, without re-ordering the underlying elements. Therefore
orig = torch.rand((8, 5, 300))
resh = orig.reshape(5, 300, 8)
orig[0, 0, :] != resh[0, :, 0]
If you want to change the order of the elements as well, you need to permute it:
perm = orig.permute(1, 2, 0)
orig[0, 0, :] == perm[0, :, 0]
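A runnable check of the above (a sketch; with random data the first comparison is almost surely False, the second is always True):
import torch

orig = torch.rand((8, 5, 300))
resh = orig.reshape(5, 300, 8)  # same memory order, new shape: documents get scrambled
perm = orig.permute(1, 2, 0)    # axes re-ordered: each slice still refers to the same data

print(torch.equal(orig[0, 0, :], resh[0, :, 0]))  # False (in general)
print(torch.equal(orig[0, 0, :], perm[0, :, 0]))  # True
So if the next layer really expects the axes in the order (documents, encoding, batch), permute is the operation to use (possibly followed by .contiguous()).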
NOOO!
I made a similar mistake.
Imagine converting a 2-D tensor (a matrix) into a 1-D tensor (an array) and then applying a transform to it. This would create serious issues in your code, because the new tensor has the characteristics of a flat array, not of the original matrix.
Hope you got my point.
Let me express the title with an example:
Let A be a tensor of shape [16, 15, 128, 128] (which means [batchsize, channels, height, width])
Let B be a tensor of shape [16, 3, 128, 128] (which means [batchsize, channels, height, width])
I want to output a tensor of shape [16, 5, 128, 128] (which means [batchsize, channels, height, width])
where the i-th of the 5 output channels is computed by multiplying B elementwise with the i-th slice of 3 channels of A and then performing a sum along the channel dimension.
How would you do that operation in pytorch?
Thanks!
PS: It's very difficult to express what I want from this operation; if I wasn't clear, please ask me and I'll try to re-explain it.
I think you are looking for torch.Tensor.repeat to "extend" the B tensor to 15 channels, i.e. 5 copies of its 3 input channels, so it lines up with the 5 groups of 3 channels in A (note that torch.repeat_interleave would repeat each channel in place, which is not the grouping you want here):
extB = B.repeat(1, 5, 1, 1)  # tile B along the channel dim: (16, 15, 128, 128)
m = A * extB  # element-wise multiplication of A with the tiled version of B
# using some reshaping and a sum over each group of 3 channels we get the 5 output channels you want
out = m.view(m.shape[0], 5, 3, *m.shape[2:]).sum(dim=2)
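An equivalent, arguably clearer route (a sketch using the shapes from the question) is to view A as 5 groups of 3 channels and let broadcasting pair each group with B:
import torch

A = torch.rand(16, 15, 128, 128)
B = torch.rand(16, 3, 128, 128)

# (16, 5, 3, 128, 128) * (16, 1, 3, 128, 128), then sum over the 3-channel axis
out = (A.view(16, 5, 3, 128, 128) * B.unsqueeze(1)).sum(dim=2)
print(out.shape)  # torch.Size([16, 5, 128, 128])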
I am bit puzzled by how to read and understand a simple line of code:
I have a tensor input of shape (19,4,64,64,3).
The line of code input[:, None] returns a tensor of shape (19, 1, 4, 64, 64, 3).
How should I understand the behaviour of that line? It seems that None is adding a dimension with a size of 1. But why is it added at that specific position (between 19 and 4)?
Indeed, None adds a new dimension. You can also use tf.newaxis for this which is a bit more explicit IMHO.
The new dimension is added in axis 1 because that's where it appears in the index. E.g. input[:, :, None] should result in shape (19, 4, 1, 64, 64, 3) and so on.
It might get clearer if we write all the dimensions in the slicing: input[:, None, :, :, :, :]. In slicing, : simply means taking all elements of the dimension. So by using one :, we take all elements of dimension 0 and then "move on" to dimension 1. Since None appears here, we know that the new size-1 axis should be in dimension 1. Accordingly, the remaining dimensions get "pushed back".
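A short sketch (written with TensorFlow here since the answer mentions tf.newaxis; the same indexing works for NumPy arrays and PyTorch tensors):
import tensorflow as tf

x = tf.zeros([19, 4, 64, 64, 3])

print(x[:, None].shape)        # (19, 1, 4, 64, 64, 3)
print(x[:, tf.newaxis].shape)  # (19, 1, 4, 64, 64, 3) - same thing, more explicit
print(x[:, :, None].shape)     # (19, 4, 1, 64, 64, 3)
print(x[..., None].shape)      # (19, 4, 64, 64, 3, 1) - new axis at the end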
Use of unsqueeze():
input = torch.Tensor(2, 4, 3) # input: 2 x 4 x 3
print(input.unsqueeze(0).size()) # prints - torch.Size([1, 2, 4, 3])
Use of view():
input = torch.Tensor(2, 4, 3) # input: 2 x 4 x 3
print(input.view(1, -1, -1, -1).size()) # expecting - torch.Size([1, 2, 4, 3])
According to the documentation, unsqueeze() inserts a singleton dimension at the position given as a parameter, while view() creates a view of the same underlying storage with different dimensions.
What view() does is clear to me, but I am unable to distinguish it from unsqueeze(). Moreover, I don't understand when to use view() and when to use unsqueeze().
Any help with a good explanation would be appreciated!
view() can only take a single -1 argument.
So, if you want to add a singleton dimension, you would need to provide all the dimensions as arguments. For example, if A is a 2x3x4 tensor, then to add a singleton dimension you would need to do A.view(2, 1, 3, 4).
However, sometimes the dimensionality of the input is unknown when the operation is written. Thus, we don't know that A is 2x3x4, but we would still like to insert a singleton dimension. This happens a lot when using minibatches of tensors, where the exact dimensions are not known when writing the code. In these cases, unsqueeze() is useful and lets us insert the dimension without explicitly being aware of the other dimensions.
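A minimal sketch of the difference, using a hypothetical 2x3x4 tensor:
import torch

A = torch.rand(2, 3, 4)

print(A.view(2, 1, 3, 4).shape)  # torch.Size([2, 1, 3, 4]) - every dim spelled out
print(A.view(2, 1, -1).shape)    # torch.Size([2, 1, 12]) - a single -1 is fine
# A.view(1, -1, -1, -1)          # error: only one dimension can be inferred

print(A.unsqueeze(1).shape)      # torch.Size([2, 1, 3, 4]) - no need to know the other dims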
unsqueeze() is a special case of view()
For convenience, many python libraries have short-hand aliases for common uses of more general functions.
view() reshapes a tensor to the specified shape
unsqueeze() reshapes a tensor by adding a new dimension of depth 1
(i.e. turning an n-dimensional tensor into an (n+1)-dimensional tensor)
When to use unsqueeze()?
Some example use cases:
You have a model designed to intake RGB image tensors (3-D: CxHxW), but your data consists of 2-D greyscale images (HxW)
Your model is designed to intake batches of data (batch_size x dim1 x dim2 x ...), and you want to feed it a single sample (i.e. a batch of size 1).
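A sketch covering both use cases above, with a hypothetical 28x28 greyscale image:
import torch

grey = torch.rand(28, 28)  # a single greyscale image, HxW
chw = grey.unsqueeze(0)    # torch.Size([1, 28, 28])    - add a channel dim
nchw = chw.unsqueeze(0)    # torch.Size([1, 1, 28, 28]) - add a batch dim (batch of size 1)

# the same result via view(), but you have to spell out (or compute) the other dims
print(torch.equal(nchw, grey.view(1, 1, 28, 28)))  # True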
Say I have a tensor with shape (?, 5, 1, 20)
For each occurrence of the last dimension I do some computation (getting the k max values) that produces a smaller tensor b. What do I do if I want to replace the last dimension of my original tensor with b?
What (preferably pure tensorflow) path should I take?
You're doing some computation on the last dimension... that is, you want to go from (?, 5, 1, 20) to (?, 5, 1, b), if I understood correctly?
What kind of computation?
You could reshape your tensor, do the computation (such as matrix multiplication) and reshape back.
a = tf.reshape(X, [-1, 20])       # collapse the leading dims: shape (? * 5 * 1, 20)
a = tf.matmul(a, W)               # W: a hypothetical [20, b] matrix implementing your computation
a = tf.reshape(a, [-1, 5, 1, b])  # restore the leading dims: shape (?, 5, 1, b)
Or you could use tf.einsum() to achieve a similar result. For non-linear computations, it depends on what you want to do.
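A hedged sketch of the einsum route, assuming the computation is a linear map from the 20 features to a hypothetical b = 4:
import tensorflow as tf

b = 4
x = tf.random.normal([2, 5, 1, 20])  # stand-in for the (?, 5, 1, 20) tensor
w = tf.random.normal([20, b])        # hypothetical projection matrix

y = tf.einsum("nhwc,cb->nhwb", x, w)
print(y.shape)                       # (2, 5, 1, 4)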
EDIT:
You could also hack it with Conv2D, using a filter of size [1, 1, 20, b]. It does the same thing, possibly more efficiently.
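A sketch of that Conv2D variant (channels-last input, same hypothetical b = 4 as above):
import tensorflow as tf

b = 4
x = tf.random.normal([2, 5, 1, 20])  # stand-in for the (?, 5, 1, 20) tensor

# a 1x1 convolution over 20 input channels with b output filters
# is the same per-position linear map as the matmul above
w = tf.random.normal([1, 1, 20, b])
y = tf.nn.conv2d(x, w, strides=1, padding="VALID")
print(y.shape)                       # (2, 5, 1, 4)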