I have a custom loss function (in Keras) which receives two batches as input; their shape (the same for both) is typically (batch_size, 128, 128, 3), i.e. a batch of images. Now, I want to perform two operations, whose results have the following shapes:
(batch_size, 128, 128)
(batch_size)
Now, what I'd like to do is sum the two tensors, but obviously I can't as they are, so the idea is simple: expand the second one to the same shape as the first, copying each per-sample value into the vacant positions.
How can I do that, considering it's a loss function?
Note: I already tried reducing the first tensor and summing it with the second as scalars, but that solution doesn't do what I want.
If you wish to broadcast the same values multiple times, this is already supported. You just need to make sure your dimensions are lined up for broadcasting.
import tensorflow as tf

a = tf.constant([[[1, 2], [2, 3]], [[3, 4], [4, 5]], [[5, 6], [6, 7]]])  # shape (3, 2, 2)
b = tf.constant([1, 2, 3])                                               # shape (3,)

# b[:, None, None] has shape (3, 1, 1), so it broadcasts against (3, 2, 2)
print(a + b[:, None, None])
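Applied to the loss in the question, a minimal sketch might look like this. The per-pixel and per-sample terms below are only placeholders for whatever your two operations actually compute; the broadcasting line is the relevant part:

import tensorflow as tf

def custom_loss(y_true, y_pred):
    # Assumed placeholder operations: one result per pixel, one per sample.
    per_pixel = tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)  # (batch, 128, 128)
    per_sample = tf.reduce_mean(per_pixel, axis=[1, 2])              # (batch,)
    # Adding two trailing axes lets broadcasting copy each per-sample value
    # into every (128, 128) position.
    combined = per_pixel + per_sample[:, None, None]                 # (batch, 128, 128)
    return tf.reduce_mean(combined)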
I'm trying to recreate a transformer that was written in PyTorch and port it to TensorFlow. Everything was going pretty well until the two versions of MultiHeadAttention started giving extremely different outputs. Both are implementations of multi-headed attention as described in the paper "Attention Is All You Need", so they should be able to achieve the same output.
I'm converting
self_attn = nn.MultiheadAttention(dModel, nheads, dropout=dropout)
to
self_attn = MultiHeadAttention(num_heads=nheads, key_dim=dModel, dropout=dropout)
For my tests, dropout is 0.
I'm calling them with:
self_attn(x,x,x)
where x is a tensor with shape=(10, 128, 50)
As expected from the documentation, the PyTorch version returns a tuple of two tensors, both with dimensions [10, 128, 50].
I'm having trouble getting the TensorFlow version to do the same thing. With TensorFlow I only get one tensor back (of shape [10, 128, 50]), and it matches neither of the two tensors returned by PyTorch.
Based on the Tensorflow documentation I should be getting something comparable.
How can I get them to operate the same way? I'm guessing I'm doing something wrong with Tensorflow but I can't figure out what.
By default, nn.MultiheadAttention outputs a tuple with two tensors:
attn_output -- result of self-attention operation
attn_output_weights -- attention weights averaged(!) over heads
At the same time, tf.keras.layers.MultiHeadAttention outputs only one tensor by default, attention_output (which corresponds to attn_output in PyTorch). The attention weights of all heads will also be returned if the parameter return_attention_scores is set to True, like:
output, scores = self_attn(x, x, x, return_attention_scores=True)
The scores tensor should also be averaged over the heads axis to achieve full correspondence with PyTorch:
scores = tf.math.reduce_mean(scores, 1)
While rewriting, keep in mind that by default (as in the snippet in the question) nn.MultiheadAttention expects its input in the form (seq_length, batch_size, embed_dim), while tf.keras.layers.MultiHeadAttention expects it in the form (batch_size, seq_length, embed_dim).
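Putting both points together, a rough sketch (the sizes below are assumed to match the question; this aligns the shapes and output structure, but the actual values will still differ unless both layers share the same weights):

import tensorflow as tf

seq_len, batch_size, dModel, nheads = 10, 128, 50, 5      # assumed values from the question
x = tf.random.normal((seq_len, batch_size, dModel))       # PyTorch-style layout (seq, batch, embed)

x_tf = tf.transpose(x, [1, 0, 2])                          # -> (batch, seq, embed), the layout Keras expects

mha = tf.keras.layers.MultiHeadAttention(num_heads=nheads, key_dim=dModel, dropout=0.0)
output, scores = mha(x_tf, x_tf, x_tf, return_attention_scores=True)

output = tf.transpose(output, [1, 0, 2])                   # back to (seq, batch, embed), like attn_output
scores = tf.math.reduce_mean(scores, axis=1)               # average over heads, like attn_output_weights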
I'm working in the field of machine learning.
To make the network stronger, I'm going to adopt techniques based on Conv1D.
The input data is a one-dimensional list, so I thought Conv1D would be the best choice.
What would happen if the input size is (1, 740)? Is it okay for the input channel count to be 1?
I mean, I have a feeling that the Conv1D output for a (1, 740) tensor should be the same as that of a simple Linear network.
Of course, I'll also include other Conv1d layers, like below.
self.conv1 = torch.nn.Conv1d(in_channels=1, out_channels=64, kernel_size=5)
self.conv2 = torch.nn.Conv1d(in_channels=64,out_channels=64, kernel_size=5)
self.conv3 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)
self.conv4 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)
Does it make sense when the input channel count is 1?
Thanks in advance. :)
I think it's fine.
Note that the input of Conv1d should be (B, N, M), where B is the batch size, N is the number of channels (e.g. 3 for RGB) and M is the number of features.
The out_channels refers to the number of filters to use, each with a one-dimensional kernel of size kernel_size=5. Look at the output shape of the following code:
import torch
import torch.nn as nn

k = nn.Conv1d(1, 64, kernel_size=5)
input = torch.randn(1, 1, 740)   # (batch, channels, width)
print(k(input).shape)            # -> torch.Size([1, 64, 736])
The 736 is the result of not using padding: 740 - 5 + 1 = 736, so the width isn't kept.
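If you want to keep the width at 740, one option (a sketch, not the only way) is to add padding: with kernel_size=5, padding=2 preserves the length:

import torch
import torch.nn as nn

k_same = nn.Conv1d(1, 64, kernel_size=5, padding=2)
x = torch.randn(1, 1, 740)
print(k_same(x).shape)  # -> torch.Size([1, 64, 740]), since 740 + 2*2 - 5 + 1 = 740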
The nn.Conv1d layer takes an input of shape (b, c, w) (where b is the batch size, c the number of channels, and w the input width). Its kernel size is one-dimensional. It performs a convolution operation over the input dimension (batch and channel axes aside). This means the kernel applies the same operation over the whole input (whether 1D, 2D, or 3D), like a 'sliding window'. As such, each filter only has kernel_size parameters per input channel. This is the main characteristic of a convolution layer.
Conv1d lets you extract features regardless of where they are located in the input data: at the beginning or at the end of your w-wide input. This makes sense if your input is temporal (a sequence over time) or spatial data (an image).
On the other hand, an nn.Linear takes a 1D tensor as input and returns another 1D tensor. You could consider w to be the number of neurons; you would end up with w*output_dim parameters. If your input contains components which are independent from one another (like a one/multi-hot encoding), then a fully connected layer such as the one nn.Linear implements would be preferred.
These two behave differently. When using an nn.Linear in scenarios where you should use an nn.Conv1d, your training would ideally result in neurons with equal weights, if that makes sense... but it probably won't. Fully connected layers were used in the past in deep learning for computer vision; today convolutions are used because they are much more efficient and better suited to these types of tasks.
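To make the weight-sharing point concrete, here is a small comparison for a single-channel, length-740 input (the layer sizes are just illustrative):

import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=5)
fc = nn.Linear(in_features=740, out_features=64)

print(sum(p.numel() for p in conv.parameters()))  # 64*1*5 + 64 = 384
print(sum(p.numel() for p in fc.parameters()))    # 740*64 + 64 = 47424

x = torch.randn(8, 1, 740)          # (batch, channels, width)
print(conv(x).shape)                # torch.Size([8, 64, 736])
print(fc(x.flatten(1)).shape)       # torch.Size([8, 64])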
I am trying to reshape an array of size (14, 14, 3) to (None, 14, 14, 3). I have seen that the output of each layer in a convolutional neural network has a shape in the format (None, n, n, m).
Consider that the name of my array is arr.
I tried arr[None,:,:], but it converts it to a shape of (1, 14, 14, 3).
How should I do it?
https://www.tensorflow.org/api_docs/python/tf/TensorShape
A TensorShape represents a possibly-partial shape specification for a Tensor. It may be one of the following:
Partially-known shape: has a known number of dimensions, and an unknown size for one or more dimension. e.g. TensorShape([None, 256])
That is not possible in numpy. All dimensions of a ndarray are known.
The arr[None,:,:] notation adds a new size-1 dimension, giving (1, 14, 14, 3). Under broadcasting rules, such a dimension may be expanded to match a dimension of another array. In that sense we often treat the None as a flexible dimension.
I haven't worked with TensorFlow, though I see a lot of questions with both tags. TensorFlow should have mechanisms for transferring values to and from tensors. It knows about numpy, but numpy does not 'know' anything about TensorFlow.
An ndarray is an object with known values, and its shape is used to access those values in a multidimensional way. In contrast, a tensor does not hold values:
https://www.tensorflow.org/api_docs/python/tf/Tensor
It does not hold the values of that operation's output, but instead provides a means of computing those values
Looks like you can create a TensorProto from an array (and get an array back from one as well):
https://www.tensorflow.org/api_docs/python/tf/make_tensor_proto
and to make a Tensor from an array:
https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor
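To illustrate the contrast between a partial TensorShape and a concrete array, a small sketch (the names are only illustrative):

import numpy as np
import tensorflow as tf

# A symbolic Keras input can have an unknown (None) batch dimension:
inp = tf.keras.Input(shape=(14, 14, 3))
print(inp.shape)                          # (None, 14, 14, 3)

# A concrete array, or a tensor made from one, always has fully known dimensions:
arr = np.zeros((14, 14, 3))
print(tf.convert_to_tensor(arr).shape)    # (14, 14, 3)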
The shape (None, 14, 14, 3) represents (batch_size, imgH, imgW, imgChannel); imgH and imgW can be used interchangeably depending on the network and the problem.
The batch size is given as None in the neural network because we don't want to restrict it to some specific value; the batch size depends on a lot of factors, like the memory available for our model to run, etc.
So let's say you have 4 images of size 14x14x3; you can stack them into an array, say L1, and L1 will then have the shape 4x14x14x3, i.e. you made a batch of 4 images that you can now feed to your neural network.
NOTE: here None will be replaced by 4, and it will stay 4 for the whole training process. Similarly, when you feed your network only one image, it assumes a batch size of 1 and sets None equal to 1, giving you the shape (1, 14, 14, 3).
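For example (a sketch with random data standing in for real images):

import numpy as np

images = [np.random.rand(14, 14, 3) for _ in range(4)]   # four stand-in images

batch = np.stack(images, axis=0)
print(batch.shape)            # (4, 14, 14, 3) -- the None slot is filled by the batch size 4

single = np.expand_dims(images[0], axis=0)
print(single.shape)           # (1, 14, 14, 3) -- a batch of one image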
I would like to create a simple Keras neural network that accepts an input matrix of dimension (rows, columns) = (n, m), flattens the matrix to a dimension (n*m, 1), sends the flattened matrix through a number of arbitrary layers, and in the final layer, once more unflattens the matrix to a dimension of (n, m) before releasing this final matrix as an output.
The issue I'm having is that I haven't found any documentation for an Unflatten layer on the keras.io page, and I'm wondering whether there is a reason that such a seemingly standard, commonly used layer doesn't exist. Is there a much more natural and easy way to do what I'm proposing?
You can use the Reshape layer for this purpose. It accepts the desired output shape as its argument and would reshape the input tensor to that shape. For example:
from keras.layers import Reshape
rsh_inp = Reshape((n*m, 1))(inp) # if you don't want the last axis with dimension 1, you can also use Flatten layer
# rsh_inp goes through a number of arbitrary layers ...
# reshape back the output
out = Reshape((n,m))(out_rsh_inp)
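A minimal end-to-end sketch (n, m and the middle Dense layer are just placeholders for your own sizes and layers):

from keras.layers import Input, Dense, Reshape
from keras.models import Model

n, m = 14, 14

inp = Input(shape=(n, m))
rsh_inp = Reshape((n * m, 1))(inp)   # (n, m) -> (n*m, 1)
x = Dense(1)(rsh_inp)                # stand-in for the arbitrary middle layers
out = Reshape((n, m))(x)             # back to (n, m)

model = Model(inp, out)
model.summary()                      # output shape is (None, n, m)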
Given two tensors t1 = [?, 1, 1, 1, 2048] and t2 = [?, 3, 1, 1, 256], as seen in the image, how would these be concatenated? Currently, I am using:
tf.concat([t1, t2], 4)
However, given that my architecture has a large number of layers with many concatenations, I eventually end up with a tensor that is too large (in terms of channels/features) to initialize. Is this the correct way to implement a concatenation layer?
First of all, the shapes of the tensors in the inception layer are not as you define them. 1x1, 1x3 and 3x1 are the shapes of the filters applied to the image. There are two more parameters in a convolution, padding and stride, and depending on their exact values the resulting shape can be very different.
In this particular case, the spatial shape doesn't change; only the channel dimensions differ (2048 and 256), which is why the tensors can be concatenated. Concatenating your original t1 and t2 would result in an error.
Is this the correct way to implement a concatenation layer?
Yes, feature map concatenation is one of the key ideas of the inception network, and its implementation indeed uses tf.concat (e.g. see the inception v1 source code).
Note that this tensor will grow in one direction (channels / features), but contract in the spatial dimensions because of downsampling, so it won't get too large. Also note that this tensor is the transformed input data (the image), hence, unlike the weights, it's not initialized but rather flows through the network. The weights will be tensors of size 1x1x2048 = 2048, 1x3x224 = 672, 3x1x256 = 768, etc. As you can see, they are not very big at all, and that's another idea behind the inception network.
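For reference, a toy sketch of an inception-style concatenation where the spatial shapes match and only the channel counts differ (the shapes below are made up for illustration):

import tensorflow as tf

t1 = tf.zeros([8, 7, 7, 2048])   # branch 1: (batch, height, width, channels)
t2 = tf.zeros([8, 7, 7, 256])    # branch 2: same spatial size, fewer channels

merged = tf.concat([t1, t2], axis=-1)
print(merged.shape)              # (8, 7, 7, 2304) -- only the channel dimension grows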