I have a 3D tensor containing a batch of sentences. Therefore the shape is [batch size, number of words in the sentence, embedding dimension]. In a toy example I'm working on the actual dimension is [256, 84, 32]. Note, in this example I zero pad the number of words in the sentences. I want to pass this tensor through a dense layer using tf.keras.layers.Dense. I defined the dense layer as dense=tf.keras.layers.Dense(32). However when I send the 3D tensor to this dense layer I get the following error:
ValueError: Tensor's shape (128, 128) is not compatible with supplied shape [32, 32]
Thanks!
I'm trying to get my head around 1D convolution - specifically, how the padding comes into it.
Suppose I have an input sequence of shape (batch,128,1) and run it through the following Keras layer:
tf.keras.layers.Conv1D(32, 5, strides=2, padding="same")
I get an output of shape (batch,64,32), but I don't understand why the sequence length has reduced from 128 to 64... I thought the padding="same" parameter kept the output length the same as the input? I suppose that's only true if strides=1; so in this case I'm confused about what padding="same" actually means.
According to the TensorFlow documentation, in your case we have:
filters (the number of filters, i.e. the output dimension) = 32
kernel_size (the length of each filter) = 5
strides (how many steps the filter moves along the input between successive applications) = 2
So for an input of shape (batch, 128, 1), the layer applies 32 kernels (each of length 5) and moves two steps after each application, which gives 128 / 2 = 64 values per filter; the output therefore has shape (batch, 64, 32).
padding="same" only determines how the convolution is handled at the borders: the input is zero-padded so that the output length is ceil(input_length / strides), which equals the input length only when strides=1. For more details you can check here.
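The lengths quoted above can be checked with a small sketch. It assumes TensorFlow's documented padding arithmetic with no dilation; the function name conv1d_output_length is hypothetical and is not part of Keras.

```python
import math

def conv1d_output_length(input_length, kernel_size, stride, padding):
    """Output length of a 1D convolution with TensorFlow-style padding (no dilation)."""
    if padding == "same":
        # Zero padding is added so that only the stride shortens the sequence.
        return math.ceil(input_length / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_length - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")

print(conv1d_output_length(128, 5, 2, "same"))   # 64
print(conv1d_output_length(128, 5, 1, "same"))   # 128
print(conv1d_output_length(128, 5, 2, "valid"))  # 62
```

So "same" keeps the length only for strides=1; with strides=2 it halves the length, while "valid" also loses the positions where the kernel would hang over the edge.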
I'm currently working with the TensorFlow Addons SpatialPyramidPooling2D layer for image classification and I got the following error when I tried to fit the model.
ValueError: Dimensions must be equal, but are 8 and 20 for '{{node MatMul}} = BatchMatMulV2[T=DT_FLOAT, adj_x=false, adj_y=false](feature, transpose_1)' with input shapes: [?,20,8], [8,20,?]
I suspect it has something to do with the output shape of the model. The last layer is supposed to be (None,<number_of_classes>) but I got (None,<number_of_channels>,<number_of_classes>), because the output of SpatialPyramidPooling2D is a 3D tensor.
I tried to solve it by adding a Flatten layer right after SpatialPyramidPooling2D, but then the softmax layer gives me the error below:
ValueError: Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 1280 but received input with shape [None, 25600]
If you want output of shape (None, 8), I suggest you add a 1D pooling layer after the pyramid pooling thing.
import tensorflow as tf
x = tf.random.uniform((10, 20, 8), dtype=tf.float32)
pool = tf.keras.layers.GlobalAveragePooling1D()
print(pool(x).shape)
TensorShape([10, 8])
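For intuition, GlobalAveragePooling1D with the default channels_last layout is just a mean over axis 1. A minimal NumPy sketch of the same reduction, assuming a hypothetical input of the same shape as above:

```python
import numpy as np

# Hypothetical stand-in for the 3D tensor: batch of 10, 20 steps, 8 features.
x = np.random.uniform(size=(10, 20, 8)).astype(np.float32)

# Averaging over axis 1 leaves one value per feature for each sample.
pooled = x.mean(axis=1)
print(pooled.shape)  # (10, 8)
```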
I'm building an image classifier that classifies MNIST handwritten digits (28x28 grayscale images) using a CNN.
Here is my layer definition:
model = keras.Sequential()
model.add(keras.layers.Conv2D(64,(3,3),activation='relu',input_shape=(28,28,1)))
model.add(keras.layers.MaxPool2D((2,2)))
model.add(keras.layers.Conv2D(64,(3,3),activation='relu'))
model.add(keras.layers.MaxPool2D((2,2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(200,activation='relu'))
model.add(keras.layers.Dense(10,activation='softmax'))
But I get this error when I fit the model:
ValueError: Input 0 of layer sequential_6 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [32, 28, 28]
I also want to know why we have to specify 1 in input_shape for the Conv2D layer. The image shape is 28x28, yet we have to add a 1 there.
Keep the layer definition as it is:
model.add(keras.layers.Conv2D(64,(3,3),activation='relu',input_shape=(28,28,1)))
and change the data instead. Dropping the 1, as in
model.add(keras.layers.Conv2D(64,(3,3),activation='relu',input_shape=(28,28)))
will not help, because Conv2D always expects a 4D batch of shape [batch, height, width, channels].
The reason you have the error is that each input image is 28x28 and each batch holds 32 images, so the network receives an array of shape [32, 28, 28]. I can't see how you feed the input to the network, but your current code expects an array of shape [32, 28, 28, 1]. If the data is a NumPy array that you can manipulate, calling reshape() to add that last axis will solve the problem.
The trailing 1 is the number of channels: a grayscale image has one channel, whereas an RGB image would have 3. That is why each image has to be a 3D array of shape [28, 28, 1] rather than a 2D array of shape [28, 28].
Update:
You provided the following code change that made it work:
train_image=train_image.reshape(60000, 28, 28, 1)
train_image=train_image / 255.0
test_image = test_image.reshape(10000, 28, 28, 1)
test_image=test_image/255.0
Your input images are stored in a single large NumPy array, and you fit your model with it directly. The model's fit function slices this array along its first dimension to create a batch for each training step. The batch size is 32, so it implicitly creates an array of shape (32, 28, 28, 1) and passes it down the layers. The 2nd to 4th dimensions are simply carried over from the original array.
The reshape() call changes the dimensions of the array. Your original array before reshape had shape (60000, 28, 28); laid out as a single sequence of numbers, that is 60000x28x28 values. reshape() takes these numbers and fills them into a (60000, 28, 28, 1) array, which holds 60000x28x28x1 numbers, so they fit exactly.
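A minimal sketch of the same idea, assuming a small hypothetical stand-in array (60 images instead of 60000) so that it runs without downloading MNIST:

```python
import numpy as np

# Hypothetical stand-in for the MNIST training images: 60 samples of 28x28.
train_image = np.zeros((60, 28, 28), dtype=np.uint8)
n_values_before = train_image.size

# Append a channel axis of size 1; -1 lets NumPy infer the number of samples.
train_image = train_image.reshape(-1, 28, 28, 1).astype("float32") / 255.0
print(train_image.shape)  # (60, 28, 28, 1)

# The number of stored values is unchanged; only the shape differs.
print(train_image.size == n_values_before)  # True

# Slicing along the first axis mimics how one batch of 32 is formed.
batch = train_image[:32]
print(batch.shape)  # (32, 28, 28, 1)
```

Using -1 for the first dimension avoids hard-coding 60000 and 10000, so the same line works for both the training and the test array.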
This seems to be one of the most common questions about LSTMs in PyTorch, but I am still unable to figure out what should be the input shape to PyTorch LSTM.
Even after following several posts (1, 2, 3) and trying out the solutions, it doesn't seem to work.
Background: I have encoded text sequences (variable length) in a batch of size 12 and the sequences are padded and packed using pad_packed_sequence functionality. MAX_LEN for each sequence is 384 and each token (or word) in the sequence has a dimension of 768. Hence my batch tensor could have one of the following shapes: [12, 384, 768] or [384, 12, 768].
The batch will be my input to the PyTorch rnn module (lstm here).
According to the PyTorch documentation for LSTMs, its input dimensions are (seq_len, batch, input_size) which I understand as following.
seq_len - the number of time steps in each input stream (feature vector length).
batch - the size of each batch of input sequences.
input_size - the dimension for each input token or time step.
lstm = nn.LSTM(input_size=?, hidden_size=?, batch_first=True)
What should be the exact input_size and hidden_size values here?
You have explained the structure of your input, but you haven't made the connection between your input dimensions and the LSTM's expected input dimensions.
Let's break down your input (assigning names to the dimensions):
batch_size: 12
seq_len: 384
input_size / num_features: 768
That means the input_size of the LSTM needs to be 768.
The hidden_size does not depend on your input; it is the number of features the LSTM should create. That size is used for the hidden state as well as for the output, since the output is simply the hidden state of the last layer at every time step. You have to decide how many features you want the LSTM to use.
Finally, for the input shape, setting batch_first=True requires the input to have the shape [batch_size, seq_len, input_size], in your case that would be [12, 384, 768].
import torch
import torch.nn as nn
# Size: [batch_size, seq_len, input_size]
input = torch.randn(12, 384, 768)
lstm = nn.LSTM(input_size=768, hidden_size=512, batch_first=True)
output, _ = lstm(input)
output.size() # => torch.Size([12, 384, 512])
When an image passes through a CNN and then an LSTM, the feature map shape changes as follows:
BCHW -> BCHW (B x C x 1 x W): the CNN's output should have height 1.
BCHW -> BCW: squeeze the height dimension.
BCW -> BWC: permute the last two dimensions so that the tensor is batch-first for the LSTM layer (as in PyTorch).
The RNN names these dimensions [batch, seq_len, input_size]; for an image they correspond to [batch, width, channel].
Finally, BWC is [batch, seq_len, channel].
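The shape changes can be traced with a short sketch. NumPy is used here as a stand-in so the example runs without a deep learning framework, and the sizes are hypothetical; in PyTorch the corresponding calls would presumably be squeeze(2) and permute(0, 2, 1).

```python
import numpy as np

# Hypothetical CNN output in BCHW layout: batch 4, 256 channels, height 1, width 32.
features = np.zeros((4, 256, 1, 32), dtype=np.float32)

# BCHW -> BCW: drop the height axis, which must have size 1.
bcw = np.squeeze(features, axis=2)
print(bcw.shape)  # (4, 256, 32)

# BCW -> BWC: width becomes the sequence axis, channels become the input features.
bwc = np.transpose(bcw, (0, 2, 1))
print(bwc.shape)  # (4, 32, 256)
```

With these sizes a batch-first LSTM would see seq_len = 32 and input_size = 256.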
I'd like to do transfer learning from a pre-trained model. I'm following the guide for retrain from Tensorflow.
However, I'm stuck on this error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Shapes must be equal rank, but are 3 and 2 for 'input_1/BottleneckInputPlaceholder' (op: 'PlaceholderWithDefault') with input shapes: [1,?,128].
# Last layer of pre-trained model
# `[<tf.Tensor 'embeddings:0' shape=(?, 128) dtype=float32>]`
with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[None, 128],
        name='BottleneckInputPlaceholder')
Any ideas?
This is happening because your bottleneck_tensor has shape [1, ?, 128] while you explicitly state that the shape should be [?, 128]. You can use tf.squeeze to convert the tensor to the required shape. Passing axis=[0] removes only the leading dimension of size 1:
tf.squeeze(bottleneck_tensor, axis=[0])
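One caveat: a squeeze called without an axis removes every dimension of size 1, so a batch that happens to contain a single example would lose its batch dimension as well. The sketch below illustrates this with NumPy, whose np.squeeze follows the same rule as tf.squeeze; the array is a hypothetical stand-in for bottleneck_tensor.

```python
import numpy as np

# Hypothetical bottleneck values with shape [1, batch, 128] and a batch of one.
bottleneck = np.zeros((1, 1, 128), dtype=np.float32)

# Without an axis, both size-1 dimensions are removed.
print(np.squeeze(bottleneck).shape)          # (128,)

# With axis=0, only the leading dimension goes and the rank stays at 2.
print(np.squeeze(bottleneck, axis=0).shape)  # (1, 128)
```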