In the Keras documentation, the input of a Dense layer is described as:
Input shape
nD tensor with shape: (batch_size, ..., input_dim). The most common
situation would be a 2D input with shape (batch_size, input_dim).
To my understanding, the batch size in the input tensor is the number of examples you give for training or predicting.
For the batch_size in model.fit,
batch_size: Integer or None.
Number of samples per gradient update. If unspecified, batch_size will
default to 32.
So do the two batch sizes do the same thing, i.e. split the input data so that memory doesn't fill up completely?
Also, I understand that the batch_size in the input shape is optional, as Keras puts a None if it is not specified. Is specifying batch_size necessary in model.fit?
Both batch_size arguments refer to the same thing, i.e. what you described as how many examples to feed into the model at once.
As for your other question, it is not necessary to specify batch_size in model.fit either. From the model.fit section of the official Keras website (https://keras.io/models/model/): "batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32". This is similar to the input shape.
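To see why the batch dimension can stay None in the input shape while model.fit still slices the data into batches of 32, here is a minimal NumPy sketch. The Dense layer is modeled as a plain x @ W + b (an illustrative assumption, not Keras's actual implementation), and the 100-sample dataset and layer sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, units = 8, 4
# Hypothetical Dense weights: the batch axis never appears in them
W = rng.normal(size=(input_dim, units))
b = np.zeros(units)

def dense(x):
    # x: (batch_size, input_dim) -> (batch_size, units), for any batch_size
    return x @ W + b

X = rng.normal(size=(100, input_dim))  # 100 training samples

batch_size = 32  # the value model.fit falls back to when batch_size is unspecified
batch_shapes = []
for start in range(0, len(X), batch_size):
    batch = X[start:start + batch_size]  # the last batch holds the remaining 4 samples
    batch_shapes.append(dense(batch).shape)

print(batch_shapes)  # [(32, 4), (32, 4), (32, 4), (4, 4)]
```

The same weights handle every batch, including the smaller final one, which is why the batch size does not need to be part of the layer's input shape.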
I'm working in the field of machine learning.
To build a stronger network, I'm going to adopt Conv1D.
The input data is a one-dimensional list, so I thought Conv1D would be the best choice.
What would happen if the input size is (1, 740)? Would it be okay for the input to have only 1 channel?
I mean, I have a feeling that the Conv1D output for a (1, 740) tensor would be the same as that of a simple linear network.
Of course, I'll also include other Conv1d layers, like below.
self.conv1 = torch.nn.Conv1d(in_channels=1, out_channels=64, kernel_size=5)
self.conv2 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)
self.conv3 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)
self.conv4 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)
Would it make sense when an input channel is 1?
Thanks in advance. :)
I think it's fine.
Note that the input of Conv1D should be (B, N, M), where B is the batch size, N is the number of channels (e.g. 3 for RGB) and M is the number of features.
out_channels is the number of filters to use (with kernel_size=5, each filter spans 5 positions along the input; it is not a 5x5 window). Look at the output shape of the following code:
import torch
import torch.nn as nn
k = nn.Conv1d(1, 64, kernel_size=5)
input = torch.randn(1, 1, 740)  # (batch, channels, width)
print(k(input).shape)  # -> torch.Size([1, 64, 736])
The width drops to 736 because no padding is used, so the dimension isn't preserved: 740 - 5 + 1 = 736.
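The same arithmetic applies to each layer of the four-layer stack in the question, and padding can keep the width constant. A minimal sketch, assuming PyTorch; activations are omitted as in the question, and padding=2 is chosen here only because it preserves the width for kernel_size=5.

```python
import torch
import torch.nn as nn

def conv1d_out_width(w, kernel_size, padding=0, stride=1):
    # Output-width formula for nn.Conv1d with dilation=1
    return (w + 2 * padding - kernel_size) // stride + 1

x = torch.randn(1, 1, 740)  # (batch, channels, width)

# Without padding, each kernel_size=5 layer trims 4 positions: 740 -> 736 -> 732 -> 728 -> 724
stack = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=5),
    nn.Conv1d(64, 64, kernel_size=5),
    nn.Conv1d(64, 64, kernel_size=5),
    nn.Conv1d(64, 64, kernel_size=5),
)
print(stack(x).shape)  # torch.Size([1, 64, 724])

# padding=2 on both sides keeps the width at 740
same = nn.Conv1d(1, 64, kernel_size=5, padding=2)
print(same(x).shape)  # torch.Size([1, 64, 740])
```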
The nn.Conv1d layer takes an input of shape (b, c, w), where b is the batch size, c the number of channels, and w the input width. Its kernel is one-dimensional, and it performs a convolution over the width dimension (the batch and channel axes aside). This means the kernel applies the same operation across the whole input, like a sliding window; the same idea holds for 2D and 3D convolutions. As such, its number of weights depends only on kernel_size and the channel counts (out_channels * in_channels * kernel_size, plus out_channels biases), not on w. This is the main characteristic of a convolution layer.
Conv1d can extract features from the input regardless of where they are located in the input data: at the beginning or at the end of your w-wide input. This makes sense if your input is temporal (a sequence over time) or spatial (an image).
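This position independence can be checked directly: place the same short pattern at two different positions and compare the Conv1d responses. A minimal PyTorch sketch; the pattern values, positions, and 8 output channels are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=5)

pattern = torch.tensor([1.0, -2.0, 3.0, -2.0, 1.0])  # hypothetical local feature

x_early = torch.zeros(1, 1, 740)
x_early[..., 100:105] = pattern  # feature near the start

x_late = torch.zeros(1, 1, 740)
x_late[..., 400:405] = pattern   # same feature, shifted by 300 positions

with torch.no_grad():
    y_early = conv(x_early)  # (1, 8, 736)
    y_late = conv(x_late)

shift = 300
# The same filters respond identically; the response is just moved by the shift
same_response = torch.allclose(y_late[..., shift:], y_early[..., :-shift])
print(same_response)  # True
```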
On the other hand, an nn.Linear maps a 1D feature vector (per sample) to another 1D vector. Here w would be the number of input features, so you would end up with w*output_dim weights (plus output_dim biases). If your input contains components that are independent from one another (like a one-hot or multi-hot encoding), then a fully connected layer such as nn.Linear would be preferred.
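The parameter counts above can be compared directly in PyTorch. This sketch assumes w = 740 as in the question and picks output_dim = 64 for the linear layer purely so both layers produce 64 values per position or sample.

```python
import torch.nn as nn

def n_params(module):
    # Total number of trainable values (weights + biases)
    return sum(p.numel() for p in module.parameters())

w = 740
conv = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=5)
linear = nn.Linear(in_features=w, out_features=64)

# Conv1d: out_channels * in_channels * kernel_size weights + out_channels biases
print(n_params(conv))    # 64*1*5 + 64 = 384, independent of w
# Linear: w * output_dim weights + output_dim biases
print(n_params(linear))  # 740*64 + 64 = 47424
```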
These two behave differently. If you use an nn.Linear in a scenario where you should use an nn.Conv1d, training would ideally end up with neurons sharing equal weights, if that makes sense, but in practice it probably won't. Fully connected layers were used in the past in deep learning for computer vision. Today convolutions are used because they are much more efficient and better suited to these kinds of tasks.
I would like to create a Sequential model (a time-series model, as you might have guessed) that takes 20 days of past data with a feature size of 2 and predicts 1 day into the future with the same feature size of 2.
I found out that you need to specify the batch size for a stateful LSTM model, so if I specify a batch size of 32, for example, the final output shape of the model is (32, 2), which I think means the model is predicting 32 days into the future rather than 1.
How would I go about fixing it?
Also, asking before I run into the problem: if I specify a batch size of 32, for example, but I want to predict on an input of shape (1, 20, 2), would the model predict correctly, given that I changed the batch size from 32 to 1? Thank you.
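You don't need to specify batch_size for a regular LSTM (a stateful one does need a fixed batch size, e.g. via Input(shape=(20, 2), batch_size=32)), but you should feed it a 3D tensor: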
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras import Sequential
features = 2
dim = 128
new_model = Sequential([
    LSTM(dim, stateful=True, return_sequences=True),
    Dense(features)
])
number_of_sequences = 1000
sequence_length = 20
x = tf.random.uniform([number_of_sequences, sequence_length, features], dtype=tf.float32)
output = new_model(x)  # shape is (number_of_sequences, sequence_length, features)
predicted = output[:, -1]  # last time step; shape is (number_of_sequences, features)
A shape of (32, 2) does not mean the model predicts 32 days ahead: the first dimension is the batch, so it is one 2-feature prediction for each of the 32 sequences in the batch.
Batch size is a training parameter (how many sequences are fed to the model before the error is backpropagated; see stochastic gradient descent). It doesn't change your data, which should be 3D: (number of sequences, sequence length, features).
If you need to predict only one sequence, just feed a tensor of shape (1, 20, 2) to the model (with a stateful layer, this only works if the model was built with a batch size of 1).
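The role of the batch axis can be illustrated with a small PyTorch analogue of the model described in the question (this is not the Keras model itself: the class name LastStepForecaster, hidden size 128, and the "use only the last time step" head are assumptions for illustration). It reads 20 steps of 2 features and predicts the next step's 2 features, and the first output dimension simply follows the batch.

```python
import torch
import torch.nn as nn

class LastStepForecaster(nn.Module):
    """Hypothetical analogue: 20 steps x 2 features in, next step's 2 features out."""

    def __init__(self, features=2, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, features)

    def forward(self, x):             # x: (batch, 20, features)
        out, _ = self.lstm(x)         # (batch, 20, hidden)
        return self.head(out[:, -1])  # last time step only -> (batch, features)

model = LastStepForecaster()
with torch.no_grad():
    y32 = model(torch.randn(32, 20, 2))  # one prediction per sequence in the batch
    y1 = model(torch.randn(1, 20, 2))    # a single sequence
print(y32.shape, y1.shape)  # torch.Size([32, 2]) torch.Size([1, 2])
```

Because this analogue is not stateful, the same weights accept any batch size at prediction time.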
This seems to be one of the most common questions about LSTMs in PyTorch, but I am still unable to figure out what the input shape to a PyTorch LSTM should be.
Even after following several posts (1, 2, 3) and trying out the solutions, it doesn't seem to work.
Background: I have encoded text sequences (variable length) in a batch of size 12, and the sequences are padded and packed using PyTorch's packing utilities (pack_padded_sequence / pad_packed_sequence). MAX_LEN for each sequence is 384, and each token (or word) in the sequence has a dimension of 768. Hence my batch tensor could have one of the following shapes: [12, 384, 768] or [384, 12, 768].
The batch will be my input to the PyTorch rnn module (lstm here).
According to the PyTorch documentation for LSTMs, its input dimensions are (seq_len, batch, input_size), which I understand as follows.
seq_len - the number of time steps in each input stream (feature vector length).
batch - the size of each batch of input sequences.
input_size - the dimension for each input token or time step.
lstm = nn.LSTM(input_size=?, hidden_size=?, batch_first=True)
What should be the exact input_size and hidden_size values here?
You have explained the structure of your input, but you haven't made the connection between your input dimensions and the LSTM's expected input dimensions.
Let's break down your input (assigning names to the dimensions):
batch_size: 12
seq_len: 384
input_size / num_features: 768
That means the input_size of the LSTM needs to be 768.
The hidden_size does not depend on your input; it is the number of features the LSTM should create. That size is used for the hidden state as well as the output, since the output is the sequence of hidden states (one per time step). You have to decide how many features you want the LSTM to use.
Finally, for the input shape, setting batch_first=True requires the input to have the shape [batch_size, seq_len, input_size], in your case that would be [12, 384, 768].
import torch
import torch.nn as nn
# Size: [batch_size, seq_len, input_size]
input = torch.randn(12, 384, 768)
lstm = nn.LSTM(input_size=768, hidden_size=512, batch_first=True)
output, _ = lstm(input)
output.size() # => torch.Size([12, 384, 512])
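To make the link between hidden_size, the output, and the final hidden state concrete, here is a small PyTorch check. It reuses the sizes above but a shorter sequence (10 steps, an arbitrary choice to keep it quick) and unpadded random input; with padded sequences the last output step would correspond to padding rather than to each sequence's true end.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(12, 10, 768)  # [batch_size, seq_len, input_size]
lstm = nn.LSTM(input_size=768, hidden_size=512, batch_first=True)

with torch.no_grad():
    output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([12, 10, 512]): hidden state at every time step
print(h_n.shape)     # torch.Size([1, 12, 512]): (num_layers, batch, hidden_size), not batch-first
# For a single-layer, unidirectional LSTM, the last output step is the final hidden state
print(torch.allclose(output[:, -1], h_n[0]))  # True
```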
When an image is passed through a CNN layer and then an LSTM layer, the feature map shape changes like this:
BCHW -> BCHW with H = 1 (B x C x 1 x W), i.e. the CNN's output should have a height of 1.
Then squeeze the height dimension:
BCHW -> BCW
In an RNN the dimensions are named [batch, seq_len, input_size]; for the image they correspond to [batch, width, channel].
BCW -> BWC: this is the batch_first tensor for the LSTM layer (as in PyTorch).
Finally:
BWC is [batch, seq_len, channel].
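The shape changes above can be sketched in PyTorch. The sizes and the single Conv2d whose kernel is as tall as the image (which is what collapses the height to 1) are hypothetical choices; a real model would typically use a deeper CNN that pools the height down to 1.

```python
import torch
import torch.nn as nn

B, C, H, W = 4, 64, 32, 100  # hypothetical sizes
images = torch.randn(B, 1, H, W)

# Hypothetical CNN: a kernel spanning the full height collapses H to 1; padding keeps W
cnn = nn.Conv2d(in_channels=1, out_channels=C, kernel_size=(H, 3), padding=(0, 1))
lstm = nn.LSTM(input_size=C, hidden_size=128, batch_first=True)

fmap = cnn(images)          # BCHW with H = 1: (4, 64, 1, 100)
seq = fmap.squeeze(2)       # BCW: (4, 64, 100)
seq = seq.permute(0, 2, 1)  # BWC: (4, 100, 64) = [batch, seq_len, channel]
out, _ = lstm(seq)          # (4, 100, 128)
```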
I am looking here at the Bahdanau attention class. I noticed that the final shape of the context vector is (batch_size, hidden_size). I am wondering how they got that shape given that attention_weights has shape (batch_size, 64, 1) and features has shape (batch_size, 64, embedding_dim). They multiplied the two (I believe it is a matrix product) and then summed up over the first axis. Where is the hidden size coming from in the context vector?
The context vector resulting from Bahdanau attention is a weighted average of all the hidden states of the encoder. The following image from Ref shows how this is calculated. Essentially we do the following.
Compute the attention weights, a tensor of size (batch size, encoder time steps, 1).
Multiply each encoder hidden state (batch size, hidden size) element-wise by its attention weight, resulting in (batch size, encoder time steps, hidden size).
Sum over the time dimension, resulting in (batch size, hidden size); because the weights sum to 1, this is a weighted average.
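The weighting and summing steps above can be sketched in NumPy. The sizes (batch 4, 64 locations, vector size 256) and the random scores are assumptions standing in for the real alignment model; the point is that the context's last dimension comes from the vectors being weighted, not from the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, dim = 4, 64, 256  # hypothetical sizes

values = rng.normal(size=(batch, steps, dim))  # vectors being attended over
scores = rng.normal(size=(batch, steps, 1))    # stand-in for the alignment scores

# Softmax over the time/location axis (axis=1)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # (batch, steps, 1), sums to 1 per example

weighted = weights * values     # element-wise with broadcasting: (batch, steps, dim)
context = weighted.sum(axis=1)  # (batch, dim)

print(context.shape)  # (4, 256)
```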
The answer given is incorrect. Let me explain why first, before I share what the actual answer is.
Take a look at the relevant code in the hyperlink provided. The 'hidden size' in the code refers to the dimension of the hidden state of the decoder and NOT the hidden state(s) of the encoder, as the answer above assumed. The multiplication in the code will yield (batch_size, embedding_dim), as the asker (mg_nt) rightly points out. The context is a weighted sum of the encoder outputs and SHOULD have the SAME dimension as the encoder outputs. Mathematically, too, one should NOT get (batch size, hidden size).
Of course, in this case they are using attention over a CNN, so there is no encoder as such; instead, the image is broken down into features. These features are collected from the second-to-last layer, and each feature is a specific component of the overall image. The hidden state from the decoder, i.e. the query, attends to all these features and decides which ones are important and should be given a higher weight to determine the next word in the caption. The features in the code have shape (batch_size, 64, embedding_dim), so the context, after each feature is magnified or diminished by its attention weight and the results are summed over the 64 locations, has shape (batch_size, embedding_dim)!
This is simply a mistake in the comments of the relevant code (the code itself seems right). The shapes mentioned in the comments are incorrect. If you search the code for 'hidden_size', there is no such variable; it is only mentioned in the comments. If you further look at the declarations of the encoder and decoder, they use the same embedding size for both. So the code works, but the comments in it are misleading and incorrect. That is all there is to it.
This is an image of the LSTM model code; please help me find the appropriate input_dim value for the first LSTM layer.
For the code you gave, all possible answers are wrong.
An LSTM layer accepts a 3D input tensor of shape (batch_dim, time_dim, feat_dim), and you should write input_shape=(time_dim, feat_dim) in the layer definition.
However, since you use X_train = np.expand_dims(X_train, axis=0), there is only one training sample in your data, which makes no sense. I therefore suspect what you really want to do is
X_train = np.expand_dims(X_train, axis=-1)
This gives X_train.shape[0] samples, X_train.shape[1] time steps, and only 1 feature dimension, which is common in many time-series analysis problems.
If my guess is correct, then the input_shape of your LSTM should be (X_train.shape[1], 1).
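The difference between the two expand_dims calls is easiest to see on a concrete array. A NumPy sketch, assuming a hypothetical X_train of 500 series with 30 time steps each (the real data's sizes are not given).

```python
import numpy as np

X_train = np.zeros((500, 30))  # hypothetical: 500 series, 30 time steps each

wrong = np.expand_dims(X_train, axis=0)   # (1, 500, 30): 1 sample, 500 steps, 30 features
right = np.expand_dims(X_train, axis=-1)  # (500, 30, 1): 500 samples, 30 steps, 1 feature

input_shape = right.shape[1:]  # what the first LSTM layer's input_shape would be
print(wrong.shape, right.shape, input_shape)  # (1, 500, 30) (500, 30, 1) (30, 1)
```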
NOTE: batch_dim is intentionally left unspecified in Keras. This makes sense: if you included it in your model definition, you would have to use that particular batch size for both training and testing, which is very inconvenient.