I am getting confused about the input shape to GRU layer.
I have a batch of 128 images and I extracted 9 features from each images.
So now my shape is (1,128,9).
This is the GRU layer
gru=torch.nn.GRU(input_size=128,hidden_size=8,batch_first=True)
Question 1: Is the input_size=128 correctly defined?
Here is the code of forward function
def forward(features):
features=features.permute(0,2,1)#[1, 9, 128]
x2,_=self.gru(features)
Question 2: Is the `code in forward function is correctly defined?
Thanks
No, input_size is not correctly defined. Here, input_size means the number of features in a single input vector of the sequence. The input to the GRU is a sequence of vectors, each input being a 1-D tensor of length input_size. In case of batched input, the input to GRU is a batch of sequence of vectors, so the shape should be (batch_size, sequence_length, input_size) when batch_first=True otherwise the expected shape is (sequence_length, batch_size, input_size) when batch_first=False
import torch
batch_size = 128
input_size = 9 # features in the input
seq_len = 5 # seqence length - how many input vectors in one sequence
hidden_size = 20 # the no of fetures in the output of GRU
gru=torch.nn.GRU(input_size=input_size,hidden_size=hidden_size,batch_first=True)
X = torch.rand( (batch_size, seq_len, input_size), dtype = torch.float32 )
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
output
"""
X.shape=torch.Size([128, 5, 9])
Y.shape=torch.Size([128, 5, 20])
"""
Using batch_first=False
gru=torch.nn.GRU(input_size=input_size,hidden_size=hidden_size,batch_first=False)
X = torch.rand( (seq_len, batch_size, input_size), dtype = torch.float32 )
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
output
"""
X.shape=torch.Size([5, 128, 9])
Y.shape=torch.Size([5, 128, 20])
"""
Related
I am trying to build a binary temporal image classifier by combining ResNet18 and an LSTM. However, I have never really used RNNs before and have been struggling on getting the correct output shape.
I am using a batch size of 128 and a sequence size of 32. The images are 80x80 grayscale images.
The current model is:
class CNNLSTM(nn.Module):
def __init__(self):
super(CNNLSTM, self).__init__()
self.resnet = models.resnet18(pretrained=False)
self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
self.resnet.fc = nn.Sequential(nn.Linear(in_features=512, out_features=256, bias=True))
self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3)
self.fc1 = nn.Linear(256, 128)
self.fc2 = nn.Linear(128, 1)
def forward(self, x_3d):
#x3d: torch.Size([128, 32, 1, 80, 80])
hidden = None
toret = []
for t in range(x_3d.size(1)):
x = self.resnet(x_3d[:, t, :, :, :])
out, hidden = self.lstm(x.unsqueeze(0), hidden)
x = self.fc1(out[-1, :, :])
x = F.relu(x)
x = self.fc2(x)
print("x shape: ", x.shape)
toret.append(x)
return torch.stack(toret)
Which returns a tensor of shape torch.Size([32, 128, 1]) which, according to what I understand, means that every nth row represents the nth time step of each element in the sequence.
How can I get output of shape 128x1x32 instead?
And is there a better way to do this?
You could permute the dimensions:
a = torch.rand(32, 128, 1)
a = a.permute(1, 2, 0) # these are the indices of the original dimensions
print(a.shape)
>> torch.Size([128, 1, 32])
But you could also set batch_first=True in the LSTM module:
self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3, batch_first=True)
This will expect that the input to the LSTM has the shape batch-size x seq-len x features and will output a tensor in the same way.
I have a network which outputs a 3D tensor of size (batch_size, max_len, num_classes). My groud truth is in the shape (batch_size, max_len). If I do perform one-hot encoding on the labels, it'll be of shape (batch_size, max_len, num_classes) i.e the values in max_len are integers in the range [0, num_classes]. Since the original code is too long, I have written a simpler version that reproduces the original error.
criterion = nn.CrossEntropyLoss()
batch_size = 32
max_len = 350
num_classes = 1000
pred = torch.randn([batch_size, max_len, num_classes])
label = torch.randint(0, num_classes,[batch_size, max_len])
pred = nn.Softmax(dim = 2)(pred)
criterion(pred, label)
the shape of pred and label are respectively,torch.Size([32, 350, 1000]) and torch.Size([32, 350])
The error encountered is
ValueError: Expected target size (32, 1000), got torch.Size([32, 350, 1000])
If I one-hot encode labels for computing the loss
x = nn.functional.one_hot(label)
criterion(pred, x)
it'll throw the following error
ValueError: Expected target size (32, 1000), got torch.Size([32, 350, 1000])
From the Pytorch documentation, CrossEntropyLoss expects the shape of its input to be (N, C, ...), so the second dimension is always the number of classes. Your code should work if you reshape preds to be of size (batch_size, num_classes, max_len).
I'm having some inconsistencies with the output of a encoder I got from this github .
The encoder looks as follows:
class Encoder(nn.Module):
r"""Applies a multi-layer LSTM to an variable length input sequence.
"""
def __init__(self, input_size, hidden_size, num_layers,
dropout=0.0, bidirectional=True, rnn_type='lstm'):
super(Encoder, self).__init__()
self.input_size = 40
self.hidden_size = 512
self.num_layers = 8
self.bidirectional = True
self.rnn_type = 'lstm'
self.dropout = 0.0
if self.rnn_type == 'lstm':
self.rnn = nn.LSTM(input_size, hidden_size, num_layers,
batch_first=True,
dropout=dropout,
bidirectional=bidirectional)
def forward(self, padded_input, input_lengths):
"""
Args:
padded_input: N x T x D
input_lengths: N
Returns: output, hidden
- **output**: N x T x H
- **hidden**: (num_layers * num_directions) x N x H
"""
total_length = padded_input.size(1) # get the max sequence length
packed_input = pack_padded_sequence(padded_input, input_lengths,
batch_first=True,enforce_sorted=False)
packed_output, hidden = self.rnn(packed_input)
pdb.set_trace()
output, _ = pad_packed_sequence(packed_output, batch_first=True, total_length=total_length)
return output, hidden
So it only consists of a rnn lstm cell, if I print the encoder this is the output:
LSTM(40, 512, num_layers=8, batch_first=True, bidirectional=True)
So it should have a 512 sized output right? But when I feed a tensor with size torch.Size([16, 1025, 40]) 16 samples of 1025 vectors with size 40 (that gets packed to fit the RNN) the output that I get from the RNN has a new encoded size of 1024 torch.Size([16, 1025, 1024]) when it should have been encoded to 512 right?
Is there something Im missing?
Setting bidirectional=True makes the LSTM bidirectional, which means there will be two LSTMs, one that goes from left to right and the other that goes from right to left.
From the nn.LSTM documentation - Outputs:
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
Your output has the size [batch, seq_len, 2 * hidden_size] (batch and seq_len are swapped in your case due to setting batch_first=True) because of using a bidirectional LSTM. The outputs of the two are concatenated in order to have the information of both, which you could easily separate if you wanted to treat them differently.
I want to create a basic LSTM network that accept sequences of 5 dimensional vectors (for example as a N x 5 arrays) and returns the corresponding sequences of 4 dimensional hidden- and cell-vectors (N x 4 arrays), where N is the number of time steps.
How can I do it TensorFlow?
ADDED
So, far I got the following code working:
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
timesteps = 18
num_input = 5
X = tf.placeholder("float", [None, timesteps, num_input])
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
However, there are many open questions:
Why number of time steps is preset? Shouldn't LSTM be able to accept sequences of arbitrary length?
Why do we split data by time-steps (using unstack)?
How to interpret the "outputs" and "states"?
Why number of time steps is preset? Shouldn't LSTM be able to accept
sequences of arbitrary length?
If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn.You can refer here to understand the difference between them.
For example:
num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units = num_units)
num_input = 5
X = tf.placeholder("float", [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
x_val = np.random.normal(size = (12,18,5))
res = sess.run(outputs, feed_dict = {X:x_val})
x_val = np.random.normal(size = (12,16,5))
res = sess.run(outputs, feed_dict = {X:x_val})
sess.close()
dynamic_rnn require same length in one batch , but you can specify every length using the sequence_length parameter after you pad batch data when you need arbitrary length in one batch.
We do we split data by time-steps (using unstack)?
Just static_rnn needs to split data with unstack,this depending on their different input requirements. The input shape of static_rnn is [timesteps,batch_size, features], which is a list of 2D tensors of shape [batch_size, features]. But the input shape of dynamic_rnn is either [timesteps,batch_size, features] or [batch_size,timesteps, features] depending on time_major is True or False.
How to interpret the "outputs" and "states"?
The shape of states is [2,batch_size,num_units ] in LSTMCell, one [batch_size, num_units ] represents C and the other [batch_size, num_units ] represents h. You can see pictures below.
In the same way, You will get the shape of states is [batch_size, num_units ] in GRUCell.
outputs represents the output of each time step, so by default(time_major=False) its shape is [batch_size, timesteps, num_units]. And You can easily conclude that
state[1, batch_size, : ] == outputs[ batch_size, -1, : ].
I am creating an RNN with pytorch, it looks like this:
class MyRNN(nn.Module):
def __init__(self, batch_size, n_inputs, n_neurons, n_outputs):
super(MyRNN, self).__init__()
self.n_neurons = n_neurons
self.batch_size = batch_size
self.n_inputs = n_inputs
self.n_outputs = n_outputs
self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons)
self.FC = nn.Linear(self.n_neurons, self.n_outputs)
def init_hidden(self, ):
# (num_layers, batch_size, n_neurons)
return torch.zeros(1, self.batch_size, self.n_neurons)
def forward(self, X):
self.batch_size = X.size(0)
self.hidden = self.init_hidden()
lstm_out, self.hidden = self.basic_rnn(X, self.hidden)
out = self.FC(self.hidden)
return out.view(-1, self.n_outputs)
My input x looks like this:
tensor([[-1.0173e-04, -1.5003e-04, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01],
[-1.1193e-04, -1.6928e-04, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01],
...
[-6.9490e-05, -8.9197e-05, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01]],
dtype=torch.float64)
and is a batch of 64 vectors with size 15.
When trying to test this model by doing:
BATCH_SIZE = 64
N_INPUTS = 15
N_NEURONS = 150
N_OUTPUTS = 1
model = MyRNN(BATCH_SIZE, N_INPUTS, N_NEURONS, N_OUTPUTS)
model(x)
I get the following error:
File "/home/tt/anaconda3/envs/venv/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 126, in check_forward_args
expected_input_dim, input.dim()))
RuntimeError: input must have 3 dimensions, got 2
How can I fix it?
You are missing one of the required dimensions for the RNN layer.
Per the documentation, your input size needs to be of shape (sequence length, batch, input size).
So - with the example above, you are missing one of these. Based on your variable names, it appears you are trying to pass 64 examples of 15 inputs each... if that’s true, you are missing sequence length.
With an RNN, the sequence length is the number of times you want the layer to recur. For example, in NLP your sequence length might be equal to the number of words in a sentence, while batch size would be the number of sentences you are passing, and input size would be the vector size of each word.
You might not need an RNN here if you are just trying to do use 64 samples of size 15.
See the documentation, the RNN layer expects
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.
In your case it seems that your "size" is the length of the sequence, and you have one feature at every timestep. Edited for 15 features, one timestep
# 15 features, 150 neurons
rnn = nn.RNN(15, 150)
# sequence of length 1, batch size 64, 15 features
x = torch.rand(1, 64, 15)
res, _ = rnn(x)
print(res.shape)
# => torch.Size([1, 64, 150])
Also note that you don't need to prespecify batch size.