PyTorch: inconsistent size with pad_packed_sequence (seq2seq)

I'm getting an unexpected output size from an encoder I took from this GitHub repository.
The encoder looks as follows:
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    r"""Applies a multi-layer LSTM to a variable-length input sequence.
    """
    def __init__(self, input_size, hidden_size, num_layers,
                 dropout=0.0, bidirectional=True, rnn_type='lstm'):
        super(Encoder, self).__init__()
        self.input_size = input_size        # 40 in this example
        self.hidden_size = hidden_size      # 512
        self.num_layers = num_layers        # 8
        self.bidirectional = bidirectional  # True
        self.rnn_type = rnn_type            # 'lstm'
        self.dropout = dropout              # 0.0
        if self.rnn_type == 'lstm':
            self.rnn = nn.LSTM(input_size, hidden_size, num_layers,
                               batch_first=True,
                               dropout=dropout,
                               bidirectional=bidirectional)

    def forward(self, padded_input, input_lengths):
        """
        Args:
            padded_input: N x T x D
            input_lengths: N
        Returns: output, hidden
            - **output**: N x T x H
            - **hidden**: (num_layers * num_directions) x N x H
        """
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True, enforce_sorted=False)
        packed_output, hidden = self.rnn(packed_input)
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output, hidden
So it consists of only an LSTM. If I print the encoder, this is the output:
LSTM(40, 512, num_layers=8, batch_first=True, bidirectional=True)
So it should have an output of size 512, right? But when I feed in a tensor of size torch.Size([16, 1025, 40]) (16 samples of 1025 vectors of size 40, which gets packed to fit the RNN), the output I get from the RNN has an encoded size of 1024, torch.Size([16, 1025, 1024]), when it should have been encoded to 512, right?
Is there something I'm missing?

Setting bidirectional=True makes the LSTM bidirectional, which means there will be two LSTMs, one that goes from left to right and the other that goes from right to left.
From the nn.LSTM documentation - Outputs:
output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
Your output has size [batch, seq_len, 2 * hidden_size] = [16, 1025, 1024] because you are using a bidirectional LSTM (batch and seq_len are swapped relative to the documentation because you set batch_first=True). The outputs of the two directions are concatenated so that the information from both is available, and you can easily separate them if you want to treat them differently.
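To see where the 1024 comes from, here is a small sketch with hypothetical sizes (a batch of 4, 2 layers instead of 8, and made-up sequence lengths). It runs a packed bidirectional LSTM, splits the last dimension into the two directions as the documentation describes, and checks that the forward half at each sequence's last valid step and the backward half at step 0 match the top layer's final hidden states in h_n:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
# Hypothetical sizes; hidden_size matches the question
batch, seq_len, input_size, hidden_size, num_layers = 4, 7, 40, 512, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers,
               batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)
lengths = torch.tensor([7, 5, 3, 6])  # hypothetical valid lengths per sample
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True, total_length=seq_len)
print(out.shape)  # torch.Size([4, 7, 1024]): 2 directions * 512

# Split the last dimension into (num_directions, hidden_size)
out_dirs = out.view(batch, seq_len, 2, hidden_size)
forward_out = out_dirs[:, :, 0, :]   # left-to-right direction
backward_out = out_dirs[:, :, 1, :]  # right-to-left direction

# h_n is (num_layers * 2, batch, hidden_size); its last two rows are the
# top layer's forward and backward final states
for i, length in enumerate(lengths.tolist()):
    assert torch.allclose(forward_out[i, length - 1], h_n[-2, i], atol=1e-5)
    assert torch.allclose(backward_out[i, 0], h_n[-1, i], atol=1e-5)
```

If you only need a 512-dimensional encoding, you could either set bidirectional=False or combine the two halves yourself (for example by summing them), depending on what the decoder expects.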

Related

Input and output shape to GRU layer in PyTorch

I am confused about the input shape of a GRU layer.
I have a batch of 128 images and I extracted 9 features from each image.
So now my shape is (1,128,9).
This is the GRU layer:
gru=torch.nn.GRU(input_size=128,hidden_size=8,batch_first=True)
Question 1: Is the input_size=128 correctly defined?
Here is the code of the forward function:
def forward(self, features):
    features = features.permute(0, 2, 1)  # [1, 9, 128]
    x2, _ = self.gru(features)
Question 2: Is the code in the forward function correctly defined?
Thanks
No, input_size is not correctly defined. Here, input_size means the number of features in a single input vector of the sequence. The input to the GRU is a sequence of vectors, each a 1-D tensor of length input_size. For batched input, the GRU takes a batch of sequences of vectors, so the shape should be (batch_size, sequence_length, input_size) when batch_first=True, and (sequence_length, batch_size, input_size) when batch_first=False.
import torch
batch_size = 128
input_size = 9 # features in the input
seq_len = 5 # sequence length: how many input vectors in one sequence
hidden_size = 20 # the number of features in the output of the GRU
gru=torch.nn.GRU(input_size=input_size,hidden_size=hidden_size,batch_first=True)
X = torch.rand( (batch_size, seq_len, input_size), dtype = torch.float32 )
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
output
"""
X.shape=torch.Size([128, 5, 9])
Y.shape=torch.Size([128, 5, 20])
"""
Using batch_first=False
gru=torch.nn.GRU(input_size=input_size,hidden_size=hidden_size,batch_first=False)
X = torch.rand( (seq_len, batch_size, input_size), dtype = torch.float32 )
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
output
"""
X.shape=torch.Size([5, 128, 9])
Y.shape=torch.Size([5, 128, 20])
"""

PyTorch: Sizes of tensors must match on 2 input neural network

I am attempting to recreate a 2 input neural network from this article: https://towardsdatascience.com/moving-from-keras-to-pytorch-f0d4fff4ce79
I have copied the network described in the post and adjusted it to fit my data. The first input comes from GloVe word embeddings, while the second consists of numerical features about the text data.
class Net(nn.Module):
    def __init__(self, hidden_size, lin_size, embedding_matrix=embedding_weights):
        super(Net, self).__init__()
        # Initialize some parameters for your model
        self.hidden_size = hidden_size
        drp = 0.1
        # Layer 1: Embeddings.
        self.embedding = nn.Embedding(size_of_vocabulary, pretrained_embedding_dim)
        self.embedding.weight = nn.Parameter(torch.tensor(embedding_matrix, dtype=torch.float32))
        self.embedding.weight.requires_grad = False
        # Layer 2: Dropout1D(0.1)
        self.embedding_dropout = nn.Dropout2d(0.1)
        # Layer 3: Bidirectional CuDNNLSTM
        self.lstm = nn.LSTM(pretrained_embedding_dim, hidden_size, bidirectional=True, batch_first=True)
        # Layer 4: Bidirectional CuDNNGRU
        self.gru = nn.GRU(hidden_size*2, hidden_size, bidirectional=True, batch_first=True)
        # Layer 7: A dense layer
        self.linear = nn.Linear(hidden_size*6 + X2_train.shape[1], lin_size)
        self.relu = nn.ReLU()
        # Layer 8: A dropout layer
        self.dropout = nn.Dropout(drp)
        # Layer 9: Output dense layer with one output for our Binary Classification problem.
        self.out = nn.Linear(lin_size, 1)

    def forward(self, x):
        '''
        Here x[0] represents the first element of the input that is going to be passed.
        We are going to pass a tuple where the first element contains the sequences (x[0])
        and the second one is an additional feature vector (x[1]).
        '''
        h_embedding = self.embedding(x[0].long())
        h_embedding = torch.squeeze(self.embedding_dropout(torch.unsqueeze(h_embedding, 0)))
        #print("emb", h_embedding.size())
        h_lstm, _ = self.lstm(h_embedding)
        # print("lst",h_lstm.size())
        h_gru, hh_gru = self.gru(h_lstm)
        hh_gru = hh_gru.view(-1, 2*self.hidden_size)
        print("gru", h_gru.size())
        print("h_gru", hh_gru.size())
        # Layer 5: is defined dynamically as an operation on tensors.
        avg_pool = torch.mean(h_gru, 1)
        max_pool, _ = torch.max(h_gru, 1)
        print("avg_pool", avg_pool.size())
        print("max_pool", max_pool.size())
        # the extra features you want to give to the model
        f = torch.tensor(x[1], dtype=torch.float).cuda()
        print("f", f.size())
        # Layer 6: A concatenation of the last state, maximum pool, average pool and
        # additional features
        conc = torch.cat((hh_gru, avg_pool, max_pool, f), 1)
        #print("conc", conc.size())
        # passing conc through linear and relu ops
        conc = self.relu(self.linear(conc))
        conc = self.dropout(conc)
        out = self.out(conc)
        # return the final output
        return out
And during runtime I get an error on the concatenation line:
RuntimeError: Sizes of tensors must match except in dimension 0. Got 33164 and 20 (The offending index is 0)
From the dimensions of the outputs, I can see where the problem lies, but I am not sure how to fix it.
The shapes of the inputs to the network are:
torch.Size([20, 150])
torch.Size([33164, 40])
The output sizes of each layer are:
gru torch.Size([20, 150, 80])
h_gru torch.Size([20, 80])
avg_pool torch.Size([20, 80])
max_pool torch.Size([20, 80])
f torch.Size([33164, 40])
In the example above, the batch size is 20, hidden_size is 40, the numerical feature matrix has 33164 rows, and each row has 40 features.
Thanks for any help in advance
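The shapes above are consistent with the numerical features being passed as the whole 33164-row matrix while the sequences arrive in batches of 20: torch.cat along dim 1 requires every tensor to have the same size in dim 0. A minimal sketch of one common fix, assuming both inputs can be wrapped in a TensorDataset so that a DataLoader slices them together (the sizes, random data, and stand-in pooled tensors are all hypothetical):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
# Hypothetical small sizes standing in for the real data
n_rows, seq_len, n_num_feats, hidden_size = 100, 150, 40, 40
X1 = torch.randint(0, 1000, (n_rows, seq_len))  # token ids, one row per text
X2 = torch.randn(n_rows, n_num_feats)           # numerical features, one row per text
y = torch.randint(0, 2, (n_rows, 1)).float()

# Batch both inputs together so row i of X2 stays aligned with row i of X1
loader = DataLoader(TensorDataset(X1, X2, y), batch_size=20, shuffle=True)
seq_batch, num_batch, y_batch = next(iter(loader))
batch = seq_batch.size(0)

# Random stand-ins for hh_gru, avg_pool and max_pool, each [batch, 2 * hidden_size]
hh_gru, avg_pool, max_pool = (torch.randn(batch, 2 * hidden_size) for _ in range(3))

# Concatenating with the full X2 reproduces the mismatch (20 rows vs 100 rows)
try:
    torch.cat((hh_gru, avg_pool, max_pool, X2), dim=1)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("cat failed:", err)

# Concatenating with the per-batch rows works: 3 * 80 + 40 = 280 columns
conc = torch.cat((hh_gru, avg_pool, max_pool, num_batch), dim=1)
print(conc.shape)  # torch.Size([20, 280])
```

In the model above, forward would then receive (seq_batch, num_batch) as x, and 280 matches hidden_size*6 + X2_train.shape[1] in the definition of self.linear.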

Which output should I use for prediction with LSTM for sequenced data?

I'm still new to machine learning and deep learning. I am currently trying to predict time series data with an LSTM in PyTorch. The problem is that I don't understand which output I should use for my final prediction.
My code is given below:
class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, seq_len, dropout):
        super(Model, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout = dropout
        self.seq_len = seq_len
        self.lstm = nn.LSTM(
            input_size = self.input_size,
            hidden_size = self.hidden_size,
            dropout = self.dropout
        )
        self.linear = nn.Linear(self.hidden_size, self.output_size)

    def reset_hidden_state(self):
        self.hidden = (
            torch.zeros(1, self.seq_len, self.hidden_size),
            torch.zeros(1, self.seq_len, self.hidden_size)
        )

    def forward(self, sequences):
        lstm_out, self.hidden = self.lstm(sequences, self.hidden)
        y_pred = self.linear(lstm_out[-1, :, :])
        return y_pred
mymodel = Model(5, 10, 1, 3, 0.0)
inps = torch.randn(10, 3, 5) #input
#print(inps)
mymodel.reset_hidden_state()
out = mymodel.forward(inps)
print(out.shape)
print(out)
output:
torch.Size([3, 1])
tensor([[-0.0996],
[-0.0587],
[-0.0421]], grad_fn=<AddmmBackward>)
As you can see, this gives me three outputs, but my output size is 1 because I am trying to predict only 1 variable. So, in this case, which value should I use for my final prediction? Or is it even possible to predict only 1 value for sequential data like this?
NB: My Python version is 3.7.4 and my PyTorch version is 1.4.0.
And sorry if I have made any mistakes while asking the question. This is my first time asking a question here.
You are already using the correct output of the LSTM, which is the last hidden state. Conveniently, that is also the last element in lstm_out, which you are using as lstm_out[-1, :, :].
The input of the model, inps, contains multiple sequences, because its size is [seq_len, batch_size, num_features] = [10, 3, 5]. That means you have 3 independent sequences, each with 10 time steps and 5 features per time step.
Therefore, out (size: [3, 1]) contains the predictions for each of the 3 sequences: out[0][0] is the prediction for the first sequence, out[1][0] for the second, and out[2][0] for the third. You can also get rid of the size-1 second dimension with out.squeeze(1), so you have a 1D tensor with the 3 predictions.
If you want to predict only a single sequence, you would use a batch size of 1, which means the input would have size [10, 1, 5] instead, then you get a single value back, even though it's in a tensor of size [1, 1].
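A minimal sketch of both points, using the question's sizes but a plain nn.LSTM and nn.Linear instead of the Model class (random inputs, and the initial state is left to default to zeros): the last time step of lstm_out equals the final hidden state, squeeze(1) gives a 1D tensor of predictions, and a batch of size 1 yields a single prediction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=5, hidden_size=10)  # default layout: (seq_len, batch, features)
linear = nn.Linear(10, 1)

inps = torch.randn(10, 3, 5)       # 3 independent sequences, 10 steps, 5 features each
lstm_out, (h_n, c_n) = lstm(inps)  # initial (h, c) default to zeros when omitted
last_step = lstm_out[-1]           # [3, 10]: top layer at the final time step
# For a unidirectional LSTM this equals the top layer's final hidden state
assert torch.allclose(last_step, h_n[-1])

out = linear(last_step)            # [3, 1]: one prediction per sequence
preds = out.squeeze(1)             # [3]: drop the size-1 second dimension
print(out.shape, preds.shape)      # torch.Size([3, 1]) torch.Size([3])

# A single sequence is just a batch of size 1
single_out, _ = lstm(torch.randn(10, 1, 5))
single = linear(single_out[-1])
print(single.shape)                # torch.Size([1, 1])
```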

Looping over a PyTorch LSTM

I am training a seq2seq model for machine translation in PyTorch. I would like to gather the cell state at every time step while keeping the flexibility of multiple layers and bidirectionality that the PyTorch LSTM module provides.
To this end, I have the following encoder and forward method, where I loop over the LSTM module. The problem is that the model does not train very well. Right after the loop, the commented-out line shows the normal way to use the LSTM module, and with that, the model trains.
So, is the loop not a valid way to do this?
class encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.input_dim = input_dim
        self.emb_dim = emb_dim
        self.hid_dim = hid_dim
        self.n_layers = n_layers
        self.dropout = dropout
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout = dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src):
        #src = [src sent len, batch size]
        embedded = self.dropout(self.embedding(src))
        #embedded = [src sent len, batch size, emb dim]
        hidden_all = []
        for i in range(len(embedded[:,1,1])):
            outputs, hidden = self.rnn(embedded[i,:,:].unsqueeze(0))
            hidden_all.append(hidden)
        #outputs, hidden = self.rnn(embedded)
        #outputs = [src sent len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        #cell = [n layers * n directions, batch size, hid dim]
        #outputs are always from the top hidden layer
        return hidden
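The fix is simple: the loop never passes the state from one step to the next, so every call to self.rnn starts from a zero state and each time step is encoded in isolation. Run the first time step outside the loop to get a (hidden, cell) tuple, then pass that tuple back into the LSTM module at every following step.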
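A minimal sketch of that fix, assuming a unidirectional LSTM in the seq-first layout used in the question (the sizes and random inputs are hypothetical). It collects the cell state at every step and checks that the final states match a single call on the whole sequence. Stepping one time step at a time cannot reproduce a bidirectional LSTM, since the backward direction needs to see the entire sequence first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical sizes
seq_len, batch, emb_dim, hid_dim, n_layers = 6, 4, 8, 16, 2
rnn = nn.LSTM(emb_dim, hid_dim, n_layers)  # unidirectional, (seq_len, batch, feature) layout
embedded = torch.randn(seq_len, batch, emb_dim)

# First time step outside the loop produces the (hidden, cell) tuple
outputs, (hidden, cell) = rnn(embedded[0:1])
cells_all = [cell]
for i in range(1, seq_len):
    # Feed the previous state back in so the steps form one sequence
    outputs, (hidden, cell) = rnn(embedded[i:i + 1], (hidden, cell))
    cells_all.append(cell)

cells_all = torch.stack(cells_all)  # [seq_len, n_layers, batch, hid_dim]
print(cells_all.shape)  # torch.Size([6, 2, 4, 16])

# The stepwise loop reaches the same final states as one full-sequence call
full_outputs, (h_full, c_full) = rnn(embedded)
assert torch.allclose(hidden, h_full, atol=1e-5)
assert torch.allclose(cell, c_full, atol=1e-5)
```

Passing None as the initial state on the first step would work as well, since nn.LSTM falls back to zeros when no state is given.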

Input dimension error on PyTorch's forward check

I am creating an RNN with PyTorch; it looks like this:
class MyRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons, n_outputs):
        super(MyRNN, self).__init__()
        self.n_neurons = n_neurons
        self.batch_size = batch_size
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons)
        self.FC = nn.Linear(self.n_neurons, self.n_outputs)

    def init_hidden(self):
        # (num_layers, batch_size, n_neurons)
        return torch.zeros(1, self.batch_size, self.n_neurons)

    def forward(self, X):
        self.batch_size = X.size(0)
        self.hidden = self.init_hidden()
        lstm_out, self.hidden = self.basic_rnn(X, self.hidden)
        out = self.FC(self.hidden)
        return out.view(-1, self.n_outputs)
My input x looks like this:
tensor([[-1.0173e-04, -1.5003e-04, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01],
[-1.1193e-04, -1.6928e-04, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01],
...
[-6.9490e-05, -8.9197e-05, -1.0218e-04, -7.4541e-05, -2.2869e-05,
-7.7171e-02, -4.4630e-03, -5.0750e-05, -1.7911e-04, -2.8082e-04,
-9.2992e-06, -1.5608e-05, -3.5471e-05, -4.9127e-05, -3.2883e-01]],
dtype=torch.float64)
It is a batch of 64 vectors of size 15.
When trying to test this model by doing:
BATCH_SIZE = 64
N_INPUTS = 15
N_NEURONS = 150
N_OUTPUTS = 1
model = MyRNN(BATCH_SIZE, N_INPUTS, N_NEURONS, N_OUTPUTS)
model(x)
I get the following error:
File "/home/tt/anaconda3/envs/venv/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 126, in check_forward_args
expected_input_dim, input.dim()))
RuntimeError: input must have 3 dimensions, got 2
How can I fix it?
You are missing one of the required dimensions for the RNN layer.
Per the documentation, your input needs to have shape (sequence length, batch, input size).
So, in the example above, you are missing one of these. Based on your variable names, it appears you are trying to pass 64 examples of 15 inputs each; if that's true, you are missing the sequence length.
With an RNN, the sequence length is the number of times you want the layer to recur. For example, in NLP your sequence length might be equal to the number of words in a sentence, while batch size would be the number of sentences you are passing, and input size would be the vector size of each word.
You might not need an RNN here if you are just trying to use 64 samples of size 15.
See the documentation; the RNN layer expects
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.
In your case, your "size" could be either the length of the sequence (with one feature at every time step) or the number of features. Edited for the latter case: 15 features and one time step.
import torch
import torch.nn as nn

# 15 features, 150 neurons
rnn = nn.RNN(15, 150)
# sequence of length 1, batch size 64, 15 features
x = torch.rand(1, 64, 15)
res, _ = rnn(x)
print(res.shape)
# => torch.Size([1, 64, 150])
Also note that you don't need to prespecify batch size.
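Putting both answers together, a minimal sketch of the model with the sequence dimension added inside forward, assuming each 15-value row is one time step with 15 features. This is a modified version, not the original: it drops the batch_size argument and lets nn.RNN create the zero initial hidden state, and it casts the float64 input to float32 because the layer's parameters are float32 by default. The random x is a hypothetical stand-in for the real data.

```python
import torch
import torch.nn as nn

class MyRNN(nn.Module):
    def __init__(self, n_inputs, n_neurons, n_outputs):
        super().__init__()
        self.basic_rnn = nn.RNN(n_inputs, n_neurons)  # expects (seq_len, batch, n_inputs)
        self.FC = nn.Linear(n_neurons, n_outputs)

    def forward(self, X):
        # (batch, n_inputs) -> (1, batch, n_inputs): one time step per sample
        X = X.unsqueeze(0)
        rnn_out, hidden = self.basic_rnn(X)  # zero initial state sized from X
        return self.FC(hidden[-1])           # (batch, n_outputs)

x = torch.randn(64, 15, dtype=torch.float64)  # hypothetical stand-in for the question's x
model = MyRNN(15, 150, 1)
out = model(x.float())  # cast to float32 to match the parameters
print(out.shape)        # torch.Size([64, 1])
```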
