I am trying to use an RNN to do a binary classification. But when my model is training, it gets stuck at loss.backward().
Here is my model:
class RNN2(nn.Module):
    def __init__(self, input_size, hidden_size, output_size=2, num_layers=1):
        super(RNN2, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers)
        self.reg = nn.Linear(hidden_size, output_size)
        #self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x, hidden = self.rnn(x)
        return self.reg(x[:,2])
rnn = RNN2(13, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)

for e in range(10):
    out = rnn(train_X)
    optimizer.zero_grad()
    print(out[0])
    print(out.shape)
    print(train_Y.shape)
    loss = criterion(out, train_Y)
    print(loss)
    loss.backward()
    print("1")
    optimizer.step()
    print("2")
The shape of train_X is 420000*3*13 and the shape of train_Y is 420000.
So it can print the loss. Can anyone tell me why it gets stuck at loss.backward()? It never prints "1".
You have to know that for RNNs, computing the backward pass over a sequence of length 420000 is extremely slow. If you run your code on a machine with a GPU (or Google Colab) and add the following lines before the for loop, your code finishes executing in less than two minutes.
rnn = rnn.cuda()
train_X = train_X.cuda()
train_Y = train_Y.cuda()
Note that by default, the second dimension of the input passed to an RNN is treated as the batch size. Therefore, if 420000 is the number of samples (i.e., the batch dimension), pass batch_first=True to the RNN constructor:
self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
This would significantly speed up the process (less than one second in Google Colab). However, if that is not the case, you should try chunking the sequences into smaller parts and increasing the batch size from 3 to a larger value.
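For completeness, here is a minimal sketch of that mini-batch setup, assuming train_X has shape (420000, 3, 13), batch_first=True is set inside the model, train_Y holds class indices, and the batch size and learning rate are only illustrative:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

rnn = RNN2(13, 10)  # same model as above, with batch_first=True in its nn.RNN
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-3)  # illustrative learning rate

# Iterate over mini-batches instead of pushing all 420000 sequences through at once
loader = DataLoader(TensorDataset(train_X, train_Y.long()), batch_size=512, shuffle=True)

for e in range(10):
    for batch_X, batch_Y in loader:
        optimizer.zero_grad()
        out = rnn(batch_X)              # (batch, 2): class scores taken from the last time step
        loss = criterion(out, batch_Y)
        loss.backward()
        optimizer.step()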
Related
I'm trying to solve an NLP classification problem with an LSTM. The code for the model is defined here:
class LSTM(nn.Module):
    def __init__(self, hidden_size, embedding_size=66):
        super().__init__()
        self.lstm = nn.LSTM(embedding_size, hidden_size, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, input_seq):
        output, (hidden_state, cell_state) = self.lstm(input_seq)
        hidden_state = torch.cat((hidden_state[-1,:], hidden_state[-2,:]), -1)
        logits = self.fc(hidden_state)
        return nn.LogSoftmax(dim=1)(logits)
And the function I’m using to train this model is here:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    zeros = 0
    for batch, (X, y) in enumerate(dataloader):
        # Transform string into tensor
        tensor = torch.zeros(1, len(X[0]), 66)
        for i in range(len(X[0])):
            tensor[0][i][ctoi[X[0][i]]] = 1

        pred = model(tensor)
        target = torch.zeros(2, dtype=torch.long)
        target[y] = 1

        if batch % 100 == 0:
            print(pred.squeeze(), target)

        loss = loss_fn(pred.squeeze(), target)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if pred.squeeze().argmax() == 0:
            zeros += 1

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

    print(f'In training predicted {zeros} zeroes out of {size} samples')
The X's are still strings, which is why I need to convert them to tensors before running them through the model. The y's are either 0 or 1 (since it's a binary classification problem), which I need to convert to a tensor of shape (2,) to run through the loss function.
For some reason I keep getting the same class predicted for every input. The classes are not even that unbalanced (~45% to 55%), and I've tried changing the class weights in the loss function with no improvement: it either converges to always predicting 0 or always predicting 1. Most of the time it converges to always predicting 0, which makes even less sense, because class 0 usually has fewer samples than class 1.
Since you're training a binary classification model, your output dim should be 1 (corresponding to a single probability P(y|x)). This means that the y you're retrieving from your dataloader should be used directly in your loss function (assuming a binary cross-entropy loss), rather than the one-hot target you construct. The predicted class is therefore y_hat = round(pred) (i.e., is the prediction >= 0.5).
As a point of clarity, it would be much easier to follow your logic if the one-hot encoding happened within your dataset (either in __getitem__ or __iter__). It's also worth noting that you don't use embeddings, so the code of your classifier is a bit misleading.
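To make that concrete, here is a minimal sketch of the single-logit variant (BinaryLSTM is a hypothetical name; the one-hot character encoding and the ctoi map from your loop are assumed to stay as they are):

import torch
import torch.nn as nn

class BinaryLSTM(nn.Module):
    def __init__(self, hidden_size, embedding_size=66):
        super().__init__()
        self.lstm = nn.LSTM(embedding_size, hidden_size, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 1)  # single logit for binary classification

    def forward(self, input_seq):
        _, (hidden_state, _) = self.lstm(input_seq)
        # concatenate the final forward and backward hidden states
        hidden_state = torch.cat((hidden_state[-1], hidden_state[-2]), -1)
        return self.fc(hidden_state).squeeze(-1)  # raw logits, shape (batch,)

loss_fn = nn.BCEWithLogitsLoss()
# inside the training loop:
#   logit = model(tensor)                             # shape (1,)
#   loss = loss_fn(logit, y.float())                  # y is the 0/1 label from the dataloader
#   pred_class = (torch.sigmoid(logit) >= 0.5).long()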
Hello, I'm trying to understand the following:
I have created the following neural network model using PyTorch to run a regression task.
class Model(nn.Module):
    def __init__(self, in_features, h1, h2, out_features=0):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(in_features, h1)   # input layer
        self.fc2 = nn.Linear(h1, h2)            # hidden layer
        self.out = nn.Linear(h2, out_features)  # output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
model = Model(in_features=59, h1=64, h2=32, out_features=1)
Then we get to the training where I run the following code:
epochs = 300
losses = []

for i in range(epochs):
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    losses.append(loss.detach().numpy())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Everything works fine, but through the model's forward() method my y_pred gets the shape [1359, 1] (I guess it should be [1359], because my y_train matches that shape), and I get the following warning:
C:\Users\hp\anaconda3\lib\site-packages\torch\nn\modules\loss.py:528: UserWarning: Using a target size (torch.Size([1359])) that is different to the input size (torch.Size([1359, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.mse_loss(input, target, reduction=self.reduction)
This also happens when I try to evaluate my model
with torch.no_grad():
    y_val = model(X_test)
    loss = criterion(y_val.flatten(), y_test)
    print(loss)
Indeed you have a shape mismatch: your model outputs a tensor of shape (batch_size, 1) while your target is shaped (batch_size,). You have to explicitly reshape your tensors so that the inputs to your criterion have the same shape.
Either by reshaping the prediction y_val itself:
>>> loss = criterion(y_val[:,0], y_test)
Or the target tensor y_test:
>>> loss = criterion(y_val, y_test[:,None])
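The same fix applies inside the training loop above; a minimal sketch of the adjusted loop (assuming criterion is nn.MSELoss):

for i in range(epochs):
    y_pred = model(X_train).squeeze(-1)  # (1359, 1) -> (1359), matching y_train
    loss = criterion(y_pred, y_train)
    losses.append(loss.detach().numpy())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()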
This is my first time posting on Stack Overflow, so forgive me if I make any mistakes.
I have 10000 data points, and each one has a label of 0 or 1. I want to perform classification using an LSTM, as this is time series data.
input_dim = 1
hidden_dim = 32
num_layers = 2
output_dim = 1
# Here we define our model as a class
class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state and cell state
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).requires_grad_()
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).requires_grad_()

        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))

        # Index hidden state of last time step
        # out.size() --> 100, 32, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states!
        out = self.fc(out[:, -1, :])

        # For binomial classification
        m = torch.sigmoid(out)
        return m
model = LSTM(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim, num_layers=num_layers)
loss = nn.BCELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=0.00001, weight_decay=0.00006)
num_epochs = 100
# Number of steps to unroll
seq_dim = look_back - 1
for t in range(num_epochs):
    y_train_class = model(x_train)
    output = loss(y_train_class, y_train)

    # Zero out gradient, else they will accumulate between epochs
    optimiser.zero_grad(set_to_none=True)

    # Backward pass
    output.backward()

    # Update parameters
    optimiser.step()
This is an example of what the result looks like
This code is originally from Kaggle; I edited it for classification. Please, can you tell me what I am doing wrong?
EDIT 1:
Add dataloader
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset
x_train = torch.from_numpy(x_train).type(torch.Tensor)
y_train = torch.from_numpy(y_train).type(torch.Tensor)
x_test = torch.from_numpy(x_test).type(torch.Tensor)
y_test = torch.from_numpy(y_test).type(torch.Tensor)
train_dataloader = DataLoader(TensorDataset(x_train, y_train), batch_size=128, shuffle=True)
test_dataloader = DataLoader(TensorDataset(x_test, y_test), batch_size=128, shuffle=True)
I realized I had forgotten to invert the transformation before checking the result. When I did that, I got different values from the classification; however, all values are in the range 0.001-0.009, so when I round them the result is the same: label 0 for every sample.
A common phenomenon in NN training is that networks initially converge to a very naive solution to the problem, where they output a constant prediction that minimizes the error on the training data. My guess is that in your training data the ratio between the 0 and 1 classes is close to 0.5423. Depending on whether your model is of sufficient complexity, it might learn to make more specific predictions based on the input when given more learning steps.
While increasing the number of epochs could help, there is something better you can do with your current setup. Currently, you are only performing a single optimizer step per epoch. Typically, you would want a step per batch and loop over your data in (mini)batches of, say, 32 inputs for example. To do this, it would be best to use a DataLoader where you can define a batch size, and loop over the dataloader inside your epoch loop similar to this example.
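A minimal sketch of that per-batch loop, reusing the train_dataloader from your edit and keeping the BCELoss/sigmoid setup from the question:

num_epochs = 100

for t in range(num_epochs):
    for batch_x, batch_y in train_dataloader:      # one optimizer step per mini-batch
        y_pred = model(batch_x)                    # (batch, 1), already passed through sigmoid
        batch_loss = loss(y_pred, batch_y.view(-1, 1))

        optimiser.zero_grad(set_to_none=True)
        batch_loss.backward()
        optimiser.step()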
I am training a seq2seq model for machine translation in PyTorch. I would like to gather the cell state at every time step, while still having the flexibility of multiple layers and bidirectionality that you get from PyTorch's LSTM module, for example.
To this end, I have the following encoder and forward method, where I loop over the LSTM module. The problem is that the model does not train very well. Right after the loop terminates, you can see the normal (commented-out) way of using the LSTM module, and with that the model trains.
So, is the loop not a valid way to do this?
class encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.input_dim = input_dim
        self.emb_dim = emb_dim
        self.hid_dim = hid_dim
        self.n_layers = n_layers
        self.dropout = dropout
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src):
        #src = [src sent len, batch size]
        embedded = self.dropout(self.embedding(src))
        #embedded = [src sent len, batch size, emb dim]

        hidden_all = []
        for i in range(len(embedded[:,1,1])):
            outputs, hidden = self.rnn(embedded[i,:,:].unsqueeze(0))
            hidden_all.append(hidden)

        #outputs, hidden = self.rnn(embedded)
        #outputs = [src sent len, batch size, hid dim * n directions]
        #hidden = [n layers * n directions, batch size, hid dim]
        #cell = [n layers * n directions, batch size, hid dim]

        #outputs are always from the top hidden layer
        return hidden
Okay, so the fix is very simple: inside the loop you never pass the hidden state from one timestep into the next, so every call to self.rnn starts from a fresh zero state. You can just run the first timestep outside the loop to get a hidden tuple, and then feed that tuple back into the LSTM module at each subsequent step.
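A minimal sketch of that corrected forward method (keeping the question's layout, where embedded is [src sent len, batch size, emb dim]):

def forward(self, src):
    embedded = self.dropout(self.embedding(src))   # [src sent len, batch size, emb dim]

    hidden_all = []
    # run the first timestep outside the loop to obtain an initial (h, c) tuple
    outputs, hidden = self.rnn(embedded[0].unsqueeze(0))
    hidden_all.append(hidden)

    # feed the previous hidden tuple back in at every subsequent timestep
    for i in range(1, embedded.size(0)):
        outputs, hidden = self.rnn(embedded[i].unsqueeze(0), hidden)
        hidden_all.append(hidden)

    return hidden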
I am new to PyTorch and LSTMs, and I am trying to train a classification model that takes a sentence, where each word is encoded via word2vec (pre-trained vectors), and outputs one class after it has seen the full sentence. I have four different classes. The sentences have variable length.
My code runs without errors, but it always predicts the same class, no matter how many epochs I train my model. So I think the gradients are not being properly backpropagated. Here is my code:
class LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, tagset_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim).to(device),
                torch.zeros(1, 1, self.hidden_dim).to(device))

    def forward(self, sentence):
        lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
EMBEDDING_DIM = len(training_data[0][0][0])
HIDDEN_DIM = 256
model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, 4)
model.to(device)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
for epoch in tqdm(range(n_epochs)):
    for sentence, tag in tqdm(training_data):
        model.zero_grad()
        model.hidden = model.init_hidden()

        sentence_in = torch.tensor(sentence, dtype=torch.float).to(device)
        targets = torch.tensor([label_to_idx[tag]], dtype=torch.long).to(device)

        tag_scores = model(sentence_in)
        res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)
        # I THINK THIS IS WRONG???
        print(res)      # tensor([[-10.6328, -10.6783, -10.6667, -0.0001]], device='cuda:0', grad_fn=<CopyBackwards>)
        print(targets)  # tensor([3], device='cuda:0')

        loss = loss_function(res, targets)
        loss.backward()
        optimizer.step()
The code is largely inspired by https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html
The difference is that they have a sequence-to-sequence model and I have a sequence-to-ONE model.
I am not sure what the problem is, but I guess that the scores returned by the model contain a score for each tag and my ground truth only contains the index of the correct class? How would this be handled correctly?
Or is the loss function maybe not the correct one for my use case? Also I am not sure if this is done correctly:
res = torch.tensor(tag_scores[-1], dtype=torch.float).view(1,-1).to(device)
By taking tag_scores[-1] I want to get the scores after the last word has been given to the network because tag_scores contains the scores after each step, if I understand correctly.
And this is how I evaluate:
with torch.no_grad():
    preds = []
    gts = []
    for sentence, tag in tqdm(test_data):
        inputs = torch.tensor(sentence, dtype=torch.float).to(device)
        tag_scores = model(inputs)

        # find index with max value (this is the class to be predicted)
        pred = [j for j, v in enumerate(tag_scores[-1]) if v == max(tag_scores[-1])][0]
        print(pred, idx_to_label[pred], tag)

        preds.append(pred)
        gts.append(label_to_idx[tag])

print(f1_score(gts, preds, average='micro'))
print(classification_report(gts, preds))
EDIT:
When I shuffle the data before training, it seems to work. But why?
EDIT 2:
I think the reason shuffling is needed is that my training data contains the samples for each class in groups. So when training on them one after another, the model only sees the same class during the last N iterations and therefore only predicts this class. Another reason might be that I am currently using mini-batches of only one sample, because I haven't figured out yet how to use other sizes.
Because you are trying to classify using the whole sentence, the following line:
self.hidden2tag(lstm_out.view(len(sentence), -1))
should be changed so that only the features of the last time step are passed to the classifier, for example:
self.hidden2tag(lstm_out[-1].view(1, -1))
But I am also not so sure, since I am not familiar with LSTMs.
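As a rough sketch of that idea (a hypothetical variant of the question's forward method that returns one score vector per sentence instead of one per word):

def forward(self, sentence):
    lstm_out, self.hidden = self.lstm(sentence.view(len(sentence), 1, -1), self.hidden)
    last_out = lstm_out[-1].view(1, -1)        # keep only the last word's output: (1, hidden_dim)
    tag_space = self.hidden2tag(last_out)      # (1, tagset_size)
    return F.log_softmax(tag_space, dim=1)     # usable directly with NLLLoss and a target of shape (1,)

With this, the extra indexing and re-wrapping of tag_scores in the training loop would no longer be needed.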