I am implementing a decaying learning rate based on accuracy from the previous epoch.
Capturing Metrics:
class CustomMetrics(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.metrics = {'loss': [], 'accuracy': [], 'val_loss': [], 'val_accuracy': []}
        self.lr = []

    def on_epoch_end(self, epoch, logs={}):
        print(f"\nEPOCH {epoch} Calling from METRICS CLASS")
        self.metrics['loss'].append(logs.get('loss'))
        self.metrics['accuracy'].append(logs.get('accuracy'))
        self.metrics['val_loss'].append(logs.get('val_loss'))
        self.metrics['val_accuracy'].append(logs.get('val_accuracy'))
Custom Learning Decay:
from tensorflow.keras.callbacks import LearningRateScheduler

def changeLearningRate(epoch):
    initial_learningrate = 0.1
    #print(f"EPOCH {epoch}, Calling from ChangeLearningRate:")
    lr = 0.0
    if epoch != 0:
        if custom_metrics_dict.metrics['accuracy'][epoch] < custom_metrics_dict.metrics['accuracy'][epoch-1]:
            print(f"Accuracy at epoch {epoch} is less than accuracy at epoch {epoch-1}")
            print("[INFO] Decreasing Learning Rate.....")
            lr = initial_learningrate*(0.1)
            print(f"LR Changed to {lr}")
    return lr
Model Preparation:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_layer = Input(shape=(2))
layer1 = Dense(32, activation='tanh', kernel_initializer=tf.random_uniform_initializer(0, 1, seed=30))(input_layer)
output = Dense(2, activation='softmax', kernel_initializer=tf.random_uniform_initializer(0, 1, seed=30))(layer1)
model = Model(inputs=input_layer, outputs=output)

custom_metrics_dict = CustomMetrics()
lrschedule = LearningRateScheduler(changeLearningRate, verbose=1)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=4, validation_data=(X_test, Y_test), batch_size=16, callbacks=[custom_metrics_dict, lrschedule])
It's erroring out with an index out of range error. From what I noticed, the LRScheduler code is being called more than once per epoch. I am unable to figure out a way to make the function calls happen at the appropriate times. What can I try next?
The signature of the scheduler function is def scheduler(epoch, lr), which means you should take the current lr from that parameter.
You shouldn't hard-code initial_learningrate = 0.1; if you do that, your lr will not decay, and you will always return the same value whenever the accuracy decreases.
For the out of range exception: you only check that epoch is not 0, which means that for epoch = 1 you read both custom_metrics_dict.metrics['accuracy'][epoch] and custom_metrics_dict.metrics['accuracy'][epoch-1], but only one accuracy value has been stored at that point. The scheduler runs at the start of an epoch, before that epoch's on_epoch_end has appended its accuracy, so at epoch 1 the array custom_metrics_dict.metrics['accuracy'] contains only one value (index 0).
I've run your code successfully with this function:
from tensorflow.keras.callbacks import LearningRateScheduler

def changeLearningRate(epoch, lr):
    print(f"EPOCH {epoch}, Calling from ChangeLearningRate: {custom_metrics_dict.metrics['accuracy']}")
    if epoch > 1:
        if custom_metrics_dict.metrics['accuracy'][epoch - 1] < custom_metrics_dict.metrics['accuracy'][epoch - 2]:
            print(f"Accuracy at epoch {epoch - 1} is less than accuracy at epoch {epoch - 2}")
            print("[INFO] Decreasing Learning Rate.....")
            lr = lr * 0.1
            print(f"LR Changed to {lr}")
    return lr
I'm currently building an LSTM model for predicting stock prices in PyTorch. I now want to implement a walk-forward validation method, but I couldn't find any resource on how to do that.
This is my current training loop:
#%%
lstm1 = LSTM1(num_classes, input_size, hidden_dim, num_layers, X_train_tensors_final.shape[1])
criterion = torch.nn.L1Loss()
optimizer = torch.optim.Adam(lstm1.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    outputs = lstm1.forward(X_train_tensors_final)
    optimizer.zero_grad()  # clear gradients
    loss = criterion(outputs, y_train_tensors)
    loss.backward()  # calculates the loss of the loss function
    optimizer.step()  # improve from loss, i.e. backprop
    if epoch % 100 == 0:
        print("Epoch: %d, loss: %1.5f" % (epoch, loss.item()))

df_X_ss = ss.transform(df.iloc[:, 0:-1])
df_y_mm = ss.transform(df.iloc[:, 0:1])
df_X_ss = Variable(torch.Tensor(df_X_ss))
df_y_mm = Variable(torch.Tensor(df_y_mm))
df_X_ss = torch.reshape(df_X_ss, (df_X_ss.shape[0], 1, df_X_ss.shape[1]))
train_predict = lstm1(df_X_ss)
data_predict = train_predict.data.numpy()
The model should now predict one step into the future, then calculate the absolute percentage error. For the next step, the model should use the actual y value instead of the predicted yhat to make its next prediction. What would be the best way of implementing this? Or is there some built-in function in PyTorch that would do this?
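As far as I know there is no built-in walk-forward helper in PyTorch, so the loop has to be written by hand. Below is a minimal sketch of one-step-ahead walk-forward evaluation, assuming the model maps a window of past observations of shape (1, window, 1) to the next value; the names walk_forward_eval, series, and window are illustrative and not part of the code above.

import numpy as np
import torch

def walk_forward_eval(model, series, window):
    """Predict one step ahead at each position, always feeding actual history."""
    model.eval()
    abs_pct_errors = []
    with torch.no_grad():
        for t in range(window, len(series)):
            # Build the input from actual observed values only, never from predictions.
            x = torch.tensor(series[t - window:t], dtype=torch.float32).reshape(1, window, 1)
            y_hat = model(x).item()
            y_true = series[t]
            abs_pct_errors.append(abs((y_true - y_hat) / y_true))
    return float(np.mean(abs_pct_errors))  # mean absolute percentage error

Whether you also retrain the model inside the loop (true walk-forward) or keep it fixed as in this sketch depends on how much compute you can afford.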
I'm trying to train a multimodal transformer (BERT-based) for text classification on both text and tabular data, but my models are not reducing either training or validation loss in a significant way. When training these multimodal models I've noticed that they can't reach the same performance as a classifier that uses the tabular data only. To investigate this, I wrote a simple version of BertForSequenceClassification and noticed that this model (BERT + a linear classifier only) doesn't converge on training or validation loss. From this I deduced that I must have made some mistake in my training loop (I'm not using the transformers Trainer API because multimodal-transformers requires transformers v3.1.0, which has a memory leak in that version's Trainer class). My dataset and classification model are as follows:
class myDataset(torch.utils.data.Dataset):
    def __init__(self, df, num_classes):
        self.num_classes = num_classes
        self.y = np.array(df["rating_num"])  # ints, 9 labels
        self.length = len(self.y)
        self.values = df["economy"]  # texts

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        txt = self.values[idx]
        label = self.y[idx]
        return txt, label

def collate_fn(data, tokenizer=tokenizer, device=device):
    texts, labels = zip(*data)
    toks = tokenizer(texts, padding="max_length", truncation=True, return_tensors="pt")
    labels = torch.LongTensor(labels)
    return toks.to(device), labels.to(device)

train_dl = DataLoader(train_ds, shuffle=True, batch_size=BATCH_SIZE, collate_fn=collate_fn)

class myClassificationBERT(nn.Module):
    def __init__(self, bert, output_dim, dropout):
        super(myClassificationBERT, self).__init__()
        self.bert = bert
        clf_input_dim = bert.config.to_dict()["hidden_size"]
        self.classifier = nn.Linear(clf_input_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        output = bert(**x)[1]  # pooled_output
        output = self.dropout(output)
        output = self.classifier(output)
        return output
As you can see, the model is fairly simple and is only meant to understand where the mistake in my pipeline is. I believe that the dataset and dataloader classes should be fine, since decoding the input_ids of the loaded data works fine. Model init and training loop follow:
# model init
bert = BertModel.from_pretrained("bert-base-uncased")
model = myClassificationBERT(bert=bert, output_dim=num_class, dropout=.2).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=5e-4)
loss_fct = nn.CrossEntropyLoss()

##########
# training
##########
best_val_loss = np.inf
best_epoch_idx = None
path = "models/mybert/"
num_labels = num_class
train_losses = {}
val_losses = {}
EPOCHS = 25

for epoch in range(EPOCHS):
    model.train()
    temp_loss = 0
    for batch, y in train_dl:
        preds = model(batch)
        loss = loss_fct(preds.view(-1, num_labels), y.view(-1))
        temp_loss += loss

        # update model
        loss.backward()
        opt.step()
        opt.zero_grad()
        model.zero_grad()

    temp_loss = temp_loss / len(train_dl)
    train_losses[epoch] = temp_loss

    model.eval()
    # validation
    with torch.no_grad():
        val_loss = 0
        for batch, y in val_dl:
            preds = model(batch)
            val_loss += loss_fct(preds.view(-1, num_labels), y.view(-1))
        val_loss = val_loss / len(val_dl)
        val_losses[epoch] = val_loss

    if val_loss < best_val_loss:
        torch.save(model.state_dict(), path + f"ep{epoch}_loss{val_loss}.pt")
        best_val_loss = val_loss
        best_epoch_index = epoch

    print(f">>>>>>>Epoch {epoch}\tloss: {temp_loss}\tval_loss: {val_loss}")
Exemplary output for first 6 epochs:
>>>>>>>Epoch 0 loss: 2.0026087760925293 val_loss: 1.9571226835250854
>>>>>>>Epoch 1 loss: 2.002939224243164 val_loss: 1.9474005699157715
>>>>>>>Epoch 2 loss: 1.9835222959518433 val_loss: 2.017913579940796
>>>>>>>Epoch 3 loss: 2.0059845447540283 val_loss: 1.9957032203674316
>>>>>>>Epoch 4 loss: 1.9796814918518066 val_loss: 1.9774249792099
>>>>>>>Epoch 5 loss: 1.9920653104782104 val_loss: 2.1032602787017822
When running this loop, neither the training loss nor the validation loss decreases; both stay around their initial values with some (seemingly random) deviation. I know that such a simple BERT classifier should be able to classify my texts well, as I've tested it on the same data using a ClassificationModel from the simpletransformers library, which in turn uses a BertForSequenceClassification model and reaches a validation loss of 0.32 in just 10 epochs. The same behaviour occurs with different optimizers and learning rates. Therefore I must be doing something wrong either in my training loop or even before it.
Version info:
transformers 3.1.0
torch 1.8.1+cu111
multimodal-transformers 0.1.2-alpha
numpy 1.19.5
pandas 1.4.1
I use callbacks to stop the training process when certain criteria are met. I was wondering how I can access the epoch number at which training was stopped by the callback.
import numpy as np
import random
import tensorflow as tf
from tensorflow import keras

class stopAtLossValue(tf.keras.callbacks.Callback):
    def on_batch_end(self, batch, logs={}):
        eps = 0.01
        if logs.get('loss') <= eps:
            self.model.stop_training = True

training_input = np.random.random([30, 10])
training_output = np.random.random([30, 1])

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(10,)),
    tf.keras.layers.Dense(15, activation=tf.keras.activations.linear),
    tf.keras.layers.Dense(15, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
hist = model.fit(training_input, training_output, epochs=100, batch_size=100, verbose=1, callbacks=[stopAtLossValue()])
For this example, my training is completed at the 66th epoch since the loss is under 0.01.
Epoch 66/100
1/1 [==============================] - 0s 5ms/step - loss: 0.0099
-----------------------------------------------------------------
The simple way would be to get the length of the history.history object:
len(model.history.history['loss'])
The more intricate way would be to get the number of iterations from the optimizer:
model.optimizer._iterations
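Note that iterations count optimizer steps (batches), not epochs, so the two only match when there is one batch per epoch, as in the example above (30 samples, batch_size=100). A minimal sketch of the conversion, assuming the fit() call shown earlier and using the public optimizer.iterations attribute:

import numpy as np

# One optimizer step per batch, so divide by the number of batches per epoch.
steps_per_epoch = int(np.ceil(len(training_input) / 100))  # batch_size=100 above
epochs_run = int(model.optimizer.iterations.numpy()) // steps_per_epoch
print(epochs_run)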
If you want to get the epoch number in the callback, you should use the on_epoch_end method instead of on_batch_end. See the code below for the callback function:
def on_epoch_end(self, epoch, logs={}):
    eps = 0.01
    print(epoch)  # This will print the epoch number
    if logs.get('loss') <= eps:
        self.model.stop_training = True
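Combining the two ideas, here is a minimal sketch (not from either answer) that records the epoch at which stop_training was set, so it can be read back after fit(); the attribute name stopped_epoch is illustrative:

import tensorflow as tf

class StopAtLossValue(tf.keras.callbacks.Callback):
    def __init__(self, eps=0.01):
        super().__init__()
        self.eps = eps
        self.stopped_epoch = None  # stays None if the threshold is never reached

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        loss = logs.get('loss')
        if loss is not None and loss <= self.eps:
            self.stopped_epoch = epoch  # 0-based epoch index
            self.model.stop_training = True

cb = StopAtLossValue()
# hist = model.fit(training_input, training_output, epochs=100, batch_size=100, callbacks=[cb])
# print(cb.stopped_epoch)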
I am trying to use my own dataset to classify text according to https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/5%20-%20Multi-class%20Sentiment%20Analysis.ipynb. My dataset is a CSV of sentences and a class associated with each one. There are 6 different classes:
sent                        class
'the fox is brown'          animal
'the house is big'          object
'one water is drinkable'    water
...
When running:
N_EPOCHS = 5
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    start_time = time.time()
    print(start_time)
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    print(train_loss.type())
    print(train_acc.type())
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut5-model.pt')
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc*100:.2f}%')
I receive the following error,
RuntimeError: "log_softmax_lastdim_kernel_impl" not implemented for 'torch.LongTensor'
pointing to:
<ipython-input-38-9c6cff70d2aa> in train(model, iterator, optimizer, criterion)
14 print('pred'+ predictions.type())
15 #batch.label = batch.label.type(torch.LongTensor)
---> 16 loss = criterion(predictions.long(), batch.label)**
The solution posted here https://github.com/pytorch/pytorch/issues/14224 suggests I need to use long/int.
I had to add .long() at line ** in order to fix this earlier error:
RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target'
The specific lines of code are:
def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        optimizer.zero_grad()
        predictions = model(batch.text)
        print('pred' + predictions.type())
        #batch.label = batch.label.type(torch.LongTensor)
        loss = criterion(predictions.long(), batch.label)**
        acc = categorical_accuracy(predictions, batch.label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
Note, the ** was originally loss = criterion(predictions, batch.label)
Any other suggestions to fix this issue?
criterion is defined as torch.nn.CrossEntropyLoss() in your notebook. As mentioned in the documentation of CrossEntropyLoss, it expects the raw scores returned by the model for each of the K classes, together with the corresponding ground-truth label, as input. The scores are float tensors, while the ground-truth label should be a long tensor representing a class index (a class cannot be a float, e.g. 2.3 cannot represent a class). Hence:
loss = criterion(predictions, batch.label.long())
should work.
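For illustration, a self-contained sketch of the dtypes CrossEntropyLoss expects (the tensors here are made up, not from the notebook): float predictions of shape (batch, num_classes) and long class indices of shape (batch,).

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 6)                # float predictions: keep these as float
targets = torch.tensor([0, 2, 5, 1])      # class indices for 6 classes
loss = criterion(logits, targets.long())  # cast the labels, not the predictions
print(loss.item())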
If using a GPU, instead of casting to long inside the loss call, you should probably cast the target before moving it to CUDA. I faced the same error and resolved it with the code below:
# move data to GPU, if available
if train_on_gpu:
    inp = inp.cuda()
    target = target.long()
    target = target.cuda()

h = tuple([each.data for each in hidden])

# perform backpropagation and optimization
# zero accumulated gradient
rnn.zero_grad()

# getting output from model
output, h = rnn(inp, h)

# calculating loss and performing back_propagation
loss = criterion(output.squeeze(), target)
I am very new to PyTorch and am implementing my own image classifier network. However, I see that for each epoch the training accuracy is very good but the validation accuracy is 0 (I noted this up to the 5th epoch). I am using the Adam optimizer with a learning rate of .001, and I am also resampling the whole dataset into training and validation sets after each epoch. Please help me figure out where I am going wrong.
Here is my code:
### where is data?
data_dir_train = '/home/sup/PycharmProjects/deep_learning/CNN_Data/training_set'
data_dir_test = '/home/sup/PycharmProjects/deep_learning/CNN_Data/test_set'

# Define your batch_size
batch_size = 64

allData = datasets.ImageFolder(root=data_dir_train, transform=transformArr)

# We need to further split our training dataset into training and validation sets.
def split_train_validation():
    # Define the indices
    num_train = len(allData)
    indices = list(range(num_train))  # start with all the indices in training set
    split = int(np.floor(0.2 * num_train))  # define the split size
    #train_idx, valid_idx = indices[split:], indices[:split]

    # Random, non-contiguous split
    validation_idx = np.random.choice(indices, size=split, replace=False)
    train_idx = list(set(indices) - set(validation_idx))

    # define our samplers -- we use a SubsetRandomSampler because it will return
    # a random subset of the split defined by the given indices without replacement
    train_sampler = SubsetRandomSampler(train_idx)
    validation_sampler = SubsetRandomSampler(validation_idx)

    #train_loader = DataLoader(allData,batch_size=batch_size,sampler=train_sampler,shuffle=False,num_workers=4)
    #validation_loader = DataLoader(dataset=allData,batch_size=1, sampler=validation_sampler)

    return (train_sampler, validation_sampler)
Training
from torch.optim import Adam
import torch
import createNN
import torch.nn as nn
import loadData as ld
from torch.autograd import Variable
from torch.utils.data import DataLoader

# check if cuda - GPU support available
cuda = torch.cuda.is_available()

# create model, optimizer and loss function
model = createNN.ConvNet(class_num=2)
optimizer = Adam(model.parameters(), lr=.001, weight_decay=.0001)
loss_func = nn.CrossEntropyLoss()

if cuda:
    model.cuda()

# function to save model
def save_model(epoch):
    torch.save(model.state_dict(), 'imageClassifier_{}.model'.format(epoch))
    print('saved model at epoch', epoch)

def exp_lr_scheduler(epoch, init_lr=args.lr, weight_decay=args.weight_decay, lr_decay_epoch=cf.lr_decay_epoch):
    lr = init_lr * (0.5 ** (epoch // lr_decay_epoch))
def train(num_epochs):
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('\n\nEpoch {}'.format(epoch))
        train_sampler, validation_sampler = ld.split_train_validation()
        train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
        validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)

        model.train()
        acc = 0.0
        loss = 0.0
        total = 0
        # train model with training data
        for i, (images, labels) in enumerate(train_loader):
            # if cuda then move to GPU
            if cuda:
                images = images.cuda()
                labels = labels.cuda()
            # Variable class wraps a tensor and we can calculate grad
            images = Variable(images)
            labels = Variable(labels)
            # reset accumulated gradients for each batch
            optimizer.zero_grad()
            # pass images to model which returns prediction
            output = model(images)
            # calculate the loss based on prediction and actual
            loss = loss_func(output, labels)
            # backpropagate the loss and compute gradient
            loss.backward()
            # update weights as per the computed gradients
            optimizer.step()

            # prediction class
            predVal, predClass = torch.max(output.data, 1)
            acc += torch.sum(predClass == labels.data)
            loss += loss.cpu().data[0]
            total += labels.size(0)

        # print the statistics
        train_acc = acc/total
        train_loss = loss / total
        print('Mean train acc = {} over epoch = {}'.format(epoch, acc))
        print('Mean train loss = {} over epoch = {}'.format(epoch, loss))

        # Valid model with validation data
        model.eval()
        acc = 0.0
        loss = 0.0
        total = 0
        for i, (images, labels) in enumerate(validation_loader):
            # if cuda then move to GPU
            if cuda:
                images = images.cuda()
                labels = labels.cuda()
            # Variable class wraps a tensor and we can calculate grad
            images = Variable(images)
            labels = Variable(labels)
            # reset accumulated gradients for each batch
            optimizer.zero_grad()
            # pass images to model which returns prediction
            output = model(images)
            # calculate the loss based on prediction and actual
            loss = loss_func(output, labels)
            # backpropagate the loss and compute gradient
            loss.backward()
            # update weights as per the computed gradients
            optimizer.step()
            # prediction class
            predVal, predClass = torch.max(output.data, 1)
            acc += torch.sum(predClass == labels.data)
            loss += loss.cpu().data[0]
            total += labels.size(0)

        # print the statistics
        valid_acc = acc / total
        valid_loss = loss / total
        print('Mean train acc = {} over epoch = {}'.format(epoch, valid_acc))
        print('Mean train loss = {} over epoch = {}'.format(epoch, valid_loss))

        if(best_acc < valid_acc):
            best_acc = valid_acc
            save_model(epoch)

        # at 30th epoch we save the model
        if (epoch == 30):
            save_model(epoch)

train(20)
I think you did not take into account that acc += torch.sum(predClass == labels.data) returns a tensor instead of a float value. Depending on the version of pytorch you are using I think you should change it to:
acc += torch.sum(predClass == labels.data).cpu().data[0] #pytorch 0.3
acc += torch.sum(predClass == labels.data).item() #pytorch 0.4
Although your code seems to work with old PyTorch versions, I would recommend upgrading to version 0.4.
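As a quick illustration of why the tensor vs. float distinction matters here (a made-up standalone example, not from the original code): in old PyTorch versions, dividing an integer tensor such as this sum by an integer count performed integer division, so acc/total silently truncated to 0 whenever acc < total.

import torch

predClass = torch.tensor([1, 0, 1, 0])
labels = torch.tensor([1, 1, 1, 1])

correct = torch.sum(predClass == labels)  # tensor(2), an integer tensor
print(correct / 4)        # 0 in PyTorch 0.3/0.4 (integer division); tensor(0.5000) in recent versions
print(correct.item() / 4) # 0.5, a plain Python float in any version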
Also, I noticed other problems/typos in your code.
You are loading the dataset for every epoch.
for epoch in range(num_epochs):
    print('\n\nEpoch {}'.format(epoch))
    train_sampler, validation_sampler = ld.split_train_validation()
    train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
    validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
    ...
That should not happen; it should be enough to load it once:
train_sampler, validation_sampler = ld.split_train_validation()
train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)

for epoch in range(num_epochs):
    print('\n\nEpoch {}'.format(epoch))
    ...
In the training part you have (this does not happen in the validation):
train_acc = acc/total
train_loss = loss / total
print('Mean train acc = {} over epoch = {}'.format(epoch,acc))
print('Mean train loss = {} over epoch = {}'.format(epoch, loss))
Here you are printing acc instead of train_acc (and loss instead of train_loss).
Also, in the validation part I noticed that you are printing print('Mean train acc = {} over epoch = {}'.format(epoch, valid_acc)) when it should say something like 'Mean val acc'.
After changing these lines of code and using a standard model I created with the CIFAR dataset, the training seems to converge: accuracy increases at every epoch while the mean loss value decreases.
I hope I could help you!