I am new to PyTorch and want to efficiently evaluate, among other metrics, F1 during my training and validation loops.
So far, my approach has been to compute the predictions on the GPU, then push them to the CPU and append them to a vector for both training and validation. After training and validation, I evaluate both for each epoch using sklearn. However, profiling my code showed that pushing to the CPU is quite a bottleneck.
for epoch in range(n_epochs):
    model.train()
    avg_loss = 0
    avg_val_loss = 0
    train_pred = np.array([])
    val_pred = np.array([])

    # Training loop (transpose X_batch to fit pretrained (features, samples) style)
    for X_batch, y_batch in train_loader:
        scores = model(X_batch)
        y_pred = F.softmax(scores, dim=1)
        train_pred = np.append(train_pred, self.get_vector(y_pred.detach().cpu().numpy()))

        loss = loss_fn(scores, self.get_vector(y_batch))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        avg_loss += loss.item() / len(train_loader)

    model.eval()

    # Validation loop
    for X_batch, y_batch in val_loader:
        with torch.no_grad():
            scores = model(X_batch)
            y_pred = F.softmax(scores, dim=1)
            val_pred = np.append(val_pred, self.get_vector(y_pred.detach().cpu().numpy()))

            loss = loss_fn(scores, self.get_vector(y_batch))
            avg_val_loss += loss.item() / len(val_loader)

    # Model Checkpoint for best validation f1
    val_f1 = self.calculate_metrics(train_targets[val_index], val_pred, f1_only=True)
    if val_f1 > best_val_f1:
        prev_best_val_f1 = best_val_f1
        best_val_f1 = val_f1
        torch.save(model.state_dict(), self.PATHS['xlm'])
        evaluated_epoch = epoch

    # Calc the metrics
    self.save_metrics(train_targets[train_index], train_pred, avg_loss, 'train')
    self.save_metrics(train_targets[val_index], val_pred, avg_val_loss, 'val')
I am certain there is a more efficient way to
a) store the predictions without having to push them to the CPU each batch, and b) calculate the metrics directly on the GPU.
As I am new to PyTorch, I am very grateful for any hints and feedback :)
You can compute the F-score yourself in PyTorch. The F1-score is defined for single-class (true/false) classification only. The only things you need to aggregate are:
the count of the class in the ground-truth targets;
the count of the class in the predictions;
the count of how many times the class was correctly predicted.
Let's assume you want to compute F1 score for the class with index 0 in your softmax. In every batch, you can do:
predicted_classes = torch.argmax(y_pred, dim=1) == 0
target_classes = self.get_vector(y_batch)
target_true += torch.sum(target_classes == 0).float()
predicted_true += torch.sum(predicted_classes).float()
correct_true += torch.sum((target_classes == 0) & predicted_classes).float()
When all batches are processed:
recall = correct_true / target_true
precision = correct_true / predicted_true
f1_score = 2 * precision * recall / (precision + recall)
Don't forget to take care of the cases when precision and recall are zero and when the desired class was not predicted at all.
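For completeness, here is a minimal sketch of the whole accumulation kept entirely on the GPU, with an epsilon guard against division by zero (device, model and val_loader are placeholders for your own setup, and class index 0 is assumed to be the positive class):

target_true = torch.tensor(0.0, device=device)
predicted_true = torch.tensor(0.0, device=device)
correct_true = torch.tensor(0.0, device=device)

model.eval()
with torch.no_grad():
    for X_batch, y_batch in val_loader:
        predicted_classes = torch.argmax(model(X_batch), dim=1) == 0   # predicted positives
        target_classes = y_batch == 0                                  # actual positives
        target_true += target_classes.sum().float()
        predicted_true += predicted_classes.sum().float()
        correct_true += (predicted_classes & target_classes).sum().float()  # true positives

eps = 1e-8  # avoids NaN when the class is never predicted or never present
precision = correct_true / (predicted_true + eps)
recall = correct_true / (target_true + eps)
f1_score = 2 * precision * recall / (precision + recall + eps)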
My data are divided into two parts, training and validation. I used the load_dataset and DataLoader functions and converted the data in the dataset to torch format using traindataset.set_format.
When starting training I get the error
new(): invalid data type 'numpy.str_'
in this line:
for step, batch in enumerate(train_dataloader):
How can I fix this error?
model = MixModel()
#model.load_state_dict(torch.load(r"/media/sh/saved_weightscnnbert.pt"))
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

traindataset = load_dataset('csv', data_files='/content/drive//My Drive/Colab Notebooks/newdataset/newdata_train2', split='train')
testdataset = load_dataset('csv', data_files='/content/drive//My Drive/Colab Notebooks/newdataset/newdata_valid2', split='train')

traindataset = traindataset.map(encode)
testdataset1 = testdataset.map(encode)

traindataset = traindataset.map(lambda examples: {'labels': examples['symptoms']}, batched=True)
testdataset = testdataset1.map(lambda examples: {'labels': examples['symptoms']}, batched=True)

traindataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
testdataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])

train_dataloader = torch.utils.data.DataLoader(traindataset, batch_size=64)
test_dataloader = torch.utils.data.DataLoader(testdataset, batch_size=64)
# function to train the model
def train():
    model.train()
    total_loss, total_accuracy = 0, 0
    # empty list to save model predictions
    total_preds = []
    Labels = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 100 batches.
        if step % 100 == 0 and not step == 0:
            print(' Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))

        sent_id, mask, labels = batch['input_ids'], batch['attention_mask'], batch['labels']

        # clear previously calculated gradients
        model.zero_grad()
        # get model predictions for the current batch
        preds = model(sent_id, mask, labels)

        # compute the loss between actual and predicted values
        alpha = 0.25
        gamma = 2
        ce_loss = loss_fn(preds, labels)
        #pt = torch.exp(-ce_loss)
        #focal_loss = (alpha * (1-pt)**gamma * ce_loss).mean()  # mean over the batch

        # add on to the total loss
        total_loss = total_loss + ce_loss.item()

        # backward pass to calculate the gradients
        ce_loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # update parameters
        optimizer.step()

        preds = torch.argmax(preds, dim=1)
        total_preds.append(preds)
        total_accuracy += (preds == labels).float().sum()

    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    avg_accuracy = total_accuracy / len(traindataset)

    # predictions are in the form of (no. of batches, size of batch, no. of classes).
    # reshape the predictions in form of (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    # returns the loss and predictions
    return avg_loss, total_preds, avg_accuracy
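One common cause of this particular error (an assumption here, since the CSV contents are not shown) is that the labels column still holds strings after the map call, so set_format(type='torch') cannot build a tensor from numpy.str_ values. A minimal sketch of mapping string labels to integer ids before set_format, with label2id as a hypothetical helper dictionary:

# Assumes the 'symptoms' column contains string class names; build an integer id per class.
label_names = sorted(set(traindataset['symptoms']))
label2id = {name: i for i, name in enumerate(label_names)}

traindataset = traindataset.map(lambda example: {'labels': label2id[example['symptoms']]})
testdataset = testdataset1.map(lambda example: {'labels': label2id[example['symptoms']]})

traindataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
testdataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])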
I am working on very imbalanced data (15% labeled as 1 and the rest as 0) using BERT.
The code I wrote takes the argmax of the outputs, which gives me predictions of 0 for everything.
How do I include thresholds in my code so that more samples are predicted as 1?
nsteps = 215
nepoch = 3
best_val_acc = 0

for epoch in range(nepoch):
    model.train()
    print(f"epoch n°{epoch+1}:")
    av_epoch_loss = 0
    progress_bar = tqdm(range(nsteps))
    for batch in trainloader:
        batch = {k: v.cuda() for k, v in batch.items()}
        outputs = model(**batch)
        loss = criterion(outputs, *batch)
        av_epoch_loss += loss
        loss.backward()
        optim.step()
        optim.zero_grad()
        predictions = torch.argmax(outputs.logits, dim=-1)
        f1.add_batch(predictions=predictions, references=batch["labels"])
        acc.add_batch(predictions=predictions, references=batch["labels"])
        progress_bar.update(1)
    av_epoch_loss /= nsteps
    print(f"Training Loss: {av_epoch_loss: .2f}")
    acc_res = acc.compute()["accuracy"]
    print(f"Training Accuracy: {acc_res:.2f}")
    f_res = f1.compute()["f1"]
    print(f"Training F1-score: {f_res:.2f}")
    model.eval()
    val_acc = validate(model)
    if val_acc > best_val_acc:
        print("Achieved best validation accuracy so far. Saving model.")
        best_val_acc = val_acc
        best_model_state = deepcopy(model.state_dict())
    print("\n\n")
I looked in the PyTorch documentation but I couldn't figure it out.
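For reference, a minimal sketch of what applying a decision threshold instead of argmax might look like, assuming a two-class output where index 1 is the positive class (the threshold value of 0.3 is purely illustrative and should be tuned, e.g. against validation F1):

probs = torch.softmax(outputs.logits, dim=-1)      # (batch_size, 2) class probabilities
threshold = 0.3                                    # hypothetical value, tune on validation data
predictions = (probs[:, 1] >= threshold).long()    # predict 1 whenever P(class 1) >= threshold

This would replace the predictions = torch.argmax(outputs.logits, dim=-1) line; lowering the threshold below 0.5 trades precision for recall on the minority class.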
I am training a single-layer neural network using PyTorch and saving the model after the validation loss decreases. Once the network has finished training, I load the saved model and pass my test-set features through it (rather than the model from the last epoch) to see how well it does. However, more often than not, the validation loss will stop decreasing after about 150 epochs, and I'm worried that the network is overfitting the data. Would it be better for me to load the saved model during training if the validation loss has not decreased for some number of iterations (say, 5 epochs), and then continue training from that saved model instead?
Also, are there any recommendations for how to avoid a situation where the validation loss stops decreasing? I've had some models where the validation loss continues to decrease even after 500 epochs and others where it stops decreasing after 100. Here is my code so far:
class NeuralNetwork(nn.Module):
    def __init__(self, input_dim, output_dim, nodes):
        super(NeuralNetwork, self).__init__()
        self.linear1 = nn.Linear(input_dim, nodes)
        self.tanh = nn.Tanh()
        self.linear2 = nn.Linear(nodes, output_dim)

    def forward(self, x):
        output = self.linear1(x)
        output = self.tanh(output)
        output = self.linear2(output)
        return output
epochs = 500 # (start small for now)
learning_rate = 0.01
w_decay = 0.1
momentum = 0.9
input_dim = 4
output_dim = 1
nodes = 8
model = NeuralNetwork(input_dim, output_dim, nodes)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum, weight_decay=w_decay)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience=5)
losses = []
val_losses = []
min_validation_loss = np.inf
means = [] # we want to store the mean and standard deviation for the test set later
stdevs = []
torch.save({
    'epoch': 0,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'training_loss': 0.0,
    'validation_loss': 0.0,
    'means': [],
    'stdevs': [],
}, new_model_path)
new_model_saved = True
for epoch in range(epochs):
    curr_loss = 0.0
    validation_loss = 0.0

    if new_model_saved:
        checkpoint = torch.load(new_model_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        means = checkpoint['means']
        stdevs = checkpoint['stdevs']
        new_model_saved = False

    model.train()
    for i, batch in enumerate(train_dataloader):
        x, y = batch
        x, new_mean, new_std = normalize_data(x, means, stdevs)
        means = new_mean
        stdevs = new_std
        optimizer.zero_grad()
        predicted_outputs = model(x)
        loss = criterion(torch.squeeze(predicted_outputs), y)
        loss.backward()
        optimizer.step()
        curr_loss += loss.item()

    model.eval()
    for x_val, y_val in val_dataloader:
        x_val, val_means, val_std = normalize_data(x_val, means, stdevs)
        predicted_y = model(x_val)
        loss = criterion(torch.squeeze(predicted_y), y_val)
        validation_loss += loss.item()

    curr_lr = optimizer.param_groups[0]['lr']
    if epoch % 10 == 0:
        print(f'Epoch {epoch} \t\t Training Loss: {curr_loss/len(train_dataloader)} \t\t Validation Loss: {validation_loss/len(val_dataloader)} \t\t Learning rate: {curr_lr}')

    if min_validation_loss > validation_loss:
        print(f' For epoch {epoch}, validation loss decreased ({min_validation_loss:.6f}--->{validation_loss:.6f}) \t learning rate: {curr_lr} \t saving the model')
        min_validation_loss = validation_loss
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'training_loss': curr_loss/len(train_dataloader),
            'validation_loss': validation_loss/len(val_dataloader),
            'means': means,
            'stdevs': stdevs
        }, new_model_path)
        new_model_saved = True

    losses.append(curr_loss/len(train_dataloader))
    val_losses.append(validation_loss/len(val_dataloader))
    scheduler.step(curr_loss/len(train_dataloader))
The phenomenon where the validation loss increases while the training loss decreases is called overfitting. Overfitting is a problem when training a model and should be avoided; please read more on this topic here. Overfitting may occur after any number of epochs and depends on many variables (learning rate, dataset size, dataset diversity, and more). As a rule of thumb, test your model at the "pivot point", i.e. exactly where the validation loss begins to increase (while the training loss continues to decrease). This means my recommendation is to save the model after each epoch in which the validation loss decreases. If the validation loss keeps increasing for some number X of epochs, it probably means that you have reached a "deep" minimum of the training loss and it will not be beneficial to keep training (again, there are exceptions, but for this level of discussion that is enough).
I encourage you to read and learn more about this subject; it is very interesting and has significant implications.
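As a rough illustration of stopping near that pivot point, here is a minimal patience-based early-stopping sketch (the patience value, best_model_path, and the train_one_epoch/evaluate helpers are placeholders, not part of the code above):

patience = 5                      # stop after 5 epochs without improvement (arbitrary choice)
epochs_no_improve = 0
best_val_loss = float('inf')

for epoch in range(epochs):
    train_one_epoch(model)                      # placeholder for the training loop
    val_loss = evaluate(model, val_dataloader)  # placeholder for the validation loop

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
        torch.save(model.state_dict(), best_model_path)  # keep only the best checkpoint
    else:
        epochs_no_improve += 1
        if epochs_no_improve >= patience:
            print(f'Stopping at epoch {epoch}: no improvement for {patience} epochs')
            break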
I tried to make a loss function with R2 in nn.LSTM but I couldn't find any documentation about it. I already use RMSE and MAE losses from PyTorch.
My data is a time series and I'm doing time-series forecasting.
This is the code where I use the RMSE loss during training:
model = LSTM_model(input_size=1, output_size=1, hidden_size=512, num_layers=2, dropout=0).to(device)
criterion = nn.MSELoss(reduction="sum")
optimizer = optim.Adam(model.parameters(), lr=0.001)
callback = Callback(model, early_stop_patience=10, outdir="model/lstm", plot_every=20)
from tqdm.auto import tqdm

def loop_fn(mode, dataset, dataloader, model, criterion, optimizer, device):
    if mode == "train":
        model.train()
    elif mode == "test":
        model.eval()
    cost = 0
    for feature, target in tqdm(dataloader, desc=mode.title()):
        feature, target = feature.to(device), target.to(device)
        output, hidden = model(feature, None)
        loss = torch.sqrt(criterion(output, target))
        if mode == "train":
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        cost += loss.item() * feature.shape[0]
    cost = cost / len(dataset)
    return cost
And this is the code to start training:
while True:
    train_cost = loop_fn("train", train_set, trainloader, model, criterion, optimizer, device)
    with torch.no_grad():
        test_cost = loop_fn("test", test_set, testloader, model, criterion, optimizer, device)

    callback.log(train_cost, test_cost)
    callback.save_checkpoint()
    callback.cost_runtime_plotting()

    if callback.early_stopping(model, monitor="test_cost"):
        callback.plot_cost()
        break
Can anyone help me with the R2 loss function? Thank you in advance.
Here is an implementation,
"""
From https://en.wikipedia.org/wiki/Coefficient_of_determination
"""
def r2_loss(output, target):
    target_mean = torch.mean(target)
    ss_tot = torch.sum((target - target_mean) ** 2)
    ss_res = torch.sum((target - output) ** 2)
    r2 = 1 - ss_res / ss_tot
    return r2
You can use it as below,
loss = r2_loss(output, target)
loss.backward()
The following library function already implements the comments I have made on Melike's solution:
from torchmetrics.functional import r2_score
loss = r2_score(output, target)
loss.backward()
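Note that R2 is a score rather than a loss: it approaches 1.0 for a perfect fit, so minimizing it directly would push the model in the wrong direction. If you want a quantity to minimize during training, one common convention is to use 1 - R2 (which equals ss_res / ss_tot); a minimal sketch:

from torchmetrics.functional import r2_score

loss = 1 - r2_score(output, target)  # 0 for a perfect fit, grows as the fit worsens
loss.backward()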
I am very new to PyTorch and am implementing my own image-classifier network. However, I see that for each epoch the training accuracy is very good but the validation accuracy is 0 (I checked up to the 5th epoch). I am using the Adam optimizer with a learning rate of .001, and I resample the whole dataset into training and validation sets after each epoch. Please help me find where I am going wrong.
Here is my code:
### where is data?
data_dir_train = '/home/sup/PycharmProjects/deep_learning/CNN_Data/training_set'
data_dir_test = '/home/sup/PycharmProjects/deep_learning/CNN_Data/test_set'

# Define your batch_size
batch_size = 64

allData = datasets.ImageFolder(root=data_dir_train, transform=transformArr)

# We need to further split our training dataset into training and validation sets.
def split_train_validation():
    # Define the indices
    num_train = len(allData)
    indices = list(range(num_train))  # start with all the indices in training set
    split = int(np.floor(0.2 * num_train))  # define the split size
    #train_idx, valid_idx = indices[split:], indices[:split]

    # Random, non-contiguous split
    validation_idx = np.random.choice(indices, size=split, replace=False)
    train_idx = list(set(indices) - set(validation_idx))

    # define our samplers -- we use a SubsetRandomSampler because it will return
    # a random subset of the split defined by the given indices without replacement
    train_sampler = SubsetRandomSampler(train_idx)
    validation_sampler = SubsetRandomSampler(validation_idx)

    #train_loader = DataLoader(allData, batch_size=batch_size, sampler=train_sampler, shuffle=False, num_workers=4)
    #validation_loader = DataLoader(dataset=allData, batch_size=1, sampler=validation_sampler)

    return (train_sampler, validation_sampler)
Training
from torch.optim import Adam
import torch
import createNN
import torch.nn as nn
import loadData as ld
from torch.autograd import Variable
from torch.utils.data import DataLoader

# check if cuda - GPU support available
cuda = torch.cuda.is_available()

# create model, optimizer and loss function
model = createNN.ConvNet(class_num=2)
optimizer = Adam(model.parameters(), lr=.001, weight_decay=.0001)
loss_func = nn.CrossEntropyLoss()
if cuda:
    model.cuda()

# function to save model
def save_model(epoch):
    torch.save(model.state_dict(), 'imageClassifier_{}.model'.format(epoch))
    print('saved model at epoch', epoch)

def exp_lr_scheduler(epoch, init_lr=args.lr, weight_decay=args.weight_decay, lr_decay_epoch=cf.lr_decay_epoch):
    lr = init_lr * (0.5 ** (epoch // lr_decay_epoch))
def train(num_epochs):
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('\n\nEpoch {}'.format(epoch))
        train_sampler, validation_sampler = ld.split_train_validation()
        train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
        validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)

        model.train()
        acc = 0.0
        loss = 0.0
        total = 0
        # train model with training data
        for i, (images, labels) in enumerate(train_loader):
            # if cuda then move to GPU
            if cuda:
                images = images.cuda()
                labels = labels.cuda()
            # Variable class wraps a tensor and we can calculate grad
            images = Variable(images)
            labels = Variable(labels)
            # reset accumulated gradients for each batch
            optimizer.zero_grad()
            # pass images to model which returns prediction
            output = model(images)
            # calculate the loss based on prediction and actual
            loss = loss_func(output, labels)
            # backpropagate the loss and compute gradient
            loss.backward()
            # update weights as per the computed gradients
            optimizer.step()

            # prediction class
            predVal, predClass = torch.max(output.data, 1)
            acc += torch.sum(predClass == labels.data)
            loss += loss.cpu().data[0]
            total += labels.size(0)

        # print the statistics
        train_acc = acc / total
        train_loss = loss / total
        print('Mean train acc = {} over epoch = {}'.format(epoch, acc))
        print('Mean train loss = {} over epoch = {}'.format(epoch, loss))

        # Validate model with validation data
        model.eval()
        acc = 0.0
        loss = 0.0
        total = 0
        for i, (images, labels) in enumerate(validation_loader):
            # if cuda then move to GPU
            if cuda:
                images = images.cuda()
                labels = labels.cuda()
            # Variable class wraps a tensor and we can calculate grad
            images = Variable(images)
            labels = Variable(labels)
            # reset accumulated gradients for each batch
            optimizer.zero_grad()
            # pass images to model which returns prediction
            output = model(images)
            # calculate the loss based on prediction and actual
            loss = loss_func(output, labels)
            # backpropagate the loss and compute gradient
            loss.backward()
            # update weights as per the computed gradients
            optimizer.step()
            # prediction class
            predVal, predClass = torch.max(output.data, 1)
            acc += torch.sum(predClass == labels.data)
            loss += loss.cpu().data[0]
            total += labels.size(0)

        # print the statistics
        valid_acc = acc / total
        valid_loss = loss / total
        print('Mean train acc = {} over epoch = {}'.format(epoch, valid_acc))
        print('Mean train loss = {} over epoch = {}'.format(epoch, valid_loss))

        if best_acc < valid_acc:
            best_acc = valid_acc
            save_model(epoch)

        # at 30th epoch we save the model
        if epoch == 30:
            save_model(epoch)

train(20)
I think you did not take into account that acc += torch.sum(predClass == labels.data) returns a tensor instead of a float value. Depending on the version of PyTorch you are using, you should change it to:
acc += torch.sum(predClass == labels.data).cpu().data[0]  # pytorch 0.3
acc += torch.sum(predClass == labels.data).item()  # pytorch 0.4
Although your code seems to be working for an old PyTorch version, I would recommend you upgrade to the 0.4 version.
Also, I noticed other problems/typos in your code.
You are loading the dataset for every epoch:
for epoch in range(num_epochs):
    print('\n\nEpoch {}'.format(epoch))
    train_sampler, validation_sampler = ld.split_train_validation()
    train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
    validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
    ...
That should not happen; it is enough to load it once:
train_sampler, validation_sampler = ld.split_train_validation()
train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)

for epoch in range(num_epochs):
    print('\n\nEpoch {}'.format(epoch))
    ...
In the training part you have (this does not happen in the validation):
train_acc = acc/total
train_loss = loss / total
print('Mean train acc = {} over epoch = {}'.format(epoch,acc))
print('Mean train loss = {} over epoch = {}'.format(epoch, loss))
where you are printing acc instead of train_acc.
Also, in the validation part you are printing print('Mean train acc = {} over epoch = {}'.format(epoch, valid_acc)) when it should say something like 'Mean val acc'.
After changing these lines of code, using a standard model I created and the CIFAR dataset, the training seems to converge: accuracy increases at every epoch while the mean loss value decreases.
I hope I could help you!