PyTorch: accuracy of validation set greater than 100% during training


1 ) Problem
I observe an odd behaviour during training where my validation-accuracy is above 100% right from the start.
Epoch 0/3
----------
100%|██████████| 194/194 [00:50<00:00, 3.82it/s]
train Loss: 1.8653 Acc: 0.4796
100%|██████████| 194/194 [00:32<00:00, 5.99it/s]
val Loss: 1.7611 Acc: 1.2939
Epoch 1/3
----------
100%|██████████| 194/194 [00:42<00:00, 4.61it/s]
train Loss: 0.8704 Acc: 0.7467
100%|██████████| 194/194 [00:31<00:00, 6.11it/s]
val Loss: 1.0801 Acc: 1.4694
The output indicates that one epoch iterates over 194 batches, which is correct for the training data (length 6186, batch_size 32, so 6186 / 32 ≈ 193.3, i.e. 194 batches with the last one partial) but does not match the size of the validation data (length 3447, batch_size = 32).
Hence I would expect my validation loop to generate 108 batches (3447 / 32 ≈ 107.7, rounded up) instead of 194.
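The expected batch counts can be checked with a few lines. This is a small sketch that assumes the DataLoader default of keeping the last partial batch (drop_last=False); the helper name num_batches is made up for illustration.

```python
import math


def num_batches(num_samples: int, batch_size: int) -> int:
    """Batches per epoch when the last, partial batch is kept."""
    return math.ceil(num_samples / batch_size)


print(num_batches(6186, 32))  # training set   -> 194
print(num_batches(3447, 32))  # validation set -> 108
```

A validation progress bar that shows 194 rather than 108 therefore suggests that the loader is iterating over a dataset of training-set size.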
I thought this behaviour is handled within my for loop at:
for dataset in tqdm(dataloaders[phase]):
But somehow I can't figure out what is wrong here. See point 3) below for my entire code.
2 ) Question
If my assumption above is correct, i.e. that this error stems from the for-loop in my code, then I would like to know the following:
How do I need to adjust the for-loop during the validation phase to handle the number of batches that are being used for validation correctly?
3 ) Background:
Following two tutorials, one on how to do transfer-learning (https://discuss.pytorch.org/t/transfer-learning-using-vgg16/20653) and one on how to do data-loading (https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) in pytorch, I am trying to customize the code such that I can perform transfer-learning on a new custom dataset which I want to provide via pandas dataframes.
As such, my training- and validation-data is provided via two dataframes (df_train & df_val) which both contain two columns, one for the path and one for the target. E.g. like this:
url target
0 C:/Users/aaron/Desktop/pics/4ebd... 9
1 C:/Users/aaron/Desktop/pics/7153... 3
2 C:/Users/aaron/Desktop/pics/3ee6... 3
3 C:/Users/aaron/Desktop/pics/4652... 16
4 C:/Users/aaron/Desktop/pics/28ce... 15
...
And their respective length:
print(len(df_train))
print(len(df_val))
>> 6186
>> 3447
My pipeline looks like this:
class CustomDataset(Dataset):
    def __init__(self, df, transform=None):
        self.dataframe = df_train
        self.transform = transform
    def __len__(self):
        return len(self.dataframe)
    def __getitem__(self, idx):
        img_name = self.dataframe.iloc[idx, 0]
        img = Image.open(img_name)
        img_normalized = self.transform(img)
        landmarks = self.dataframe.iloc[idx, 1]
        sample = {'data': img_normalized, 'label': int(landmarks)}
        return sample
train_dataset = CustomDataset(df_train,transform=transforms.Compose([
transforms.Resize(224),
transforms.ToTensor(),transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]))
val_dataset = CustomDataset(df_val,transform=transforms.Compose([
transforms.Resize(224),
transforms.ToTensor(),transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]))
train_loader = torch.utils.data.DataLoader(train_dataset,batch_size=32,shuffle=True, num_workers=0)
val_loader = torch.utils.data.DataLoader(val_dataset,batch_size=32,shuffle=True, num_workers=0)
dataloaders = {'train': train_loader, 'val': val_loader}
dataset_sizes = {'train': len(df_train) ,'val': len(df_val)}
################### Training
from tqdm import tqdm
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train() # Set model to training mode
            else:
                model.eval() # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for dataset in tqdm(dataloaders[phase]):
                inputs, labels = dataset["data"], dataset["label"]
                #print(inputs.type())
                inputs = inputs.to(device, dtype=torch.float)
                labels = labels.to(device,dtype=torch.long)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, len(le.classes_))
model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=4)

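Your problem appears to be here:
class CustomDataset(Dataset):
    def __init__(self, df, transform=None):
        self.dataframe = df_train  # problem: the df argument is ignored
This should be
        self.dataframe = df
In your case, you are inadvertently setting both the train and val CustomDataset to df_train ...
As a result, the validation loop iterates over all 6186 training rows (194 batches), while the number of correct predictions is still divided by len(df_val) = 3447, which is how the reported accuracy ends up above 1.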
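A minimal sketch of the fix and of why the metric overshoots. To keep it runnable without PyTorch or pandas, plain lists stand in for df_train and df_val, and BuggyDataset / FixedDataset are illustrative names, not classes from the question.

```python
# Stand-ins for the two dataframes (only their lengths matter here).
df_train = list(range(6186))
df_val = list(range(3447))


class BuggyDataset:
    def __init__(self, df, transform=None):
        self.dataframe = df_train  # bug: ignores the `df` argument
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)


class FixedDataset:
    def __init__(self, df, transform=None):
        self.dataframe = df  # use whatever was passed in
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)


buggy_val = BuggyDataset(df_val)
fixed_val = FixedDataset(df_val)
print(len(buggy_val), len(fixed_val))  # 6186 3447

# With the bug, up to 6186 correct predictions are divided by 3447,
# so the reported "accuracy" can reach 6186 / 3447.
max_reported_acc = len(buggy_val) / len(df_val)
print(round(max_reported_acc, 4))  # 1.7946
```

Dividing by len(dataloaders[phase].dataset) instead of a separately maintained dataset_sizes dict would also have kept the ratio at or below 1, although it would have hidden the fact that validation was running on the training data.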

Related

Understanding model training and evaluation in Pytorch

I am following some PyTorch deep learning code in which model evaluation takes place inside the training epoch loop.
Q) Should torch.no_grad() and model.eval() be outside the training epoch loop?
Q) How can I determine which parameters (weights) are being optimised by the optimiser during back-propagation?
...
for l in range(1):
    model = GTN(num_edge=A.shape[-1],
                num_channels=num_channels,w_in = node_features.shape[1],w_out = node_dim,
                num_class=num_classes,num_layers=num_layers,norm=norm)
    if adaptive_lr == 'false':
        optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=0.001)
    else:
        optimizer = torch.optim.Adam([{'params':model.weight},{'params':model.linear1.parameters()},{'params':model.linear2.parameters()},
                                      {"params":model.layers.parameters(), "lr":0.5}], lr=0.005, weight_decay=0.001)
    loss = nn.CrossEntropyLoss()
    # Train & Valid & Test
    best_val_loss = 10000
    best_train_loss = 10000
    best_train_f1 = 0
    best_val_f1 = 0
    for i in range(epochs):
        print('Epoch: ',i+1)
        model.zero_grad()
        model.train()
        loss,y_train,Ws = model(A, node_features, train_node, train_target)
        train_f1 = torch.mean(f1_score(torch.argmax(y_train.detach(),dim=1), train_target, num_classes=num_classes)).cpu().numpy()
        print('Train - Loss: {}, Macro_F1: {}'.format(loss.detach().cpu().numpy(), train_f1))
        loss.backward()
        optimizer.step()
        model.eval()
        # Valid
        with torch.no_grad():
            val_loss, y_valid,_ = model.forward(A, node_features, valid_node, valid_target)
            val_f1 = torch.mean(f1_score(torch.argmax(y_valid,dim=1), valid_target, num_classes=num_classes)).cpu().numpy()
        if val_f1 > best_val_f1:
            best_val_loss = val_loss.detach().cpu().numpy()
            best_train_loss = loss.detach().cpu().numpy()
            best_train_f1 = train_f1
            best_val_f1 = val_f1
    print('---------------Best Results--------------------')
    print('Train - Loss: {}, Macro_F1: {}'.format(best_train_loss, best_train_f1))
    print('Valid - Loss: {}, Macro_F1: {}'.format(best_val_loss, best_val_f1))
    final_f1 += best_test_f1

For each epoch, you train and then run validation/test.
For validation/test you switch the model to evaluation mode with model.eval() and then run the forward pass under torch.no_grad(), which is correct. At the start of the next epoch you switch the model back to training mode with model.train(). There is no issue with the code; you are using the model modes correctly.
In your code, if adaptive_lr == 'false', you are optimizing the parameters given by model.parameters(); otherwise you are optimizing:
model.weight
model.linear1.parameters()
model.linear2.parameters()
model.layers.parameters()
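To see which parameters an optimizer will update, you can walk its param_groups. The sketch below uses a small stand-in model (TinyNet is hypothetical, not the GTN from the question) and two parameter groups with different learning rates, loosely mirroring the adaptive_lr branch.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(4, 3)
        self.linear2 = nn.Linear(3, 2)

    def forward(self, x):
        return self.linear2(torch.relu(self.linear1(x)))


model = TinyNet()
optimizer = torch.optim.Adam(
    [{"params": model.linear1.parameters()},
     {"params": model.linear2.parameters(), "lr": 0.5}],
    lr=0.005, weight_decay=0.001)

# Map parameter identity back to its name so the groups are readable.
names = {id(p): n for n, p in model.named_parameters()}
optimized = []
for i, group in enumerate(optimizer.param_groups):
    for p in group["params"]:
        optimized.append(names[id(p)])
        print(f"group {i} lr={group['lr']} {names[id(p)]} {tuple(p.shape)}")
```

Any parameter that does not appear in this listing is not touched by optimizer.step().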

Low accuracy binary classification with Pytorch

I am practicing deep learning for binary classification with PyTorch on the Breast-Cancer-Wisconsin-Diagnostic dataset.
I've tried different approaches; the best I can get is shown below, and the accuracy is still low at 61%.
How can I improve the accuracy?
Thank you.
import pandas as pd
import io
dataset = pd.read_excel(base_dir + "Breast-Cancer-Wisconsin-Diagnostic.xlsx")
number_of_columns = dataset.shape[1]
# training and testing split of 70:30
dataset['diagnosis'] = pd.Categorical(dataset['diagnosis']).codes
dataset = dataset.sample(frac=1, random_state=1234)
train_input = dataset.values[:398, :number_of_columns-1]
train_target = dataset.values[:398, number_of_columns-1]
test_input = dataset.values[398:, :number_of_columns-1]
test_target = dataset.values[398:, number_of_columns-1]
import torch
torch.manual_seed(1234)
hidden_units = 5
net = torch.nn.Sequential(
torch.nn.Linear(number_of_columns-1, hidden_units),
torch.nn.ReLU(),
torch.nn.Linear(hidden_units, 2))
# choose optimizer and loss function
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,momentum=0.9)
# train
epochs = 50
for epoch in range(epochs):
    inputs = torch.autograd.Variable(torch.Tensor(train_input).float())
    targets = torch.autograd.Variable(torch.Tensor(train_target).long())
    optimizer.zero_grad()
    out = net(inputs)
    loss = criterion(out, targets)
    loss.backward()
    optimizer.step()
    if epoch == 0 or (epoch + 1) % 10 == 0:
        print('Epoch %d Loss: %.4f' % (epoch + 1, loss.item()))
# Epoch 1 Loss: 412063.1250
# Epoch 10 Loss: 0.6628
# Epoch 20 Loss: 0.6639
# Epoch 30 Loss: 0.6592
# Epoch 40 Loss: 0.6587
# Epoch 50 Loss: 0.6588
import numpy as np
inputs = torch.autograd.Variable(torch.Tensor(test_input).float())
targets = torch.autograd.Variable(torch.Tensor(test_target).long())
optimizer.zero_grad()
out = net(inputs)
_, predicted = torch.max(out.data, 1)
error_count = test_target.size - np.count_nonzero((targets == predicted).numpy())
print('Errors: %d; Accuracy: %d%%' % (error_count, 100 * torch.sum(targets == predicted) // test_target.size))
# Errors: 65; Accuracy: 61%

The features representing the samples are in different ranges, so the first thing you should do is normalize the data.
You should also plot the loss and accuracy over the training epochs for both the training and the validation/test dataset to understand whether the model overfits or underfits the training data.
Furthermore, you can try a more complex (deeper) model. And since your training dataset has only a small number of samples, you can also consider augmentation and transfer learning if possible.
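As a rough illustration of the normalization step, the sketch below standardizes each feature column to zero mean and unit variance using statistics computed on the training split only. It uses plain Python lists and made-up numbers so that it runs on its own; fit_scaler and transform are illustrative helper names. With the arrays from the question, the same idea applies column-wise to train_input and test_input.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Return per-column (mean, std) computed from the training rows."""
    cols = list(zip(*rows))
    # Guard against constant columns, whose std would be 0.
    return [(mean(c), pstdev(c) or 1.0) for c in cols]


def transform(rows, stats):
    """Apply (x - mean) / std column by column."""
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


# Made-up data: two features on very different scales.
train_rows = [[1000.0, 0.1], [2000.0, 0.2], [3000.0, 0.3]]
test_rows = [[2500.0, 0.25]]

stats = fit_scaler(train_rows)
train_scaled = transform(train_rows, stats)
test_scaled = transform(test_rows, stats)  # reuse the training statistics
print(train_scaled)
print(test_scaled)
```

Reusing the training statistics for the test split keeps information about the test data out of the preprocessing.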

Unexpected behavior in model validation with tf.slim and inception_v1

I am trying to use the inception_v1 module written in tf.slim provided here to train the model on CIFAR 10 dataset.
The code to train and evaluate the model on the dataset is below.
# test_data = (data['images_test'], data['labels_test'])
train_data = (train_x, train_y)
val_data = (val_x, val_y)
# create two datasets, one for training and one for test
train_dataset = tf.data.Dataset.from_tensor_slices(train_data).shuffle(buffer_size=10000).batch(BATCH_SIZE).map(preprocess)
# train_dataset = train_dataset.shuffle(buffer_size=10000).batch(BATCH_SIZE).map(preprocess)
val_dataset = tf.data.Dataset.from_tensor_slices(val_data).batch(BATCH_SIZE).map(preprocess)
# test_dataset = tf.data.Dataset.from_tensor_slices(test_data).batch(BATCH_SIZE).map(preprocess)
# create a _iterator of the correct shape and type
_iter = tf.data.Iterator.from_structure(
train_dataset.output_types,
train_dataset.output_shapes
)
features, labels = _iter.get_next()
# create the initialization operations
train_init_op = _iter.make_initializer(train_dataset)
val_init_op = _iter.make_initializer(val_dataset)
# test_init_op = _iter.make_initializer(test_dataset)
# Placeholders which evaluate in the session
training_mode = tf.placeholder(shape=None, dtype=tf.bool)
dropout_prob = tf.placeholder_with_default(1.0, shape=())
reuse_bool = tf.placeholder_with_default(True, shape=())
# Init the saver Object which handles saves and restores of
# model weights
# saver = tf.train.Saver()
# Initialize the model inside the arg_scope to define the batch
# normalization layer and the appropriate parameters
with slim.arg_scope(inception_v1_arg_scope(use_batch_norm=True)) as scope:
    logits, end_points = inception_v1(features,
                                      reuse=None,
                                      dropout_keep_prob=dropout_prob, is_training=training_mode)
# Create the cross entropy loss function
cross_entropy = tf.reduce_mean(
tf.losses.softmax_cross_entropy(tf.one_hot(labels, 10), logits))
train_op = tf.train.AdamOptimizer(1e-2).minimize(loss=cross_entropy)
# train_op = slim.learning.create_train_op(cross_entropy, optimizer, global_step=)
# Define the accuracy metric
preds = tf.argmax(logits, axis=-1, output_type=tf.int64)
acc = tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))
# Count the iterations for each set
n_train_batches = train_y.shape[0] // BATCH_SIZE
n_val_batches = val_y.shape[0] // BATCH_SIZE
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # saver = tf.train.Saver([v for v in tf.all_variables()][:-1])
    # for v in tf.all_variables():
    #     print(v.name)
    # saver.restore(sess, tf.train.latest_checkpoint('./', latest_filename='inception_v1.ckpt'))
    for i in range(EPOCHS):
        total_loss = 0
        total_acc = 0
        # Init train session
        sess.run(train_init_op)
        with tqdm(total=n_train_batches * BATCH_SIZE) as pbar:
            for batch in range(n_train_batches):
                _, loss, train_acc = sess.run([train_op, cross_entropy, acc], feed_dict={training_mode: True, dropout_prob: 0.2})
                total_loss += loss
                total_acc += train_acc
                pbar.update(BATCH_SIZE)
        print("Epoch: {} || Loss: {:.5f} || Acc: {:.5f} %".\
            format(i+1, total_loss / n_train_batches, (total_acc / n_train_batches)*100))
        # Switch to validation
        total_val_loss = 0
        total_val_acc = 0
        sess.run(val_init_op)
        for batch in range(n_val_batches):
            val_loss, val_acc = sess.run([cross_entropy, acc], feed_dict={training_mode: False})
            total_val_loss += val_loss
            total_val_acc += val_acc
        print("Epoch: {} || Validation Loss: {:.5f} || Val Acc: {:.5f} %".\
            format(i+1, total_val_loss / n_val_batches, (total_val_acc / n_val_batches) * 100))
The paradox is that I get the following results when training and evaluate the model on the validation set:
Epoch: 1 || Loss: 2.29436 || Acc: 23.61750 %
Epoch: 1 || Validation Loss: 1158854431554614016.00000 || Val Acc: 10.03000 %
100%|███████████████████████████████████████████████████| 40000/40000 [03:52<00:00, 173.21it/s]
Epoch: 2 || Loss: 1.68389 || Acc: 36.49250 %
Epoch: 2 || Validation Loss: 27997399226326712.00000 || Val Acc: 10.03000 %
100%|██████████████████████████████████████████████████▋| 39800/40000 [03:51<00:01, 174.11it/s]
I have set training_mode to True during training and False during validation. However, since train_op is only run in the training phase, the model seems to behave as if it were uninitialized on the validation set. My guess is that the is_training variable does not handle this situation well and does not keep the batch normalization variables initialized during validation. Has anyone experienced a similar situation before?

I found the solution to my problem. Two things were involved.
The first was the batch norm decay: because the dataset is smaller than ImageNet, I had to lower it to 0.99.
batch_norm_decay=0.99
The other was to build the training op with the following line, so that the parameters of the batch normalization layers are kept track of during training:
train_op = slim.learning.create_train_op(cross_entropy, optimizer)

Neural networks pytorch

I am very new to PyTorch and am implementing my own image classifier network. For each epoch the training accuracy is very good, but the validation accuracy is 0; I observed this up to the 5th epoch. I am using the Adam optimizer with a learning rate of .001, and I also resample the whole data set into training and validation sets after each epoch. Please help me find where I am going wrong.
Here is my code:
### where is data?
data_dir_train = '/home/sup/PycharmProjects/deep_learning/CNN_Data/training_set'
data_dir_test = '/home/sup/PycharmProjects/deep_learning/CNN_Data/test_set'
# Define your batch_size
batch_size = 64
allData = datasets.ImageFolder(root=data_dir_train,transform=transformArr)
# We need to further split our training dataset into training and validation sets.
def split_train_validation():
    # Define the indices
    num_train = len(allData)
    indices = list(range(num_train)) # start with all the indices in training set
    split = int(np.floor(0.2 * num_train)) # define the split size
    #train_idx, valid_idx = indices[split:], indices[:split]
    # Random, non-contiguous split
    validation_idx = np.random.choice(indices, size=split, replace=False)
    train_idx = list(set(indices) - set(validation_idx))
    # define our samplers -- we use a SubsetRandomSampler because it will return
    # a random subset of the split defined by the given indices without replacement
    train_sampler = SubsetRandomSampler(train_idx)
    validation_sampler = SubsetRandomSampler(validation_idx)
    #train_loader = DataLoader(allData,batch_size=batch_size,sampler=train_sampler,shuffle=False,num_workers=4)
    #validation_loader = DataLoader(dataset=allData,batch_size=1, sampler=validation_sampler)
    return (train_sampler,validation_sampler)
Training
from torch.optim import Adam
import torch
import createNN
import torch.nn as nn
import loadData as ld
from torch.autograd import Variable
from torch.utils.data import DataLoader
# check if cuda - GPU support available
cuda = torch.cuda.is_available()
#create model, optimizer and loss function
model = createNN.ConvNet(class_num=2)
optimizer = Adam(model.parameters(),lr=.001,weight_decay=.0001)
loss_func = nn.CrossEntropyLoss()
if cuda:
    model.cuda()
# function to save model
def save_model(epoch):
    torch.save(model.load_state_dict(),'imageClassifier_{}.model'.format(epoch))
    print('saved model at epoch',epoch)
def exp_lr_scheduler ( epoch , init_lr = args.lr, weight_decay = args.weight_decay, lr_decay_epoch = cf.lr_decay_epoch):
    lr = init_lr * ( 0.5 ** (epoch // lr_decay_epoch))
def train(num_epochs):
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('\n\nEpoch {}'.format(epoch))
        train_sampler, validation_sampler = ld.split_train_validation()
        train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
        validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
        model.train()
        acc = 0.0
        loss = 0.0
        total = 0
        # train model with training data
        for i,(images,labels) in enumerate(train_loader):
            # if cuda then move to GPU
            if cuda:
                images = images.cuda()
                labels = labels.cuda()
            # Variable class wraps a tensor and we can calculate grad
            images = Variable(images)
            labels = Variable(labels)
            # reset accumulated gradients for each batch
            optimizer.zero_grad()
            # pass images to model which returns prediction
            output = model(images)
            #calculate the loss based on prediction and actual
            loss = loss_func(output,labels)
            # backpropagate the loss and compute gradient
            loss.backward()
            # update weights as per the computed gradients
            optimizer.step()
            # prediction class
            predVal , predClass = torch.max(output.data, 1)
            acc += torch.sum(predClass == labels.data)
            loss += loss.cpu().data[0]
            total += labels.size(0)
        # print the statistics
        train_acc = acc/total
        train_loss = loss / total
        print('Mean train acc = {} over epoch = {}'.format(epoch,acc))
        print('Mean train loss = {} over epoch = {}'.format(epoch, loss))
        # Validate model with validation data
        model.eval()
        acc = 0.0
        loss = 0.0
        total = 0
        for i,(images,labels) in enumerate(validation_loader):
            # if cuda then move to GPU
            if cuda:
                images = images.cuda()
                labels = labels.cuda()
            # Variable class wraps a tensor and we can calculate grad
            images = Variable(images)
            labels = Variable(labels)
            # reset accumulated gradients for each batch
            optimizer.zero_grad()
            # pass images to model which returns prediction
            output = model(images)
            #calculate the loss based on prediction and actual
            loss = loss_func(output,labels)
            # backpropagate the loss and compute gradient
            loss.backward()
            # update weights as per the computed gradients
            optimizer.step()
            # prediction class
            predVal, predClass = torch.max(output.data, 1)
            acc += torch.sum(predClass == labels.data)
            loss += loss.cpu().data[0]
            total += labels.size(0)
        # print the statistics
        valid_acc = acc / total
        valid_loss = loss / total
        print('Mean train acc = {} over epoch = {}'.format(epoch, valid_acc))
        print('Mean train loss = {} over epoch = {}'.format(epoch, valid_loss))
        if(best_acc<valid_acc):
            best_acc = valid_acc
            save_model(epoch)
        # at 30th epoch we save the model
        if (epoch == 30):
            save_model(epoch)
train(20)

I think you did not take into account that torch.sum(predClass == labels.data) returns a tensor instead of a float value, so acc accumulates tensors. Depending on the version of PyTorch you are using, I think you should change that line to one of the following:
acc += torch.sum(predClass == labels.data).cpu().data[0] #pytorch 0.3
acc += torch.sum(predClass == labels.data).item() #pytorch 0.4
Although your code seems to work on old PyTorch versions, I would recommend upgrading to version 0.4.
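The sketch below shows the pattern on a current PyTorch install: the per-batch count is converted to a Python int with .item() before it is accumulated, so the final division is ordinary float division. The tensors are made-up stand-ins for two batches of model outputs.

```python
import torch

# Two fake batches of (logits, labels); values are illustrative only.
batches = [
    (torch.tensor([[2.0, 1.0], [0.1, 0.9], [3.0, 0.5]]), torch.tensor([0, 1, 1])),
    (torch.tensor([[0.2, 0.8], [1.5, 0.5]]), torch.tensor([1, 0])),
]

correct = 0
total = 0
for logits, labels in batches:
    _, pred_class = torch.max(logits, 1)
    correct += torch.sum(pred_class == labels).item()  # Python int, not a tensor
    total += labels.size(0)

accuracy = correct / total
print(type(correct).__name__, accuracy)  # int 0.8
```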
I also noticed other problems/typos in your code.
You are creating the samplers and data loaders again in every epoch:
for epoch in range(num_epochs):
    print('\n\nEpoch {}'.format(epoch))
    train_sampler, validation_sampler = ld.split_train_validation()
    train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
    validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
    ...
That should not happen; it is enough to create them once, before the loop:
train_sampler, validation_sampler = ld.split_train_validation()
train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
for epoch in range(num_epochs):
    print('\n\nEpoch {}'.format(epoch))
    ...
In the training part you have (this does not happen in the validation part):
train_acc = acc/total
train_loss = loss / total
print('Mean train acc = {} over epoch = {}'.format(epoch,acc))
print('Mean train loss = {} over epoch = {}'.format(epoch, loss))
Here you are printing acc instead of train_acc (and loss instead of train_loss).
Also, in the validation part I noticed that you print 'Mean train acc = {} over epoch = {}'.format(epoch, valid_acc) when the label should be something like 'Mean val acc'.
After changing these lines of code, and using a standard model I created together with the CIFAR dataset, the training seems to converge: accuracy increases at every epoch while the mean loss decreases.
I hope this helps!

PyTorch : GPU execution time

I am following different tutorials on PyTorch and trying to take advantage of the GPU, but the execution time I get is very different from the one announced on the website:
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
The code is at the end of that page, and it says the program finished in about 1 min 30 s, versus 6-7 min on my computer. Could you try it and tell me what time you get? I am confused to see such a large difference when running the same code as the website.
This will help me understand whether the problem comes from my GPU or not :)
My config :
GTX 1080Ti
Windows10
Cuda 9.1
Pytorch 0.4.0
The code:
if __name__ == "__main__":
    plt.ion() # interactive mode
    ######################################################################
    # Load Data
    # ---------
    #
    # We will use torchvision and torch.utils.data packages for loading the
    # data.
    #
    # The problem we're going to solve today is to train a model to classify
    # **ants** and **bees**. We have about 120 training images each for ants and bees.
    # There are 75 validation images for each class. Usually, this is a very
    # small dataset to generalize upon, if trained from scratch. Since we
    # are using transfer learning, we should be able to generalize reasonably
    # well.
    #
    # This dataset is a very small subset of imagenet.
    #
    # .. Note ::
    # Download the data from
    # `here <https://download.pytorch.org/tutorial/hymenoptera_data.zip>`_
    # and extract it to the current directory.
    # Data augmentation and normalization for training
    # Just normalization for validation
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'val': transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }
    data_dir = 'hymenoptera_data'
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                              data_transforms[x])
                      for x in ['train', 'val']}
    dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                                  shuffle=True, num_workers=4)
                   for x in ['train', 'val']}
    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
    class_names = image_datasets['train'].classes
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    ######################################################################
    # Visualize a few images
    # ^^^^^^^^^^^^^^^^^^^^^^
    # Let's visualize a few training images so as to understand the data
    # augmentations.
    def imshow(inp, title=None):
        """Imshow for Tensor."""
        inp = inp.numpy().transpose((1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        inp = std * inp + mean
        inp = np.clip(inp, 0, 1)
        plt.imshow(inp)
        if title is not None:
            plt.title(title)
        plt.pause(0.001) # pause a bit so that plots are updated
    # Get a batch of training data
    inputs, classes = next(iter(dataloaders['train']))
    # Make a grid from batch
    out = torchvision.utils.make_grid(inputs)
    imshow(out, title=[class_names[x] for x in classes])
    ######################################################################
    # Training the model
    # ------------------
    #
    # Now, let's write a general function to train a model. Here, we will
    # illustrate:
    #
    # - Scheduling the learning rate
    # - Saving the best model
    #
    # In the following, parameter ``scheduler`` is an LR scheduler object from
    # ``torch.optim.lr_scheduler``.
    def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
        since = time.time()
        best_model_wts = copy.deepcopy(model.state_dict())
        best_acc = 0.0
        for epoch in range(num_epochs):
            print('Epoch {}/{}'.format(epoch, num_epochs - 1))
            print('-' * 10)
            # Each epoch has a training and validation phase
            for phase in ['train', 'val']:
                if phase == 'train':
                    scheduler.step()
                    model.train() # Set model to training mode
                else:
                    model.eval() # Set model to evaluate mode
                running_loss = 0.0
                running_corrects = 0
                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)
                    # zero the parameter gradients
                    optimizer.zero_grad()
                    # forward
                    # track history if only in train
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()
                    # statistics
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
                print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                    phase, epoch_loss, epoch_acc))
                # deep copy the model
                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    best_model_wts = copy.deepcopy(model.state_dict())
            print()
        time_elapsed = time.time() - since
        print('Training complete in {:.0f}m {:.0f}s'.format(
            time_elapsed // 60, time_elapsed % 60))
        print('Best val Acc: {:4f}'.format(best_acc))
        # load best model weights
        model.load_state_dict(best_model_wts)
        return model
    ######################################################################
    # Visualizing the model predictions
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    #
    # Generic function to display predictions for a few images
    #
    def visualize_model(model, num_images=6):
        was_training = model.training
        model.eval()
        images_so_far = 0
        fig = plt.figure()
        with torch.no_grad():
            for i, (inputs, labels) in enumerate(dataloaders['val']):
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                for j in range(inputs.size()[0]):
                    images_so_far += 1
                    ax = plt.subplot(num_images//2, 2, images_so_far)
                    ax.axis('off')
                    ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                    imshow(inputs.cpu().data[j])
                    if images_so_far == num_images:
                        model.train(mode=was_training)
                        return
            model.train(mode=was_training)
    ######################################################################
    # Finetuning the convnet
    # ----------------------
    #
    # Load a pretrained model and reset final fully connected layer.
    #
    model_ft = models.resnet18(pretrained=True)
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, 2)
    model_ft = model_ft.to(device)
    criterion = nn.CrossEntropyLoss()
    # Observe that all parameters are being optimized
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
    # Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
    ######################################################################
    # Train and evaluate
    # ^^^^^^^^^^^^^^^^^^
    #
    # It should take around 15-25 min on CPU. On GPU though, it takes less than a
    # minute.
    #
    model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                           num_epochs=25)
    ######################################################################
    #
    visualize_model(model_ft)
    ######################################################################
    # ConvNet as fixed feature extractor
    # ----------------------------------
    #
    # Here, we need to freeze all the network except the final layer. We need
    # to set ``requires_grad == False`` to freeze the parameters so that the
    # gradients are not computed in ``backward()``.
    #
    # You can read more about this in the documentation
    # `here <http://pytorch.org/docs/notes/autograd.html#excluding-subgraphs-from-backward>`__.
    #
    model_conv = torchvision.models.resnet18(pretrained=True)
    for param in model_conv.parameters():
        param.requires_grad = False
    # Parameters of newly constructed modules have requires_grad=True by default
    num_ftrs = model_conv.fc.in_features
    model_conv.fc = nn.Linear(num_ftrs, 2)
    model_conv = model_conv.to(device)
    criterion = nn.CrossEntropyLoss()
    # Observe that only parameters of final layer are being optimized as
    # opposed to before.
    optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
    # Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
    ######################################################################
    # Train and evaluate
    # ^^^^^^^^^^^^^^^^^^
    #
    # On CPU this will take about half the time compared to previous scenario.
    # This is expected as gradients don't need to be computed for most of the
    # network. However, forward does need to be computed.
    #
    model_conv = train_model(model_conv, criterion, optimizer_conv,
                             exp_lr_scheduler, num_epochs=25)
    ######################################################################
    #
    visualize_model(model_conv)
    plt.ioff()
    plt.show()
Thanks every body!

A bit late, I know, but it might be that the model is not actually being moved to the GPU, i.e. that you are missing
model.to("cuda")
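One way to check this is to look at where the model's parameters actually live. The sketch assumes nothing beyond a standard PyTorch install and falls back to the CPU when no GPU is visible; the Linear layer is a stand-in for the real network.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2)   # stand-in for the real network
model = model.to(device)  # moves parameters and buffers to the device

param_device = next(model.parameters()).device
print("CUDA available:", torch.cuda.is_available())
print("model lives on:", param_device)

# Inputs must be moved to the same device before the forward pass.
x = torch.randn(4, 8).to(device)
out = model(x)
print("output device:", out.device)
```

If "CUDA available" prints False, or the model reports cpu, the run is CPU-bound regardless of which GPU is installed.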

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
