Low accuracy binary classification with Pytorch

Low accuracy binary classification with Pytorch - python

In practicing deep learning for binary classification with Pytorch on Breast-Cancer-Wisconsin-Diagnostic-DataSet.
I've tried different approaches, and the best I can get as below, the accuracy is still low at 61%.
What's the way to improve the accuracy?
Thank you.
import pandas as pd
import io
dataset = pd.read_excel(base_dir + "Breast-Cancer-Wisconsin-Diagnostic.xlsx")
number_of_columns = dataset.shape[1]
# training and testing split of 70:30
dataset['diagnosis'] = pd.Categorical(dataset['diagnosis']).codes
dataset = dataset.sample(frac=1, random_state=1234)
train_input = dataset.values[:398, :number_of_columns-1]
train_target = dataset.values[:398, number_of_columns-1]
test_input = dataset.values[398:, :number_of_columns-1]
test_target = dataset.values[398:, number_of_columns-1]
import torch
torch.manual_seed(1234)
hidden_units = 5
net = torch.nn.Sequential(
torch.nn.Linear(number_of_columns-1, hidden_units),
torch.nn.ReLU(),
torch.nn.Linear(hidden_units, 2))
# choose optimizer and loss function
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,momentum=0.9)
# train
epochs = 50
for epoch in range(epochs):
inputs = torch.autograd.Variable(torch.Tensor(train_input).float())
targets = torch.autograd.Variable(torch.Tensor(train_target).long())
optimizer.zero_grad()
out = net(inputs)
loss = criterion(out, targets)
loss.backward()
optimizer.step()
if epoch == 0 or (epoch + 1) % 10 == 0:
print('Epoch %d Loss: %.4f' % (epoch + 1, loss.item()))
# Epoch 1 Loss: 412063.1250
# Epoch 10 Loss: 0.6628
# Epoch 20 Loss: 0.6639
# Epoch 30 Loss: 0.6592
# Epoch 40 Loss: 0.6587
# Epoch 50 Loss: 0.6588
import numpy as np
inputs = torch.autograd.Variable(torch.Tensor(test_input).float())
targets = torch.autograd.Variable(torch.Tensor(test_target).long())
optimizer.zero_grad()
out = net(inputs)
_, predicted = torch.max(out.data, 1)
error_count = test_target.size - np.count_nonzero((targets == predicted).numpy())
print('Errors: %d; Accuracy: %d%%' % (error_count, 100 * torch.sum(targets == predicted) // test_target.size))
# Errors: 65; Accuracy: 61%

Features Representing samples are in different range. So, First thing you should do is to normalize the data.
You should plot the loss and acc over the training epochs for training and validation/test dataset to understand whether the model overfits on training data or underfit.
Furthermore, you can try with more complex (deeper) model. And since your training dataset has few number of samples, you can consider augmentation and transfer learning as well if possible.

Related

how do I get to plot epoch vs accuracy and epoch vs avg loss

I'm trying to build a chatbot and need to display a plot for epochs vs accuracy and epoch vs avg loss and epoch vs final loss.
I'm only able to get epoch vs final loss. I'm not that great at code.
Any help is appreciated
epoc = []
los = []
for epoch in range(num_epochs):
for (words, labels) in train_loader:
words = words.to(device)
labels = labels.to(dtype=torch.long).to(device)
# Forward pass
outputs = model(words)
# if y would be one-hot, we must apply
# labels = torch.max(labels, 1)[1]
loss = criterion(outputs, labels)
# Backward and optimize
optimizer.zero_grad()
loss.backward()
optimizer.step()
epoc.append(epoch)
los.append(loss.item())
if (epoch + 1) % 100 == 0:
print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')
print(f'final loss: {loss.item():.4f}')
from matplotlib import pyplot as plt
plt.plot(epoc, los)
plt.show()
data = {
"model_state": model.state_dict(),
"input_size": input_size,
"hidden_size": hidden_size,
"output_size": output_size,
"all_words": all_words,
"tags": tags
}
FILE = "data.pth"
torch.save(data, FILE)
print(f'training complete. file saved to {FILE}')
I got the epoch vs final loss graph. Trying to get
Epoch vs accuracy
Epoch vs avg loss

how to set a threshold in pytorch

I am working on a very imbalanced data, 15% labeled as 1 and the rest as 0, using BERT.
the code i wrote uses maxing outputs which gives me predictions of 0 for everything.
How do I include thresholds in my code to maximise my predictions of 1.
nsteps=215
nepoch=3
best_val_acc = 0
for epoch in range(nepoch):
model.train()
print(f"epoch n°{epoch+1}:")
av_epoch_loss=0
progress_bar = tqdm(range(nsteps))
for batch in trainloader:
batch = {k:v.cuda() for k,v in batch.items()}
outputs = model(**batch)
loss = criterion(outputs, *batch)
av_epoch_loss += loss
loss.backward()
optim.step()
optim.zero_grad()
predictions=torch.argmax(outputs.logits, dim=-1)
f1.add_batch(predictions=predictions, references=batch["labels"])
acc.add_batch(predictions=predictions, references=batch["labels"])
progress_bar.update(1)
av_epoch_loss /= nsteps
print(f"Training Loss: {av_epoch_loss: .2f}")
acc_res = acc.compute()["accuracy"]
print(f"Training Accuracy: {acc_res:.2f}")
f_res = f1.compute()["f1"]
print(f"Training F1-score: {f_res:.2f}")
model.eval()
val_acc = validate(model)
if val_acc > best_val_acc:
print("Achieved best validation accuracy so far. Saving model.")
best_val_acc = val_acc
best_model_state = deepcopy(model.state_dict())
print("\n\n")
I looked in pytorch documentation but i couldn't figure it out.

Accuracy improves through Epochs but returns to initial accuracy for Evaluation

I am working on a project to recreate the results of a study using Neural Networks on EEG datasets. Throughout my time working on the project, I have been having repeated issues where the model has some accuracy improvements throughout the epochs, but for the evaluation, the accuracy always returns to the initial amount, specifically 1/NUM_CLASSES. Where NUM_CLASSES is the number of classification categories. I am honestly stuck at this point, I believe the model is overfitting and have attempted to adjust my data preprocessing to compensate, but have had little luck.
The code is as follows:
# Filters out warnings
import warnings
warnings.filterwarnings("ignore")
# Imports, as of 3/10, all are necessary
import numpy as np
import tensorflow as tf
from keras import layers
from keras import backend as K
from keras.models import Model
from keras.optimizers import Adam, SGD
from keras.callbacks import Callback
from keras.layers import Conv3D, Input, Dense, Activation, BatchNormalization, Flatten, Add, Softmax
from sklearn.model_selection import StratifiedKFold
from DonghyunMBCNN import MultiBranchCNN
# Global Variables
# The directory of the process data, must have been converted and cropped, reference dataProcessing.py and crop.py
DATA_DIR = "../datasets/BCICIV_2a_cropped/"
# Which trial subject will be trained
SUBJECT = 1
# The number of classification categories, for motor imagery, there are 4
NUM_CLASSES = 4
# The number of timesteps in each input array
TIMESTEPS = 240
# The X-Dimension of the dataset
XDIM = 7
# The Y-Dimension of the dataset
YDIM = 6
# The delta loss requirement for lower training rate
LOSS_THRESHOLD = 0.01
# Initial learning rate for ADAM optimizer
INIT_LR = 0.01
# Define Which NLL (Negative Log Likelihood) Loss function to use, either "NLL1", "NLL2", or "SCCE"
LOSS_FUNCTION = 'NLL2'
# Defines which optimizer is in use, either "ADAM" or "SGD"
OPTIMIZER = 'SGD'
# Whether training output should be given
VERBOSE = 1
# Determines whether K-Fold Cross Validation is used
USE_KFOLD = False
# Number of ksplit validation, must be atleast 2
KFOLD_NUM = 2
# Specifies which model structure will be used, '1' corresponds to the Create_Model function and '2' corresponds to Donghyun's model.
USE_STRUCTURE = '2'
# Number of epochs to train for
EPOCHS = 10
# Receptive field sizes
SRF_SIZE = (2, 2, 1)
MRF_SIZE = (2, 2, 3)
LRF_SIZE = (2, 2, 5)
# Strides for each receptive field
SRF_STRIDES = (2, 2, 1)
MRF_STRIDES = (2, 2, 2)
LRF_STRIDES = (2, 2, 4)
# This is meant to handle the reduction of the learning rate, current is not accurate, I have been unable to access the loss information from each Epoch
# The expectation is that if the delta loss is < threshold, learning rate *= 0.1. Threshold has not been set yet.
class LearningRateReducerCb(Callback):
def __init__(self):
self.history = {}
def on_epoch_end(self, epoch, logs={}):
for k, v in logs.items():
self.history.setdefault(k, []).append(v)
fin_index = len(self.history['loss']) - 1
if (fin_index >= 1):
if (self.history['loss'][fin_index-1] - self.history['loss'][fin_index] > LOSS_THRESHOLD):
old_lr = self.model.optimizer.lr.read_value()
new_lr = old_lr*0.1
print("\nEpoch: {}. Reducing Learning Rate from {} to {}".format(epoch, old_lr, new_lr))
self.model.optimizer.lr.assign(new_lr)
# The Negative Log Likelihood function
def Loss_FN1(y_true, y_pred, sample_weight=None):
return K.sum(K.binary_crossentropy(y_true, y_pred), axis=-1) # This is another loss function that I tried, was less effective
# Second NLL function, generally seems to work better
def Loss_FN2(y_true, y_pred, sample_weight=None):
n_dims = int(int(y_pred.shape[1])/2)
mu = y_pred[:, 0:n_dims]
logsigma = y_pred[:, n_dims:]
mse = -0.5*K.sum(K.square((y_true-mu)/K.exp(logsigma)), axis=1)
sigma_trace = -K.sum(logsigma, axis=1)
log2pi = -0.5*n_dims*np.log(2*np.pi)
log_likelihood = mse+sigma_trace+log2pi
return K.mean(-log_likelihood)
# Loads given data into two arrays, x and y, while also ensuring that all values are formatted as float32s
def load_data(data_dir, num):
x = np.load(data_dir + "A0" + str(num) + "TD_cropped.npy").astype(np.float32)
y = np.load(data_dir + "A0" + str(num) + "TK_cropped.npy").astype(np.float32)
return x, y
def create_receptive_field(size, strides, model, name):
modelRF = Conv3D(kernel_size = size, strides=strides, filters=32, padding='same', name=name+'1')(model)
modelRF1 = BatchNormalization()(modelRF)
modelRF2 = Activation('elu')(modelRF1)
modelRF3 = Conv3D(kernel_size = size, strides=strides, filters=64, padding='same', name=name+'2')(modelRF2)
modelRF4 = BatchNormalization()(modelRF3)
modelRF5 = Activation('elu')(modelRF4)
modelRF6 = Flatten()(modelRF5)
modelRF7 = Dense(32)(modelRF6)
modelRF8 = BatchNormalization()(modelRF7)
modelRF9 = Activation('relu')(modelRF8)
modelRF10 = Dense(32)(modelRF9)
modelRF11 = BatchNormalization()(modelRF10)
modelRF12 = Activation('relu')(modelRF11)
return Dense(NUM_CLASSES, activation='softmax')(modelRF12)
def Create_Model():
# Model Creation
model1 = Input(shape=(1, XDIM, YDIM, TIMESTEPS))
# 1st Convolution Layer
model1a = Conv3D(kernel_size = (3, 3, 5), strides = (2, 2, 4), filters=16, name="Conv1")(model1)
model1b = BatchNormalization()(model1a)
model1c = Activation('elu')(model1b)
# Small Receptive Field (SRF)
modelSRF = create_receptive_field(SRF_SIZE, SRF_STRIDES, model1c, 'SRF')
# Medium Receptive Field (MRF)
modelMRF = create_receptive_field(MRF_SIZE, MRF_STRIDES, model1c, 'MRF')
# Large Receptive Field (LRF)
modelLRF = create_receptive_field(LRF_SIZE, LRF_STRIDES, model1c, 'LRF')
# Add the layers - This sums each layer
final = Add()([modelSRF, modelMRF, modelLRF])
out = Softmax()(final)
model = Model(inputs=model1, outputs=out)
return model
if (LOSS_FUNCTION == 'NLL1'):
loss_function = Loss_FN1
elif (LOSS_FUNCTION == 'NLL2'):
loss_function = Loss_FN2
elif (LOSS_FUNCTION == 'SCCE'):
loss_function = 'sparse_categorical_crossentropy'
# Optimizer is given as ADAM with an initial learning rate of 0.01
if (OPTIMIZER == 'ADAM'):
opt = Adam(learning_rate = INIT_LR)
elif (OPTIMIZER == 'SGD'):
opt = SGD(learning_rate = INIT_LR)
X, Y = load_data(DATA_DIR, SUBJECT)
if (USE_KFOLD):
seed = 4
kfold = StratifiedKFold(n_splits=KFOLD_NUM, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
if (USE_STRUCTURE == '1'):
MRF_model = Create_Model()
elif (USE_STRUCTURE == '2'):
MRF_model = MultiBranchCNN(TIMESTEPS, YDIM, XDIM, NUM_CLASSES)
# Compiling the model with the negative log likelihood loss function, ADAM optimizer
MRF_model.compile(loss=loss_function, optimizer=opt, metrics=['accuracy'])
# Training for 30 epochs
MRF_model.fit(X[train], Y[train], epochs=30, verbose=VERBOSE)
# Evaluating the effectiveness of the model
scores = MRF_model.evaluate(X[test], Y[test], verbose=VERBOSE)
print("%s: %.2f%%" % (MRF_model.metrics_names[1], scores[1]*100))
cvscores.append(scores[1]*100)
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))
else:
if (USE_STRUCTURE == '1'):
MRF_model = Create_Model()
elif (USE_STRUCTURE == '2'):
MRF_model = MultiBranchCNN(TIMESTEPS, YDIM, XDIM, NUM_CLASSES)
MRF_model.compile(loss=loss_function, optimizer=opt, metrics=['accuracy'])
MRF_model.fit(X, Y, epochs=EPOCHS, verbose=VERBOSE)
_, acc = MRF_model.evaluate(X, Y, verbose=VERBOSE)
print("Accuracy: %.2f" % (acc*100))
The data is from the BCICIV 2A dataset, which consists of 25 channels. The 3 EOG channels are ignored, leaving 22 channels. These 22 channels are formatted into a 7x6 - 0 padded array to offer a more spatially relevant representation. We are using the sliding window method to compensate for a small dataset and then also are running a channel-wise average on each trial to
further process the data. The training results are as follows.
Epoch 1/10
666/666 [==============================] - 13s 17ms/step - loss: 4.0290 - accuracy: 0.3236
Epoch 2/10
666/666 [==============================] - 12s 18ms/step - loss: 3.9622 - accuracy: 0.3434
Epoch 3/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9747 - accuracy: 0.3481
Epoch 4/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9373 - accuracy: 0.3720
Epoch 5/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9412 - accuracy: 0.3710
Epoch 6/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9191 - accuracy: 0.3829
Epoch 7/10
666/666 [==============================] - 14s 21ms/step - loss: 3.9234 - accuracy: 0.3936
Epoch 8/10
666/666 [==============================] - 14s 21ms/step - loss: 3.8973 - accuracy: 0.3983
Epoch 9/10
666/666 [==============================] - 14s 21ms/step - loss: 3.8780 - accuracy: 0.4022
Epoch 10/10
666/666 [==============================] - 14s 21ms/step - loss: 3.8647 - accuracy: 0.3900
666/666 [==============================] - 5s 8ms/step - loss: 4.1935 - accuracy: 0.2500
Accuracy: 25.00
Disregarding the poor accuracy, the fact that the accuracy falls to 25.00 after training is concerning. I half feel like I am missing something simple, but have been unable to resolve the issue.
Any advice or questions are welcome, thank you very much!

I can think of two potential reasons for the difference you are observing, but I don't have the time now to test them:
Both optimizers you are using, SGD and Adam, use a subset of the rows for training, not the entire dataset. This leads to the inconsistency you are observing.
BatchNorm works differently during train time and evaluation time.
Both cases point in the same direction: the evaluation of the accuracy and loss during training time is an aggregated estimate of per-batch results and in this scenario are being overly optimistic.
For testing 1, you can try setting a batch_size in the fit to len(X). Note you might run out of memory and it will definitely be slow (it might be prohibitively slow).
For testing 2, you can try removing the BatchNorm steps.
Keep me posted if you work on these lines of thought!

Tensorflow CNN for NLP won't converge

I was trying to create a Neural Network based on the model presented on this paper by Yoon Kim (https://arxiv.org/pdf/1408.5882.pdf) for sentence classification. I've built it in TensorFlow Keras and was using padded sentences (with words lemmatized) as input and 3 categories ("positive", "neutral" or "negative") as output.
Below is the model I've built:
def create_CNN_model(window_sizes, feature_maps, sent_size, num_categs, embedding_matrix:np.array):
inputs = Input(shape=(sent_size), dtype='float32', name='text_inputs') # dim = (BATCH_SIZE, sent_size, embedding_dim)
# initialize the embeddings with my own embeddings matrix
embed = Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1],
mask_zero=True, input_length=sent_size,
weights=[embedding_matrix])(inputs)
#create array for max pooled vectors of features
ta = []
# as we have multiple window sizes:
for n_window in window_sizes:
con = Conv1D(feature_maps, n_window, padding='causal',
activation="relu", use_bias=True)(embed) # (BATCH_SIZE, sent_size-window_size+1, feature_maps)
# the convoluted tensor contains, for each window a feature map of dimension feature_maps
pooled = GlobalMaxPool1D(data_format='channels_last')(con) # (BATCH_SIZE, sent_size-windows_size+1)
# then, the max pooling operation extracts the maximum of each feature map, reducing the rank of the tensor
# the max pooled tensor contains a feature for each window
ta.append(pooled)
concat = concatenate(ta, axis=1)
dropped = Dropout(0.5)(concat)
outputs = Dense(num_categs,activation="softmax",use_bias=True, kernel_regularizer=l2(l=3),
kernel_constraint=Dropout(0.5))(dropped)
# create the model
model = Model(inputs=[inputs], outputs=[outputs])
#return the model
return model
I've tried training this model with just 200 sentences just to see if it overfits the data. But instead of overfitting, the loss value just goes up and down between 0 and 1. I've tried changing the learning rate for a value as small as 1e-8, but it did nothing.
Below is the function I've used for training:
def train_model(X_data, y_data, batch_sz, tf_model, max_patience, num_epochs, ln_rate):
# Instantiate an optimizer to train the model.
# optimizer = Adadelta(learning_rate=1e-3)
optimizer = Adam(learning_rate=ln_rate)
# Instantiate a loss function.
loss_fn = CategoricalCrossentropy()
# Prepare the metrics
train_acc_metric = CategoricalAccuracy()
val_acc_metric = CategoricalAccuracy()
buffer_sz = len(X_data)
patience = 0
epochs = num_epochs
last_val_acc = 0
# Start random state for better reprodutibility
np.random.seed(123)
# Create the checkpoints
ckpt = train.Checkpoint(step=tf.Variable(1), optimizer=optimizer,
model=tf_model)
manager = train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)
# Create directory to save the trained model
path = "./saved_model"
print("\n----------------------------------------------")
if not os.path.isdir(path):
try:
os.mkdir(path)
except OSError:
print ("\nCreation of the directory %s failed \n" % path)
else:
print ("\nSuccessfully created the directory %s \n" % path)
else:
print("\nDirectory %s already exists" % path)
print("\n----------------------------------------------")
print("\nStarting run script...\n",
"Model will be saved to ", path,"\n",
"Checkpoints will be restored from and saved to .\tf_ckpts")
# Save model prior to training
tf_model.save("./saved_model/tf_model")
# Restart from last checkpoint, if available
ckpt.restore(manager.latest_checkpoint)
print("\n----------------------------------------------")
if manager.latest_checkpoint:
print("\nRestored from {}".format(manager.latest_checkpoint))
else:
print("\nInitializing from scratch.")
# beggining training loop
for epoch in range(epochs):
print("\n----------------------------------------------")
print('Start of epoch %d' % (epoch,))
# re-shuffle data before each epoch
np.random.shuffle(X_data)
np.random.shuffle(y_data)
# create the training dataset with 10-fold crossvalidation
train_dataset = make_dataset(X_data,y_data,10)
# Iterate over the batches of the dataset.
for x_train, y_train, x_val, y_val in train_dataset:
train_batches = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_batches = train_batches.batch(batch_sz)
for x_batch_train, y_batch_train in train_batches:
with tf.GradientTape() as tape:
# calculate the forward run
logits = tf_model(x_batch_train)
# assert if output and true label tensor shapes are equal
get_shape = y_batch_train.shape
tf.debugging.assert_shapes([
(logits,get_shape),
], data=(y_batch_train, logits),
summarize=3, message="Inconsistent shape (labels,output): ",
name="assert_shapes")
# calculate loss function
loss_value = loss_fn(y_batch_train, logits)
# add 1 step to the stpes variable
ckpt.step.assign_add(1)
# Add extra losses created during this forward pass:
loss_value += sum(tf_model.losses)
# calculate gradients
grads = tape.gradient(loss_value, tf_model.trainable_weights)
# backpropagate the gradients
optimizer.apply_gradients(zip(grads, tf_model.trainable_weights))
# Update training metric.
train_acc_metric(y_batch_train, logits)
# Save & log every 500 batches.
if int(ckpt.step) % 100 == 0:
save_path = manager.save()
print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
print("loss {:1.2f}".format(loss_value))
print('Seen so far: %s samples' % (int(ckpt.step) * batch_sz))
# Run a cross-validation loop on each 10-fold dataset
val_logits = tf_model(x_val)
# Update val metrics
val_acc_metric(y_val, val_logits)
# Display metrics at the end of each epoch.
train_acc = train_acc_metric.result()
print('Training accuracy: ', float(train_acc))
# Reset training metrics at the end of each epoch
train_acc_metric.reset_states()
print("----------")
val_acc = val_acc_metric.result()
print('Validation accuracy: ', float(val_acc))
print("----------------------------------------------\n")
val_acc_metric.reset_states()
# Early stopping part
if val_acc < last_val_acc:
# If the max_patience is exceeded stop the training
if patience >= max_patience:
print("\n------------------------------------------------")
print("Early stopping training to prevent over-fitting!")
print("------------------------------------------------\n")
break
else:
patience += 1
# update the validation accuracy
last_val_acc = val_acc
# save the trained model
tf_model.save("./saved_model/tf_model")
print("\n------------------------------------------------")
print("\nEnd of Training!\n")
And the result of the training:
----------------------------------------------
Successfully created the directory ./saved_model
----------------------------------------------
Starting run script...
Model will be saved to ./saved_model
Checkpoints will be restored from and saved to . f_ckpts
INFO:tensorflow:Assets written to: ./saved_model/tf_model/assets
----------------------------------------------
Initializing from scratch.
----------------------------------------------
Start of epoch 0
Training accuracy: 0.38999998569488525
----------
Validation accuracy: 0.38999998569488525
----------------------------------------------
----------------------------------------------
Start of epoch 1
Saved checkpoint for step 100: ./tf_ckpts/ckpt-1
loss 1.05
Seen so far: 2000 samples
Training accuracy: 0.4050000011920929
----------
Validation accuracy: 0.4050000011920929
----------------------------------------------
----------------------------------------------
Start of epoch 2
Saved checkpoint for step 200: ./tf_ckpts/ckpt-2
loss 1.10
Seen so far: 4000 samples
Training accuracy: 0.36000001430511475
----------
Validation accuracy: 0.36000001430511475
----------------------------------------------
----------------------------------------------
Start of epoch 3
Saved checkpoint for step 300: ./tf_ckpts/ckpt-3
loss 1.15
Seen so far: 6000 samples
Training accuracy: 0.375
----------
Validation accuracy: 0.375
----------------------------------------------
----------------------------------------------
Start of epoch 4
Saved checkpoint for step 400: ./tf_ckpts/ckpt-4
loss 1.17
Seen so far: 8000 samples
Training accuracy: 0.38999998569488525
----------
Validation accuracy: 0.38999998569488525
----------------------------------------------
----------------------------------------------
Start of epoch 5
Saved checkpoint for step 500: ./tf_ckpts/ckpt-5
loss 1.18
Seen so far: 10000 samples
Training accuracy: 0.3799999952316284
----------
Validation accuracy: 0.3799999952316284
----------------------------------------------
----------------------------------------------
Start of epoch 6
Saved checkpoint for step 600: ./tf_ckpts/ckpt-6
loss 1.09
Seen so far: 12000 samples
Training accuracy: 0.35499998927116394
----------
Validation accuracy: 0.35499998927116394
----------------------------------------------
----------------------------------------------
Start of epoch 7
Saved checkpoint for step 700: ./tf_ckpts/ckpt-7
loss 1.12
Seen so far: 14000 samples
Training accuracy: 0.3700000047683716
----------
Validation accuracy: 0.3700000047683716
----------------------------------------------
Any sugestions on how to make it converge?

mxnet training loss never changes but accuracy oscillates

I am using mxnet to train a VQA model, the input is (6244,) vector and the output is a single label
During my epoch, the loss never change but the accuracy is oscillating in a small range, the first 5 epochs are
Epoch 1. Loss: 2.7262569132562255, Train_acc 0.06867348986554285
Epoch 2. Loss: 2.7262569132562255, Train_acc 0.06955649207304837
Epoch 3. Loss: 2.7262569132562255, Train_acc 0.06853301224162152
Epoch 4. Loss: 2.7262569132562255, Train_acc 0.06799116997792494
Epoch 5. Loss: 2.7262569132562255, Train_acc 0.06887417218543046
This is a multi-class classification problem, with each answer label stands for a class, so I use softmax as final layer and cross-entropy to evaluate the loss, the code of them are as follows
So why the loss never change?... I just directly get if from cross_entropy
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
loss = gluon.loss.SoftmaxCrossEntropyLoss()
epochs = 10
moving_loss = 0.
best_eva = 0
for e in range(epochs):
for i, batch in enumerate(data_train):
data1 = batch.data[0].as_in_context(ctx)
data2 = batch.data[1].as_in_context(ctx)
data = [data1, data2]
label = batch.label[0].as_in_context(ctx)
with autograd.record():
output = net(data)
cross_entropy = loss(output, label)
cross_entropy.backward()
trainer.step(data[0].shape[0])
moving_loss = np.mean(cross_entropy.asnumpy()[0])
train_accuracy = evaluate_accuracy(data_train, net)
print("Epoch %s. Loss: %s, Train_acc %s" % (e, moving_loss, train_accuracy))
The eval function is as follows
def evaluate_accuracy(data_iterator, net, ctx=mx.cpu()):
numerator = 0.
denominator = 0.
metric = mx.metric.Accuracy()
data_iterator.reset()
for i, batch in enumerate(data_iterator):
with autograd.record():
data1 = batch.data[0].as_in_context(ctx)
data2 = batch.data[1].as_in_context(ctx)
data = [data1, data2]
label = batch.label[0].as_in_context(ctx)
output = net(data)
metric.update([label], [output])
return metric.get()[1]

Question asked and answered on the mxnet discussion forum here. No need use the autograd.record scope to record computational graph when computing the accuracy. Try instead:
def evaluate_accuracy(data_iterator, net, ctx=mx.cpu()):
metric = mx.metric.Accuracy()
data_iterator.reset()
for i, batch in enumerate(data_iterator):
data1 = batch.data[0].as_in_context(ctx)
data2 = batch.data[1].as_in_context(ctx)
data = [data1, data2]
label = batch.label[0].as_in_context(ctx)
output = net(data)
metric.update([label], [output])
return metric.get()[1]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Low accuracy binary classification with Pytorch - python

Related

how do I get to plot epoch vs accuracy and epoch vs avg loss

how to set a threshold in pytorch

Accuracy improves through Epochs but returns to initial accuracy for Evaluation

Tensorflow CNN for NLP won't converge

mxnet training loss never changes but accuracy oscillates

Categories

Resources