Making predictions on new images using a CNN in pytorch

Making predictions on new images using a CNN in pytorch - python

I'm new in pytorch, and i have been stuck for a while on this problem. I have trained a CNN for classifying X-ray images. The images can be found in this Kaggle page https://www.kaggle.com/prashant268/chest-xray-covid19-pneumonia/ .
I managed to get good accuracy both on training and test data, but when i try to make predictions on new images i get the same (wrong class) output for every image. Here's my model in detail.
import os
import matplotlib.pyplot as plt
import numpy as np
import torch
import glob
import torch.nn.functional as F
import torch.nn as nn
from torchvision.transforms import transforms
from torch.utils.data import DataLoader
from torch.optim import Adam
from torch.autograd import Variable
import torchvision
import pathlib
from google.colab import drive
drive.mount('/content/drive')
epochs = 20
batch_size = 128
learning_rate = 0.001
#Data Transformation
transformer = transforms.Compose([
transforms.Resize((224,224)),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.5,0.5,0.5], [0.5,0.5,0.5])
])
#Load data with DataLoader
train_path = '/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/Data/train'
test_path = '/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/Data/test'
train_loader = DataLoader(torchvision.datasets.ImageFolder(train_path,transform = transformer), batch_size= batch_size, shuffle= True)
test_loader = DataLoader(torchvision.datasets.ImageFolder(test_path,transform = transformer), batch_size= batch_size, shuffle= False)
root = pathlib.Path(train_path)
classes = sorted([j.name.split('/')[-1] for j in root.iterdir()])
print(classes)
train_count = len(glob.glob(train_path+'/**/*.jpg')) + len(glob.glob(train_path+'/**/*.png')) + len(glob.glob(train_path+'/**/*.jpeg'))
test_count = len(glob.glob(test_path+'/**/*.jpg')) + len(glob.glob(test_path+'/**/*.png')) + len(glob.glob(test_path+'/**/*.jpeg'))
print(train_count,test_count)
#Create the CNN
class CNN(nn.Module):
def __init__(self):
super(CNN,self).__init__()
'''nout = [(width + 2*padding - kernel_size) / stride] + 1 '''
# [128,3,224,224]
self.conv1 = nn.Conv2d(in_channels = 3, out_channels = 12, kernel_size = 5)
# [4,12,220,220]
self.pool1 = nn.MaxPool2d(2,2) #reduces the images by a factor of 2
# [4,12,110,110]
self.conv2 = nn.Conv2d(in_channels = 12, out_channels = 24, kernel_size = 5)
# [4,24,106,106]
self.pool2 = nn.MaxPool2d(2,2)
# [4,24,53,53] which becomes the input of the fully connected layer
self.fc1 = nn.Linear(in_features = (24 * 53 * 53), out_features = 120)
self.fc2 = nn.Linear(in_features = 120, out_features = 84)
self.fc3 = nn.Linear(in_features = 84, out_features = len(classes)) #final layer, output will be the number of classes
def forward(self, x):
x = self.pool1(F.relu(self.conv1(x)))
x = self.pool2(F.relu(self.conv2(x)))
x = x.view(-1, 24 * 53 * 53)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
# Training the model
model = CNN()
loss_function = nn.CrossEntropyLoss() #includes the softmax activation function
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
n_total_steps = len(train_loader)
for epoch in range(epochs):
n_correct = 0
n_samples = 0
for i, (images, labels) in enumerate(train_loader):
# Forward pass
outputs = model(images)
_, predicted = torch.max(outputs, 1)
n_samples += labels.size(0)
n_correct += (predicted == labels).sum().item()
loss = loss_function(outputs, labels)
# Backpropagation and optimization
optimizer.zero_grad() #empty gradients
loss.backward()
optimizer.step()
acc = 100.0 * n_correct / n_samples
print(f'Epoch [{epoch+1}/{epochs}], Step [{i+1}/{n_total_steps}], Accuracy: {round(acc,2)} %, Loss: {loss.item():.4f}')
print('Done!!')
# Testing the model
with torch.no_grad():
n_correct = 0
n_samples = 0
n_class_correct = [0 for i in range(3)]
n_class_samples = [0 for i in range(3)]
for images, labels in test_loader:
outputs = model(images)
# max returns (value ,index)
_, predicted = torch.max(outputs, 1)
n_samples += labels.size(0)
n_correct += (predicted == labels).sum().item()
acc = 100.0 * n_correct / n_samples
print(f'Accuracy of the network: {acc} %')
torch.save(model.state_dict(),'/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/model.model')
For loading the model and trying to make predictions on new images, the code is as follows:
checkpoint = torch.load('/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/model.model')
model = CNN()
model.load_state_dict(checkpoint)
model.eval()
#Data Transformation
transformer = transforms.Compose([
transforms.Resize((224,224)),
transforms.ToTensor(),
transforms.Normalize([0.5,0.5,0.5], [0.5,0.5,0.5])
])
#Making preidctions on new data
from PIL import Image
def prediction(img_path,transformer):
image = Image.open(img_path).convert('RGB')
image_tensor = transformer(image)
image_tensor = image_tensor.unsqueeze_(0) #so img is not treated as a batch
input_img = Variable(image_tensor)
output = model(input_img)
#print(output)
index = output.data.numpy().argmax()
pred = classes[index]
return pred
pred_path = '/content/drive/MyDrive/Chest X-ray (Covid-19 & Pneumonia)/Test_images/Data/'
test_imgs = glob.glob(pred_path+'/*')
for i in test_imgs:
print(prediction(i,transformer))
I'm guessing the problem must be in the way that i am preprocessing the data, although i cannot find my mistake. Any help will be deeply appreciated, since i have been stuck on this for a while now.
p.s. i can share my notebook as well, if it is of any help

Regarding your problem, I have a really good way to debug this to target where the problem most likely will be and so it will be really easy to fix your issue.
So, my debugging process would be based on the fact that your CNN performs well on the test set. Firstly set your test loader batch size to 1 temporarily. After that, One thing to do is in your test loop when you calculate the amount correct, you can run the following code:
#Your code
outputs = model(images) # Really only one image and 1 output.
#Altered Code:
correct = (predicted == labels).sum().item() # This will be either 1 or 0 since you have only one image per batch
# My new code:
if correct:
# if value is 1 instead of 0 then turn value into a single image with no batch size
single_correct_image = images.squeeze(0)
# Then convert tensor image into PIL image
pil_image = transforms.ToPILImage()(single_correct_image)
# Save the pil image to any directory specified in quotes.
pil_image = pil_image.save("/content")
#Terminate testing process. Ignore Value Error if it says terminating process
raise ValueError("terminating process")
Now you have an image saved to disk that you know is correct in the test set. The next step would be to open such image and run it to your predict function. Couple of things can happen and thus give info about your situation
If your model returns the wrong answer then there is something wrong with the different code you have within the prediction and testing code. One uses a torch.sum and torch.max the other uses np.argmax.Then you can use print statements to debug what is going on there. Perhaps some conversion error or your expectation of the output's format is different.
If your code return the right answer then your model is just failing to predict on new images. I suggest running more trial cases with the above process.
For additional reference, if you still get very stuck to the point where you feel like you can't solve it, then I suggest using this notebook to guide and give some suggestions on what code to atleast inspect.
https://www.kaggle.com/salvation23/xray-cnn-pytorch
Sarthak Jain

Related

Loss & accuracy don't improve in Xception (image classification)

As a trial, I'm implementing Xception to classify images without using pretrained weight in Tensorflow.
However, the accuracy are too low compared to the original paper.
Could somebody share any advice to address this problem?
I prepared 500 out of 1000 classes from ImageNet and train ready-Xception model with this data from scrach .
I tried the same learning rate and optimizer as used in the original paper.
– Optimizer: SGD
– Momentum: 0.9
– Initial learning rate: 0.045
– Learning rate decay: decay of rate 0.94 every 2 epochs
However, this did not work so well.
I know it is better to use all of 1000 classes rather than only 500, however, I couldn't prepare storage for it.
Did it affect the performance of my code?
Here is my code.
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import layers, losses, models, optimizers, callbacks, applications, preprocessing
# scheduler
def scheduler(epoch, lr):
return 0.045*0.94**(epoch/2.0)
lr_decay = callbacks.LearningRateScheduler(scheduler)
# early stopping
EarlyStopping = callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=500, verbose=0, mode='auto', restore_best_weights=True)
# build xception
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.cast(inputs, tf.float32)
x = tf.keras.applications.xception.preprocess_input(x) #preprocess image
x = applications.xception.Xception(weights=None, include_top=False,)(x, training=True)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(nb_class)(x)
outputs = layers.Softmax()(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=optimizers.SGD(momentum=0.9, nesterov=True),
loss = 'categorical_crossentropy',
metrics= ['accuracy'])
# fitting data
history = model.fit(image_gen(df_train_chunk, 224, 224, ), #feed images with a generator
batch_size = 32,
steps_per_epoch = 64,
epochs=1000000000,
validation_data = image_gen(df_valid_chunk, 224, 224, ), #feed images with a generator
validation_steps = 64,
callbacks = [lr_decay, EarlyStopping],
)
My results are below. In the original paper, its accuracy reached around 0.8.
In contrast, the performance of my code is too poor.
P.S.
Some might wonder if my generator got wrong, so I put my generator code and result below.
from PIL import Image, ImageEnhance, ImageOps
def image_gen(df_data, h, w, shuffle=True):
nb_class = len(np.unique(df_data['Class']))
while True:
if shuffle:
df_data = df_data.sample(frac=1)
for i in range(len(df_data)):
X = Image.open((df_data.iloc[i]).loc['Path'])
X = X.convert('RGB')
X = X.resize((w,h))
X = preprocessing.image.img_to_array(X)
X = np.expand_dims(X, axis=0)
klass = (df_data.iloc[i]).loc['Class']
y = np.zeros(nb_class)
y[klass] = 1
y = np.expand_dims(y, axis=0)
yield X, y
train_gen = image_gen(df_train_chunk, 224, 224, )
for i in range(5):
X, y = next(train_gen)
print('\n\n class: ', y.argmax(-1))
display(Image.fromarray(X.squeeze(0).astype(np.uint8)))
the result is below.

When you chose only 500 labels, do you choose the first 500?
softmax output starting from 0, so make sure your labels staring from 0 to 499 either.

RuntimeError: mat1 dim 1 must match mat2 dim 0

I am still grappling with PyTorch, having played with Keras for a while (which feels a lot more intuitive).
Anyway - I have the nn.linear model code below, which works fine for just one input feature, where:
inputDim = 1
I am now trying to expand the same code to include 2 features, and so I have included another column in my feature dataframe and also set:
inputDim = 2
However, when I run the code, I get the dreaded error:
RuntimeError: mat1 dim 1 must match mat2 dim 0
This error references line 63, which is:
outputs = model(inputs)
I have gone through several other posts here relating to this dimensionality error, but I still can't see what is wrong with my code. Any help would be appreciated.
The full code looks like this:
import numpy as np
import pandas as pd
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt
device = 'cuda' if torch.cuda.is_available() else 'cpu'
df = pd.read_csv('Adjusted Close - BAC-UBS-WFC.csv')
x = df[['BAC', 'UBS']]
y = df['WFC']
# number_of_features = x.shape[1]
# print(number_of_features)
x_train = np.array(x, dtype=np.float32)
x_train = x_train.reshape(-1, 1)
y_train = np.array(y, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
class linearRegression(torch.nn.Module):
def __init__(self, inputSize, outputSize):
super(linearRegression, self).__init__()
self.linear = torch.nn.Linear(inputSize, outputSize)
def forward(self, x):
out = self.linear(x)
return out
inputDim = 2
outputDim = 1
learningRate = 0.01
epochs = 500
# Model instantiation
torch.manual_seed(42)
model = linearRegression(inputDim, outputDim)
if torch.cuda.is_available(): model.cuda()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
# Model training
loss_series = []
for epoch in range(epochs):
# Converting inputs and labels to Variable
inputs = Variable(torch.from_numpy(x_train).cuda())
labels = Variable(torch.from_numpy(y_train).cuda())
# Clear gradient buffers because we don't want any gradient from previous epoch to carry forward, dont want to cummulate gradients
optimizer.zero_grad()
# get output from the model, given the inputs
outputs = model(inputs)
# get loss for the predicted output
loss = criterion(outputs, labels)
loss_series.append(loss.item())
print(loss)
# get gradients w.r.t to parameters
loss.backward()
# update parameters
optimizer.step()
print('epoch {}, loss {}'.format(epoch, loss.item()))
# Calculate predictions on training data
with torch.no_grad(): # we don't need gradients in the testing phase
predicted = model(Variable(torch.from_numpy(x_train).cuda())).cpu().data.numpy()

General advice: For errors with dimension, it usually helps to print out dimensions at each step of the computation.
Most likely in this specific case, you have made mistake in reshaping the input with this x_train = x_train.reshape(-1, 1)
Your input is (N,1) but NN expects (N,2).

I want to use the GPU instead of CPU while performing computations using PyTorch

I'm trying to switch the stress from CPU to GPU as my trusty RTX2070 can do it better than the CPU but I keep running into this problem and I'm quite new to AI so if you are kind enough to share some insights with me regarding any potential solution, it would be highly appreciated, thank you.
**I'm using PyTorch
Here's the code that I'm using :
# to measure run-time
# for csv dataset
import os
# to shuffle data
import random
# to get the alphabet
import string
# import statements for iterating over csv file
import cv2
# for plotting
import matplotlib.pyplot as plt
import numpy as np
# pytorch stuff
import torch
import torch.nn as nn
from PIL import Image
# generate the targets
# the targets are one hot encoding vectors
# print(torch.cuda.is_available())
nvcc_args = [
'-gencode', 'arch=compute_30,code=sm_30',
'-gencode', 'arch=compute_35,code=sm_35',
'-gencode', 'arch=compute_37,code=sm_37',
'-gencode', 'arch=compute_50,code=sm_50',
'-gencode', 'arch=compute_52,code=sm_52',
'-gencode', 'arch=compute_60,code=sm_60',
'-gencode', 'arch=compute_61,code=sm_61',
'-gencode', 'arch=compute_70,code=sm_70',
'-gencode', 'arch=compute_75,code=sm_75'
]
alphabet = list(string.ascii_lowercase)
target = {}
# Initalize a target dict that has letters as its keys and empty one-hot encoding vectors of size 37 as its values
for letter in alphabet:
target[letter] = [0] * 37
# Do the one-hot encoding for each letter now
curr_pos = 0
for curr_letter in target.keys():
target[curr_letter][curr_pos] = 1
curr_pos += 1
# extra symbols
symbols = ["space", "number", "period", "comma", "colon", "apostrophe", "hyphen", "semicolon", "question",
"exclamation", "capitalize"]
# create vectors
for curr_symbol in symbols:
target[curr_symbol] = [0] * 37
# create one-hot encoding vectors
for curr_symbol in symbols:
target[curr_symbol][curr_pos] = 1
curr_pos += 1
# collect all data from the csv file
data = []
for tgt in os.listdir("dataset"):
if not tgt == ".DS_Store":
for folder in os.listdir("dataset/" + tgt + "/Uploaded"):
if not folder == ".DS_Store":
for filename in os.listdir("dataset/" + tgt + "/Uploaded/" + folder):
if not filename == ".DS_Store":
# store the image and label
picture = []
curr_target = target[tgt]
image = Image.open("dataset/" + tgt + "/Uploaded/" + folder + "/" + filename)
image = image.convert('RGB')
# f.show()
image = np.array(image)
# resize image to 28x28x3
image = cv2.resize(image, (28, 28))
# normalize to 0-1
image = image.astype(np.float32) / 255.0
image = torch.from_numpy(image)
picture.append(image)
# convert the target to a long tensor
curr_target = torch.Tensor([curr_target])
picture.append(curr_target)
# append the current image & target
data.append(picture)
# create a dictionary of all the characters
characters = alphabet + symbols
index2char = {}
number = 0
for char in characters:
index2char[number] = char
number += 1
# find the number of each character in a dataset
def num_chars(dataset, index2char):
chars = {}
for _, label in dataset:
char = index2char[int(torch.argmax(label))]
# update
if char in chars:
chars[char] += 1
# initialize
else:
chars[char] = 1
return chars
# Create dataloader objects
# shuffle all the data
random.shuffle(data)
# batch sizes for train, test, and validation
batch_size_train = 30
batch_size_test = 30
batch_size_validation = 30
# splitting data to get training, test, and validation sets
# change once get more data
# 1600 for train
train_dataset = data[:22000]
# test has 212
test_dataset = data[22000:24400]
# validation has 212
validation_dataset = data[24400:]
# create the dataloader objects
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size_train, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size_test, shuffle=False)
validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset, batch_size=batch_size_validation,
shuffle=True)
# to check if a dataset is missing a char
test_chars = num_chars(test_dataset, index2char)
num = 0
for char in characters:
if char in test_chars:
num += 1
else:
break
print(num)
class CNN(nn.Module):
def __init__(self):
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
super(CNN, self).__init__()
self.block1 = nn.Sequential(
# 3x28x28
nn.Conv2d(in_channels=3,
out_channels=16,
kernel_size=5,
stride=1,
padding=2),
# batch normalization
# nn.BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True),
# 16x28x28
nn.MaxPool2d(kernel_size=2),
# 16x14x14
nn.LeakyReLU()
)
# 16x14x14
self.block2 = nn.Sequential(
nn.Conv2d(in_channels=16,
out_channels=32,
kernel_size=5,
stride=1,
padding=2),
# batch normalization
# nn.BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True),
# 32x14x14
nn.MaxPool2d(kernel_size=2),
# 32x7x7
nn.LeakyReLU()
)
# linearly
self.block3 = nn.Sequential(
nn.Linear(32 * 7 * 7, 100),
# batch normalization
# nn.BatchNorm1d(100),
nn.LeakyReLU(),
nn.Linear(100, 37)
)
# 1x37
def forward(self, x):
out = self.block1(x)
out = self.block2(out)
# flatten the dataset
out = out.view(-1, 32 * 7 * 7)
out = self.block3(out)
return out
# convolutional neural network model
model = CNN()
model.cuda()
# print summary of the neural network model to check if everything is fine.
print(model)
print("# parameter: ", sum([param.nelement() for param in model.parameters()]))
# setting the learning rate
learning_rate = 1e-4
# Using a variable to store the cross entropy method
criterion = nn.CrossEntropyLoss()
# Using a variable to store the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# list of all train_losses
train_losses = []
# list of all validation losses
validation_losses = []
# for loop that iterates over all the epochs
num_epochs = 20
for epoch in range(num_epochs):
# variables to store/keep track of the loss and number of iterations
train_loss = 0
num_iter_train = 0
# train the model
model.train()
# Iterate over train_loader
for i, (images, labels) in enumerate(train_loader):
# need to permute so that the images are of size 3x28x28
# essential to be able to feed images into the model
images = images.permute(0, 3, 1, 2)
# Zero the gradient buffer
# resets the gradient after each epoch so that the gradients don't add up
optimizer.zero_grad()
# Forward, get output
outputs = model(images)
# convert the labels from one hot encoding vectors into integer values
labels = labels.view(-1, 37)
y_true = torch.argmax(labels, 1)
# calculate training loss
loss = criterion(outputs, y_true)
# Backward (computes all the gradients)
loss.backward()
# Optimize
# loops through all parameters and updates weights by using the gradients
# takes steps backwards to optimize (to reach the minimum weight)
optimizer.step()
# update the training loss and number of iterations
train_loss += loss.data
num_iter_train += 1
print('Epoch: {}'.format(epoch + 1))
print('Training Loss: {:.4f}'.format(train_loss / num_iter_train))
# append training loss over all the epochs
train_losses.append(train_loss / num_iter_train)
# evaluate the model
model.eval()
# variables to store/keep track of the loss and number of iterations
validation_loss = 0
num_iter_validation = 0
# Iterate over validation_loader
for i, (images, labels) in enumerate(validation_loader):
# need to permute so that the images are of size 3x28x28
# essential to be able to feed images into the model
images = images.permute(0, 3, 1, 2)
# Forward, get output
outputs = model(images)
# convert the labels from one hot encoding vectors to integer values
labels = labels.view(-1, 37)
y_true = torch.argmax(labels, 1)
# calculate the validation loss
loss = criterion(outputs, y_true)
# update the training loss and number of iterations
validation_loss += loss.data
num_iter_validation += 1
print('Validation Loss: {:.4f}'.format(validation_loss / num_iter_validation))
# append all validation_losses over all the epochs
validation_losses.append(validation_loss / num_iter_validation)
num_iter_test = 0
correct = 0
# Iterate over test_loader
for images, labels in test_loader:
# need to permute so that the images are of size 3x28x28
# essential to be able to feed images into the model
images = images.permute(0, 3, 1, 2)
# Forward
outputs = model(images)
# convert the labels from one hot encoding vectors into integer values
labels = labels.view(-1, 37)
y_true = torch.argmax(labels, 1)
# find the index of the prediction
y_pred = torch.argmax(outputs, 1).type('torch.FloatTensor')
# convert to FloatTensor
y_true = y_true.type('torch.FloatTensor')
# find the mean difference of the comparisons
correct += torch.sum(torch.eq(y_true, y_pred).type('torch.FloatTensor'))
print('Accuracy on the test set: {:.4f}%'.format(correct / len(test_dataset) * 100))
print()
# learning curve function
def plot_learning_curve(train_losses, validation_losses):
# plot the training and validation losses
plt.ylabel('Loss')
plt.xlabel('Number of Epochs')
plt.plot(train_losses, label="training")
plt.plot(validation_losses, label="validation")
plt.legend(loc=1)
# plot the learning curve
plt.title("Learning Curve (Loss vs Number of Epochs)")
plot_learning_curve(train_losses, validation_losses)
torch.save(model.state_dict(), "model1.pth")

I'm also using a trusty RTX 2070 and this is how I do GPU acceleration (for 1 GPU):
cuda_ = "cuda:0"
device = torch.device(cuda_ if torch.cuda.is_available() else "cpu")
model = CNN()
model.to(device)
This is the most up-to-date and recommended way to do GPU acceleration, as it gives more flexibility (don't need to amend code even when GPU isn't available). You would do the same to pass your images into the GPU vram, via images = images.to(device).

How to use smac for hyper-parameter optimization of Convolution Neural Network?

Note: Long Post. Please bear with me
I have implemented a convolution neural network in PyTorch on KMNIST dataset. I need to use SMAC to optimize the learning rate and the momentum of Stochastic Gradient Descent of the CNN. I am new in hyperparameter optimization and what I learnt from the smac documentation is,
SMAC evaluates the algorithm to be optimized by invoking it through a Target Algorithm Evaluator (TAE).
We need a Scenario-object to configure the optimization process.
run_obj parameter in Scenario object specifies what SMAC is supposed to optimize.
My Ultimate goal is to get a good accuracy or low loss
This is what I have done so far:
Convolution Neural Network
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.autograd import Variable
from datasets import *
import torch.utils.data
import torch.nn.functional as F
import matplotlib.pyplot as plt
# Create the model class
class CNN(nn.Module):
def __init__(self):
super(CNN, self).__init__() # to inherent the features of nn.Module
self.cnn1 = nn.Conv2d(in_channels = 1, out_channels = 8, kernel_size = 3, stride = 1, padding =1)
# in_channels =1 because of grey scale image
# kernel_size = feature_size
# padding = 1 because for same padding = [(filter_size -1)/2]
# the output size of the 8 feature maps is [(input_size - filter_size +2(padding)/stride)+1]
#Batch Normalization
self.batchnorm1 = nn.BatchNorm2d(8)
# RELU
self.relu = nn.ReLU()
self.maxpool1 = nn.MaxPool2d(kernel_size =2)
# After maxpooling, the output of each feature map is 28/2 =14
self.cnn2 = nn.Conv2d(in_channels = 8, out_channels = 32, kernel_size = 5, stride = 1, padding =2)
#Batch Normalization
self.batchnorm2 = nn.BatchNorm2d(32)
# RELU
#self.relu = nn.ReLU()
self.maxpool2 = nn.MaxPool2d(kernel_size =2)
# After maxpooling , the output of each feature map is 14/2 =7of them is of size 7x7 --> 32*7*7=1568
# Flatten the feature maps. You have 32 feature maps, each
self.fc1 = nn.Linear(in_features=1568, out_features = 600)
self.dropout = nn.Dropout(p=0.5)
self.fc2 = nn.Linear(in_features=600, out_features = 10)
def forward(self,x):
out = self.cnn1(x)
#out = F.relu(self.cnn1(x))
out = self.batchnorm1(out)
out = self.relu(out)
out = self.maxpool1(out)
out = self.cnn2(out)
out = self.batchnorm2(out)
out = self.relu(out)
out = self.maxpool2(out)
#Now we have to flatten the output. This is where we apply the feed forward neural network as learned
#before!
#It will the take the shape (batch_size, 1568) = (100, 1568)
out = out.view(-1, 1568)
#Then we forward through our fully connected layer
out = self.fc1(out)
out = self.relu(out)
out = self.dropout(out)
out = self.fc2(out)
return out
def train(model, train_loader, optimizer, epoch, CUDA, loss_fn):
model.train()
cum_loss=0
iter_count = 0
for i, (images, labels) in enumerate(train_load):
if CUDA:
images = Variable(images.cuda())
images = images.unsqueeze(1)
images = images.type(torch.FloatTensor)
images = images.cuda()
labels = Variable(labels.cuda())
labels = labels.type(torch.LongTensor)
labels = labels.cuda()
else:
images = Variable(images)
images = images.unsqueeze(1)
images = images.type(torch.DoubleTensor)
labels = Variable(labels)
labels = labels.type(torch.DoubleTensor)
optimizer.zero_grad()
outputs = model(images)
loss = loss_fn(outputs, labels)
loss.backward()
optimizer.step()
cum_loss += loss
if (i+1) % batch_size == 0:
correct = 0
total = 0
acc = 0
_, predicted = torch.max(outputs.data,1)
total += labels.size(0)
if CUDA:
correct += (predicted.cpu()==labels.cpu()).sum()
else:
correct += (predicted==labels).sum()
accuracy = 100*correct/total
if i % len(train_load) == 0:
iter_count += 1
ave_loss = cum_loss/batch_size
return ave_loss
batch_size = 100
epochs = 5
e = range(epochs)
#print(e)
#Load datasets
variable_name=KMNIST()
train_images = variable_name.images
train_images = torch.from_numpy(train_images)
#print(train_images.shape)
#print(type(train_images))
train_labels = variable_name.labels
train_labels = torch.from_numpy(train_labels)
#print(train_labels.shape)
#print(type(train_labels))
train_dataset = torch.utils.data.TensorDataset(train_images, train_labels)
# Make the dataset iterable
train_load = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
print('There are {} images in the training set' .format(len(train_dataset)))
print('There are {} images in the loaded training set' .format(len(train_load)))
def net(learning_rate, Momentum):
model = CNN()
CUDA = torch.cuda.is_available()
if CUDA:
model = model.cuda()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate,momentum = Momentum, nesterov= True)
iteration = 0
total_loss=[]
for epoch in range(epochs):
ave_loss = train(model, train_load, optimizer, epoch, CUDA, loss_fn)
total_loss.append(ave_loss)
return optimizer, loss_fn, model, total_loss
optimizer, loss_fn, model, total_loss = net(learning_rate= 0.01, Momentum = 0.09)
# Print model's state_dict
print("---------------")
print("Model's state_dict:")
for param_tensor in model.state_dict():
print(param_tensor, "\t", model.state_dict()[param_tensor].size())
print("---------------")
#print("Optimizer's state_dict:")
#for var_name in optimizer.state_dict():
# print(var_name, "\t", optimizer.state_dict()[var_name])
torch.save(model.state_dict(), "kmnist_cnn.pt")
plt.plot(e, (np.array(total_loss)))
plt.xlabel("# Epoch")
plt.ylabel("Loss")
plt.show()
print('Done!')
smac hyperparameter optimization:
from smac.configspace import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
UniformFloatHyperparameter, UniformIntegerHyperparameter
from smac.configspace.util import convert_configurations_to_array
#from ConfigSpace.conditions import InCondition
# Import SMAC-utilities
from smac.tae.execute_func import ExecuteTAFuncDict
from smac.scenario.scenario import Scenario
from smac.facade.smac_facade import SMAC
# Build Configuration Space which defines all parameters and their ranges
cs = ConfigurationSpace()
# We define a few possible types of SVM-kernels and add them as "kernel" to our cs
lr = UniformFloatHyperparameter('learning_rate', 1e-4, 1e-1, default_value='1e-2')
momentum = UniformFloatHyperparameter('Momentum', 0.01, 0.1, default_value='0.09')
cs.add_hyperparameters([lr, momentum])
def kmnist_from_cfg(cfg):
cfg = {k : cfg[k] for k in cfg if cfg[k]}
print('Config is', cfg)
#optimizer, loss_fn, model, total_loss = net(**cfg)
#optimizer, loss_fn, model, total_loss = net(learning_rate= cfg["learning_rate"], Momentum= cfg["Momentum"])
optimizer, loss_fn, model, total_loss = net(learning_rate= 0.02, Momentum= 0.05)
return optimizer, loss_fn, model, total_loss
# Scenario object
scenario = Scenario({"run_obj": "quality", # we optimize quality (alternatively runtime)
"runcount-limit": 200, # maximum function evaluations
"cs": cs, # configuration space
"deterministic": "true"
})
#def_value = kmnist_from_cfg(cs.get_default_configuration())
#print("Default Value: %.2f" % (def_value))
# Optimize, using a SMAC-object
print("Optimizing! Depending on your machine, this might take a few minutes.")
smac = SMAC(scenario=scenario,tae_runner=kmnist_from_cfg) #rng=np.random.RandomState(42)
smac.solver.intensifier.tae_runner.use_pynisher = False
print("SMAC", smac)
incumbent = smac.optimize()
inc_value = kmnist_from_cfg(incumbent)
print("Optimized Value: %.2f" % (inc_value))
When I give loss as the run_obj parameter, I get the error message
ArgumentError: argument --run-obj/--run_obj: invalid choice: 'total_loss' (choose from 'runtime', 'quality')
To be honest, I do not know what does "quality" means. Anyways, when I give quality as the run_obj parameter, I get the error message
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
If I understood it correctly, the above error message is obtained when an int is expected but str is given. To check whether the problem was with configuration space, I tried
optimizer, loss_fn, model, total_loss = net(learning_rate= 0.02, Momentum= 0.05)
instead of these:
optimizer, loss_fn, model, total_loss = net(**cfg)
optimizer, loss_fn, model, total_loss = net(learning_rate= cfg["learning_rate"], Momentum= cfg["Momentum"])
the error remains the same.
Any ideas on how to use smac to optimize hyperparameters of CNN and why do I get this error message? I tried looking for similar problems online. This post was a little helpful. Unfortunately, since there is no implementation of smac on NN (at least I did not find it), I cannot figure out the solution. I ran out of all ideas.
Any help, ideas or useful link is appreciated.
Thank you!

I believe the tae_runner (kmnist_from_cfg in your case) has to be a callable that takes a configuration space point, which you correctly provide, and outputs a single number. You output a tuple of things. Perhaps only return the total_loss on the validation set? I am basing this on the svm example in the smac github at https://github.com/automl/SMAC3/blob/master/examples/svm.py.

CTC loss goes down and stops

I’m trying to train a captcha recognition model. Model details are resnet pretrained CNN layers + Bidirectional LSTM + Fully Connected. It reached 90% sequence accuracy on captcha generated by python library captcha. The problem is that these generated captcha seems to have similary location of each character. When I randomly add spaces between characters, the model does not work any more. So I wonder is LSTM learning segmentation during learning? Then I try to use CTC loss. At first, loss goes down pretty quick. But it stays at about 16 without significant drop later. I tried different layers of LSTM, different number of units. 2 Layers of LSTM reach lower loss, but still not converging. 3 layers are just like 2 layers. The loss curve:
#encoding:utf8
import os
import sys
import torch
import warpctc_pytorch
import traceback
import torchvision
from torch import nn, autograd, FloatTensor, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torch.optim.lr_scheduler import MultiStepLR
from tensorboard import SummaryWriter
from pprint import pprint
from net.utils import decoder
from logging import getLogger, StreamHandler
logger = getLogger(__name__)
handler = StreamHandler(sys.stdout)
logger.addHandler(handler)
from dataset_util.utils import id_to_character
from dataset_util.transform import rescale, normalizer
from config.config import MAX_CAPTCHA_LENGTH, TENSORBOARD_LOG_PATH, MODEL_PATH
class CNN_RNN(nn.Module):
def __init__(self, lstm_bidirectional=True, use_ctc=True, *args, **kwargs):
super(CNN_RNN, self).__init__(*args, **kwargs)
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
param.requires_grad = False
modules = list(model_conv.children())[:-1] # delete the last fc layer.
for param in modules[8].parameters():
param.requires_grad = True
self.resnet = nn.Sequential(*modules) # CNN with fixed parameters from resnet as feature extractor
self.lstm_input_size = 512 * 2 * 2
self.lstm_hidden_state_size = 512
self.lstm_num_layers = 2
self.chracter_space_length = 64
self._lstm_bidirectional = lstm_bidirectional
self._use_ctc = use_ctc
if use_ctc:
self._max_captcha_length = int(MAX_CAPTCHA_LENGTH * 2)
else:
self._max_captcha_length = MAX_CAPTCHA_LENGTH
if lstm_bidirectional:
self.lstm_hidden_state_size = self.lstm_hidden_state_size * 2 # so that hidden size for one direction in bidirection lstm is the same as vanilla lstm
self.lstm = self.lstm = nn.LSTM(self.lstm_input_size, self.lstm_hidden_state_size // 2, dropout=0.5, bidirectional=True, num_layers=self.lstm_num_layers)
else:
self.lstm = nn.LSTM(self.lstm_input_size, self.lstm_hidden_state_size, dropout=0.5, bidirectional=False, num_layers=self.lstm_num_layers) # dropout doen't work for one layer lstm
self.ouput_to_tag = nn.Linear(self.lstm_hidden_state_size, self.chracter_space_length)
self.tensorboard_writer = SummaryWriter(TENSORBOARD_LOG_PATH)
# self.dropout_lstm = nn.Dropout()
def init_hidden_status(self, batch_size):
if self._lstm_bidirectional:
self.hidden = (autograd.Variable(torch.zeros((self.lstm_num_layers * 2, batch_size, self.lstm_hidden_state_size // 2))),
autograd.Variable(torch.zeros((self.lstm_num_layers * 2, batch_size, self.lstm_hidden_state_size // 2)))) # number of layers, batch size, hidden dimention
else:
self.hidden = (autograd.Variable(torch.zeros((self.lstm_num_layers, batch_size, self.lstm_hidden_state_size))),
autograd.Variable(torch.zeros((self.lstm_num_layers, batch_size, self.lstm_hidden_state_size)))) # number of layers, batch size, hidden dimention
def forward(self, image):
'''
:param image: # batch_size, CHANNEL, HEIGHT, WIDTH
:return:
'''
features = self.resnet(image) # [batch_size, 512, 2, 2]
batch_size = image.shape[0]
features = [features.view(batch_size, -1) for i in range(self._max_captcha_length)]
features = torch.stack(features)
self.init_hidden_status(batch_size)
output, hidden = self.lstm(features, self.hidden)
# output = self.dropout_lstm(output)
tag_space = self.ouput_to_tag(output.view(-1, output.size(2))) # [MAX_CAPTCHA_LENGTH * BATCH_SIZE, CHARACTER_SPACE_LENGTH]
tag_space = tag_space.view(self._max_captcha_length, batch_size, -1)
if not self._use_ctc:
tag_score = F.log_softmax(tag_space, dim=2) # [MAX_CAPTCHA_LENGTH, BATCH_SIZE, CHARACTER_SPACE_LENGTH]
else:
tag_score = tag_space
return tag_score
def train_net(self, data_loader, eval_data_loader=None, learning_rate=0.008, epoch_num=400):
try:
if self._use_ctc:
loss_function = warpctc_pytorch.warp_ctc.CTCLoss()
else:
loss_function = nn.NLLLoss()
# optimizer = optim.SGD(filter(lambda p: p.requires_grad, self.parameters()), momentum=0.9, lr=learning_rate)
# optimizer = MultiStepLR(optimizer, milestones=[10,15], gamma=0.5)
# optimizer = optim.Adadelta(filter(lambda p: p.requires_grad, self.parameters()))
optimizer = optim.Adam(filter(lambda p: p.requires_grad, self.parameters()))
self.tensorboard_writer.add_scalar("learning_rate", learning_rate)
tensorbard_global_step=0
if os.path.exists(os.path.join(TENSORBOARD_LOG_PATH, "resume_step")):
with open(os.path.join(TENSORBOARD_LOG_PATH, "resume_step"), "r") as file_handler:
tensorbard_global_step = int(file_handler.read()) + 1
for epoch_index, epoch in enumerate(range(epoch_num)):
for index, sample in enumerate(data_loader):
optimizer.zero_grad()
input_image = autograd.Variable(sample["image"]) # batch_size, 3, 255, 255
tag_score = self.forward(input_image)
if self._use_ctc:
tag_score, target, tag_score_sizes, target_sizes = self._loss_preprocess_ctc(tag_score, sample)
loss = loss_function(tag_score, target, tag_score_sizes, target_sizes)
loss = loss / tag_score.size(1)
else:
target = sample["padded_label_idx"]
tag_score, target = self._loss_preprocess(tag_score, target)
loss = loss_function(tag_score, target)
print("Training loss: {}".format(float(loss)))
self.tensorboard_writer.add_scalar("training_loss", float(loss), tensorbard_global_step)
loss.backward()
optimizer.step()
if index % 250 == 0:
print(u"Processing batch: {} of {}, epoch: {}".format(index, len(data_loader), epoch_index))
self.evaluate(eval_data_loader, loss_function, tensorbard_global_step)
tensorbard_global_step += 1
self.save_model(MODEL_PATH + "_epoch_{}".format(epoch_index))
except KeyboardInterrupt:
print("Exit for KeyboardInterrupt, save model")
self.save_model(MODEL_PATH)
with open(os.path.join(TENSORBOARD_LOG_PATH, "resume_step"), "w") as file_handler:
file_handler.write(str(tensorbard_global_step))
except Exception as excp:
logger.error(str(excp))
logger.error(traceback.format_exc())
def predict(self, image):
# TODO ctc version
'''
:param image: [batch_size, channel, height, width]
:return:
'''
tag_score = self.forward(image)
# TODO ctc
# if self._use_ctc:
# tag_score = F.softmax(tag_score, dim=-1)
# decoder.decode(tag_score)
confidence_log_probability, indexes = tag_score.max(2)
predicted_labels = []
for batch_index in range(indexes.size(1)):
label = ""
for character_index in range(self._max_captcha_length):
if int(indexes[character_index, batch_index]) != 1:
label += id_to_character[int(indexes[character_index, batch_index])]
predicted_labels.append(label)
return predicted_labels, tag_score
def predict_pil_image(self, pil_image):
try:
self.eval()
processed_image = normalizer(rescale({"image": pil_image}))["image"].view(1, 3, 255, 255)
result, tag_score = self.predict(processed_image)
self.train()
except Exception as excp:
logger.error(str(excp))
logger.error(traceback.format_exc())
return [""], None
return result, tag_score
def evaluate(self, eval_dataloader, loss_function, step=0):
total = 0
sequence_correct = 0
character_correct = 0
character_total = 0
loss_total = 0
batch_size = eval_data_loader.batch_size
true_predicted = {}
self.eval()
for sample in eval_dataloader:
total += batch_size
input_images = sample["image"]
predicted_labels, tag_score = self.predict(input_images)
for predicted, true_label in zip(predicted_labels, sample["label"]):
if predicted == true_label: # dataloader is making label a list, use batch_size=1
sequence_correct += 1
for index, true_character in enumerate(true_label):
character_total += 1
if index < len(predicted) and predicted[index] == true_character:
character_correct += 1
true_predicted[true_label] = predicted
if self._use_ctc:
tag_score, target, tag_score_sizes, target_sizes = self._loss_preprocess_ctc(tag_score, sample)
loss_total += float(loss_function(tag_score, target, tag_score_sizes, target_sizes) / batch_size)
else:
tag_score, target = self._loss_preprocess(tag_score, sample["padded_label_idx"])
loss_total += float(loss_function(tag_score, target)) # averaged over batch index
print("True captcha to predicted captcha: ")
pprint(true_predicted)
self.tensorboard_writer.add_text("eval_ture_to_predicted", str(true_predicted), global_step=step)
accuracy = float(sequence_correct) / total
avg_loss = float(loss_total) / (total / batch_size)
character_accuracy = float(character_correct) / character_total
self.tensorboard_writer.add_scalar("eval_sequence_accuracy", accuracy, global_step=step)
self.tensorboard_writer.add_scalar("eval_character_accuracy", character_accuracy, global_step=step)
self.tensorboard_writer.add_scalar("eval_loss", avg_loss, global_step=step)
self.zero_grad()
self.train()
def _loss_preprocess(self, tag_score, target):
'''
:param tag_score: value return by self.forward
:param target: sample["padded_label_idx"]
:return: (processed_tag_score, processed_target) ready for NLLoss function
'''
target = target.transpose(0, 1)
target = target.contiguous()
target = target.view(target.size(0) * target.size(1))
tag_score = tag_score.view(-1, self.chracter_space_length)
return tag_score, target
def _loss_preprocess_ctc(self, tag_score, sample):
target_2d = [
[int(ele) for ele in sample["padded_label_idx"][row, :] if int(ele) != 0 and int(ele) != 1]
for row in range(sample["padded_label_idx"].size(0))]
target = []
for ele in target_2d:
target.extend(ele)
target = autograd.Variable(torch.IntTensor(target))
# tag_score = F.softmax(F.sigmoid(tag_score), dim=-1)
tag_score_sizes = autograd.Variable(torch.IntTensor([self._max_captcha_length] * tag_score.size(1)))
target_sizes = autograd.Variable(sample["captcha_length"].int())
return tag_score, target, tag_score_sizes, target_sizes
# def visualize_graph(self, dataset):
# '''Since pytorch use dynamic graph, an input is required to visualize graph in tensorboard'''
# # warning: Do not run this, the graph is too large to visualize...
# sample = dataset[0]
# input_image = autograd.Variable(sample["image"].view(1, 3, 255, 255))
# tag_score = self.forward(input_image)
# self.tensorboard_writer.add_graph(self, tag_score)
def save_model(self, model_path):
self.tensorboard_writer.close()
self.tensorboard_writer = None # can't be pickled
torch.save(self, model_path)
self.tensorboard_writer = SummaryWriter(TENSORBOARD_LOG_PATH)
#classmethod
def load_model(cls, model_path=MODEL_PATH, *args, **kwargs):
net = cls(*args, **kwargs)
if os.path.exists(model_path):
model = torch.load(model_path)
if model:
model.tensorboard_writer = SummaryWriter(TENSORBOARD_LOG_PATH)
net = model
return net
def __del__(self):
if self.tensorboard_writer:
self.tensorboard_writer.close()
if __name__ == "__main__":
from dataset_util.dataset import dataset, eval_dataset
data_loader = DataLoader(dataset, batch_size=2, shuffle=True)
eval_data_loader = DataLoader(eval_dataset, batch_size=2, shuffle=True)
net = CNN_RNN.load_model()
net.train_net(data_loader, eval_data_loader=eval_data_loader)
# net.predict(dataset[0]["image"].view(1, 3, 255, 255))
# predict_pil_image test code
# from config.config import IMAGE_PATHS
# import glob
# from PIL import Image
#
# image_paths = glob.glob(os.path.join(IMAGE_PATHS.get("EVAL"), "*.png"))
# for image_path in image_paths:
# pil_image = Image.open(image_path)
# predicted, score = net.predict_pil_image(pil_image)
# print("True value: {}, predicted: {}".format(os.path.split(image_path)[1], predicted))
print("Done")
The above codes are main part. If you need other components that makes it running, leave a comment. Got stuck here for quite long. Any advice for training crnn + ctc is appreciated.

I've been training with ctc loss and encountered the same problem. I know this is a rather late answer but hopefully it'll help someone else who's researching on this. After trial and error and a lot of research there are a few things that's worth knowing when it comes to training with ctc (if your model is set up correctly):
The quickest way for the model to lower cost is to predict only blanks. This is noted in a few papers and blogs: see http://www.tbluche.com/ctc_and_blank.html
The model learns to predict only blanks first, then it starts picking up on the error signal in regards to the correct underlying labels. This is also explained in the above link. In practice, I noticed that my model starts to learn the real underlying labels/targets after a couple hundred epochs and the loss starts decreasing dramatically again. Similar to what is shown for the toy example here: https://thomasmesnard.github.io/files/CTC_Poster_Mesnard_Auvolat.pdf
These parameters have a great impact on whether your model converges or not - learning rate, batch size and epoch number.

You have a few questions, so I will try to answer them one by one.
First, why does adding spaces to the captcha break the model?
A neural network learns to deal with the data it is trained on. If you change the distribution of the data (by for example adding spaces between characters) there is no guarantee that the network will generalize. As you hint at in your question. It is possible that the captchas you train on always have the characters in the same positions, or at the same distance from one another, thus your model learns that and learns to exploit this by looking in those positions. If you want your network to generalize a specific scenario, you should explicitly train on that scenario. So in your case, you should add random spaces also during training.
Second, why does the loss not go below 16?
Clearly, from the fact that your training loss is also stalled at 16 (like your validation loss), the problem is that your model simply doesn't have the capacity to deal with the complexity of the problem. In other words, your model is underfitting. You had the correct reflex to try to increase the capacity of your network. You tried to increase the capacity of the LSTM and it didn't help. Thus, the next logical step is that the convolution part of your network is not powerful enough. So here are a few things that you might want to try, from most likely to succeed in my opinion to least likely:
Make convnet trainable: I notice that you are using a pretrained convnet and that you are not fine-tuning the weights of that convnet. That could be a problem. Whatever your convnet was trained on, it might not develop the required features to deal with captchas. You should try learning the weights of the convnet too, in order to develop useful features for captchas.
Use deeper convnet: This is the naive thing to do. Your convnet doesn't have good enough features, try a more powerful deeper one. (But you should definitely use this only after you've made the convnet trainable).

From my experience, training RNN model with CTC loss is not an EASY task. The model may not converge at all if the training is not carefully setup-ed. Here are my suggestions:
Check the CTC loss output along training. For a model would converge, the CTC loss at each batch fluctuates notably. If you observed that the CTC loss shrinks almost monotonically to a stable value, then the model is most likely stuck at a local minima
Use short samples to pretrain your model. Though we have advanced RNN strucures like LSTM and GRU, it's still hard to back-propagate the RNN for long steps.
Enlarge sample variety. You can even add artificial samples to help your model escape from local minima.
F.Y.I., we've just open-sourced a new deep learning framework Dandelion which has built-in CTC objective, and interface pretty much like pytorch. You can try your model with Dandelion and compare it with your current implementation.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.