CNN accuracy frozen - python

So I am training a CNN to detect certain features in input images. The structure is as follows:
Input Image -> Conv2D -> Relu -> Dense -> Softmax -> Result
The dataset contains 180 black&white images in 3 classes, with exactly 60 images for each class.
My problem is that both validation accuracy and training accuracy stop changing after only about 6-7 epochs, as shown in the picture below:
I tried Googling for a solution but found nothing so far. At first I thought my model was overfitting, since it does not "learn" anymore and the training loss keeps decreasing while the validation loss does not. Therefore I tried adding learning rate decay and Nesterov momentum and increasing the batch size to reduce overfitting, but they did not change things much (well, the overall accuracy did improve from 0.90 to 0.92), and my accuracy is stuck at exactly 1.00, 0.88 and 0.87 every time (before it was 1.00, 0.85 and 0.85).
The features I want to recognize are fairly simple, and I need to use the results in a control loop, so I want a lightweight model with an accuracy of at least 95% if possible. Does anyone have an idea of what should be done, or at least a research direction, to improve this model?

Your model is overfitting. The training loss is near zero (and training accuracy is already at 100%), whereas the validation loss is much higher than the training loss. An ideal model drives the training and validation loss down together. You haven't provided any code, so assuming your model itself isn't wrong, I would suggest regularization techniques such as dropout, weight decay (L2), and batch normalization.
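For illustration, here is a minimal Keras sketch of those regularizers (dropout, L2 weight decay, batch normalization). The input size, layer widths, and regularization strengths are placeholders, not values taken from the question:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                           # grayscale input (assumed size)
    layers.Conv2D(32, 3, padding="same",
                  kernel_regularizer=regularizers.l2(1e-4)),   # L2 weight decay on the conv kernel
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                                       # dropout before the classifier
    layers.Dense(3, activation="softmax"),                     # 3 classes, as in the question
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```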

Using data augmentation techniques is another possible way to avoid overfitting.
That is, apply some simple image transformations like horizontal flip, vertical flip, rotation, translation, etc. Here are some examples.
That will make your data more varied, so while your model is training it will see more examples of what it needs to learn. If you just feed it the same images and labels over and over, it will simply memorize them and won't be able to perform well on validation/test data.
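As a hedged sketch of those transformations in Keras, one option is ImageDataGenerator; the ranges below are illustrative, and x_train/y_train stand in for your existing images and labels:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=15,        # degrees
    width_shift_range=0.1,    # translation, fraction of image width
    height_shift_range=0.1,   # translation, fraction of image height
)

# x_train, y_train are assumed to be your existing images and labels:
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=50)
```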

Related

Training Accuracy Increasing but Validation Accuracy Remains at Chance Level for Each Class (1/number of classes)

I am training a classifier using CNNs in PyTorch. My classifier has 6 labels. There are 700 training images for each label and 10 validation images for each label. The batch size is 10 and the learning rate is 0.000001. Each class makes up 16.7% of the dataset. I have trained for 60 epochs and the architecture has 3 main blocks:
Conv2D->ReLU->BatchNorm2D->MaxPool2D->Dropout2D
Conv2D->ReLU->BatchNorm2D->Flattening->Dropout2D
Linear->ReLU->BatchNorm1D->Dropout
And finally a fully connected layer and a softmax.
My optimizer is AdamW and the loss function is cross-entropy. The network is training well, in that the training accuracy keeps increasing, but the validation accuracy remains almost fixed at chance level (1/number of classes). The accuracy is shown in the image below:
Accuracy of training and test
And the loss is shown in:
Loss for training and validation
Is there any idea why this is happening? How can I improve the validation accuracy? I have used L1 and L2 regularization as well as dropout layers. I have also tried adding more data, but none of these helped.
Problem solved: At first I treated this as overfitting and spent a lot of time on methods to address it, such as regularization and augmentation. After trying different methods I still couldn't improve the validation accuracy, so I went through the data. I found a bug in my data preparation that was generating similar tensors under different labels. I generated the correct data and the problem was solved to some extent (the validation accuracy increased to around 60%). Then I finally improved the validation accuracy to 90% by adding more "conv2d + maxpool" layers.
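A hypothetical sanity check for that kind of data-preparation bug (identical tensors appearing under different labels) could look like the following; `images` and `labels` are assumed NumPy arrays of the prepared data, and the helper name is made up for illustration:

```python
import hashlib
import numpy as np

def find_cross_label_duplicates(images, labels):
    """Return indices of images whose exact bytes already appeared under a different label."""
    seen = {}          # hash of image bytes -> first label seen with it
    duplicates = []
    for i, (img, lab) in enumerate(zip(images, labels)):
        key = hashlib.md5(np.ascontiguousarray(img).tobytes()).hexdigest()
        if key in seen and seen[key] != lab:
            duplicates.append(i)
        else:
            seen.setdefault(key, lab)
    return duplicates
```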
This is not so much a programming-related question, so maybe ask it again on Cross Validated; it would also be easier to help if you posted your architecture code.
But here are things that I would suggest:
you wrote that you "tried adding more data"; if you can, always use all the data you have. If that's still not enough (and even if it is), use augmentation (e.g. flip, crop, add noise to the images)
your learning rate should not be so small; start with 0.001 and decay it while training, or try around 0.0001 without decay (see the sketch after this list)
remove the dropout after the conv layers and the batch norm after the dense layers and see if that helps; it is not that common to use dropout after conv layers, but normally that shouldn't have a negative effect, so try it anyway
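A minimal PyTorch sketch of that learning-rate suggestion, assuming AdamW as in the question; the stand-in model, step size, and decay factor are placeholders:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 6)  # stand-in for your actual network (6 classes)
optimizer = optim.AdamW(model.parameters(), lr=1e-3)                       # start at 1e-3 instead of 1e-6
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)  # halve the LR every 10 epochs

# for epoch in range(num_epochs):
#     ...train one epoch, then:
#     scheduler.step()
```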

How should I interpret or intuitively explain the following results for my CNN model?

I am training a CNN model that has to classify 4 objects: 3 mugs (white, black, blue) and 1 glass. When I train my model for only 10 epochs, I get a validation accuracy of 25%, where everything is being labeled as the white mug. However, when I train the same model for longer, it eventually moves away from this 25% accuracy and climbs up to 80%; the only problem it still has is classifying the white mug.
In other words, if I could find out why my classifier classifies the white mug wrongly, I could potentially reach a validation accuracy of 90%. My question, then, is: what are some things I could try to find out why it mispredicts, or to improve the model? I have already used LIME to check why my model classifies images the way it does, but I could not get any wiser from it.
Some specs of the model:
No data augmentation
5 convolutional layers (32, 64, 128, 256, 512) -> into GlobalMaxPooling, Flatten, and 2 dense layers (128, 4)
Activation layers (relu)
2000 training images, 1000 validation images (classes are balanced)
Extra: The model gets 100% accuracy on the training data after 2 epochs, and slowly climbs up to 80% on the validation data (after about 40-50 epochs).
Extra 2: Sometimes the model gets 80%, sometimes only 74%
The model is reaching 100% training accuracy when validation accuracy is still only 25%. A 75% gap between training and validation accuracy is enormous and indicates the model is overfitting to the training data, likely due to the small size of the training data set (2000). Data augmentation would likely significantly reduce overfitting and improve validation accuracy - I would start with random cropping, brightness, and saturation. Collecting more training data with varied backgrounds, orientations, and lighting conditions would also help.
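For reference, one possible way to apply those augmentations (random crop, brightness, saturation) is with tf.image in a tf.data pipeline; the crop size and ranges below are illustrative, and train_ds stands in for an existing dataset of (image, label) pairs:

```python
import tensorflow as tf

def augment(image, label):
    image = tf.image.random_crop(image, size=[200, 200, 3])    # assumes inputs larger than 200x200
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    return image, label

# train_ds is assumed to be an existing tf.data.Dataset of (image, label) pairs:
# train_ds = train_ds.map(augment).shuffle(1000).batch(32)
```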
There could be multiple issues here. It looks like your network is overfitting, so first off you may want to add regularization or dropout to your training process. Secondly, you'll also want to make sure your images come from the same source, i.e. if your training/test examples all come from Google images or have vastly different qualities, angles, colors, etc, this might harm the network's ability to classify the mug correctly.
Finally, you might want to add a softmax layer at the end of your network since you're doing a multiclass classification. By doing this, you'll be able to see what the probability is for the white mug compared to the other objects, which can help you with debugging.
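A rough sketch of that debugging idea: with a softmax output you can rank the per-class probabilities for the misclassified white-mug images. The class names and the random stand-in for the predictions below are assumptions; in practice the probabilities would come from something like model.predict(white_mug_images):

```python
import numpy as np

# Random stand-in for softmax outputs; in practice: probs = model.predict(white_mug_images)
probs = np.random.dirichlet(np.ones(4), size=5)                  # 5 images, 4 classes
class_names = ["white_mug", "black_mug", "blue_mug", "glass"]    # assumed label order

for p in probs:
    ranked = sorted(zip(class_names, p), key=lambda t: t[1], reverse=True)
    print(ranked)  # highest-probability class first; shows what the white mug is confused with
```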

Why does validation accuracy remain at 75% while training accuracy is 100%?

I used my own data set to train a model using the retrain.py file from the TensorFlow site. However, with my first set of images, I am seeing a training accuracy of 100% while validation accuracy is at 70%. I see that the validation cross-entropy is increasing, which indicates overfitting. I am new to this field and got to this stage by following online tutorials.
I have not yet enabled random brightness, crop, and flip for training. I am trying to understand why this behaviour occurs. I tried the flowers example and it worked as expected: there the cross-entropy kept dropping, instead of increasing as it does with my data set.
Could someone explain what's going on inside the CNN here?
Your model has overfitted on the training data. If it's a large model, you should consider transfer learning: take a model pre-trained on a large dataset like ImageNet and fine-tune it on your data. You can also try adding some form of regularization to prevent overfitting, especially dropout and L2 regularization.
This simply means your model is overfitting. Overfitting means your model is not generalizing well to unseen data (i.e. the validation data). What you can do is add some form of regularization (L2 is commonly used). This penalizes weights that take on very large values, which would otherwise lead to overfitting. It also discourages the model from fitting outliers, which again hurts generalization and causes more overfitting.
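As a hedged sketch of those two suggestions combined, assuming Keras: an ImageNet-pretrained base, frozen, with a small head that uses dropout and an L2 penalty. The input size, class count, and hyperparameters are placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                          # freeze the pretrained features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),                        # dropout, as suggested above
    layers.Dense(5, activation="softmax",       # replace 5 with your number of classes
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on large weights
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```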

Best model for English alphabet and digit recognition

I have a fairly big dataset (over 0.5 million 50x50 images) consisting of 62 classes in total. The images represent the English alphabet and digits (all handwritten). Each class has at least 2000 samples.
I've been training a convolutional neural network to recognize these images with TensorFlow.
Problem: After quickly (about 200 training iterations) getting very close to a local optimum (loss values around 0.01), my classifier is stuck at around 82% accuracy on the test set.
Question: How can I get more accuracy? Is there something I am doing wrong with the CNN? Also, is trying an SVM worth it? I give details of my CNN model below.
Random dataset entry:
Question 2: Besides rotating the patterns, is my pre-processing methodology good? Should I stretch the patterns, or leave them as they are now, with left and right margin regions of redundant white pixels?
Details and hyperparams:
Tensorflow optimizer: AdamOptimizer
learning rate alpha: 0.001
dropout: 1.0 (no dropout)
mini batch size: 1500
number of convolutional layers: 2
number of pooling layers: 2
fully connected layers: 1
stride: 2 pixels
filter size: 5 pixels
test/train set proportion: 0.2/0.8
NOTE: Patterns aren't skeletonized; they keep their original stroke widths. Images are binary, with the pattern having value 0 and the background 1.
UPDATE
Here's my code responsible for training and a tiny subset of images:
https://drive.google.com/open?id=0B5kuwbyrKVqnTm1PMGZGMUxUNFU
Due to my slow internet connection I cannot afford to upload enough data. However, you can plot these images to decide if further pre-processing is needed.
It seems to be overfitting. Your training loss is 0.01, yet accuracy on the test set is only around 82%, which corresponds to a test loss much greater than 0.01 (an accuracy around 99% would correspond to a loss of roughly 0.04).
This is a fairly specific problem. Some thoughts:
reduce the learning rate, e.g. to 1e-4
add dropout; dropout will help reduce overfitting
reduce the filter size; I think 5x5 is too big for 50x50 images, and you could also add one more convolutional layer
regarding your activation function, ReLU is a good one for limiting overfitting
These are untested suggestions (a sketch combining them is below). By the way, if you like, post a dataset URL and I'd be happy to train on it. ;-)
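An untested sketch combining those suggestions for the 50x50, 62-class setup, written in Keras for brevity (the question uses raw TensorFlow, so this is only illustrative): 3x3 filters, an extra convolutional layer, dropout, and Adam with a 1e-4 learning rate. Layer widths are guesses, not tuned values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(50, 50, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),   # 3x3 filters instead of 5x5
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),  # the extra conv layer
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                                       # dropout instead of keep-prob 1.0
    layers.Dense(62, activation="softmax"),                    # 62 classes (letters + digits)
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```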

Keras CNN training accuracy is good but test accuracy is very low

Please give me any comments on these CNN results.
I have used 2000 training images and 400 test images.
Training accuracy is perfect, but test accuracy is very low.
I think it is because there is a lot of variation between the training and test images.
Does anyone have a good idea for this case?
This is a clear case of overfitting. How many learnable parameters do you have? For example, VGGNet has 138M parameters, and with that many it is not hard to imagine that some neurons in the network have essentially memorized training images verbatim, so the network does not generalize well.
To fix that, first of all you can try a simpler model if the task is simple, like discriminating between shapes. You can also increase the training data via transformations like swapping color channels (if that doesn't change the output class), flipping, or rotating images to make your net generalize better. Include L1/L2 regularization in your loss function and try dropout.
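To answer the parameter-count question above for your own model, Keras can report it directly; the tiny stand-in model below is only there to make the snippet runnable:

```python
import tensorflow as tf

# Stand-in model; replace with your actual network
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

print(model.count_params())  # compare this number to your 2000 training images
model.summary()              # per-layer parameter breakdown
```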
