CNN stuck at 0% accuracy - python

I'm learning CNN and wondering why is my network stuck at 0% accuracy even after multiple epochs? I'm sharing the entire code as it's really simple.
I have a dataset with faces and their respective ages. I'm using Keras and TensorFlow to train a convolutional neural network to determine age.
However, my accuracy is always reporting as 0%. I'm very new to neural networks and I'm hoping you could tell me what I am doing wrong?
path = "dataset"
pixels = []
age = []
for img in os.listdir(path):
    ages = img.split("_")[0]
    img = cv2.imread(str(path)+"/"+str(img))
    img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
    pixels.append(np.array(img))
    age.append(np.array(ages))
age = np.array(age,dtype=np.int64)
pixels = np.array(pixels)
x_train,x_test,y_train,y_test = train_test_split(pixels,age,random_state=100)
input = Input(shape=(200,200,3))
conv1 = Conv2D(70,(3,3),activation="relu")(input)
conv2 = Conv2D(65,(3,3),activation="relu")(conv1)
batch1 = BatchNormalization()(conv2)
pool3 = MaxPool2D((2,2))(batch1)
conv3 = Conv2D(60,(3,3),activation="relu")(pool3)
batch2 = BatchNormalization()(conv3)
pool4 = MaxPool2D((2,2))(batch2)
flt = Flatten()(pool4)
age_l = Dense(128,activation="relu")(flt)
age_l = Dense(64,activation="relu")(age_l)
age_l = Dense(32,activation="relu")(age_l)
age_l = Dense(1,activation="relu")(age_l)
model = Model(inputs=input,outputs=age_l)
save = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=2)

Well, you have to decide whether you want a classification model or a regression model. As it stands, it looks like you are trying to build a regression model.
Let's start at the beginning. Apparently you have a dataset of image files, and the file name contains text that defines the age, so I assume it is something like 27_01.jpg. You split the file name on the _ to get the age associated with the image file. You then read in the image using cv2 and convert it to RGB. cv2 reads the image and returns it as an array, so you don't need to convert it to an np array; just use
pixels.append(img)
Now the variable ages is a string, which you want to convert into an integer. So just use the code
ages = int(img.split("_")[0])
This is now a scalar integer value, not an array, so just use
age.append(ages)
You now have two lists, pixels and age. To use them in a model you need to convert them to np arrays, so use
age = np.array(age,dtype=np.int64)
pixels = np.array(pixels)
The next thing you want to do is create a train set and a test set using the train_test_split function. Let's assume you want 90% of the data set to be used for training and 10% for testing, so use
x_train,x_test,y_train,y_test = train_test_split(pixels,age,train_size=.9, shuffle=True, random_state=100)
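As a rough illustration of the two steps above (parsing the age out of the file name and holding out 10% of the data), here is a minimal sketch that uses only NumPy. The file names and the `split_indices` helper are hypothetical and only stand in for what `train_test_split` does for you; it will not reproduce scikit-learn's exact shuffle.

```python
import numpy as np

# Hypothetical file names following the "<age>_<id>.jpg" pattern assumed above.
file_names = ["27_01.jpg", "3_02.jpg", "45_03.jpg", "61_04.jpg", "18_05.jpg",
              "30_06.jpg", "8_07.jpg", "52_08.jpg", "70_09.jpg", "22_10.jpg"]

# The text before the first "_" is the age; int() turns the string into a number.
ages = np.array([int(name.split("_")[0]) for name in file_names], dtype=np.int64)

def split_indices(n_samples, train_size=0.9, seed=100):
    """Shuffle the indices 0..n_samples-1 and cut them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_size * n_samples))
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_indices(len(ages))
y_train, y_test = ages[train_idx], ages[test_idx]
print(len(y_train), "training labels,", len(y_test), "test labels")
```

The same index arrays would be used to slice the pixel array, so that images and labels stay aligned.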
Now let's look at your model. This is what decides whether you are doing regression or classification. You want to do regression. Your model is OK but needs some changes.
You have 4 dense layers. I suspect this will lead to your model over-fitting, so I recommend that you add a dropout layer before the last layer.
Use the code
age_l = Dropout(rate=.4, seed=123)(age_l)
age_l = Dense(1,activation="linear")(age_l)
Note the activation is set to linear. That way the output can take a range of values
that can be compared to the integer values of the age array.
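If it helps to see what a dropout layer does to its input during training, here is a small NumPy sketch of the usual "inverted dropout" rule: a fraction `rate` of the values is zeroed and the rest are scaled by 1 / (1 - rate). The `dropout` function is a hypothetical stand-in, not the Keras layer, and the seed will not reproduce the mask Keras would draw. At inference time a dropout layer passes its input through unchanged.

```python
import numpy as np

def dropout(x, rate, rng):
    """Training-time inverted dropout: zero a fraction `rate` of the inputs
    and scale the survivors by 1 / (1 - rate) so the expected value is kept."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(123)
activations = np.ones(10_000)          # made-up layer output
dropped = dropout(activations, rate=0.4, rng=rng)

zero_fraction = float(np.mean(dropped == 0.0))
mean_after = float(dropped.mean())
print("fraction zeroed:", zero_fraction)
print("mean after dropout:", mean_after)
```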
Now when you compile your model you want your loss to be mse, so that it measures the error between the model's output and the ages. Sparse categorical crossentropy is used when you are doing classification, which is NOT what you are doing. As for the metrics, accuracy is used for classification models, so you only want to use mae. Your compile call should therefore use loss="mse" and metrics=["mae"].
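To see why accuracy is the wrong yardstick here, the sketch below compares an exact-match accuracy with mse and mae on a handful of made-up ages and predictions. The exact-match metric is a simplification of what Keras computes for "accuracy", but the point carries over: a regression output almost never equals the label exactly, so accuracy sits at 0 even when the predictions are reasonably close.

```python
import numpy as np

y_true = np.array([27, 3, 45, 61, 18], dtype=np.float64)   # made-up ages
y_pred = np.array([25.4, 4.1, 47.9, 58.2, 18.6])            # made-up model outputs

# Accuracy-style metric: fraction of predictions that equal the label exactly.
exact_match_accuracy = float(np.mean(y_pred == y_true))

# Regression metrics: average squared error and average absolute error.
mse = float(np.mean((y_pred - y_true) ** 2))
mae = float(np.mean(np.abs(y_pred - y_true)))

print("accuracy:", exact_match_accuracy)
print("mse:", mse)
print("mae:", mae)
```

Here the mae says the predictions are off by under two years on average, which is the kind of number you actually want to track.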
Your model.fit call now looks OK, but you should run for more epochs, say 20. When you run your model, look at the training loss and the validation loss. As the training loss decreases, on AVERAGE the validation loss should also trend downward. If it starts to trend upward your model is over-fitting. In that case you may want to add an additional dropout layer.
At some point your model will stop improving if you run a sufficient number of epochs. You can usually get an improvement in performance if you use an adjustable learning rate. Since you are new to this you may not have experience using callbacks. Callbacks are passed to model.fit and there are many types; they are described in the Keras callbacks documentation. To implement an adjustable learning rate you can use the callback ReduceLROnPlateau. Set it up to monitor the validation loss. If the validation loss fails to decrease for a "patience" number of epochs, the callback reduces the learning rate by the parameter "factor", where
new_learning_rate = current_learning_rate * factor
and factor is a float between 0 and 1.0. My recommended code for this callback is
shown below
patience=2, verbose=1)
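The rule described above can be simulated in a few lines of plain Python. This is only a simplified sketch of the idea, not the Keras implementation (it ignores options such as cooldown and minimum delta). The validation losses, the starting learning rate and factor=0.5 are all made-up values for illustration.

```python
def reduce_lr_on_plateau(val_losses, initial_lr, factor, patience):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("inf")
    wait = 0                      # epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs
                lr = lr * factor
                wait = 0
        history.append(lr)
    return history

val_losses = [1.0, 0.9, 0.95, 0.93, 0.8, 0.85, 0.86, 0.87, 0.88]  # made-up
lr_history = reduce_lr_on_plateau(val_losses, initial_lr=0.001, factor=0.5, patience=2)
print(lr_history)
```

With patience=2 the learning rate is halved each time two epochs in a row fail to beat the best validation loss seen so far.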
I also recommend you use the callback EarlyStopping, which is also described in the Keras documentation. Set it up to monitor validation loss. If the loss fails to decrease for a "patience" number of consecutive epochs, training will be halted. Set the parameter restore_best_weights=True. That way, if the callback halts training, it leaves your model set with the weights from the epoch that had the lowest validation loss. My recommended code for the callback is shown below
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                       verbose=1, restore_best_weights=True)
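In the same spirit, here is a plain-Python sketch of what early stopping with restore_best_weights=True amounts to: training halts once the validation loss has not improved for `patience` consecutive epochs, and the model is rolled back to the best epoch. The `early_stopping` helper and the loss values are hypothetical, and the real callback has more options than this.

```python
def early_stopping(val_losses, patience):
    """Return (epoch at which training stops, epoch whose weights are kept)."""
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop.
    return len(val_losses) - 1, best_epoch

val_losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.55, 3.8, 3.9, 4.0]  # made-up
stop_epoch, restored_epoch = early_stopping(val_losses, patience=4)
print("stopped at epoch", stop_epoch, "and restored weights from epoch", restored_epoch)
```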
To use the callbacks in model.fit, include the code
save = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=20,
By the way, I think I am familiar with this dataset or a similar one. Do not expect
a great root mean squared error, as I have seen many models for this and none had a small error margin. Incidentally, if you want to learn machine learning there is an excellent set of about 200 tutorials by a guy called Gabriel Atkin. His tutorial series is called Data Everyday, and it includes a tutorial dealing with this kind of age dataset.


How can I train this multiclass RNN?

I am trying to train the following RNN in tensorflow. It takes an 11-D numeric vector as input and it outputs a sequence of 10 multiclass probability vectors, with 14 exclusive classes.
model = keras.models.Sequential([
    keras.layers.SimpleRNN(30, return_sequences=False, input_shape=[1, 11]),
    keras.layers.SimpleRNN(30, return_sequences=True),
    keras.layers.SimpleRNN(14, return_sequences=True, activation="softmax")
])
history = model.fit(X_train, y_train, epochs=50, batch_size=32,
However, even for a small dataset of 10 points, it takes hundreds of epochs to fit. As you can see in the figure, the loss barely goes down with the training epochs:
When I try to train the real training set, the loss simply does not move. Any idea of how to successfully train this model?
You can find the first 10 datapoints here
And the first 100 datapoints here
To load the data just use:
with open('train10.pickle', 'rb') as f:
    X_train, y_train = pickle.load(f)
Thank you very much for your help
To provide additional context, what I have in this problem is a continuous numeric embedding in 11-D to start with, and the output is a sequence of one-hot encodings, so you can think of this problem as training a decoder or doing a decompression to get a sort of "words" back from points in the numeric space (each one-hot vector in the output could be thought of as a "letter"). I previously tried to train a non-recurrent network outputting the full list of one-hot encodings (the whole "word") at once, but the performance was also very poor. I just do not see what the bottleneck is: whether it is the dimensionality of the numeric embedding, the training algorithm, or something else. My tinkering so far with types of layers, numbers of layers, and learning rates did not produce substantial improvements. I am open to sharing the whole dataset if you think that can help. Thank you very much!
Each machine learning problem is unique and it is very difficult to say exactly what the issue is without having access to the full data set. Some possibilities are:
The model specification is suboptimal - try varying the number of hidden layers, the number of neurons in each layer, using GRU/LSTM layers instead of RNN, adding some dropout layers, etc.
The training algorithm needs to be adjusted - try using a different optimizer, a different batch size, a different train-test split ratio etc.
The input data needs more (or less) preprocessing - try normalizing/standardizing the input features if you haven't already.
You need to do more work on feature engineering - think deeply about all potential relationships between the input data and the target, and try combining columns to create ratios etc. While the NN can theoretically figure this out for itself, it is often effective to try and reduce the work it has to do in this respect.
Your problem may just be difficult or even unsolvable. There may just be no strong relationship between the input and the target.
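As one concrete instance of the preprocessing suggestion in the list above, here is a minimal NumPy sketch that standardizes each of the 11 input features to zero mean and unit variance. The array is synthetic and only mimics the (samples, 1, 11) input shape from the question; whether this helps on the real data is not something the sketch can tell you.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the real inputs: 100 samples, 1 time step, 11 features.
X_train = rng.normal(loc=50.0, scale=20.0, size=(100, 1, 11))

# Per-feature statistics, computed over the sample and time axes.
mean = X_train.mean(axis=(0, 1), keepdims=True)
std = X_train.std(axis=(0, 1), keepdims=True)

# The small constant guards against division by zero for constant features.
X_scaled = (X_train - mean) / (std + 1e-8)

print(X_scaled.mean(axis=(0, 1)).round(6))
print(X_scaled.std(axis=(0, 1)).round(6))
```

The mean and std computed on the training set should be reused unchanged when scaling validation or test inputs.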

Why test loss fluctuates so much using Resnet?

Here is a typical plot of train/test losses behaviour as epoch increases.
I'm not an expert but I have read several topics on similar problems. Well, let me explain what I'm doing.
First, I have used existing implementations of resnet18 & resnet50, and of resnet32 & resnet56. For all these nets I got the same kind of erratic test-loss behaviour.
Second, my inputs are images of size 5x64x64, so I have adapted the first convolutional layer, and the output of the last fully connected layer consists of 180 neurons. I have used batch sizes of 64, 128 and 256 for training, and 128 for testing: the same behaviour persists. I have also used both 300k and 100k images as training input (100k for the test): the same behaviour persists too. The images are not "standard" RGB photos: first, as you have probably already noticed, there are 5 channels; second, the pixel values can be negative (e.g. spanning the range (-0.01, 500)).
Third, I am aware of the model.train() statement for the training phase, as well as the model.eval() statement (coupled with the with torch.no_grad():) for the testing phase. It is clear that if I do not use model.eval() during the test phase, the test loss decreases gently like the training loss. But this is not allowed, is it?
I have tried several things after reading posts concerning Batch Norm behaviour, without any success:
I have used SGD, Adam (& SWATS)
I have tried lr = 0.1 to lr = 1e-5
I have modified the BN momentum (default = 0.1) : 0.5 and 0.01; as well as the eps parameter.
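For readers less familiar with the train/eval distinction mentioned above, here is a simplified NumPy sketch of batch normalization. In training mode it normalizes with the statistics of the current batch and updates running estimates using the momentum; in eval mode it normalizes with those running estimates instead. `SimpleBatchNorm` is a hypothetical stand-in (no learnable scale or shift, 2-D input only), and the data is synthetic with values loosely in the range of the pixel values described above. It illustrates the mechanism only and does not diagnose the problem in this thread.

```python
import numpy as np

class SimpleBatchNorm:
    """Batch norm without the learnable scale and shift, for illustration only."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.training = True

    def __call__(self, x):
        if self.training:
            mean = x.mean(axis=0)
            var = x.var(axis=0)          # biased variance normalizes the batch
            n, m = x.shape[0], self.momentum
            # Running estimates move a fraction `momentum` toward the batch stats.
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var * n / (n - 1)
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean) / np.sqrt(var + self.eps)

rng = np.random.default_rng(0)
batch = rng.normal(loc=250.0, scale=100.0, size=(256, 5))  # synthetic inputs

bn = SimpleBatchNorm(num_features=5)
train_out = bn(batch)                 # train mode: uses batch statistics
bn.training = False
eval_after_1 = bn(batch)              # eval mode after a single update
bn.training = True
for _ in range(200):                  # many more training steps on the same batch
    bn(batch)
bn.training = False
eval_after_many = bn(batch)           # eval mode once running stats have settled

print("train mean:", train_out.mean())
print("eval mean after 1 step:", eval_after_1.mean())
print("eval mean after many steps:", eval_after_many.mean())
```

As long as the running estimates lag behind the true statistics of the data, the same input produces very different outputs in train and eval mode.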
Now, I have managed to get nice results (i.e. good training & testing losses) with a classical CNN (i.e. without any batch normalization or shortcuts), but I would like to study Resnet behaviour against adversarial attacks. So I would like to get Resnet to fit my images.
Any idea ?
After making some tests, I have found something: I have used the standard resnet20 (h=1). Then, I have used as the test set the same samples (100,000 images) as for the train set. BUT, for the test set 1) I do not use shuffling, and 2) I do not apply any horizontal/vertical flip or Rot90deg, Rot180deg or Rot270deg. I observe the same kind of fluctuations for the test loss.
Moreover, when I completely switch OFF the transformations of the train set, and use the same set for train & test, I get the same behaviour:
And finally, if I switch off the shuffling and random transforms (flips & rotations) of the train set, and I use the same set for test, then I get:
It seems that the test loss is converging towards a value, but one different from the train loss. Why?

CNN overfitting on validation set increase test set performance

I'm currently using a CNN to classify images. I have 16 classes and around 3000 images (a very small dataset). The dataset is unbalanced. I do a 60/20/20 split, with the same percentage of each class in every set. I use weight regularization. I ran tests with data augmentation (Keras augmentation, SMOTE, ADASYN), which helps to prevent overfitting.
When I overfit (epoch=350, loss=2) my model performs better (70+% accuracy, and likewise on other metrics like F1 score) than when I don't overfit (epoch=50, loss=1), where accuracy is around 60%. Accuracy is for the TEST set, while loss is the validation set loss.
Is it really a bad thing to use the overfitted model as the best model, since performance is better on the test set?
I have run the same model with another test set (which was previously part of the train set) and performance is still better (I tried 3 different splits).
EDIT: From what I have read, validation loss is not always the best metric to confirm that a model is overfitting. In my situation, it's better to use validation F1 score and recall; when they start to decrease, the model is probably overfitting.
I still don't understand why validation loss is a bad metric for model evaluation, given that training loss is what the model uses to learn.
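One way loss and accuracy can move in opposite directions, sketched with made-up probabilities for a two-class problem: cross-entropy punishes a confident wrong prediction very heavily, so a model that gets more samples right but is badly wrong on a few can have both a higher accuracy and a higher loss. The numbers below are invented for illustration and say nothing about this particular dataset.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def accuracy(probs, labels):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

labels = np.array([1, 1, 1, 1])

# "Early" model: unsure about everything, right on two samples out of four.
early_probs = np.array([[0.45, 0.55], [0.45, 0.55], [0.55, 0.45], [0.55, 0.45]])
# "Late" model: very confident, right on three samples, badly wrong on one.
late_probs = np.array([[0.01, 0.99], [0.01, 0.99], [0.01, 0.99], [0.999, 0.001]])

early_acc, late_acc = accuracy(early_probs, labels), accuracy(late_probs, labels)
early_loss, late_loss = cross_entropy(early_probs, labels), cross_entropy(late_probs, labels)
print("early: accuracy", early_acc, "loss", round(early_loss, 3))
print("late:  accuracy", late_acc, "loss", round(late_loss, 3))
```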
Yes, it is a bad thing to use an overfitted model as the best model. By definition, a model that overfits doesn't really perform well in real-world scenarios, i.e. on images that are in neither the training set nor the test set.
To avoid overfitting, use image augmentation to balance and increase the number of training samples. Also try increasing the dropout fraction. I personally use Keras's ImageDataGenerator to augment the images and save them.
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import glob
import numpy as np
# There are other parameters too. Check the link given at the end of the answer
datagen = ImageDataGenerator(
    brightness_range=(0.4, 0.6),
    horizontal_flip=True,
)
for image_path in glob.glob(path_to_images):  # path_to_images: glob pattern for your images
    img = load_img(image_path)
    x = img_to_array(img)  # creating a Numpy array
    x = x.reshape((1,) + x.shape)  # add a batch dimension
    i = 0
    num_of_samples_per_image_augmentation = 8
    # flow() yields augmented batches indefinitely, so stop manually
    for batch in datagen.flow(x, save_to_dir='augmented-images/preview/fist', save_prefix='fist', save_format='jpg'):
        i += 1
        if i > num_of_samples_per_image_augmentation:
            break
The full list of image augmentation parameters is in the Keras ImageDataGenerator documentation.
Feel free to use other libraries you are comfortable with.
A few other methods to reduce overfitting:
1) Tweak your CNN model by adding more training parameters.
2) Reduce Fully Connected Layers.
3) Use Transfer Learning (Pre-Trained Models)

Getting started with Keras for machine learning

I'm getting started with machine learning tools and I'd like to learn more about what the heck I'm doing. For instance, the script:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras.initializers import RandomUniform
import numpy
model = Sequential()
model.add(Dense(6, input_dim=6))
model.compile(optimizer='sgd', loss='mean_absolute_error', metrics=['accuracy'])
data = numpy.loadtxt('train', delimiter=' ')
X = data[:, 0:6]
Y = data[:, 6]
model.fit(X, Y, batch_size=1, epochs=1000)
data = numpy.loadtxt('test', delimiter=' ')
X = data[:, 0:6]
Y = data[:, 6]
score = model.evaluate(X, Y, verbose=1)
print ('\n\nThe error is:\n', score, "\n")
Y = model.predict(X, batch_size=1, verbose=1)
print('\nResult:\n', Y, '\n')
It's a Frankenstein I made from some examples I found on the internet and I have many unanswered questions about it:
The file train has 60 rows. Is 1000 epochs too little? Is it too much? Can I get an Underfit/Overfit?
What does the result I get from model.evaluate() mean? I know it's the loss but, if I get a [7.0506157875061035, 0.0], does it mean that my model has a 7% error?
And last, I'm getting a prediction of 0.99875391, 0.99875391, 0.9362126, 0.99875391, 0.99875391, 0.99875391, 0.93571019 when the expected values were anything close to 7.86, 3.57, 8.93, 6.57, 11.7, 8.53, 9.06, which means it's a real bad prediction. Clearly there's a lot of things I'm doing wrong. Could you guys give me a few pointers?
I know it all depends on the type of data I'm using, but is there anything I shouldn't do at all? Or maybe something I should be doing?
There is never a ready answer for how many epochs is a good number. It varies wildly depending on the size of your data, your model, and what you want to achieve. Normally, small models require fewer epochs and bigger models require more. Yours seems small enough, and 1000 epochs seems like way too many.
It also depends on the learning rate, a parameter given to the optimizer that defines how big the steps are that your model takes when updating its weights. Bigger learning rates mean fewer epochs, but there is a chance that you simply never find a good point because you're adjusting the weights beyond what is good. Smaller learning rates mean more epochs but more careful learning.
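The trade-off is easy to see on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2 with three made-up learning rates: a small one that is still far from the minimum after 50 steps, a moderate one that converges, and one that is too large and overshoots further on every step.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

slow = gradient_descent(0.01)      # small steps: still far from 0
good = gradient_descent(0.4)       # converges quickly to 0
diverged = gradient_descent(1.1)   # overshoots and grows in magnitude

for name, value in (("lr=0.01", slow), ("lr=0.4", good), ("lr=1.1", diverged)):
    print(name, "-> final w =", value)
```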
Normally, if the loss reaches a limit, you're approaching a point where training is not useful anymore. (Of course, there may be problems with the model too, there is really no simple answer for this one).
To detect overfitting, you need, besides the training data (X and Y), another group with test data (say Xtest and Ytest, for instance).
Then you use it in model.fit(X, Y, validation_data=(Xtest, Ytest), ...)
Test data is not used for training; it's kept separate just to see if your model can make good predictions on data it has never seen in training.
If the training loss goes down but the validation loss doesn't, you're overfitting (roughly, your model is capable of memorizing the training data without really understanding it).
An underfit, on the contrary, happens when you never achieve the accuracy you expect (of course we always hope for 100% accuracy with no mistakes, but good models get into the 90s; some applications do better, reaching 99%, some do worse; again, it's very subjective).
model.evaluate() gives you the losses and the metrics you added in the compile method.
The loss value is something your model will always try to decrease during training. It roughly means how distant your model is from the exact values. There is no rule for what the loss value means, it could even be negative (but usually keras uses positive losses). The point is: it must decrease during training, that means your model is evolving.
The accuracy value means how many right predictions your model outputs compared to the true values (Y). It seems your accuracy is 0%, your model is getting everything wrong. (You can see that from the values you typed).
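To make the two numbers from the question concrete, the sketch below recomputes them with NumPy from the predictions and approximate expected values quoted there. With loss='mean_absolute_error' the first number is the mean absolute difference in the units of Y, not a percentage, and it comes out at roughly 7.05. The exact-match accuracy here is a simplification of the Keras metric, used only to show why the second number is 0.

```python
import numpy as np

# Values quoted in the question.
predicted = np.array([0.99875391, 0.99875391, 0.9362126, 0.99875391,
                      0.99875391, 0.99875391, 0.93571019])
expected = np.array([7.86, 3.57, 8.93, 6.57, 11.7, 8.53, 9.06])

# Mean absolute error: average distance between prediction and target.
mae = float(np.mean(np.abs(predicted - expected)))

# Exact-match accuracy: continuous targets are essentially never hit exactly.
exact_match_accuracy = float(np.mean(predicted == expected))

print("mean absolute error:", mae)
print("exact-match accuracy:", exact_match_accuracy)
```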
In your model, you used activation functions. These normalize the results so they don't get too big. This avoids overflowing problems, numeric errors propagating, etc.
It's very very usual to work with values within such bounds.
tanh - outputs values between -1 and 1
sigmoid - outputs values between 0 and 1
Well, if you used a sigmoid activation in the last layer, your model will never output 3 for instance. It tries, but the maximum value is 1.
What you should do is prepare your data (Y), so it's contained between 0 and 1. (This is the best to do in classification problems, often done with images too)
But if you actually want numerical values, then you should just remove the activation and let the output be free to reach higher values. (It all depends on what you want to achieve with your model)
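If you go the route of preparing Y so that it fits a bounded activation, a minimal sketch with NumPy looks like this. It min-max scales the expected values from the question into [0, 1] and shows how to map predictions back to the original units. The `unscale` helper is hypothetical, and the minimum and maximum should come from the training targets only.

```python
import numpy as np

y = np.array([7.86, 3.57, 8.93, 6.57, 11.7, 8.53, 9.06])  # expected values from the question

y_min, y_max = y.min(), y.max()
y_scaled = (y - y_min) / (y_max - y_min)   # now within [0, 1], reachable by a sigmoid

def unscale(y_hat_scaled):
    """Map values from the [0, 1] range back to the original units of Y."""
    return y_hat_scaled * (y_max - y_min) + y_min

print(y_scaled.round(3))
print(unscale(y_scaled))
```

Note that a target larger than the training maximum would still be out of reach for a sigmoid output, which is one reason to prefer no activation on the last layer for plain regression.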
An epoch is a single pass through the full training set. To my mind 1000 seems a lot, but you'd have to check for overfitting and evaluate the predictions. There are many ways of checking and controlling for overfitting in a model. If you understand the methods of doing so, coding them in Keras should be no problem.
According to the documentation .evaluate returns:
Scalar test loss (if the model has no metrics) or list of scalars (if the model computes other metrics)
so these are the evaluation metrics of your model, they tell you how good your model is given some notion of good. Those metrics depend on the model and type of data that you've used. Some explanation on those can be found here and here. As mentioned in the documentation,
The attribute model.metrics_names will give you the display labels for the scalar outputs.
So you can know what metric you are looking at. It is easier to do that interactively through the console (ipython, bpython) or Jupyter notebook.
I can't see your data, but if you are doing a classification problem as suggested by metrics=['accuracy'], then loss=mean_absolute_error doesn't make sense, since it is made for regression problems. To learn more about those I refer you to here and here, which discuss classification and regression problems with Keras.
PS: question 3 is not related to software per se, but to the theoretical construct supporting the software. In such cases, I'd recommend asking them at Cross Validated.

Efficient way to know if an image related to a dataset that was used to train convolutional neural network

Currently I'm using VGG16 + Keras + Theano through the Transfer Learning methodology to recognize plant classes. It works just fine and gives me good accuracy. But the next problem I'm trying to solve is finding a way to identify whether an input image contains a plant at all. I don't want to have another classifier to do it, because that's not really efficient.
So I did some searching and found that we can get the activations from the last model layer (before the activation layer) and analyze them.
from keras import backend as K
model = util.load_model() # VGG16 model
def get_activations(m, layer, X_batch):
    x = [m.layers[0].input, K.learning_phase()]
    y = [m.get_layer(layer).output]
    activation_fn = K.function(x, y)
    activations = activation_fn([X_batch, 0])  # 0 = test phase
    # trying to get some features from activations
    # to understand how can we identify if an image is relevant
    for l in activations[0]:
        not_nulls = [x for x in l if x > 0]
        # shows percentage of activated neurons
        c1 = float(len(not_nulls)) / len(l)
        n_activated = len(not_nulls)
        print('c1:{}, n_activated:{}'.format(c1, n_activated))
    return activations
get_activations(model, 'the_latest_layer_name', inputs)
From the above code I've noticed that when we have a very irrelevant image, the number of activated neurons is bigger than for images that contain plants:
For images that were used for model training, the share of activated neurons is 19%-23%
For images that contain unknown plant species, 20%-26%
For irrelevant images, 24%-28%
It's not really a good feature for deciding whether an image is relevant, as the percentage ranges overlap.
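As a side note, the per-image share of activated neurons computed in the loop above can be obtained in one vectorized NumPy step. The `activated_fraction` helper and the tiny example array are hypothetical; in practice the input would be the activation matrix returned for a batch of images.

```python
import numpy as np

def activated_fraction(activations):
    """Fraction of strictly positive activations per row (one row per image)."""
    return (np.asarray(activations) > 0).mean(axis=1)

# Made-up activations for two images with four neurons each.
example = np.array([[0.0, 1.2, 0.0, 3.4],
                    [0.5, 0.1, 0.0, 2.0]])
fractions = activated_fraction(example)
print(fractions)
```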
So, is there a good way to resolve this issue?
Thanks to Feras's idea in the comment above. After some trials, I've come up with the ultimate solution that allows solving this problem with accuracy up to 99.99%.
Steps are:
Train your model on a dataset;
Store activations (see the method above for how to get them) by predicting on relevant and non-relevant images using the trained model from the previous step. You should get activations from the penultimate layer. For VGG16 it's the last of the two Dense(4096) layers, for InceptionV3 an extra penultimate Dense(1024) layer, for resnet50 an extra penultimate Dense(2048) layer.
Solve a binary classification problem using the stored activations. I've tried a simple flat NN and Logistic Regression. Both were good in accuracy (the flat NN was a bit more accurate), but I've chosen Logistic Regression as it's simpler, faster and consumes less memory and CPU/GPU.
This process should be repeated each time your model is retrained, because the final CNN weights are different each time, and what worked previously will behave differently next time.
So as a result we have another small model for solving the problem.
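The last step can be sketched roughly as follows. The answer does not say which Logistic Regression implementation was used, so this is an assumption: a plain NumPy gradient-descent version, trained on synthetic random vectors that merely stand in for stored penultimate-layer activations of relevant and non-relevant images. The feature size, learning rate and number of iterations are made up, and the accuracy is measured on the training data only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_features = 200, 16   # hypothetical sizes; real activations are much wider

# Synthetic "activations": two clusters standing in for relevant / non-relevant images.
relevant = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, n_features))
irrelevant = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, n_features))
X = np.vstack([relevant, irrelevant])
y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression fitted by batch gradient descent on the log loss.
weights = np.zeros(n_features)
bias = 0.0
learning_rate = 0.1
for _ in range(200):
    p = sigmoid(X @ weights + bias)
    weights -= learning_rate * (X.T @ (p - y)) / len(y)
    bias -= learning_rate * float(np.mean(p - y))

predictions = (sigmoid(X @ weights + bias) >= 0.5).astype(float)
train_accuracy = float(np.mean(predictions == y))
print("training accuracy:", train_accuracy)
```

On real activations you would hold out part of the stored data to check how well the small model generalizes.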
