I am training a GAN and the accuracy doesn't change over epochs, while the loss is decreasing.
Is there something wrong, or is this normal because it's a GAN?
Thank you in advance.
In order to fully answer this question (specific to your case) we'd need to know what loss function you are using and how you measure accuracy.
In general, this can certainly happen for a variety of reasons. The easiest reason to illustrate is with a simple classifier. Suppose you have a 2-class classification problem (for simplicity) and an input $x$ and label (1, 0), i.e. the label says it belongs to class 1 and not class 2. When you feed $x$ through your network you get an output: $y=(p_1, p_2)$. If $p_1 > p_2$ then the prediction is correct (i.e. you chose the right class). The loss function can continue to go down until $p_1=1$ and $p_2=0$ (the target). So, you can have lots of correct predictions (high accuracy) but still have room to improve the output to better match the labels (room for improved loss).
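A tiny numeric sketch of that gap (made-up probabilities, using cross-entropy as the loss):

import numpy as np

def cross_entropy(p, target):
    # cross-entropy between a predicted distribution p and a one-hot target
    return -np.sum(target * np.log(p))

target = np.array([1.0, 0.0])         # label: class 1, not class 2

early = np.array([0.6, 0.4])          # output early in training: already correct (p1 > p2)
late  = np.array([0.9, 0.1])          # output later in training: still correct

print(np.argmax(early) == np.argmax(target))   # True  -> prediction correct, accuracy unchanged
print(np.argmax(late)  == np.argmax(target))   # True  -> prediction correct, accuracy unchanged
print(cross_entropy(early, target))            # ~0.51
print(cross_entropy(late,  target))            # ~0.11 -> loss still has room to decrease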
Related
In many ML applications a weighted loss may be desirable, since some types of incorrect predictions might be worse outcomes than other errors. E.g. in medical binary classification (healthy/ill), a false negative, where the patient doesn't get further examinations, is a worse outcome than a false positive, where a follow-up examination will reveal the error.
So if I define a weighted loss function like this:
def weighted_loss(prediction, target):
    if prediction == target:
        return 0    # correct, no loss
    elif prediction == 0:  # class 0 is healthy
        return 100  # false negative, very bad
    else:
        return 1    # false positive, incorrect
How can I pass something equivalent to this to scikit-learn classifiers like Random Forests or SVM classifiers?
I am afraid your question is ill-posed, stemming from a fundamental confusion between the different notions of loss and metric.
Loss functions do not work with prediction == target-type conditions - this is what metrics (like accuracy, precision, recall etc) do - which, however, play no role during loss optimization (i.e. training), and serve only for performance assessment. Loss does not work with hard class predictions; it only works with the probabilistic outputs of the classifier, where such equality conditions never apply.
An additional layer of "insulation" between loss and metrics is the choice of a threshold, which is necessary for converting the probabilistic outputs of a classifier (only thing that matters during training) to "hard" class predictions (only thing that matters for the business problem under consideration). And again, this threshold plays absolutely no role during model training (where the only relevant quantity is the loss, which knows nothing about thresholds and hard class predictions); as nicely put in the Cross Validated thread Reduce Classification Probability Threshold:
the statistical component of your exercise ends when you output a probability for each class of your new sample. Choosing a threshold beyond which you classify a new observation as 1 vs. 0 is not part of the statistics any more. It is part of the decision component.
Although you can certainly try to optimize this (decision) threshold with extra procedures outside of the narrowly-defined model training (i.e. loss minimization), as you briefly describe in the comments, your expectation that
I am pretty sure that I'd get better results if the decision boundaries drawn by the RBFs took that into account, when fitting to the data
with something similar to your weighted_loss function is futile.
So, no function similar to your weighted_loss shown here (essentially a metric, and not a loss function, despite its name), that employs equality conditions like prediction == target, can be used for model training.
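A minimal sketch of such a threshold-tuning procedure outside of training, assuming a hypothetical already-fitted scikit-learn classifier clf and a held-out set X_val, y_val (the 100/1 costs mirror the weighted_loss above):

import numpy as np

# Probability of the positive class ("ill") from the already-fitted classifier `clf`
probs = clf.predict_proba(X_val)[:, 1]

def total_cost(threshold, probs, y_true, fn_cost=100, fp_cost=1):
    preds = (probs >= threshold).astype(int)
    fn = np.sum((preds == 0) & (y_true == 1))   # missed ill patients
    fp = np.sum((preds == 1) & (y_true == 0))   # unnecessary follow-ups
    return fn_cost * fn + fp_cost * fp

# Pick the threshold that minimises the asymmetric cost on the validation set
thresholds = np.linspace(0.01, 0.99, 99)
best = min(thresholds, key=lambda t: total_cost(t, probs, np.asarray(y_val)))
print(best)

Note that none of this touches model fitting; it only changes how the probabilistic outputs are turned into decisions.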
The discussion in the following SO threads might also be useful in clarifying the issue:
Loss & accuracy - Are these reasonable learning curves?
What is the difference between loss function and metric in Keras? (despite the title, the definitions are generally applicable and not only for Keras)
Cost function training target versus accuracy desired goal
How to interpret loss and accuracy for a machine learning model
I'm trying to build a convolution-based model. I trained two different structures, as follows. As you can see, for the single-layer model there isn't any obvious change over the number of epochs. The bi-layer Conv2D model shows improving accuracy and loss on the training dataset, but the validation characteristics are a tragedy.
Given that I can't increase my dataset, what should I do to improve the validation characteristics?
I've tried L1 and L2 regularization, but they didn't affect my model.
1) You can use an adaptive learning rate (exponential decay or a step-dependent schedule may work for you). Furthermore, you can try an extremely high learning rate when your model gets stuck in a local minimum.
2) If you are training with images, you can flip, rotate, or otherwise transform them to increase your dataset size, and other augmentation techniques might work for your case (see the sketch after this list).
3) Try changing the model: deeper, shallower, wider, narrower.
4) If you are building a classification model, ensure that you are not using sigmoid as your final activation function unless you are doing binary classification.
5) Always check the state of your dataset before a training session:
Your train-test split may not be suitable for your case.
There might be extreme noise in your data.
Some amount of your data might be corrupted.
Note: I will update this list whenever a new idea comes to mind. Also, I didn't want to repeat the comments and other answers; both contain valuable information for your case.
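For point 2 above, here is a minimal augmentation sketch with tf.keras' ImageDataGenerator (the parameter values are made up; x_train, y_train, and model are assumed to exist; recent Keras versions accept a generator directly in model.fit, older ones use fit_generator):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random flips, small rotations and shifts produce transformed copies on the fly,
# which effectively enlarges the training set.
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=50)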
The validation becomes a tragedy because the model is overfitting on the training data. You can try the following and see if any of it works:
1) Batch normalisation would be a good option to go with (a minimal sketch follows this list).
2) Try reducing the batch size.
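For 1), a rough sketch of where a BatchNormalization layer could go in a small tf.keras Conv2D stack (the input shape and layer sizes are hypothetical):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', input_shape=(64, 64, 1)),  # hypothetical input shape
    layers.BatchNormalization(),            # normalise activations before the non-linearity
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(8, activation='softmax'),  # 8 output classes; adjust to your dataset
])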
I tried a variety of models known to work well on small datasets, but as I suspected, and as is my ultimate verdict - it is a lost cause.
You don't have nearly enough data to train a good DL model, or even an ML model like an SVM; matters are exacerbated by having eight separate classes. Your dataset would stand some chance with an SVM for binary classification, but none for 8-class. As a last resort, you can try XGBoost, but I wouldn't bet on it.
What can you do? Get more data. There's no way around it. I don't have an exact number, but for 8-class classification I'd say you need anywhere from 50-200x your current data to get reasonable results. Mind also that your validation performance is bound to be much worse on a bigger validation set, which is accounted for in this number.
For readers, OP shared his dataset with me; shapes are: X = (1152, 1024, 1), y = (1152, 8)
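A rough sketch of the XGBoost fallback mentioned above, assuming those shapes (flatten the channel axis, convert the one-hot labels to integers); the hyperparameters are made up:

import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

X_flat = X.reshape(len(X), -1)                  # (1152, 1024, 1) -> (1152, 1024)
y_int = np.argmax(y, axis=1)                    # one-hot (1152, 8) -> integer labels 0..7

X_tr, X_te, y_tr, y_te = train_test_split(
    X_flat, y_int, test_size=0.2, stratify=y_int, random_state=0)

clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)  # made-up hyperparameters
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))                    # mean accuracy on the held-out split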
So I'm using TensorFlow 1.10 and training on a subset of a retinopathy dataset. The problem is that the model always predicts the class that occurs the most; in my case it is class 0. So I did a little digging and came across something called under-sampling. When I left out all the 0s (just to see what happens), it only predicted class 2, which apparently has the highest frequency after class 0.
Here is the code for optimization:
def data_pipe_line(data,checkpoint_path,i_data=None,epoch=5):
    place_X=tf.placeholder(tf.float32,[None,500,400,3],name='p1')
    place_Y=tf.placeholder(tf.int32,[None],name='p2')
    infer_data=tf.data.Dataset.from_tensor_slices((place_X,place_Y))
    infer_data=infer_data.batch(100)
    iterator=tf.data.Iterator.from_structure(data.output_types,data.output_shapes)
    next_image,next_label=iterator.get_next()
    Y=tf.one_hot(next_label,5)
    Y=tf.cast(Y,tf.float32)
    logits=model(next_image,0.7)
    print(logits)
    print(Y)
    train_iterator=iterator.make_initializer(data,name='train_op')
    inference_iterator_op=iterator.make_initializer(infer_data,name='inference_op')

    with tf.name_scope("loss"):
        loss=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y,logits=logits),name='cost')

    #the learning rate is so low because the batch-size is very small and has a lot of noise
    optimizer=tf.train.AdamOptimizer(learning_rate=0.0005).minimize(loss)

    #getting the accuracy
    prediction=tf.argmax(logits,1,name='pred')
    equality=tf.equal(prediction,tf.argmax(Y,1))
    accuracy=tf.reduce_mean(tf.cast(equality,tf.float32))

    init_op=tf.global_variables_initializer()
    tf.summary.scalar("loss",loss)
    tf.summary.scalar("accuracy",accuracy)
    merged=tf.summary.merge_all()
    saver=tf.train.Saver()

    j=0
    with tf.Session() as sess:
        writer=tf.summary.FileWriter("./nn_logs",sess.graph)
        sess.run(init_op)
        for _ in range(epoch):
            sess.run(train_iterator)
            while True:
                try:
                    #print(sess.run(logits))
                    j=j+1
                    summary = sess.run(merged)
                    _,acc,l=sess.run([optimizer,accuracy,loss])
                    if(j%20==0 or j==1):
                        print("iters: {}, loss: {:.10f}, training accuracy: {:.2f}".format(j, l, acc*100))
                        writer.add_summary(summary,j)
                except tf.errors.OutOfRangeError:
                    break
        saver.save(sess,checkpoint_path)
The model trains: the loss goes down for some time and then just fluctuates (within a range of about 5). The accuracy, of course, fluctuates a lot, as the model only predicts one class.
From what you said, I conclude that your dataset is highly imbalanced and that the model you trained didn't generalize at all. The model then predicts the class with the most samples, because with a random guess there is a higher chance that it is the correct label. One approach to try to solve this is to balance your data.
Under Sampling
One way to do that is with a technique called under-sampling, as you have found out. In your training set, you find the class that has the fewest samples, and then pass that same number of samples of each class to the training phase. The selection is done at random. The test set remains intact.
Over Sampling
Here you go the opposite way: take the class with the most samples and make all other classes have that same number of samples. The simplest way to do that is to repeat your samples at random. Another way would be to augment your data.
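A minimal sketch of random over-sampling with NumPy (assuming integer class labels in y_train):

import numpy as np

def oversample(X_train, y_train, seed=0):
    rng = np.random.RandomState(seed)
    classes, counts = np.unique(y_train, return_counts=True)
    target = counts.max()                       # size of the largest class
    keep = []
    for c in classes:
        idx = np.where(y_train == c)[0]
        # draw with replacement until this class also has `target` samples
        keep.append(rng.choice(idx, size=target, replace=True))
    keep = np.concatenate(keep)
    rng.shuffle(keep)
    return X_train[keep], y_train[keep]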
You can also weight the loss function. For this, G. Anderson has already linked a good question.
The intuition behind weighting the loss
When the classes are imbalanced, the model learns that getting the wrong label on the class with fewer samples is only lightly penalized: since that class has fewer samples, the mistake occurs less often, and when it does occur it has the same impact as a mistake on a class with more samples. So what a weighted loss does is give more importance to a mistake on the class with fewer samples and less importance to a mistake on the class with more samples.
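For reference, a rough sketch of what a class-weighted cross-entropy could look like in TF 1.x, reusing the Y (one-hot labels) and logits tensors from the code in the question; the weight values here are made up (inverse class frequencies are a common choice):

class_weights = tf.constant([0.2, 1.5, 0.8, 2.0, 3.0])                # made up, one weight per class
example_weights = tf.reduce_sum(class_weights * Y, axis=1)             # weight of each example's true class
per_example_loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits)
loss = tf.reduce_sum(example_weights * per_example_loss, name='weighted_cost')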
Answering the question in the comments
Remember: the loss function is used to correct the learned model weights. When you add another dense layer to your model you are changing the architecture, which affects the learning step, but your model will still attribute the same level of importance to mistakes in every class.
My model produces the learning curves shown below. Are they fine? I am a beginner, and all across the internet I see that as the number of training examples increases, the training score should decrease and then converge. But here the training score is increasing and then converging. Does this indicate a bug in my code or something wrong with my input?
Okay I figured out what was wrong with my code.
train_sizes , train_accuracy , cv_accuracy = lc(linear_model.LogisticRegression(solver='lbfgs',penalty='l2',multi_class='ovr'),trainData,multiclass_response_train,train_sizes=np.array([0.1,0.33,0.5,0.66,1.0]),cv=5)
I had not entered a regularization parameter for Logistic Regression.
But now,
train_sizes , train_accuracy , cv_accuracy = lc(linear_model.LogisticRegression(C=1000,solver='lbfgs',penalty='l2',multi_class='ovr'),trainData,multiclass_response_train,train_sizes=np.array([0.1,0.33,0.5,0.66,1.0]),cv=5)
The learning curve looks alright.
Can anybody tell me why this is so? That is, why does the training score increase with the default regularization term and decrease with weaker regularization?
Data details: 10 classes. Images of varying sizes. (Digit Classification - street view digits)
You need to be more precise regarding your metrics. What metrics are used here?
Loss in general means: lower is better, while Score usually means: higher is better.
This also means, that the interpretation of your plot is dependent on the used metrics during training and cross-validation.
Have a look at the related page of scikit-learn:
http://scikit-learn.org/stable/modules/learning_curve.html
The score is typically some measure that needs to be maximized (ROCAUC, accuracy,...). Intuitively you could expect that the more training examples you see the better your model gets and hence the higher the score is. There are however some subtleties regarding overfitting and underfitting that you should keep in mind.
Building off of Alex's answer, it looks like the default regularization parameter for your model underfits the data a bit, because when you relaxed the regularization you saw 'more appropriate' learning curves. It doesn't matter how many examples you throw at a model that underfits.
As for your concern about why the training score increases in the first case rather than decreases: it's probably a consequence of the multiclass data you're using. With fewer training examples, you have fewer images of each class (because lc tries to keep the same class distribution in each fold of the cv), so with regularization (if you call C=1 regularization, that is), it may be harder for your model to accurately guess some of the classes.
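If you want to see the two settings side by side, a minimal sketch (assuming lc is sklearn.model_selection.learning_curve, as the call above suggests, and that trainData and multiclass_response_train are already loaded):

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import learning_curve as lc

for C in (1.0, 1000):   # default vs. relaxed regularization
    sizes, train_scores, cv_scores = lc(
        linear_model.LogisticRegression(C=C, solver='lbfgs', penalty='l2', multi_class='ovr'),
        trainData, multiclass_response_train,
        train_sizes=np.array([0.1, 0.33, 0.5, 0.66, 1.0]), cv=5)
    plt.plot(sizes, train_scores.mean(axis=1), label='train, C=%g' % C)
    plt.plot(sizes, cv_scores.mean(axis=1), '--', label='cv, C=%g' % C)

plt.xlabel('number of training examples')
plt.ylabel('accuracy')
plt.legend()
plt.show()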
I have been playing with Lasagne for a while now for a binary classification problem using a Convolutional Neural Network. However, although I get okay(ish) results for training and validation loss, my validation and test accuracy is always constant (the network always predicts the same class).
I have come across this, someone who has had the same problem as me with Lasagne. Their solution was to set regression=True, as they are using Nolearn on top of Lasagne.
Does anyone know how to set this same variable within Lasagne (as I do not want to use Nolearn)? Further to this, does anyone have an explanation as to why this needs to happen?
Looking at the code of the NeuralNet class from nolearn, it looks like the parameter regression is used in various places, but most of the times it affects how the output value and loss are computed.
In case of regression=False (the default), the network outputs the class with the maximum probability, and computes the loss with the categorical crossentropy.
On the other hand, in case of regression=True, the network outputs the probabilities of each class, and computes the loss with the squared error on the output vector.
I am not an expert in deep learning and CNNs, but the reason this may have worked is that with regression=False, if the error gradient is small, applying small changes to the network parameters may not change the predicted class or the associated loss, which may lead the algorithm to "think" that it has converged. But if you instead look at the class probabilities (regression=True), small parameter changes will affect the probabilities and the resulting mean squared error, and the network will continue down this path, which may eventually change the predictions.
This is just a guess, it is hard to tell without seeing the code and the dataset.
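To make that intuition concrete, a tiny numeric sketch with made-up class probabilities:

import numpy as np

target   = np.array([1.0, 0.0, 0.0])      # true class is class 0
p_before = np.array([0.40, 0.35, 0.25])   # class probabilities before a small parameter update
p_after  = np.array([0.45, 0.32, 0.23])   # class probabilities after a small parameter update

print(np.argmax(p_before), np.argmax(p_after))    # 0 0 -> the predicted class does not move
print(np.mean((p_before - target) ** 2))          # ~0.182
print(np.mean((p_after  - target) ** 2))          # ~0.153 -> the squared error on the probabilities does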