I have several implementation of the same neural network, but each one with different starting parameter.
This is one of my plot comparing the training loss of the base experiment with the training loss of another experiment.
I have also other exaples:
May anyone point me to some instruction on how understand these output from the keras fit()? Note that I don't have any validation set.
Thanks
This is weird, your loss have weirs spikes and even increases in value....
I can imagine a few reasons:
The functions you created are not continuous or have weird behavior, like spikes and other things that might trick the idea decreasing the loss. This includes big contrasts between flat and steep regions.
You're using a weird custom optimizer
Your learning rate is too big
Related
I implement training and evaluating for binary classification with image data through transfer learning from keras API. I'd like to compare performance each models(ResNet, Inception, Xception, VGG, Efficient Net). The datasets are composed by train(approx.2000ea), valid(approx.250ea), test(approx.250ea).
But I faced unfamiliar situation for me so I'm asking couple of questions here.
As shown below, Valid Accuracy or Loss has a very high up and down deviation.
I wonder which one is the problem and what needs to be changed.
epoch_acc_loss
loss_epoch
acc_epoch
If I want to express validation accuracy with number, what should I say in the above case?
Average or maximum or minimum?
It is being performed using Keras (tensorflow), and there are many examples in the API for
train, valid but the code for Test(evaluation?) is hard to find. When figuring performance,
normally implement until valid? or Do I need to show evaluation result?
Now I use Keras API for transfer learning and set this.
include_top=False
conv_base.trainable=False
Summary
I wonder if there is an effect of transfer learning without includint from top, or if it's not,
is there a way to freeze or learn from a specific layer of conv_base.
I'm a beginner and have not many experience so it could be ridiculous questions but please give kind advice.
Thanks a lot in advance.
It's hard to figure out the problem without any given code/model structure. From your loss graph I can see that your model is facing underfitting (or it has a lots of dropout). Common mistakes, that make models underfit are: very high lr and primitive structure (so model can't figure out the dependencies in your data). And you should never forget about the principle "garbage in - garbage out", so double-check tour data for any structure roughness.
Well, validation accuracy in you model training logs is mean accuracy for validation set. Validation technique is based on statistics - you take random N% out of your set for validation, so average is always better if we're talking about multiple experimets (or cross validation).
I'm not sure if I've understood your question correct here, but if you want to evaluate your model with the metric, that you've specified for it after the training process (fit() function call) you should use model.evaluate(val_x, val_y). Or you may use model.predict(val_x) and compare its results to val_y via your metric function.
If you are using default weights for keras pretrained models (imagenet weights) and you want to use your own fully-connected part with it, you may use ONLY pretrained feature extractor (conv blocks). So you specify include_top=False. Of course there will be some positive effect (I'd say it will be significant in comparison with randomly initialized weights) because conv blocks have params that were trained to extract correct features from image. Also would recommend here to use so called "fine-tuning" technique - freeze all layers in pretrained part except a few in its end (may be few layers or even 2-3 conv blocks). Here's the example of fine-tuning of EfficientNetB0:
effnet = EfficientNetB0(weights="imagenet", include_top=False, input_shape=(540, 960, 3))
effnet.trainable = True
for layer in effnet.layers:
if 'block7a' not in layer.name and 'top' not in layer.name:
layer.trainable = False
Here I freeze all pretrained weights except last conv block ones. I've looked into the model with effnet.summary() and selected names of blocks that I want to unfreeze.
I had implemented a CNN with 3 Convolutional layers with Maxpooling and dropout after each layer
I had noticed that when I trained the model for the first time it gave me 88% as testing accuracy but after retraining it for the second time successively, with the same training dataset it gave me 92% as testing accuracy.
I could not understand this behavior, is it possible that the model had overfitting in the second training process?
Thank you in advance for any help!
It is quite possible if you have not provided the seed number set.seed( ) in the R language or tf.random.set_seed(any_no.) in python
Well I am no expert when it comes to machine learning but I do know the math behind it. What you are doing when you train a neural network you basicly find the local minima to the loss function. What this means is that the end result will heavily depend on the initial guess of all of the internal varaibles.
Usually the variables are randomized as a initial estimation and you could therefore reach quite different results from running the training process multiple times.
That being said, from when I studied the subject I was told that you usually reach similar regardless of the initial guess of the parameters. However it is hard to say if 0.88 and 0.92 would be considered similar or not.
Hope this gives a somewhat possible answer to your question.
As mentioned in another answer, you could remove the randomization, both in the parameter initialization of the parameters and the randomization of the data used for each epoch of training by introducing a seed. This would insure that when you run it twice, everything will get "randomized" in the exact same order. In tensorflow this is done using for example tf.random.set_seed(1), the number 1 can be changed to any number to get a new seed.
I am using LSTM for time-series prediction using Keras. I am using 3 LSTM layers with dropout=0.3, hence my training loss is higher than validation loss. To monitor convergence, I using plotting training loss and validation loss together. Results looks like the following.
After researching about the topic, I have seen multiple answers for example ([1][2] but I have found several contradictory arguments on various different places on the internet, which makes me a little confused. I am listing some of them below :
1) Article presented by Jason Brownlee suggests that validation and train data should meet for the convergence and if they don't, I might be under-fitting the data.
https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
2) However, following answer on here suggest that my model is just converged :
How do we analyse a loss vs epochs graph?
Hence, I am just bit confused about the whole concept in general. Any help will be appreciated.
Convergence implies you have something to converge to. For a learning system to converge, you would need to know the right model beforehand. Then you would train your model until it was the same as the right model. At that point you could say the model converged! ... but the whole point of machine learning is that we don't know the right model to begin with.
So when do you stop training? In practice, you stop when the model works well enough to do what you want it to do. This might be when validation error drops below a certain threshold. It might just be when you can't afford any more computing power. It's really up to you.
I am using TensorFlow for training model which has 1 output for the 4 inputs. The problem is of regression.
I found that when I use RandomForest to train the model, it quickly converges and also runs well on the test data. But when I use a simple Neural network for the same problem, the loss(Random square error) does not converge. It gets stuck on a particular value.
I tried increasing/decreasing number of hidden layers, increasing/decreasing learning rate. I also tried multiple optimizers and tried to train the model on both normalized and non-normalized data.
I am new to this field but the literature that I have read so far vehemently asserts that the neural network should marginally and categorically work better than the random forest.
What could be the reason behind non-convergence of the model in this case?
If your model is not converging it means that the optimizer is stuck in a local minima in your loss function.
I don't know what optimizer you are using but try increasing the momentum or even the learning rate slightly.
Another strategy employed often is the learning rate decay, which reduces your learning rate by a factor every several epochs. This can also help you not get stuck in a local minima early in the training phase, while achieving maximum accuracy towards the end of training.
Otherwise you could try selecting an adaptive optimizer (adam, adagrad, adadelta, etc) that take care of the hyperparameter selection for you.
This is a very good post comparing different optimization techniques.
Deep Neural Networks need a significant number of data to perform adequately. Be sure you have lots of training data or your model will overfit.
A useful rule for beginning training models, is not to begin with the more complex methods, for example, a Linear model, which you will be able to understand and debug more easily.
In case you continue with the current methods, some ideas:
Check the initial weight values (init them with a normal distribution)
As a previous poster said, diminish the learning rate
Do some additional checking on the data, check for NAN and outliers, the current models could be more sensitive to noise. Remember, garbage in, garbage out.
I have been playing with Lasagne for a while now for a binary classification problem using a Convolutional Neural Network. However, although I get okay(ish) results for training and validation loss, my validation and test accuracy is always constant (the network always predicts the same class).
I have come across this, someone who has had the same problem as me with Lasagne. Their solution was to setregression=True as they are using Nolearn on top of Lasagne.
Does anyone know how to set this same variable within Lasagne (as I do not want to use Nolearn)? Further to this, does anyone have an explanation as to why this needs to happen?
Looking at the code of the NeuralNet class from nolearn, it looks like the parameter regression is used in various places, but most of the times it affects how the output value and loss are computed.
In case of regression=False (the default), the network outputs the class with the maximum probability, and computes the loss with the categorical crossentropy.
On the other hand, in case of regression=True, the network outputs the probabilities of each class, and computes the loss with the squared error on the output vector.
I am not an expert in deep learning and CNN, but the reason this may have worked is that in case of regression=True and if there is a small error gradient, applying small changes to the network parameters may not change the predicted class and the associated loss, and may lead the algorithm to "think" that it has converged. But if instead you look at the class probabilities, small parameter changes will affect the probabilities and the resulting mean squared error, and the network will continue down this path which may eventually change the predictions.
This is just a guess, it is hard to tell without seeing the code and the dataset.