A train loss and a validation loss are same - python

I used fully connected layers via keras api to predict regression values.
To generalize the model, I plotted train and validation loss. I wanted the plot to show me a point that the validation loss is higher than train loss.
However, both loss values were almost similar and no change.
I was wondering.
Is that model trained well?
If the model trained well, how can i interpret the loss plot?
the model performance isn't good. so what should i do for improving the performance about the model?

As we all learned, the validation error are often higher than training loss. So you might feel strange when both of them are the same. In my opinion, your model probably hasn't been well trained. You will still have to try more parameters and have them (and corresponding error on test data set) recorded in a file to help you find the best parameter.
One simple way of finding better parameters is to use a for-loop over one certain parameter with the others fixed, and then record them like below
List = [parameter_1, parameter_2, parameter_3]
Name = 'parameter_1, parameter_2, parameter_3 \n'
f = open('training-log.csv','a')
f.write(Name)
for i in List:
f.write('{},'.format(str(i)))
f.write('\n')
f.close()
Hope this can help you.

Related

How to fine-tune a CGAN?

I am currently building a Conditional GAN to apply data augmentation on a small audio dataset.
My problem is that I don't really know how to calibrate my models and the parameters, I feel like there is a need to fine-tune the hyperparameters in a certain way but I don't know in which direction to go.
First of all, here is a plot of my losses through the epochs, please don't bother with the names of the axis, they are wring becase I reused a function without modifying the name of the axis:
plot of the losses per epochs
As we can see, the two losses cross each other and I believe they should stay balanced and approximately equal for the rest of the training, but in my case, they diverge and never meet again. I was wondering if this is normal behavior, maybe I should stop the training when they cross?
Please tell me if you have any leads, clues, or criticism that would allow me to improve my models.
For further information, here are some of the hyper-parameters I am using:
# I used custom loss functions for both models, each function uses this cross_entropy,
# but I am quite confident that is part is correct.
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=False)
# different learning rates because I felt that the discriminator model was too chaotic
generator_optimizer = Adam(8e-5)
discriminator_optimizer = Adam(2e-5)
BATCH_SIZE = 20
epochs = 1000
I am conscious that 1000 epochs are way too much for this but I wanted to observe the behavior on a large scale.
I built my generator like that:
generator model
And my discriminator model is like that:
discriminator model
The architecture is done using the functional API of Tensorflow
Thanks for reading and please tell me if you see anything funny or if you have any leads.

Questions that in case of fluctuating the validation accuracy and loss curve for image binary classification, ask the way of analysis and solution

I implement training and evaluating for binary classification with image data through transfer learning from keras API. I'd like to compare performance each models(ResNet, Inception, Xception, VGG, Efficient Net). The datasets are composed by train(approx.2000ea), valid(approx.250ea), test(approx.250ea).
But I faced unfamiliar situation for me so I'm asking couple of questions here.
As shown below, Valid Accuracy or Loss has a very high up and down deviation.
I wonder which one is the problem and what needs to be changed.
epoch_acc_loss
loss_epoch
acc_epoch
If I want to express validation accuracy with number, what should I say in the above case?
Average or maximum or minimum?
It is being performed using Keras (tensorflow), and there are many examples in the API for
train, valid but the code for Test(evaluation?) is hard to find. When figuring performance,
normally implement until valid? or Do I need to show evaluation result?
Now I use Keras API for transfer learning and set this.
include_top=False
conv_base.trainable=False
Summary
I wonder if there is an effect of transfer learning without includint from top, or if it's not,
is there a way to freeze or learn from a specific layer of conv_base.
I'm a beginner and have not many experience so it could be ridiculous questions but please give kind advice.
Thanks a lot in advance.
It's hard to figure out the problem without any given code/model structure. From your loss graph I can see that your model is facing underfitting (or it has a lots of dropout). Common mistakes, that make models underfit are: very high lr and primitive structure (so model can't figure out the dependencies in your data). And you should never forget about the principle "garbage in - garbage out", so double-check tour data for any structure roughness.
Well, validation accuracy in you model training logs is mean accuracy for validation set. Validation technique is based on statistics - you take random N% out of your set for validation, so average is always better if we're talking about multiple experimets (or cross validation).
I'm not sure if I've understood your question correct here, but if you want to evaluate your model with the metric, that you've specified for it after the training process (fit() function call) you should use model.evaluate(val_x, val_y). Or you may use model.predict(val_x) and compare its results to val_y via your metric function.
If you are using default weights for keras pretrained models (imagenet weights) and you want to use your own fully-connected part with it, you may use ONLY pretrained feature extractor (conv blocks). So you specify include_top=False. Of course there will be some positive effect (I'd say it will be significant in comparison with randomly initialized weights) because conv blocks have params that were trained to extract correct features from image. Also would recommend here to use so called "fine-tuning" technique - freeze all layers in pretrained part except a few in its end (may be few layers or even 2-3 conv blocks). Here's the example of fine-tuning of EfficientNetB0:
effnet = EfficientNetB0(weights="imagenet", include_top=False, input_shape=(540, 960, 3))
effnet.trainable = True
for layer in effnet.layers:
if 'block7a' not in layer.name and 'top' not in layer.name:
layer.trainable = False
Here I freeze all pretrained weights except last conv block ones. I've looked into the model with effnet.summary() and selected names of blocks that I want to unfreeze.

is it incorrect to change a model's parameters after training it?

i was trying to use average ensembling on a group of models i trained earlier (i'm creating a new model in the ensemble for each pre-trained model i'm using and then loading the trained weights onto it, it's inefficient this way i know but i'm just learning about it so it doesn't really matter). and I mistakenly changed some of the network's parameters when loading the models in the ensemble code like using Relu instead of leakyRelu which i used in training the models and a different value for an l2 regularizer in the dense layer in one of the models. this however gave me a better testing accuracy for the ensemble. can you please explain to me if/how this is incorrect, and if it's normal can i use this method to further enhance the accuracy of the ensemble.
I believe it is NOT correct to chnage model's parameters after training it. parameters here I mean the trainable-parameters like the weights in Dense node but not hyper-parameters like learning rate.
What is training?
Training essentially is a loop that keeps changing, or update, the parameters. It updates the parameter in such a way that it believes it can reduce the loss. It is also like moving your point in a hyper-spaces that the loss function gives a small loss on that point.
Smaller loss means higher accruacy in general.
Changing Weights
So now, changing your parameters values, by mistake or by purpose, is like moving that point to somewhere BUT you have no logical reason behind that such move will give you a smaller loss. You are just randomly wandering around that hyper-space and in your case you are just lucky that you land to some point that so happened give you a smaller loss or a better testing accuracy. It is just purely luck.
Changing activation function
Also, altering the activation function from leakyRelu to relu is similar you randomly alter the shape of your hype-space. Even though you are at the some point the landscape changes, you are still have no logical reason to believe by such change of landscape you can have a smaller loss staying at the same point
When you change the model manually, you need to retrain.
Though you changed the network's parameters when loading the models. It is not incorrect to alter the hyper-parameters of your ensemble's underlying models. In some cases, the models that are used in an ensemble method require unique tunings which can, as you mentioned, give "you a better testing accuracy for the ensemble model."
To answer your second question, you can use this method to further enhance the accuracy of the ensemble, you can also use Bayesian optimization, GridSearch, and RandomSearch if you prefer more automated means of tuning your hyperparameters.

How to confirm convergence of LSTM network?

I am using LSTM for time-series prediction using Keras. I am using 3 LSTM layers with dropout=0.3, hence my training loss is higher than validation loss. To monitor convergence, I using plotting training loss and validation loss together. Results looks like the following.
After researching about the topic, I have seen multiple answers for example ([1][2] but I have found several contradictory arguments on various different places on the internet, which makes me a little confused. I am listing some of them below :
1) Article presented by Jason Brownlee suggests that validation and train data should meet for the convergence and if they don't, I might be under-fitting the data.
https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
2) However, following answer on here suggest that my model is just converged :
How do we analyse a loss vs epochs graph?
Hence, I am just bit confused about the whole concept in general. Any help will be appreciated.
Convergence implies you have something to converge to. For a learning system to converge, you would need to know the right model beforehand. Then you would train your model until it was the same as the right model. At that point you could say the model converged! ... but the whole point of machine learning is that we don't know the right model to begin with.
So when do you stop training? In practice, you stop when the model works well enough to do what you want it to do. This might be when validation error drops below a certain threshold. It might just be when you can't afford any more computing power. It's really up to you.

Why validation accuracy remains at 75% while train accuracy is 100 %?

I used my own data set to train a model using retrain.py file from Tensorflow site. However, with my first set of images, I am seeing test accuracy of 100% while validation accuracy is at 70%. I see that validation entropy is increasing which tells overfitting. I am new to this field and got to this stage by following online tutorials.
I did not enable random brightness, crop and flip yet for training. I am trying to understand why is this behaviour? I tried flower example and it worked as expected. Cross-entropy got lowest instead of increasing with my data set.
Could some one explain whats going on inside the CNN here ?
Your model has over-fitted on the training data. If its a large model, you should consider using transfer learning where you train the model on a large dataset like ImageNet and then fine-tune on your data. You can also try adding some form of regularization to prevent overfitting specially Dropout and L2 regularization.
This simply means your model is overfitting. Overfitting means your model is not generalizing well to unseen data (ie the validation data). What you can do is add some form of regularization (L2 is used normally). What this does is it penalizes weights from getting very high values which would thereby lead to overfitting. This will also act against the model trying to fit outliers which again leads to less generalization and more overfitting.

Categories